August 11: Support for Azure AI Document Intelligence by default, language-aware summaries

New Features

  • Added support for language-aware summaries when using LLM-based document extraction. Summaries for tables and sections generated by the LLM now follow the language of the source text.

  • Added support for language-aware entity descriptions when using LLM-based entity extraction. Entity descriptions generated by the LLM now follow the language of the source text.

  • Changed the default document preparation method to Azure AI Document Intelligence, replacing our built-in document parsers. We have found that Azure AI Document Intelligence offers considerably better fidelity for complex PDFs and better support for table extraction. Note: this increases per-page credit usage for PDF, DOCX and PPTX documents, but the quality of the extracted documents is noticeably higher for use in RAG pipelines.

Bugs Fixed

  • GPLA-3070: Not getting slide count assigned to metadata for PPTX files.
