August 11: Support for Azure AI Document Intelligence by default, language-aware summaries
Added support for language-aware summaries when using LLM-based document extraction. Now the summaries for tables and sections generated by the LLM will follow the language of the source text.
Added support for language-aware entity descriptions when using LLM-based entity extraction. Now the entity descriptions generated by the LLM will follow the language of the source text.
We have changed the default document preparation method to use Azure AI Document Intelligence rather than our built-in document parsers. We have found that Azure AI Document Intelligence provides considerably better fidelity for complex PDFs and better support for table extraction. Note: this comes with increased credit usage per page for PDF, DOCX and PPTX documents, but the quality of the extracted documents is noticeably higher for use in RAG pipelines.
GPLA-3070: Not getting slide count assigned to metadata for PPTX files.