April 7: Support for Discord feeds, Cohere reranking, section-aware chunking and retrieval

New Features

  • Added support for section-aware text chunking and retrieval. When you use section-aware document preparation, such as Azure AI Document Intelligence, Graphlit now stores the extracted text by semantic chunk (i.e. section). The text of each section is chunked and embedded into the vector index individually.

  • Added support for retrievalStrategy in Specification type. Graphlit now supports CHUNK, SECTION and CONTENT retrieval strategies. Chunk retrieval uses the search hit chunk as-is. Section retrieval expands the search hit chunk to the containing section (or page, if not using section-aware preparation). Content retrieval expands the search hit chunk to the text of the entire document.

  • Added support for rerankingStrategy in Specification type. You can now configure the reranking of content sources, using the Cohere reranking model, by setting serviceType to COHERE. Support for more reranking models is planned.

  • Added isSynchronous flag to content ingestion mutations, such as ingestUri. When enabled, the mutation waits for the content to complete the ingestion workflow (or fail with an error) before returning. This is useful when calling the API from a Jupyter notebook or Streamlit application, where a synchronous call avoids the need for polling.

  • Added includeAttachments flag to SlackFeedProperties. When enabled, Graphlit will automatically ingest any attachments within Slack messages.
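
To make the difference between the three retrieval strategies concrete, here is a minimal, self-contained Python sketch. It is a toy model only and does not use the Graphlit API: the Chunk class, the expand_hit function and the nested-list document layout are hypothetical names chosen for illustration. Only the strategy names CHUNK, SECTION and CONTENT come from the release notes above.

```python
from dataclasses import dataclass
from enum import Enum


class RetrievalStrategy(Enum):
    # Strategy names as listed in the release notes.
    CHUNK = "CHUNK"
    SECTION = "SECTION"
    CONTENT = "CONTENT"


@dataclass
class Chunk:
    # Hypothetical stand-in for an embedded text chunk.
    text: str
    section_index: int  # index of the section (or page) containing this chunk


def expand_hit(sections: list[list[Chunk]], hit: Chunk, strategy: RetrievalStrategy) -> str:
    """Return the text handed to the LLM for a given search hit (illustrative only)."""
    if strategy is RetrievalStrategy.CHUNK:
        # Use only the chunk that matched the search.
        return hit.text
    if strategy is RetrievalStrategy.SECTION:
        # Expand to every chunk in the containing section.
        return " ".join(c.text for c in sections[hit.section_index])
    # CONTENT: expand to the text of the entire document.
    return " ".join(c.text for section in sections for c in section)


# A toy document with two sections; the search hit is the second chunk of section 0.
doc = [
    [Chunk("Intro part one.", 0), Chunk("Intro part two.", 0)],
    [Chunk("Pricing details.", 1)],
]
hit = doc[0][1]

for strategy in RetrievalStrategy:
    print(strategy.value, "->", expand_hit(doc, hit, strategy))
```

The sketch assumes each chunk records which section it came from, so the section and content strategies are simple lookups once a hit is found. How Graphlit implements the expansion internally is not described in these notes.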

Bugs Fixed

  • GPLA-2469: Failed to ingest PDF hosted on GitHub

  • GPLA-2390: Claude 3 Haiku not adhering to JSON schema

  • GPLA-2474: Prompt rewriting should ignore formatting instructions in prompt

  • GPLA-2462: Missing line break after table rows

  • GPLA-2417: Not extracting images from PPTX correctly
