February 21: Support for OneDrive and Google Drive feeds, extract images from PDFs, bug fixes

New Features

💡 Graphlit now supports OneDrive and Google Drive feeds. Files can be ingested from OneDrive or Google Drive, including shared drives to which the authenticated user has access. Both feed types support reading existing files and, with recurring feeds, tracking new files as they are added to storage.
💡 Graphlit now supports email backup files, such as EML or MSG, which are assigned the EMAIL file type. During preparation of email files, Graphlit automatically extracts and ingests any file attachments.
💡 Graphlit now automatically extracts embedded images in PDF files, ingests them as content objects, and links them as children of the parent PDF.
💡 Graphlit now supports recursive Notion feeds. When the isRecursive flag is set to true in the Notion feed properties, Graphlit crawls child pages and databases and ingests them recursively, in addition to the specified pages and databases.
Added support for assigning collections to content ingested with the ingestPage, ingestFile or ingestText mutations. This saves a step: the content is automatically added to the specified collection(s) without requiring a separate mutation call.
Added support for the CODE file type, covering a wide variety of source code formats, e.g. Python (.py) and JavaScript (.js). Code files use optimized text splitting for enhanced search and retrieval.
Added the customGuidance field to the Specification object, which can be used to inject a guidance prompt during the RAG process. For example, you can instruct the LLM to return a default response string if no content sources are found via semantic search.
Added the tenants field to the Project object, which returns a list of all tenant IDs that have been used to create entities in Graphlit.
Added email metadata, separate from document metadata. Emails now include indexed metadata such as to, from and subject.
⚡ The contents field on content objects has been replaced by the children and parent fields. For example, when a ZIP file is unpacked, the unpacked files are added as children of the ZIP file, and the ZIP file becomes the parent of each unpacked file.
⚡ Removed the enableImageAnalysis field from the image preparation properties of the workflow object. Image analysis is now enabled by default.
⚡ Moved the disableSmartCapture field from the page preparation properties to the preparation workflow stage. This field disables the use of a headless Chrome browser to capture HTML from web pages. Smart capture is enabled by default; when it is disabled, Graphlit simply downloads the HTML from the web page rather than rendering it in headless Chrome.
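Attachment extraction from email files can be sketched with Python's standard-library email package. This illustrates the general technique only and is not Graphlit's implementation: the helper name extract_attachments is hypothetical, it handles only EML (MIME) input, and MSG files, which use Outlook's binary format, would need a different parser.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import Dict, List, Tuple


def extract_attachments(eml_bytes: bytes) -> Tuple[Dict[str, str], List[Tuple[str, bytes]]]:
    """Parse raw EML bytes into (metadata, attachments).

    Hypothetical helper for illustration; not part of the Graphlit API.
    """
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    # Email-specific metadata, kept separate from any document metadata
    metadata = {
        "to": str(msg.get("To", "")),
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
    }
    attachments: List[Tuple[str, bytes]] = []
    # iter_attachments() skips the body part(s) and yields the remaining parts
    for part in msg.iter_attachments():
        name = part.get_filename() or "unnamed"
        payload = part.get_payload(decode=True) or b""  # None for nested multiparts
        attachments.append((name, payload))
    return metadata, attachments


# Build a sample message with one PDF attachment, then parse it back
sample = EmailMessage()
sample["To"] = "bob@example.com"
sample["From"] = "alice@example.com"
sample["Subject"] = "Quarterly report"
sample.set_content("See the attached report.")
sample.add_attachment(b"%PDF-1.4 sample", maintype="application",
                      subtype="pdf", filename="report.pdf")

meta, atts = extract_attachments(sample.as_bytes())
```

Each extracted attachment could then be ingested as its own content object.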
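The isRecursive behavior for Notion feeds can be pictured as a depth-first traversal that starts at the specified pages and databases and follows their children. The sketch below is an assumption-laden illustration: it uses a hypothetical in-memory dictionary in place of the Notion API, a hypothetical crawl function, and a visited set to avoid ingesting the same page twice; it does not call Notion or Graphlit.

```python
from typing import Dict, List, Set


def crawl(roots: List[str], children_of: Dict[str, List[str]], is_recursive: bool) -> List[str]:
    """Return page/database IDs to ingest, in depth-first order.

    Hypothetical helper; children_of stands in for Notion API lookups.
    """
    ordered: List[str] = []
    seen: Set[str] = set()

    def visit(node_id: str) -> None:
        if node_id in seen:
            return  # already queued; also guards against cycles
        seen.add(node_id)
        ordered.append(node_id)
        if is_recursive:
            # Only descend into child pages/databases when recursion is on
            for child in children_of.get(node_id, []):
                visit(child)

    for root in roots:
        visit(root)
    return ordered


# Hypothetical workspace: a page containing a sub-page and a database
tree = {
    "page-1": ["page-1a", "db-1"],
    "db-1": ["row-1", "row-2"],
}
flat = crawl(["page-1"], tree, is_recursive=False)
deep = crawl(["page-1"], tree, is_recursive=True)
```

With recursion off, only the specified page is returned; with it on, every descendant is included as well.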
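The parent/children relationship that replaces the contents field can be modeled as a two-way link between content objects. The sketch below assumes a hypothetical Content dataclass and unpack_zip helper (neither is Graphlit's schema or API) and uses Python's zipfile module to unpack an in-memory ZIP, attaching each file as a child whose parent is the ZIP.

```python
import io
import zipfile
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Content:
    # Hypothetical stand-in for a content object; not the Graphlit schema
    name: str
    data: bytes = b""
    parent: Optional["Content"] = field(default=None, repr=False)
    children: List["Content"] = field(default_factory=list)

    def add_child(self, child: "Content") -> None:
        # Link both directions: child.parent and parent.children
        child.parent = self
        self.children.append(child)


def unpack_zip(name: str, zip_bytes: bytes) -> Content:
    """Create a parent Content for the ZIP and one child per archived file."""
    package = Content(name=name, data=zip_bytes)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no file content
            package.add_child(Content(name=info.filename, data=zf.read(info)))
    return package


# Build a small ZIP in memory and unpack it
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
    zf.writestr("docs/b.md", "# title")
package = unpack_zip("archive.zip", buf.getvalue())
```

Using eq=False keeps comparisons identity-based, and excluding parent from the repr avoids printing the cycle between a child and its parent.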

Bugs Fixed

GPLA-2099: Failed to ingest arXiv PDF. Fixed PDF parsing error.
GPLA-2174: LLM response is incorrect with conversation history, but no content sources.
GPLA-2199: ZIP package left in Indexed state after content workflow.