August 3: New data model for Observations, new Category entity

New Features

  • 💡 Revised data model for Observations, Occurrences and observables (e.g. Person, Organization). After entity extraction, content now has one Observation for each observed entity, each with a list of occurrences. Occurrence now supports text, time and image occurrence types (text: page index; time: start/end timestamp; image: bounding box). Observations now have ObservableType and Observable fields, which specify the observed entity type and entity reference.

  • 💡 Added Category entity to GraphQL data model, which supports PII categories such as Phone Number or Credit Card Number.

  • Added probability field to model properties, for the LLM's token probability. (See OpenAI documentation for more detail.)

  • Added error field to feeds. If a feed fails to read from the data source and is put into the ERRORED state, the error field will contain the error description.

  • Support reingestion of changed files from feeds. For feeds, such as SharePoint or Web, where we can recognize that a file or page was updated, we will now reingest the content in-place. Content will keep the same ID, and will restart the content workflow by re-downloading the updated content from the data source. Existing observations will be deleted, and new observations will be created from the updated content.

  • ℹī¸ Ingestion of content is now idempotent, meaning if you ingest content again from the same URI, we will reingest the content in-place, while keeping the same ID. (If we can recognize the content has not changed, such as by ETag, we will return the existing content object.)

  • ℹī¸ Changed GraphQL data type of SharePoint tenantId, libraryId and siteId to ID rather than String.

  • ✨ Performance optimization of entity extraction, and the creation of observations.
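
As a rough illustration of the revised Observation model described above, here is a minimal Python sketch. It is not the actual GraphQL schema: the class names, field names and the group_occurrences helper are all hypothetical, chosen only to mirror the description (one Observation per observed entity, each with a list of text, time or image occurrences).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple


class OccurrenceType(Enum):
    TEXT = "TEXT"
    TIME = "TIME"
    IMAGE = "IMAGE"


@dataclass
class Occurrence:
    # Hypothetical field names; only the fields matching `type` are set.
    type: OccurrenceType
    page_index: Optional[int] = None      # text occurrences
    start_time: Optional[float] = None    # time occurrences (seconds)
    end_time: Optional[float] = None
    bounding_box: Optional[Tuple[float, float, float, float]] = None  # image occurrences


@dataclass
class Observation:
    observable_type: str   # e.g. "PERSON" or "ORGANIZATION"
    observable_id: str     # reference to the observed entity
    occurrences: List[Occurrence] = field(default_factory=list)


def group_occurrences(
    mentions: Iterable[Tuple[str, str, Occurrence]],
) -> List[Observation]:
    """Collapse (type, id, occurrence) mentions into one Observation per entity."""
    by_entity: Dict[Tuple[str, str], Observation] = {}
    for observable_type, observable_id, occurrence in mentions:
        key = (observable_type, observable_id)
        if key not in by_entity:
            by_entity[key] = Observation(observable_type, observable_id)
        by_entity[key].occurrences.append(occurrence)
    return list(by_entity.values())
```

With this shape, an entity mentioned on a page and again in an audio segment yields a single Observation holding two occurrences, rather than two separate observations.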

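The idempotent ingestion behavior noted above can be sketched with a small in-memory store. This is an assumption-laden illustration, not the service's implementation: ContentStore, its ingest method and the version counter are hypothetical names, and the real workflow (re-downloading content, deleting and recreating observations) is reduced to a counter bump.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Content:
    id: str
    uri: str
    etag: Optional[str]
    version: int = 1  # hypothetical counter, bumped on each in-place reingest


class ContentStore:
    """Hypothetical in-memory stand-in for the content service."""

    def __init__(self) -> None:
        self._by_uri: Dict[str, Content] = {}

    def ingest(self, uri: str, etag: Optional[str] = None) -> Content:
        existing = self._by_uri.get(uri)
        if existing is None:
            # First ingestion of this URI: create a new content object.
            content = Content(id=str(uuid.uuid4()), uri=uri, etag=etag)
            self._by_uri[uri] = content
            return content
        if etag is not None and etag == existing.etag:
            # Content is recognizably unchanged: return the existing object.
            return existing
        # Changed (or unknown) content: reingest in place, keeping the same ID.
        existing.etag = etag
        existing.version += 1
        return existing
```

Calling ingest twice with the same URI and ETag returns the same object untouched; a different ETag keeps the ID but restarts processing, represented here by the version bump.
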
Bugs Fixed

  • GPLA-1130: Text was extracted only from the first column of PDF tables.

  • GPLA-1140: Text from DOCX tables was not extracted properly.

  • GPLA-1154: Audio content ingested from an RSS feed was not deleted when the feed was deleted.

