August 3: New data model for Observations, new Category entity
Revised data model for Observations, Occurrences and observables (e.g. Person, Organization). After entity extraction, content will now have one Observation for each observed entity, and each Observation has a list of occurrences. Occurrences now support text, time and image types (text: page index; time: start/end timestamp; image: bounding box). Observations now have ObservableType and Observable fields, which specify the observed entity type and entity reference.
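As a rough illustration of the shape described above, here is a minimal Python sketch. The class names, field names and the group_mentions helper are hypothetical and chosen for this example only; they are not the actual GraphQL schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class OccurrenceType(Enum):
    TEXT = "TEXT"
    TIME = "TIME"
    IMAGE = "IMAGE"


@dataclass
class Occurrence:
    # Hypothetical field names; only the fields relevant to the type are set.
    type: OccurrenceType
    page_index: Optional[int] = None       # text occurrence
    start_time: Optional[float] = None     # time occurrence, in seconds
    end_time: Optional[float] = None       # time occurrence, in seconds
    bounding_box: Optional[Tuple[float, float, float, float]] = None  # image occurrence


@dataclass
class Observation:
    observable_type: str   # e.g. "PERSON" or "ORGANIZATION"
    observable_id: str     # reference to the observed entity
    occurrences: List[Occurrence] = field(default_factory=list)


def group_mentions(mentions):
    """Collapse raw (type, id, occurrence) mentions into one Observation per entity."""
    by_entity = {}
    for observable_type, observable_id, occurrence in mentions:
        key = (observable_type, observable_id)
        if key not in by_entity:
            by_entity[key] = Observation(observable_type, observable_id)
        by_entity[key].occurrences.append(occurrence)
    return list(by_entity.values())
```

With this grouping, two mentions of the same person on different pages yield a single Observation with two text occurrences, rather than two separate observations.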
Added Category entity to GraphQL data model, which supports PII categories such as Phone Number or Credit Card Number.
Added probability field to model properties, for the LLM's token probability. (See OpenAI documentation for more detail.)
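For context, LLM APIs such as OpenAI's commonly report token likelihoods as natural-log probabilities. Assuming a value like that needs to be compared against a linear probability (an assumption; how the field is populated is not stated here), the conversion is a single exponential. The function name below is hypothetical.

```python
import math


def logprob_to_probability(logprob: float) -> float:
    """Convert a natural-log probability into a linear probability in [0, 1]."""
    return math.exp(logprob)
```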
Added error field to feeds. If a feed fails to read from the data source and is put into the ERRORED state, the error field will contain the error description.
Support reingestion of changed files from feeds. For feeds, such as SharePoint or Web, where we can recognize that a file or page was updated, we will now reingest the content in-place. Content will keep the same ID, and will restart the content workflow by re-downloading the updated content from the data source. Existing observations will be deleted, and new observations will be created from the updated content.
Ingestion of content is now idempotent, meaning if you ingest content again from the same URI, we will reingest the content in-place, while keeping the same ID. (If we can recognize the content has not changed, such as by ETag, we will return the existing content object.)
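The reingestion and idempotency behavior described above can be sketched with a small in-memory model. This is an illustrative sketch only: ContentStore, Content and their fields are hypothetical names, and the real service performs the download and workflow steps that are stubbed out here.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Content:
    id: str
    uri: str
    etag: Optional[str]
    observations: List[str] = field(default_factory=list)
    workflow_runs: int = 0


class ContentStore:
    """Hypothetical in-memory stand-in for the content service."""

    def __init__(self) -> None:
        self._by_uri: Dict[str, Content] = {}

    def ingest(self, uri: str, etag: Optional[str] = None) -> Content:
        existing = self._by_uri.get(uri)
        if existing is None:
            # First ingestion of this URI: create content with a new ID.
            content = Content(id=str(uuid.uuid4()), uri=uri, etag=etag)
            self._by_uri[uri] = content
            self._run_workflow(content)
            return content
        if etag is not None and etag == existing.etag:
            # Unchanged content: return the existing object untouched.
            return existing
        # Changed (or unverifiable) content: reingest in place, keeping the ID.
        existing.etag = etag
        existing.observations.clear()
        self._run_workflow(existing)
        return existing

    def _run_workflow(self, content: Content) -> None:
        # Stub for re-downloading the content and extracting entities.
        content.workflow_runs += 1
        content.observations.append(f"observation-from-run-{content.workflow_runs}")
```

Ingesting the same URI twice with the same ETag returns the same object without rerunning the workflow; ingesting it with a different ETag keeps the ID but replaces the observations.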
Changed GraphQL data type of SharePoint tenantId, libraryId and siteId to ID rather than String.
Performance optimization of entity extraction, and the creation of observations.
GPLA-1130: Text was extracted only from the first column of PDF tables.
GPLA-1140: Text from DOCX tables was not extracted properly.
GPLA-1154: Audio content ingested from an RSS feed was not deleted when the feed was deleted.