# August 3: New data model for Observations, new Category entity

### New Features

* :bulb: Revised data model for **Observations**, **Occurrences** and observables (e.g. Person, Organization).  After entity extraction, content now has one Observation for each observed entity, and each Observation has a list of occurrences.  Occurrences now support text, time and image types (text: page index; time: start/end timestamp; image: bounding box).  Observations now have ObservableType and Observable fields, which specify the observed entity type and the entity reference.
* :bulb: Added **Category** entity to GraphQL data model, which supports [PII](https://en.wikipedia.org/wiki/Personal_data) categories such as *Phone Number* or *Credit Card Number*.
* Added `probability` field to model properties, for the LLM's token probability (`top_p`).  (See [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create#chat/create-top_p) for more detail.)
* Added `error` field to feeds.  If a feed fails to read from the data source and is marked with the `ERRORED` state, the `error` field will contain the error description.
* Added support for reingestion of changed files from feeds.  For feeds, such as SharePoint or Web, where we can recognize that a file or page was updated, we will now reingest the content in-place.  The content keeps the same ID, and the content workflow restarts by re-downloading the updated content from the data source.  Existing observations will be deleted, and new observations will be created from the updated content.
* :information\_source: Ingestion of content is now idempotent, meaning that if you ingest content again from the same URI, we will reingest the content in-place, while keeping the same ID.  (If we can recognize that the content has not changed, such as by ETag, we will return the existing content object.)
* :information\_source: Changed GraphQL data type of SharePoint `tenantId`, `libraryId` and `siteId` to ID rather than String.
* :sparkles: Performance optimization of entity extraction, and the creation of observations.
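
As a rough illustration of the revised Observation model described above, the sketch below groups raw entity mentions into one observation per observed entity, each carrying a list of occurrences. All class, field and function names here (`Observation`, `Occurrence`, `BoundingBox`, `group_mentions`, and so on) are hypothetical and chosen for illustration only; they are not the actual GraphQL schema, and the units (seconds, relative coordinates) are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class ObservableType(Enum):
    # Hypothetical values; the real schema may define more entity types.
    PERSON = "PERSON"
    ORGANIZATION = "ORGANIZATION"


class OccurrenceType(Enum):
    TEXT = "TEXT"    # located by page index
    TIME = "TIME"    # located by start/end timestamp
    IMAGE = "IMAGE"  # located by bounding box


@dataclass
class BoundingBox:
    # Assumed representation: origin plus size, in relative coordinates.
    left: float
    top: float
    width: float
    height: float


@dataclass
class Occurrence:
    type: OccurrenceType
    page_index: Optional[int] = None            # used by TEXT occurrences
    start_time: Optional[float] = None          # used by TIME occurrences (assumed seconds)
    end_time: Optional[float] = None
    bounding_box: Optional[BoundingBox] = None  # used by IMAGE occurrences


@dataclass
class Observation:
    observable_type: ObservableType  # the observed entity type
    observable_id: str               # reference to the observed entity
    occurrences: List[Occurrence] = field(default_factory=list)


def group_mentions(
    mentions: List[Tuple[ObservableType, str, Occurrence]]
) -> List[Observation]:
    """Collapse raw mentions into one Observation per observed entity."""
    by_entity = {}
    for observable_type, observable_id, occurrence in mentions:
        key = (observable_type, observable_id)
        if key not in by_entity:
            by_entity[key] = Observation(observable_type, observable_id)
        by_entity[key].occurrences.append(occurrence)
    # Dicts preserve insertion order, so entities come back in first-seen order.
    return list(by_entity.values())
```

Under these assumptions, a person mentioned on a page of a document and again in an audio segment would yield a single observation with two occurrences, rather than two separate observations.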

### Bugs Fixed

* GPLA-1130: Text was extracted only from the first column of PDF tables.
* GPLA-1140: Text from DOCX tables was not extracted properly.
* GPLA-1154: Audio content ingested from an RSS feed was not deleted when the feed was deleted.
