
# August 3: New data model for Observations, new Category entity

### New Features

* :bulb: Revised data model for **Observations**, **Occurrences** and observables (e.g. Person, Organization). After entity extraction, content now has one Observation for each observed entity, each with a list of occurrences. Occurrences now support text, time and image types (text: page index; time: start/end timestamp; image: bounding box). Observations now have ObservableType and Observable fields, which specify the observed entity's type and a reference to the entity.
* :bulb: Added **Category** entity to GraphQL data model, which supports [PII](https://en.wikipedia.org/wiki/Personal_data) categories such as *Phone Number* or *Credit Card Number*.
* Added `probability` field to model properties, for the LLM's token probability. (See the `top_p` parameter in the [OpenAI documentation](https://platform.openai.com/docs/api-reference/chat/create#chat/create-top_p) for more detail.)
* Added `error` field to feeds. If a feed fails to read from the data source and is put in the `ERRORED` state, the `error` field will contain the error description.
* Added support for reingestion of changed files from feeds. For feeds such as SharePoint or Web, where we can recognize that a file or page was updated, we now reingest the content in-place. The content keeps the same ID, and the content workflow restarts by re-downloading the updated content from the data source. Existing observations are deleted, and new observations are created from the updated content.
* :information\_source: Ingestion of content is now idempotent: if you ingest content again from the same URI, we reingest the content in-place while keeping the same ID. (If we can recognize that the content has not changed, such as by ETag, we return the existing content object.)
* :information\_source: Changed GraphQL data type of SharePoint `tenantId`, `libraryId` and `siteId` to ID rather than String.
* :sparkles: Performance optimization of entity extraction, and the creation of observations.
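
The revised Observation/Occurrence shape and the in-place reingestion behavior described above can be illustrated with a small in-memory sketch. This is not the Graphlit API or GraphQL schema: the class names, field names (`page_index`, `start_time`, `end_time`, `bounding_box`, `observable_id`), the `ContentStore` helper and the example URI are all hypothetical, chosen only to mirror the concepts named in this changelog.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
import uuid


class OccurrenceType(Enum):
    TEXT = "TEXT"
    TIME = "TIME"
    IMAGE = "IMAGE"


@dataclass
class Occurrence:
    # Hypothetical field names; only the fields matching `type` are set.
    type: OccurrenceType
    page_index: Optional[int] = None      # text occurrences
    start_time: Optional[float] = None    # time occurrences
    end_time: Optional[float] = None
    bounding_box: Optional[tuple[float, float, float, float]] = None  # image occurrences


@dataclass
class Observation:
    observable_type: str   # observed entity type, e.g. "PERSON"
    observable_id: str     # reference to the observed entity
    occurrences: list[Occurrence] = field(default_factory=list)


@dataclass
class Content:
    id: str
    uri: str
    etag: Optional[str]
    observations: list[Observation] = field(default_factory=list)


class ContentStore:
    """Toy in-memory store keyed by URI (an assumption, not the real service)."""

    def __init__(self) -> None:
        self._by_uri: dict[str, Content] = {}

    def ingest(self, uri: str, etag: Optional[str] = None) -> Content:
        existing = self._by_uri.get(uri)
        if existing is None:
            # First ingestion: create content with a fresh ID.
            content = Content(id=str(uuid.uuid4()), uri=uri, etag=etag)
            self._by_uri[uri] = content
            return content
        if etag is not None and etag == existing.etag:
            # Content recognized as unchanged: return the existing object.
            return existing
        # Changed content: reingest in place, keeping the same ID and
        # clearing old observations so extraction can recreate them.
        existing.etag = etag
        existing.observations = []
        return existing
```

In this sketch, calling `ingest` twice with the same URI and ETag returns the same object, while a different ETag keeps the ID but resets the observations, which is the behavior the idempotency and reingestion notes describe.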

### Bugs Fixed

* GPLA-1130: Text was extracted only from the first column of PDF tables.
* GPLA-1140: Text from DOCX tables was not extracted properly.
* GPLA-1154: Audio content ingested from an RSS feed was not deleted when the feed was deleted.

