July 15: Support for SharePoint feeds, new Conversation features

New Features

  • 💡 Added support for SharePoint feeds: you can now create a feed to ingest files from a SharePoint document library (and, optionally, a folder within that library)
  • 💡 Added support for PII detection during entity extraction from text documents and audio transcripts: labels such as PII: Social Security Number are now created automatically when PII is detected
  • 💡 Added support for developers' own OpenAI API keys and Azure OpenAI deployments in Specifications
  • ℹ️ Changed semantics of deleteFeed to also delete the contents ingested by the feed. Contents are linked to feeds, so a feed can now be disabled to keep its contents and their lineage to the feed; deleting a feed deletes its linked contents, so the feed-to-content lineage is never lost
  • Added GraphQL query for the SharePoint consent URI of the registered Graphlit Platform Azure AD application
  • Better handling of web sitemap indexes: if a sitemap.xml contains a sitemapindex element, all linked sitemaps are now loaded when evaluating which web pages to ingest from a Web feed
  • Added new GraphQL mutations for openConversation, closeConversation and undoConversation
  • Added timestamps to Conversation messages
  • Added new GraphQL mutations for openCollection and closeCollection
  • Added more configuration for content search: you can now specify searchType (KEYWORD, VECTOR, HYBRID) and queryType (SIMPLE, or FULL, i.e. Lucene syntax)
  • Better parsing of iTunes podcast metadata
  • Renamed listingLimit field on feeds to readLimit
  • Renamed topK to numberSimilar for content vector search type
  • Changed GraphQL feed properties: split out azure into azureBlob and azureFile properties
  • Changed GraphQL specification properties: split out openAI into openAI and azureOpenAI properties
  • Removed count fields on query results and replaced them with explicit count{Entity} queries, which support search and filtering
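
The deleteFeed change in the list above distinguishes disabling a feed from deleting it. As a rough illustration of that distinction only, here is a minimal in-memory sketch in Python. The Store, Feed and Content classes and their method names are hypothetical and are not the Graphlit API or data model; the sketch assumes each content records the ID of the feed that ingested it.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    id: str
    feed_id: str  # lineage: each content records the feed that ingested it


@dataclass
class Feed:
    id: str
    enabled: bool = True


@dataclass
class Store:
    """Hypothetical in-memory store, for illustration only."""
    feeds: dict = field(default_factory=dict)
    contents: dict = field(default_factory=dict)

    def ingest(self, feed_id: str, content_id: str) -> None:
        if not self.feeds[feed_id].enabled:
            raise RuntimeError(f"feed {feed_id} is disabled")
        self.contents[content_id] = Content(content_id, feed_id)

    def disable_feed(self, feed_id: str) -> None:
        # Stops further ingestion; existing contents and their lineage remain.
        self.feeds[feed_id].enabled = False

    def delete_feed(self, feed_id: str) -> None:
        # Cascades: contents linked to the feed are removed with it, so no
        # content is left pointing at a feed that no longer exists.
        self.contents = {
            cid: c for cid, c in self.contents.items() if c.feed_id != feed_id
        }
        del self.feeds[feed_id]
```

In this sketch, disabling is the non-destructive option: the feed and everything it ingested stay queryable. Deleting removes both the feed and its contents in one step.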

Bugs Fixed

  • GPLA-1043: Reddit readLimit not taking effect: the specified limit on Reddit posts is now applied to Reddit feeds
  • GPLA-1064: Performance of entity extraction and observation creation for large PDFs was below expectations: knowledge graphs can now be built from large PDFs much faster (4x speed improvement)
  • GPLA-1053: If rendition generation errored during content workflow, the content was not properly marked as errored
  • GPLA-1102: Large Web sitemaps were slow to load: sitemap index handling was rewritten, and sitemaps with 150K+ entries are now processed in seconds
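
For readers unfamiliar with sitemap indexes, the general technique of following a sitemapindex to its linked sitemaps can be sketched as below. This is a minimal sketch based on the public sitemaps.org format, not Graphlit's implementation. The function name iter_page_urls and the caller-supplied fetch callback (which returns the raw XML bytes for a URL) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def iter_page_urls(
    url: str,
    fetch: Callable[[str], bytes],
    _seen: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Yield page URLs from a sitemap, following sitemap indexes.

    `fetch` is a hypothetical callback that returns the XML bytes for a URL.
    """
    seen = set() if _seen is None else _seen
    if url in seen:  # guard against indexes that reference each other
        return
    seen.add(url)

    root = ET.fromstring(fetch(url))
    if root.tag == SITEMAP_NS + "sitemapindex":
        # An index lists other sitemaps; load each linked sitemap in turn.
        for loc in root.iterfind(f"{SITEMAP_NS}sitemap/{SITEMAP_NS}loc"):
            if loc.text:
                yield from iter_page_urls(loc.text.strip(), fetch, seen)
    elif root.tag == SITEMAP_NS + "urlset":
        # A regular sitemap lists the page URLs themselves.
        for loc in root.iterfind(f"{SITEMAP_NS}url/{SITEMAP_NS}loc"):
            if loc.text:
                yield loc.text.strip()
```

Because the function is a generator, pages are yielded as each linked sitemap is parsed, so a caller can begin evaluating URLs before the whole index has been read.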