KB Extraction Tools

DRUID provides two content extraction tools that you can use during conversations. They are particularly valuable for document analysis with Large Language Models (LLMs).

These tools use the content extraction functionality of the DRUID KB Engine but operate in isolation from the KB processing pipeline. They only extract content and make no changes to the underlying KB infrastructure.

Extract content from documents

Extract content from a document (structured or unstructured) using the internal action KBExtractDocumentContent. This action accepts a document from a [[File]] entity field and provides output in the following format:

For unstructured documents: content is returned in [[KBOperation]].ExtractedDocument[i].Content.
For structured documents: content is divided into [[KBOperation]].ExtractedDocument[i].Question and [[KBOperation]].ExtractedDocument[i].Answer.
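
As a rough illustration of how the two output shapes differ, the following Python sketch flattens extracted items into plain-text chunks, for example before passing them to an LLM prompt. It assumes the contents of [[KBOperation]].ExtractedDocument have already been exported as a list of dictionaries keyed by Content, Question, and Answer. That export format and the function name flatten_extracted are hypothetical and not part of DRUID.

```python
from typing import Dict, List


def flatten_extracted(items: List[Dict[str, str]]) -> List[str]:
    """Turn extracted items into plain-text chunks.

    Assumption: each item is a dict mirroring one ExtractedDocument[i]
    entry, with 'Content' for unstructured documents and
    'Question' / 'Answer' for structured ones.
    """
    chunks = []
    for item in items:
        if item.get("Question") or item.get("Answer"):
            # Structured entry: keep the question and answer together.
            question = item.get("Question", "")
            answer = item.get("Answer", "")
            chunks.append(f"Q: {question}\nA: {answer}")
        elif item.get("Content"):
            # Unstructured entry: a single block of text.
            chunks.append(item["Content"])
        # Entries with none of the expected keys are skipped.
    return chunks
```

Keeping each question with its answer in one chunk preserves the pairing that structured extraction provides.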

Parameter	Description
SourceFile	The file property of the [[Entity]] from which the document content is extracted. This parameter is mandatory. Note: The internal action imposes a source file size limit of 5 MB. Extracted content is limited to 1 MB. If the document exceeds 5 MB or the extracted content surpasses 1 MB, the internal action returns an error and does not proceed with the extraction.
DocumentType	Indicates whether the document is structured or unstructured. The default value is "unstructured". This parameter is optional.
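
Because the action returns an error instead of extracting when a limit is exceeded, it can help to check sizes before calling it. The Python sketch below mirrors the 5 MB and 1 MB limits described above. It assumes that 1 MB means 1024 * 1024 bytes and that extracted content is measured in UTF-8 bytes; the documentation does not specify either, and the function names are illustrative only.

```python
# Assumption: 1 MB = 1024 * 1024 bytes (not stated in the documentation).
MAX_SOURCE_BYTES = 5 * 1024 * 1024
MAX_EXTRACTED_BYTES = 1 * 1024 * 1024


def within_source_limit(file_size_bytes: int) -> bool:
    """Return True if a source file of this size does not exceed 5 MB."""
    return 0 <= file_size_bytes <= MAX_SOURCE_BYTES


def within_extracted_limit(text: str) -> bool:
    """Return True if the extracted text does not exceed 1 MB.

    Assumption: size is measured as UTF-8 encoded bytes.
    """
    return len(text.encode("utf-8")) <= MAX_EXTRACTED_BYTES
```

A check such as within_source_limit(os.path.getsize(path)) could run on an uploaded file before the conversation step that invokes the action.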

Syntax

{
  "SourceFile": "[[Entity]].FileProperty",
  "DocumentType": "structured" //optional, default value = unstructured
}

Example

{
  "SourceFile": "[[Employee]].Resume",
  "DocumentType": "structured" 
}

Extract content from websites

Crawl a specific website and extract paragraphs (unstructured data) using the internal action KBExtractUrlContent. The extracted content is returned as a collection of strings in [[KBOperation]].ExtractedDocument[i].Question and [[KBOperation]].ExtractedDocument[i].Answer.

Syntax

{
  "Url": "",
  "CrawlHttpRequests": "true|false"
}

Parameter	Description
Url	The URL of the website, starting with 'https://'. For example, https://druidai.com. Note: DRUID currently supports content extraction only from data sources in English, so make sure to provide the URL of a website whose content is written in English.
CrawlHttpRequests	Set to 'true' if the website is a static HTML site; otherwise, set to 'false'.
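
The parameter rules above can be sketched as a small Python helper that builds the JSON for KBExtractUrlContent and rejects URLs that do not start with 'https://'. The helper name build_url_params and its static_site argument are hypothetical. The sketch only prepares the parameter text and does not call DRUID.

```python
import json
from urllib.parse import urlparse


def build_url_params(url: str, static_site: bool) -> str:
    """Build the KBExtractUrlContent parameters as a JSON string.

    Hypothetical helper: it only validates the URL prefix and formats
    the two documented parameters.
    """
    if not url.startswith("https://") or not urlparse(url).netloc:
        raise ValueError("Url must start with 'https://' and include a host")
    params = {
        "Url": url,
        # The documented syntax passes the flag as the string 'true' or 'false'.
        "CrawlHttpRequests": "true" if static_site else "false",
    }
    return json.dumps(params, indent=2)
```

For example, build_url_params("https://druidai.com", True) produces a JSON object with "Url" set to the site address and "CrawlHttpRequests" set to "true".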