KB Extraction Tools
DRUID provides you with two extraction tools that operate as content extraction tools during conversations, particularly valuable for document analysis using Large Language Models (LLMs).
The KB extraction tools during conversations use the content extraction functionality found in the DRUID KB Engine but operate in isolation from the KB processing pipeline, focused on content extraction without introducing changes to the underlying KB infrastructure.
Extract content from documents
Extract content from a document (structured or unstructured) using the internal action KBExtractDocumentContent. This action accepts a document from a [[File]] entity field and provides output in the following format:
- For unstructured documents: content is returned in [[KBOperation]].ExtractedDocument[i].Content.
- For structured documents: content is divided into [[KBOperation]].ExtractedDocument[i].Question and [[KBOperation]].ExtractedDocument[i].Answer.
Parameter | Description |
---|---|
SourceFile |
The file property of the [[Entity]] from which the document content is extracted. This parameter is mandatory. Note:
If the document exceeds 5 MB or the extracted content surpasses 1 MB, the internal action returns an error and does not proceed with the extraction. |
DocumentType | Indicates if the document is structured or not. Default value is "unstructured". This parameter is optional. |
{
"SourceFile": "[[Entity]].FileProperty",
"DocumentType": "structured" //optional, default value = unstructured
}
Extract content from websites
Crawl a specific website and extract paragraphs (unstructured data) using the internal action KBExtractUrlContent. internal action crawls a specified website and extracts unstructured content as paragraphs. The extracted content is returned as a collection of strings in [[KBOperation]].ExtractedDocument[i].Question and [[KBOperation]].ExtractedDocument[i].Answer.
Parameter | Description |
---|---|
Url |
The URL of the website starting with 'https://'. E.g. https://druidai.com. Note: DRUID currently supports content extraction from data sources in English; therefore, make sure to provide the URL of a website that contains content written in English
|
CrawlHttpRequests | Set to 'true' if the website is static HTML site; otherwise, set to 'false'. |