Knowledge Base

The Knowledge Base is a powerful tool that enhances your virtual assistant’s ability to provide accurate and relevant responses by serving as a centralized repository of information.

With the Knowledge Base, you can compile and manage a collection of text-based articles, enabling your chatbot to deliver the best possible answers when a user’s intent isn’t covered in predefined conversation flows.

To create a comprehensive Knowledge Base, you can integrate multiple data sources, including structured Excel and PDF files, file repositories, websites, SharePoint documents, and network shared drives. DRUID processes these sources to extract relevant content, organizing it into Q&A pairs or articles that your chatbot can use to improve user interactions.

Important! The Knowledge Base engine underwent a significant upgrade in DRUID version 7.0, bringing enhanced performance across all bots. During this upgrade, your Knowledge Base content was seamlessly migrated to the new infrastructure. For the Knowledge Base to be operational, you must manually retrain all your bots before making any predictions in the Knowledge Base.

In this guide, you’ll learn how to access the Knowledge Base, how to add and manage data sources, and how to effectively integrate your chatbot with the Knowledge Base to improve its ability to handle user queries.

Accessing the Knowledge Base

To access the Knowledge Base, select the desired bot and solution, and then, from the NLU menu, click Knowledge Base.

When you access the Knowledge Base for the first time, the page is empty. To create your bot's knowledge base, add as many data sources as you want, extract the data, and train the KB.

Adding data sources

DRUID extracts text articles / paragraphs from the following data sources:

Structured data sources (structured Excel and PDF files)
File repository (Word, Excel and PDF, both structured and unstructured)
Paragraphs from websites
Documents from SharePoint libraries (Word, Excel and PDF files)
Network shared drives.

For instructions on adding different types of data sources, refer to the relevant topic.

Use the Knowledge Base on the bot

By default, if the bot's NLP model does not match the user input with any of the existing flows during the conversation, the bot executes the Intent not recognized flow set on your bot (if any).

To let your chatbot search the Knowledge Base when the user input does not match any of the existing flows, go to the bot Details page and, in the Dialogue management section, tap on Use Knowledge Base.

By default, only the answer corresponding to the article with the highest probability is shown to the user.
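The routing described in this section can be pictured with a minimal Python sketch. It is not DRUID's implementation: the function name, its parameters, and the choice to fall back to the Intent not recognized flow when the Knowledge Base returns no results are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def route_user_input(
    matched_flow: Optional[str],
    use_knowledge_base: bool,
    kb_results: List[Tuple[str, float]],
    fallback_flow: Optional[str] = "Intent not recognized",
) -> Tuple[str, Optional[str]]:
    """Decide what the bot does with a user input (hypothetical helper).

    kb_results holds (answer, probability) pairs returned by a KB search.
    """
    # A flow matched by the NLP model always wins.
    if matched_flow is not None:
        return ("flow", matched_flow)
    # With Use Knowledge Base on, show only the answer with the highest probability.
    if use_knowledge_base and kb_results:
        answer, _probability = max(kb_results, key=lambda item: item[1])
        return ("kb_answer", answer)
    # Assumption: with no flow and no KB result, the fallback flow (if any) runs.
    return ("flow", fallback_flow)
```

For example, calling `route_user_input(None, True, [("A", 0.6), ("B", 0.9)])` selects answer "B", because it carries the highest probability.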

Hint: You can set the chatbot to predict only within the Knowledge Base instead of using the NLP model (that is, all flows and QnAs) by setting the NLP Provider to None (bot Details page, Conversational AI section, Provider field).

Note: DRUID offers a comprehensive set of Knowledge Base solution templates to address various scenarios, including the use of generative AI. For more information, explore the Solution Library.

Return top 5 articles matching the user intent

To show the top 5 articles matching the user’s question, import the Knowledge Base Starter solution from the Solutions Library. This solution template contains the flow Knowledge-Base-response-flow, dedicated to displaying responses to users when the Flow Engine predicts against the Knowledge Base.

Go to the bot Details page and in the Dialogue management section, from the Knowledge Base response flow drop-down, select Knowledge-Base-response-flow.

When a user asks a question and no flow is matched in the bot model, the question is searched within the Knowledge Base. The answer corresponding to the question with the highest probability is shown to the user, along with Related topics, which contain the first 5 topics with the highest probability, displayed in a card with repeater buttons.
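As a rough illustration of this response shape, the Python sketch below ranks search results and returns the top answer together with up to five related topics. The dictionary keys, the function name, and the decision to keep the top answer's own question in the related list are assumptions; the real flow in the solution template may build the card differently.

```python
from typing import Dict, List, Optional


def build_kb_response(results: List[Dict], related_count: int = 5) -> Optional[Dict]:
    """Return the best answer plus related topics (hypothetical helper).

    Each result is assumed to look like:
    {"question": str, "answer": str, "probability": float}
    """
    if not results:
        return None
    # Rank all results from highest to lowest probability.
    ranked = sorted(results, key=lambda r: r["probability"], reverse=True)
    return {
        "answer": ranked[0]["answer"],
        # Assumption: related topics are the first N questions by probability.
        "related_topics": [r["question"] for r in ranked[:related_count]],
    }
```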

Rephrase user question to provide incremental, contextual KB search

Improve your Knowledge Base search by rephrasing user questions to provide incremental, contextual results. From the Solutions Library, import the solution named "Knowledgebase with GPT V 2_0 - Azure". This solution combines the DRUID Knowledge Base and GPT from Azure to deliver a highly intelligent, human-like conversation experience. It includes two flows dedicated to rephrasing user intent and responses using Azure OpenAI when the Flow Engine predicts against the Knowledge Base.

Go to the bot's Details page and in the Dialogue management section:

From the Knowledge Base response flow field, select Knowledge-Base-response-flow-refine-question-azure.com.
From the Intent rephrase flow field, select Intent rephrase flow.

Note: To enable the KB Engine to rephrase user intents, select the solution "Knowledgebase with GPT V 2_0 - Azure", click Apps and set the connection strings for the app GPT-azure.com.

Knowledge Base Basic Settings

To access the Knowledge Base settings, on the Knowledge Base page, click the Settings button.

The KB General Settings appear.

The table below provides the description of the Knowledge Base general settings.

Setting	Description
Embeddings Provider	There are three providers available. DRUID (the default provider): DRUID uses a non-generative Large Language Model (LLM) for embeddings. If you select DRUID as the embeddings provider, you can choose the Embeddings Model (either English or Multilingual). OpenAI: To use OpenAI’s text embeddings, select OpenAI as the embeddings provider and enter the password of your OpenAI account. Microsoft Azure: To use the Azure OpenAI embeddings, select Microsoft Azure as the embeddings provider and enter the Azure OpenAI embeddings endpoint (https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/embeddings?api-version={api-version}) and the password of your Azure OpenAI account.
Embeddings Model	An embeddings model is a machine learning model that transforms data, such as text or images, into a vector of numbers (an embedding). This vector representation captures the semantic meaning or relationships within the data, allowing for more efficient comparisons, searches, and analysis. Note: This parameter is available in DRUID 8.3 and higher. The following embedding models are available in DRUID: English: Creates embeddings optimized for English text, capturing semantic meaning in English language inputs. English – score correction: Generates English embeddings with additional score correction, improving the accuracy and relevance of similarity scores. HigherEducation.v1 (English): This specialized model is trained on a rich dataset of higher education content in English only, including university-specific information, leading to improved accuracy when dealing with queries related to this domain. MultiAspect: An enhanced version of our existing Multilingual model, MultiAspect offers improved performance across multiple languages and incorporates score correction for more accurate results. Multilingual: Produces embeddings for multiple languages, enabling cross-lingual semantic understanding. Multilingual – score correction: Offers multilingual embeddings with score correction, enhancing the precision of similarity scores across languages. Paraphrase: Specializes in identifying semantically similar phrases, even if they differ in wording, useful for paraphrase detection tasks. Note: The HigherEducation.v1 (technology preview) and MultiAspect embedding models are available in DRUID version 8.13 and later. Hint: The Paraphrase embeddings model processes up to 125 tokens per paragraph and is ideal for short sentences, while other models support up to 512 tokens per paragraph.
Set results threshold	Note: This feature is available in DRUID version 7.14 onwards. The Results Threshold settings determine how matching utterances are evaluated against the Knowledge Base (KB). These settings vary depending on whether the bot is new or existing. For new bots, Use Bot NLU Thresholds is enabled by default. The KB uses the NLU thresholds configured on the bot (NLU menu > Configurations > Intents tab > Thresholds and Parameters section). For existing bots, the behavior varies based on the NLU thresholds: Both 'Min Match Score' and 'Target Match Score' are null: 'Use Bot NLU Thresholds' is enabled, and the slider shows the NLU thresholds. The slider is disabled. Either 'Min Match Score' or 'Target Match Score' has a value: 'Use Bot NLU Thresholds' is disabled. The slider becomes editable, letting you adjust the threshold for the parameter with a value. The empty parameter uses the NLU thresholds set on the bot. Both 'Min Match Score' and 'Target Match Score' have values: 'Use Bot NLU Thresholds' is disabled, and you can set values for both parameters using the slider. To control how the KB evaluates and matches user input, disable Use Bot NLU Thresholds and adjust the 'Min Match Score' and 'Target Match Score' using the slider, ensuring that it aligns with your desired performance thresholds. Note: If you enable Use Bot NLU Thresholds, the threshold values set on the slider will be lost.
Search balance	By default, the search within the Knowledge Base uses a mix of two algorithms: the keyword (Text) search algorithm and the semantic (Vector) search algorithm. Additionally, you can use the reranker to perform further analysis and enhance the result quality. If you don't use the reranker, the recommended value for the search balance is Vector 80%, Text 20%. This means that the search relies 80% on the semantic search algorithm, which returns more accurate results, and 20% on the keyword (text) search algorithm, which might return a lot of noise. Move the slider to set the search balance based on your needs. Hint: In DRUID version 7.14 and later, the values you set for the Search balance slider and the Score Calculator strategy in Advanced Settings are synchronized. Any changes made to one will be reflected in the other.
Search inside answers	Tap on if you want the user's utterance (User Says) to be matched against both the questions and the answers of the Q&A pairs available in structured data sources. If this option is off, the utterance is matched only against the questions available in the structured data sources.
Use Knowledge Base	Tap on to provide your chatbot with the capability to search through the Knowledge Base when the user input does not match with any of the existing flows. From the Knowledge Base response flow drop-down, select Knowledge-Base-response-flow.
Intent rephrase flow	If you want an external service (e.g., GPT) to rephrase the user intent into one that is optimal for your bot model, select the flow you specifically designed for rephrasing user intent. After the user sends the intent, the bot first executes the "Intent rephrase flow", which rephrases the utterance, and then uses the result (the rephrased intent stored in [[Intent]].Text) to predict in the model. Important! The "Intent rephrase flow" is executed only when the user input is sent while the conversation is in idle mode. Once the conversation is in a flow, the "Intent rephrase flow" no longer executes (the user intent is not altered). Hint: This is particularly useful for Knowledge Base with ChatGPT.
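The results threshold rules for existing bots (see Set results threshold in the table above) can be summarized with a small Python sketch. The class and function names are hypothetical, and the numeric values used in any example are illustrative only; the sketch simply encodes the three cases the table describes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Thresholds:
    min_match_score: Optional[float]
    target_match_score: Optional[float]


def resolve_kb_thresholds(kb: Thresholds, bot_nlu: Thresholds) -> Tuple[Thresholds, bool]:
    """Return the effective KB thresholds and whether Use Bot NLU Thresholds is enabled."""
    # Use Bot NLU Thresholds is enabled only when both KB values are null.
    use_bot_nlu = kb.min_match_score is None and kb.target_match_score is None
    # Any empty KB parameter falls back to the threshold configured on the bot.
    effective = Thresholds(
        min_match_score=(
            kb.min_match_score if kb.min_match_score is not None else bot_nlu.min_match_score
        ),
        target_match_score=(
            kb.target_match_score
            if kb.target_match_score is not None
            else bot_nlu.target_match_score
        ),
    )
    return effective, use_bot_nlu
```

For instance, if only the Target Match Score has a value in the KB settings, the sketch keeps that value and takes the Min Match Score from the bot's NLU thresholds.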

Save the settings.
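To get an intuition for the Search balance setting, the Python sketch below blends a semantic (Vector) score and a keyword (Text) score with the recommended 80% / 20% split. The documentation does not state how DRUID combines the two scores, so the linear weighted sum and the function name are assumptions made purely for illustration.

```python
def blended_score(vector_score: float, text_score: float, vector_weight: float = 0.8) -> float:
    """Blend two scores with a linear weighted sum (assumed formula, not DRUID's)."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    # The text weight is whatever remains after the vector weight.
    text_weight = 1.0 - vector_weight
    return vector_weight * vector_score + text_weight * text_score
```

Under this assumption, moving the slider toward Vector makes the final score follow the semantic match more closely, while moving it toward Text gives exact keyword overlap more influence.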

Training or retraining the Knowledge Base

When managing your Knowledge Base, you can choose between Train and Retrain actions, depending on your needs:

Train - This option trains only the data sources that require training (e.g., newly added or updated sources that haven’t been trained yet). It's the default option when you're adding new content or making updates that trigger a training need.
Retrain - Use this option to retrain all already-trained data sources and also train any that still require training. This is especially useful after making changes to the Knowledge Base’s general settings (e.g., Embeddings model, Embeddings provider, or Search inside answers).

Note: The Retrain feature is available in DRUID version 8.15 and higher and applies only to unstructured data sources.

At the data source level, you can retrain an individual source after modifying its Trainable Elements, even if its training status doesn’t change. You also have the option to Retrain all data sources directly from a data source.
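The difference between the KB-level Train and Retrain actions can be sketched in Python as a selection over data sources. The DataSource fields and the function name are hypothetical, and the sketch deliberately ignores the version and data source type restrictions mentioned in the note above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSource:
    name: str
    trained: bool          # has been trained at least once
    needs_training: bool   # new or updated since the last training


def sources_to_process(sources: List[DataSource], action: str) -> List[str]:
    """Return the names of the data sources an action would process (illustrative)."""
    if action == "train":
        # Train: only the sources that currently require training.
        return [s.name for s in sources if s.needs_training]
    if action == "retrain":
        # Retrain: every already-trained source, plus any that require training.
        return [s.name for s in sources if s.trained or s.needs_training]
    raise ValueError("action must be 'train' or 'retrain'")
```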

Testing the Knowledge Base performance

Testing your Knowledge Base helps you verify that it delivers accurate and relevant responses, and identify and address any issues that affect its overall performance.

To test the performance of your Knowledge Base, on the Knowledge Base page, enter a question in the User Says area and select the language. The model will search across all the data sources in the Knowledge Base and list the articles with a matching score higher than 0.5, along with the data source where each article was found. If you have changed the threshold ([[Intent]].KBQnAItems[0].Score) in the solution configuration, only the articles meeting that threshold will be listed.
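The listing behavior can be approximated with a short Python sketch that keeps only the articles scoring above a threshold. The dictionary keys and the function name are assumptions, as are the strict comparison and the sorting by descending score; the documentation only states that articles with a score higher than 0.5 (or meeting your custom threshold) are listed.

```python
from typing import Dict, List


def list_matching_articles(articles: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Keep articles whose score is above the threshold (illustrative helper).

    Each article is assumed to look like:
    {"title": str, "data_source": str, "score": float}
    """
    hits = [a for a in articles if a["score"] > threshold]
    # Assumption: best matches are shown first.
    return sorted(hits, key=lambda a: a["score"], reverse=True)
```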

If you selected a score calculator strategy from the KB settings, for each result, you will see the total matching score and the weights of the algorithms used. You can view the graphical representation of these weights by clicking the Info icon.

Search within the KB

Note: This feature is available in DRUID 8.1 and higher.

You can perform exact match searches across the entire Knowledge Base or within a data source. The search results return all data source elements (nodes, leaves) and articles that exactly match your specified keywords.

When searching for specific keywords at the KB level, a maximum of 30 matching records (if available) will be displayed under the corresponding data source name.

Starting with release 8.10, you can refine your Knowledge Base search results using filters for data source type, document type, and the option to exclude specific data sources or elements. These filtering capabilities help you perform more precise and efficient searches, giving you greater control over the results.

When searching at the data source level, up to 30 results will be shown if they exist.