Network Shared Drive Data Sources

A Network Shared Drive data source lets you enrich your KB with internal documents, policies, or FAQs, which improves the accuracy and completeness of your bot's responses. With a Network Shared Drive data source, you can extract paragraphs from Excel, Word, and PDF files stored on your network.

Note:  This KB data source is available in DRUID version 7.7 onwards.

DRUID supports adding and extracting data from the following storage types:

  • Local File: Extract data from files located on the same machine as the KB Agent.
  • Local Share: Access and process information from files stored on a shared drive within your network, accessible to the KB Agent.
  • FTP: Integrate data from an FTP server using TLS implicit encryption (also available for cloud deployments).
  • SFTP: Integrate data from an SFTP server over SSH (also available for cloud deployments).

Note:  While all four storage types are available for DRUID hybrid (KB Agent installed on premises) and full on-premise deployments, FTP and SFTP are the only storage types available for KB in the cloud.
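The availability rules above can be summarized as a lookup table. This is a minimal sketch for illustration only: the deployment keys and the helper name is_supported are hypothetical labels, not DRUID configuration values or APIs.

```python
# Hypothetical labels for the storage types listed above (not DRUID config values).
ALL_STORAGE_TYPES = {"Local File", "Local Share", "FTP", "SFTP"}

# Which storage types each deployment model supports, per the note above.
SUPPORTED_STORAGE = {
    "hybrid": ALL_STORAGE_TYPES,      # KB Agent installed on premises
    "on-premise": ALL_STORAGE_TYPES,  # full on-premise deployment
    "cloud": {"FTP", "SFTP"},         # KB in the cloud
}


def is_supported(deployment: str, storage_type: str) -> bool:
    """Return True if the storage type can be used with the given deployment model."""
    return storage_type in SUPPORTED_STORAGE.get(deployment, set())


print(is_supported("cloud", "Local Share"))  # False
print(is_supported("hybrid", "SFTP"))        # True
```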

Adding a Network Shared Drive data source

This section will guide you through the process of adding a shared drive data source:

Step 1. Create the data source

Follow these instructions to create a data source based on your storage type.

Step 2. Crawl the data source

On the Knowledge Base page, click the edit icon to edit the data source. The data source configuration page opens on the Extracted Paragraphs tab by default. The content of the root reflects the file structure at the Uri you provided when you created the data source. By default, all folders and files are excluded from scraping. To include a file or folder for scraping, click the three dots on the right side of the item and click Include.
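The exclude-by-default behavior can be pictured with the sketch below. It assumes, without confirmation from DRUID, that including a folder also includes everything beneath it; the function is_scraped is hypothetical and does not reflect how the KB Agent is implemented.

```python
from pathlib import PurePosixPath


def is_scraped(path: str, included: set[str]) -> bool:
    """Return True if `path` or one of its ancestor folders was marked Include.

    Nothing is scraped by default; only included items (and, by assumption,
    the contents of included folders) are scraped.
    """
    targets = {PurePosixPath(i) for i in included}  # normalizes trailing slashes
    p = PurePosixPath(path)
    return p in targets or any(parent in targets for parent in p.parents)


included = {"policies/hr"}
print(is_scraped("policies/hr/leave.pdf", included))  # True
print(is_scraped("policies/it/vpn.docx", included))   # False
```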

Click the Start crawling button. The Start Crawling Parameters page appears.

Define the crawling policy by setting the following parameters:

  • URL: Automatically populated with the Uri (or the Host for FTP storage) you specified when adding the data source.
  • Depth: The number of directory levels the crawler will explore from the URL.

Note:  To improve crawling efficiency, crawl each node individually instead of the entire root, especially if the storage has a deep structure. Set the depth to '0' to achieve this.
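To illustrate how a depth limit bounds a crawl, here is a minimal sketch for a local folder. It assumes that depth 0 means "only files directly inside the URL folder" and that each additional level descends one more folder; the extension list and the function name crawl are assumptions, not DRUID's crawler.

```python
import os

# Excel, Word, and PDF extensions; the exact list is an assumption.
SUPPORTED_EXTENSIONS = {".xlsx", ".xls", ".docx", ".doc", ".pdf"}


def crawl(root: str, depth: int) -> list[str]:
    """List supported files under `root`, descending at most `depth` folder levels.

    depth=0 returns only the files directly inside `root`.
    """
    found: list[str] = []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if level >= depth:
            dirnames.clear()  # prune in place so os.walk stops descending here
        for name in filenames:
            if os.path.splitext(name)[1].lower() in SUPPORTED_EXTENSIONS:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Pruning dirnames in place is how os.walk (in its default top-down mode) is told not to visit deeper folders, so a small depth keeps the crawl cheap on deep trees.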

After you define the crawling policy, click Start.

Hint:  Depending on the crawling policy, crawling might take a few minutes to complete. Refresh the page periodically to check whether it has finished.

By default, all nodes are excluded from scraping. To crawl specific nodes, click the three dots next to the desired node in the file repository explorer and select Crawl Path.

When crawling completes, the extracted articles are displayed on the Extracted Paragraphs tab.

Note:  Starting with DRUID 7.15, the platform extracts data from Excel files with table headers that include spaces (e.g., "Question " and "Answer "). The platform automatically trims these spaces to ensure accurate data extraction. Additionally, it extracts data from Excel documents with multiple sheets, capturing the sheet name in the "sheetName" property for each extracted article.
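The header trimming and multi-sheet behavior described in the note can be sketched as follows. To keep the example self-contained, it assumes the workbook has already been read into a dictionary that maps sheet names to rows, with headers in the first row; the function extract_articles and this data shape are hypothetical, and a real reader would need an Excel parsing library.

```python
def extract_articles(workbook: dict[str, list[list[str]]]) -> list[dict[str, str]]:
    """Turn every data row of every sheet into an article dict with a sheetName."""
    articles = []
    for sheet_name, rows in workbook.items():
        if not rows:
            continue  # skip empty sheets
        headers = [str(h).strip() for h in rows[0]]  # "Question " -> "Question"
        for row in rows[1:]:
            article = dict(zip(headers, row))
            article["sheetName"] = sheet_name  # record which sheet the row came from
            articles.append(article)
    return articles


workbook = {
    "HR": [["Question ", "Answer "], ["Where is the leave policy?", "On the HR portal."]],
    "IT": [["Question", "Answer"], ["How do I reset my password?", "Use the self-service portal."]],
}
articles = extract_articles(workbook)
print(articles[0])
# {'Question': 'Where is the leave policy?', 'Answer': 'On the HR portal.', 'sheetName': 'HR'}
```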

Step 3. Train the data source

To ensure the KB Engine searches through the data source paragraphs, train your data source by clicking the Train button at the top-left corner of the data source.

Testing the data source performance

Testing the performance of a data source shows whether the paragraphs it returns are relevant to the questions users actually ask. It helps you identify and fix issues early, which improves the quality of your bot's responses and, in turn, user satisfaction.

To test the performance of the data source, go to the Extracted Paragraphs tab, enter a question in the User Says area, and select the language. All matched paragraphs are displayed along with their scores.

You can improve the performance of the data source by reviewing and editing the paragraphs based on your needs.

Editing paragraphs

To keep your knowledge base high quality, we recommend that you review the extracted paragraphs and take the appropriate action: open the URL from which the crawler extracted the paragraph, compare the content, and then edit or delete the paragraph. Refine your paragraphs by transforming unstructured data into a question-and-answer format.

To edit a paragraph, click the Action icon displayed inline with the paragraph and click Edit. Edit the paragraph Title and / or Content and save the changes.

Important!   After making updates to your paragraphs, it's crucial to retrain your data source. This ensures the KB Engine recognizes these updates and provides accurate responses to user queries. Click the Train button at the top-left corner of the data source.

Fine-tuning Predictions

You can configure Advanced Settings at both the data source and node/leaf levels to achieve more precise predictions. This approach offers granular control, allowing you to adjust the extractors and trainable elements, resulting in better accuracy and performance. Unlike KB-level settings, which apply changes broadly, this targeted method adapts configurations to the unique needs of each data source or element, streamlining your authoring process.

Fine-tuning at the data source level

  1. Navigate to the desired data source.
  2. Select the Advanced Settings tab.
  3. Modify advanced parameters as needed and save the settings.

Fine-tuning at the node or leaf level

  1. In the tree explorer, select the desired node or leaf.
  2. On the right side, select the Advanced Settings tab.
  3. Modify advanced parameters as needed and save the settings.

Reset advanced settings

Note:  This feature is available in DRUID version 7.16 onwards.

To reset advanced configurations at the data source and node/leaf levels to match the KB Advanced settings, go to Knowledge Base > Advanced Settings and click the Save to All button. This action streamlines your settings management by applying consistent KB Advanced settings across your entire configuration with just one click.
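Conceptually, Save to All copies the KB-level advanced settings over the data source and node/leaf settings. The sketch below models that with plain dictionaries; the keys advancedSettings, nodes, and children and the function save_to_all are hypothetical and do not describe DRUID's internal data model.

```python
import copy


def save_to_all(kb_settings: dict, data_sources: list[dict]) -> None:
    """Overwrite data source and node/leaf advanced settings with the KB settings."""
    for source in data_sources:
        source["advancedSettings"] = copy.deepcopy(kb_settings)
        stack = list(source.get("nodes", []))  # walk the node/leaf tree iteratively
        while stack:
            node = stack.pop()
            node["advancedSettings"] = copy.deepcopy(kb_settings)
            stack.extend(node.get("children", []))
```

Each level receives its own deep copy, so fine-tuning one node afterwards does not silently change the KB-level settings or any other node.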

Enhance KB prediction

Refine your articles by transforming unstructured data into a question-and-answer format. Edit the articles and add a question, title, or short description.

Access the Knowledge Base Advanced Settings, set the "trainableColumns" parameter to "Question,Answer", then train the Knowledge Base. The KB Engine will leverage both questions and answers from unstructured data sources during the prediction process, ultimately leading to improved prediction accuracy.

Note:   For new bots created in DRUID 7.10 onwards, the engine predicts against both the question and the answer by default. For bots created before DRUID 7.10, the engine predicts only against the answer ("trainableColumns": null) until you update the setting.
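The effect of the "trainableColumns" value described above can be summarized in a few lines. This sketch assumes the value is a comma-separated string and that surrounding spaces are ignored; trainable_columns is a hypothetical helper, not part of DRUID, and the full structure of the Advanced Settings is not shown here.

```python
import json


def trainable_columns(setting):
    """Return the columns the KB Engine predicts against for a "trainableColumns" value."""
    if setting is None:
        return ["Answer"]  # null: answer only, as for bots created before DRUID 7.10
    return [col.strip() for col in setting.split(",") if col.strip()]


settings = {"trainableColumns": "Question,Answer"}
print(json.dumps(settings))  # {"trainableColumns": "Question,Answer"}
print(trainable_columns(settings["trainableColumns"]))  # ['Question', 'Answer']
```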