Network Shared Drive Data Sources

A Network Shared Drive data source lets you enrich your KB with internal documents, policies, or FAQs, improving the accuracy and comprehensiveness of your bot's responses. With a Network Shared Drive data source, you can extract paragraphs from Excel, Word, and PDF files stored on your network.

Note: This KB data source is available in DRUID version 7.7 onwards.

DRUID supports adding and extracting data from the following storage types:

Local File: Extract data from files located on the same machine as the KB Agent.
Local Share: Access and process information from files stored on a shared drive within your network, accessible to the KB Agent.
FTP: Integrate data from an FTP server using TLS implicit encryption (for cloud deployments only).
SFTP: Integrate data from an SFTP server over an SSH-encrypted connection (for cloud deployments only).

Note: While all four storage options are available for DRUID hybrid (KB Agent installed on premises) and full on-premise deployments, FTP and SFTP are the only storage options available for KB in the cloud.
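The availability rule in the note above can be written down as a small lookup. This is an illustrative sketch only: the deployment labels and the function name are assumptions made for this example and are not DRUID identifiers.

```python
# Storage types listed in this article.
ALL_STORAGE_TYPES = ("Local File", "Local Share", "FTP", "SFTP")

# Hypothetical deployment labels, chosen for this example only.
AVAILABLE_BY_DEPLOYMENT = {
    "cloud": ("FTP", "SFTP"),
    "hybrid": ALL_STORAGE_TYPES,
    "on-premise": ALL_STORAGE_TYPES,
}


def available_storage_types(deployment: str) -> tuple[str, ...]:
    """Return the storage types the note allows for a deployment label."""
    try:
        return AVAILABLE_BY_DEPLOYMENT[deployment]
    except KeyError:
        raise ValueError(f"unknown deployment: {deployment!r}") from None
```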

Adding a Network shared drive data source

This section will guide you through the process of adding a shared drive data source:

Step 1. Create the data source

Follow these instructions to create a data source based on your storage type.

FTP storage

To create data sources from FTP storage via TLS implicit encryption, follow these steps:

Click the Add New button. The Add New Data Source page opens.
In the Name field, provide a name for the data source. This helps you identify and search for the data source easily.
From the Language drop-down, select the language of the data you upload. It must be one of the bot languages.
From the Type drop-down, select Shared drive.
Select FTP as Storage Type.
In the Uri field, enter the relative path to the folder (on the FTP server) you want to crawl.
In the Host field, enter the host name of the FTP server.
Enter the FTP login ID (User name) and the FTP login password (Password).
If the FTP server uses a self-signed certificate or one not issued by a recognized Certificate Authority, select Disable Certificate Validation. Failure to do so will result in unsuccessful data crawling and extraction.
Enter the FTP Port for data transfers.

Optionally, set the Min score threshold and the Target match score for the data source. If not set, the thresholds from the Knowledge Base will apply.
To verify the FTP credentials, click the Test button. If the check fails, review the FTP credentials and correct them. You can also verify the FTP credentials later by going to the Details tab of the data source and clicking the Test button at the bottom of the page.
Click Create. The new data source appears on the Knowledge base page.
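Before filling in the form, it can help to sanity-check the values you plan to enter. The sketch below mirrors the fields from the steps above in a hypothetical `FtpDataSourceForm` class. The class, the function name, and the exact rules (for example, treating Name as required) are assumptions for illustration and do not represent a DRUID API or the platform's own validation.

```python
from dataclasses import dataclass


@dataclass
class FtpDataSourceForm:
    """Hypothetical mirror of the form fields described in the steps above."""
    name: str
    language: str
    uri: str   # relative path to the folder on the FTP server
    host: str  # host name only, without a path
    user_name: str
    password: str
    port: int
    disable_certificate_validation: bool = False


def validate_form(form: FtpDataSourceForm, bot_languages: set[str]) -> list[str]:
    """Return readable problems; an empty list means none were found."""
    problems = []
    if not form.name.strip():
        problems.append("Name is empty.")
    if form.language not in bot_languages:
        problems.append("Language must be one of the bot languages.")
    if "://" in form.uri:
        problems.append("Uri must be a relative path, not a full URL.")
    if not form.host.strip() or "/" in form.host:
        problems.append("Host must be a bare host name.")
    if not form.user_name or not form.password:
        problems.append("User name and Password are both required.")
    if not 1 <= form.port <= 65535:
        problems.append("Port must be between 1 and 65535.")
    return problems
```

For example, a form with host `ftp.example.com`, Uri `policies/hr`, and port 990 (the port conventionally used for implicit FTPS; use whatever your server listens on) produces no problems when its language is one of the bot languages.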

SFTP storage

Note: This feature is available in DRUID 8.2 and later. Currently, connecting to the SFTP server is only supported via SFTP credentials (username and password). Public key authentication will be supported in a future release.

To create data sources from SFTP storage, follow these steps:

Click the Add New button. The Add New Data Source page opens.
In the Name field, provide a name for the data source. This helps you identify and search for the data source easily.
From the Language drop-down, select the language of the data you upload. It must be one of the bot languages.
From the Type drop-down, select Shared drive.
Select SFTP as Storage Type.
In the Uri field, enter the relative path to the folder (on the SFTP server) you want to crawl.
In the Host field, enter the host name of the SFTP server.
Enter the SFTP login ID (User name) and the SFTP login password (Password).
Enter the SFTP Port for data transfers.

Optionally, set the Min score threshold and the Target match score for the data source. If not set, the thresholds from the Knowledge Base will apply.
To verify the SFTP credentials, click the Test button. If the check fails, review the SFTP credentials and correct them. You can also verify the SFTP credentials later by going to the Details tab of the data source and clicking the Test button at the bottom of the page.
Click Create. The new data source appears on the Knowledge base page.
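The fallback described for the Min score threshold and the Target match score can be pictured as follows. This is a minimal sketch, assuming the fallback applies to each value independently. The `Thresholds` class and the numeric values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thresholds:
    """Hypothetical container; None means the value was left unset."""
    min_score: Optional[float] = None
    target_match_score: Optional[float] = None


def effective_thresholds(data_source: Thresholds, knowledge_base: Thresholds) -> Thresholds:
    """Use the data source value when set, otherwise the Knowledge Base value."""
    def pick(own: Optional[float], inherited: Optional[float]) -> Optional[float]:
        return own if own is not None else inherited

    return Thresholds(
        min_score=pick(data_source.min_score, knowledge_base.min_score),
        target_match_score=pick(
            data_source.target_match_score, knowledge_base.target_match_score
        ),
    )
```

With Knowledge Base thresholds of 0.4 and 0.8, a data source that sets only a Min score threshold of 0.6 would end up with 0.6 and 0.8 under this assumption.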

Step 2. Crawl the data source

On the Knowledge base page, click the edit icon to edit the data source. The data source configuration page opens on the Extracted Paragraphs tab by default. The content of the root reflects the file structure at the Uri you provided during data source creation. By default, all folders and files are excluded from scraping. To include a file or folder for scraping, click the three dots to the right of the item and click Include.

Click the Start crawling button. The Start Crawling Parameters page appears.

Define the crawling policy by setting the parameters described in the table below.

Parameter	Description
URL	Automatically populated with the Uri (or the Host for FTP storage) you specified when adding the data source.
Depth	The number of directory levels the crawler will explore from the URL. Note: To improve crawling efficiency, crawl each node individually instead of the entire root, especially if the storage has a deep structure. Set the depth to '0' to achieve this.

After you define the crawling policy, click Start.
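To see how the Depth parameter limits what gets crawled, consider this sketch over a local folder. It is not the DRUID crawler. It assumes that a depth of 0 means "only the files directly inside the starting folder" and that each extra level allows one more level of subfolders. The function name and the extension list are also assumptions.

```python
import os

# File types this article says paragraphs are extracted from.
# The exact extensions are an assumption for this example.
SUPPORTED_EXTENSIONS = {".xlsx", ".docx", ".pdf"}


def list_files(root: str, depth: int) -> list[str]:
    """List supported files under root, descending at most `depth` levels.

    depth=0 looks only at files directly inside root.
    Returned paths are relative to root.
    """
    found = []
    root = os.path.abspath(root)
    base_level = root.rstrip(os.sep).count(os.sep)
    for current, dirs, files in os.walk(root):
        level = current.rstrip(os.sep).count(os.sep) - base_level
        if level >= depth:
            dirs[:] = []  # stop os.walk from descending further
        dirs.sort()  # visit subfolders in a stable order
        for name in sorted(files):
            if os.path.splitext(name)[1].lower() in SUPPORTED_EXTENSIONS:
                full_path = os.path.join(current, name)
                found.append(os.path.relpath(full_path, root))
    return found
```

Calling `list_files(folder, 0)` on each node in turn corresponds to the efficiency tip in the table: each call touches one folder only instead of walking a deep tree in a single pass.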

Hint: Depending on the crawling policy, crawling might take up to a few minutes to complete. Refresh the page from time to time to check whether it has finished.

By default, all nodes are excluded from scraping. To crawl specific nodes, click the dots next to the desired node in the file repository explorer and select Crawl Path.

When the crawling completes, the extracted articles display under the Extracted Paragraphs tab.

Note: Starting with DRUID 7.15, the platform extracts data from Excel files with table headers that include spaces (e.g., "Question " and "Answer "). The platform automatically trims these spaces to ensure accurate data extraction. Additionally, it extracts data from Excel documents with multiple sheets, capturing the sheet name in the "sheetName" property for each extracted article.
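The header trimming and the "sheetName" property described in the note can be illustrated with plain Python data. In this sketch a workbook is represented as a dictionary of sheet names to rows. This structure and the function name are assumptions for the example. Reading a real Excel file would require a spreadsheet library, and the platform's actual extraction logic is not shown here.

```python
def extract_articles(workbook: dict[str, list[list[str]]]) -> list[dict[str, str]]:
    """Turn {sheet name: rows} into one article dictionary per data row.

    The first row of each sheet is treated as the header row. Header cells
    are stripped of surrounding spaces, and every article records the sheet
    it came from under the "sheetName" key.
    """
    articles = []
    for sheet_name, rows in workbook.items():
        if not rows:
            continue  # an empty sheet has no header and no articles
        headers = [cell.strip() for cell in rows[0]]
        for row in rows[1:]:
            article = dict(zip(headers, row))
            article["sheetName"] = sheet_name
            articles.append(article)
    return articles
```

A sheet named "HR" with the headers "Question " and "Answer " then yields articles keyed by "Question" and "Answer", each carrying "sheetName": "HR".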

Step 3. Train the data source

To ensure the KB Engine searches through the data source paragraphs, train your data source by clicking the Train button at the top-left corner of the data source or by selecting Train data source from the actions menu. Alternatively, you can Train all data sources.

Hint: If you've updated the Trainable Elements in a data source's Advanced Settings, the data source status doesn't change. You need to Retrain the data source to apply those changes. You also have the option to Retrain all data sources directly from the data source. The Retrain feature is available in DRUID 8.15 and higher.

Testing the data source performance

Testing a data source verifies that the extracted paragraphs are relevant to the questions users ask. It helps you identify and fix issues, which improves the quality of your bot's responses and, in turn, user satisfaction.

To test the performance of the data source, on the Extracted Paragraphs page, in the User Says area, enter a question and select the language. All matched paragraphs will be displayed along with their scores.
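If you record the matches and scores you see for a set of test questions, you can summarize them with a simple hit rate. The sketch below is a generic evaluation helper, not a DRUID feature. The `search` argument stands for any hypothetical function that returns (paragraph title, score) pairs for a question, for example one backed by results you noted down by hand.

```python
from typing import Callable

# A search function takes a question and returns (paragraph title, score) pairs.
SearchFn = Callable[[str], list[tuple[str, float]]]


def hit_rate(search: SearchFn, cases: list[tuple[str, str]], min_score: float) -> float:
    """Share of questions whose top-scoring match is the expected paragraph
    and scores at least min_score."""
    if not cases:
        return 0.0
    hits = 0
    for question, expected_title in cases:
        matches = search(question)
        if not matches:
            continue  # no match at all counts as a miss
        title, score = max(matches, key=lambda match: match[1])
        if title == expected_title and score >= min_score:
            hits += 1
    return hits / len(cases)
```

Re-running the same cases after you edit paragraphs and retrain gives a rough before-and-after comparison.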

You can improve the performance of the data source by reviewing and editing the paragraphs based on your needs.

Editing paragraphs

To keep your Knowledge base at high quality, we recommend that you review the extracted paragraphs and take the appropriate actions to improve them: open the URL from which the crawler extracted the paragraph, compare the content, and edit or delete the paragraph as needed. Refine your paragraphs by transforming unstructured data into a question-and-answer format.

To edit a paragraph, click the Action icon displayed inline with the paragraph and click Edit. Edit the paragraph Title and / or Content and save the changes.

Important! After making updates to your paragraphs, it's crucial to retrain your data source. This ensures the KB Engine recognizes these updates and provides accurate responses to user queries. Click the Train button at the top-left corner of the data source.

Fine-tuning Predictions

You can configure Advanced Settings at both the data source and node/leaf levels to achieve more precise predictions. This approach offers granular control, allowing you to adjust the extractors and trainable elements, resulting in better accuracy and performance. Unlike KB-level settings, which apply changes broadly, this targeted method adapts configurations to the unique needs of each data source or element, streamlining your authoring process.

Fine-tuning at the data source level

Navigate to the desired data source.
Select the Advanced Settings tab.
Modify advanced parameters as needed and save the settings.

Fine-tuning at the node or leaf level

In the tree explorer, select the desired node or leaf.
On the right side, select the Advanced Settings tab.
Modify advanced parameters as needed and save the settings.

Reset advanced settings

Note: This feature is available in DRUID version 7.16 onwards.

To reset advanced configurations at the data source and node/leaf levels to match the KB Advanced settings, go to Knowledge Base > Advanced Settings and click the Save to All button. This action streamlines your settings management by applying consistent KB Advanced settings across your entire configuration with just one click.
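One way to picture the three settings levels and the reset is as layered dictionaries. This is a conceptual sketch under two assumptions that the article does not state explicitly: the most specific level wins when a setting is defined at several levels, and Save to All can be modeled as removing every override. The function names and the example values are hypothetical.

```python
def effective_settings(kb: dict, data_source: dict, node: dict) -> dict:
    """Merge settings so that the most specific level wins (assumed order:
    node or leaf over data source over Knowledge Base)."""
    return {**kb, **data_source, **node}


def save_to_all(data_source: dict, nodes: dict[str, dict]) -> None:
    """Model the reset: drop every override so only KB settings stay in effect."""
    data_source.clear()
    for overrides in nodes.values():
        overrides.clear()
```

In this model, a node that overrides a setting keeps its own value until `save_to_all` runs, after which `effective_settings` returns the Knowledge Base values for every data source and node.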

Enhance KB prediction

Refine your articles by transforming unstructured data into a question-and-answer format. Edit articles and add question / title / short description.

Access the Knowledge Base Advanced Settings, set the "trainableColumns" parameter to "Question,Answer", then train the Knowledge Base. The KB Engine will leverage both questions and answers from unstructured data sources during the prediction process, ultimately leading to improved prediction accuracy.

Note: For new bots created in DRUID 7.10 onwards, the engine will predict against both the question and answer by default. For existing bots, the engine will only predict against the answer ("trainableColumns": null) until you update the setting.
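The effect of the "trainableColumns" setting can be sketched as follows. The setting name and the two values ("Question,Answer" and null) come from this article. How the KB Engine actually combines the columns is not described here, so the simple concatenation below and the function name are assumptions for illustration.

```python
from typing import Optional


def training_text(article: dict[str, str], trainable_columns: Optional[str]) -> str:
    """Build the text that would be matched against for one article."""
    if trainable_columns is None:
        # "trainableColumns": null, the behavior described for existing bots
        columns = ["Answer"]
    else:
        columns = [c.strip() for c in trainable_columns.split(",") if c.strip()]
    # Joining with a space is an assumption made for this example.
    return " ".join(article.get(column, "") for column in columns).strip()
```

With the value "Question,Answer", the wording of the question becomes part of the matched text, which is why rewriting unstructured paragraphs as question-and-answer pairs can help prediction.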