Extracting Data from Video Files

DRUID can extract data from video files, making their content searchable and usable within your knowledge base.

Note:  This tenant feature is available as a technology preview in DRUID 9.1 and higher. It requires the DRUID Knowledge Base Multimedia Extractor to be activated on your tenant. To request activation, contact DRUID Support.
Important!  Extracting data from video files will incur additional costs.

DRUID supports data extraction from video files only for the following sources:

  • SharePoint
  • Custom Data Sources
  • Website: Videos must be hosted directly on the website. Extraction is not supported for videos embedded from third-party platforms (e.g., YouTube, Vimeo).
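
The website rule above can be checked before a crawl. The following is a minimal Python sketch, not part of DRUID: the function name `is_directly_hosted` is hypothetical, and it assumes that "hosted directly on the website" means the video URL resolves to the same host as the site. If your site serves media from a separate subdomain, adjust the comparison accordingly.

```python
from urllib.parse import urljoin, urlparse


def is_directly_hosted(video_url: str, site_url: str) -> bool:
    """Return True if video_url resolves to the same host as site_url.

    Assumption: "hosted directly" is approximated as "same hostname".
    """
    # Resolve relative links such as "/media/intro.mp4" against the site URL.
    resolved = urljoin(site_url, video_url)
    video_host = urlparse(resolved).hostname
    site_host = urlparse(site_url).hostname
    if not video_host or not site_host:
        return False
    return video_host == site_host
```

With this check, a link such as `https://example.com/media/intro.mp4` on `https://example.com/` passes, while an embed from a third-party platform such as YouTube or Vimeo does not.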

How Video Data Extraction Works

The process of extracting data from video files involves several automated steps:

  1. Discovery: During the crawling process, the Knowledge Base (KB) Agent identifies and discovers video files.
  2. Video to Audio Conversion: DRUID uses a multimedia framework to convert the video file into an audio format (MP4).
  3. Audio Transcription (ASR): Automatic Speech Recognition (ASR) technology is then applied to generate a transcript from the audio. This transcript is temporarily stored within DRUID. You can download the transcript file from the data source.
  4. Data Extraction: The KB Agent extracts relevant data from the generated transcript, using the MP4 extractor configured under KB Advanced settings > File Extractor > MP4. The default MP4 extractor is DRUID's Basic extractor. For enhanced extraction, you can select an LLM extractor instead.

Important Considerations and Limitations

Please be aware of the following when extracting data from video files:

  • Video File Size Limits: DRUID can extract data only from video files that are at most 350 MB in size and contain at most 2 hours of audio. If a file exceeds either limit, extraction fails.

  • Processing Time: Extracting data from large video files is time-consuming and can take several hours to complete.

  • Original Video File Access: You cannot download the original video file directly from the data source after extraction.
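
Because a file that exceeds the size or duration limit fails extraction, it can be useful to screen files before adding them to a data source. The following is a minimal Python sketch, not part of DRUID: the function name `exceeded_limits` and the constants are hypothetical, it assumes 1 MB means 1024 × 1024 bytes, and it assumes a file exactly at a limit is still accepted. You supply the file size and audio duration yourself, for example from your media tooling.

```python
# Limits taken from the list above; the byte conversion is an assumption.
MAX_SIZE_BYTES = 350 * 1024 * 1024   # 350 MB
MAX_DURATION_SECONDS = 2 * 60 * 60   # 2 hours


def exceeded_limits(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return the names of the limits a video file exceeds (empty if none)."""
    problems = []
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("size")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("duration")
    return problems


if __name__ == "__main__":
    # A 400 MB, 90-minute file exceeds only the size limit.
    print(exceeded_limits(400 * 1024 * 1024, 90 * 60))
```

An empty list means the file is within both limits; any other result names the limit to address, for example by compressing or splitting the video.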