Filter Data using DRUID Custom Query

The DRUID Custom Query allows you to filter request data for Query entities and Query-related entity integration tasks. With this feature, you can retrieve and analyze data from entities beyond the request entity, without relying on the Query-related entity integration task. The DRUID Custom Query supports a range of SQL statements, functions, operators, and clauses, as detailed below.

Basic statements

Aliases

Filtering and conditions

Aggregate Functions

Joins

Set Operations

Expressions and Operators

Expressions allow you to perform calculations or operations on fields, literals, and constants. These expressions are often combined with operators to manipulate the data within your queries.

HINT: You can also use expressions in the WHERE clause to filter results.

String Functions

String functions are used to manipulate and process text (string) data. These functions help format, extract, replace, and analyze string values.

NOTE: The string functions are supported in DRUID 7.2 and higher.

DATETIME Functions

You can use the following DATETIME functions in a DRUID Custom Query to manipulate and filter data based on date and time values. These functions support use cases such as rounding to specific time intervals, shifting dates, and extracting specific parts of a timestamp.

Advanced Functions

Records limiting clauses

Joining with Entity Fields for Large Data Volumes

When working with large data volumes, using the GROUP BY clause correctly is essential for both accurate results and good query performance. Aggregating data by specific groupings, especially when joining related entity fields, allows you to combine data from multiple entities while ensuring correct aggregation across joined entities. Incorrect grouping can lead to inaccurate results or inefficient queries.

IMPORTANT! Indexing is critical when working with large data sets. Make sure that entities and fields used in joins and aggregation are properly indexed. For more information, see Add Data Service Indexes.
NOTE: Joining data from referenced entities is available starting with Druid 19.16.

Important considerations

When joining entity fields and aggregating data, keep the following in mind:

  • Bracket notation is used to expose related entity fields in the result set.
  • All non-aggregated fields in the SELECT clause must also appear in the GROUP BY clause.
  • Aliases help avoid ambiguity when referencing fields from joined entities.
  • All date and time values are stored in Druid as UTC. Time offsets may be required to account for user time zones. Truncating the adjusted timestamp removes the time component so all records for the same day are grouped together.

Syntax

Copy

Syntax

SELECT 
  aggregate_function(field_1) AS Alias, 
  field_2, 
  RelatedEntity.Field 
FROM 
  Entity AS alias 
GROUP BY 
  field_2, 
  RelatedEntity.Field 
ORDER BY 
  field_2 ASC

Example 1: Aggregating data using related entity fields

The following example counts books and groups the results by author. The Book entity is joined with the related Author entity, and author fields are included in the aggregation.

Copy

Example 1 query

SELECT 
COUNT(*) AS Counter, 
author.Id   [BookAuthor.Id] // BookAuthor is field Book for Authors
author.FirstName   [BookAuthor.FirstName]   // 
FROM Book AS book
JOIN Author AS author on author.Id = book.BookAuthorId
GROUP BY field_2, RelatedEntity.Field 
ORDER BY field_2 ASC

Example 2: Aggregating data using date normalization

The following example aggregates books by name and by day. The ReceivedOn timestamp is adjusted by 120 minutes and then truncated to the start of the day to ensure correct daily grouping.

Copy
Example 2 query
SELECT
    b.BookName,
    TRUNC(day, DATEADD(minute, 120, b.ReceivedOn)) AS ReceivedOn,
    COUNT(*) Counter
FROM Book b
GROUP BY b.BookName, b.ReceivedOn
ORDER BY b.BookName, b.ReceivedOn