High Availability On-Premises Deployment

Druid HA deployments leverage industry-standard Kubernetes technology. This setup is designed to handle light to moderate chat traffic, averaging 100 messages per minute, with occasional spikes up to 300 messages per minute, and no significant load on the Druid Connector.

Standard Deployment Architecture Diagram

Components Description

| Name | Description | Type | Details |
| --- | --- | --- | --- |
| Provisioning | Used for provisioning bot-related resources. | Druid | Provisions the bot itself and its channels, and manages export/import of authored elements: dialogs, integrations, entities, etc. |
| APC | Admin Portal, used for administration of bot solutions, users, tenants, etc. | Druid | Hosts the web portal interface for chatbot authoring and user management. |
| Kibana | Used for log investigation. | Third-party | The web application used to explore the Druid applications' technical logs, which are stored in the Elasticsearch database. Image: docker.elastic.co/kibana/kibana |
| Elasticsearch | Used for log storage. | Third-party | The Elasticsearch (time-series) database where all Druid applications' logs are gathered. Image: docker.elastic.co/elasticsearch/elasticsearch |
| RabbitMQ | Message broker for intercommunication of Druid applications. | Third-party | The communication protocol is AMQPS. Image: rabbitmq |
| Redis | Used for memory cache. | Third-party | Druid uses it as an in-memory data store for several applications, for multi-instance synchronization (High Availability) of Druid applications, and for the internal notification system. Image: redis/redis-stack-server |
| Nginx | Inbound traffic to the Druid platform. | Third-party | The only way to interact with the Druid platform. Images: registry.k8s.io/ingress-nginx/controller, registry.k8s.io/ingress-nginx/kube-webhook-certgen |
| Grafana | Used for dashboards. | Third-party | The GUI used to explore the monitoring KPIs. Image: grafana/grafana |
| Prometheus | Used for metrics collection and storage. | Third-party | Manages a time-series database, which is automatically updated by Druid applications. Images: quay.io/prometheus/node-exporter, k8s.gcr.io/kube-state-metrics/kube-state-metrics, quay.io/prometheus/alertmanager, jimmidyson/configmap-reload, quay.io/prometheus/prometheus |
| BotService | Message manager for the chatbot. | Druid | The main messaging endpoint for the DirectLine channel (public webchats communicate with this service to transfer user and chatbot messages). |
| FlowEngine | Used for chat session flows. | Druid | The main dialog management engine, which executes the configured dialogs in all chat conversations. |
| Endpoints | Flow starter for external apps. | Druid | Hosts the endpoints that allow third-party applications (e.g., RPAs, Electronic Signature solutions, etc.) to integrate with the Druid conversational engine. |
| BotApp | The chatbot. | Druid | The message dispatcher between public communication channels (e.g., WhatsApp, Facebook, Viber, etc.) and the conversational engine (the Flow Engine). In practice, all conversations pass through BotApp, which forwards them to the right channel. |
| Connector | Used for integration with enterprise services. | Druid | The main automation service, which performs all activities related to data exchange between the conversational engine and third-party applications, databases, etc., through specific interfaces, e.g., REST, SOAP, SQL, MSCRM, AZ Blob Storage, document generator, file download, etc. Druidconnector also persists the conversations' transcripts to the history database. |
| ML Api Gateway | Proxies the calls between ML services and their clients. | Druid | In practice, the application proxies the calls from FlowEngine and APC to ML Model Serving and ML Model Training. |
| ML Model Serving | Resolves NLU predict requests. | Druid | Acts as an active NLP engine that responds to intent/entity predict requests based on the NLU models provided by ML Model Training. |
| ML Model Training | Creates NLU models. | Druid | Generates NLU models based on training phrases from APC. |
| Ignite | Persisted cache for the conversational engine. | Druid | Used mainly for managing conversation users. |
| Antimalware | File signature checker. | Druid | Used by the druidflowengine component to verify a file's signature against its extension and to validate the extension against the supported extensions: pdf, png, jpg, jpeg, doc, docx, xls, xlsx, odt, ods, tiff, tif, mp3, mp4, mkv, webm, txt, json, csv. It can also be integrated with any third-party antimalware system that is compliant with the AMSI interface. |
| API | Conversational authorizer and live agent notification service. | Druid | Exposes web sockets for the Druid live agent webpage, to manage live chat notifications. It also hosts light web resources for certain chat functionality such as sensitive data input, SSO auth, etc. |
| BotApi | Used for managing message statuses. | Druid | Statuses: Sent, Received, Read. |
| Dataservice | Druid proprietary solution for conversational context storage. | Druid | Used to persist DRUID entity records created and managed within the DRUID Platform, simplifying record authoring. |
| Webview | Conversational Business Applications. | Druid | Hosts the Druid CBA interface. |
| ContactCenterIntegration | Integrates the Flow Engine with third-party Contact Centre solutions. | Druid | e.g., Oracle B2C, Amazon Connect, FreshChat, SalesForce, etc. |
| Knowledgebase Agent | Main knowledgebase engine. | Druid | Manages KB-related requests (web-crawl, document-extraction, embedding, train and predict). |
| Knowledgebase API | Proxies the calls between KB services and their clients. | Druid | In practice, the application proxies the calls from FlowEngine and APC to Knowledgebase Agent and Connector. |
| Service Gateway | Proxies the calls between the KB agent and embeddings servers (Triton, etc.). | Druid | Through Service Gateway, embeddings services are offered "as a service" to requesting clients (KB agent, Model Serving, and others). |
| MongoDB | Databases used by Knowledgebase Agent and Dataservice. | Third-party | Image: mongo |
| Triton | AI Nvidia model. | Druid | Generates semantic embeddings for ML and KB services. |
| vLLM | Generative AI server. | Third-party | Used with the Druid Knowledgebase service to generate completions over KB responses. |
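The Antimalware component described above verifies a file's signature against its extension. The following is a minimal sketch of that kind of check, not Druid's implementation: the `MAGIC_BYTES` table and the `signature_matches_extension` helper are hypothetical names, and only a few of the supported extensions are covered for illustration.

```python
from pathlib import Path

# Leading "magic" bytes for a few of the supported extensions.
# Illustrative subset only; this is NOT Druid's actual signature table.
MAGIC_BYTES = {
    "pdf": [b"%PDF-"],
    "png": [b"\x89PNG\r\n\x1a\n"],
    "jpg": [b"\xff\xd8\xff"],
    "jpeg": [b"\xff\xd8\xff"],
}


def signature_matches_extension(filename: str, content: bytes) -> bool:
    """Return True when the file's leading bytes match its extension."""
    extension = Path(filename).suffix.lower().lstrip(".")
    signatures = MAGIC_BYTES.get(extension)
    if signatures is None:
        # The extension is outside this sketch's table, so reject it.
        return False
    return any(content.startswith(signature) for signature in signatures)
```

A file named `report.pdf` whose content starts with PNG bytes would be rejected by this check, which is the mismatch case the component is meant to catch.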

H/W and S/W requirements - Non-Cloud Specifications

Production Environment

| # | Item | Qty (Nodes) | OS | CPU (Intel Xeon) | RAM | SSD | Data | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | App Server - The host of the Druid platform | 5 | Linux, min kernel 3.10, e.g., Ubuntu 18.04 LTS, RedHat 7.4 (or newer, equivalent) | 8 vCPU | 32 GB | OS 120 GB | 100 GB (Scale as required) | Kubernetes Cluster (min version 1.19) |
| 2 | App Server – Druid semantic classification machine | 1 | Linux, min kernel 3.10, e.g., Ubuntu 18.04 LTS, RedHat 7.4 (or newer, equivalent) | 4 vCPU | 8 GB | OS 120 GB | 50 GB (Scale as required) | NVIDIA 16 GB GPU with compute capability 7.5 (e.g., T4, V100, P100) |
| 3 | App Server – LLM Service for Gen.AI | 1 | Linux, min kernel 3.10, e.g., Ubuntu 18.04 LTS, RedHat 7.4 (or newer, equivalent) | 8 vCPU | 32 GB | OS 120 GB | 200 GB (Scale as required) | 2 x NVIDIA A100 80GB GPU |
| 4 | Microsoft server (App server + Land bot page) | 1 | Windows Server 2019+; Updates “up to date” | 2 vCPU | 8 GB | OS 120 GB | - | ASP.NET 4.6.1. Hosting IIS is required. (Dedicated or shared) |
| 5 | Microsoft SQL server (DB server) | 1 | Windows Server 2019+; Updates “up to date” | 4 vCPU | 16 GB | OS 120 GB | 400 GB (Scale as required) | Microsoft SQL Server Enterprise 2019+ Enterprise Database Service (Dedicated or shared) |
| 6 | Dedicated storage – container and infrastructure storage | | | | | | 100 GB (Scale as required) | Dedicated or shared - NFS |

Note:   For disaster recovery (DR), the Druid platform supports only an active-passive DR mechanism; active-active DR is not supported. In an active-passive DR setup, the requirements are the same as those for a production environment. Additionally, you must implement mechanisms to replicate both SQL databases and storage.
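As a rough aid for sizing, the node counts and per-node figures from the non-cloud production table can be totalled, and doubled for an active-passive DR site that mirrors production. This is a sketch only: the `NodeGroup` class and `totals` function are hypothetical helpers, and the dedicated storage row is left out because it lists no CPU or RAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    name: str
    nodes: int
    vcpu_per_node: int
    ram_gb_per_node: int


# Values copied from the non-cloud production table (rows 1 to 5).
PRODUCTION = [
    NodeGroup("Druid platform app servers", 5, 8, 32),
    NodeGroup("Semantic classification machine", 1, 4, 8),
    NodeGroup("LLM service for Gen.AI", 1, 8, 32),
    NodeGroup("Microsoft server", 1, 2, 8),
    NodeGroup("Microsoft SQL server", 1, 4, 16),
]


def totals(groups):
    """Return (total vCPU, total RAM in GB) across all node groups."""
    vcpu = sum(g.nodes * g.vcpu_per_node for g in groups)
    ram_gb = sum(g.nodes * g.ram_gb_per_node for g in groups)
    return vcpu, ram_gb


if __name__ == "__main__":
    vcpu, ram_gb = totals(PRODUCTION)
    print(f"Production site: {vcpu} vCPU, {ram_gb} GB RAM")
    # The DR note states the passive site has the same requirements.
    print(f"With a passive DR site: {2 * vcpu} vCPU, {2 * ram_gb} GB RAM")
```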

Testing Environment

| # | Item | Qty (Nodes) | OS | CPU (Intel Xeon) | RAM | SSD | Data | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | App Server - The host of the Druid platform | 1 | Linux, min kernel 3.10, e.g., Ubuntu 18.04 LTS, RedHat 7.4 (or newer, equivalent) | 10 vCPU | 40 GB | OS 120 GB | 100 GB (Scale as required) | Kubernetes Cluster (min version 1.19) |
| 2 | App Server – Druid semantic classification machine | 1 | Linux, min kernel 3.10, e.g., Ubuntu 18.04 LTS, RedHat 7.4 (or newer, equivalent) | 4 vCPU | 8 GB | OS 120 GB | 50 GB (Scale as required) | NVIDIA 16 GB GPU with compute capability 7.5 – Optional for testing Env |
| 3 | App Server – LLM Service for Gen.AI | 1 | Linux, min kernel 3.10, e.g., Ubuntu 18.04 LTS, RedHat 7.4 (or newer, equivalent) | 8 vCPU | 32 GB | OS 120 GB | 200 GB (Scale as required) | 2 x NVIDIA A100 80GB GPU |
| 4 | Microsoft test server (App server + Land bot page) | 1 | Windows Server 2016+; Updates “up to date” | 2 vCPU | 8 GB | OS 120 GB | - | ASP.NET 4.6.1. Hosting IIS is required. (Dedicated or shared) |
| 5 | Microsoft SQL server (DB server) | 1 | Windows Server 2016+; Updates “up to date” | 2 vCPU | 8 GB | OS 120 GB | 50 GB (Scale as required) | Microsoft SQL Server Standard 2019+ Database Service (Dedicated or shared) |

Note:  For non-GPU semantic classification machines used in the testing environment, the above table can be replaced with the following one. Please note that LLM machines require a GPU and therefore have no alternative.

Testing Environment non-GPU specs

| # | Item | Qty (Nodes) | OS | CPU (Intel Xeon) | RAM | SSD | Data | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | App Server - The host of the Druid platform | 1 | Linux, min kernel 3.10, e.g., Ubuntu 18.04 LTS, RedHat 7.4 (or newer, equivalent) | 16 vCPU | 64 GB | OS 120 GB | 150 GB (Scale as required) | Kubernetes Cluster (min version 1.19) |
| 2 | App Server – LLM Service for Gen.AI | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| 3 | Microsoft test server (App server + Land bot page) | 1 | Windows Server 2016+; Updates “up to date” | 2 vCPU | 8 GB | OS 120 GB | - | ASP.NET 4.6.1. Hosting IIS is required. (Dedicated or shared) |
| 4 | Microsoft SQL server (DB server) | 1 | Windows Server 2016+; Updates “up to date” | 2 vCPU | 8 GB | OS 120 GB | 50 GB (Scale as required) | Microsoft SQL Server Standard 2019+ Database Service (Dedicated or shared) |

H/W and S/W requirements - Cloud (Azure, EKS, etc.)

Production Environment

| # | Item | Qty (Nodes) | OS | CPU (Intel Xeon) | RAM | SSD | Data | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | App Server - The host of the Druid platform | 5 | Cloud specific | 8 vCPU | 32 GB | Cloud specific | - | Kubernetes Cluster (min version 1.19) |
| 2 | App Server – Druid semantic classification machine | 1 | Cloud specific | 4 vCPU | 8 GB | Cloud specific | - | NVIDIA 16 GB GPU with compute capability 7.5 (e.g., T4, V100, P100) |
| 3 | App Server – LLM Service for Gen.AI | 1 | Cloud specific | 8 vCPU | 32 GB | Cloud specific | - | 2 x NVIDIA A100 80GB GPU |
| 4 | Microsoft server (App server + Land bot page) | 1 | Windows Server 2016+; Updates “up to date” | 2 vCPU | 8 GB | OS 120 GB | - | ASP.NET 4.6.1. Hosting IIS is required. (Dedicated or shared) |
| 5 | Microsoft SQL server (DB server) | 1 | Windows Server 2016+; Updates “up to date” | 4 vCPU | 16 GB | OS 120 GB | 400 GB | Microsoft SQL Server Enterprise 2019+ Enterprise Database Service (Dedicated or shared) |
| 6 | Network disks | - | - | - | - | - | 700 GB | Cumulative for the entire platform. |

Testing Environment

| # | Item | Qty (Nodes) | OS | CPU (Intel Xeon) | RAM | SSD | Data | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | App Server - The host of the Druid platform | 1 | Cloud specific | 10 vCPU | 40 GB | Cloud specific | - | Kubernetes Cluster (min version 1.19) |
| 2 | App Server – Druid semantic classification machine | 1 | Cloud specific | 4 vCPU | 8 GB | Cloud specific | - | NVIDIA 16 GB GPU with compute capability 7.5 (e.g., T4, V100, P100) |
| 3 | App Server – LLM Service for Gen.AI | 1 | Cloud specific | 8 vCPU | 32 GB | Cloud specific | - | 2 x NVIDIA A100 80GB GPU |
| 4 | Microsoft test server (App server + Land bot page) | 1 | Windows Server 2016+; Updates “up to date” | 2 vCPU | 8 GB | OS 120 GB | - | ASP.NET 4.6.1. Hosting IIS is required. (Dedicated or shared) |
| 5 | Microsoft SQL server (DB server) | 1 | Windows Server 2016+; Updates “up to date” | 2 vCPU | 8 GB | OS 120 GB | 50 GB (Scale as required) | Microsoft SQL Server Standard 2019+ Database Service (Dedicated or shared) |
| 6 | Network disks | - | - | - | - | - | 300 GB (Scale as required) | Cumulative for the entire platform. |

DRUID Platform DB Server - Additional software requirements

  • SQL Server instance attributes:
    • Collation: Latin1_General_CI_AS
    • Windows and SQL Server Authentication mode enabled.
    • TCP Protocol enabled (in SQL Server Configuration Manager)
    • SQL Server port is open in the firewall of the DB Server.
      • It must be a fixed port, not a dynamically allocated one.
  • SQL Server Management Studio (SSMS). Alternatively, Azure Data Studio or the osql utility can be used to run the T-SQL statements required during installation.
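Before installation, it can help to confirm that the fixed SQL Server port is actually reachable from the app servers. The sketch below uses only the Python standard library; the function name `is_tcp_port_open` is a hypothetical helper, and the host and port in the usage note are placeholders rather than values from this document.

```python
import socket


def is_tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `is_tcp_port_open("sql.example.internal", 1433)` would test the default SQL Server port on a hypothetical DB host. A successful TCP connection shows only that the firewall and listener are in place, not that authentication works.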

Detailed components CPU and memory requests and limits

| Pod Name | Mem Req. | CPU Req. | Mem Lim. | CPU Lim. |
| --- | --- | --- | --- | --- |
| Antimalware | 512 | 100 | 512 | 1000 |
| ApcBack | 1536 | 500 | 4096 | 2000 |
| Apc | 100 | 100 | 384 | 250 |
| Api | 512 | 100 | 1024 | 1000 |
| BotApi | 512 | 100 | 1024 | 1000 |
| BotApp | 768 | 100 | 1536 | 1000 |
| BotService | 512 | 100 | 1024 | 1000 |
| Connector | 768 | 200 | 2048 | 2000 |
| Dataservice | 512 | 100 | 1024 | 1000 |
| Endpoints | 512 | 100 | 1024 | 1000 |
| Flow Engine | 1024 | 250 | 2048 | 2000 |
| Ignite | 512 | 100 | 5120 | 1500 |
| Knowledgebase API | 512 | 100 | 1024 | 1000 |
| Knowledgebase Agent | 3584 | 600 | 15360 | 6000 |
| ML Api Gateway | 512 | 100 | 1024 | 1000 |
| ML Model Serving | 512 | 100 | 2048 | 1000 |
| ML Model Training | 2048 | 500 | 4096 | 2000 |
| Migrator | Best Effort | | | |
| Provisioning | 512 | 50 | 1024 | 400 |
| Service Gateway | 512 | 100 | 1024 | 1000 |
| Webview | 512 | 100 | 1024 | 1000 |
| RabbitMQ | 2048 | 1000 | 2048 | 1000 |
| Redis | 256 | 200 | 1024 | 1000 |
| Elasticsearch | 2048 | 500 | 2048 | 1000 |
| Kibana | 512 | 100 | 1024 | 500 |
| Nginx | 90 | 100 | | |
| MongoDB | 2048 | 250 | 2048 | 1000 |
| Triton Server | 512 | 100 | 6144 | 3500 |
| Triton Models | Best Effort | | | |
| Grafana | 512 | 300 | 1024 | 2000 |
| Prometheus Node Exporter | Best Effort | | | |
| Prometheus Server | Best Effort | | | |
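The per-pod requests above can be summed to estimate how much of the cluster is reserved before any scaling. The sketch below makes several assumptions that the table does not state: memory values are taken to be MiB and CPU values millicores, each pod runs as a single replica (an HA deployment runs several replicas of some pods), and raw node capacity is used even though Kubernetes reserves part of each node. The names `POD_RESOURCES`, `total_requests`, and `fits` are hypothetical.

```python
# (memory request, CPU request, memory limit, CPU limit) per pod, copied from
# the table above. Units are ASSUMED to be MiB and millicores.
# None marks a value the table leaves blank. "Best Effort" pods are omitted
# because they declare no requests or limits.
POD_RESOURCES = {
    "Antimalware": (512, 100, 512, 1000),
    "ApcBack": (1536, 500, 4096, 2000),
    "Apc": (100, 100, 384, 250),
    "Api": (512, 100, 1024, 1000),
    "BotApi": (512, 100, 1024, 1000),
    "BotApp": (768, 100, 1536, 1000),
    "BotService": (512, 100, 1024, 1000),
    "Connector": (768, 200, 2048, 2000),
    "Dataservice": (512, 100, 1024, 1000),
    "Endpoints": (512, 100, 1024, 1000),
    "Flow Engine": (1024, 250, 2048, 2000),
    "Ignite": (512, 100, 5120, 1500),
    "Knowledgebase API": (512, 100, 1024, 1000),
    "Knowledgebase Agent": (3584, 600, 15360, 6000),
    "ML Api Gateway": (512, 100, 1024, 1000),
    "ML Model Serving": (512, 100, 2048, 1000),
    "ML Model Training": (2048, 500, 4096, 2000),
    "Provisioning": (512, 50, 1024, 400),
    "Service Gateway": (512, 100, 1024, 1000),
    "Webview": (512, 100, 1024, 1000),
    "RabbitMQ": (2048, 1000, 2048, 1000),
    "Redis": (256, 200, 1024, 1000),
    "Elasticsearch": (2048, 500, 2048, 1000),
    "Kibana": (512, 100, 1024, 500),
    "Nginx": (90, 100, None, None),
    "MongoDB": (2048, 250, 2048, 1000),
    "Triton Server": (512, 100, 6144, 3500),
    "Grafana": (512, 300, 1024, 2000),
}


def total_requests(pods):
    """Return (memory MiB, CPU millicores) requested by one replica of each pod."""
    memory = sum(row[0] for row in pods.values())
    cpu = sum(row[1] for row in pods.values())
    return memory, cpu


def fits(pods, nodes, vcpu_per_node, ram_gb_per_node):
    """Check the summed requests against raw cluster capacity."""
    memory_mib, cpu_millicores = total_requests(pods)
    return (cpu_millicores <= nodes * vcpu_per_node * 1000
            and memory_mib <= nodes * ram_gb_per_node * 1024)
```

Calling `fits(POD_RESOURCES, 5, 8, 32)` compares the single-replica requests with the five 8 vCPU / 32 GB production app servers. Limits are deliberately not summed here, since Kubernetes schedules pods on requests rather than limits.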

Network Communication Matrix

| Source (Name, IP, URL, etc.) | Destination (Name, IP, URL, etc.) | Protocol | Port | Function | Used For |
| --- | --- | --- | --- | --- | --- |
| App Server* | druidcontainerregistry.azurecr.io | HTTPS | 443 | Druid Container Registry | Installation |
| App Server* | api.dso.docker.com, api.segment.io, auth.docker.io, cdn.auth0.com, cdn.segment.com, desktop.docker.com, docker-pinata-support.s3.amazonaws.com, docker.elastic.co, hub.docker.com, k8s.gcr.io, login.docker.com, mcr.microsoft.com, notify.bugsnag.com, nvcr.io, production.cloudflare.docker.com, quay.io, registry-1.docker.io, sessions.bugsnag.com | HTTPS | 443 | Third-party Containers | Installation |
| WebApp (public) | druidapc.{{domain}}**, druidapcback.{{domain}}, druidapi.{{domain}}, druidbapi.{{domain}}, druidbs.{{domain}} | HTTPS | 443 | Chatbot interaction | Utilization |
| Intranet*** | druidapc.{{domain}}, druidapcback.{{domain}}, druidapi.{{domain}}, druidbapi.{{domain}}, druidbapp.{{domain}}, druidbs.{{domain}}, druidep.{{domain}}, druidkib.{{domain}}, druidrmq.{{domain}} | HTTPS | 443 | Platform administration | Utilization |
| App Server (Connector) | <TBD> | <TBD> | <TBD> | Enterprise Services | Utilization |

* This entry is necessary at installation or upgrade time so that the Kubernetes engine can automatically download the required binaries.

** If the client does not want to expose the APC component, some specific files must be downloaded from APC and made accessible (as resources) to the WebApp. The DRUID team will provide the list. The one downside is that the files must be copied to the WebApp again during every DRUID Platform upgrade.

*** Dedicated names for Intranet access only can be accommodated; this will require additional certificates.

Applications’ Technical Users

| Application | User | Notes |
| --- | --- | --- |
| Druid APC | admin | Used for platform administration. |
| Druid APC | {{WEB-API-USER-NAME}} | Used for programmatic access to the platform API. Password parameter: {{WEB-API-USER-PASSWORD}} |
| RabbitMQ | {{RMQ-USER}} | Used for queue administration, mainly for troubleshooting. Password parameter: {{RMQ-PASSWORD}} |
| Kibana | {{KIBANA-USER}} | Used for exploring logs, mainly for troubleshooting. Password parameter: {{KIBANA-PASSWORD}} |
| BotApp, BotService | **** | Password only. It is used by Bot App to authenticate to Bot Service (two of the Druid components). It cannot be used from outside. Parameter: {{BOTSERVICE-PASSWORD}} |
| Redis | **** | Password only. It cannot be used from outside. Parameter: {{REDIS-PASSWORD}} |
| Endpoints | **** | Password only. Parameter: {{ENDPOINTS-PASSWORD}} |

DNS entries

DNS registration of Druid service FQDNs: register the following FQDNs in your DNS and provide us with the list (examples are given for the first few entries; extrapolate for the rest).

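| Domain | Type | Name | Value (IP addresses) | FQDN |
| --- | --- | --- | --- | --- |
| {{DOMAIN}} | A | Kibana | {{APP-SERVER-IP}} | druidkib.example.com |
| {{DOMAIN}} | A | RabbitMQ | {{APP-SERVER-IP}} | druidrmq.example.com |
| {{DOMAIN}} | A | Apc | {{APP-SERVER-IP}} | druidapc.{{domain}} |
| {{DOMAIN}} | A | ApcBack | {{APP-SERVER-IP}} | druidapcback.{{domain}} |
| {{DOMAIN}} | A | Api | {{APP-SERVER-IP}} | druidapi.{{domain}} |
| {{DOMAIN}} | A | BotAPI | {{APP-SERVER-IP}} | druidbapi.{{domain}} |
| {{DOMAIN}} | A | BotApp | {{APP-SERVER-IP}} | druidbapp.{{domain}} |
| {{DOMAIN}} | A | BotService | {{APP-SERVER-IP}} | druidbs.{{domain}} |
| {{DOMAIN}} | A | EndPoints | {{APP-SERVER-IP}} | druidep.{{domain}} |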
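The FQDNs above all follow one pattern: a fixed service prefix followed by the customer domain. A minimal sketch for generating the full list from a domain is shown below; `SERVICE_PREFIXES` and `expected_fqdns` are hypothetical names, and the prefixes are copied from the FQDN column.

```python
# Host prefixes taken from the FQDN column of the DNS table above.
SERVICE_PREFIXES = {
    "Kibana": "druidkib",
    "RabbitMQ": "druidrmq",
    "Apc": "druidapc",
    "ApcBack": "druidapcback",
    "Api": "druidapi",
    "BotAPI": "druidbapi",
    "BotApp": "druidbapp",
    "BotService": "druidbs",
    "EndPoints": "druidep",
}


def expected_fqdns(domain: str) -> dict:
    """Map each service name to its FQDN under the given domain."""
    # Normalise the domain: trim whitespace and dots, use lowercase.
    domain = domain.strip().strip(".").lower()
    return {name: f"{prefix}.{domain}" for name, prefix in SERVICE_PREFIXES.items()}


if __name__ == "__main__":
    for service, fqdn in expected_fqdns("example.com").items():
        print(f"{service}: A record for {fqdn}")
```

Each generated name needs an A record pointing at the app server IP, as in the table.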

SSL Certificate

To access the Druid platform over HTTPS, the SSL certificate(s) must be prepared in advance. The certificate(s) must cover all names defined in the “DNS entries” section above.

You can provide one or more certificates. The following approaches are valid for the Druid platform use case (we strongly recommend the last two options):

  • Multiple certificates: One certificate for each service in the list of names.
  • A single certificate with multiple hosts (CN or SANs).
  • A wildcard certificate.
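Whichever option is chosen, every FQDN from the DNS section must be covered by some name on a certificate. The sketch below checks that coverage for a list of certificate names (CN or SAN entries), under the common rule that a wildcard matches exactly one left-most label. The helpers `name_covered` and `uncovered` are hypothetical, and the names are supplied by hand rather than read from a certificate file.

```python
def name_covered(fqdn: str, cert_name: str) -> bool:
    """True if cert_name (exact or single-label wildcard) covers fqdn."""
    fqdn, cert_name = fqdn.lower(), cert_name.lower()
    if cert_name.startswith("*."):
        # A wildcard stands for exactly one left-most label, so compare
        # everything after the first dot of the FQDN with the wildcard's base.
        _, _, parent = fqdn.partition(".")
        return bool(parent) and parent == cert_name[2:]
    return fqdn == cert_name


def uncovered(fqdns, cert_names):
    """Return the FQDNs that no certificate name covers."""
    return [f for f in fqdns if not any(name_covered(f, c) for c in cert_names)]
```

For example, `uncovered(["druidapc.example.com", "druidkib.example.com"], ["*.example.com"])` returns an empty list, while a certificate that names only `druidapc.example.com` leaves `druidkib.example.com` uncovered.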

Component-specific requirements

| Component | Storage Class RWO (ReadWriteOnce) | Storage Class RWX (ReadWriteMany) | Ingress | Load Balancer | Special configs/reqs |
| --- | --- | --- | --- | --- | --- |
| nginx/traefik/other | No | No | No | Yes | No |
| rabbitmq | Yes | No | Yes | No | No |
| redis | Yes | No | No | No | optional: sysctl -w net.core.somaxconn=10000; echo never > /sys/kernel/mm/transparent_hugepage/enabled (+ adding in /etc/rc.local) |
| elasticsearch | Yes | No (opt. Yes) | No | No | mandatory: sysctl -w vm.max_map_count=262144; for OpenShift: https://developers.redhat.com/blog/2019/11/12/using-the-red-hat-openshift-tuned-operator-for-elasticsearch |
| kibana | No | No | Yes | No | No |
| druid components | Yes | Yes | Yes | No | No |
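The mandatory Elasticsearch setting `vm.max_map_count=262144` can be verified on a Linux node by reading the kernel parameter from `/proc`. The sketch below is a pre-flight check only; the function name `max_map_count_ok` is hypothetical, and treating any value at or above 262144 as acceptable is an assumption, since the table only gives the value to set.

```python
from pathlib import Path

# Value from the elasticsearch row of the table above.
REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(path: str = "/proc/sys/vm/max_map_count",
                     required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True if the node's vm.max_map_count meets the required value."""
    # On Linux, this file holds the current value as a decimal integer.
    current = int(Path(path).read_text().strip())
    # ASSUMPTION: higher values are also acceptable.
    return current >= required
```

The `path` parameter exists so that the check can be pointed at a test file. On a real node the default path applies, and the check needs to run on each node that may host the elasticsearch pod.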