VoiceText
The VoiceText Channel enables voice integration in Contact Centers, supporting scenarios where voice services and voice AI Agents are provided by third-party platforms. It allows seamless interactions between a user and an AI Agent by handling speech-to-text and text-to-speech conversions.
By default, VoiceText uses a synchronous request-response model: when a user message is sent to Druid, the system responds immediately with the full reply (or replies) in a single API response. This approach works well for simple conversations but may introduce delays when integrations or Proactive AI Agent messages are set on flow steps. To address this, Druid supports an asynchronous long polling mechanism that improves responsiveness during more complex interactions.
The diagram below provides a high-level overview of how to integrate voice in Contact Centers using external Voice AI Agents.
The third-party Voice AI Agent provides the SIP connection with the Contact Center (asynchronous communication). In some cases, the Contact Center itself can fulfill the Voice AI Agent role.
- The Voice AI Agent delivers Speech to Text and Text to Speech services.
- The Voice AI Agent sends a text request to Druid.
- Druid returns a text response to the Voice AI Agent (synchronous response communication).
Synchronous Request-Response
This is the default and traditional interaction model for the VoiceText channel. In this mode, when the Voice AI Agent sends a user's input to Druid via the POST *.druidplatform.com/api/voicetext/{botId}/messages API, the connection remains open, and the Voice AI Agent waits for a direct response from Druid on the same API call.
Once Druid processes the request and generates a response (or multiple responses), these messages are typically concatenated into a single string and returned to the Voice AI Agent. The Voice AI Agent then converts this text response to speech and plays it back to the user through the Contact Center.
While straightforward for simple, single-turn interactions, this synchronous model can introduce latency. If a Druid flow involves complex integrations, database lookups, or proactive messages that take time to generate, the Voice AI Agent (and thus the user) will experience a delay until all the AI Agent responses for that conversation turn are ready and returned concatenated in a string. This can lead to a less fluid user experience, as the user might perceive the AI Agent as slow or unresponsive while it waits for a complete response.
The figure below describes the detailed integration sequence.
- When there is an incoming call, the Contact Center initiates the conversation with Druid AI Agent to optimize the response with the Welcome message.
- The Contact Center initiates the conversation with the Voice AI Agent.
- The Voice AI Agent initiates the conversation with Druid AI Agent, which authenticates the conversation.
- Druid AI Agent responds with the Welcome Message.
- The Contact Center picks up the call and responds with the welcome message.
- Voice AI Agent responds with the Welcome Message.
- The Contact Center gives the Welcome message to the user.
- The Contact Center captures the “User says”.
- The Contact Center sends SIP to the Voice AI Agent.
- The Voice AI Agent converts Speech to Text and sends the user's text ("User says") to the Druid AI Agent by calling the Druid API CreateActivity.
- Druid AI Agent responds.
- Druid AI Agent responds synchronously with the AI Agent response (in text format).
- The Voice AI Agent checks whether the response contains instructions to transfer the call to a human agent. If it does not, the Voice AI Agent transforms Text to Speech and the sequence continues with the next step; otherwise, it proceeds to the transfer steps at the end of this sequence.
- The Voice AI Agent responds to the Contact Center with voice (SIP connection).
- The Contact Center responds to the user with voice.
- The User closes the call.
- The Contact Center notifies the Voice AI Agent that the conversation is closed.
- The Voice AI Agent notifies the Druid AI Agent that the conversation is closed.
- Transferring the call to a human agent:
- The Voice AI Agent instructs the Contact Center to transfer the call to a human agent.
- The Contact Center executes the transfer.
- Helpdesk agent and User are now talking directly.
Long Polling mechanism
Long Polling offers an asynchronous alternative to the default synchronous request-response model, designed to significantly improve responsiveness and user experience, especially in scenarios involving multi-message replies or delayed AI Agent processing (e.g., due to integrations or proactive messages).
In this mode, when the Voice AI Agent sends the user's input to Druid using the POST *.druidplatform.com/api/voicetext/{botId}/messages API, it does not wait for a direct response. Instead, immediately after the user's input is sent, the Contact Center starts polling continuously by making repeated POST requests to the dedicated Long Polling endpoint: POST *.druidplatform.com/api/voicetext/{botId}/messages/getMessages.
Druid holds each of these connections open until an AI Agent message or event becomes available. As soon as a message is ready, Druid sends it back to the Contact Center, which closes that specific /getMessages request. The Contact Center then immediately initiates a new request to fetch the next message from the AI Agent, and continues to do so until the phone call is terminated.
This asynchronous approach ensures that messages are delivered to the Contact Center as soon as Druid generates them, without waiting for an entire set of responses or for a slow integration to complete. If no messages are available, the /getMessages request times out after a set period (typically 30 seconds), at which point the Contact Center should re-initiate the /getMessages call to continue polling.
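The polling behaviour described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference client: `get_messages` is a hypothetical callable that wraps one POST to the /getMessages endpoint and returns `None` when the request timed out without a message, and `call_active` is a hypothetical check that the phone call is still in progress. How the HTTP request and its headers are built is deliberately left out.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Activity = Dict[str, Any]


def poll_messages(
    get_messages: Callable[[], Optional[Activity]],
    call_active: Callable[[], bool],
) -> Iterator[Activity]:
    """Yield each message or event as soon as one request returns it.

    get_messages: hypothetical wrapper around a single POST to /getMessages.
        It blocks while Druid holds the connection open and returns None
        when the request timed out with nothing to deliver.
    call_active: hypothetical check that the phone call is still running.
    """
    while call_active():
        activity = get_messages()
        if activity is None:
            # Timed out without a message: issue a new request right away.
            continue
        yield activity
```

Injecting the two callables keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.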
The steps below describe the detailed integration sequence:
- Initiating the Conversation:
- When there's an incoming call, the Contact Center (CC) initiates the conversation with the Voice AI Agent.
- The Voice AI Agent initiates the conversation with Druid AI Agent, which authenticates the conversation.
- The Druid AI Agent responds with the Welcome Message.
- Contact Center Handles Welcome Message:
- The CC picks up the call.
- The Voice AI Agent responds with the Welcome Message (received from Druid AI Agent).
- The CC gives the Welcome message to the user.
- Contact Center Captures User Input:
- The CC captures the “User says” (speech from the user).
- The CC sends this audio via SIP to the Voice AI Agent.
- The Voice AI Agent converts Speech to Text and sends the user's text input to the Druid AI Agent, calling the Druid API CreateActivity (POST *.druidplatform.com/api/voicetext/{botId}/messages).
- Contact Center Polling for AI Agent Responses:
- Immediately after the Voice AI Agent sends the user's input, the CC begins polling for responses by making periodic POST requests to the Long Polling endpoint: *.druidplatform.com/api/voicetext/{botId}/messages/getMessages.
- The Druid AI Agent sends available AI Agent responses (in text format) to the CC via these getMessages calls, as soon as they are ready. Each getMessages call returns one message/event.
- If the Druid AI Agent has more messages, the CC immediately makes another POST request to getMessages to retrieve the next one. This continues until no more messages are received.
- The CC passes the text response to the Voice AI Agent.
- The Voice AI Agent checks if the response contains instructions to transfer the call to a human agent.
- If it does not contain transfer instructions, the Voice AI Agent transforms Text to Speech.
- The Voice AI Agent responds to the CC with voice (SIP connection).
- The CC responds to the user with voice.
- User Closes the Call:
- The CC announces to the Voice AI Agent that the conversation is closed.
- The Voice AI Agent announces to the Druid AI Agent that the conversation is closed (e.g., by sending a close_conversation message).
- Transferring the Call to a Human Agent. If a "route-to-human" event is received via the getMessages polling:
- The Voice AI Agent announces to the CC to transfer the call to a human.
- The CC executes the transfer.
- The helpdesk agent and user are now talking directly.
Prerequisites
- For DRUID on premise deployments, make sure that you provide inbound access from the following messaging endpoint: DRUID.BotApp.
VoiceText Channel Integration
The VoiceText channel is active by default in Druid. To integrate your AI Agent with a third-party Voice AI Agent, open the AI Agent settings in the Druid Portal, go to channels, click VoiceText, and copy the values of Authorize URL and AI Agent URL.
If you want to use the asynchronous long polling mechanism, click Enable long polling and copy that URL as well.
Use these values to configure your Voice AI Agent and make API calls to DRUID.
The following fields are available in DRUID:
- [[ChatUser]].ChannelId = "voicetext"
- [[ChatUser]].UserId - Stores the user's unique identifier.
DRUID API Reference
This section describes the DRUID APIs you should use for VoiceText channel integration.
Authorize
POST: Use the Authorize URL *.druidplatform.com/api/services/app/Chat/AuthorizeAnonymousAsync
Request Body
Response
Response Example
{
"botId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"userId": "string",
"conversationId": "string",
"token": "string"
}
The returned token has a validity of 1 hour.
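Because the token expires, a client typically needs to remember when it was issued. The sketch below parses the Authorize response shown above and flags when a new token should be requested. It is illustrative only: the `Session` class, `session_from_response`, and the 60-second safety margin are assumptions made for this example, and it assumes the one-hour validity is counted from the moment the response is received.

```python
import time
from dataclasses import dataclass
from typing import Any, Dict, Optional

TOKEN_LIFETIME_SECONDS = 60 * 60  # documented validity: 1 hour


@dataclass
class Session:
    """Values returned by the Authorize API (hypothetical helper class)."""
    bot_id: str
    user_id: str
    conversation_id: str
    token: str
    issued_at: float  # local timestamp, in seconds since the epoch

    def needs_refresh(self, now: Optional[float] = None, margin: float = 60.0) -> bool:
        """Return True once the token is within `margin` seconds of expiring."""
        now = time.time() if now is None else now
        return now >= self.issued_at + TOKEN_LIFETIME_SECONDS - margin


def session_from_response(payload: Dict[str, Any], issued_at: Optional[float] = None) -> Session:
    """Build a Session from the parsed JSON body of the Authorize response."""
    return Session(
        bot_id=payload["botId"],
        user_id=payload["userId"],
        conversation_id=payload["conversationId"],
        token=payload["token"],
        issued_at=time.time() if issued_at is None else issued_at,
    )
```

A caller would check `needs_refresh()` before each request and call the Authorize API again when it returns `True`.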
Send Messages (Create Activity)
Use this API to send User Says to the Druid AI Agent.
Syntax
POST: Use the AI Agent URL *.druidplatform.com/api/voicetext/{botId}/messages
Request Header
In the request header, map the Authorize key to the bearer token obtained from the Authorize API. Use the DRUID-specific CONCAT function for the Authorize key value with the following syntax:
Here, [[Entity]].StringField represents the field that stores the token obtained from the Authorize API.
Request Body
Request Body template
{
"type": "message",
"channelId": "voicetext",
"conversation": {
"id": "<conversationId>"
},
"from": {
"id": "<userId>"
},
"text": "<user says>",
"timeout": 50
}
The timeout parameter, available in DRUID 8.13 and higher, enables you to specify a duration in seconds for the system to wait for a response from the Flow Engine. If the Flow Engine doesn't respond within the configured timeout period, the system will log an error in the Conversation History.
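As an illustration, the request body template above could be assembled with a small helper like the one below. The function name `build_message_activity` is hypothetical. Treating `timeout` as optional (so the helper also suits versions earlier than 8.13) and rejecting non-positive values are assumptions of this sketch, not documented DRUID behaviour.

```python
import json
from typing import Any, Dict, Optional


def build_message_activity(
    conversation_id: str,
    user_id: str,
    text: str,
    timeout: Optional[int] = None,
) -> Dict[str, Any]:
    """Return the JSON-serializable body for the Send Messages API."""
    if timeout is not None and timeout <= 0:
        # Assumption: a timeout only makes sense as a positive number of seconds.
        raise ValueError("timeout must be a positive number of seconds")
    activity: Dict[str, Any] = {
        "type": "message",
        "channelId": "voicetext",
        "conversation": {"id": conversation_id},
        "from": {"id": user_id},
        "text": text,
    }
    if timeout is not None:
        # The timeout parameter is available in DRUID 8.13 and higher.
        activity["timeout"] = timeout
    return activity


if __name__ == "__main__":
    body = build_message_activity("<conversationId>", "<userId>", "<user says>", timeout=50)
    print(json.dumps(body, indent=2))
```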
Responds with Message
{
"type": "message",
"channelId": "voicetext",
"conversation": {
"id": "<conversationId>"
},
"to": {
"id": "<userId>"
},
"text": "<bot response>",
"speak": "<bot response>"
}
Responds with Event (transfer to human)
{
"type": "event",
"channelId": "voicetext",
"conversation": {
"id": "<conversationId>"
},
"to": {
"id": "<userId>"
},
"name": "route-to-human",
"value": "input_mapping_json_object"
}
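A Voice AI Agent has to tell these two response shapes apart before deciding whether to synthesize speech or hand the call over. The sketch below shows one way to do that. The function name `interpret_activity` and the action labels are hypothetical, and preferring the `speak` field over `text` for speech output is an assumption of this example.

```python
from typing import Any, Dict, Tuple


def interpret_activity(activity: Dict[str, Any]) -> Tuple[str, Any]:
    """Map a Druid response to an action for the Voice AI Agent.

    Returns ("transfer", value) for a route-to-human event,
    ("speak", text) for a message, and ("ignore", None) otherwise.
    """
    kind = activity.get("type")
    if kind == "event" and activity.get("name") == "route-to-human":
        # Pass the event value through unchanged to the transfer logic.
        return ("transfer", activity.get("value"))
    if kind == "message":
        # Assumption: use "speak" when present, fall back to "text".
        return ("speak", activity.get("speak") or activity.get("text", ""))
    return ("ignore", None)
```

The same function applies to responses retrieved through long polling, since those use the same message and event structures.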
Get Messages (Long Polling)
Get messages from the DRUID AI Agent when using the long polling mechanism.
Syntax
POST *.druidplatform.com/api/voicetext/{botId}/messages/getMessages
Request Header
In the request header, map the Authorize key to the bearer token obtained from the Authorize API. Use the DRUID-specific CONCAT function for the Authorize key value with the following syntax:
Here, [[Entity]].StringField represents the field that stores the token obtained from the Authorize API.
Request Body
{
"conversationId": "your-conversation-id",
"mergedResponse": true
}
Parameters
| Parameter | Type | Description | Required |
|---|---|---|---|
| conversationId | String | The unique identifier for the ongoing conversation, obtained from the Authorize API response. | Yes |
| mergedResponse | Boolean | Set to true to receive all available AI Agent messages for the current turn combined into a single response payload. This can reduce the number of getMessages calls. If false or omitted, DRUID will return messages one by one, requiring a separate getMessages call for each message. | No |
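For illustration, the getMessages request body could be built as shown below. The helper name `build_get_messages_body` is hypothetical. Leaving `mergedResponse` out when it is false relies on the parameter being optional, as the table states.

```python
from typing import Any, Dict


def build_get_messages_body(conversation_id: str, merged_response: bool = False) -> Dict[str, Any]:
    """Return the JSON-serializable body for one getMessages request."""
    if not conversation_id:
        raise ValueError("conversation_id is required")
    body: Dict[str, Any] = {"conversationId": conversation_id}
    if merged_response:
        # Ask for all available messages of the current turn in one payload.
        body["mergedResponse"] = True
    return body
```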
Response
The getMessages response has the same structure as the responses described under Responds with Message and Responds with Event (transfer to human) for the Send Messages API: it contains either the AI Agent response or an event.




