Voice AudioCodes
The Voice channel through VoiceAI Connect from AudioCodes enables you to deliver a seamless voice experience to the users talking to your DRUID virtual assistants.
VoiceAI Connect acts as a hub, connecting different telephony systems (a telephony channel, public telephony provider, contact center, enterprise communication platform, or any platform communicating via WebRTC) to the DRUID bot framework and voice AI cognitive services.
In a typical bot deployment, VoiceAI Connect receives a phone call and connects it to your bot.
Prerequisites
- For DRUID on-premise deployments, make sure that you allow inbound access to the following messaging endpoint: DRUID.BotApp.
Activate the Voice AudioCodes Channel
To activate the channel, follow these steps:
- In the DRUID Portal, go to your bot settings. Click the Channels tab, then click Voice, AudioCodes – VoiceAi Connect. The channel info section expands.
- Generate a token by clicking the Generate button.
- By default, the communication between DRUID and VoiceAI Connect is done via the WebSocket protocol. For special deployments where network restrictions may deny communication via WebSocket, disable the Web Socket checkbox.
- In the Reply timeout in seconds field, enter the maximum time the bot has to respond before the call is automatically disconnected. For more information, see Handle Conversation Disconnect.
- In the Language map JSON field, provide a one-to-one mapping between the language codes used by the Speech-to-Text (STT) service provider (the key on the left) and DRUID-specific language codes, that is, ISO 639-1 (the value on the right), together with the Text-to-Speech voice DRUID will use. For reference, consult the locales and voices supported for Text-to-Speech by Azure Cognitive Services.
- Send the token and the DRUID URL to your DRUID representative. The DRUID team, in partnership with AudioCodes Professional Services, will set up the connection between your virtual assistant and VoiceAI Connect.
Use the following format for the language codes mapping:
"<STT Provider language code/locale>": "<DRUID-specific language code>|<Text-to-speech voice>"
For example:
{
  "ro-RO": "ro|ro-RO-AlinaNeural",
  "en-US": "en-US|en-US-AshleyNeural",
  "th-TH": "th-TH|th-TH-AcharaNeural"
}
After the channel’s activation, the following fields are available in DRUID:
- [[ChatUser]].ChannelId = “audiocodes” – Identifies the channel.
- [[ChatUser]].Phone – Stores the user’s phone number.
- [[ChatUser]].CalleePhoneNumber – Stores the bot’s phone number.
Author Flows for Voice
DRUID provides authors with a simple way to configure flows for the Voice channel by providing the SpeakMessage field in the Voice section on flow steps.
Best Practices
To make the bot’s voice sound more natural, follow these best practices:
- For multi-channel bots, do not customize voice messages per channel. Instead, check the spelling and punctuation of the step messages and correct any mistakes. Also, use diacritics and accented characters properly.
- Provide short messages in the SpeakMessage field on steps.
- Use punctuation marks properly. This makes it easier for the bot to use pauses and voice pitch to deliver the message clearly.
- For hero, thumbnail and choice steps, if you want the bot to speak what’s in the cards, buttons, etc., provide the desired voice message in the SpeakMessage field on these steps.
Sending events to VoiceAI Connect
You can send events (playUrl, transfer, hangup, etc.) from DRUID to VoiceAI Connect by using DRUID Backchannel Flow Steps to generate any supported VoiceAI Connect event.
In Input mapping on the backchannel step, provide the entity that stores the activity parameters of that event.
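For example, here is a minimal sketch of the entity JSON you might map for a playUrl backchannel step, which makes the bot play an audio file to the user. The playUrlUrl and playUrlMediaFormat parameter names are taken from the VoiceAI Connect playUrl event documentation; verify them, and the URL itself, against your deployment:
{
  "playUrlUrl": "https://example.com/prompts/welcome.wav",
  "playUrlMediaFormat": "wav/lpcm16"
}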
Set session parameters using the "config" event
To set parameters for the entire call (session parameters), at the beginning of the call, on the welcome flow, add a backchannel step named config and in Input mapping provide the entity that stores the timeouts and actions you want to set.
On the flow step, click the Metadata section, click Advanced Editing, and in the JSON field add the "sessionParams" object and set the desired parameters.
For information on the config event and its general parameters, see the VoiceAI Connect documentation, sections General bots parameters and Changing call settings.
If you later decide to update some session parameters, use a backchannel step called SetVoiceSessionParams and in Input mapping provide the entity that stores the timeouts and actions you want to address; in the Metadata section, add the object specific to the action you want to perform and provide the parameters.
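As an illustration, the Metadata JSON on the config step might look like the following sketch. The userNoInputTimeoutMS (how long VoiceAI Connect waits for the user to speak) and bargeIn parameters appear in the VoiceAI Connect documentation; treat the names and values here as assumptions to verify against the VoiceAI Connect version you use:
{
  "sessionParams": {
    "userNoInputTimeoutMS": 20000,
    "bargeIn": true
  }
}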
Modifying call parameters
Any backchannel-type flow step is sent as an event to VoiceAI Connect (transfer, hangup, etc.). In the activityParams property of that specific VoiceAI Connect event, DRUID sends the JSON object of the entity specified in Input mapping on the backchannel step.
If you want to modify activity parameters without sending a specific event to AudioCodes’ VoiceAI Connect (for example, to detect a language change on a conversation activity), use a backchannel step called SetVoiceActivityParams. In the activityParams property, DRUID sends the JSON object of the entity specified in Input mapping on the backchannel step.
If you want to modify session parameters (for example, to handle bot delay or detect a language change at session level), use a backchannel step called SetVoiceSessionParams. In the sessionParams property, DRUID sends the JSON object of the entity specified in Input mapping on the backchannel step.
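For example, to change the speech recognition language for subsequent activities, the entity you provide in Input mapping on a SetVoiceActivityParams step might contain a single field. The language parameter name comes from the VoiceAI Connect documentation; this is a sketch, not a definitive configuration:
{
  "language": "en-US"
}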
Capture a Collection of Dual Tone Multi Frequency (DTMF) Digits
You can configure flow steps to capture a collection of digits the user presses on the phone’s keypad: either within a specific time frame between digits, up to a maximum number of digits, or the digits pressed before a specific submit digit set on the flow step.
To capture digits, on the flow step, click the Metadata section, click Advanced Editing, and in the JSON field add the "audioCodesDTMF" object and set the parameters described in the table below.
| Parameter | Type | Description | Mandatory |
|---|---|---|---|
| sendDTMF | Boolean | To capture a collection of digits, set this parameter to false; otherwise, the bot captures only the first digit pressed by the user on the phone’s keypad. | Yes |
| bargeInOnDTMF | Boolean | When set to true, allows the user to interrupt the bot by pressing a DTMF digit, which terminates the bot response. Note: To prevent users from interrupting the bot, we strongly recommend setting this parameter to false. | No |
| dtmfCollect | Boolean | Set this parameter to true to capture all the DTMF digits entered by the user. The default value is false; that is, the bot captures only the first digit pressed by the user. | Yes |
| dtmfCollectInterDigitTimeoutMS | Number | The timeout in milliseconds the bot waits for the user to press another digit before it captures the digits. The timeout starts after the user enters the first DTMF digit and is reset after each digit. The default value is 2000 ms. Note: The parameter is applicable only when the dtmfCollect parameter is configured to true. | Yes* |
| dtmfCollectMaxDigits | Number | The maximum number of DTMF digits the user is expected to press on the phone’s keypad. The default is 5. Note: The parameter is applicable only when the dtmfCollect parameter is configured to true. | Yes* |
| dtmfCollectSubmitDigit | String | Defines a special DTMF "submit" digit; when received from the user, the bot captures the digits pressed before it, without waiting for the timeout to expire or for the maximum number of expected digits. The valid value is any symbol on a phone keypad. The default is # (pound key). Note: The parameter is applicable only when the dtmfCollect parameter is configured to true. | Yes* |
*The parameter controls how the bot captures the digits. You can use these parameters in any combination, but at least one is mandatory.

This section describes how to configure a prompt step to capture the CNP (personal numeric code) provided by users before they press # (pound key) on their phone keypad.
Enter [[Account]].ClientCNP in Input mapping on the step.
On the prompt step Metadata section, click Advanced editing and in the JSON editor add the following code:
"audioCodesDTMF":{
"sendDTMF":false,
"bargeInOnDTMF":false,
"dtmfCollect":true,
"dtmfCollectSubmitDigit":"#"
}
Handling Bot Delay
Handling bot delays is particularly useful when the bot executes an integration which might take longer to complete.
By setting timeouts, you can configure the following actions to address situations when the bot takes time to respond to a message sent to it:
- Play a textual prompt to the user
- Play an audio file to the user
- Disconnect the call
- Resume speech recognition (so the call will not remain hanging).
To handle bot delay, add a backchannel step called SetVoiceSessionParams. In Input mapping, provide the entity that stores the timeouts and actions you want to address.
Make sure that the entity you provide in Input mapping contains fields named exactly as the parameters expected by VoiceAI Connect. For the complete list of parameters, see VoiceAI Connect documentation.
In the SetVariables section of the backchannel step, configure the timeouts and define the actions based on your needs.
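For example, the entity you map might end up holding values like the following sketch, which plays a textual prompt if the bot takes longer than 10 seconds to respond. The botNoInput* parameter names follow the VoiceAI Connect bot-delay documentation; confirm them, and the supported values, for your VoiceAI Connect version:
{
  "botNoInputTimeoutMS": 10000,
  "botNoInputSpeech": "Please hold on while I look that up.",
  "botNoInputRetries": 2
}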
Handle Conversation Disconnect
A call disconnects if the bot does not respond within the Reply timeout in seconds threshold set on the channel or if the user says nothing for 120 seconds.
You can configure what happens on conversation disconnect. Go to the bot details, click the Dialogue management section header and from the Voice call terminate flow field, select the flow to be triggered on disconnect. If no such flow is set, the call disconnects.
When the call disconnects, the following data is logged in the conversation context:
- [[ChatUser]].VoiceConversationTerminatedReason – The reason for which the call was disconnected, e.g., “Client Side”. The disconnect reason can be one of the following:
- SocketInterrupted – The connection was interrupted.
- UserBecameSilent – The user said nothing for 120 seconds.
- CallTerminatedByCaller – The user terminated the call.
- [[ChatUser]].VoiceConversationTerminatedReasonCode – The code (text) associated with the disconnect reason, e.g., “client-disconnected”.
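For example, in the flow you select as Voice call terminate flow, you can branch on these fields using a condition such as the sketch below (the exact condition syntax depends on how you author flow conditions in DRUID):
[[ChatUser]].VoiceConversationTerminatedReason == "UserBecameSilent"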
Transferring the call
There are cases when the bot cannot handle the call by itself, so it needs to escalate the call to a call center live agent. By default, once VoiceAI Connect performs the transfer, it immediately disconnects the call with the bot, regardless of whether the transfer succeeded or not.
For the bot to escalate the call to a contact center live agent, add a backchannel step named transfer. Configure the backchannel step so that the bot provides the transferTarget, that is, the URI to which the call should be transferred. Typically, the URI is a "tel" or "sip" URI. You can also configure the backchannel step so that the bot provides additional SIP headers upon transfer.
In Input mapping on the transfer backchannel flow step, provide the entity that stores the desired values. For example, you can create an entity, [[VoiceParams]], that stores the values of the parameters associated with the transfer event.
Make sure that the entity you provide in Input mapping contains fields named exactly as the parameters expected by VoiceAI Connect for the transfer event. For the complete list of transfer event parameters, see VoiceAI Connect documentation.
In the SetVariables section of the transfer backchannel step, set the transferTarget and any additional transfer parameters and SIP headers you want to send.
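For example, the [[VoiceParams]] entity might end up holding values like the following sketch. transferTarget is the parameter described above; the transferSipHeaders name and structure are an assumption based on the VoiceAI Connect transfer event documentation, and X-Escalation-Reason is a hypothetical custom header, so verify both before use:
{
  "transferTarget": "tel:+40215551234",
  "transferSipHeaders": [
    { "name": "X-Escalation-Reason", "value": "agent-requested" }
  ]
}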
Disconnecting the call
At any stage of the conversation, the bot can disconnect the conversation. For the bot to disconnect the call, add a backchannel step named hangup.
You can configure the backchannel step so that upon disconnect the bot provides a textual reason that will be passed to the peer on the SIP Reason header and will appear in the CDR of the call. In addition, you can also configure the backchannel step to add SIP headers and their values, which will be included in the SIP BYE message.
To add the disconnect reason and additional SIP headers, on the hangup backchannel step, in Input mapping provide the entity which stores the desired values.
Make sure that the entity you provide in Input mapping contains fields named exactly as the parameters expected by VoiceAI Connect. For the complete list of hangup event parameters, see VoiceAI Connect documentation.
In the SetVariables section of the hangup backchannel step, set the disconnect reason and/or additional SIP headers.
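For example, the entity mapped on the hangup step might hold a value like the following sketch. hangupReason is the parameter named in the VoiceAI Connect hangup event documentation; any additional SIP header fields must use the exact names VoiceAI Connect expects, so check the documentation before adding them:
{
  "hangupReason": "conversation completed"
}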
Conversation History
All voice conversations begin with “[Voice start event]”. This is particularly useful for debugging purposes, for example, to measure the time from the moment the call was initiated (the bot picks up the call) until the bot says its first message.
For this channel, DRUID also logs in the Conversation History the Speech-to-Text Confidence Score, that is, the value representing the confidence level of the recognition received from the speech-to-text provider.
When the call disconnects due to an error, the disconnect reason logged in the Conversation History is “PlatformIntegrationError” (the message status is Platform Integration error).
Store call initiation metadata in [[QueryParams]] for future usage
By default, VoiceAI Connect sends an initial event to the bot when the call is initiated together with specific SIP headers. By default, [[ChatUser]].Phone stores the user’s phone number and [[ChatUser]].CalleePhoneNumber stores the bot’s phone number.
Storing incoming custom SIP headers (metadata sent by the contact center solution together with the call initiation event) can be particularly useful for companies running outbound dialing campaigns, where the dialer automatically initiates calls with potential customers.
You can store additional incoming custom SIP headers in dedicated fields in the [[QueryParams]] system entity. For that, create dedicated fields that have the same names as the incoming SIP header keys.
For more information about sending SIP headers to the bot, see the VoiceAI Connect documentation.
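For example (the header name is hypothetical), suppose the dialer attaches a custom SIP header to the initiating call:
X-Campaign-Id: spring-renewals
Creating a [[QueryParams]] field named X-Campaign-Id then makes the value available in the conversation as [[QueryParams]].X-Campaign-Id.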
Storing AudioCodes Helpdesk Conversations in Conversation History with Agent Assist
To log conversations between AudioCodes helpdesk agents and users in the Conversation History, you must set up Agent Assist. Doing so provides valuable insights into user-agent interactions, helping you refine bot responses, improve escalation flows, and enhance overall bot performance.
To set up Agent Assist follow these steps:
- In the DRUID Portal, go to your bot settings. Click the Channels tab, then click Voice, AudioCodes – VoiceAi Connect. The channel info section expands.
- Select Allow assist bot and copy the Assist bot Druid url and the Token as you will need them in the subsequent steps.
- Log into AudioCodes LiveHub and select Bots from the left menu.
- On the Bots page, click the Add new assist bot button. The Connect your bot wizard appears.
- Select Druid as bot framework, then click Next.
- Enter the bot details. In the Bot URL field, paste the Assist bot Druid url you copied from DRUID. In the Token field, paste the token you copied from DRUID.
- Click the Validate bot configuration button. If the validation fails, check that you entered the correct Bot URL and token. If the validation passes, click Next.
- Set the assist bot settings and click Create.
- Click Routing in the left menu. On the Routing Rules tab, search for the main bot you created for the DRUID bot integration (not the assist bot you created) and click the Edit button. The Edit routing rule page appears.
- Click Assist bot, select the assist bot you previously created from the drop-down, then click Update.
Agent Assist is now successfully set up, enabling the logging of AudioCodes agent-user interactions in the Conversation History.
Recommending Responses to Helpdesk Agents with Agent Assist
Agent Assist analyzes real-time voice interactions and suggests AI-powered responses to help AudioCodes helpdesk agents provide faster and more accurate support. It uses large language models (LLMs) to interpret client messages and recommend the most relevant replies from the Knowledge Base.
How It Works
When a client-bot conversation transfers to a live agent in AudioCodes, Agent Assist activates automatically. It processes each message in real time by:
- Updating ConversationInfo.AgentAssistMessages[i] to maintain a complete transcript of client-agent interactions, storing all client and agent messages.
- Triggering the Agent Assist special flow on the first client message to search the Knowledge Base for a relevant response.
- Sending the suggested response to the helpdesk agent through the configured third-party tool (where agents handle client calls).
- Preventing interruptions by ensuring that when multiple messages arrive, only the latest client message triggers a response after the previous Agent Assist flow execution finishes.
Set up Agent Assist for response recommendations
- Open the Solution Library, search for the Helpdesk agent assist solution, and import it.
- Go to Bot Details > General details > Dialogue Management.
- Select Use Knowledge Base.
- From the Knowledge Base response flow field, select Agent assist flow.
- Go to Apps and configure the connection strings for the GPT-azure.com app.
- Go to Flows, search for 'Agent assist flow', and click the Add Suggestion step. By default, the solution comes with a Druid Data Service integration that saves the suggested responses to the Agent Assist workspace included with the solution.
You can remove this integration and configure an integration with the third-party tool your helpdesk agents use to handle client calls.
After setup, Agent Assist will automatically suggest responses, helping agents provide faster and more accurate support.