Voice

Druid enables AI Agent voice capabilities to meet the demand for hands-free, conversational interactions across various business scenarios. This allows users to communicate with AI Agents naturally through two primary modes:

  • Telephony. Users can interact with AI Agents via traditional phone lines. This is ideal for automating call center triage before agent hand-off, or providing automated HR and IT Help Desk support through a dedicated phone number and telephone exchange.
  • Voice-enabled intranet pages. Users can issue voice commands directly within a web interface. For example, a user can verbally instruct an AI Agent to perform tasks or edit documents while working within an intranet page.

The voice channel is currently available as a technology preview via the Druid web snippet. You can configure and test voice conversations within the Druid Portal or on hosted web snippets.

How the Voice Channel works with the WebChat snippet

  1. Press the microphone button in the chat snippet to start talking with the AI Agent.

  HINT: If you don’t see the microphone icon, set up the speech provider and configure the voice channel as described below.

  2. Your voice is processed by the Speech-to-Text (STT) service as you speak, and the transcript appears in the input area. When you finish speaking, the text is sent to the AI Agent.
  3. The AI Agent processes the text and responds with a text message. The text response appears in the chat snippet, and the AI Agent also speaks the response to you.
  4. The spoken response is delivered by the Text-to-Speech (TTS) service (sketched after this list).

  NOTE: In Flow authoring, each step has a dedicated Voice setting where you can customize a spoken response specifically for voice channels, distinct from the text response.
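To make the round trip concrete, here is a minimal TypeScript sketch of the flow above. Everything in it (the speechToText, sendToAgent, and textToSpeech functions and the voiceText field) is a hypothetical stand-in for the real STT, AI Agent, and TTS services, not the Druid snippet API.

```typescript
// Hypothetical sketch of the WebChat voice round trip described above.
// None of these names come from the Druid snippet API.

interface AgentReply {
  text: string;        // the text response shown in the chat snippet
  voiceText?: string;  // optional spoken variant authored in a step's Voice setting
}

// Step 2: the STT service turns the spoken utterance into a transcript.
async function speechToText(audio: Blob): Promise<string> {
  return "what is my leave balance"; // stub; a real STT provider call goes here
}

// Step 3: the transcript is sent to the AI Agent, which answers with text.
async function sendToAgent(transcript: string): Promise<AgentReply> {
  return {
    text: "You have 12 days of leave left.",
    voiceText: "You have twelve days of leave left.",
  };
}

// Step 4: the TTS service speaks the reply back to the user.
async function textToSpeech(text: string): Promise<void> {
  console.log(`TTS would speak: "${text}"`);
}

async function voiceTurn(utterance: Blob): Promise<void> {
  const transcript = await speechToText(utterance);  // STT
  const reply = await sendToAgent(transcript);       // AI Agent processing
  await textToSpeech(reply.voiceText ?? reply.text); // prefer the Voice setting
}
```

The voiceText ?? text fallback mirrors the NOTE above: when a step has a Voice setting, the spoken response can differ from the text shown in the snippet.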

Set up the speech provider

Druid delivers Speech-to-Text (STT) and Text-to-Speech (TTS) functionality through integrations with industry-leading Technology Partners. Out-of-the-box support includes:

  • Microsoft Cognitive Services
  • ElevenLabs (available starting with Druid 9.15)
  • Deepgram (STT only)

To integrate a preferred speech provider not listed above, please contact Druid Tech Support.

Setting up Microsoft Cognitive Services

IMPORTANT! To use the Voice channel in production environments, contact Druid Tech Support for the necessary keys.
  1. In the Druid Portal, go to your AI Agent settings.
  2. Select the AI & Cognitive Services category and click Microsoft Cognitive Service.
  3. Enter the Key and Region provided by the Druid Support Team in the voice channel activation email.

  HINT: For demo purposes, you can request a test key from Druid Tech Support.

  4. Map the languages your AI Agent supports to specific voices in the configuration table.
    1. In the table below the Voice channel details, click the plus icon (+) to add a row.
    2. From the Language dropdown, select the AI Agent language (default or additional).
    3. From the Voice dropdown, select the specific voice the AI Agent will use to respond.
    4. Click the Save icon displayed inline.
  5. Click Save at the bottom of the page and close the modal.
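If you want to sanity-check the Key and Region outside Druid before saving, a small test with Microsoft's official Speech SDK for JavaScript (npm package microsoft-cognitiveservices-speech-sdk) might look like the sketch below. The voice name is just an example of a voice you could map in the table; this check is not part of the Druid setup itself.

```typescript
// Minimal check that the Key and Region from the activation email are valid,
// using Microsoft's Speech SDK (npm: microsoft-cognitiveservices-speech-sdk).
import * as sdk from "microsoft-cognitiveservices-speech-sdk";

const speechConfig = sdk.SpeechConfig.fromSubscription(
  process.env.SPEECH_KEY!,    // the Key provided by the Druid Support Team
  process.env.SPEECH_REGION!, // the Region, e.g. "westeurope"
);
// Example neural voice; use any voice you mapped in the configuration table.
speechConfig.speechSynthesisVoiceName = "en-US-JennyNeural";

// Write the synthesized audio to a file so the test also works in Node.
const audioConfig = sdk.AudioConfig.fromAudioFileOutput("test.wav");
const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

synthesizer.speakTextAsync(
  "Voice channel test.",
  (result) => {
    if (result.reason === sdk.ResultReason.SynthesizingAudioCompleted) {
      console.log("Key and region are valid; audio was synthesized.");
    } else {
      console.error("Synthesis failed:", result.errorDetails);
    }
    synthesizer.close();
  },
  (error) => {
    console.error(error);
    synthesizer.close();
  },
);
```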

Setting up Deepgram

IMPORTANT! Deepgram is available as a voice provider for WebChat in Druid 9.1 and higher, for Speech-to-Text (STT) only.

Prerequisites

  • You need a Deepgram API key with Member permissions. Refer to the Deepgram documentation (Token-Based Authentication) for information on how to create a key with Member permissions.

Setup procedure

  1. In the Druid Portal, go to your AI Agent settings.
  2. Select the AI & Cognitive Services category and click Deepgram.
  3. Enter your Deepgram API Key.
  4. Map the languages your AI Agent supports to specific Deepgram models in the configuration table.
    1. In the table, click the plus icon (+) to add a row.
    2. From the Language dropdown, select the AI Agent language (default or additional).
    3. From the Model dropdown, select the Deepgram model the AI Agent will use to transcribe speech in that language.
    4. Click the Save icon displayed inline.

    HINT: For Druid versions prior to 9.6, type the Deepgram model name (e.g., nova-2-medical). See the Deepgram documentation for the complete list of available models.
  5. Click Save at the bottom of the page and close the modal.

  6. Select the Web & Email category and click the WebChat channel, then select Deepgram as the Speech-to-Text Provider and Azure as the Text-to-Speech Provider.
  7. (Optional) Select Azure as the Fallback Speech-to-Text Provider in WebChat. Azure is used when Deepgram does not support the chat user’s language.

  IMPORTANT! Using Azure as the Text-to-Speech Provider or the Fallback Speech-to-Text Provider requires setting up Microsoft Cognitive Services as well.

  8. Click the Save button at the bottom of the page.

The microphone button appears in the chat snippet, and users can click it to speak with the AI Agent.
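To confirm a Deepgram API key works before entering it in the portal, you could run a quick transcription with Deepgram's official JavaScript SDK (npm: @deepgram/sdk, v3). The model name and sample audio URL below are only examples:

```typescript
// Quick STT smoke test with the Deepgram JavaScript SDK (v3).
import { createClient } from "@deepgram/sdk";

const deepgram = createClient(process.env.DEEPGRAM_API_KEY!);

async function main() {
  const { result, error } = await deepgram.listen.prerecorded.transcribeUrl(
    { url: "https://dpgr.am/spacewalk.wav" }, // sample audio from Deepgram's docs
    { model: "nova-2", smart_format: true },  // any model you mapped in the table
  );
  if (error) throw error;
  console.log(result?.results.channels[0].alternatives[0].transcript);
}

main();
```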

Setting up ElevenLabs

Druid supports ElevenLabs as a high-quality Text-to-Speech (TTS) provider, enabling your AI Agent to communicate using specialized synthetic voices and custom voice clones.

NOTE: ElevenLabs is available in Druid 9.15 and higher.

Prerequisites

  • You need an ElevenLabs API key.

Setup procedure

  1. In the Druid Portal, go to your AI Agent settings.
  2. Select the AI & Cognitive Services category and click ElevenLabs.
  3. Enter your ElevenLabs API Key.
  4. Map the languages your AI Agent supports to specific ElevenLabs voices in the configuration table.
    1. In the table, click the plus icon (+) to add a row.
    2. From the Language dropdown, select the AI Agent language (default or additional).
    3. From the Voice dropdown, select the specific ElevenLabs voice the AI Agent will use to respond. The model is automatically filled in after you select the voice.
    4. Click the Save icon displayed inline.

  5. Click Save at the bottom of the page and close the modal.
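To verify an ElevenLabs API key and voice outside Druid, you could call the public ElevenLabs text-to-speech endpoint directly. The voice ID placeholder and model_id below are examples only; in Druid, the model is filled in automatically once you pick a voice.

```typescript
// Minimal TTS check against the public ElevenLabs REST API (Node 18+, ESM).
import { writeFile } from "node:fs/promises";

const voiceId = "YOUR_VOICE_ID"; // copy a voice ID from your ElevenLabs voice library

const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
  method: "POST",
  headers: {
    "xi-api-key": process.env.ELEVENLABS_API_KEY!,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Voice channel test.",
    model_id: "eleven_multilingual_v2", // example model
  }),
});
if (!res.ok) throw new Error(`ElevenLabs request failed: ${res.status}`);

await writeFile("test.mp3", Buffer.from(await res.arrayBuffer()));
console.log("API key works; wrote test.mp3");
```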

Configure the Voice channel

Once a speech provider is set up, you must explicitly tell the WebChat channel to use these services:

  1. Select the Web & Email category and click the WebChat channel.
  2. Select the desired Speech-to-Text Provider. If you selected Deepgram, you should also select Azure as a Fallback Speech-to-Text Provider. Azure will be used automatically if Deepgram does not support the user’s language (see the sketch below).
  3. Select the desired Text-to-Speech Provider. If you selected ElevenLabs, you should also select Azure as a Fallback Text-to-Speech Provider. Azure will be used automatically if ElevenLabs does not support the user’s language.
  4. Click Save at the bottom of the page and close the modal.

A microphone icon will automatically appear in the webchat snippet. This allows users to switch from text to voice conversations seamlessly, enabling natural vocal interaction with the AI Agent.
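The fallback rule from steps 2 and 3 boils down to a per-message provider choice. The sketch below is a hypothetical illustration of that rule, not Druid's internal implementation:

```typescript
// Hypothetical illustration of the fallback rule: the primary provider handles
// the languages it supports, and Azure catches everything else.
type SttProvider = "Deepgram" | "Azure";
type TtsProvider = "ElevenLabs" | "Azure";

function pickSttProvider(userLang: string, deepgramLangs: Set<string>): SttProvider {
  return deepgramLangs.has(userLang) ? "Deepgram" : "Azure";
}

function pickTtsProvider(userLang: string, elevenLabsLangs: Set<string>): TtsProvider {
  return elevenLabsLangs.has(userLang) ? "ElevenLabs" : "Azure";
}

// Example: a Romanian-speaking user when only English and Spanish are mapped.
console.log(pickSttProvider("ro", new Set(["en", "es"]))); // "Azure"
console.log(pickTtsProvider("en", new Set(["en", "es"]))); // "ElevenLabs"
```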

How the Voice Channel works with SDL Real-time Machine Translation

If you use a translation service for real-time translation and activate the Voice channel, the AI Agent will play back the response in the user's language.

When activating SDL machine translation, you can choose when the translation is performed: at conversation time or authoring time. For more information, see Using Machine Translation.

Voice Channel with Conversation Time Translation

  1. The user speaks in Language A.
  2. Speech-to-Text (STT) is performed in Language A.
  3. The text is translated into the AI Agent default language.
  4. NLP is performed in the AI Agent default language.
  5. A response is generated in the AI Agent default language.
  6. The response is translated back into Language A.
  7. The AI Agent responds with text in Language A.
  8. The response text is converted into audio by the Text-to-Speech (TTS) service.
NOTE: When Conversation Time Translation is enabled for the AI Agent, the language selector in the webchat snippet is replaced by a non-selectable World icon. This indicates that the AI Agent can process languages beyond its natively authored set. The webchat snippet automatically adapts to the user's language code: if a user sends a voice message in a language different from the AI Agent's pre-configured languages, the snippet detects the change and sends the audio to the Speech-to-Text (STT) service in that specific language. This ensures the AI Agent can accurately transcribe and translate the user's input regardless of the initial language settings. The sketch below walks through this pipeline.
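Put together, the conversation-time pipeline looks like the following hypothetical sketch; the service functions are stubs standing in for the real STT, translation, NLP, and TTS components.

```typescript
// Hypothetical sketch of the conversation-time translation pipeline (steps 1-8).
// The stubs below just echo their inputs; real services replace them.
const speechToText = async (_audio: Blob, lang: string) =>
  `[transcript in ${lang}]`;
const translate = async (text: string, from: string, to: string) =>
  `[${text} translated ${from} -> ${to}]`;
const runNlpAndFlows = async (text: string, lang: string) =>
  `[response in ${lang} to ${text}]`;
const textToSpeech = async (text: string, lang: string) => {
  console.log(`TTS speaks in ${lang}: ${text}`);
};

async function conversationTimeVoiceTurn(
  audio: Blob,
  userLang: string,    // Language A
  defaultLang: string, // the AI Agent default language
) {
  const userText = await speechToText(audio, userLang);                // step 2
  const agentInput = await translate(userText, userLang, defaultLang); // step 3
  const agentText = await runNlpAndFlows(agentInput, defaultLang);     // steps 4-5
  const replyText = await translate(agentText, defaultLang, userLang); // steps 6-7
  await textToSpeech(replyText, userLang);                             // step 8
}
```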

Voice Channel with Authoring Time Translation

When using Authoring Time Translation, Druid translates the message written in the Voice setting of flow steps from the default AI Agent language to all additional languages (see the sketch after the steps below).

  1. The user speaks in Language A (default or additional AI Agent language).
  2. Speech-to-Text (STT) is performed in Language A.
  3. NLP is performed in Language A.
  4. The AI Agent responds with text in Language A.
  5. The response text is converted into audio by the Text-to-Speech (TTS) service and spoken to the user.
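By contrast with the conversation-time sketch above, authoring-time translation does all translation work up front: each flow step's Voice prompt already exists in every additional language, so the runtime path has no translate calls. A hypothetical illustration, with example prompt strings:

```typescript
// Hypothetical: Voice prompts pre-translated at authoring time, keyed by language.
const voicePrompts: Record<string, string> = {
  en: "You have twelve days of leave left.",      // default AI Agent language
  ro: "Mai aveți douăsprezece zile de concediu.", // generated at authoring time
};

// At runtime, STT, NLP, and TTS all run directly in the user's language
// (Language A); the spoken response is looked up, never translated on the fly.
function spokenReply(userLang: string): string {
  return voicePrompts[userLang] ?? voicePrompts["en"];
}

console.log(spokenReply("ro")); // "Mai aveți douăsprezece zile de concediu."
```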