Use LLM Response Streaming for Voice Interactions

LLM response streaming lets your agent deliver model output token by token, giving users faster, more natural responses. This release focuses on voice interactions, where early partial replies help reduce pauses and improve the overall experience. While streaming also works in webchat, the current implementation is optimized for voice, with webchat-specific improvements planned for future versions.
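To make "token by token" concrete, here is a minimal, self-contained Python sketch. It is illustrative only, not Druid API code: `fake_llm_stream` and its timing are invented stand-ins for a real streaming endpoint.

```python
import time
from typing import Iterator

REPLY = "Sure, I can help you reschedule that appointment for tomorrow."

def fake_llm_stream(reply: str, seconds_per_token: float = 0.1) -> Iterator[str]:
    """Simulated streaming endpoint: yields one token at a time,
    the way a real LLM stream delivers output over the network."""
    for token in reply.split():
        time.sleep(seconds_per_token)  # stand-in for model latency
        yield token

start = time.perf_counter()
for i, token in enumerate(fake_llm_stream(REPLY)):
    if i == 0:
        # Streaming lets the agent start responding after the first token
        # (~0.1 s here) instead of waiting for the full reply (~1 s).
        print(f"first token after {time.perf_counter() - start:.2f}s")
    print(token, end=" ", flush=True)
print()
```

The point of the sketch is the latency difference: a blocking call could not do anything useful until the whole reply was generated, while the streaming consumer can act on the very first fragment.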

NOTE: This feature is available in technology preview starting with Druid version 9.11.

Prerequisite

  • Before you start, make sure you have an LLM resource set up in the Druid Portal. For more information, see Create LLM Resources.

Add LLM Streaming to a Run Agent Step

Follow these steps to enable LLM response streaming in your Agent flow:

  1. Open your flow and click the Run Agent step.
  2. Scroll to the Post Actions section.
  3. Remove the existing LLM integration, if one is configured.
  4. Click in the integrations field and select LlmStreamer.
  5. Click the internal action to open its configuration panel.
  6. In the configuration panel, select your Endpoint Type and Model Name.
  7. Save the step and publish your flow.

Once enabled, your voice interactions start returning partial responses as they are generated. This helps your agent react faster and makes conversations feel more fluid and natural.
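In voice channels, partial output is typically buffered into speakable chunks before it reaches text-to-speech, since single tokens sound unnatural when spoken. How the Druid runtime buffers audio internally is not documented here; the hypothetical sketch below only illustrates one common pattern: accumulate streamed tokens until a sentence boundary, then hand the whole phrase to TTS.

```python
import re
from typing import Iterable, Iterator

SENTENCE_END = re.compile(r"[.!?]$")

def speakable_phrases(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into whole sentences so text-to-speech
    receives natural units instead of single words. A common pattern,
    not necessarily what Druid does internally."""
    buffer: list[str] = []
    for token in tokens:
        buffer.append(token)
        if SENTENCE_END.search(token):
            yield " ".join(buffer)
            buffer.clear()
    if buffer:  # flush whatever remains when the stream ends
        yield " ".join(buffer)

# Example with a pre-tokenized reply; in practice tokens arrive over time.
tokens = "Sure. I can reschedule that. What day works for you?".split()
for phrase in speakable_phrases(tokens):
    print("TTS <-", phrase)
```

Running this prints three phrases, one per sentence, each of which could be spoken as soon as it completes rather than after the full reply is generated.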