Generative Endpoints
Generative Endpoints define the credentials and connection settings DRUID uses to communicate with generative AI providers. DRUID provides several types of generative endpoints: OpenAI, AzureOpenAI, DruidAzureOpenAI, Mistral, DruidMistral, and (on request) DruidGoogle.
Adding a New Generative Endpoint
To add a generative endpoint:
- Click Add new.
- Enter the name of the endpoint to be displayed in the UI.
- From the Endpoint Type field, select Druid if you have an LLM subscription with DRUID, or Custom if you're using your own generative endpoint.
- From the Type field, select the endpoint type you want to use.
- Configure the required parameters (see details below and the configuration sketch after these steps). Correctly configuring these parameters optimizes how DRUID integrates with generative AI models, ensuring efficient and cost-effective AI-powered interactions.
- Click Save.
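Taken together, the steps above collect settings like those in the sketch below. This is a hypothetical Python representation for orientation only; the field names are illustrative, not DRUID's internal schema:

```python
# Hypothetical representation of a Custom generative endpoint's settings.
# Field names are illustrative; DRUID manages these through the UI.
endpoint_config = {
    "caption": "Production OpenAI endpoint",  # name shown in the UI
    "endpoint_type": "Custom",                # Druid or Custom
    "type": "OpenAI",                         # provider type
    "api_url": "https://api.openai.com/v1",
    "api_key": "<secret key>",
    "model_name": "gpt-4o-mini",
    "max_tokens_per_session_limit": 400_000,  # default for bots on DRUID 7.3+
    "model_context_window_size": 128_000,     # model-dependent
    "max_tokens": 4_096,                      # per-request cap
    "disable_ssl_validation": False,          # keep SSL validation enforced
}
```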
Configuring Generative Endpoint Parameters
Caption
Customize the name of the endpoint displayed in the UI. This is useful for distinguishing between different endpoints when managing multiple connections.
API URL
Enter the URL of the generative API. This is the endpoint where DRUID sends requests to process and generate responses.
Example: For OpenAI, you might use https://api.openai.com/v1.
API Key
Provide the secret key from your generative provider (e.g., OpenAI or Azure). This key authenticates requests to the API.
Example: If you integrate Azure OpenAI, you must generate an API key from the Azure OpenAI Studio and enter it here.
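Before entering the URL and key in DRUID, you can verify that they work with a quick test request. The sketch below is a minimal example assuming an OpenAI-compatible REST API (the /chat/completions path and Bearer-token header follow OpenAI's convention; substitute your own values):

```python
import requests

# Values as configured on the generative endpoint (placeholders).
API_URL = "https://api.openai.com/v1"
API_KEY = "sk-...your-secret-key..."

# Minimal connectivity check: one small chat completion request.
response = requests.post(
    f"{API_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
response.raise_for_status()  # raises if the key or URL is rejected
print(response.json()["choices"][0]["message"]["content"])
```

A 401 response typically indicates an invalid key; a connection error usually points to a wrong URL.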
Model Name
- For Druid generative endpoints, select a model token from the list of activated models on your tenant.
- For Custom generative endpoints, specify the generative model to use. Example: For OpenAI, common models include gpt-4o-mini or gpt-4-turbo.
Different models offer varying capabilities, performance levels, and token limits.
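If you are unsure which model identifiers your key can access, OpenAI-compatible endpoints expose a model listing. The sketch below assumes OpenAI's GET /models convention:

```python
import requests

API_URL = "https://api.openai.com/v1"   # placeholder
API_KEY = "sk-...your-secret-key..."    # placeholder

# List the model IDs available to this key (OpenAI convention: GET /models).
resp = requests.get(
    f"{API_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
print([model["id"] for model in resp.json()["data"]])
```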
Max Tokens per Session Limit
Defines the maximum number of tokens a model can use in a single extraction session (i.e., per data extraction action). This prevents excessive token usage and optimizes performance.
Default values:
- For bots created in DRUID 7.3 and later, the default is 400,000 tokens.
- For bots created in earlier versions, the parameter defaults to null.
Best practice: If your chatbot processes large data volumes, adjust this setting to ensure efficient token allocation and cost management.
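To illustrate the accounting this parameter performs, here is a minimal sketch of a per-session token budget. It is not DRUID's implementation; the class and names are hypothetical:

```python
# Hypothetical per-session token budget: cumulative usage across one
# extraction session must stay under the configured limit.
MAX_TOKENS_PER_SESSION = 400_000  # DRUID 7.3+ default

class SessionBudget:
    def __init__(self, limit: int = MAX_TOKENS_PER_SESSION):
        self.limit = limit
        self.used = 0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Record one request's usage and fail once the session limit is hit."""
        self.used += prompt_tokens + completion_tokens
        if self.used > self.limit:
            raise RuntimeError(
                f"Session token limit exceeded: {self.used}/{self.limit}"
            )

budget = SessionBudget()
budget.charge(prompt_tokens=3_000, completion_tokens=500)  # 3,500 of 400,000 used
```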
Model Context Window Size
Defines the total number of tokens (input + output) the model can process in a single interaction. Each model has a fixed context window size, and if the total token count exceeds this limit, the response is truncated.
Example: If the context window is 4096 tokens, and the input is 3000 tokens, the model can generate up to 1096 tokens before reaching the limit.
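The arithmetic behind that example is simply the context window minus the input size:

```python
# Output space left = context window - input tokens (example from above).
context_window = 4096
input_tokens = 3000
max_output_tokens = context_window - input_tokens
print(max_output_tokens)  # 1096
```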
Max Tokens (per request)
Set the maximum number of tokens that can be used in a single request. This caps how much text the AI model can process or generate at once.
Model-specific behavior: Some models have predefined maximum token limits. For instance, gpt-4 supports up to 8,192 tokens per request, while smaller models may have lower limits.
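For reference, OpenAI-style APIs express this per-request cap as the max_tokens field of the request payload (in OpenAI's own semantics it bounds the generated output). A hypothetical payload:

```python
# Hypothetical OpenAI-style payload; max_tokens caps token usage for this
# single request.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Summarize this document."}],
    "max_tokens": 1024,
}
```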
Disable SSL Validation
This parameter controls whether SSL certificate validation is enforced when connecting to the generative API endpoint.
- Enabled: The system bypasses SSL certificate validation, allowing connections even if the certificate is self-signed, expired, or untrusted.
- Disabled: The system enforces SSL validation, ensuring secure communication by verifying the server's certificate.
Use Case: Enable this option only if your API endpoint uses a self-signed certificate or if SSL verification issues prevent a successful connection. Keeping SSL validation enabled enhances security by preventing potential man-in-the-middle attacks.
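The equivalent switch in Python's requests library is the verify flag, shown below against a hypothetical self-signed endpoint:

```python
import requests

API_URL = "https://self-signed.example.com/v1/chat/completions"  # hypothetical
payload = {"model": "gpt-4o-mini",
           "messages": [{"role": "user", "content": "ping"}]}

# verify=False mirrors "Disable SSL Validation: Enabled": certificate checks
# are skipped. Use only for self-signed or test endpoints.
insecure = requests.post(API_URL, json=payload, verify=False, timeout=30)

# verify=True (the library default) mirrors "Disabled": the server
# certificate must validate, protecting against man-in-the-middle attacks.
secure = requests.post(API_URL, json=payload, verify=True, timeout=30)
```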
Optimization tips
- Monitor token usage: Review logs to see how often users hit the limits, and adjust if needed (see the usage-logging sketch after this list).
- Start conservatively: Use lower limits initially and increase based on real-world demand.
- Balance cost vs. performance: If costs are too high, lower Max Tokens per Session Limit or reduce Model Context Window Size.
- Fine-tune for efficiency: Shorten responses where possible and use summarization techniques to keep output concise.
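As a starting point for the monitoring tip above, OpenAI-style responses report token consumption in a usage object. The helper below is a hypothetical sketch, not a DRUID feature:

```python
import requests

def log_usage(api_url: str, api_key: str, payload: dict) -> dict:
    """Send one request and log its token usage (OpenAI-style 'usage' field)."""
    resp = requests.post(
        f"{api_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    usage = resp.json()["usage"]
    print(f"prompt={usage['prompt_tokens']} "
          f"completion={usage['completion_tokens']} "
          f"total={usage['total_tokens']}")
    return usage
```

Logging these counts over time shows whether sessions approach the configured limits and whether the limits can be lowered to save cost.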