Inference API
The Inference API lets you interact with AI models through three endpoints: chat, completion, and prompt. This unified interface simplifies integration with multiple AI providers while maintaining consistent security and monitoring.
Required Headers
All API endpoints require these headers:
- `x-connection-id` (string): Your connection identifier. You can find this in your dashboard under the Connections section.
- `x-api-key` (string): Your API key for authentication. You can generate this from your dashboard under the API Keys section.
- `traceparent` (string): OpenTelemetry trace parent for distributed tracing.
- `tracestate` (string): OpenTelemetry trace state information.
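Since every endpoint takes the same four headers, they can be assembled once with a small helper and reused across calls. The Python sketch below is illustrative rather than an official SDK; the `traceparent` value shown follows the W3C Trace Context format.

```python
# Sketch: assemble the required UsageGuard headers once and reuse them
# across all Inference API calls. Not an official SDK helper.

def build_headers(connection_id: str, api_key: str,
                  traceparent: str, tracestate: str) -> dict:
    """Return the headers every Inference API endpoint expects."""
    return {
        "x-connection-id": connection_id,
        "x-api-key": api_key,
        # W3C Trace Context values for distributed tracing.
        "traceparent": traceparent,
        "tracestate": tracestate,
        "Content-Type": "application/json",
    }

# Example values are placeholders; use your own dashboard credentials.
headers = build_headers(
    "conn_123",
    "sk_abc",
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "ug=example",
)
```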
Chat Endpoint
Send a chat request to the AI model. Use this for conversational interactions where context is important.
Request Body
- `modelId` (string): The ID of the AI model to use for inference.
- `messages` (array): Array of messages in the conversation. Each message has:
  - `role`: `"user" | "assistant" | "system"`
  - `content`: string
- `parameters` (object): Inference parameters for this request:
  - `endUserId` (string): Unique identifier for the end user of this chat session.
  - `temperature` (number): Sampling temperature.
  - `maxTokens` (integer): Maximum number of tokens to generate.
  - `stream` (boolean): Whether to stream the response.
  - `jsonSchema` (object): JSON schema for structured output.
- `sessionId` (string): Unique identifier for this chat session.
Request
```shell
curl -X POST https://api.usageguard.com/v1/inference/chat \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "traceparent: {traceparent}" \
  -H "tracestate: {tracestate}" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "parameters": {
      "endUserId": "user_123",
      "temperature": 0.7,
      "maxTokens": 100
    },
    "sessionId": "chat_abc123"
  }'
```
Response
```json
{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      }
    }
  ]
}
```
400: Bad Request
```json
{
  "error": "Bad Request",
  "message": "Invalid request. Check your model ID, message format, and parameters."
}
```
401: Unauthorized
```json
{
  "error": "Unauthorized",
  "message": "Invalid API key or connection ID."
}
```
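The chat request above can also be built programmatically. This Python sketch only constructs the JSON body (send it with any HTTP client); the helper name is ours, not part of an official SDK.

```python
import json

# Sketch: build a request body for POST /v1/inference/chat.
# Field names follow the request-body documentation above.

def build_chat_request(model_id: str, messages: list,
                       end_user_id: str, session_id: str,
                       temperature: float = 0.7,
                       max_tokens: int = 100) -> dict:
    return {
        "modelId": model_id,
        "messages": messages,
        "parameters": {
            "endUserId": end_user_id,
            "temperature": temperature,
            "maxTokens": max_tokens,
        },
        "sessionId": session_id,
    }

body = build_chat_request(
    "gpt-4",
    [{"role": "user", "content": "What is the capital of France?"}],
    end_user_id="user_123",
    session_id="chat_abc123",
)
print(json.dumps(body))
```

Appending the assistant's reply and the user's next message to `messages` on each turn is what preserves conversational context between calls.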
Completion Endpoint
Get a text completion from the AI model. Use this for generating content or completing partial text.
Request Body
- `modelId` (string): The ID of the AI model to use for inference.
- `prompt` (string): The text prompt to complete.
- `parameters` (object): Inference parameters for this request:
  - `endUserId` (string): Unique identifier for the end user.
  - `temperature` (number): Sampling temperature.
  - `maxTokens` (integer): Maximum number of tokens to generate.
  - `stream` (boolean): Whether to stream the response.
  - `jsonSchema` (object): JSON schema for structured output.
Request
```shell
curl -X POST https://api.usageguard.com/v1/inference/completion \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "traceparent: {traceparent}" \
  -H "tracestate: {tracestate}" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "gpt-4",
    "prompt": "The capital of France is",
    "parameters": {
      "endUserId": "user_123",
      "temperature": 0.7,
      "maxTokens": 50
    }
  }'
```
Response
```json
{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "text": " Paris, a city known for its rich history, culture, and iconic landmarks like the Eiffel Tower."
    }
  ]
}
```
400: Bad Request
```json
{
  "error": "Bad Request",
  "message": "Invalid request. Verify your model ID, prompt format, and parameters."
}
```
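The `jsonSchema` parameter accepts a standard JSON Schema object. As an illustrative sketch (the schema contents below are our example, not prescribed by the API), a structured-output completion body might look like this:

```python
import json

# Sketch: a completion request that constrains the output to a JSON object
# via the jsonSchema parameter. The schema here is an illustrative example.
completion_body = {
    "modelId": "gpt-4",
    "prompt": "Extract the city and country from: 'Paris is in France.'",
    "parameters": {
        "endUserId": "user_123",
        "temperature": 0.0,   # low temperature for deterministic extraction
        "maxTokens": 50,
        "jsonSchema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string"},
            },
            "required": ["city", "country"],
        },
    },
}
print(json.dumps(completion_body, indent=2))
```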
Prompt Endpoint
Execute a predefined prompt template. Use this for consistent, templated interactions with the AI model.
Request Body
- `modelId` (string): The ID of the AI model to use.
- `promptId` (string): ID of the prompt template to use.
- `variables` (object): Variables to fill in the prompt template.
- `parameters` (object): Inference parameters for this request:
  - `endUserId` (string): Unique identifier for the end user.
  - `temperature` (number): Sampling temperature.
  - `maxTokens` (integer): Maximum number of tokens to generate.
  - `stream` (boolean): Whether to stream the response.
Request
```shell
curl -X POST https://api.usageguard.com/v1/inference/prompt \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "traceparent: {traceparent}" \
  -H "tracestate: {tracestate}" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "gpt-4",
    "promptId": "translate_text",
    "variables": {
      "text": "Hello world",
      "target_language": "French"
    },
    "parameters": {
      "endUserId": "user_123",
      "temperature": 0.7,
      "maxTokens": 50
    }
  }'
```
Response
```json
{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "result": "Bonjour le monde"
}
```
400: Bad Request
```json
{
  "error": "Bad Request",
  "message": "Invalid request. Common issues include: invalid prompt ID, missing required variables, or invalid parameters."
}
```
403: Forbidden
```json
{
  "error": "Forbidden",
  "message": "Insufficient permissions to use this prompt template."
}
```
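The template workflow above can be sketched in Python: fill the template's variables and build the request body. The helper name is ours (not an official SDK), and the `translate_text` prompt ID comes from the example request; in practice you would use a prompt ID defined in your own dashboard.

```python
import json

# Sketch: build a request body for POST /v1/inference/prompt.
# The variables dict must supply every variable the template declares.

def build_prompt_request(model_id: str, prompt_id: str,
                         variables: dict, end_user_id: str,
                         temperature: float = 0.7,
                         max_tokens: int = 50) -> dict:
    return {
        "modelId": model_id,
        "promptId": prompt_id,
        "variables": variables,
        "parameters": {
            "endUserId": end_user_id,
            "temperature": temperature,
            "maxTokens": max_tokens,
        },
    }

body = build_prompt_request(
    "gpt-4",
    "translate_text",
    {"text": "Hello world", "target_language": "French"},
    end_user_id="user_123",
)
print(json.dumps(body))
```

Because the template text lives server-side, only the variables change between calls, which keeps templated interactions consistent across your application.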