Inference API
The Inference API lets you interact with AI models through two main endpoints: /completion and /chat. This unified interface simplifies integration with multiple AI providers while maintaining consistent security and monitoring.
Completion Endpoint
Get a text completion from the AI model. Use this for generating content or completing partial text.
Request Headers
- Name
x-connection-id
- Type
- string
- Description
Your connection identifier. You can find it in your dashboard under the Connections section.
- Name
x-api-key
- Type
- string
- Description
Your API key for authentication. You can generate one from your dashboard under the API Keys section.
- Name
traceparent
- Type
- string
- Description
OpenTelemetry trace parent for distributed tracing
- Name
tracestate
- Type
- string
- Description
OpenTelemetry trace state information
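The traceparent header follows the W3C Trace Context format that OpenTelemetry uses: a version, a trace ID, a parent span ID, and trace flags, joined by hyphens. As a minimal sketch, assuming you are not already running an OpenTelemetry SDK that injects the header for you, a value can be generated by hand. The names make_traceparent and build_headers are hypothetical helpers for illustration, not part of the API.

```python
import secrets
from typing import Dict, Optional


def make_traceparent(sampled: bool = True) -> str:
    """Return a W3C Trace Context traceparent value with fresh random IDs."""
    trace_id = secrets.token_hex(16)   # 16 random bytes -> 32 hex characters
    parent_id = secrets.token_hex(8)   # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"  # 01 marks the trace as sampled
    return f"00-{trace_id}-{parent_id}-{flags}"


def build_headers(connection_id: str, api_key: str,
                  traceparent: Optional[str] = None) -> Dict[str, str]:
    """Assemble the request headers listed above (hypothetical helper)."""
    headers = {
        "x-connection-id": connection_id,
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
    # The tracing header is only added when the caller supplies one.
    if traceparent is not None:
        headers["traceparent"] = traceparent
    return headers
```

If your service already propagates trace context, pass the incoming traceparent value through instead of generating a new one, so the call stays part of the same trace.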
Request Body
- Name
type
- Type
- string
- Description
Always set to "Inference"
- Name
version
- Type
- string
- Description
API version (currently "2.0")
- Name
model
- Type
- string
- Description
The ID of the AI model to use for inference
- Name
messages
- Type
- array
- Description
Array of messages in the conversation. Each message has:
- role: "user" | "assistant" | "system"
- content: array of content objects with type and text
- Name
parameters
- Type
- object
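- Description
Object that groups request settings such as end_user_id, temperature, and max_tokens (see the request example).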
- Name
end_user_id
- Type
- string
- Description
Unique identifier for the end user
- Name
temperature
- Type
- number
- Description
Sampling temperature (default: 0.7)
- Name
max_tokens
- Type
- integer
- Description
Maximum number of tokens to generate
- Name
top_p
- Type
- number
- Description
Nucleus sampling parameter (default: 0.9)
- Name
frequency_penalty
- Type
- number
- Description
Frequency penalty (default: 0.0)
- Name
presence_penalty
- Type
- number
- Description
Presence penalty (default: 0.0)
- Name
stream
- Type
- boolean
- Description
Whether to stream the response (default: false)
- Name
include_usage
- Type
- boolean
- Description
Whether to include usage information (default: true)
- Name
json_schema
- Type
- object
- Description
JSON schema for structured output
- Name
stop
- Type
- string
- Description
Stop sequence for generation
- Name
session_id
- Type
- string
- Description
Unique identifier for this chat session
- Name
tools
- Type
- array
- Description
Array of tools available to the model
- Name
auto_continuation
- Type
- boolean
- Description
Whether to enable auto continuation (default: false)
When enabled, UsageGuard automatically sends a continuation call to the model that includes the tool result.
- Name
agent_parameters
- Type
- object
- Description
Additional parameters for agent behavior
Request
curl -X POST https://api.usageguard.com/v1/inference/completion \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Inference",
    "version": "2.0",
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "The capital of France is"
          }
        ]
      }
    ],
    "parameters": {
      "end_user_id": "user_123",
      "temperature": 0.7,
      "max_tokens": 50,
      "top_p": 0.9,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0,
      "stream": false,
      "include_usage": true
    },
    "session_id": "comp_abc123",
    "auto_continuation": false
  }'
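For readers calling the endpoint from code, the curl request above might be expressed in Python roughly as follows. This is a sketch using only the standard library; build_completion_request is a hypothetical helper, and its default values simply mirror the example. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.usageguard.com/v1/inference"


def build_completion_request(connection_id: str, api_key: str, prompt: str,
                             model: str = "gpt-4",
                             end_user_id: str = "user_123",
                             max_tokens: int = 50) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /completion endpoint."""
    body = {
        "type": "Inference",
        "version": "2.0",
        "model": model,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
        "parameters": {
            "end_user_id": end_user_id,
            "temperature": 0.7,
            "max_tokens": max_tokens,
            "stream": False,
            "include_usage": True,
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/completion",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-connection-id": connection_id,
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To send it, pass the result to urllib.request.urlopen(request) and read
# the JSON body from the returned response object.
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call is made.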
Response
{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": " Paris, a city known for its rich history, culture, and iconic landmarks like the Eiffel Tower."
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 15,
    "total_tokens": 20
  }
}
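Because the generated text is nested inside choices, message, and content, a small accessor can save repetition. The sketch below assumes responses keep the shape shown above; extract_text is a hypothetical helper, and it reads only the first choice.

```python
from typing import Any, Dict


def extract_text(response: Dict[str, Any]) -> str:
    """Join the text parts of the first choice in a parsed response body."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    content = choices[0].get("message", {}).get("content", [])
    # Keep only content objects of type "text" and concatenate them in order.
    return "".join(
        part.get("text", "") for part in content if part.get("type") == "text"
    )


sample = {
    "id": "resp_xyz789",
    "model": "gpt-4",
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": [{"type": "text", "text": " Paris"}],
            }
        }
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 15, "total_tokens": 20},
}
print(repr(extract_text(sample)))
```

Returning an empty string for a response without choices is a design choice made for this sketch; raising an error may suit stricter callers better.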
400: Bad Request
{
  "error": "Bad Request",
  "message": "Invalid request. Verify your model ID, prompt format, and parameters."
}
Chat Endpoint
Send a chat request to the AI model. Use this for conversational interactions where context is important.
Important: The chat endpoint requires both end_user_id (inside the parameters object) and session_id to function properly. Requests without these fields will be rejected.
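Since a chat request lacking either field is rejected, it can be worth checking the payload locally before sending it. The following is a minimal sketch of such a check; missing_chat_fields is a hypothetical helper, and treating empty strings as missing is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List


def missing_chat_fields(payload: Dict[str, Any]) -> List[str]:
    """Return the required chat fields that are absent or empty in payload."""
    missing = []
    parameters = payload.get("parameters") or {}
    if not parameters.get("end_user_id"):
        missing.append("parameters.end_user_id")
    if not payload.get("session_id"):
        missing.append("session_id")
    return missing


payload = {
    "type": "Inference",
    "version": "2.0",
    "model": "gpt-4",
    "messages": [],
    "parameters": {"temperature": 0.7},
}
print(missing_chat_fields(payload))
```

A caller might raise an error, or fill in the identifiers, whenever the returned list is not empty.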
Request Headers
- Name
x-connection-id
- Type
- string
- Description
Your connection identifier. You can find it in your dashboard under the Connections section.
- Name
x-api-key
- Type
- string
- Description
Your API key for authentication. You can generate one from your dashboard under the API Keys section.
- Name
traceparent
- Type
- string
- Description
OpenTelemetry trace parent for distributed tracing
- Name
tracestate
- Type
- string
- Description
OpenTelemetry trace state information
Request Body
- Name
type
- Type
- string
- Description
Always set to "Inference"
- Name
version
- Type
- string
- Description
API version (currently "2.0")
- Name
model
- Type
- string
- Description
The ID of the AI model to use for inference
- Name
messages
- Type
- array
- Description
Array of messages in the conversation. Each message has:
- role: "user" | "assistant" | "system"
- content: array of content objects with type and text
- Name
parameters
- Type
- object
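- Description
Object that groups request settings such as end_user_id, temperature, and max_tokens (see the request example).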
- Name
end_user_id
- Type
- string
- Description
Unique identifier for the end user
- Name
temperature
- Type
- number
- Description
Sampling temperature (default: 0.7)
- Name
max_tokens
- Type
- integer
- Description
Maximum number of tokens to generate
- Name
top_p
- Type
- number
- Description
Nucleus sampling parameter (default: 0.9)
- Name
frequency_penalty
- Type
- number
- Description
Frequency penalty (default: 0.0)
- Name
presence_penalty
- Type
- number
- Description
Presence penalty (default: 0.0)
- Name
stream
- Type
- boolean
- Description
Whether to stream the response (default: false)
- Name
include_usage
- Type
- boolean
- Description
Whether to include usage information (default: true)
- Name
json_schema
- Type
- object
- Description
JSON schema for structured output
- Name
stop
- Type
- string
- Description
Stop sequence for generation
- Name
session_id
- Type
- string
- Description
Unique identifier for this chat session
- Name
tools
- Type
- array
- Description
Array of tools available to the model. Each tool has:
- Name
name
- Type
- string
- Description
The name of the tool
- Name
description
- Type
- string
- Description
A description of what the tool does
- Name
input_schema
- Type
- object
- Description
JSON schema defining the input parameters for the tool
- Name
auto_continuation
- Type
- boolean
- Description
Whether to enable auto continuation (default: false)
When enabled, UsageGuard automatically sends a continuation call to the model that includes the tool result.
- Name
agent_parameters
- Type
- object
- Description
Additional parameters for agent behavior
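Each entry in tools carries a name, a description, and an input_schema written as JSON Schema. As an illustration only, a small builder can keep these definitions consistent; make_tool is a hypothetical helper, and the weather tool mirrors the one used in the request example.

```python
from typing import Any, Dict, List, Optional


def make_tool(name: str, description: str,
              properties: Dict[str, Any],
              required: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build one entry for the tools array with an object-type input schema."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": list(required or []),
        },
    }


weather_tool = make_tool(
    name="get_weather",
    description="Get the current weather for a location",
    properties={
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        }
    },
    required=["location"],
)
```

The resulting dictionary can be placed directly in the tools array of the request body.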
Request
curl -X POST https://api.usageguard.com/v1/inference/chat \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Inference",
    "version": "2.0",
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are a helpful assistant."
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is the capital of France?"
          }
        ]
      }
    ],
    "parameters": {
      "end_user_id": "user_123",
      "temperature": 0.7,
      "max_tokens": 100,
      "top_p": 0.9,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0,
      "stream": false,
      "include_usage": true
    },
    "session_id": "chat_abc123",
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ],
    "auto_continuation": false
  }'
Response
{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "The capital of France is Paris."
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
400: Bad Request
{
  "error": "Bad Request",
  "message": "Invalid request. Check your model ID, message format, and parameters."
}
401: Unauthorized
{
  "error": "Unauthorized",
  "message": "Invalid API key or connection ID."
}
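The error bodies above share an error and a message field, so client code can turn them into a single exception type. The sketch below assumes that any status of 400 or higher carries this shape, although only 400 and 401 are documented here; InferenceAPIError and raise_for_error are hypothetical names.

```python
import json


class InferenceAPIError(Exception):
    """Raised when the API answers with an error status (hypothetical type)."""

    def __init__(self, status: int, error: str, message: str) -> None:
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message


def raise_for_error(status: int, body: str) -> None:
    """Raise InferenceAPIError if status signals a failure; otherwise return."""
    if status < 400:
        return
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Fall back to the raw body when the expected fields are absent.
    raise InferenceAPIError(
        status,
        data.get("error", "Unknown"),
        data.get("message", body),
    )
```

A caller would typically treat 401 as a configuration problem (wrong API key or connection ID) and 400 as a problem with the request payload, based on the messages shown above.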