Inference API

The Inference API allows you to interact with AI models through two main endpoints: /completion and /chat This unified interface simplifies integration with multiple AI providers while maintaining consistent security and monitoring.

POST/v1/inference/completion

Completion Endpoint

Get a text completion from the AI model. Use this for generating content or completing partial text.

Request Headers

Name
x-connection-id
Type
string
Description
Your connection identifier. You can find this in your dashboard under Connections section.
Name
x-api-key
Type
string
Description
Your API key for authentication. You can generate this from your dashboard under API Keys section.
Name
traceparent
Type
string
Description
OpenTelemetry trace parent for distributed tracing
Name
tracestate
Type
string
Description
OpenTelemetry trace state information

Request Body

Name
type
Type
string
Description
Always set to "Inference"
Name
version
Type
string
Description
API version (currently "2.0")
Name
model
Type
string
Description
The ID of the AI model to use for inference
Name
messages
Type
array
Description
Array of messages in the conversation. Each message has:
- role: "user" | "assistant" | "system"
- content: array of content objects with type and text
Name
parameters
Type
object
Description
Name
end_user_id
Type
string
Description
Unique identifier for the end user
Name
temperature
Type
number
Description
Sampling temperature (default: 0.7)
Name
max_tokens
Type
integer
Description
Maximum number of tokens to generate
Name
top_p
Type
number
Description
Nucleus sampling parameter (default: 0.9)
Name
frequency_penalty
Type
number
Description
Frequency penalty (default: 0.0)
Name
presence_penalty
Type
number
Description
Presence penalty (default: 0.0)
Name
stream
Type
boolean
Description
Whether to stream the response (default: false)
Name
include_usage
Type
boolean
Description
Whether to include usage information (default: true)
Name
json_schema
Type
object
Description
JSON schema for structured output
Name
stop
Type
string
Description
Stop sequence for generation
Name
session_id
Type
string
Description
Unique identifier for this chat session
Name
tools
Type
array
Description
Array of tools available to the model
Name
auto_continuation
Type
boolean
Description
Whether to enable auto continuation (default: false)
This flag enables UsageGuard to append continuation call to the model with tool result.
Name
agent_parameters
Type
object
Description
Additional parameters for agent behavior

Request

POST

/v1/inference/completion

curl -X POST https://api.usageguard.com/v1/inference/completion \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Inference",
    "version": "2.0",
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "The capital of France is"
          }
        ]
      }
    ],
    "parameters": {
      "end_user_id": "user_123",
      "temperature": 0.7,
      "max_tokens": 50,
      "top_p": 0.9,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0,
      "stream": false,
      "include_usage": true
    },
    "session_id": "comp_abc123",
    "auto_continuation": false
  }'

Response

{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": " Paris, a city known for its rich history, culture, and iconic landmarks like the Eiffel Tower."
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 15,
    "total_tokens": 20
  }
}

400: Bad Request

{
  "error": "Bad Request",
  "message": "Invalid request. Verify your model ID, prompt format, and parameters."
}

POST/v1/inference/chat

Chat Endpoint

Send a chat request to the AI model. Use this for conversational interactions where context is important.

Important: The chat endpoint requires both end_user_id in the parameters and session_id to function properly. Requests without these fields will be rejected.

Request Headers

Name
x-connection-id
Type
string
Description
Your connection identifier. You can find this in your dashboard under Connections section.
Name
x-api-key
Type
string
Description
Your API key for authentication. You can generate this from your dashboard under API Keys section.
Name
traceparent
Type
string
Description
OpenTelemetry trace parent for distributed tracing
Name
tracestate
Type
string
Description
OpenTelemetry trace state information

Request Body

Name
type
Type
string
Description
Always set to "Inference"
Name
version
Type
string
Description
API version (currently "2.0")
Name
model
Type
string
Description
The ID of the AI model to use for inference
Name
messages
Type
array
Description
Array of messages in the conversation. Each message has:
- role: "user" | "assistant" | "system"
- content: array of content objects with type and text
Name
parameters
Type
object
Description
Name
end_user_id
Type
string
Description
Unique identifier for the end user
Name
temperature
Type
number
Description
Sampling temperature (default: 0.7)
Name
max_tokens
Type
integer
Description
Maximum number of tokens to generate
Name
top_p
Type
number
Description
Nucleus sampling parameter (default: 0.9)
Name
frequency_penalty
Type
number
Description
Frequency penalty (default: 0.0)
Name
presence_penalty
Type
number
Description
Presence penalty (default: 0.0)
Name
stream
Type
boolean
Description
Whether to stream the response (default: false)
Name
include_usage
Type
boolean
Description
Whether to include usage information (default: true)
Name
json_schema
Type
object
Description
JSON schema for structured output
Name
stop
Type
string
Description
Stop sequence for generation
Name
session_id
Type
string
Description
Unique identifier for this chat session
Name
tools
Type
array
Description
Array of tools available to the model. Each tool has:
Name
name
Type
string
Description
The name of the tool
Name
description
Type
string
Description
A description of what the tool does
Name
input_schema
Type
object
Description
JSON schema defining the input parameters for the tool
Name
auto_continuation
Type
boolean
Description
Whether to enable auto continuation (default: false)
This flag enables UsageGuard to append continuation call to the model with tool result.
Name
agent_parameters
Type
object
Description
Additional parameters for agent behavior

Request

POST

/v1/inference/chat

curl -X POST https://api.usageguard.com/v1/inference/chat \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Inference",
    "version": "2.0",
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are a helpful assistant."
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is the capital of France?"
          }
        ]
      }
    ],
    "parameters": {
      "end_user_id": "user_123",
      "temperature": 0.7,
      "max_tokens": 100,
      "top_p": 0.9,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0,
      "stream": false,
      "include_usage": true
    },
    "session_id": "chat_abc123",
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ],
    "auto_continuation": false
  }'

Response

{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "The capital of France is Paris."
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}

400: Bad Request

{
  "error": "Bad Request",
  "message": "Invalid request. Check your model ID, message format, and parameters."
}

401: Unauthorized

{
  "error": "Unauthorized",
  "message": "Invalid API key or connection ID."
}

Tool Requests and Responses

When working with tools in the chat or completion endpoints, the conversation follows a specific pattern for tool calls and responses. Here's how to handle tool interactions:

Tool Call Pattern

The interaction typically follows these steps:

Assistant Tool Call: The model requests to use a tool
Tool Result: The system executes the tool and returns results
Assistant Continuation: The model continues the conversation using the tool results

Here's an example of the complete flow:

// Step 1: Assistant makes a tool call
{
  "role": "assistant",
  "content": "I'll help you find some great restaurants!",
  "tool_calls": [
    {
      "id": "tool_call_001",
      "type": "function",
      "function": {
        "name": "sys_search_documents",
        "arguments": "{\"search_phrases\": [\"Italian restaurants\"], \"limit\": 3}"
      }
    }
  ]
}

// Step 2: System returns tool result
{
  "role": "tool",
  "tool_call_id": "tool_call_001",
  "content": "{ \"SearchResults\": [ { \"restaurant_name\": \"Pasta Paradise\", \"cuisine\": \"Italian\", \"rating\": 4.5, \"address\": \"123 Main St\" } ] }"
}

// Step 3: Assistant continues with response
{
  "role": "assistant",
  "content": "I found a great Italian restaurant for you: 'Pasta Paradise' with a 4.5 rating, located at 123 Main St."
}

Complete Request Example

Here's how the entire request would look when making a tool-enabled chat request:

{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Can you recommend some Italian restaurants?"
    },
    {
      "role": "assistant",
      "content": "I'll help you find some great restaurants!",
      "tool_calls": [
        {
          "id": "tool_call_001",
          "type": "function",
          "function": {
            "name": "sys_search_documents",
            "arguments": "{\"search_phrases\": [\"Italian restaurants\"], \"limit\": 3}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "tool_call_001",
      "content": "{ \"SearchResults\": [ { \"restaurant_name\": \"Pasta Paradise\", \"cuisine\": \"Italian\", \"rating\": 4.5, \"address\": \"123 Main St\" } ] }"
    }
  ],
  "parameters": {
    "end_user_id": "user_123",
    "temperature": 0.7,
    "max_tokens": 100,
    "top_p": 0.9,
    "stream": false,
    "include_usage": true
  },
  "session_id": "chat_abc123",
  "tools": [
    {
      "name": "sys_search_documents",
      "description": "Search for restaurants and dining options in the system",
      "input_schema": {
        "type": "object",
        "properties": {
          "search_phrases": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "Array of search phrases"
          },
          "limit": {
            "type": "integer",
            "description": "Maximum number of results to return"
          }
        },
        "required": ["search_phrases"]
      }
    }
  ],
  "auto_continuation": false
}

Important Notes

Each tool call must have a unique tool_call_id
The tool response must reference the same tool_call_id as the original call
The auto_continuation parameter in your request can be set to true to automatically handle the continuation step if you are using built-in tools (e.g. sys_search_documents) or running our expirmental external tools support.
Tool results should be valid JSON strings that can be parsed by the model

Tip: When using tools, make sure to include the tools array in your request body with proper schema definitions for each tool.