Inference API

The Inference API lets you interact with AI models through three endpoints: chat, completion, and prompt-template execution. This unified interface simplifies integration with multiple AI providers while maintaining consistent security and monitoring.

Required Headers

All API endpoints require these headers:

  • x-connection-id (string): Your connection identifier. You can find this in your dashboard under the Connections section.

  • x-api-key (string): Your API key for authentication. You can generate this from your dashboard under the API Keys section.

  • traceparent (string): OpenTelemetry trace parent for distributed tracing.

  • tracestate (string): OpenTelemetry trace state information.
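
These headers can be assembled programmatically. A minimal sketch in Python; the connection ID and API key values are placeholders, and the traceparent is generated here following the W3C Trace Context format (version-traceid-spanid-flags), which is an assumption about what the tracing backend expects:

```python
import secrets

def build_headers(connection_id: str, api_key: str) -> dict:
    """Build the required headers for Inference API calls."""
    # W3C Trace Context format: version-traceid-spanid-flags
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return {
        "x-connection-id": connection_id,
        "x-api-key": api_key,
        "traceparent": f"00-{trace_id}-{span_id}-01",
        "tracestate": "",  # vendor-specific tracing state, empty placeholder
        "Content-Type": "application/json",
    }
```

In practice you would propagate an existing traceparent from your own tracing context rather than generating a fresh one per request.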

POST /v1/inference/chat

Chat Endpoint

Send a chat request to the AI model. Use this for conversational interactions where context is important.

Request Body

  • modelId (string): The ID of the AI model to use for inference.

  • messages (array): Array of messages in the conversation. Each message has:

    • role: "user" | "assistant" | "system"
    • content: string

  • parameters (object): Inference settings for this request. Contains:

    • endUserId (string): Unique identifier for the end user of this chat session.
    • temperature (number): Sampling temperature; higher values produce more varied output.
    • maxTokens (integer): Maximum number of tokens to generate.
    • stream (boolean): Whether to stream the response.
    • jsonSchema (object): JSON schema for structured output.

  • sessionId (string): Unique identifier for this chat session.
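
Putting the fields together, a chat request body can be built like this (a sketch; the model ID and identifiers are placeholder values mirroring the curl example below):

```python
import json

def build_chat_request(model_id, messages, end_user_id, session_id,
                       temperature=0.7, max_tokens=100):
    """Assemble a chat request body matching the schema above."""
    return {
        "modelId": model_id,
        "messages": messages,  # list of {"role": ..., "content": ...}
        "parameters": {
            "endUserId": end_user_id,
            "temperature": temperature,
            "maxTokens": max_tokens,
        },
        "sessionId": session_id,
    }

body = build_chat_request(
    "gpt-4",
    [{"role": "user", "content": "What is the capital of France?"}],
    end_user_id="user_123",
    session_id="chat_abc123",
)
payload = json.dumps(body)  # send as the POST body
```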

Request

POST /v1/inference/chat
curl -X POST https://api.usageguard.com/v1/inference/chat \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "traceparent: {traceparent}" \
  -H "tracestate: {tracestate}" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ],
    "parameters": {
      "endUserId": "user_123",
      "temperature": 0.7,
      "maxTokens": 100
    },
    "sessionId": "chat_abc123"
  }'

Response

{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      }
    }
  ]
}
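
Given a response shaped like the example above, the assistant's reply can be extracted as follows (a sketch, assuming the JSON has already been decoded into a dict):

```python
def extract_chat_reply(response: dict) -> str:
    """Return the assistant's message content from a chat response."""
    return response["choices"][0]["message"]["content"]

# Sample response from the documentation above
response = {
    "id": "resp_xyz789",
    "model": "gpt-4",
    "choices": [
        {"message": {"role": "assistant",
                     "content": "The capital of France is Paris."}}
    ],
}
```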

400: Bad Request

{
  "error": "Bad Request",
  "message": "Invalid request. Check your model ID, message format, and parameters."
}

401: Unauthorized

{
  "error": "Unauthorized",
  "message": "Invalid API key or connection ID."
}
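
Error responses share the error/message shape shown above, so client code can key off the HTTP status code. A minimal sketch; the exception type here is illustrative, not part of the API:

```python
class InferenceError(Exception):
    """Raised when the Inference API returns a non-2xx status."""
    def __init__(self, status: int, body: dict):
        self.status = status
        self.body = body
        msg = body.get("message", body.get("error", ""))
        super().__init__(f"{status}: {msg}")

def check_response(status: int, body: dict) -> dict:
    """Pass 2xx responses through; raise InferenceError otherwise."""
    if 200 <= status < 300:
        return body
    raise InferenceError(status, body)
```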
POST /v1/inference/completion

Completion Endpoint

Get a text completion from the AI model. Use this for generating content or completing partial text.

Request Body

  • modelId (string): The ID of the AI model to use for inference.

  • prompt (string): The text prompt to complete.

  • parameters (object): Inference settings for this request. Contains:

    • endUserId (string): Unique identifier for the end user.
    • temperature (number): Sampling temperature; higher values produce more varied output.
    • maxTokens (integer): Maximum number of tokens to generate.
    • stream (boolean): Whether to stream the response.
    • jsonSchema (object): JSON schema for structured output.

Request

POST /v1/inference/completion
curl -X POST https://api.usageguard.com/v1/inference/completion \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "traceparent: {traceparent}" \
  -H "tracestate: {tracestate}" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "gpt-4",
    "prompt": "The capital of France is",
    "parameters": {
      "endUserId": "user_123",
      "temperature": 0.7,
      "maxTokens": 50
    }
  }'

Response

{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "text": " Paris, a city known for its rich history, culture, and iconic landmarks like the Eiffel Tower."
    }
  ]
}
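
Note the different response shape: completions return text directly on each choice rather than a message object. Extraction is a one-liner (a sketch, assuming decoded JSON):

```python
def extract_completion_text(response: dict) -> str:
    """Return the completed text from a completion response."""
    return response["choices"][0]["text"]

# Sample response from the documentation above (truncated)
response = {
    "id": "resp_xyz789",
    "model": "gpt-4",
    "choices": [{"text": " Paris, a city known for its rich history."}],
}
```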

400: Bad Request

{
  "error": "Bad Request",
  "message": "Invalid request. Verify your model ID, prompt format, and parameters."
}

POST /v1/inference/prompt

Prompt Endpoint

Execute a predefined prompt template. Use this for consistent, templated interactions with the AI model.

Request Body

  • modelId (string): The ID of the AI model to use.

  • promptId (string): ID of the prompt template to use.

  • variables (object): Variables to fill in the prompt template.

  • parameters (object): Inference settings for this request. Contains:

    • endUserId (string): Unique identifier for the end user.
    • temperature (number): Sampling temperature; higher values produce more varied output.
    • maxTokens (integer): Maximum number of tokens to generate.
    • stream (boolean): Whether to stream the response.

Request

POST /v1/inference/prompt
curl -X POST https://api.usageguard.com/v1/inference/prompt \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "traceparent: {traceparent}" \
  -H "tracestate: {tracestate}" \
  -H "Content-Type: application/json" \
  -d '{
    "modelId": "gpt-4",
    "promptId": "translate_text",
    "variables": {
      "text": "Hello world",
      "target_language": "French"
    },
    "parameters": {
      "endUserId": "user_123",
      "temperature": 0.7,
      "maxTokens": 50
    }
  }'

Response

{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "result": "Bonjour le monde"
}
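
A full request can be composed with the Python standard library alone. The sketch below builds the request object but does not send it; the credential values are placeholders, and the tracing headers are omitted here for brevity:

```python
import json
import urllib.request

def build_prompt_request(connection_id, api_key, body):
    """Construct (but do not send) a POST to /v1/inference/prompt."""
    return urllib.request.Request(
        "https://api.usageguard.com/v1/inference/prompt",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-connection-id": connection_id,
            "x-api-key": api_key,
            # add traceparent/tracestate here for distributed tracing
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_prompt_request("conn_123", "key_456", {
    "modelId": "gpt-4",
    "promptId": "translate_text",
    "variables": {"text": "Hello world", "target_language": "French"},
    "parameters": {"endUserId": "user_123", "temperature": 0.7,
                   "maxTokens": 50},
})
# To send: response = json.load(urllib.request.urlopen(req))
# then read response["result"]
```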

400: Bad Request

{
  "error": "Bad Request",
  "message": "Invalid request. Common issues include: invalid prompt ID, missing required variables, or invalid parameters."
}

403: Forbidden

{
  "error": "Forbidden",
  "message": "Insufficient permissions to use this prompt template."
}
