Inference API

The Inference API allows you to interact with AI models through two main endpoints: /completion and /chat. This unified interface simplifies integration with multiple AI providers while maintaining consistent security and monitoring.

POST /v1/inference/completion

Completion Endpoint

Get a text completion from the AI model. Use this for generating content or completing partial text.

Request Headers

  • x-connection-id (string): Your connection identifier. You can find it in your dashboard under the Connections section.

  • x-api-key (string): Your API key for authentication. You can generate one from your dashboard under the API Keys section.

  • traceparent (string): OpenTelemetry trace parent for distributed tracing.

  • tracestate (string): OpenTelemetry trace state information.
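Both tracing headers follow the W3C Trace Context format. A minimal sketch of generating a traceparent value in Python (the helper name is illustrative):

```python
import secrets

def make_traceparent():
    """Build a W3C Trace Context traceparent: version-trace_id-parent_id-flags."""
    trace_id = secrets.token_hex(16)   # 32 hex chars, must not be all zeros
    parent_id = secrets.token_hex(8)   # 16 hex chars, must not be all zeros
    return f"00-{trace_id}-{parent_id}-01"  # 01 = sampled flag
```

tracestate carries vendor-specific key=value pairs alongside traceparent and can be omitted if you have none to propagate.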

Request Body

  • type (string): Always set to "Inference".

  • version (string): API version (currently "2.0").

  • model (string): The ID of the AI model to use for inference.

  • messages (array): Array of messages in the conversation. Each message has:

    • role: "user" | "assistant" | "system"
    • content: array of content objects with type and text

  • parameters (object):

    • end_user_id (string): Unique identifier for the end user.
    • temperature (number): Sampling temperature (default: 0.7).
    • max_tokens (integer): Maximum number of tokens to generate.
    • top_p (number): Nucleus sampling parameter (default: 0.9).
    • frequency_penalty (number): Frequency penalty (default: 0.0).
    • presence_penalty (number): Presence penalty (default: 0.0).
    • stream (boolean): Whether to stream the response (default: false).
    • include_usage (boolean): Whether to include usage information (default: true).
    • json_schema (object): JSON schema for structured output.
    • stop (string): Stop sequence for generation.

  • session_id (string): Unique identifier for this chat session.

  • tools (array): Array of tools available to the model.

  • auto_continuation (boolean): Whether to enable auto-continuation (default: false). When enabled, UsageGuard appends a continuation call to the model with the tool result.

  • agent_parameters (object): Additional parameters for agent behavior.
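The json_schema parameter takes a standard JSON Schema object. A hedged sketch of using it for structured output (the schema contents are hypothetical, and whether the API expects a bare schema as shown or a named wrapper object is an assumption):

```python
# Hypothetical schema: ask the model to answer as a structured object.
capital_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "country": {"type": "string"},
    },
    "required": ["city", "country"],
}

parameters = {
    "end_user_id": "user_123",
    "temperature": 0.0,  # low temperature suits structured output
    "json_schema": capital_schema,
}
```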

Request

POST /v1/inference/completion
curl -X POST https://api.usageguard.com/v1/inference/completion \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Inference",
    "version": "2.0",
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "The capital of France is"
          }
        ]
      }
    ],
    "parameters": {
      "end_user_id": "user_123",
      "temperature": 0.7,
      "max_tokens": 50,
      "top_p": 0.9,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0,
      "stream": false,
      "include_usage": true
    },
    "session_id": "comp_abc123",
    "auto_continuation": false
  }'
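The same request can be assembled in Python with only the standard library; the endpoint, headers, and payload mirror the curl example above (the helper name is illustrative):

```python
import json
from urllib import request

API_BASE = "https://api.usageguard.com"

def build_completion_request(prompt, connection_id, api_key, model="gpt-4"):
    """Assemble the completion request shown in the curl example."""
    payload = {
        "type": "Inference",
        "version": "2.0",
        "model": model,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
        "parameters": {"max_tokens": 50, "stream": False},
        "session_id": "comp_abc123",
    }
    headers = {
        "x-connection-id": connection_id,
        "x-api-key": api_key,
        "Content-Type": "application/json",
    }
    return request.Request(
        f"{API_BASE}/v1/inference/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

# With valid credentials, send it like this:
# with request.urlopen(build_completion_request(
#         "The capital of France is", CONNECTION_ID, API_KEY)) as resp:
#     body = json.load(resp)
```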

Response

{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": " Paris, a city known for its rich history, culture, and iconic landmarks like the Eiffel Tower."
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 15,
    "total_tokens": 20
  }
}

400: Bad Request

{
  "error": "Bad Request",
  "message": "Invalid request. Verify your model ID, prompt format, and parameters."
}

POST /v1/inference/chat

Chat Endpoint

Send a chat request to the AI model. Use this for conversational interactions where context is important.

Request Headers

  • x-connection-id (string): Your connection identifier. You can find it in your dashboard under the Connections section.

  • x-api-key (string): Your API key for authentication. You can generate one from your dashboard under the API Keys section.

  • traceparent (string): OpenTelemetry trace parent for distributed tracing.

  • tracestate (string): OpenTelemetry trace state information.

Request Body

  • type (string): Always set to "Inference".

  • version (string): API version (currently "2.0").

  • model (string): The ID of the AI model to use for inference.

  • messages (array): Array of messages in the conversation. Each message has:

    • role: "user" | "assistant" | "system"
    • content: array of content objects with type and text

  • parameters (object):

    • end_user_id (string): Unique identifier for the end user.
    • temperature (number): Sampling temperature (default: 0.7).
    • max_tokens (integer): Maximum number of tokens to generate.
    • top_p (number): Nucleus sampling parameter (default: 0.9).
    • frequency_penalty (number): Frequency penalty (default: 0.0).
    • presence_penalty (number): Presence penalty (default: 0.0).
    • stream (boolean): Whether to stream the response (default: false).
    • include_usage (boolean): Whether to include usage information (default: true).
    • json_schema (object): JSON schema for structured output.
    • stop (string): Stop sequence for generation.

  • session_id (string): Unique identifier for this chat session.

  • tools (array): Array of tools available to the model. Each tool has:

    • name (string): The name of the tool.
    • description (string): A description of what the tool does.
    • input_schema (object): JSON schema defining the input parameters for the tool.

  • auto_continuation (boolean): Whether to enable auto-continuation (default: false). When enabled, UsageGuard appends a continuation call to the model with the tool result.

  • agent_parameters (object): Additional parameters for agent behavior.
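A tool entry as described above is plain JSON; this sketch reproduces the get_weather tool used in the example request for this endpoint:

```python
# Mirrors the get_weather tool from the chat example in this section.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            }
        },
        "required": ["location"],
    },
}

tools = [get_weather_tool]  # passed as the request's "tools" array
```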

Request

POST /v1/inference/chat
curl -X POST https://api.usageguard.com/v1/inference/chat \
  -H "x-connection-id: {connection_id}" \
  -H "x-api-key: {api_key}" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "Inference",
    "version": "2.0",
    "model": "gpt-4",
    "messages": [
      {
        "role": "system",
        "content": [
          {
            "type": "text",
            "text": "You are a helpful assistant."
          }
        ]
      },
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What is the capital of France?"
          }
        ]
      }
    ],
    "parameters": {
      "end_user_id": "user_123",
      "temperature": 0.7,
      "max_tokens": 100,
      "top_p": 0.9,
      "frequency_penalty": 0.0,
      "presence_penalty": 0.0,
      "stream": false,
      "include_usage": true
    },
    "session_id": "chat_abc123",
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "input_schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            }
          },
          "required": ["location"]
        }
      }
    ],
    "auto_continuation": false
  }'

Response

{
  "id": "resp_xyz789",
  "model": "gpt-4",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "The capital of France is Paris."
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}
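Both endpoints return the same response shape. A small helper for pulling out the assistant text and token usage, based on the example responses in this section (helper names are illustrative):

```python
def extract_text(response: dict) -> str:
    """Concatenate the text parts of the first choice's message."""
    parts = response["choices"][0]["message"]["content"]
    return "".join(p["text"] for p in parts if p["type"] == "text")

def total_tokens(response: dict) -> int:
    """Read total token usage; present when include_usage is enabled."""
    return response.get("usage", {}).get("total_tokens", 0)
```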

400: Bad Request

{
  "error": "Bad Request",
  "message": "Invalid request. Check your model ID, message format, and parameters."
}

401: Unauthorized

{
  "error": "Unauthorized",
  "message": "Invalid API key or connection ID."
}
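A minimal sketch of turning the documented 400/401 responses into Python exceptions (the exception mapping is an illustrative choice, not part of the API):

```python
def raise_for_error(status: int, body: dict) -> None:
    """Translate the documented 400/401 error bodies into exceptions."""
    if status == 400:
        raise ValueError(body.get("message", "Bad Request"))
    if status == 401:
        raise PermissionError(body.get("message", "Unauthorized"))
```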
