Proxy vs Direct API Calls

When integrating Large Language Models (LLMs) into your application, you have two main approaches: making direct API calls to the LLM provider or routing requests through a proxy service like UsageGuard. This page explores the benefits and trade-offs of each approach to help you make an informed decision for your project.

The Power of Proxy: UsageGuard's Advantages

Using UsageGuard as a proxy for your LLM API calls offers several significant benefits:

  1. Unified API: Access multiple LLM providers through a single, consistent interface.
  2. Enhanced Security: Built-in safeguards and content moderation protect your application and users.
  3. Compliance Support: Comprehensive logging and PII management aid in maintaining regulatory compliance.
  4. Cost Control: Set and enforce usage limits to manage expenses associated with LLM API usage.
  5. Flexibility: Easily switch between providers or models without changing your application code.
  6. Advanced Features: Leverage capabilities like request transformation, response processing, and more.

UsageGuard acts as intelligent middleware, adding value to your API calls without requiring significant changes to your existing code.
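
To make the unified interface concrete, here is a minimal sketch of a proxied chat request. The endpoint URL, header, and OpenAI-style response shape are assumptions for illustration, not UsageGuard's actual API; the point is that swapping the underlying provider or model becomes a parameter change rather than a code change.

    import os
    import requests

    # Hypothetical proxy endpoint and credentials -- placeholders, not the real UsageGuard API.
    PROXY_URL = "https://api.example-proxy.com/v1/chat/completions"
    API_KEY = os.environ["PROXY_API_KEY"]

    def ask_llm(prompt: str, model: str = "gpt-4o") -> str:
        """Send a chat request through the proxy.

        Switching providers (e.g. model="claude-3-5-sonnet") is a parameter
        change; the calling code stays the same.
        """
        response = requests.post(
            PROXY_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=30,
        )
        response.raise_for_status()
        # Assumes an OpenAI-style response schema.
        return response.json()["choices"][0]["message"]["content"]

    print(ask_llm("Summarize the benefits of a proxy in one sentence."))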

Performance Considerations

While the benefits of using a proxy are substantial, it's important to consider the potential impact on latency. Our benchmarks show:

Benchmark

First request:        1,643 ms
Requests:             18,694
Bad responses:        0
Mean latency:         80,288 µs (~80 ms)
Max latency:          2,516,643 µs (~2.5 s)
Requests/sec:         1,262
Requests/sec (max):   5,649

As the numbers show, the average latency introduced by UsageGuard is minimal, typically in the range of 50-100 ms. For most applications, this slight increase in latency is negligible compared to the added value and features provided by UsageGuard.

Note: The first request may have higher latency due to connection establishment and potential cold starts. Subsequent requests are significantly faster.
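
If you want to measure the overhead for your own workload, a rough sketch like the one below can help; the URL and header are placeholders, and a warm-up request is sent first so connection establishment and cold starts do not skew the numbers. To isolate the proxy's contribution, run the same loop against the provider's direct endpoint and compare the means.

    import os
    import statistics
    import time

    import requests

    # Placeholder endpoint and credentials -- substitute your own proxy or provider URL.
    URL = "https://api.example-proxy.com/v1/chat/completions"
    HEADERS = {"Authorization": f"Bearer {os.environ['PROXY_API_KEY']}"}
    PAYLOAD = {"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]}

    session = requests.Session()
    session.post(URL, headers=HEADERS, json=PAYLOAD, timeout=30)  # warm-up request

    latencies_ms = []
    for _ in range(50):
        start = time.perf_counter()
        session.post(URL, headers=HEADERS, json=PAYLOAD, timeout=30)
        latencies_ms.append((time.perf_counter() - start) * 1000)

    print(f"mean: {statistics.mean(latencies_ms):.1f} ms")
    print(f"p95:  {sorted(latencies_ms)[int(0.95 * len(latencies_ms))]:.1f} ms")
    print(f"max:  {max(latencies_ms):.1f} ms")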

Performance Impact of Advanced Features

It's worth noting that certain advanced features can have a more noticeable impact on performance:

  • Request Logging: Storing full request and response bodies can introduce additional latency.
  • Complex Policy Evaluations: Extensive content moderation or PII detection on large inputs may increase processing time.
  • Request Buffering: Some features may require buffering the entire request before forwarding.

Warning: Features that cause requests to buffer (e.g., request logging) should be used judiciously. Consider enabling them only for specific use cases or during debugging phases.
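
One simple way to follow this advice is to gate heavyweight options behind an environment check, as in the sketch below. The option names here are purely illustrative, not UsageGuard's actual configuration schema.

    import os

    def request_options() -> dict:
        """Build per-request options. The key names are illustrative only --
        they are not UsageGuard's actual configuration schema."""
        debugging = os.environ.get("APP_ENV") == "development"
        return {
            # Buffering features such as full-body logging: debugging only.
            "log_full_bodies": debugging,
            # Lightweight policies can stay on everywhere.
            "pii_detection": True,
            "content_moderation": True,
        }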

When to Use Direct API Calls

While UsageGuard offers significant advantages, there might be scenarios where direct API calls are preferable:

  1. Extremely Low-Latency Requirements: If your application demands the absolute minimum latency, direct calls might be necessary.
  2. Simple Use Cases: For very basic integrations with no need for advanced features, direct calls may suffice.
  3. Provider-Specific Features: If you need to use provider-specific features not supported by UsageGuard, direct calls may be required.
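
For comparison, a direct call targets the provider's own endpoint with that provider's key and request schema (OpenAI's published chat-completions endpoint is shown here); it avoids the proxy hop but gives up the unified interface and safeguards described above.

    import os
    import requests

    # Direct call to a single provider (OpenAI shown here).
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Hello"}],
        },
        timeout=30,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])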

Making the Right Choice

When deciding between proxy and direct API calls, consider the following:

  • Feature Requirements: Do you need the advanced features offered by UsageGuard?
  • Latency Tolerance: Is your application sensitive to small increases in latency?
  • Scalability Needs: Will you be integrating multiple LLM providers or switching between models?
  • Compliance Requirements: Do you need robust logging and PII management capabilities?
  • Development Resources: Would a unified API save significant development time and resources?

Conclusion

For most applications, the benefits of using UsageGuard as a proxy far outweigh the minimal latency increase. The added security, flexibility, and advanced features provide substantial value that can significantly enhance your LLM integration.

Ready to get started with UsageGuard? Check out our Quickstart Guide to begin leveraging the power of proxy API calls for your LLM integration.