Moderation & Compliance

In this guide, you will learn the different types of moderation and how to configure them in your app. Moderation involves monitoring and managing user-generated content to prevent harmful or inappropriate behavior.

By implementing effective moderation strategies, you can build responsible AI into your apps and maintain a consistent level of safety, quality, and integrity.

Moderation features are applied at the connection level and configured in your dashboard, which allows you to set different policies for each connection.

You can set moderation policies in the Connection -> Select a connection -> Edit Policies section of your dashboard.

Content Filtering

In this section, we will cover various content filtering techniques and policies to help you maintain a safe and respectful environment in your app. Content filtering involves using automated tools to detect and manage inappropriate or harmful content. The most common categories are described below.

Prohibited Content

UsageGuard uses narrow models and LLM moderation APIs to detect and filter out prohibited content in the following categories, based on the policies you set.

Category	Description
Hate	Content that expresses, incites, or promotes hate based on protected characteristics. Includes threats of violence or harm towards protected groups.
Harassment	Content that expresses, incites, or promotes harassing language towards any target. Includes threats of violence or harm.
Self-Harm	Content that promotes, encourages, or depicts acts of self-harm. Includes personal intent and instructions for performing self-harm acts.
Violence	Content that depicts death, violence, or physical injury, including graphic details.

By default no moderation or filtering is applied. You can configure your connection to either Allow, Block, or Audit requests containing such content.
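As a rough illustration of how the three actions differ, the sketch below models a per-category policy table in Python. The names (`Action`, `POLICY`, `decide`) are hypothetical, and two behaviors are assumptions made only for this example: that Audit lets a request through while recording it, and that Block takes precedence over Audit when several categories are flagged. It is not UsageGuard's API or implementation.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    AUDIT = "audit"
    BLOCK = "block"


# Hypothetical per-connection policy: category name -> configured action.
POLICY = {
    "hate": Action.BLOCK,
    "harassment": Action.BLOCK,
    "self-harm": Action.AUDIT,
    "violence": Action.ALLOW,
}


def decide(flagged_categories, policy=POLICY):
    """Return the strictest action configured for any flagged category.

    Categories without a configured policy fall back to ALLOW, mirroring
    the documented default of no filtering.
    """
    actions = {policy.get(c, Action.ALLOW) for c in flagged_categories}
    if Action.BLOCK in actions:
        return Action.BLOCK
    if Action.AUDIT in actions:
        return Action.AUDIT
    return Action.ALLOW


print(decide(["violence"]))           # Action.ALLOW
print(decide(["self-harm", "hate"]))  # Action.BLOCK
```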

Note that models such as GPT-4o and Claude can still block content based on their own filtering. However, proactively filtering content based on your own policies is recommended: it keeps the environment safe and respectful for your users, keeps behavior consistent across models, and supports compliance with third-party terms of service.

If you have a particular use case that requires access to an unrestricted or uncensored model, please get in touch with us at support@usageguard.com.

NSFW (Adult Content)

You can also use UsageGuard to identify and filter NSFW content in the following categories, based on the policies you set.

Category	Description
Sexual	Content meant to arouse sexual excitement or promote sexual services. Includes content involving minors (under 18).
Pornography	Images, videos, or text that depict sexual acts or nudity.
Violence	Graphic depictions of physical harm, injury, or death.
Gore	Content that shows extreme violence, blood, or mutilation.
Profanity	Excessive use of offensive or vulgar language.

NSFW content is allowed by default. You can configure your connection to either Allow, Block or Audit requests containing such content.

While third-party models may accept requests that contain NSFW content, they are not guaranteed to return a response. Proactively filtering NSFW content is recommended to avoid incurring costs for unanswered requests, unless you are working with models that will answer such requests.

PII Detection

This policy helps you identify and manage personally identifiable information (PII) in your app, ensuring that sensitive information is handled appropriately and securely. Proper handling of PII is crucial for maintaining user privacy and complying with data protection regulations.

PII	Description
Email	Email address.
Phone Number	Telephone number, multiple international formats.
Address	Physical address, including street, city, state, and ZIP code.
SSN	Social Security Number, a unique identifier for individuals in the U.S.
Credit Card	Credit Card Number (MasterCard, Visa, etc.).
CVV	Card Verification Value, a security code for credit or debit card transactions.

PII content is allowed by default. You can configure your connection to either Allow, Block, Redact or Audit requests containing such content.
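To make the Redact action concrete, here is a minimal Python sketch that replaces detected PII with placeholders. The regular expressions, the `[EMAIL]` and `[SSN]` placeholder format, and the `redact` helper are all assumptions chosen for illustration; they do not describe how UsageGuard detects or redacts PII, and they cover far fewer formats than a production detector would.

```python
import re

# Simplified illustrative patterns; real PII detection needs to cover many
# more formats than these two regexes do.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text):
    """Replace each match with a [LABEL] placeholder and report what was found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


redacted, found = redact("Reach me at jane@example.com, SSN 123-45-6789.")
print(redacted)  # Reach me at [EMAIL], SSN [SSN].
print(found)     # ['EMAIL', 'SSN']
```

Returning the list of detected labels alongside the redacted text is a design choice in this sketch: it lets the caller record which kinds of PII were found without keeping the original values.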

System Prompt Override

A system prompt override lets you inject predefined instructions or context into any request made to the UsageGuard API through a connection, overriding the system prompt sent with the request.

This feature is particularly useful for guiding the behavior of language models, ensuring consistent responses, and maintaining control over the generated content within a connection.
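Conceptually, an override replaces whatever system message the caller sent with the one configured on the connection. The Python sketch below shows that idea on a chat-style list of role/content messages. The `apply_system_override` helper and the sample messages are hypothetical, and placing the override first in the list is an assumption; UsageGuard applies the override on its side, so you would not normally write this yourself.

```python
def apply_system_override(messages, override):
    """Return a new message list whose only system message is the override."""
    # Drop any system messages supplied by the caller...
    kept = [m for m in messages if m.get("role") != "system"]
    # ...and put the connection-level prompt first.
    return [{"role": "system", "content": override}] + kept


request_messages = [
    {"role": "system", "content": "Answer like a pirate."},
    {"role": "user", "content": "How do I reset my password?"},
]
result = apply_system_override(
    request_messages, "Please respond in a formal and professional tone."
)
print(result[0]["content"])  # Please respond in a formal and professional tone.
print(len(result))           # 2
```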

Examples of System Prompts

Here are some examples of how you can use system prompt overrides:

Setting the Tone: You can use system prompts to set the tone of the responses. For example, you might want the responses to be formal, friendly, or technical.

{
  "role": "system",
  "content": "Please respond in a formal and professional tone."
}

Providing Context: System prompts can provide additional context that the model should consider when generating a response.

{
  "role": "system",
  "content": "You are an AI assistant helping a user with technical support for a software application."
}

Guiding Behavior: You can guide the model's behavior by specifying what it should or should not do.

{
  "role": "system",
  "content": "Do not provide any medical advice. If asked, suggest consulting a healthcare professional."
}

Injecting Instructions: System prompts can include specific instructions that the model should follow.

{
  "role": "system",
  "content": "Always start your response with 'According to our records,' and end with 'Thank you for your inquiry.'"
}

Forcing JSON Format: You can instruct the model to return responses in a specific format, such as JSON.

{
  "role": "system",
  "content": "Please format your response as a JSON object with the following schema: { \"name\": \"string\", \"age\": \"number\", \"email\": \"string\" }."
}
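A system prompt only asks for a format, so a reply may still deviate from it. The Python sketch below shows one way to check a reply against the schema in the prompt above before using it. The `parse_profile` helper and the sample reply strings are hypothetical and are not part of UsageGuard.

```python
import json

# Field name -> accepted Python type(s), following the schema in the prompt above.
EXPECTED_FIELDS = {"name": str, "age": (int, float), "email": str}


def parse_profile(reply_text):
    """Parse a model reply and return the dict, or None if it does not fit the schema."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in EXPECTED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return None
    return data


print(parse_profile('{"name": "Ada", "age": 36, "email": "ada@example.com"}'))
print(parse_profile("Sure! Here is the JSON you asked for."))  # None
```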

Configuring System Prompts

You can configure system prompts in your connection settings, where you define and manage the prompts that are injected into your requests. A system prompt override can be enabled either to modify the system prompt sent with the request or to disable it completely.

To add or manage system prompts, navigate to Connection -> Policies in your dashboard.

By using system prompts, you can ensure that your requests are handled in a consistent and controlled manner, enhancing the overall effectiveness and reliability of your application.

Request & Response Logging

Requests and responses are logged by default, together with their HTTP status and moderation status (whether they were flagged under any policy and what the outcome was).

Logging of request and response bodies is a separate feature. Enable it only if you have a use case that requires it, such as regulatory auditing, or temporarily for debugging. You can enable request body and response body logging independently.

UsageGuard performs logging in the background to minimize any performance impact on your application. Note that there may be a slight delay between when the request is made and when the log becomes available.
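The general pattern behind background logging can be sketched in Python with a queue and a worker thread: the request path only enqueues an entry, and a separate thread writes it later, which is also why a log can appear slightly after the request completes. This is a generic illustration, not a description of UsageGuard's internals; `log_queue`, `stored_logs`, `log_worker`, and `handle_request` are hypothetical names.

```python
import queue
import threading

log_queue = queue.Queue()
stored_logs = []  # stand-in for a real log store


def log_worker():
    """Drain the queue on a background thread until a None sentinel arrives."""
    while True:
        entry = log_queue.get()
        if entry is None:
            break
        stored_logs.append(entry)  # a real worker would write to storage here


def handle_request(request_id, status):
    """Enqueue the log entry and return immediately; no write on the hot path."""
    log_queue.put({"request_id": request_id, "status": status})
    return status


worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

handle_request("req-1", 200)
handle_request("req-2", 403)

log_queue.put(None)  # signal shutdown
worker.join()        # wait for the worker to flush everything queued before it
print(len(stored_logs))  # 2
```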

Request-Level Logging

You can configure request and response logging from Connection -> Settings in your dashboard.

Potential Issues

While logging request and response bodies can be useful, it is important to be aware of the potential compliance and performance issues:

Compliance: Ensure that you are not logging sensitive information, such as Personally Identifiable Information (PII) or payment data, unless you have a specific use case that requires it.
Performance: Logging large request and response bodies can impact the performance of your application. Consider the trade-offs and implement logging selectively.

Use Cases

Here are some common use cases for logging request and response bodies:

Debugging: Capture the request and response bodies to troubleshoot issues and understand the flow of data in your application.
Compliance: Maintain an audit trail of request and response bodies to demonstrate compliance with regulatory requirements.
Monitoring: Monitor the request and response bodies to detect anomalies or suspicious activities in your application.
Analytics: Analyze the request and response bodies to gain insights into user behavior and improve your application.