Moderation & Compliance

In this guide, you will learn the different types of moderation and how to configure them in your app. Moderation involves monitoring and managing user-generated content to prevent harmful or inappropriate behavior.

By implementing effective moderation strategies, you can build responsible AI and ensure your apps maintain a consistent level of safety, quality, and integrity.

You can set moderation policies in the Connection -> Select a connection -> Edit Policies section of your dashboard.

Content Filtering

In this section, we will cover various content filtering techniques and policies to help you maintain a safe and respectful environment in your app. Content filtering involves using automated tools to detect and manage inappropriate or harmful content. Here are some of the most common categories.

Prohibited Content

UsageGuard uses narrow models and LLM moderation APIs to detect and filter out prohibited content (based on your configured policies) in the following categories.

| Category | Description |
| --- | --- |
| Hate | Content that expresses, incites, or promotes hate based on protected characteristics. Includes threats of violence or harm towards protected groups. |
| Harassment | Content that expresses, incites, or promotes harassing language towards any target. Includes threats of violence or harm. |
| Self-Harm | Content that promotes, encourages, or depicts acts of self-harm. Includes personal intent and instructions for performing self-harm acts. |
| Violence | Content that depicts death, violence, or physical injury, including graphic details. |

By default, no moderation or filtering is applied. You can configure your connection to either Allow, Block, or Audit requests containing such content.
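To illustrate how these per-category actions combine, here is a minimal sketch of an Allow/Block/Audit decision. The function and policy shape are hypothetical, written for illustration only; they are not part of the UsageGuard API.

```python
# Hypothetical sketch: resolving a moderation outcome from per-category
# policy actions. Category and action names mirror the table above;
# nothing here is the actual UsageGuard implementation.

def apply_policy(flagged_categories, policy):
    """Return (outcome, audited_categories) for a request.

    policy maps a category (e.g. "hate") to "allow", "block", or "audit".
    Unconfigured categories default to "allow" (the documented default).
    """
    outcome = "allow"
    audited = []
    for category in flagged_categories:
        action = policy.get(category, "allow")
        if action == "block":
            return "block", audited   # a single block decides the request
        if action == "audit":
            audited.append(category)  # log it, but let the request through
            outcome = "audit"
    return outcome, audited

print(apply_policy(["hate"], {"hate": "block"}))          # ('block', [])
print(apply_policy(["violence"], {"violence": "audit"}))  # ('audit', ['violence'])
```

Note that in this sketch a single blocked category rejects the whole request, while audited categories are only recorded.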

NSFW (Adult Content)

You can also use UsageGuard to identify and filter NSFW content based on your set policies in these categories.

| Category | Description |
| --- | --- |
| Sexual | Content meant to arouse sexual excitement or promote sexual services. Includes content involving minors (under 18). |
| Pornography | Images, videos, or text that depict sexual acts or nudity. |
| Violence | Graphic depictions of physical harm, injury, or death. |
| Gore | Content that shows extreme violence, blood, or mutilation. |
| Profanity | Excessive use of offensive or vulgar language. |

NSFW content is allowed by default. You can configure your connection to either Allow, Block, or Audit requests containing such content.

PII Detection

This policy helps you identify and manage PII in your app, ensuring that sensitive information is handled appropriately and securely. Proper handling of PII is crucial for maintaining user privacy and complying with data protection regulations.

| PII | Description |
| --- | --- |
| Email | Email address. |
| Phone Number | Telephone number, multiple international formats. |
| Address | Physical address, including street, city, state, and ZIP code. |
| SSN | Social Security Number, a unique identifier for individuals in the U.S. |
| Credit Card | Credit card number (MasterCard, Visa, etc.). |
| CVV | Card Verification Value, a security code for credit or debit card transactions. |

PII content is allowed by default. You can configure your connection to either Allow, Block, Redact, or Audit requests containing such content.
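As a rough illustration of what the Redact action does, the sketch below replaces detected PII with placeholders. The regex patterns and placeholder format are deliberately simplified assumptions for this example; real PII detection (including UsageGuard's) is considerably more robust.

```python
import re

# Hypothetical sketch: regex-based redaction for two of the PII
# categories listed above. Patterns are intentionally simple and
# will miss many real-world formats.
PATTERNS = {
    "Email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    """Replace detected PII with a [REDACTED:<type>] placeholder."""
    for pii_type, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{pii_type}]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789."))
# Contact [REDACTED:Email], SSN [REDACTED:SSN].
```

Redaction lets the request proceed while stripping the sensitive values, which is why it sits between Allow and Block in the policy options.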

System Prompt Override

System prompt overrides let you inject predefined instructions or context into any request made to the UsageGuard API, replacing the system prompt supplied in the request.

This feature is particularly useful for guiding the behavior of language models, ensuring consistent responses, and maintaining control over the generated content within a connection.

Examples of System Prompts

Here are some examples of how you can use system prompt overrides:

  1. Setting the Tone: You can use system prompts to set the tone of the responses. For example, you might want the responses to be formal, friendly, or technical.

    {
      "role": "system",
      "content": "Please respond in a formal and professional tone."
    }
    
  2. Providing Context: System prompts can provide additional context that the model should consider when generating a response.

    {
      "role": "system",
      "content": "You are an AI assistant helping a user with technical support for a software application."
    }
    
  3. Guiding Behavior: You can guide the model's behavior by specifying what it should or should not do.

    {
      "role": "system",
      "content": "Do not provide any medical advice. If asked, suggest consulting a healthcare professional."
    }
    
  4. Injecting Instructions: System prompts can include specific instructions that the model should follow.

    {
      "role": "system",
      "content": "Always start your response with 'According to our records,' and end with 'Thank you for your inquiry.'"
    }
    
  5. Forcing JSON Format: You can instruct the model to return responses in a specific format, such as JSON.

    {
      "role": "system",
      "content": "Please format your response as a JSON object with the following schema: { \"name\": \"string\", \"age\": \"number\", \"email\": \"string\" }."
    }
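Conceptually, an override replaces whatever system message the original request carried before the request reaches the model. The sketch below shows one way that merge could work; the function is hypothetical and only the OpenAI-style message shape from the examples above is assumed.

```python
# Hypothetical sketch: applying a connection-level system prompt
# override to a chat-style request before it is forwarded.

def apply_override(messages, override_content):
    """Replace any system message with the override, or prepend one
    if the request carried no system message."""
    non_system = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": override_content}] + non_system

request_messages = [
    {"role": "system", "content": "You are a pirate."},
    {"role": "user", "content": "Hello!"},
]
merged = apply_override(
    request_messages,
    "Please respond in a formal and professional tone.",
)
print(merged[0]["content"])  # Please respond in a formal and professional tone.
```

The user messages pass through untouched; only the system message is swapped, which is what makes the override transparent to the calling application.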
    

Configuring System Prompts

You can configure system prompts in your connection settings. This allows you to define and manage the prompts that will be injected into your requests. An override can be enabled to replace the request's system prompt, or the request's system prompt can be disabled entirely.

To add or manage system prompts, navigate to Connection -> Policies in your Dashboard.

By using system prompts, you can ensure that your requests are handled in a consistent and controlled manner, enhancing the overall effectiveness and reliability of your application.

Request & Response Logging

Requests and responses, along with their HTTP status and moderation status (whether they were flagged by any policy and what the outcome was), are logged by default.

This feature is specific to logging request and response bodies. You should only enable it if you have a use case that requires it, such as regulatory auditing, or temporarily for debugging. You can enable request and/or response body auditing independently.
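To make the distinction concrete, here is a sketch of what a log entry might contain: the metadata logged by default, plus the optional bodies. The field names are hypothetical assumptions for illustration, not a documented UsageGuard log schema.

```python
import json

# Hypothetical sketch of a log entry's shape. HTTP status and
# moderation outcome are always present; request/response bodies
# appear only when body auditing is enabled for the connection.
entry = {
    "request_id": "req_123",
    "http_status": 200,
    "moderation": {"flagged": True, "category": "pii", "outcome": "redact"},
    # Present only because request-body auditing is enabled here:
    "request_body": {"messages": [{"role": "user", "content": "..."}]},
    "response_body": None,  # response-body auditing left disabled
}

print(json.dumps(entry["moderation"]))
```

Keeping the bodies optional is what allows the default logging to stay lightweight while body auditing remains available for the use cases below.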

Request-Level Logging

You can configure request and response logging from Connection -> Settings in your dashboard.

Potential Issues

While logging request and response bodies can be useful, it is important to be aware of the potential compliance and performance issues:

  • Compliance: Ensure that you are not logging sensitive information, such as Personally Identifiable Information (PII) or payment data, unless you have a specific use case that requires it.
  • Performance: Logging large request and response bodies can impact the performance of your application. Consider the trade-offs and implement logging selectively.

Use Cases

Here are some common use cases for logging request and response bodies:

  • Debugging: Capture the request and response bodies to troubleshoot issues and understand the flow of data in your application.
  • Compliance: Maintain an audit trail of request and response bodies to demonstrate compliance with regulatory requirements.
  • Monitoring: Monitor the request and response bodies to detect anomalies or suspicious activities in your application.
  • Analytics: Analyze the request and response bodies to gain insights into user behavior and improve your application.
