Google's generative AI models, like Gemini 2.5 Flash, are designed to prioritize safety. However, they can still generate harmful responses, especially when they're explicitly prompted. To further enhance safety and minimize misuse, you can configure content filters to block potentially harmful responses.

This page describes each of the safety and content filter types and outlines key safety concepts. For configurable content filters, it shows you how to configure the blocking thresholds of each harm category to control how often prompts and responses are blocked.

Safety and content filters act as a barrier, preventing harmful output, but they don't directly influence the model's behavior. To learn more about model steerability, see System instructions for safety.

Unsafe prompts
The Gemini API in Vertex AI provides one of the following enum codes to explain why a prompt was rejected:

| Enum | Filter type | Description |
| --- | --- | --- |
| PROHIBITED_CONTENT | Non-configurable safety filter | The prompt was blocked because it was flagged for containing prohibited content, usually CSAM. |
| BLOCKED_REASON_UNSPECIFIED | N/A | The reason for blocking the prompt is unspecified. |
| OTHER | N/A | This enum refers to all other reasons for blocking a prompt. Note that the Gemini API in Vertex AI doesn't support all languages. For a list of supported languages, see Gemini language support. |
To learn more, see BlockedReason.

The following is an example of Gemini API in Vertex AI output when a prompt is blocked for containing PROHIBITED_CONTENT:

{
"promptFeedback": {
"blockReason": "PROHIBITED_CONTENT"
},
"usageMetadata": {
"promptTokenCount": 7,
"totalTokenCount": 7
}
}
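To handle this case programmatically, you can inspect promptFeedback before reading any candidates. The following is a minimal sketch using the Gen AI SDK for Python that's installed later on this page; the prompt text is a placeholder, and attribute names can vary slightly between SDK versions.

from google import genai

# Assumes GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION, and
# GOOGLE_GENAI_USE_VERTEXAI are set as shown in the Python example below.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="YOUR_PROMPT_HERE",  # placeholder prompt
)

# When the prompt itself is blocked, there are no usable candidates and
# prompt_feedback.block_reason carries the enum code from the table above.
if response.prompt_feedback and response.prompt_feedback.block_reason:
    print("Prompt blocked:", response.prompt_feedback.block_reason)
else:
    print(response.text)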
Unsafe responses
The following filters can detect and block potentially unsafe responses:

- The configurable content filters
- The non-configurable safety filters, such as the filters for prohibited content and SPII
- The citation filter

An LLM generates responses in units of text called tokens. A model stops generating tokens either because it reaches a natural stopping point or because one of the filters blocks the response. The Gemini API in Vertex AI provides one of the following enum codes to explain why token generation stopped:

| Enum | Filter type | Description |
| --- | --- | --- |
| STOP | N/A | This enum indicates that the model reached a natural stopping point or the provided stop sequence. |
| MAX_TOKENS | N/A | Token generation stopped because the model reached the maximum number of tokens specified in the request. |
| SAFETY | Configurable content filter | Token generation stopped because the response was flagged for harmful content. |
| RECITATION | Citation filter | Token generation stopped because of potential recitation. |
| SPII | Non-configurable safety filter | Token generation stopped because the response was flagged for Sensitive Personally Identifiable Information (SPII) content. |
| PROHIBITED_CONTENT | Non-configurable safety filter | Token generation stopped because the response was flagged for containing prohibited content, usually CSAM. |
| FINISH_REASON_UNSPECIFIED | N/A | The finish reason is unspecified. |
| OTHER | N/A | This enum refers to all other reasons that stop token generation. Note that token generation isn't supported for all languages. For a list of supported languages, see Gemini language support. |
To learn more, see FinishReason.

If a filter blocks the response, it voids the response's Candidate.content field. It does not provide any feedback to the model.
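For example, a sketch along the same lines (again with the Gen AI SDK for Python and a placeholder prompt) can branch on each candidate's finish reason to distinguish a natural stop from a stopped or filtered response:

from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="YOUR_PROMPT_HERE",  # placeholder prompt
)

for candidate in response.candidates:
    if candidate.finish_reason == types.FinishReason.STOP:
        # Natural stop: the candidate content is usable.
        print(candidate.content.parts[0].text)
    else:
        # MAX_TOKENS, SAFETY, RECITATION, SPII, PROHIBITED_CONTENT, and so on:
        # a limit or filter stopped generation; filtered content is voided.
        print("Generation stopped:", candidate.finish_reason)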
Configurable content filters

Content filters assess content against a list of harms. For each harm category, the content filters assign one score based on the probability of the content being harmful and another score based on the severity of harmful content.

The configurable content filters don't have versioning independent of model versions. Google won't update the configurable content filter for a previously released version of a model. However, it may update the configurable content filter for a future version of a model.

Harm categories

Content filters assess content based on the following harm categories:
| Harm category | Definition |
| --- | --- |
| Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
| Harassment | Threatening, intimidating, bullying, or abusive comments targeting another individual. |
| Sexually Explicit | Contains references to sexual acts or other lewd content. |
| Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |
Comparison of probability scores and severity scores
The probability safety score reflects the likelihood that a model response is associated with the respective harm. It has an associated confidence score between 0.0 and 1.0, rounded to one decimal place. The confidence score is discretized into four confidence levels: NEGLIGIBLE, LOW, MEDIUM, and HIGH.

The severity score reflects the magnitude of how harmful a model response might be. It has an associated severity score ranging from 0.0 to 1.0, rounded to one decimal place. The severity score is discretized into four levels: NEGLIGIBLE, LOW, MEDIUM, and HIGH.

Content can have a low probability score and a high severity score, or a high probability score and a low severity score.
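If you want to compare both scores for a response, one approach is to iterate over a candidate's safety ratings. This is a sketch with the Gen AI SDK for Python; the prompt is a placeholder, and the scores are only returned when the filters are enabled, so None values are possible.

from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="YOUR_PROMPT_HERE",  # placeholder prompt
)

# Each rating pairs a discretized level (NEGLIGIBLE, LOW, MEDIUM, HIGH)
# with the underlying probability and severity scores between 0.0 and 1.0.
for rating in response.candidates[0].safety_ratings or []:
    print(rating.category)
    print("  probability:", rating.probability, rating.probability_score)
    print("  severity:   ", rating.severity, rating.severity_score)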
How to configure content filters

You can use the Gemini API in Vertex AI or the Google Cloud console to configure content filters.
Gemini API in Vertex AI
The Gemini API in Vertex AI provides two "harm block" methods, SEVERITY and PROBABILITY. The default method is SEVERITY. For models older than gemini-1.5-flash and gemini-1.5-pro, the default method is PROBABILITY. To learn more, see the HarmBlockMethod API reference.

The Gemini API in Vertex AI provides the following "harm block" thresholds:
- BLOCK_LOW_AND_ABOVE: Block when the probability score or the severity score is LOW, MEDIUM, or HIGH.
- BLOCK_MEDIUM_AND_ABOVE: Block when the probability score or the severity score is MEDIUM or HIGH.
- BLOCK_ONLY_HIGH: Block when the probability score or the severity score is HIGH.
- HARM_BLOCK_THRESHOLD_UNSPECIFIED: Block using the default threshold.
- OFF: No automated response blocking and no metadata is returned. For gemini-2.5-flash and subsequent models, OFF is the default value.
- BLOCK_NONE: The BLOCK_NONE setting removes automated response blocking. Instead, you can configure your own content guidelines with the returned scores. This is a restricted field that isn't available to all users in GA model versions.
For example, the following Python code demonstrates how you can set the harm block threshold to BLOCK_ONLY_HIGH for the dangerous content category:

generative_models.SafetySetting(
    category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold=generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
),
This will block most of the content that is classified as dangerous content. To learn more, see the HarmBlockThreshold API reference.
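For context, the snippet above is one entry in a safety_settings list. A minimal end-to-end sketch with the Vertex AI SDK for Python (the same generative_models module; the project, location, model, and prompt values are placeholders) could look like this:

import vertexai
from vertexai import generative_models

vertexai.init(project="PROJECT_ID", location="us-central1")

model = generative_models.GenerativeModel("gemini-2.5-flash")
response = model.generate_content(
    "YOUR_PROMPT_HERE",  # placeholder prompt
    safety_settings=[
        generative_models.SafetySetting(
            category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=generative_models.HarmBlockThreshold.BLOCK_ONLY_HIGH,
        ),
    ],
)
print(response.text)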
For end-to-end examples in Python, Node.js, Java, Go, C#, and REST, see Examples of content filter configuration.

Google Cloud console
The Google Cloud console lets you configure a threshold for each content attribute. The content filter uses only the probability scores. There is no option to use the severity scores.

The Google Cloud console provides the following threshold values:

- Block few: Block when the probability score is HIGH.
- Block some: Block when the probability score is MEDIUM or HIGH.
- Block most: Block when the probability score is LOW, MEDIUM, or HIGH.

For example, if you set the block setting to Block few for the Dangerous Content category, everything that has a high probability of being dangerous content is blocked. Anything with a lower probability is allowed. The default threshold is Block some.

To set the thresholds, follow these steps:

1. In the Vertex AI section of the Google Cloud console, go to the Vertex AI Studio page.
2. Under Create a new prompt, click any of the buttons to open the prompt design page.
3. Click Safety settings. The Safety settings dialog window opens.
4. For each harm category, configure the desired threshold value.
5. Click Save.
Example output when a response is blocked by the configurable content filter
The following is an example of Gemini API in Vertex AI output when a response is blocked by the configurable content filter for containing dangerous content:

{
"candidates": [{
"finishReason": "SAFETY",
"safetyRatings": [{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability": "NEGLIGIBLE",
"probabilityScore": 0.11027937,
"severity": "HARM_SEVERITY_LOW",
"severityScore": 0.28487435
}, {
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability": "HIGH",
"blocked": true,
"probabilityScore": 0.95422274,
"severity": "HARM_SEVERITY_MEDIUM",
"severityScore": 0.43398145
}, {
"category": "HARM_CATEGORY_HARASSMENT",
"probability": "NEGLIGIBLE",
"probabilityScore": 0.11085559,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severityScore": 0.19027223
}, {
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability": "NEGLIGIBLE",
"probabilityScore": 0.22901751,
"severity": "HARM_SEVERITY_NEGLIGIBLE",
"severityScore": 0.09089675
}]
}],
"usageMetadata": {
"promptTokenCount": 38,
"totalTokenCount": 38
}
}
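To surface which category triggered a block like the one above, a small sketch with the Gen AI SDK for Python can look for the rating marked as blocked. The field names mirror the JSON shown; depending on your settings, rating objects may omit some scores.

from google import genai
from google.genai import types

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="YOUR_PROMPT_HERE",  # placeholder prompt
)

for candidate in response.candidates:
    if candidate.finish_reason == types.FinishReason.SAFETY:
        for rating in candidate.safety_ratings or []:
            if rating.blocked:
                print("Blocked by:", rating.category,
                      "probability:", rating.probability,
                      "severity:", rating.severity)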
Examples of content filter configuration
The following examples demonstrate how you can configure the content filter using the Gemini API in Vertex AI:

Python
Install

pip install --upgrade google-genai

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True

To learn more, see the SDK reference documentation.
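As an illustration of what a Python sample can look like with this setup (the threshold choices and prompt here are placeholders, not a prescribed configuration), safety settings are passed through GenerateContentConfig:

from google import genai
from google.genai import types

client = genai.Client()  # reads the environment variables set above

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="YOUR_PROMPT_HERE",  # placeholder prompt
    config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.BLOCK_ONLY_HIGH,
            ),
            types.SafetySetting(
                category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
                threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
            ),
        ],
    ),
)
print(response.text)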
REST
After you set up your environment, you can use REST to test a text prompt. The following sample sends a request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

- LOCATION: A region that supports the model. A partial list of available regions: us-central1, us-west4, northamerica-northeast1, us-east4, us-west1, asia-northeast3, asia-southeast1, asia-northeast1.
- PROJECT_ID: Your project ID.
- MODEL_ID: The model ID, for example gemini-2.5-flash.
- ROLE: The role in a conversation associated with the content. USER specifies content that's sent by you; MODEL specifies the model's response.
- TEXT: The text instructions to include in the prompt.
- SAFETY_CATEGORY: The safety category to configure a threshold for. Acceptable values: HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_HATE_SPEECH, HARM_CATEGORY_HARASSMENT, HARM_CATEGORY_DANGEROUS_CONTENT.
- THRESHOLD: The threshold for blocking responses in the specified safety category. Acceptable values: BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE (default), BLOCK_LOW_AND_ABOVE. BLOCK_LOW_AND_ABOVE blocks the most while BLOCK_ONLY_HIGH blocks the least.
HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent
Request JSON body:

{
"contents": {
"role": "ROLE",
"parts": { "text": "TEXT" }
},
"safetySettings": {
"category": "SAFETY_CATEGORY",
"threshold": "THRESHOLD"
}
}
To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent"PowerShell
Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:streamGenerateContent" | Select-Object -Expand ContentExample curl command
LOCATION="us-central1"
MODEL_ID="gemini-2.5-flash"
PROJECT_ID="test-project"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/publishers/google/models/${MODEL_ID}:streamGenerateContent -d \
$'{
"contents": {
"role": "user",
"parts": { "text": "Hello!" }
},
"safety_settings": [
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "OFF"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_LOW_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_ONLY_HIGH"
}
]
}'
Citation filter
The generative AI features of Vertex AI are intended to produce original content. By design, Gemini limits the likelihood that existing content is replicated at length. If a Gemini feature does make an extensive quotation from a web page, Gemini cites that page.
Sometimes the same content can be found on multiple web pages. Gemini attempts to point you to a popular source. In the case of citations to code repositories, the citation might also reference an applicable open source license. Complying with any license requirements is your own responsibility.
To learn about the metadata of the citation filter, see the Citation API reference.
Civic integrity filter
The civic integrity filter detects and blocks prompts that mention or relate to
political elections and candidates. This filter is disabled by default. To turn
it on, set the blocking threshold for CIVIC_INTEGRITY
to any of the following
values. It doesn't make a difference which value you specify.
- BLOCK_LOW_AND_ABOVE
- BLOCK_MEDIUM_AND_ABOVE
- BLOCK_ONLY_HIGH
The following Python code shows you how to turn on the civic integrity filter:
generative_models.SafetySetting(
    category=generative_models.HarmCategory.CIVIC_INTEGRITY,
    threshold=generative_models.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
),
For more details about the civic integrity filter, contact your Google Cloud representative.
Best practices
While content filters help prevent unsafe content, they might occasionally block benign content or miss harmful content. Advanced models like Gemini 2.5 Flash are designed to generate safe responses even without filters. Test different filter settings to find the right balance between safety and allowing appropriate content.
What's next
- Learn about system instructions for safety.
- Learn about abuse monitoring.
- Learn more about responsible AI.
- Learn how to process blocked responses.