Data collection and use policy

This document describes how JetBrains handles data related to the use of the JetBrains AI service.

1. Data collection

1.1 General

The JetBrains AI service can collect two types of data related to the usage of AI features:

Behavioral data
Detailed data

The user fully controls both types of data collection.

The data from the JetBrains AI service is sent to third-party language model providers (such as OpenAI), which means this data is also processed on the servers of these providers (and according to their policies); neither the user nor JetBrains has control over this third-party data processing. JetBrains does not work with large language model providers that use customer data for training models.

Please check the list of the engaged third-party language model providers and the documents describing how they handle the data here.

1.2 Behavioral data collection

Behavioral data collection includes data such as:

Types of AI features used.
Rates of acceptance for suggestions from different AI features.
Performance data (for example, the amount of time it takes to generate AI suggestions).
User feedback on the quality of results produced by different AI features.

Behavioral data does not include any personally identifiable data or any source code files or fragments from the user project.

This data is used by various teams at JetBrains for analyzing product usage, improving product features, and training machine learning (ML) models that control the behavior of different product features (for example, controlling the automatic activation of ML features). It is not used for training ML models that generate code or text, or another type of data from which outputs could be extracted.

Collection of behavioral data is controlled by the standard data sharing settings (see the product documentation for details). It is enabled by default in EAP builds and disabled by default in release builds.

1.3 Detailed data collection

Detailed data collection includes complete data about interactions with large language models. This means the full text of inputs sent by the IDE to the large language model and its responses, including source code snippets.

Access to the collected data will be restricted only to the teams at JetBrains that specifically work on large language model development and integration. This data will be analyzed to understand product usage and identify opportunities for improvement. It will not be used for training any ML models that generate code or text, or revealed in any form to any other users.

We will also implement a retention policy for this data; it will be stored only for a limited amount of time not exceeding 30 days.
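As an illustration only, the 30-day ceiling described above could be enforced with a check like the following. This is a minimal sketch, not JetBrains' implementation: the names MAX_RETENTION, is_expired, and purge are hypothetical, and only the 30-day figure comes from this document.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only the 30-day limit is taken from the policy text;
# the record layout and function names are hypothetical.
MAX_RETENTION = timedelta(days=30)


def is_expired(stored_at: datetime, now: datetime) -> bool:
    """Return True once a record has been held longer than the retention limit."""
    return now - stored_at > MAX_RETENTION


def purge(records: dict[str, datetime], now: datetime) -> dict[str, datetime]:
    """Keep only the records that are still within the retention window."""
    return {rid: ts for rid, ts in records.items() if not is_expired(ts, now)}


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stored = {"old": now - timedelta(days=31), "recent": now - timedelta(days=2)}
    print(sorted(purge(stored, now)))
```

A record stored exactly 30 days ago is kept in this sketch, because the limit is stated as "not exceeding 30 days".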

Collection of detailed data is enabled only based on explicit approval of users and is controlled in the product settings.

If the user does not opt in to detailed data collection, the inputs will be sent directly to the LLM provider and processed according to their data collection and use policy. The outputs will be sent directly to the user IDE. The inputs and outputs will not be persistently stored on JetBrains servers.

For more information on zero-data retention (ZDR), see Data retention.

2. Data processing in Codebase Indexing and Semantic Search features

The JetBrains AI service contains a feature allowing semantic indexing of user codebases to enable contextual code assistance and improved code generation capabilities. This indexing allows JetBrains AI to answer questions with awareness of the entire codebase context and generate code suggestions that reference existing implementations within the project.

2.1. Index data processing and storage

To perform codebase indexing, codebase data is uploaded to JetBrains AI. Source code files are processed by chunking content into manageable segments and generating semantic embeddings that capture the meaning and relationships within the code. These embeddings are stored on JetBrains servers within the infrastructure of global cloud providers, listed at https://www.jetbrains.com/legal/docs/privacy/third-parties/, to enable efficient semantic search and contextual analysis.

Uploaded data is stored only temporarily, strictly for indexing purposes. The embeddings do not contain the original source code text; they are only numerical representations of code semantics that enable similarity matching and contextual understanding.
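The chunk-then-embed flow described above can be pictured with a small sketch. It is an assumption-laden illustration, not the actual JetBrains pipeline: chunk_lines, toy_embedding, and cosine are hypothetical names, the fixed line-count chunking is a placeholder strategy, and the hashed token-count vector merely stands in for a real semantic embedding model.

```python
import hashlib
import math


def chunk_lines(source: str, max_lines: int = 20) -> list[str]:
    """Split source text into segments of at most max_lines lines (hypothetical strategy)."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def toy_embedding(chunk: str, dims: int = 64) -> list[float]:
    """Map a chunk to a unit-length vector of hashed token counts.

    Stand-in for a real embedding model: the vector holds numbers only,
    not the original text of the chunk.
    """
    vec = [0.0] * dims
    for token in chunk.split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    source = "\n".join(f"def handler_{i}(): return {i}" for i in range(45))
    chunks = chunk_lines(source)
    vectors = [toy_embedding(c) for c in chunks]
    print(len(chunks), len(vectors[0]))
```

Similarity matching then amounts to comparing a query vector against the stored vectors, for example with cosine, without needing the original source text.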

2.2. Access control and data scoping

Index access is strictly controlled based on repository ownership and organizational boundaries:

Private repository indexes are explicitly linked to their corresponding customers and cannot be accessed by unauthorized parties.
Enterprise repositories are accessible to all users within the same enterprise organization, enabling collaborative code assistance across team projects. The access is aligned with per-user corporate JetBrains licenses.
Individual user repositories are privately indexed and restricted to access by the repository owner only.

All index queries and semantic search operations are scoped within the appropriate customer or organizational boundary. This means that users within the same enterprise can search across all indexed repositories within that enterprise, while individual users can only access their own privately indexed repositories.
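The scoping rules above can be sketched as a simple filter applied before any search runs. This is a hypothetical model for illustration: the User and RepoIndex types, their fields, and the can_query and searchable helpers are assumptions, not part of any JetBrains API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: str
    org_id: Optional[str] = None  # None for an individual user (assumed model)


@dataclass(frozen=True)
class RepoIndex:
    repo: str
    owner_id: str
    org_id: Optional[str] = None  # set only for enterprise repositories (assumed model)


def can_query(user: User, index: RepoIndex) -> bool:
    """Enterprise indexes are visible within the same organization;
    individual indexes are visible to their owner only."""
    if index.org_id is not None:
        return user.org_id == index.org_id
    return user.user_id == index.owner_id


def searchable(user: User, indexes: list[RepoIndex]) -> list[str]:
    """Return the repositories whose indexes this user may search."""
    return [ix.repo for ix in indexes if can_query(user, ix)]


if __name__ == "__main__":
    indexes = [
        RepoIndex("acme/api", owner_id="alice", org_id="acme"),
        RepoIndex("bob/notes", owner_id="bob"),
    ]
    print(searchable(User("carol", org_id="acme"), indexes))
```

Applying the filter before the query runs keeps every search inside one customer or organizational boundary, which is the property the section describes.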

Last modified: 27 June 2025