Introduction to Concept Drift
Last Updated: 02 Sep, 2020
Machine learning models are often built with batch learning, that is, trained once on a fixed set of data. If the data later changes, or new data keeps arriving, such a model can quickly become ineffective or even counterproductive. This problem is known as concept drift.
A formal definition:
Concept drift is the phenomenon in which the statistical properties of the class variable, in other words the target we want to predict, change over time. When a model is trained, it learns a function that maps the independent variables, or predictors, to the target variable. In a static environment where neither the predictors nor the target changes, the model should keep performing as it did on day one. But if the predictors, or their relationship to the target, change over time, performance can degrade, because the model was trained on old data and its learned mapping may no longer fit the new data.
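One common way to write this definition down is sketched below. The notation is an assumption introduced for illustration and is not the article's own: P_t(X, y) denotes the joint distribution of the predictors X and the target y at time t.

```latex
% Concept drift occurs between times t_0 and t_1 if the joint
% distribution of predictors and target is not the same at both times:
\exists X : \; P_{t_0}(X, y) \neq P_{t_1}(X, y)

% The joint distribution factors into an input part and a mapping part:
P_t(X, y) = P_t(X) \, P_t(y \mid X)
```

Read this way, a change in P_t(y | X) alters the very mapping the model learned, while a change in P_t(X) alone means the model sees inputs unlike those it was trained on. Either can hurt performance.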
An example of such a situation is dynamic data (for instance, streaming data), where not only the statistical properties of the target variable may change but also its meaning. When this change happens, the mapping the model has learned is no longer suitable for the new environment.
In machine learning and predictive analytics, concept drift means that the statistical properties of the target variable that the model is trying to predict change over time in unforeseen ways. This causes problems because the predictions become less accurate as time passes and may end up being of little or no use.
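The loss of accuracy over time described above can be watched for in a simple way. The following is a minimal sketch, not a standard drift detector: it assumes that true labels become available after each prediction, and the function names, the window size, the tolerance and the label streams are all hypothetical values chosen for illustration.

```python
def windowed_accuracy(y_true, y_pred, window):
    """Accuracy over consecutive, non-overlapping windows of `window` samples."""
    scores = []
    for start in range(0, len(y_true) - window + 1, window):
        true_chunk = y_true[start:start + window]
        pred_chunk = y_pred[start:start + window]
        hits = sum(t == p for t, p in zip(true_chunk, pred_chunk))
        scores.append(hits / window)
    return scores


def first_drift_window(scores, tolerance=0.2):
    """Index of the first window whose accuracy is more than `tolerance`
    below the first window's accuracy, or None if that never happens."""
    baseline = scores[0]
    for i, score in enumerate(scores):
        if baseline - score > tolerance:
            return i
    return None


# Made-up label stream: the predictions get worse as time passes.
y_true = [0, 1, 0, 1,  1, 0, 1, 0,  1, 1, 0, 0,  1, 0, 1, 1]
y_pred = [0, 1, 0, 1,  1, 0, 1, 1,  0, 1, 1, 0,  0, 1, 1, 0]

scores = windowed_accuracy(y_true, y_pred, window=4)
drift_at = first_drift_window(scores, tolerance=0.2)
print(scores)    # per-window accuracy
print(drift_at)  # index of the first window flagged as degraded
```

Comparing each window against the first one is a deliberately crude choice. In practice the baseline and the tolerance would need to be tuned to the noise level of the data.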
Let’s illustrate this with a sensor positioned on a volcano to record its temperature over time. Suppose that we collect data over several days during which it only rained. Learning from these data would give us the following model (Figure 1): above a certain temperature threshold, we consider the volcano active; otherwise, it is at rest.
Figure 1: Data during Rain
However, a few days later a heatwave arrives and the temperature distribution changes, as shown in Figure 2. We can easily see that the model established earlier is no longer valid and has to be adapted.
Figure 2: Data After Rain
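The volcano example above can be reproduced with a small sketch. Everything here is an assumption made for illustration: the temperature readings are invented, the labels use 0 for "at rest" and 1 for "active", and the model is simply a threshold placed halfway between the mean reading of each class.

```python
from statistics import mean


def fit_threshold(temps, labels):
    """Learn a cut-off halfway between the mean 'rest' and mean 'active' readings."""
    rest = [t for t, y in zip(temps, labels) if y == 0]
    active = [t for t, y in zip(temps, labels) if y == 1]
    return (mean(rest) + mean(active)) / 2


def accuracy(threshold, temps, labels):
    """Fraction of readings classified correctly by 'active if above threshold'."""
    preds = [1 if t > threshold else 0 for t in temps]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical sensor readings (invented values); 0 = at rest, 1 = active.
rain_temps = [18, 20, 22, 19, 21, 58, 60, 62, 59, 61]
rain_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# During the heatwave every reading is shifted upwards.
heat_temps = [43, 45, 47, 44, 46, 83, 85, 87, 84, 86]
heat_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

threshold = fit_threshold(rain_temps, rain_labels)       # learned on rainy days
acc_rain = accuracy(threshold, rain_temps, rain_labels)  # data the model was fit on
acc_heat = accuracy(threshold, heat_temps, heat_labels)  # same model, drifted data

# Adapting the model: refit the threshold on the new data.
new_threshold = fit_threshold(heat_temps, heat_labels)
acc_heat_refit = accuracy(new_threshold, heat_temps, heat_labels)

print(threshold, acc_rain, acc_heat)
print(new_threshold, acc_heat_refit)
```

With these invented readings, the threshold learned in the rain sits below every heatwave reading, so the old model labels the resting volcano as active. Refitting on the new data is the simplest form of adaptation. It assumes labelled heatwave data is available.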
Concept drift also shows up in shopping during Diwali in India. On normal days, shopping activity follows its usual pattern, but around Diwali it rises sharply. The statistics below illustrate this.
Figure 3: Shopping Statistics During Diwali