Mining Stream, Time Series, and Sequence Data
Methodologies for Stream Data Processing and Stream Data Systems
  • Random Sampling
  • Sliding Windows
  • Histograms
  • Multiresolution Methods
  • Sketches and Synopses
Randomized Algorithms to analyze Data Streams
Randomized algorithms, in the form of random sampling and sketching, are often used to deal with massive, high-dimensional data streams.
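As one illustration of random sampling over a stream, the sketch below uses reservoir sampling, a standard technique that keeps a uniform sample of k items without knowing the stream length in advance. The slides do not name a specific sampling algorithm, so the choice of reservoir sampling and the names `reservoir_sample`, `k`, and `seed` are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a single pass over stream."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Memory use is O(k) regardless of how long the stream runs, which is why this kind of sampling suits data that cannot be stored in full.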
Data Stream Management Systems and Stream Queries
In traditional database systems, data are stored in finite, persistent databases. Stream data, by contrast, are potentially infinite and impossible to store fully in a database. In a Data Stream Management System (DSMS), there may be multiple data streams. Once an element from a data stream has been processed, it is discarded or archived, and it cannot be easily retrieved unless it is explicitly stored in memory.
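A minimal sketch of what a query over such a stream can look like: a running mean over a sliding window, computed in one pass. Only the most recent `window` elements are held in memory; older elements are dropped once they leave the window. The function name `sliding_mean` and its parameters are hypothetical and do not come from any particular DSMS.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the last `window` elements after each new arrival."""
    buf: Deque[float] = deque()
    total = 0.0
    for x in stream:
        buf.append(x)
        total += x
        if len(buf) > window:
            # The oldest element leaves the window and is discarded.
            total -= buf.popleft()
        yield total / len(buf)
```

For example, `list(sliding_mean([1, 2, 3, 4], window=2))` yields `[1.0, 1.5, 2.5, 3.5]`.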
Critical Layers of stream data cube
Two critical cuboids (or layers):
  • The first layer, called the minimal interest layer, is the minimally interesting layer that an analyst would like to study.
  • The second layer, called the observation layer, is the layer at which an analyst (or an automated system) would like to continuously study the data.
Hoeffding Tree Algorithm
The Hoeffding tree algorithm is a decision tree learning method for stream data classification. It was initially used to track Web click streams and construct models to predict which Web hosts and Web sites a user is likely to access. It typically runs in sublinear time and produces a decision tree nearly identical to that of traditional batch learners. It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute. The Hoeffding bound quantifies how many examples a node must see so that, with a chosen probability, the attribute selected from the sample is the same one that would be selected from unlimited data.
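The split test can be sketched as follows. For a quantity with range R observed n times, the Hoeffding bound states that, with probability 1 - δ, the true mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean. A node splits once the observed gap between the best and second-best attribute exceeds ε. The function names and parameters below are illustrative, not taken from the slides.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that the true mean is within epsilon of the sample mean
    with probability 1 - delta, after n independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because ε shrinks as n grows, a clear winner is confirmed after few examples, while close contenders require more data before the node commits to a split.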
Very Fast Decision Tree (VFDT)
The VFDT (Very Fast Decision Tree) algorithm makes several modifications to the Hoeffding tree algorithm. The modifications include:
  • breaking near-ties during attribute selection more aggressively,
  • recomputing the split evaluation function G only after a number of new training examples have arrived,
  • deactivating the least promising leaves whenever memory is running low,
  • dropping poor splitting attributes, and
  • improving the initialization method.
VFDT works well on stream data and also compares extremely well to traditional classifiers in both speed and accuracy. To adapt to concept-drifting data streams, VFDT was further developed into the Concept-adapting Very Fast Decision Tree algorithm (CVFDT).
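Two of the VFDT modifications, near-tie breaking and evaluating G only periodically, can be sketched as a single split check. This is a simplified illustration: the parameter names `tie_threshold` and `grace_period` and their default values are assumptions chosen for the example, and the gains are passed in rather than computed from leaf statistics.

```python
import math


def vfdt_should_split(best_gain: float, second_gain: float, n_seen: int,
                      value_range: float = 1.0, delta: float = 1e-7,
                      tie_threshold: float = 0.05, grace_period: int = 200) -> bool:
    """Decide whether a leaf should split after n_seen examples."""
    # Evaluate the split criterion only every `grace_period` examples,
    # since recomputing G for each arrival is wasteful.
    if n_seen == 0 or n_seen % grace_period != 0:
        return False
    eps = math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n_seen))
    # Clear winner: the gap exceeds the Hoeffding bound.
    if best_gain - second_gain > eps:
        return True
    # Near-tie: once eps is below the threshold, the candidates are so
    # close that waiting longer is not worthwhile, so split anyway.
    return eps < tie_threshold
```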
Concept-adapting Very Fast Decision Tree algorithm (CVFDT)
CVFDT also uses a sliding window approach; however, it does not construct a new model from scratch each time. Rather, it updates statistics at the nodes by incrementing the counts associated with new examples and decrementing the counts associated with old ones. Therefore, if there is a concept drift, some nodes may no longer pass the Hoeffding bound. When this happens, an alternate subtree will be grown, with the new best splitting attribute at the root.
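The increment/decrement bookkeeping can be sketched for a single attribute at a single node. The class below is a simplified, hypothetical illustration (`WindowedCounts` is not a name from CVFDT itself); a full implementation keeps such counts per attribute at every node and also manages the alternate subtrees.

```python
from collections import Counter, deque
from typing import Deque, Hashable, Tuple


class WindowedCounts:
    """(attribute value, class label) counts over the last `window` examples."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.examples: Deque[Tuple[Hashable, Hashable]] = deque()
        self.counts: Counter = Counter()

    def add(self, attr_value: Hashable, label: Hashable) -> None:
        # Increment the count for the new example.
        self.examples.append((attr_value, label))
        self.counts[(attr_value, label)] += 1
        if len(self.examples) > self.window:
            # Decrement the count for the example that fell out of the window.
            old = self.examples.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
```

Split statistics computed from `counts` therefore always reflect the current window, so a change in the underlying concept shows up without rebuilding the model.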
A Classifier Ensemble Approach to Stream Data Classification
The idea is to train an ensemble, or group, of classifiers (using, say, naïve Bayes) from sequential chunks of the data stream. Whenever a new chunk arrives, we build a new classifier from it. The individual classifiers are weighted based on their expected classification accuracy in a time-changing environment. Only the top-k classifiers are kept. The decisions are then based on the weighted votes of the classifiers.
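The chunk-based procedure can be sketched as below. Several simplifications are assumed here and are not from the slides: each classifier's weight is its accuracy on the most recent chunk (used as a stand-in for expected accuracy), the new classifier is scored on the chunk it was trained on (which is optimistic), and a trivial majority-class learner, `train_majority`, stands in for naïve Bayes to keep the example self-contained.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Example = Tuple[float, Hashable]          # (feature, label)
Classifier = Callable[[float], Hashable]  # feature -> predicted label


def train_majority(chunk: Sequence[Example]) -> Classifier:
    """Hypothetical stand-in learner: always predicts the chunk's majority label."""
    labels = [y for _, y in chunk]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def accuracy(clf: Classifier, chunk: Sequence[Example]) -> float:
    return sum(clf(x) == y for x, y in chunk) / len(chunk)


class ChunkEnsemble:
    def __init__(self, train: Callable[[Sequence[Example]], Classifier], k: int) -> None:
        self.train = train
        self.k = k
        self.members: List[Tuple[float, Classifier]] = []  # (weight, classifier)

    def update(self, chunk: Sequence[Example]) -> None:
        # Build a new classifier from the chunk, then re-weight every
        # candidate by its accuracy on this latest chunk.
        candidates = [clf for _, clf in self.members] + [self.train(chunk)]
        scored = [(accuracy(clf, chunk), clf) for clf in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        self.members = scored[: self.k]  # keep only the top-k

    def predict(self, x: float) -> Hashable:
        votes: Dict[Hashable, float] = defaultdict(float)
        for weight, clf in self.members:
            votes[clf(x)] += weight  # weighted vote
        return max(votes, key=lambda label: votes[label])
```

After a drift, classifiers trained on the old concept score poorly on new chunks, so their votes count for less and they eventually drop out of the top k.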
Clustering in evolving data streams
  • Compute and store summaries of past data
  • Apply a divide-and-conquer strategy
  • Cluster incoming data streams incrementally
  • Perform microclustering as well as macroclustering analysis
  • Explore multiple time granularities for the analysis of cluster evolution
  • Divide stream clustering into on-line and off-line processes
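One common way to store a summary of past data is an additive cluster feature: the point count, the per-dimension linear sum, and the per-dimension sum of squares. The slides do not specify a summary structure, so this choice and the class name `MicroCluster` are assumptions made for illustration. Such a summary can absorb new points incrementally on-line, and two summaries can be merged later during off-line macroclustering.

```python
import math
from typing import List, Sequence


class MicroCluster:
    """Additive summary (n, linear sum, square sum) of the points absorbed so far."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.ls: List[float] = [0.0] * dim  # per-dimension sum of values
        self.ss: List[float] = [0.0] * dim  # per-dimension sum of squares

    def absorb(self, point: Sequence[float]) -> None:
        self.n += 1
        for i, v in enumerate(point):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other: "MicroCluster") -> None:
        # Summaries are additive, so merging needs no access to raw points.
        self.n += other.n
        for i in range(len(self.ls)):
            self.ls[i] += other.ls[i]
            self.ss[i] += other.ss[i]

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the absorbed points from the centroid.
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))
```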
Mining Time-Series Data
A time-series database consists of sequences of values or events obtained over repeated measurements of time.
  • Trend Analysis
  • Similarity Search in Time-Series Analysis
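As a minimal illustration of similarity search, the sketch below scans a series for the subsequence closest to a query under Euclidean distance. This brute-force scan is an assumption chosen for clarity; the slides do not describe a particular method, and practical systems typically add normalization, dimensionality reduction, or indexing. The name `best_match` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def best_match(series: Sequence[float], query: Sequence[float]) -> Tuple[int, float]:
    """Return (start index, distance) of the subsequence of `series`
    closest to `query` under Euclidean distance."""
    m = len(query)
    if m == 0 or m > len(series):
        raise ValueError("query must be non-empty and no longer than series")
    best_index, best_dist = -1, math.inf
    for i in range(len(series) - m + 1):
        dist = math.sqrt(sum((series[i + j] - query[j]) ** 2 for j in range(m)))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist
```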
Markov Chain for sequence analysis
A Markov chain is a model that generates sequences in which the probability of a symbol depends only on the previous symbol.
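Under this definition, the probability of a whole sequence is the probability of its first symbol multiplied by one transition probability per subsequent symbol. A minimal sketch, with `initial` and `transition` as hypothetical dictionaries holding the model parameters:

```python
from typing import Dict, Sequence


def sequence_probability(seq: Sequence[str],
                         initial: Dict[str, float],
                         transition: Dict[str, Dict[str, float]]) -> float:
    """P(seq) under a first-order Markov chain."""
    if not seq:
        return 1.0
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        # Each symbol depends only on the one immediately before it.
        p *= transition[prev][cur]
    return p
```

For instance, with `initial = {"A": 0.5, "B": 0.5}` and transitions A→A 0.9, A→B 0.1, the sequence "AAB" has probability 0.5 × 0.9 × 0.1 = 0.045.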
Tasks using hidden Markov models include:
  • Evaluation: Given a sequence, x, determine the probability, P(x), of obtaining x in the model.
  • Decoding: Given a sequence, determine the most probable path through the model that produced the sequence.
  • Learning: Given a model and a set of training sequences, find the model parameters (i.e., the transition and emission probabilities) that explain the training sequences with relatively high probability.
Different algorithms in sequence analysis
  • Forward Algorithm (evaluation)
  • Viterbi Algorithm (decoding)
  • Baum-Welch Algorithm (learning)
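The evaluation and decoding tasks can be sketched with compact versions of the forward and Viterbi algorithms. The dictionaries `start`, `trans`, and `emit` are hypothetical containers for the initial, transition, and emission probabilities, and the observation sequence is assumed to be non-empty. Probabilities are multiplied directly, which is fine for short sequences; long sequences usually call for log-space or scaling. Baum-Welch is omitted for brevity.

```python
from typing import Dict, List, Sequence, Tuple

Probs = Dict[str, Dict[str, float]]


def forward(obs: Sequence[str], states: Sequence[str],
            start: Dict[str, float], trans: Probs, emit: Probs) -> float:
    """Evaluation: P(obs), summed over all hidden state paths."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())


def viterbi(obs: Sequence[str], states: Sequence[str],
            start: Dict[str, float], trans: Probs,
            emit: Probs) -> Tuple[List[str], float]:
    """Decoding: the most probable hidden state path and its joint probability."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            prob, path = max(((best[r][0] * trans[r][s], best[r][1]) for r in states),
                             key=lambda t: t[0])
            new_best[s] = (prob * emit[s][o], path + [s])
        best = new_best
    prob, path = max(best.values(), key=lambda t: t[0])
    return path, prob
```

The two functions share the same recurrence; the forward algorithm sums over predecessor states where Viterbi takes the maximum, so the Viterbi path probability can never exceed the forward probability.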
Visit more self-help tutorials
Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net
