2031ICT Data Analytics Methods
Lecture 9: Time Series and Sequence Data Analysis
Outline
§ Time-series analysis
• Definition
• Types of analysis
§ Sequence data analysis
• Sequence pattern mining
• Natural language processing
Stream data
§ There are two types of stream data:
• Time series
• Sequence data
• A time series is a series of data points indexed in time order.
• Usually, a time series is a sequence taken at successive, equally spaced points in time.
• Sequence data is a series of ordered elements or events recorded with or without a
concrete notion of time.
• Therefore, a time series can be considered a special case of sequence data.
Time series
§ A time series is often plotted as a run chart, a graph that displays observed
data in time order.
§ Data is usually recorded at regular intervals.
§ Examples include weather data, heart monitoring (ECG/EKG), brain
monitoring (EEG), quarterly sales, stock prices, industry forecasts and interest rates.
Time series arise in virtually any domain of applied science and engineering that involves
temporal measurements.
Method of time series analysis
§ The methods can be divided into frequency-domain methods and time-domain
methods.
• Frequency: the number of occurrences of a repeating event per unit of time
https://www.youtube.com/watch?v=fYtVHhk3xJ0
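To make the frequency-domain idea concrete, here is a minimal sketch that uses the direct discrete Fourier transform (DFT) definition to find the dominant frequency of a signal. The signal, the helper name `dft_magnitudes` and the sample count are assumptions chosen for illustration; in practice a fast Fourier transform library would be used instead of this O(N^2) loop.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude |X[k]| for k = 0..N//2, computed from the DFT definition."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags

# Hypothetical signal: 64 samples of a sine wave that repeats 4 times, plus a constant offset.
n = 64
signal = [2.0 + math.sin(2 * math.pi * 4 * t / n) for t in range(n)]

mags = dft_magnitudes(signal)
# Skip k = 0 (the constant offset) and pick the strongest repeating component.
dominant = max(range(1, len(mags)), key=lambda k: mags[k])
print(dominant)  # number of cycles per 64 samples
```

A time-domain method would work with `signal` directly (for example, by averaging neighbouring values), whereas the frequency-domain view above describes the same data by how strongly each repetition rate is present.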
Method of time series analysis
§ The methods can also be divided into parametric and non-parametric methods.
§ Parametric approaches assume that the underlying process, which evolves over
time, has a certain structure that can be described using a small number of
parameters (e.g., mean and standard deviation). The task is to estimate the parameters
of the model.
§ Non-parametric approaches estimate the properties of the process directly,
without assuming that the process has any particular structure.
§ Methods of time series analysis may also be divided into linear and
non-linear (regression) methods.
Motivation of time series
§ In the context of statistics, econometrics, quantitative finance, seismology,
meteorology, and geophysics the primary goal of time series analysis is
forecasting. The goal is to predict what will happen.
§ In the context of signal processing, control engineering and communication
engineering, it is used for signal detection. The goal is to detect anomalies,
events, and their reasons.
§ Other applications are in data mining, pattern recognition and machine learning,
where time series analysis can be used for clustering, classification, query by
content, anomaly detection as well as forecasting.
Type of time series analysis
§ Descriptive analysis: Identifies patterns in time series data, such as trends and cycles,
and highlights the main characteristics of the data, usually in a visual format
§ Curve fitting: Plots the data along a curve to study the relationships of variables
within the data
§ Prediction and forecasting: Predicts future data based on historical trends
§ Classification: Identifies and assigns categories to the data
§ Segmentation: Splits the data into segments to show the underlying properties of
the source information
§ Anomaly detection: Finds outlier data or events
Descriptive (exploratory) analysis
§ Compute descriptive statistical measures: mean, median, max, min, etc.
• These statistics do not take time into account, so all points are treated as
equivalent and the time series reduces to an ordinary dataset.
§ Trend, cycle, seasonality
• Requires mathematical modelling
Marcel Dettling, Applied Time Series Analysis, 2020
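The point that summary statistics ignore time can be checked with a short sketch. The readings below are made-up values with an upward trend. Shuffling them destroys the trend but leaves every statistic unchanged.

```python
import random
import statistics

# Hypothetical monthly readings with an upward trend.
series = [10, 12, 13, 15, 18, 21, 22, 25]

def describe(values):
    """Order-independent summary of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

# Shuffle a copy with a fixed seed so the example is repeatable.
shuffled = series[:]
random.Random(0).shuffle(shuffled)

# The summaries are identical even though the time order has been lost.
print(describe(series) == describe(shuffled))  # True
```

This is why trend, cycle and seasonality need models that use the time index explicitly.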
Curve fitting
§ Curve fitting is the process of constructing a curve,
or mathematical function, that has the best fit to a
series of data points, possibly subject to constraints.
§ Curve fitting can involve either:
§ Interpolation, where an exact fit to the data is required, or
§ Smoothing, in which a "smooth" function is constructed that approximately fits the
data.
§ Extrapolation is the use of a fitted curve beyond the range of the observed data.
It is subject to a degree of uncertainty, since it may reflect the method used to
construct the curve as much as it reflects the observed data.
Curve fitting
§ The construction of an economic time series involves the estimation of some
components for some dates by interpolation between values for earlier and later
dates.
§ Interpolation is useful where the data surrounding the missing data is available and
its trend, seasonality, and longer-term cycles are known.
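§ Alternatively, polynomial interpolation fits one polynomial through all the data points,
while spline interpolation fits piecewise polynomial functions to successive time
intervals so that the pieces join smoothly.
§ Difference between polynomial regression, polynomial interpolation and spline interpolation:
§ Polynomial regression gives a single, usually low-degree polynomial that approximates the
entire data set without necessarily passing through any data point.
§ Polynomial interpolation gives a single polynomial that passes exactly through every data point.
§ Spline interpolation yields a piecewise function composed of many low-degree polynomials
that also passes through every data point.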
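The sketch below contrasts interpolation with regression on a tiny series. The quarterly values and the helper names `linear_interpolate` and `fit_line` are assumptions made for illustration. Linear interpolation fills the missing quarter from its two neighbours, while a least-squares line (polynomial regression of degree 1) summarises all points and can be extended beyond them.

```python
def linear_interpolate(t0, y0, t1, y1, t):
    """Value at time t on the straight line through (t0, y0) and (t1, y1)."""
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

def fit_line(ts, ys):
    """Least-squares line y = a + b*t; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    sxx = sum((t - mean_t) ** 2 for t in ts)
    b = sxy / sxx
    return mean_y - b * mean_t, b

# Hypothetical quarterly values; quarter 3 is missing.
ts = [1, 2, 4, 5]
ys = [10.0, 12.0, 15.0, 19.0]

filled = linear_interpolate(2, 12.0, 4, 15.0, 3)  # interpolation: inside the observed range
a, b = fit_line(ts, ys)
forecast = a + b * 6                               # extrapolation: outside the observed range
print(filled, round(forecast, 2))
```

The interpolated value depends only on the two neighbouring observations, whereas the extrapolated value depends on the assumed straight-line model, which is why extrapolation carries more uncertainty.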
Curve fitting
https://stackoverflow.com/questions/30433391/how-can-i-produce-multi-point-linear-interpolation
Prediction and forecasting
§ Time series forecasting is one of the most widely applied data science techniques in
business, finance, supply chain management, production and inventory planning.
Many prediction problems involve a time component and thus require extrapolation
of time series data, or time series forecasting.
§ Time series forecasting is also an important area of machine learning (ML) and can
be cast as a supervised learning problem. ML methods such as regression, neural
networks and support vector machines can be applied to it. Forecasting involves
taking models fit on historical data and using them to predict future observations.
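One common way to cast forecasting as supervised learning is a sliding window: the previous few observations become the input features and the next observation becomes the target. The sketch below assumes a hypothetical helper `make_supervised` and made-up data. Any regression model could then be trained on the resulting pairs.

```python
def make_supervised(series, n_lags):
    """Build (inputs, target) pairs: the previous n_lags values predict the next value."""
    rows = []
    for i in range(n_lags, len(series)):
        rows.append((series[i - n_lags:i], series[i]))
    return rows

series = [5, 7, 9, 11, 13, 15]  # hypothetical observations
for inputs, target in make_supervised(series, n_lags=3):
    print(inputs, "->", target)
# [5, 7, 9] -> 11
# [7, 9, 11] -> 13
# [9, 11, 13] -> 15
```

The rows must stay in time order when splitting into training and test sets, so that the model is never evaluated on observations that precede its training data.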
§ Time series forecasting means predicting future values over a period of time.
It entails developing models based on previous data and applying
them to make predictions and guide future strategic decisions.
§ The future is forecast or estimated based on what has already happened. A time
series adds a time-order dependence between observations. This dependence is
both a constraint and a structure that provides a source of additional information.
Before we discuss time series forecasting methods, let’s define time series
forecasting more closely.
Goals of forecasting
§ When forecasting, it is important to understand your goal. To narrow down the
specifics of your predictive modelling problem, ask questions about:
• Volume of data available: more data is often more helpful, offering greater
opportunity for exploratory data analysis, model testing and tuning.
• Required time horizon of predictions: shorter time horizons are often easier to
predict, with higher confidence, than longer ones.
• Forecast update frequency: forecasts might need to be updated frequently
over time; updating forecasts as new information becomes available often
results in more accurate predictions.
• Forecast temporal frequency: forecasts can often be made at lower or higher
frequencies than the raw data, which allows the data to be down-sampled or up-sampled.
Methods of forecasting - Decompositional Method
• Time series data can exhibit a variety of patterns, so it is often helpful to split a
time series into components, each representing an underlying pattern category. This is
what decompositional models do.
• We often know how each component affects the progression of the time series, e.g., a
Boxing Day sales surge, which makes decomposition less difficult.
https://quantdare.com/decomposition-to-improve-time-series-prediction/
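A minimal sketch of one decompositional approach, classical additive decomposition, is shown below. It assumes the series is the sum of a trend, a seasonal component and a residual, and that the seasonal period is known and odd, so that a plain centred moving average can estimate the trend. The function name and the synthetic data are illustrative assumptions, not the method used in the linked article.

```python
def decompose_additive(series, period):
    """Split series into trend + seasonal + residual (additive model, odd period)."""
    half = period // 2
    n = len(series)
    # Trend: centred moving average over one full period (undefined at the edges).
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period
    # Seasonal: average detrended value at each position within the period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    offset = sum(means) / period  # centre the seasonal component on zero
    seasonal = [means[t % period] - offset for t in range(n)]
    # Residual: whatever the trend and seasonal components do not explain.
    residual = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual

# Synthetic series: linear trend plus a repeating pattern of period 3.
pattern = [2.0, -1.0, -1.0]
series = [10.0 + t + pattern[t % 3] for t in range(12)]

trend, seasonal, residual = decompose_additive(series, period=3)
print(trend[1:4], seasonal[:3])
```

Once the components are separated, each can be forecast with a model suited to it and the forecasts added back together.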
Methods of forecasting - Smoothing-based
§ In time series forecasting, data smoothing is a statistical technique that removes
noise (and damps the effect of outliers) in a time series data set to make a
pattern more visible. Smoothing removes or reduces random variation and reveals
underlying trends and cyclic components.
https://statisticsbyjim.com/time-series/exponential-smoothing-time-series-forecasting/
Smoothing-based forecasting - Moving average
§ A moving average replaces each value with the average of the current and several
past values of the time series, so the output depends linearly on those values.
§ Example:
https://statisticsbyjim.com/time-series/moving-averages-smoothing/
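As a rough illustration of smoothing, the sketch below implements a trailing simple moving average and, for comparison, simple exponential smoothing. The sales figures, the window size and the smoothing factor `alpha` are assumptions chosen for the example.

```python
def moving_average(series, window):
    """Trailing simple moving average; the first window-1 positions have no value."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

def exponential_smoothing(series, alpha):
    """s[t] = alpha * x[t] + (1 - alpha) * s[t-1], starting from s[0] = x[0]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 12, 11, 30, 12, 13, 14]  # hypothetical data with one spike
print(moving_average(sales, 3))
print(exponential_smoothing(sales, 0.5))
```

In both cases the spike of 30 is spread out and reduced. The last smoothed value can serve as a naive one-step-ahead forecast.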
Classification
§ Assigning a time series pattern to a specific category.
§ It can be used for handwriting recognition, voice recognition,
speaker recognition, ECG/EEG signal classification and so on.
Classification – dynamic time warping
§ Dynamic time warping (DTW) combined with K-nearest neighbors (K-NN)
has long been a benchmark for other time series classification algorithms to
beat.
§ Idea: align two series by locally stretching or compressing the time axis, so that
similar shapes can be matched even when they unfold over longer or shorter time spans.
§ A new incoming time series is classified by finding the K most similar time
series in the training data and assigning it to the class that
appears most often among them.
Classification – dynamic time warping
§ DTW needs to calculate the distance between two time series
(to tell whether they are close or not). While we can take the distance between
points of the two series, it is not necessarily clear which points in one series should be
compared with which points in the other.
§ DTW solves this by pairing up time points, drawing lines
between the two series in such a way that every time point in each series is
connected to at least one time point in the other series and no two lines
cross. The distance is the sum of the differences between the paired values,
minimised over all such pairings.
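The pairing described above can be computed with dynamic programming. The sketch below is a minimal, unoptimised version that uses the absolute difference as the pointwise cost and K = 1 for the nearest-neighbour step. The training series, labels and function names are made up for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with |x - y| as the pointwise cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat a point of b, repeat a point of a, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify_1nn(query, training):
    """Label of the training series closest to query under DTW (K-NN with K = 1)."""
    return min(training, key=lambda item: dtw_distance(query, item[0]))[1]

training = [
    ([0, 1, 2, 3, 2, 1, 0], "peak"),
    ([3, 2, 1, 0, 1, 2, 3], "valley"),
]
query = [0, 0, 1, 2, 3, 3, 2, 1, 0]  # the same peak shape, stretched in time

print(dtw_distance(query, training[0][0]), classify_1nn(query, training))  # 0.0 peak
```

The query has a different length from the training series, yet its DTW distance to the "peak" example is zero because repeated values can be paired with a single point in the other series.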
Segmentation
§ Splitting a time-series into a sequence of segments. It is often the case that a time-
series can be represented as a sequence of individual segments, each with its own
characteristic properties. For example, the audio signal from an audio conference
call can be partitioned into pieces corresponding to the times during which each
person was speaking. In time-series segmentation, the goal is to identify the
segment boundary points in the time-series, and to characterize the dynamical
properties associated with each segment.
Anomaly detection
§ An outlier is a data point that deviates so much from some standard or usual pattern as to
arouse suspicion that it was generated by a different mechanism.
§ The question is whether any abnormal signals or events are observed.
https://neptune.ai/blog/anomaly-detection-in-time-series
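One simple way to flag outliers in a time series, sketched below under stated assumptions, is to compare each new value with the mean and standard deviation of a short window of preceding values. The window length, the threshold of 3 standard deviations and the sensor readings are illustrative choices. The linked article covers more robust techniques.

```python
import statistics

def find_anomalies(series, window, threshold):
    """Indices whose value is more than `threshold` standard deviations away
    from the mean of the preceding `window` observations."""
    flagged = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        mean = statistics.mean(history)
        spread = statistics.stdev(history)
        # Skip flat windows, where the standard deviation is zero.
        if spread > 0 and abs(series[t] - mean) / spread > threshold:
            flagged.append(t)
    return flagged

readings = [20, 21, 19, 20, 22, 21, 20, 35, 21, 20]  # hypothetical sensor values
print(find_anomalies(readings, window=5, threshold=3.0))  # [7]
```

Using only past values for the baseline means the check can run as data arrives. Note that an anomaly inflates the spread of the windows that follow it, which can hide further anomalies for a while.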
Sequential pattern mining
§ Sequential pattern mining aims to find statistically relevant patterns among data
examples whose values are delivered in a sequence.
§ One of the most important tasks in sequential pattern mining is string mining, which
aims to find a string in a sequence. Examples include finding
words, phrases or sentences in a long text, or finding a particular pattern of the nucleotide
bases 'A', 'G', 'C' and 'T' in DNA sequences or of amino acids in protein sequences.
Such techniques have been used in analysing the genome sequences of the COVID-19 virus.
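The simplest form of string mining, exact pattern search, can be sketched with Python's built-in `str.find`. The DNA fragment below is made up for illustration and is not real genomic data.

```python
def find_all(sequence, pattern):
    """Start positions of every occurrence of pattern in sequence, including overlapping ones."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are found too.
        start = sequence.find(pattern, start + 1)
    return positions

dna = "ATGCGATATAGCATAT"  # made-up fragment
print(find_all(dna, "ATA"))  # [5, 7, 12]
```

Real sequence analysis typically needs approximate matching and indexes built for very long strings, which this sketch does not attempt.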
Sequential pattern mining
§ A problem in sequence mining is to find frequent itemsets and the order in which they
appear.
§ For example, we may want to find out whether "if a {customer buys a book},
he or she is likely to {buy another book} next month", or, in the context of stock
prices, "if the {price of Apple rises}, it is likely that the {price of Samsung rises} in the same
week".
§ Itemset mining is useful for finding relationships between frequently co-occurring items in
large transaction databases. The information can then be used to develop a recommendation
system.
Sequential pattern mining – definition
§ A sequence is an ordered list of elements: s = <e1, e2, e3, e4, e5, e6>.
§ Element e1 happens before e2 which is before e3, and so on.
§ Take the steps to put an elephant into a fridge as an example:
• Open the door of the fridge
• Put the elephant into the fridge
• Close the door
So s = <open the door, put the elephant, close the door>
Sequence dataset
§ A sequence dataset for online transactions may look like this:

Customer ID   Transaction ID   Purchased
C1            1                a,b,c,d
C2            2                a,f,c,e
C1            3                d,f
C3            4                b,c,e,f
C2            5                a,c,d,e
C1            6                a,e,d

§ Then the transaction sequence of customer 1 is:
S1 = <{a,b,c,d}, {d,f}, {a,e,d}>. We can then check whether another customer has a
sequence or subsequence similar to S1, and use that information to recommend items
that this new customer may be interested in.
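The step from the transaction table to per-customer sequences can be sketched as follows. The rows are copied from the table above. The function name is hypothetical, and the code assumes that transaction IDs increase with time.

```python
from collections import defaultdict

# (customer ID, transaction ID, purchased items), copied from the transaction table.
transactions = [
    ("C1", 1, {"a", "b", "c", "d"}),
    ("C2", 2, {"a", "f", "c", "e"}),
    ("C1", 3, {"d", "f"}),
    ("C3", 4, {"b", "c", "e", "f"}),
    ("C2", 5, {"a", "c", "d", "e"}),
    ("C1", 6, {"a", "e", "d"}),
]

def build_sequences(rows):
    """Group itemsets per customer, ordered by transaction ID (assumed to reflect time)."""
    sequences = defaultdict(list)
    for customer, _tid, items in sorted(rows, key=lambda row: row[1]):
        sequences[customer].append(items)
    return dict(sequences)

sequences = build_sequences(transactions)
print(sequences["C1"])
```

Each customer's sequence is a list of itemsets. The order of the list matters, while the order of items inside one itemset does not.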
Subsequence
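§ A sequence $\langle a_1, a_2, \ldots, a_n \rangle$ is contained in another sequence
$\langle b_1, b_2, \ldots, b_m \rangle$ ($m \ge n$) if there exist integers
$1 \le i_1 < i_2 < \cdots < i_n \le m$ such that
$a_1 \subseteq b_{i_1},\ a_2 \subseteq b_{i_2},\ \ldots,\ a_n \subseteq b_{i_n}$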
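The containment test can be sketched as a greedy scan: each element of the candidate is matched to the earliest remaining element of the data sequence that includes it as a subset. The function name is an assumption. The data sequence is S1 from the transaction example.

```python
def is_contained(a, b):
    """True if sequence a (a list of itemsets) is contained in sequence b."""
    j = 0
    for element in a:
        # Advance through b until an itemset that includes `element` is found.
        while j < len(b) and not element <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1  # later elements of a must match strictly later elements of b
    return True

s1 = [{"a", "b", "c", "d"}, {"d", "f"}, {"a", "e", "d"}]
print(is_contained([{"a", "d"}, {"e"}], s1))  # True
print(is_contained([{"f"}, {"b"}], s1))       # False
```

The second call fails because `b` is only bought before `f` in S1, and the definition requires the matched positions to be strictly increasing.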
Support of a subsequence
§ The support of a subsequence w is defined as the fraction of data sequences
that contain w
§ A sequential pattern is a frequent subsequence (i.e., a subsequence whose
support is ≥ minsup where minsup means minimum support)
§ Given:
• a database of sequences
• a user-specified minimum support threshold, minsup
§ Task:
• Find all subsequences with support ≥ minsup
apriori property for sequences
§ Let D be a database that contains a collection of data sequences d. The support of
a sequence t is the fraction of all data sequences that contain t:
$$s(t) = \frac{|\{d \in D : t \text{ is a subsequence of } d\}|}{|D|}$$
where |D| is the number of sequences in the database
§ apriori property:
§ If a data sequence d contains a sequence t, then it will also contain any
subsequence of t.
§ Therefore: If w is a subsequence of t, then s(w) ≥ s(t).
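The support formula and the Apriori property can be checked on the three customer sequences from the transaction example. This is a minimal sketch: the helper names and the `minsup` value of 0.5 are assumptions, and the containment test is a greedy subset scan.

```python
def is_contained(a, b):
    """True if sequence a (a list of itemsets) is contained in sequence b."""
    j = 0
    for element in a:
        while j < len(b) and not element <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True

def support(pattern, database):
    """Fraction of data sequences in the database that contain the pattern."""
    return sum(is_contained(pattern, d) for d in database) / len(database)

database = [
    [{"a", "b", "c", "d"}, {"d", "f"}, {"a", "e", "d"}],  # customer C1
    [{"a", "f", "c", "e"}, {"a", "c", "d", "e"}],          # customer C2
    [{"b", "c", "e", "f"}],                                # customer C3
]

t = [{"c"}, {"d"}]  # buys c, then d in a later transaction
w = [{"c"}]         # a subsequence of t

minsup = 0.5  # hypothetical threshold
print(support(w, database) >= support(t, database))  # True, as the Apriori property requires
print(support(t, database) >= minsup)                # True, so t is a sequential pattern here
```

Mining algorithms use the property in the other direction: if `w` is not frequent, no sequence that contains `w` needs to be counted.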
Frequent subsequences
If A, B, C, D, E are customers and events are products, then {2,4}, {3},{5} and
{1},{2} are more frequent combinations for recommendation in the following
transactions.
