Ramya Raghavendra
IBM Research
rraghav@us.ibm.com
IMPROVING TRAFFIC PREDICTION USING WEATHER DATA
#EUent7
Pranita Dewan, Joshua Rosenkranz, Ramya Raghavendra, Mudhakar Srivatsa
About me
• PhD, CS from UC Santa Barbara
• Researcher at IBM TJ Watson
Machine Learning Process
Business Understanding
• Challenge
• Why it is important
• Why it is hard
Data Collection
• Traffic
• Weather
• Archival
• Real-time
Data preprocessing
• Cleaning
• Joins
• Spark time series library
Traffic modeling
• ARIMA
• Random forest
• LSTM
Driver behavior data is only valid in the context of what is also happening on the road
UBI – Usage Based Insurance
[Figure: example driver readouts showing Driver Speed, Speed Limit, Reference Speed, Weather Condition, Temp Reading, and Congestion Index]
Limited Analysis can lead to inaccurate assessments, and impact retention
More data, and driver relevant data will lead to greater understanding of behavior and associated risk
The Challenge
With 36.2 billion wasted trucking hours caused by traffic congestion, and the average citizen losing nearly $800 per year in wasted fuel and time, we need to PREDICT traffic to increase efficiency.
• What time should I leave tomorrow to get to Newark the quickest?
• With snow expected in the morning, what time do I need to leave to get to work by 8:00?
• What should I tell my morning viewers about their evening commute today?
Predictive Traffic Demo
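Why It’s Important
TWC TRAFFIC SURVEY
How often people check traffic:
• Several times/day: 22%
• Once/day: 32%
• 2-3 times/week: 13%
• <2 times/week: 6%
• Never: 12%
54% CHECK TRAFFIC DAILY
[Chart: share of respondents who want each type of traffic information before leaving vs. while driving. Category labels are truncated in the source: Drive times …, Drive times for …, Best routes for …, Best routes to get …, How weather is …, Maps showing …]
• Before I leave: 62%, 59%, 63%, 62%, 68%, 63%
• As I'm driving: 31%, 28%, 26%, 26%, 29%, 37%
2:1 PEOPLE WANT TRAFFIC DATA BEFORE THEY LEAVE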
The Challenge – No Easy Task
We historically know general traffic patterns, but many variables can significantly change expectations. Weather is one of the primary variables. So what did we do?
Selected 5 unique cities in different US geographies
Analyzed 1 year of both traffic and weather data
Built a cognitive model that predicts traffic flows from 15 minutes to 24 hours into the future
• 2.58 billion traffic records in the five cities studied
• 262 million weather records in the 1-year study
• Weekday vs. weekend, morning commute vs. evening commute
• Results tabulated on bad weather days, where impacts matter the most
Weather Data
• History on Demand
– Weather features accessed via lat/lon or bounding box
– Hourly historical information from July 2011
• Enhanced Forecast
– Forecasts at 4 km resolution every 15 minutes
https://business.weather.com/products/weather-data-packages
Traffic Data
• Traffic, road and incident data
– 300M sources
– 8M kilometers of road
• Real-time traffic flow information for all functional road classifications
• eXtreme Definition (XD) segments
– 100-350 m long
– Traffic updated every 5 minutes
Setup
Data Sources
• Traffic (historical)
• Weather (historical + predicted)
• Incident Reports (Police, Construction, Traffic Cam, Tweets)
Analytics Platform
• Spark Streaming
• Training
• Scoring
• Apache Spark (1)
• HDFS/Cassandra
Machine Learning Models
• First Order Models: ARIMA/BATS
• Second Order Models: Spatial Correlation, Causality
• Higher Order Models: Random forest, LSTM
(1) Apache Spark extensions to handle time series and geospatial data
Spark-TimeSeries: Library for Distributed Time Series Analytics on Apache Spark
Scale out
• Single JVM: Streams
• Horizontal: ShortTSRDD
• Longitudinal: LongTSRDD
Data types
• Fully templated
• Integers, Doubles, Strings, etc.
• Fully supporting geo locations
Windowing
• Record based
• Time based
• Activity based
Runtime support
• Periodic, Aperiodic, Hybrid
• Aligned/Unaligned timeseries
Multivariate analysis
• Temporal joins
• Record-based join
Languages
• Scala
• Java
• Python*
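The record-based and time-based windowing modes listed above can be illustrated with a small sketch. This is plain Python, not the Spark-TimeSeries API; the function names, the `(timestamp, value)` tuple layout, and the use of seconds as the time unit are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple

Observation = Tuple[int, float]  # (timestamp in seconds, value); hypothetical layout


def record_windows(series: List[Observation], size: int, step: int) -> Iterator[List[Observation]]:
    """Record-based windowing: every window holds exactly `size` records."""
    for start in range(0, len(series) - size + 1, step):
        yield series[start:start + size]


def time_windows(series: List[Observation], width: int) -> Iterator[List[Observation]]:
    """Time-based windowing: every window covers `width` seconds.

    Assumes `series` is sorted by timestamp. Windows that contain no
    records are skipped rather than emitted as empty lists.
    """
    if not series:
        return
    origin = series[0][0]
    current: List[Observation] = []
    current_bucket = 0
    for ts, value in series:
        bucket = (ts - origin) // width
        if bucket != current_bucket and current:
            yield current
            current = []
        current_bucket = bucket
        current.append((ts, value))
    if current:
        yield current
```

With aperiodic data the two modes diverge: record-based windows always have the same number of points, while time-based windows have the same duration but a varying number of points.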
Class: Features/Models
Runtime support
Runtime datatypes
• Java streams
• Short timeseries RDD (horizontal partitioning)
• Long timeseries RDD (longitudinal partitioning)
• Timeseries Partitioner
Runtime timeseries transforms
• Map/Transform
• Segmentation (record, time, burst, regression)
• Temporal Join
• Interpolation (linear, cubic-spline)
• Forecast
• Filter/slice
Algorithms
Unsupervised/Semi-supervised learning
• Similar sequence detection (Damerau-Levenshtein, Dynamic Time Warping)
• Semi-supervised clustering (motif-based)
• Timeseries clustering (k-means, k-shape)
• Subsequence mining (frequent, discriminatory, timeseries motifs)
• Automatic model selection (Autoforecaster), Grid-search (for H-W), Hannan-Rissanen, Yule-Walker
Math
• Kalman Filter, convolution/deconvolution, autocorrelation, cross-correlation, FFT, DCT
Statistical tests
• Ljung-Box test, Augmented Dickey-Fuller test, Granger Causality
Seasonal + Trend Modeling, Non-Linear
• Holt-Winters Additive, Holt-Winters Multiplicative, Segmented Models, Seasonal-Trend Decomposition, Multi-Seasonality, BATS (Box-Cox, ARMA Error)
Linear Modeling
• ARIMA / ARMA, Linear Regression, Ridge Regression, Moving Averaging
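A temporal join is what lines up traffic readings (updated every 5 minutes) with weather observations (hourly or every 15 minutes). The sketch below shows one common form, an "as-of" join that attaches the most recent weather record at or before each traffic timestamp. It is plain Python rather than the library's own join; the function name, the record layout, and the `max_age` cutoff are assumptions.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


def temporal_join(
    traffic: List[Tuple[int, float]],
    weather: List[Tuple[int, Dict[str, float]]],
    max_age: int = 3600,
) -> List[Tuple[int, float, Optional[Dict[str, float]]]]:
    """Attach to each traffic record the latest weather record at or before it.

    Both inputs are assumed to be sorted by timestamp (seconds).
    A weather record older than `max_age` seconds is treated as missing.
    """
    weather_times = [ts for ts, _ in weather]
    joined = []
    for ts, speed in traffic:
        idx = bisect_right(weather_times, ts) - 1  # last weather ts <= traffic ts
        match = None
        if idx >= 0 and ts - weather_times[idx] <= max_age:
            match = weather[idx][1]
        joined.append((ts, speed, match))
    return joined
```

Keeping unmatched traffic rows (with `None` for weather) rather than dropping them leaves the choice of imputation or filtering to a later cleaning step.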
ARIMA Based Model
• ARIMA (autoregressive integrated moving average) – used for time-series forecasting
• Use ARIMA to predict per road segment future speeds based on previously observed values
• Can model hour-of-day and day-of-week patterns
• Cannot handle non-periodic “incidents”
ARIMA parameters:
p: # autoregressive terms
d: # non-seasonal differences needed for stationarity
q: # lagged forecast errors in the prediction equation
[Figures: ARIMA prediction example; 24 hour window prediction errors; prediction errors tail (log-scale axis)]
75% accuracy
Time: ~3 mins (linear scaleout with TSRDD)
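To make the roles of p, d and q concrete, here is a minimal hand-rolled sketch of the simplest non-trivial case, ARIMA(1,1,0): one difference (d=1), one autoregressive term (p=1), and no moving-average term (q=0). It is not the model used in the talk and not the library's ARIMA; the function names are hypothetical, the coefficient is fitted by least squares without an intercept, and there are no seasonal terms, so it would not capture hour-of-day patterns.

```python
from typing import List


def difference(series: List[float]) -> List[float]:
    """First difference (d=1): change in speed between consecutive readings."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs: List[float]) -> float:
    """Least-squares phi for diff[t] = phi * diff[t-1] + noise (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series: List[float], steps: int) -> List[float]:
    """Forecast `steps` values ahead for one road segment's speed series."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # AR(1) step on the differenced series
        last_value += last_diff       # undo the differencing
        forecasts.append(last_value)
    return forecasts
```

For speeds `[60.0, 58.0, 57.0, 56.5]` the differences are `[-2.0, -1.0, -0.5]`, the fitted coefficient is 0.5, and the next two forecasts are 56.25 and 56.125: the slowdown keeps going but decays each step.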
Random Forest Based Model
• Per-road segment regression tree for prediction
• Regression tree features:
– Current speeds on the road segment
– Current speeds on “connected” road segments
– Predicted weather on the road segment
• Connected road segment extraction methodologies: Spatial Radius → Correlation → Causality
Congestion on a road segment affects connected road segments
Accuracy:
• 89% with weather
• 82% without weather
Time: 6-8 mins (linear scaleout with TSRDD)
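The feature list above can be sketched as follows. The example covers only the correlation-based way of picking "connected" segments and the assembly of one feature row; the tree model itself is not shown. The function names, the 0.8 threshold, and the weather field names are assumptions, not details from the talk.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length speed histories."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def connected_segments(target: str, history: Dict[str, List[float]], threshold: float = 0.8) -> List[str]:
    """Segments whose past speeds correlate with the target's at or above `threshold`."""
    return sorted(
        seg for seg, speeds in history.items()
        if seg != target and pearson(history[target], speeds) >= threshold
    )


def feature_row(target: str, current: Dict[str, float], neighbours: List[str],
                weather_forecast: Dict[str, float]) -> List[float]:
    """Own speed, then connected-segment speeds, then predicted weather values."""
    row = [current[target]]
    row += [current[seg] for seg in neighbours]
    row += [weather_forecast[key] for key in sorted(weather_forecast)]
    return row
```

Sorting both the neighbour list and the weather keys keeps the column order stable between training and scoring, which a per-segment model depends on.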
LSTM + Node Embedding as Feature Vector
• Create node embedding
• Concatenate node embedding with time series data
• Node embeddings allow the model to learn spatial components of the graph, while the time series data incorporates the temporal components
Training per node
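One way to read "concatenate node embedding with time series data" is to append the same embedding vector to the speed reading at every time step, so each LSTM input step carries both a temporal and a spatial part. The sketch below builds such inputs in plain Python. The slides do not say how the concatenation is done, so repeating the embedding per time step, the function name, and the window length are all assumptions.

```python
from typing import List


def build_lstm_inputs(speeds: List[float], node_embedding: List[float],
                      window: int) -> List[List[List[float]]]:
    """Return samples indexed as [sample][time step][feature].

    Each time step holds one speed reading followed by the (constant)
    embedding of the road-graph node the series belongs to.
    """
    samples = []
    for start in range(len(speeds) - window + 1):
        sample = [[speeds[start + t]] + list(node_embedding) for t in range(window)]
        samples.append(sample)
    return samples
```

For three readings, a two-dimensional embedding and `window=2`, this yields two samples, each with two time steps of three features.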
Architecture
Inputs: Traffic, Weather (temporal & spatial joins)
Offline
• Files (CSV, Parquet, JSON) on HDFS
• Spark: train models
• One model per city and per prediction time horizon; updated every three months; no raw data is stored
Model Updates
Online
• CSV, JSON (15 min per-city updates)
• Kafka
• Spark Streaming
• REDIS
• REST API
• One Kafka and one Spark streaming job per city; predictions over multiple time horizons are stored against the edge id key in REDIS; the REST API only accesses REDIS
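The serving layout described above (predictions for several horizons stored under an edge id key, with the REST API reading only from the key-value store) can be sketched with an in-memory stand-in. The class below uses a plain dict, not a Redis client; the `city:edge_id` key format, the JSON value encoding, and the method names are assumptions made for illustration.

```python
import json
from typing import Dict, Optional


class PredictionStore:
    """In-memory stand-in for the key-value store (a dict, not Redis)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def write(self, city: str, edge_id: str, predictions: Dict[int, float]) -> None:
        """Called by the streaming job: one value per edge, holding all horizons."""
        key = f"{city}:{edge_id}"  # hypothetical key format
        self._data[key] = json.dumps({str(h): v for h, v in predictions.items()})

    def read(self, city: str, edge_id: str, horizon_minutes: int) -> Optional[float]:
        """Called by the REST layer: a single lookup, no model evaluation."""
        raw = self._data.get(f"{city}:{edge_id}")
        if raw is None:
            return None
        return json.loads(raw).get(str(horizon_minutes))
```

Because every horizon for an edge lives under one key, each 15-minute update overwrites the whole value in a single write, and a request never sees a mix of old and new horizons.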
The Results
Percentage reduction in prediction error
City            Total     Morning rush hour    Evening rush hour
Chicago         34.4%     16.9%                41.5%
Houston         30.6%     19.3%                17.9%
Philadelphia    24.7%     9.5%                 19.5%
Atlanta         15.1%     3.3%                 2.19%
Portland        23.0%     15.3%                23.8%
Significant Improvements in Accuracy in All Geographies Modeled
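The results are reported as a percentage reduction in prediction error. The slides do not state which error metric was used, so the sketch below only shows the arithmetic, assuming a baseline error (model without weather) and an improved error (model with weather) measured in the same units. The function name and the sample numbers are illustrative and are not values from the study.

```python
def percent_error_reduction(baseline_error: float, improved_error: float) -> float:
    """Relative drop in error, in percent, when moving from baseline to improved."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - improved_error) / baseline_error


# Illustrative inputs only: an error of 8.0 falling to 6.0 is a 25% reduction.
example = percent_error_reduction(8.0, 6.0)
```

A negative result would mean that the error got worse after adding the extra data.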
Commuting gets better with Predictive Traffic
Predictive Traffic will significantly impact how drivers plan their day. We will…
• Alert users, before they travel, that their journey may take longer than normal.
• Deliver intelligent mobile tools to find the best times to travel – if at all.
Over time, Predictive Traffic gets smarter by learning from new IoT data: road conditions, local traffic behavior, weather sensors, incidents, user generated feedback, traffic cameras, etc.
Open source details
https://ibm.github.io/
https://www.ibm.com/developerworks
