International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1516
CLASSIFICATION APPROACH FOR BIG DATA DRIVEN TRAFFIC FLOW
PREDICTION USING APACHE SPARK
Riya Gandewar1, Anupama Phakatkar2
1Student, Dept. of Computer Engineering, PICT college, Pune, Maharashtra
2 Professor, Dept. of Computer Engineering, PICT college, Pune, Maharashtra
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Traffic problems are crucial issues in a
rapidly developing society. Traffic flow prediction is an
important problem in Intelligent Transportation Systems.
Over the last few years, traffic data have been exploding, and
we have truly entered the era of big data for transportation.
In big data driven traffic flow prediction systems, accuracy
and timeliness affect the robustness of prediction
performance. Existing traffic flow prediction methods use
different classification approaches and struggle with
accuracy and processing time. To overcome these problems,
the system uses K-Nearest Neighbors (KNN) for
classification and a Convolutional Neural Network (CNN) for
prediction of traffic flow. KNN is used to find route
patterns, that is, how much time is required to travel
from one location to another. The CNN is used to predict
the traffic flow on a particular route. The traffic flow data
processing is done in Apache Spark. The output of the
proposed system is the set of routes with minimum predicted
traffic flow. It is observed that the proposed system
achieves lower prediction error than the compared methods.
Key Words: Big Data, Classification, Prediction, Hadoop,
Apache Spark
1. INTRODUCTION
Accurate and timely traffic flow information is strongly
needed by individual travellers, business sectors, and
government agencies. Traffic flow prediction has gained
more and more attention with the rapid development and
deployment of Intelligent Transportation Systems (ITSs). An
ITS collects traffic data such as traffic volume and speed on
every road and provides statistical summary services, usually
on traffic congestion [15]. Traffic congestion affects
vehicular queuing, travel time, cost, fuel consumption, and
environmental pollution. In a big city, changes in
traffic flow have a large impact on people's daily life, such as
route selection for drivers.
The big data generated by Intelligent Transportation
Systems are worth exploring further for traffic management.
The introduction and development of Intelligent
Transportation Systems have resulted in more reliable traffic
information gathering, analysis, and processing, thereby
providing more timely and precise traffic analysis
and prediction to users.
Traffic flow disruptions can be categorized as predictable
and unpredictable. Predictable disruptions include traffic
signals, stop signs, public transit services, scheduled sporting
events, music concerts, road construction, etc.
Unpredictable disruptions include automobile accidents,
breakdowns, and emergency road closures.
The impact of a disruption on traffic flow depends on its
location and duration [9]. Because of
unpredictable disruptions, long-term forecasting may not be
accurate enough for practical use. However, short-term
traffic prediction, if properly done, may reach an accuracy
level that is useful for several applications. For example, a
driver may only need to know the traffic flow volume over the
next 10 to 30 minutes to decide on a route.
Traffic flow conditions are highly uncertain in today's
complex transportation networks, owing to the
heterogeneous and dynamic nature of traffic and to nonlinear
interactions between drivers and their environment. Moreover,
the traffic state at a specific location is strongly influenced by
its upstream and downstream traffic conditions as well as by
the traffic conditions in past and future periods, and thus
spatial and temporal correlations are inherent features
of traffic flow. Most traditional classification approaches
treat traffic flows as individual, independent
instances and do not consider the correlation information
among traffic flows.
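To make the notion of spatial and temporal correlation concrete, the following is a minimal sketch that computes the Pearson correlation between flow series at two neighbouring detectors, once at the same time steps and once with a one-step lag. The detector names and counts are hypothetical illustration data, not values from the paper's data set.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical 15-minute vehicle counts at two neighbouring detectors.
upstream = [120, 135, 160, 210, 260, 240, 190]
downstream = [110, 118, 140, 165, 215, 255, 235]

# Spatial correlation: same time steps, different locations.
spatial = pearson(upstream, downstream)

# Spatio-temporal correlation: downstream compared with upstream one step earlier.
lagged = pearson(upstream[:-1], downstream[1:])
```

In this made-up example the lagged correlation is higher than the same-step one, which is the kind of dependency that a model treating each flow as an independent instance would miss.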
In the last several decades, there have been many studies on
predicting short-term traffic flow. Recently, various traffic
flow prediction systems, models, and algorithms have been
proposed using statistics-based approaches and
computational-intelligence-based approaches.
Despite their popularity, existing approaches tend to suffer
from several problems. Firstly, in a city where traffic jams
occur, the traffic flows at different locations influence
each other. In a complicated traffic network, the influences
between different locations are determined by the traffic
network structure, signal lights, commercial layout, etc.
Secondly, existing methods require storage of the whole
training set, which can be an excessive amount of storage for
large data sets and leads to long computation time in the
classification stage. The existing methods also build the model
on time-series data regardless of spatial influences. Most of the existing
studies adopt few useful features, which cannot provide
sufficient information for forecasting.
In recent years, deep learning has drawn a lot of attention on
tasks such as classification, natural language processing,
dimensionality reduction, object detection, motion modeling,
etc. The method in [12] first uses a deep network to
reconstruct the input features and then trains a neural network
to predict the traffic flow. The proposed system uses a
convolutional neural network approach under Apache Spark on
the Hadoop platform. A Convolutional Neural Network (CNN) is
used to forecast traffic flow, and KNN is used to find the spatial
influences that exist between locations. The input to the method
is historical traffic flow data from the target location and
historical data from other nearby locations. These data are
grouped together to build a feature matrix for prediction. The
proposed approach for real-time traffic flow prediction aims
to speed up the classification
process and reduce the storage requirements and
processing time. More importantly, with reasonable
execution time, the classification performance can be
significantly improved by flow correlation information, even
under the extremely difficult circumstance of very large
training samples.
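The feature matrix described above can be sketched as follows. This is only one plausible layout, assuming each row concatenates the most recent flow values of the target location with those of its neighbouring locations; the function name, the `lags` parameter, and the sample data are hypothetical and not taken from the paper.

```python
def build_feature_matrix(flows, target, neighbours, lags):
    """Build (X, y) from per-location flow series.

    Each row of X holds the last `lags` flow values of the target location
    followed by the last `lags` values of every neighbour; the matching
    entry of y is the target's flow at the next time step.
    """
    series = [flows[target]] + [flows[name] for name in neighbours]
    length = len(flows[target])
    if any(len(s) != length for s in series):
        raise ValueError("all series must have the same length")
    X, y = [], []
    for t in range(lags, length):
        row = []
        for s in series:
            row.extend(s[t - lags:t])  # history window ending just before t
        X.append(row)
        y.append(flows[target][t])
    return X, y

# Hypothetical flow counts per time step for a target (A) and two neighbours.
flows = {
    "A": [10, 12, 15, 18, 20],
    "B": [8, 9, 11, 14, 17],
    "C": [5, 5, 6, 8, 9],
}
X, y = build_feature_matrix(flows, "A", ["B", "C"], lags=2)
```

With `lags=2`, the first row combines the two earliest readings of A, B, and C, and its label is the third reading of A, so both temporal history and spatial neighbours reach the predictor.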
2. RELATED WORK
Researchers have been trying to make traffic flow
forecasting more accurate for many years. Based on the
length of the time interval, forecasting is divided into
long-term forecasting and short-term forecasting.
Long-term forecasting means making a prediction by
month or by year. The volume of traffic flow is large and
relatively stable, and it is only slightly affected by daily
accidents. For example, Zhongsheng Hou et al. [6] proposed a
method based on ARIMA to forecast traffic flow a month
ahead after they found that their time series data had a
long-term trend and fluctuations at the 12-hour time scale.
For short-term forecasting, autoregressive integrated
moving average (ARIMA) models and artificial neural
network (ANN) [1] models are widely used. Xiangjie
Kong et al. [11] proposed a plan moving average algorithm
that utilizes previous days' historical data. In addition, some
researchers have evaluated the seasonal ARIMA (SARIMA)
model, which is a time-series analysis (TSA) model, and showed
its excellent prediction performance. Fuying Yu et al. [9]
compared SARIMA and nonparametric (data-driven regression)
models. The authors concluded that the SARIMA
model performs better than the nonparametric
regression models.
H. Hu et al. [14] additionally compared time-series
analysis methods and SVR models and concluded that
SARIMA showed the best performance. In addition, there are
some integrated models for this task. For example, Su Yang
et al. [2] proposed a 3-stage model that integrates ARIMA
and ANN, using the ARIMA forecasts as part of
the input to the ANN. X. Chen et al. [16] studied an ensemble
model that consists of a statistical method and a neural
network bagging model.
Yuqi Wang and Wengen Li proposed a method for
predicting traffic congestion correlation between road
segments from GPS trajectories [3]. The method extracts various
features for each pair of road segments from the road network.
These features are input to several classifiers to predict
congestion correlation. The classifiers used include decision
trees, logistic regression, random forests, and support vector
machines.
Besides, researchers also consider the use of integrated
data. Hao-Fan Yang et al. [4] proposed a Gaussian mixture
model (GMM) clustering method to partition the data set for
training an ANN. Deep learning methods have also gained a lot of
attention recently. Jinyoung Ahn et al. [5] used a stacked
autoencoder (SAE) to learn generic traffic flow features. Y. Lv
et al. [12] applied a deep belief network model to traffic flow
prediction, which adopts multitask learning to reduce the
error.
To sum up, all the above-mentioned methods have
desirable properties in different settings, and thus it is
hard to conclude that any one of them is significantly
superior to the others in every situation. One
essential reason is that the accuracy of prediction models
developed with small-scale, separate, specific traffic
data depends on the traffic flow features embedded in the
collected traffic data. Furthermore, most of the existing
models run in stand-alone mode, and thus the
computational effort is expensive [7] and the capability of
data processing and storage is restricted. However,
researchers have also developed a general architecture for
distributed modeling in a MapReduce framework for traffic
flow forecasting [8], to efficiently process large-scale traffic
data on a Hadoop platform.
As of now, there is diverse research across these
techniques and platforms. The proposed work addresses
traffic flow prediction for large-volume traffic data using a
classification approach in Apache Spark.
3. PROPOSED SYSTEM
3.1 Proposed Architecture
Fig. 1 shows the architectural design of the proposed system.
The following are the important modules in the system:
Storage System: Apache HDFS is used as the underlying (underFS)
storage system. Users' CSV files, trained models, and testing and
training data sets are stored in HDFS. Data processing is done
through Apache Spark.
Computation Framework: Apache Spark is used for
computation. The data classifier and predictor are built in Spark. It
accesses traffic flow data in CSV format. Spark's MLlib is used
for training the KNN and CNN models.
Preprocessing: The module fetches data according to the user's
query, analyzes the data, and finds the minimum average
flow, average speed, and average journey time.
Data Classifier and Predictor: The input of the K-Nearest
Neighbor model is the user's current location, the user's destination
location, and a timestamp. It gets the current time and other
related values and collects all parameters such as traffic flow
and average speed. Distance is calculated from source to
destination considering different routes. The top k routes
with their respective parameters are fetched from this. The
input to the CNN is traffic flow, average speed, and average
journey time. All routes are fetched from KNN, and a weight is
calculated for each route. A hash map is generated, and each
link and its traffic flow are stored in it. From the pooling layer,
more accurate values are calculated. The resulting list is stored
as the result.
Fig 1 : Traffic Flow Prediction System architecture
Web Application: Users register for the traffic flow prediction
system using a web application, providing their own details.
After successfully logging in to the site, a user can search for
places they want to go at a particular time to know the
traffic flow at that place. After analyzing the data, the
system outputs different routes, taking into account the input
and the traffic flow at that place.
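As an illustration of the preprocessing module described above, the sketch below averages flow, speed, and journey time per link from CSV text and picks the link with the minimum average flow. It uses plain Python rather than Spark to stay self-contained, and the column names (`link`, `flow`, `speed`, `journey_time`) and sample rows are assumptions, not the actual schema of the paper's data set.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed CSV layout.
SAMPLE_CSV = """link,flow,speed,journey_time
L1,300,60,120
L1,340,55,130
L2,200,70,150
L2,220,66,154
L3,500,40,200
"""

def average_by_link(csv_text):
    """Return per-link averages of flow, speed, and journey time."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])  # flow, speed, time, count
    for row in csv.DictReader(io.StringIO(csv_text)):
        acc = sums[row["link"]]
        acc[0] += float(row["flow"])
        acc[1] += float(row["speed"])
        acc[2] += float(row["journey_time"])
        acc[3] += 1
    return {
        link: {
            "avg_flow": acc[0] / acc[3],
            "avg_speed": acc[1] / acc[3],
            "avg_journey_time": acc[2] / acc[3],
        }
        for link, acc in sums.items()
    }

averages = average_by_link(SAMPLE_CSV)

# Link with the minimum average flow among those matching the query.
least_busy = min(averages, key=lambda link: averages[link]["avg_flow"])
```

In a Spark deployment the same grouping and averaging would typically be expressed as a distributed aggregation over the CSV files stored in HDFS; the logic per link stays the same.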
3.2 System Algorithm
1. KNN CLASSIFICATION ALGORITHM
Input: User's current location, user's destination location,
timestamp
Output: Link description collection with path details, KNN[].
Step 1: Get the current time and other related values.
Step 2: Collect all average total values and traffic flow.
Step 3: Calculate the distance using the formula below.
L = {Wi1, Wi2, Wi3, ..., Win}, the weight of each list
C = {c1, c2, ..., cn}, the clusters of each list
Step 4: UrlList = {URL1(w), URL2(w), ..., URLn(w)}
Step 5: Add each UL to KNN[].
Step 6: Return KNN[].
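The listing leaves the distance measure unspecified, so the following is only one possible reading of the KNN step: each candidate route is described by a feature vector, and the k routes closest to a query vector under Euclidean distance are returned. The feature choice (length in km, average flow, average speed), the route names, and the query values are all hypothetical.

```python
import heapq
from math import dist

def k_nearest_routes(routes, query, k):
    """Return the k routes whose feature vectors are closest to `query`.

    Euclidean distance is an assumption; the paper does not state the metric.
    Features are used unscaled here, so the one with the largest range
    (flow, in this sample) dominates the distance.
    """
    return heapq.nsmallest(k, routes, key=lambda r: dist(r["features"], query))

# Hypothetical candidate routes: (length_km, avg_flow, avg_speed).
routes = [
    {"name": "R1", "features": (12.0, 320.0, 57.5)},
    {"name": "R2", "features": (15.0, 210.0, 68.0)},
    {"name": "R3", "features": (9.0, 500.0, 40.0)},
    {"name": "R4", "features": (11.0, 260.0, 62.0)},
]

# Hypothetical query describing the conditions the user would prefer.
query = (10.0, 200.0, 65.0)

knn = k_nearest_routes(routes, query, k=2)
nearest_names = [route["name"] for route in knn]
```

In practice the features would usually be normalised before computing distances so that no single parameter dominates the neighbour selection.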
2. CNN PREDICTION ALGORITHM
Input: Output list of KNN[].
Output: Single or double link description with path details
Step 1: For each link k in KNN[]
Step 2: Get each object KNN[k]
Step 3: Calculate each object's weight
Step 4: Generate hashmap <double, string>
Store each link into Map <x, object>
end for
Step 5: Pooling layer: SortMap(Map)
Step 6: Find the first 3 objects from the list
Step 7: Store the list into Res[].
Step 8: Return Res[].
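The steps of this listing can be sketched literally as a weight, sort, and select procedure. The sketch below follows the listed steps only and does not implement a trained convolutional network; the weight function (average flow divided by average speed) and the sample links are hypothetical, since the paper does not define how the weight is computed. A list of pairs is used instead of a map keyed by weight so that two links with equal weights do not overwrite each other.

```python
def rank_links(knn_links, weight_of, top_n=3):
    """Weight each link, sort by weight, and keep the first `top_n`."""
    # Steps 1-4: compute a weight per link and store (weight, name) pairs.
    weighted = [(weight_of(link), link["name"]) for link in knn_links]
    # Step 5: sort by weight, lowest (least congested) first.
    weighted.sort(key=lambda pair: pair[0])
    # Steps 6-8: return the first top_n entries as the result list.
    return weighted[:top_n]

def example_weight(link):
    """Hypothetical weight: flow per unit of speed; lower is less congested."""
    return link["avg_flow"] / link["avg_speed"]

# Hypothetical output of the KNN stage.
knn_links = [
    {"name": "R1", "avg_flow": 320.0, "avg_speed": 64.0},
    {"name": "R2", "avg_flow": 210.0, "avg_speed": 70.0},
    {"name": "R3", "avg_flow": 500.0, "avg_speed": 40.0},
    {"name": "R4", "avg_flow": 260.0, "avg_speed": 65.0},
]

res = rank_links(knn_links, example_weight)
best_names = [name for _, name in res]
```

Passing the weight function as a parameter keeps the ranking step independent of how the predicted flow is obtained, so a real predictor could be swapped in without changing the selection logic.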
4. RESULTS
4.1 Performance Evaluation
The performance of the algorithm is measured on the following
parameters:
1. Accuracy
2. System Response Time
We use two widely employed evaluation measures to assess
the forecast performance:
The Root Mean Squared Error (RMSE) measures
the average error of the forecasting results and is calculated
by
\[ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_t^{(i)} - x_p^{(i)} \right)^2} \]
Here, x_t and x_p are the actual and the forecast values at
location i, and n is the total number of locations.
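The Mean Relative Error (MRE) measures the
proportional error of the forecasting results and is calculated
by
\[ MRE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| x_t^{(i)} - x_p^{(i)} \right|}{x_t^{(i)}} \]
with x_t, x_p, and n as defined for the RMSE.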
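Both measures can be computed in a few lines. The sketch below assumes the standard definitions: RMSE as the square root of the mean squared difference between actual and forecast values, and MRE as the mean of the absolute difference divided by the actual value. The sample values are hypothetical and are not results from the paper.

```python
from math import sqrt

def rmse(actual, forecast):
    """Root mean squared error over paired actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

def mre(actual, forecast):
    """Mean relative error; every actual value must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined for an actual value of 0")
    n = len(actual)
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n

# Hypothetical actual and forecast flows at three locations.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

rmse_value = rmse(actual, forecast)
mre_value = mre(actual, forecast)
```

RMSE is expressed in the same unit as the flow and penalises large errors more heavily, while MRE is unitless, which makes it easier to compare across locations with very different traffic volumes.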
4.2 Analysis and Discussion
The lower the RMSE, the higher the system accuracy. Fig. 2 shows
the Root Mean Squared Error of Single CNN, DGCNN, and the
proposed system, from which we can conclude that the
proposed system has higher accuracy than the existing
systems.
Fig 2 : RMSE for Single CNN, DGCNN and Proposed System
The lower the MRE, the higher the system accuracy. Fig. 3 shows
the Mean Relative Error of Single CNN, DGCNN, and the proposed
system, from which we can conclude that the proposed
system has higher accuracy than the existing systems.
Fig 3 : MRE for Single CNN, DGCNN and Proposed System
5. CONCLUSION
A system is proposed to predict traffic flow from
historical traffic flow data. Whenever a user searches for a route,
only the relevant data are extracted from the historical traffic
flow data. The K-Nearest Neighbor algorithm is used to
find the k nearest neighbors from source to destination. The
Convolutional Neural Network algorithm is used to predict
traffic flow from those nearest neighbors. The system outputs
the predicted traffic flow with the respective routes.
The proposed system improves the accuracy of traffic flow
prediction and reduces the prediction time by a
factor of 2.5.
REFERENCES
[1] Fuying Yu, Zhijie Song, “The Short-Term Traffic Flow
Prediction Method Based on Detectors PSO Algorithm”, Sixth
International Conference on Intelligent Systems Design and
Engineering Applications, 2015, pp. 890-893.
[2] Su Yang, Shixiong Shi, Xiaobing Hu, Minjie Wang,
“Discovering Spatial Contexts for Traffic Flow Prediction
with Sparse Representation based Variable Selection”, IEEE
International Conference on Transportation, 2015, pp. 364-367.
[3] Yuqi Wang, Jiannong Cao, Wengen Li, and Tao Gu,
“Mining Traffic Congestion Correlation between Road
Segments on GPS Trajectories”, IEEE International
Conference on Smart Computing, 2016, pp. 1-8.
[4] Hao-Fan Yang, Tharam S. Dillon, and Yi-Ping
Phoebe Chen, “Optimized Structure of the Traffic Flow
Forecasting Model With a Deep Learning Approach”, IEEE
Transactions on Neural Networks and Learning Systems,
vol. 89, 2016, pp. 1-11.
[5] Jinyoung Ahn, Eunjeong Ko, Eun Yi Kim, “Highway
Traffic Flow Prediction using Support Vector Regression and
Bayesian Classifier”, IEEE International Conference on Big Data
and Smart Computing, 2016, pp. 239-244.
[6] Zhongsheng Hou and Xingyi Li,
“Repeatability and Similarity of Freeway Traffic Flow and
Long-Term Prediction Under Big Data”, IEEE Transactions on
Intelligent Transportation Systems, vol. 17, 2016, pp. 1786-1796.
[7] Jiwan Lee, Bonghee Hong, Kyungmin Lee, and Yang-Ja
Jang, “A Prediction Model of Traffic Congestion Using
Weather Data”, IEEE International Conference on Data
Science and Data Intensive Systems, 2015, pp. 81-88.
[8] Zhiyuan Ma and Guangchun Luo, “Short Term Traffic
Flow Prediction Based on On-line Sequential Extreme
Learning Machine”, 8th International Conference on
Advanced Computational Intelligence, 2016, pp. 143-149.
[9] Fuying Yu, Zhijie Song, “A MapReduce-Based Nearest
Neighbor Approach for Big-Data-Driven Traffic Flow
Prediction”, Sixth International Conference
on Intelligent Systems Design and Engineering Applications,
vol. 4, 2015, pp. 890-893.
[10] Hong-jun Yang, Xu Hu, “Wavelet neural network with
improved genetic algorithm for traffic flow time series
prediction”, Elsevier International Journal for Light and
Electron Optics, 2016, pp. 8103-8110.
[11] Xiangjie Kong, Zhenzhen Xu, Guojiang Shen, Jinzhong
Wang, Qiuyuan Yang, Benshi Zhang, “Urban traffic
congestion estimation and prediction based on floating car
trajectory data”, Elsevier Future Generation Computer
Systems, vol. 6, 2016, pp. 97-107.
[12] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic
flow prediction with big data: A deep learning approach”,
IEEE Transactions on Intelligent Transportation Systems, vol. 16,
2015, pp. 865-873.
[13] Dawen Xia, Binfeng Wang, Huaqing Li, Yantao Li,
Zili Zhang, “A distributed spatial temporal weighted model
on MapReduce for short-term traffic flow forecasting”,
Neurocomputing, vol. 179, 2016, pp. 246-263.
[14] H. Hu, Y. Wen, T.S. Chua, and X. Li, “Toward scalable
systems for big data analytics: A technology tutorial”,
IEEE Access, vol. 2, 2014, pp. 652-687.
[15] J. Zhang, F.Y. Wang, K. Wang, W.-H. Lin, X. Xu, and C.
Chen, “Data-driven intelligent transportation systems: A
survey”, IEEE Transactions on Intelligent Transportation
Systems, vol. 12, 2011, pp. 1624-1639.
[16] X. Chen and X. Lin, “Big data deep learning: Challenges
and perspectives”, IEEE Access, vol. 2, 2014, pp. 514-525.
[17] Highways England network journey time and traffic
flow data [Online]. Available:
https://data.gov.uk/dataset/highways-england-network-journey-time-andtraffic-flow-data

More Related Content

What's hot (20)

PPT
Accessibility Analysis and Modeling in Public Transport Networks - A Raster b...
Beniamino Murgante
 
DOC
Travel time prediction using svm and wma
tanjil huda sany
 
PDF
The Seaport Service Rate Prediction System: Using Drayage Truck Trajectory Da...
Meditya Wasesa
 
PDF
CBI Paper for ISEHP 2016
David K. Hale, Ph.D.
 
PPTX
Building trip matrices from mobile phone data
JumpingJaq
 
PPTX
A Macroscopic Dynamic model integrated into Dynamic Traffic Assignment: advan...
JumpingJaq
 
PDF
Paper id 71201985
IJRAT
 
PDF
Utilizing GIS to Develop a Non-Signalized Intersection Data Inventory for Saf...
IJERA Editor
 
PDF
Design and-analysis-of-a-two-stage-traffic-light-system-using-fuzzy-logic-216...
hanhdoduc
 
PPTX
Where to from here? – a modelling methodology for measuring land-use and publ...
JumpingJaq
 
PDF
IRJET- Reducing electricity usage in Internet using transactional data
IRJET Journal
 
PDF
EGR Expo 2016 - EGR 402 - 38x48 poster - Easter (1)
Jonathan Easter
 
PDF
A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...
theijes
 
PPTX
Traffic studies ORIGIN AND DESTINATION STUDIES
Davinderpal Singh
 
PDF
Transmisison & generation_expansion_planning_ijepes-d-13-00427_r1_accepted
Neeraj Gupta
 
PDF
The International Journal of Engineering and Science (The IJES)
theijes
 
PPTX
Beyond Level of Service – Towards a relative measurement of congestion in pla...
JumpingJaq
 
PPTX
Transit Signalisation Priority (TSP) - A New Approach to Calculate Gains
WSP
 
PPTX
MetroRapid Transit Signal Priority—Using Technology to Improve Service Quality
Center for Transportation Research - UT Austin
 
PDF
A two Stage Fuzzy Logic Adaptive Traffic Signal Control for an Isolated Inter...
ijtsrd
 
Accessibility Analysis and Modeling in Public Transport Networks - A Raster b...
Beniamino Murgante
 
Travel time prediction using svm and wma
tanjil huda sany
 
The Seaport Service Rate Prediction System: Using Drayage Truck Trajectory Da...
Meditya Wasesa
 
CBI Paper for ISEHP 2016
David K. Hale, Ph.D.
 
Building trip matrices from mobile phone data
JumpingJaq
 
A Macroscopic Dynamic model integrated into Dynamic Traffic Assignment: advan...
JumpingJaq
 
Paper id 71201985
IJRAT
 
Utilizing GIS to Develop a Non-Signalized Intersection Data Inventory for Saf...
IJERA Editor
 
Design and-analysis-of-a-two-stage-traffic-light-system-using-fuzzy-logic-216...
hanhdoduc
 
Where to from here? – a modelling methodology for measuring land-use and publ...
JumpingJaq
 
IRJET- Reducing electricity usage in Internet using transactional data
IRJET Journal
 
EGR Expo 2016 - EGR 402 - 38x48 poster - Easter (1)
Jonathan Easter
 
A Parallel Computing Model for Segmentation of Vehicle Number Plate through W...
theijes
 
Traffic studies ORIGIN AND DESTINATION STUDIES
Davinderpal Singh
 
Transmisison & generation_expansion_planning_ijepes-d-13-00427_r1_accepted
Neeraj Gupta
 
The International Journal of Engineering and Science (The IJES)
theijes
 
Beyond Level of Service – Towards a relative measurement of congestion in pla...
JumpingJaq
 
Transit Signalisation Priority (TSP) - A New Approach to Calculate Gains
WSP
 
MetroRapid Transit Signal Priority—Using Technology to Improve Service Quality
Center for Transportation Research - UT Austin
 
A two Stage Fuzzy Logic Adaptive Traffic Signal Control for an Isolated Inter...
ijtsrd
 

Similar to Classification Approach for Big Data Driven Traffic Flow Prediction using Apache Spark (20)

PDF
Enhancing Traffic Prediction with Historical Data and Estimated Time of Arrival
IRJET Journal
 
PDF
TRAFFIC FORECAST FOR INTELLECTUAL TRANSPORTATION SYSTEM USING MACHINE LEARNING
IRJET Journal
 
PPTX
Predict Traffic flow with KNN and LSTM
Afzaal Subhani
 
PDF
Smart traffic forecasting: leveraging adaptive machine learning and big data ...
IAESIJAI
 
PPTX
Traffic Prediction from Street Network images.pptx
chirantanGupta1
 
PDF
Traffic Prediction for Intelligent Transportation System using Machine Learning
OmSuryawanshi9
 
PDF
WEKA-based machine learning for traffic congestion prediction in Amman City
IAESIJAI
 
PDF
Adaptive traffic lights based on traffic flow prediction using machine learni...
IJECEIAES
 
PDF
‏Spatial Data-Driven Traffic Flow Prediction Using Geographical Information S...
Journal of Soft Computing in Civil Engineering
 
PDF
Network Traffic Prediction Model Considering Road Traffic Parameters Using Ar...
yaswanthmamidisetty
 
PDF
0505.pdf
TadiyosHailemichael
 
PDF
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
PDF
Machine Learning Based Traffic Volume Count Prediction
IRJET Journal
 
PPTX
An effective joint prediction model for travel demands and traffic flows
ivaderivader
 
PDF
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
PDF
Neural Network Based Parking via Google Map Guidance
IJERA Editor
 
PDF
Certain Analysis on Traffic Dataset based on Data Mining Algorithms
IRJET Journal
 
PDF
Towards a new intelligent traffic system based on deep learning and data int...
IJECEIAES
 
PDF
Deep Learning Neural Network Approaches to Land Use-demographic- Temporal bas...
civejjour
 
PDF
Deep Learning Neural Network Approaches to Land Use-demographic- Temporal bas...
civejjour
 
Enhancing Traffic Prediction with Historical Data and Estimated Time of Arrival
IRJET Journal
 
TRAFFIC FORECAST FOR INTELLECTUAL TRANSPORTATION SYSTEM USING MACHINE LEARNING
IRJET Journal
 
Predict Traffic flow with KNN and LSTM
Afzaal Subhani
 
Smart traffic forecasting: leveraging adaptive machine learning and big data ...
IAESIJAI
 
Traffic Prediction from Street Network images.pptx
chirantanGupta1
 
Traffic Prediction for Intelligent Transportation System using Machine Learning
OmSuryawanshi9
 
WEKA-based machine learning for traffic congestion prediction in Amman City
IAESIJAI
 
Adaptive traffic lights based on traffic flow prediction using machine learni...
IJECEIAES
 
‏Spatial Data-Driven Traffic Flow Prediction Using Geographical Information S...
Journal of Soft Computing in Civil Engineering
 
Network Traffic Prediction Model Considering Road Traffic Parameters Using Ar...
yaswanthmamidisetty
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
Machine Learning Based Traffic Volume Count Prediction
IRJET Journal
 
An effective joint prediction model for travel demands and traffic flows
ivaderivader
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Neural Network Based Parking via Google Map Guidance
IJERA Editor
 
Certain Analysis on Traffic Dataset based on Data Mining Algorithms
IRJET Journal
 
Towards a new intelligent traffic system based on deep learning and data int...
IJECEIAES
 
Deep Learning Neural Network Approaches to Land Use-demographic- Temporal bas...
civejjour
 
Deep Learning Neural Network Approaches to Land Use-demographic- Temporal bas...
civejjour
 
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
PDF
Kiona – A Smart Society Automation Project
IRJET Journal
 
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
PDF
Breast Cancer Detection using Computer Vision
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
PDF
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
PDF
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Ad

Recently uploaded (20)

PDF
MAD Unit - 1 Introduction of Android IT Department
JappanMavani
 
PPTX
Classification Approach for Big Data Driven Traffic Flow Prediction using Apache Spark

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 09 | Sep -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1516
CLASSIFICATION APPROACH FOR BIG DATA DRIVEN TRAFFIC FLOW PREDICTION USING APACHE SPARK
Riya Gandewar1, Anupama Phakatkar2
1Student, Dept. of Computer Engineering, PICT college, Pune, Maharashtra
2 Professor, Dept. of Computer Engineering, PICT college, Pune, Maharashtra
Abstract - Traffic problems are crucial issues in a rapidly developing society, and traffic flow prediction is an important problem in Intelligent Transportation Systems. Over the last few years, the volume of traffic data has been exploding, and we have truly entered the era of big data for transportation. In big data driven traffic flow prediction systems, accuracy and timeliness affect the robustness of prediction performance. Existing traffic flow prediction methods use different classification approaches and struggle with accuracy and processing time. To overcome these problems, the system uses K-Nearest Neighbors (KNN) for classification and a Convolution Neural Network (CNN) for prediction of traffic flow. KNN is used to find route patterns, that is, how much time is required to travel from one location to another. The CNN is used to predict the traffic flow on a particular route. The traffic flow data is processed in Apache Spark. The output of the proposed system is the set of routes with the minimum predicted traffic flow. The proposed system is observed to predict traffic flow more accurately than the existing systems it is compared against.
Key Words: Big Data, Classification, Prediction, Hadoop, Apache Spark
1. INTRODUCTION
Accurate and timely traffic flow information is strongly needed by individual travellers, business sectors, and government agencies. Traffic flow prediction has gained more and more attention with the rapid development and deployment of Intelligent Transportation Systems (ITSs). An ITS collects traffic data such as traffic volume and speed on every road and provides statistical summary services, usually on traffic congestion [15]. Traffic congestion affects vehicular queuing, travel time, cost, fuel consumption, and environmental pollution. In a big city, changes in traffic flow have a large impact on people's daily life, such as route selection for drivers. The big data generated by Intelligent Transportation Systems is worth exploring further for traffic management. The introduction and development of the Intelligent Transport System have resulted in more reliable gathering, analysis, and processing of traffic information, thereby providing more time-relevant and precise traffic analysis and prediction to users.
Traffic flow disruptions can be categorized as predictable and unpredictable. Predictable disruptions include traffic signals, stop signs, public transit services, scheduled sport events, music concerts, road construction, etc. Unpredictable disruptions include automobile accidents, breakdowns, and emergency road closures. The impact of a disruption on traffic flow depends on its location and duration [9]. Because of unpredictable disruptions, long-term forecasting may not be accurate enough for practical use. However, short-term traffic prediction, if properly done, may reach an accuracy level that is useful for several applications; for example, a driver may only need to know the traffic flow volume over the next 10 to 30 minutes to decide on a route.
Traffic flow conditions are extremely uncertain in today's complex transportation networks, owing to the heterogeneous and dynamic nature of traffic and to nonlinear interactions between drivers and their environment. Moreover, the traffic state at a specific location is highly influenced by its upstream and downstream traffic conditions as well as by traffic conditions in past and future periods, and thus spatial and temporal correlations are inherent features of traffic flow. Most traditional classification approaches treat traffic flows as individual and independent instances and do not consider the correlation information among traffic flows.
In the last several decades, there have been many studies on predicting short-term traffic flow. Recently, various traffic flow prediction systems, models, and algorithms have been proposed using statistics-based and computational intelligence-based approaches. Despite its popularity, existing work tends to suffer from several problems. Firstly, in a city where traffic jams occur, the traffic flows at different locations influence each other. In a complicated traffic network, the influences between different locations are determined by the traffic network structure, signal lights, commercial layout, etc. Secondly, existing classification approaches require storing the whole training set, which demands an excessive amount of storage for large data sets and leads to long computation times in the classification stage. The existing methods build the model on time-series data regardless of spatial influences. Most of the existing
studies adopt few useful features, which cannot provide sufficient information for forecasting.
In recent years, deep learning has drawn a lot of attention for tasks such as classification, natural language processing, dimensionality reduction, object detection, and motion modeling. The method first uses a deep network to rebuild the input features and then trains a neural network to predict the traffic flow [12].
The proposed system uses a convolution neural network approach under Apache Spark on the Hadoop platform. A Convolution Neural Network (CNN) is used to forecast traffic flow, and KNN is used to find the spatial influences that exist between locations. The input to the method is historical traffic flow data from the target location and historical data from other nearby locations. These data are grouped together to build a feature matrix for prediction. The proposed approach to real-time traffic flow prediction improves performance by speeding up the classification process and reducing storage requirements and processing time. More importantly, with reasonable execution time, the classification performance can be significantly improved by flow correlation information, even under the extremely difficult circumstance of very large training samples.
2. RELATED WORK
Researchers have been trying to make traffic flow forecasting more accurate for several years. Based on the length of the time interval, forecasting is divided into long-term and short-term forecasting. Long-term forecasting means making a prediction by month or by year. At this scale the volume of traffic flow is large and relatively stable, and it is only slightly affected by day-to-day accidents. For example, Zhongsheng Hou et al.
[6] proposed a method based on ARIMA to forecast traffic flow over a month after revealing that their time series data had a long-term trend and fluctuations at the 12-hour time scale.
For short-term forecasting, auto regressive integrated moving average (ARIMA) models and artificial neural network (ANN) [1] models are widely used. Xiangjie Kong et al. [11] proposed a plan moving average algorithm that utilizes historical data from previous days. In addition, some researchers have evaluated the seasonal ARIMA (SARIMA) model, which is a time-series analysis (TSA) model, and showed its excellent prediction performance. Fuying Yu et al. [9] compared SARIMA and nonparametric (data-driven regression) models and concluded that the SARIMA model performs better than the nonparametric regression models. H. Hu et al. [14] additionally compared time-series analysis methods and SVR models and concluded that SARIMA showed the best performance.
There are also integration models for this task. For example, Su Yang et al. [2] proposed a 3-stage model that integrates ARIMA and ANN, using the ARIMA forecasts as part of the input to the ANN. X. Chen et al. [16] studied an ensemble model that consists of a statistical method and a neural network bagging model. Yuqi Wang and Wengen Li proposed a method for predicting traffic congestion correlation between road segments from GPS trajectories [3]. The method extracts various features for each pair of road segments from the road network. These features are the input to several classifiers that predict congestion correlation: decision tree, logistic regression, random forest, and support vector machines.
Researchers have also considered the use of integrated data. Hao-Fan Yang et al. [4] proposed a Gaussian mixture model (GMM) clustering method to partition the data set for training an ANN. Deep learning methods have also gained a lot of attention recently. Jinyoung Ahn et al.
[5] used a stacked autoencoder (SAE) to learn generic traffic flow features. Y. Lv et al. [12] applied a deep belief network model to traffic flow prediction, which adopts multitask learning to reduce the error.
To sum up, all the above-mentioned methods have many desirable properties in different disciplines, and thus it is hard to conclude that any one of them is significantly superior to the others in every situation. One of the main reasons is that the accuracy of prediction models developed with small-scale, separate, specific traffic data depends on the traffic flow features embedded in the collected traffic data. Furthermore, most of the existing models run as stand-alone models, and thus the computational effort is expensive [7] and the capability for data processing and storage is restricted. However, researchers have also developed a general architecture for distributed modeling in a MapReduce framework for traffic flow forecasting [8], to efficiently process large-scale traffic data on a Hadoop platform.
As of now, there is diversified research across all of these techniques and platforms. The proposed work performs traffic flow prediction on large-volume traffic data using a classification approach in Apache Spark.
3. PROPOSED SYSTEM
3.1 Proposed Architecture
Fig. 1 shows the architectural design of the proposed system. The important modules in the system are as follows:
Storage System : Apache HDFS is used as the underlying (underFS) storage system. The user's CSV files, trained models, and testing and training data sets are stored in HDFS. Data processing is done through Apache Spark.
Computation Framework : Apache Spark is used for computation. The data classifier and predictor is built in Spark. It accesses traffic flow data in CSV format. Spark's MLlib is used for training the KNN and CNN models.
Preprocessing : The module fetches data according to the user's query, analyzes the data, and finds the minimum average flow, average speed, and average journey time.
Data Classifier and predictor : The input of the K-Nearest Neighbor model is the user's current location, the user's destination location, and a timestamp. It gets the current time and other related values, and it collects all parameters such as traffic flow and average speed. Distance is calculated from source to destination considering different routes. The top k routes with their respective parameters are fetched from this. The input to the CNN is traffic flow, average speed, and average journey time. All routes returned by KNN are fetched and a weight is calculated for each route. A hash map is generated, and each link and its traffic flow are stored in it. The pooling layer then calculates more accurate values, and the resulting list is stored as the result.
Fig 1 : Traffic Flow Prediction System architecture
Web Application : The user registers for the traffic flow prediction system through a web application, providing their own details. After logging in to the site successfully, the user can search for the places they want to go to at a particular time in order to know the traffic flow at that place. After analyzing the data, the system outputs different routes, considering the input and the traffic flow at that place.
3.2 System Algorithm
1. KNN CLASSIFICATION ALGORITHM
Input: User's current location, User's destination location, Timestamp
Output: Link description collection with path details KNN [].
Step 1: Get current time and other related values.
Step 2: Collect all average total values and traffic flow.
Step 3: Calculate distance using the formula below.
L = Wi1, Wi2, Wi3, ..., Win (weight of each list)
C = c1, c2, ..., cn (clusters of each list)
Step 4: UrlList = URL1(w), URL2(w), ..., URLn(w)
Step 5: Add each UL to KNN [].
Step 6: Return KNN [].
2. CNN PREDICTION ALGORITHM
Input: Output list of KNN [].
Output: Single or double Link description with path details
Step 1: For each (Link k in KNN)
Step 2: Get each object KNN[k]
Step 3: Calculate each object weight
Step 4: Generate hashmap <double, string>
Store each link into Map <x, object>
end for
Step 5: Pooling layer : SortMap(Map)
Step 6: Find first 3 objects from list
Step 7: Store all list into Res[].
Step 8: Return Res[].
4. RESULTS
4.1 Performance Evaluation
The performance of the algorithm is measured on the following parameters:
1. Accuracy
2. System Response Time
We use two widely employed evaluation measures to assess the forecast performance.
The Root Mean Squared Error (RMSE) measures the average error of the forecasting results. The equation did not survive in this transcript; the standard definition is
RMSE = \sqrt{\frac{1}{n} \sum (x_t - x_p)^2}
Here, x_t and x_p are the actual and the forecast values, n is the total number of locations, and the sum runs over those n locations.
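As an illustration of the RMSE measure described above, the following is a minimal Python sketch rather than the authors' implementation. It assumes the actual values x_t and forecast values x_p are given as two equal-length lists with one entry per location; the function name `rmse` and the sample flow values are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error between actual (x_t) and forecast (x_p) values."""
    if len(actual) == 0 or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal in length")
    # Squared difference per location, averaged over the n locations.
    squared_errors = [(x_t - x_p) ** 2 for x_t, x_p in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical traffic flow values for three locations (not data from the paper).
actual_flow = [100.0, 200.0, 300.0]
forecast_flow = [110.0, 190.0, 300.0]
print(rmse(actual_flow, forecast_flow))
```

A lower value indicates that the forecasts are closer to the observed flows, which is how the measure is read in the analysis that follows.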
The Mean Relative Error (MRE) measures the proportional error of the forecasting results. The equation did not survive in this transcript; the standard definition is
MRE = \frac{1}{n} \sum \frac{|x_t - x_p|}{x_t}
4.2 Analysis and Discussion
The lower the RMSE, the higher the system accuracy. Fig. 2 shows the Root Mean Squared Error of Single CNN, DGCNN, and the proposed system, from which we can conclude that the proposed system has higher accuracy than the existing systems.
Fig 2 : RMSE for Single CNN, DGCNN and Proposed System
The lower the MRE, the higher the system accuracy. Fig. 3 shows the Mean Relative Error of Single CNN, DGCNN, and the proposed system, from which we can conclude that the proposed system has higher accuracy than the existing systems.
Fig 3 : MRE for Single CNN, DGCNN and Proposed System
5. CONCLUSION
A system is proposed to predict traffic flow from historical traffic flow data. Whenever a user searches for a route, only the data for that route is extracted from the historical traffic flow data. The K-Nearest Neighbor algorithm is used to find the K nearest neighbors from source to destination. The Convolution Neural Network algorithm is used to predict traffic flow from those nearest neighbors. The system outputs the predicted traffic flow together with the respective routes. The proposed system improves the accuracy of traffic flow prediction and reduces the prediction time by a factor of 2.5x.
REFERENCES
[1] Fuying Yu, Zhijie Song, “The Short-Term Traffic Flow Prediction Method Based on Detectors PSO Algorithm”, Sixth International Conference on Intelligent Systems Design and Engineering Applications, 2015, pp. 890-893.
[2] Su Yang, Shixiong Shi, Xiaobing Hu, Minjie Wang, “Discovering Spatial Contexts for Traffic Flow Prediction with Sparse Representation based Variable Selection”, IEEE international conference on transportation, 2015, pp. 364-367.
[3] Yuqi Wang, Jiannong Cao, Wengen Li and Tao Gu, “Mining Traffic Congestion Correlation between Road Segments on GPS Trajectories”, IEEE International Conference on Smart Computing, 2016, pp. 1-8.
[4] Hao-Fan Yang, Tharam S. Dillon, Life Fellow, and Yi-Ping Phoebe Chen, “Optimized Structure of the Traffic Flow Forecasting Model With a Deep Learning Approach”, IEEE Transactions on Neural Networks and Learning Systems, vol. 89, 2016, pp. 1-11.
[5] Jinyoung Ahn, Eunjeong Ko, Eun Yi Kim, “Highway Traffic Flow Prediction using Support Vector Regression and Bayesian Classifier”, IEEE International Conference Big Data on Smart Computing, 2016, pp. 239-244.
[6] Zhongsheng Hou, Senior Member, IEEE, and Xingyi Li, “Repeatability and Similarity of Freeway Traffic Flow and Long-Term Prediction Under Big Data”, IEEE Transactions on Intelligent Transportation Systems, vol. 17, 2016, pp. 1786-1796.
[7] Jiwan Lee, Bonghee Hong, Kyungmin Lee and Yang-Ja Jang, “A Prediction Model of Traffic Congestion Using Weather Data”, IEEE International Conference on Data Science and Data Intensive Systems, 2015, pp. 81-88.
[8] Zhiyuan Ma and Guangchun Luo, “Short Term Traffic Flow Prediction Based on On-line Sequential Extreme Learning Machine”, 8th International Conference on Advanced Computational Intelligence, 2016, pp. 143-149.
[9] Fuying Yu, Zhijie Song, “A MapReduce-Based Nearest Neighbor Approach for Big-Data-Driven Traffic Flow Prediction”, Sixth International Conference on Intelligent Systems Design and Engineering Applications, vol. 4, 2015, pp. 890-893.
[10] Hong-jun Yang, Xu Hu, “Wavelet neural network with improved genetic algorithm for traffic flow time series prediction”, Elsevier International Journal for Light and Electron Optics, 2016, pp. 8103-8110.
[11] Xiangjie Kong, Zhenzhen Xu, Guojiang Shen, Jinzhong Wang, Qiuyuan Yang, Benshi Zhang, “Urban traffic congestion estimation and prediction based on floating car trajectory data”, Elsevier Future Generation Computer Systems, vol. 6, 2016, pp. 97-107.
[12] Y. Lv, Y. Duan, W. Kang, Z. Li, and F.-Y. Wang, “Traffic flow prediction with big data: A deep learning approach”, IEEE Transactions on Intelligent Transportation Systems, vol. 16, 2015, pp. 865-873.
[13] Dawen Xia, Binfeng Wang, Huaqing Li, Yantao Li, Zili Zhang, “A distributed spatial temporal weighted model on MapReduce for short-term traffic flow forecasting”, Neurocomputing, vol. 179, 2016, pp. 246-263.
[14] H. Hu, Y. Wen, T.S. Chua, and X. Li, “Toward scalable systems for big data analytics: A technology tutorial”, IEEE Access, vol. 2, 2014, pp. 652-687.
[15] J. Zhang, F.Y. Wang, K. Wang, W.-H. Lin, X. Xu, and C. Chen, “Data-driven intelligent transportation systems: A survey”, IEEE Transactions on Intelligent Transportation Systems, vol. 12, 2011, pp. 1624-1639.
[16] X. Chen and X. Lin, “Big data deep learning: Challenges and perspectives”, IEEE Access, vol. 2, 2014, pp. 514-525.
[17] Highways England network journey time and traffic flow data [Online]. Available: https://data.gov.uk/dataset/highways-england-network-journey-time-andtraffic-flow-data