© Copyright 2014 EMC Corporation. All rights reserved.
Building a Data Analytics
PaaS for Smart Cities
Smiti Sharma, EMC Virtustream
Keith Manthey, EMC ETD
BRDC
Intelligent Communities
Cities and Regions that use technology not just to
save money or make things work better, but also to
create high-quality employment, increase citizen
participation and become great places to live and
work.
ICF – Intelligent Community Forum
“Smart Cities that use Big Data are neither about intuition nor about
looking back and analyzing what went wrong and could be better.
They spot patterns. They look forward. They predict potential crisis situations.
They find what could be better and make it better.
Smart cities don’t guess.
They are sure!”
VISION FOR CITIES OF THE FUTURE
Become an innovative city
SAFE
Anticipate risks and
protect people and
information
EFFICIENT
Optimized use of
city resources
SEAMLESS
Integrated daily
life services
IMPACTFUL
Enriched life and
business experiences
for all
IMPLICATIONS TO THE CITY
Empower the city, citizens, visitors, and businesses
IMPROVE
Quality of
urban living
CREATE
Efficient city
and transparent
government
DEVELOP
Vital
economy
REDUCE
Environmental
impact
ADDRESS
Infrastructure,
buildings and
urban planning
IMPROVE
Tourism,
recreation,
and city image
“BIG DATA” ENABLES CITIES OF THE FUTURE
Any data-set that cannot be processed with traditional systems
Social Networks, UGC
Public records
Location Data
Internet of Things
Emerging Data Sources
Unstructured Data
Dark Data
Structured Data
Traditional Data Sources
BUILD SMART CITY: THE PROBLEM
Understand the city data challenge
Geo
Distributed
Data
Source
Satellite-borne
Imaging Device
Airborne
Imaging
Device
Webcam
Environmental
Monitor
Health
Monitor
Traffic
monitor
Industrial
Process
Monitor
Data Center
Centralized Storage
and Analytics Systems
City Network
DATA CHALLENGE
These systems have a massive number of endpoints.
Managing massive, heterogeneous data
becomes an enormous challenge.
Diverse data sources require normalization
and standardization to address data
orchestration and integration.
DATA USAGE CHALLENGE
How can we create innovative businesses
that use this data to generate more value
and make full use of current investments?
Architecture
Guiding Principles
Agile
Open
Portable
Extensible
Modular/Ftl. Blocks
Analytics Driven
Software Defined
Architecture
Ingestion Layer
Spring XD
Transformation Layer
Python
/Transformed_Files
KPIs
Metrics exploration
Maps & Graphs
Visualization Layer
API
Open Data
Data Integration Layer
Python
/Transformed_Files
Schema and
Instance
Alignment
Data Sources
GUIs, Dashboard that access the
underlying databases and promote an
excellent User experience
Data modelling, metrics, ETL
mechanisms, definitions and
variable selection
Proprietary and Open Data
sources
APIs to expose data
Analytics
Data mining prediction
Ingestion
Ingestion Layer
Spring XD
Data Sources
Transformation
Transformation Layer
Python
/Transformed_Files
Data Cleaning Conversion
Integration
Schema
Integration
Instance
Alignment
Integration Layer
Python
/Transformed_Files
Schema and
Instance
Alignment
Integration
• Schema DB (INPUT)
• Schema Matching (Algorithms &
Heuristics)
• Suggest Attribute Mappings
(OUTPUT : SEMI-SUPERVISED)
• Instances of DB tables & Integration Rules (INPUT)
• Deduplication, Record Consolidation (Algorithms & Heuristics)
• Instance alignment using 2 phase-pass algorithm (to avoid
duplicate insertion in a semi-supervised data integration
tool)
• Attribute name similarity: fuzzy string comparisons
(cosine similarity)
• Levenshtein similarity: Categorical/String Data
• http://pgsimilarity.projects.pgfoundry.org/
• List of deduplicated instances (OUTPUT: SEMI-SUPERVISED)
Schema
Integration
Instance
Alignment
Integration Layer
Python
/Transformed_Files
Schema and
Instance
Alignment
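The two similarity measures named on this slide can be sketched in plain Python. This is a minimal illustration, not the platform's implementation: the function names, the choice of character bigrams for the cosine vectors, and the 0.5 threshold are all assumptions made for the example. The output is only a suggestion list, in keeping with the semi-supervised workflow in which a person confirms each mapping.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Count the character n-grams of a lowercased attribute name."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b, n=2):
    """Cosine similarity between the n-gram count vectors of two strings."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def levenshtein(a, b):
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Normalise edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def suggest_mappings(source_attrs, target_attrs, threshold=0.5):
    """Suggest, for each source attribute, the closest target attribute.

    The threshold is a hypothetical cut-off; a reviewer accepts or
    rejects each suggestion.
    """
    suggestions = []
    for s in source_attrs:
        best = max(target_attrs, key=lambda t: cosine_similarity(s, t))
        score = cosine_similarity(s, best)
        if score >= threshold:
            suggestions.append((s, best, round(score, 2)))
    return suggestions
```

Cosine similarity over n-grams tolerates reordered tokens in attribute names, while the Levenshtein measure suits short categorical values where a few character edits separate duplicates.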
Integration
Schema
Integration
Instance
Alignment
Integration Layer
Python
/Transformed_Files
Schema and
Instance
Alignment
Deduplication
Similarity Join
Attribute mapping (Insert)
Attribute mapping (Select)
Cosine
Levenshtein
Visualization
KPIs
Metrics exploration
Maps & Graphs
Visualization Layer
API implementation
API
Open Data
Example use Case
Transportation
• Available Data:
– City bus movement information from on-board devices
(lat-long, time, date, bus line, bus ID)
• Goals:
– Predict the time of arrival at a bus stop
• Challenges
– Lack of data in certain areas of the city
– GPS precision
Prediction of bus arrival
Architecture
Bus GPS
GemFire XD
Routes & Bus Stop
Data Lake
Streaming
Scheduler
Lazy-write
GPS
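• Use each bus stop as a node, and the streets as
edges
Transportation Network
Figure labels: nodes $x_i$, $x_j$; edge labels $a_{ij} = +1$, $a_{ij} = -1$
$$X = \{x_1, \dots, x_N\}, \quad x_i = (\mathrm{lat}(i), \mathrm{long}(i))$$
$$E = \{e_1, \dots, e_M\}, \quad e_k = (x_i, x_j)$$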
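A small sketch of how the node set X and edge set E might be held in memory. The `Stop` class, the `build_network` helper, and the use of great-circle (haversine) distance as the edge length are assumptions for illustration; the slides do not say how street lengths are obtained. Edges are stored as ordered pairs so that the direction of travel is preserved.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    stop_id: str
    lat: float
    lon: float


def haversine_km(a, b):
    """Great-circle distance between two stops, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def build_network(stops, routes):
    """Return (X, E).

    X maps stop id -> Stop (the nodes).
    E maps a directed edge (i, j) -> length in km, one edge per pair of
    consecutive stops on each route.
    """
    X = {s.stop_id: s for s in stops}
    E = {}
    for route in routes:
        for i, j in zip(route, route[1:]):
            E[(i, j)] = haversine_km(X[i], X[j])
    return X, E
```

For example, `build_network([Stop("a", 0.0, 0.0), Stop("b", 0.0, 1.0)], [["a", "b"]])` yields a single directed edge `("a", "b")` of roughly 111 km.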
• The goal is to find, for each edge $e_j$, an estimate of the
average speed at an instant $t$, $v(e_j, t)$.
• Default model - estimate the velocity on each edge
using historical data from the last month.
– Different hourly models for each day of the week
• Online model - use real-time data to calculate the
speed.
The model
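The two models can be sketched as follows. This is one possible reading of the slide, not the authors' code: the default model is taken to be a mean speed per (edge, weekday, hour) bucket, the online model a mean of recent real-time readings, and the arrival estimate a sum of length divided by speed over the remaining edges. The function names and the 20 km/h fallback for buckets with no history are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def fit_default_model(observations):
    """Build the default model from historical data.

    observations: iterable of (edge, datetime, speed_kmh).
    Returns {(edge, weekday, hour): mean speed in km/h}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for edge, ts, speed in observations:
        bucket = sums[(edge, ts.weekday(), ts.hour)]
        bucket[0] += speed
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def estimate_speed(edge, ts, default_model, recent_speeds=None, fallback_kmh=20.0):
    """Estimate v(e_j, t).

    Prefer the online estimate when real-time readings exist, then the
    default model, then an assumed fallback speed.
    """
    if recent_speeds:
        return sum(recent_speeds) / len(recent_speeds)
    return default_model.get((edge, ts.weekday(), ts.hour), fallback_kmh)


def travel_time_minutes(path, lengths_km, ts, default_model):
    """Sum length / speed over the edges between the bus and the stop."""
    return sum(
        60.0 * lengths_km[e] / estimate_speed(e, ts, default_model)
        for e in path
    )
```

Keeping a separate bucket per weekday and hour reflects the "different hourly models for each day of the week" bullet; areas with sparse GPS coverage fall through to the fallback value.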
Average speed(km/h)
Default Model
Prediction of Bus Arrival
• Based on the information from the previous use case,
extrapolate to verify the quality of the service
• Each bus trip must be identified in order to evaluate the time
interval between two buses of the same line at each
bus stop.
Another use case - Auditing
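The interval between consecutive buses of one line at one stop (the headway) can be computed as below. The sketch assumes the trip-identification step has already produced one arrival record per bus per stop; the record layout, function names, and the idea of flagging against a maximum gap are illustrative assumptions, not details from the slides.

```python
from collections import defaultdict
from datetime import datetime


def headways_minutes(arrivals):
    """Compute gaps between consecutive arrivals.

    arrivals: iterable of (line, stop_id, bus_id, datetime).
    Returns {(line, stop_id): [minutes between consecutive arrivals]}.
    """
    by_key = defaultdict(list)
    for line, stop_id, _bus_id, ts in arrivals:
        by_key[(line, stop_id)].append(ts)
    result = {}
    for key, times in by_key.items():
        times.sort()
        result[key] = [
            (later - earlier).total_seconds() / 60.0
            for earlier, later in zip(times, times[1:])
        ]
    return result


def flag_gaps(headways, max_minutes):
    """Return the (line, stop) pairs whose worst headway exceeds max_minutes."""
    return sorted(k for k, gaps in headways.items() if gaps and max(gaps) > max_minutes)
```

An auditor could compare the flagged pairs against the contracted service interval for each line.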
Screen shot - Auditing
Data Quality Issues
Route A
Route B
Bus GPS
Extending
PaaS for
Smart Cities
Real-time
Dashboard
Personal
Dashboard
Government
Transactional
Applications
Commercial
Big Data
Application
Government
Big Data
Application
Commercial
Transactional
Applications
Unified
Control
Center
Application
Layer
Security
Rules
Payment
Gateway
Trust
Authentication
Identity
Management
Locations &
Mapping
Platform as
a Service
Data
Governance
DATA ANALYTICS TOOLS Historic & Predictive/DATA APIs
Transactional
Data Store
Data
Transformation
Unstructured
Data
Structured
Data
City
Semantics
Audit
Open Standards Data Ingestion Interfaces and Storage
CITY IoT INFRASTRUCTURE CITY DATA SOURCES CITY ICT INFRASTRUCTURE
Government
Devices
Commercial
Devices
Utility
Devices
Personal
Devices
IoT Data Aggregation
Government
Systems
Social
Media
Commercial
Systems
Archived
Data
Fixed &
Wireless
Networks
Cloud
Services
Enablement
Layer
Data
Orchestration
Layer
Infrastructure
Layer
SECURITY
Smart City Platform requirements
High level Smart City Platform components
PCF Pivotal Cloud Foundry
EMC STORAGE
ISILON and/or
CLOUD NATIVE SOFTWARE DEFINED STORAGE
VMWARE vRealize Cloud Suite & BIG DATA
EXTENSIONS
PIVOTAL BIG DATA SUITE
ADVANCED ANALYTICS
APPLICATIONS
AT SCALE
DATA
PROCESSING
GREENPLUM
DATABASE
HAWQ
SPRING XD
SPARK
REDIS
RABBITMQ
GEMFIRE
HADOOP
Pivotal GPDB Delivers
• Massively Parallel Analytics Performance
• In-Database Analytical Extensions
• Industry-Leading Load Speed
• Rich SQL with Schema Agnosticism
• Industry-Leading Workload Mgmt.
• SAS Acceleration Options
• Parallel Co-Processing with Hadoop
• No-Forklift Scalability
• Multi-Level Redundancy
• Rich, Easy-to-Use Administration Tools
• Big Data Backup
• Comprehensive Security
• Software-only or DCA
Simple to manage
Single file system, single volume, global namespace
Massively scalable
Scales from 16 TB to over 50 PB in a single cluster
200GB/s throughput, 3.75M IOPS
Unmatched efficiency
Over 80% storage utilization, automated tiering and SmartDedupe
Enterprise data protection
Efficient backup and disaster recovery, and N+1 thru N+4 redundancy
Robust security and compliance options
RBAC, Access Zones, WORM data security, File System Auditing
Data At Rest Encryption with SEDs, STIG hardening
CAC/PIV Smartcard authentication, FIPS OpenSSL support
Operational flexibility
Multi-protocol support including NFS, SMB, HTTP, FTP and HDFS
Object and Cloud computing including OpenStack Swift
Isilon Scale-Out NAS
Lots of Little Files
Hadoop Impact on Telemetry
AKA - Small Files Problem for Hadoop
Rio Smart Sensors - ESRI
NameNode = 512 GB of RAM
Each file consumes about 1 KB of RAM
512 GB / 1 KB = at most ~500M files,
assuming no other processes on the
box.
Rio has 12.5K sensors for the 2016
Olympics. Assuming each sensor
sends a file every minute, that is 18M files in
1 day.
EMC believes in storing Metadata
on SSD. This allows a scale out
for the NameNode to get around
the limitations of file growth on
the scale-up NameNode.
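The arithmetic on this slide can be checked with a few lines of Python. The figures (512 GB of RAM, about 1 KB per file, 12.5K sensors, one file per sensor per minute) come from the slide; treating GB and KB as decimal units and the helper names are assumptions of this sketch. The last figure, days until the NameNode memory ceiling is reached, is derived here and is not stated on the slide.

```python
def namenode_file_capacity(ram_bytes, bytes_per_file):
    """Upper bound on files the NameNode can track in memory."""
    return ram_bytes // bytes_per_file


def files_per_day(sensors, files_per_sensor_per_minute=1):
    """Files produced per day when every sensor writes at a fixed rate."""
    return sensors * files_per_sensor_per_minute * 60 * 24


RAM_BYTES = 512 * 10**9   # 512 GB, decimal units (assumption)
BYTES_PER_FILE = 1000     # about 1 KB of NameNode RAM per file, as on the slide

capacity = namenode_file_capacity(RAM_BYTES, BYTES_PER_FILE)
daily = files_per_day(12_500)
days_to_fill = capacity / daily

print(capacity, daily, round(days_to_fill, 1))
# prints: 512000000 18000000 28.4
```

Under these assumptions a single scale-up NameNode would hit its file-count ceiling in roughly four weeks of sensor traffic, which is the motivation the slide gives for moving metadata off NameNode RAM.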
Building a Data Analytics PaaS for Smart Cities

More Related Content

PPTX
Creating the Smart Transportation Infrastructure of the Future
PPTX
San Antonio’s electric utility making big data analytics the business of the ...
PDF
The case of vehicle networking financial services accomplished by China Mobile
PDF
Industrial Internet
PDF
GITEX Big Data Conference 2014 – SAP Presentation
PDF
Making Enterprise Big Data Small with Ease
PDF
The API Lie
PPTX
The Single Most Important Formula for Business Success
Creating the Smart Transportation Infrastructure of the Future
San Antonio’s electric utility making big data analytics the business of the ...
The case of vehicle networking financial services accomplished by China Mobile
Industrial Internet
GITEX Big Data Conference 2014 – SAP Presentation
Making Enterprise Big Data Small with Ease
The API Lie
The Single Most Important Formula for Business Success

What's hot (20)

PPTX
Cloud-Con: Integration & Web APIs
PPTX
SnapLogic Live: Enabling the Citizen Integrator
PPTX
The Life of an Internet of Things Electron
PPTX
Global Big Data Conference Hyderabad-2Aug2013- Finance/Manufacturing Use Cases
PDF
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
PDF
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
PDF
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
PPTX
Hadoop for Humans: Introducing SnapReduce 2.0
PPTX
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
PPTX
How data modelling helps serve billions of queries in millisecond latency wit...
PDF
Postgres Vision 2018: How to Consume your Database Platform On-premises
 
PDF
Webinar: It's the 21st Century - Why Isn't Your Data Integration Loosely Coup...
PDF
Driving Digital Transformation Through Global Data Management
PDF
The Impact of SMACT on the Data Management Stack
PDF
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
PPTX
IoT meets AI in the Clouds
PDF
Deep Learning Image Processing Applications in the Enterprise
PDF
Powering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big Data
PDF
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
PPTX
Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...
Cloud-Con: Integration & Web APIs
SnapLogic Live: Enabling the Citizen Integrator
The Life of an Internet of Things Electron
Global Big Data Conference Hyderabad-2Aug2013- Finance/Manufacturing Use Cases
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB
Hadoop for Humans: Introducing SnapReduce 2.0
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
How data modelling helps serve billions of queries in millisecond latency wit...
Postgres Vision 2018: How to Consume your Database Platform On-premises
 
Webinar: It's the 21st Century - Why Isn't Your Data Integration Loosely Coup...
Driving Digital Transformation Through Global Data Management
The Impact of SMACT on the Data Management Stack
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
IoT meets AI in the Clouds
Deep Learning Image Processing Applications in the Enterprise
Powering the Intelligent Edge: HPE's Strategy and Direction for IoT & Big Data
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
Webinar: Introducing the SnapLogic Elastic Integration Platform Summer 2014 R...

Similar to Building a Data Analytics PaaS for Smart Cities (20)

PDF
Rio Info 2015 - Projetos de Big Data no Setor Público - Karin Breitman
PDF
EMC's IT Transformation Journey ( EMC Forum 2014 )
 
PPTX
Vitaly Kozlovsky
PDF
Powering Dynamic M2M Event Processing with OSGi - W Bowers
PPTX
Digital Transformation. Examples from Automotive Industry
PDF
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
PDF
End-to-End and e-Business Value from the Telematics Reference Implementation ...
PDF
What is IoT and how Modulus and Pacific can Help - Featuring Node.js and Roll...
PPTX
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
PDF
Smart & Safer Cities by Richard Knight
PPTX
How Spark Enables the Internet of Things- Paula Ta-Shma
PPTX
Cloud Infrastructure and Services (CIS) - Webinar
 
PPTX
Petit Déjeuner Expert Aproged 3ème Plateforme par Alain Le Corre / EMC
PDF
Multi Smart Parking System
PDF
Robert Harrison, WMG - IIoT and Industry 4.0 in Automation Systems Engineering
PPT
Iit 1782 designing for the internet of things (io t) v4 gb
PDF
IRJET- Smart Parking System in Multi-Storey Buildings
PPTX
Cloud Native Applications - DevOps, EMC and Cloud Foundry
PDF
Emmebrochure 4
PDF
A tool to enable cities embrasse Smart Mobility
Rio Info 2015 - Projetos de Big Data no Setor Público - Karin Breitman
EMC's IT Transformation Journey ( EMC Forum 2014 )
 
Vitaly Kozlovsky
Powering Dynamic M2M Event Processing with OSGi - W Bowers
Digital Transformation. Examples from Automotive Industry
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
End-to-End and e-Business Value from the Telematics Reference Implementation ...
What is IoT and how Modulus and Pacific can Help - Featuring Node.js and Roll...
How Spark Enables the Internet of Things: Efficient Integration of Multiple ...
Smart & Safer Cities by Richard Knight
How Spark Enables the Internet of Things- Paula Ta-Shma
Cloud Infrastructure and Services (CIS) - Webinar
 
Petit Déjeuner Expert Aproged 3ème Plateforme par Alain Le Corre / EMC
Multi Smart Parking System
Robert Harrison, WMG - IIoT and Industry 4.0 in Automation Systems Engineering
Iit 1782 designing for the internet of things (io t) v4 gb
IRJET- Smart Parking System in Multi-Storey Buildings
Cloud Native Applications - DevOps, EMC and Cloud Foundry
Emmebrochure 4
A tool to enable cities embrasse Smart Mobility

More from DataWorks Summit/Hadoop Summit (20)

PPT
Running Apache Spark & Apache Zeppelin in Production
PPT
State of Security: Apache Spark & Apache Zeppelin
PDF
Unleashing the Power of Apache Atlas with Apache Ranger
PDF
Enabling Digital Diagnostics with a Data Science Platform
PDF
Revolutionize Text Mining with Spark and Zeppelin
PDF
Double Your Hadoop Performance with Hortonworks SmartSense
PDF
Hadoop Crash Course
PDF
Data Science Crash Course
PDF
Apache Spark Crash Course
PDF
Dataflow with Apache NiFi
PPTX
Schema Registry - Set you Data Free
PPTX
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
PDF
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
PPTX
Mool - Automated Log Analysis using Data Science and ML
PPTX
How Hadoop Makes the Natixis Pack More Efficient
PPTX
HBase in Practice
PPTX
The Challenge of Driving Business Value from the Analytics of Things (AOT)
PDF
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
PPTX
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
PPTX
Backup and Disaster Recovery in Hadoop
Running Apache Spark & Apache Zeppelin in Production
State of Security: Apache Spark & Apache Zeppelin
Unleashing the Power of Apache Atlas with Apache Ranger
Enabling Digital Diagnostics with a Data Science Platform
Revolutionize Text Mining with Spark and Zeppelin
Double Your Hadoop Performance with Hortonworks SmartSense
Hadoop Crash Course
Data Science Crash Course
Apache Spark Crash Course
Dataflow with Apache NiFi
Schema Registry - Set you Data Free
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Mool - Automated Log Analysis using Data Science and ML
How Hadoop Makes the Natixis Pack More Efficient
HBase in Practice
The Challenge of Driving Business Value from the Analytics of Things (AOT)
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
Backup and Disaster Recovery in Hadoop

Recently uploaded (20)

PDF
Human Computer Interaction Miterm Lesson
PPTX
Rise of the Digital Control Grid Zeee Media and Hope and Tivon FTWProject.com
PPTX
maintenance powerrpoint for adaprive and preventive
PDF
Streamline Vulnerability Management From Minimal Images to SBOMs
PDF
EGCB_Solar_Project_Presentation_and Finalcial Analysis.pdf
PPTX
Slides World Game (s) Great Redesign Eco Economic Epochs.pptx
PDF
EIS-Webinar-Regulated-Industries-2025-08.pdf
PPTX
Information-Technology-in-Human-Society.pptx
PPTX
Strategic Picks — Prioritising the Right Agentic Use Cases [2/6]
PPTX
Build automations faster and more reliably with UiPath ScreenPlay
PDF
Uncertainty-aware contextual multi-armed bandits for recommendations in e-com...
PPTX
How to use fields_get method in Odoo 18
PDF
Child-friendly e-learning for artificial intelligence education in Indonesia:...
PPTX
AQUEEL MUSHTAQUE FAKIH COMPUTER CENTER .
PDF
Optimizing bioinformatics applications: a novel approach with human protein d...
PDF
substrate PowerPoint Presentation basic one
PPTX
Presentation - Principles of Instructional Design.pptx
PPTX
From Curiosity to ROI — Cost-Benefit Analysis of Agentic Automation [3/6]
PPTX
Report in SIP_Distance_Learning_Technology_Impact.pptx
PDF
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf
Human Computer Interaction Miterm Lesson
Rise of the Digital Control Grid Zeee Media and Hope and Tivon FTWProject.com
maintenance powerrpoint for adaprive and preventive
Streamline Vulnerability Management From Minimal Images to SBOMs
EGCB_Solar_Project_Presentation_and Finalcial Analysis.pdf
Slides World Game (s) Great Redesign Eco Economic Epochs.pptx
EIS-Webinar-Regulated-Industries-2025-08.pdf
Information-Technology-in-Human-Society.pptx
Strategic Picks — Prioritising the Right Agentic Use Cases [2/6]
Build automations faster and more reliably with UiPath ScreenPlay
Uncertainty-aware contextual multi-armed bandits for recommendations in e-com...
How to use fields_get method in Odoo 18
Child-friendly e-learning for artificial intelligence education in Indonesia:...
AQUEEL MUSHTAQUE FAKIH COMPUTER CENTER .
Optimizing bioinformatics applications: a novel approach with human protein d...
substrate PowerPoint Presentation basic one
Presentation - Principles of Instructional Design.pptx
From Curiosity to ROI — Cost-Benefit Analysis of Agentic Automation [3/6]
Report in SIP_Distance_Learning_Technology_Impact.pptx
ment.tech-Siri Delay Opens AI Startup Opportunity in 2025.pdf

Building a Data Analytics PaaS for Smart Cities

  • 1. 1© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Building a Data Analytics PaaS for Smart Cities Smiti Sharma, EMC Virtustream Keith Manthey, EMC ETD BRDC
  • 2. 2© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Intelligent Communities Cities and Regions that use technology not just to save money or make things work better, but also to create high-quality employment, increase citizen participation and become great places to live and work. ICF – Intelligent Community Forum
  • 3. 3© Copyright 2015 EMC Corporation. All rights reserved. “Smart Cities that use Big Data are neither about intuition nor about looking back and analyzing what went wrong and could be better. They spot patterns. They look forward. They predict potential crisis situations. They find what could be better and make it better. Smart cities don’t guess. Theyaresure!
  • 4. 4© Copyright 2015 EMC Corporation. All rights reserved. 4© Copyright 2015 EMC Corporation. All rights reserved. VISION FOR CITIES OF THE FUTURE Become an innovative city SAFE Anticipate risks and protect people and information EFFICIENT Optimized use of city resources SEAMLESS Integrated daily life services IMPACTFUL Enriched life and business experiences for all
  • 5. 5© Copyright 2015 EMC Corporation. All rights reserved. 5© Copyright 2015 EMC Corporation. All rights reserved. IMPLICATIONS TO THE CITY Empower the city, citizens, visitors, and businesses IMPROVE Quality of urban living CREATE Efficient city and transparent government DEVELOP Vital economy REDUCE Environmental impact ADDRESS Infrastructure, buildings and urban planning IMPROVE Tourism, recreation, and city image
  • 6. 6© Copyright 2015 EMC Corporation. All rights reserved. 6© Copyright 2015 EMC Corporation. All rights reserved. “BIG DATA” ENABLESCITIES OF THE FUTURE Any data-set that cannot be processed with traditional systems Social Networks, UGCPublic records Location DataInternet of things Emerging Data Sources Unstructured Data Dark Data Structured Data Traditional Data Sources
  • 7. 7© Copyright 2015 EMC Corporation. All rights reserved. 7© Copyright 2015 EMC Corporation. All rights reserved. BUILD SMART CITY: THE PROBLEM Understand the city data challenge Geo Distributed Data Source Satellite-borne Imaging Device Airborne Imaging Device Webcam Environmental Monitor Health Monitor Traffic monitor Industrial Process Monitor Data Center Centralized Storage and Analytics Systems City Network DATA CHALLENGE There are massive endpoints for these systems. How to manage massive and heterogeneous data becomes an enormous challenge. Diverse data sources requires normalization and standardization to address Data Orchestration and integration. DATA USAGE CHALLENGE How could we create some innovative business to use these data to create more value and fully use current investment?
  • 8. 8© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Architecture
  • 9. 9© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Guiding Principles Agile Open Portable Extensible Modular/Ftl. Blocks Analytics Driven Software Defined
  • 10. 10© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Architecture Ingestion Layer Spring XD Transformation Layer Python /Transformed_Files KPIs Métricas exploration Maps & Graphs Visualization Layer API Open Data Data Integration Layer Python /Transformed_Files Schema and Instance Alignment Data Sources GUIs, Dashboard that access the underlying databases and promote an excellent User experience Data modelling, metrics , ETL mechanisms, definitions and variable selectionProprietary and Open Data sources APIs to expose data Analytics Data mining prediction
  • 11. 11© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Ingestion Ingestion Layer Spring XD Data Sources
  • 12. 12© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Transformation Transformation Layer Python /Transformed_Files Data Cleaning Conversion
  • 13. 13© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Integration Schema Integration Instance Alignment Integration Layer Python /Transformed_Files Schema and Instance Alignment
  • 14. 14© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Integration • Schema DB (INPUT) • Schema Matching (Algorithms & Heuristics) • Suggest Attribute Mappings (OUTPUT : SEMI-SUPERVISED) • Instances of DB tables & Integration Rules (INPUT) • Deduplication, Record Consolidation (Algorithms & Heuristics) • Instance alignment using 2 phase-pass algorithm to avoid duplicate insertion in a semi supervised data integration tool) • Attribute name similarity: fuzzy string comparisons (cosine similarity) • Levenshtein similarity: Categorical/String Data • http://pgsimilarity.projects.pgfoundry.org/ • List of deduplicated instances (OUTPUT - SEMI-SUPERVISED) Schema Integration Instance Alignment Integration Layer Python /Transformed_Files Schema and Instance Alignment
  • 15. 15© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Integration Schema Integration Instance Alignment Camada de Integração Python /Transformed_Files Schema and Instance Alignment Deduplication Similarity Join Mapeamento de atributos (Inserir) Mapeamento de atributos (Selecionar) Cosine Levenshtein
  • 16. 16© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Visualization KPIs Métricas exploration Maps & Graphs Visualization Layer
  • 17. 17© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. API implementation API Open Data
  • 18. 18© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Example use Case Transportation
  • 19. 19© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. • Available Data: – City bus movement information from on board devices (lat- long, time, date, bus line, bus ID) • Goals: – Predict the time of the arrival in a bus stop • Challenges – Lack of data in certain areas of the city – GPS precision Prediction of bus arrival
  • 20. 20© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Architecture GPS Ônibus Gemfire XD Routes & Bus Stop Data Lake Streaming Scheduler Lazy-write GPS
  • 21. 21© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. • Use each bus stop as a node, and the street as edges Transportation Network 𝑥𝑖 𝑥𝑗 𝑎𝑖𝑗 = +1 𝑎𝑖𝑗 = −1 𝑋 = 𝑥1, … , 𝑥 𝑁 , 𝑥𝑖 = 𝑙𝑎𝑡 𝑖 , 𝑙𝑜𝑛𝑔(𝑖) 𝐸 = 𝑒1, … , 𝑒 𝑀 , 𝑒 𝑘 = (𝑥𝑖, 𝑥𝑗)
  • 22. 22© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. • The goal is to find, for each 𝑒𝑗, an estimation of the average speed in a instant 𝑡, 𝑣 𝑒𝑗, 𝑡 . • Default model - estimate the velocity in each edge, using historical data from the last month. – Different hourly models for each day of the week • Online Model - Use real-time date to calculate the speed. The model
  • 23. 23© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Average speed(km/h) Default Model
  • 24. 26© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Prediction of Bus Arrival
  • 25. 27© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. • Based on the information of last use case, extrapolate to verify the quality of the service • Need to identify each bus trip, to evaluate the time interval between two buses of the same line, at each bus stop. Another use case - Auditing
  • 26. 28© Copyright 2014 EMC Corporation. All rights reserved.© Copyright 2014 EMC Corporation. All rights reserved. Screen shot - Auditing
  • 27. Data Quality Issues (diagram). Labels: Route A, Route B, bus GPS.
  • 28. Extending PaaS for Smart Cities
  • 29. Smart City Platform requirements (layered diagram). Application Layer: Real-time Dashboard, Personal Dashboard, Government Transactional Applications, Commercial Transactional Applications, Government Big Data Application, Commercial Big Data Application, Unified Control Center. Enablement Layer: Security Rules, Payment Gateway, Trust Authentication, Identity Management, Locations & Mapping, Platform as a Service. Data Orchestration Layer: Data Governance, Data Analytics Tools, Historic & Predictive / Data APIs, Transactional Data Store, Data Transformation, Unstructured Data, Structured Data, City Semantics, Audit, Open Standards, Data Ingestion Interfaces and Storage. Infrastructure Layer: City IoT Infrastructure (Government Devices, Commercial Devices, Utility Devices, Personal Devices, IoT Data Aggregation); City Data Sources (Government Systems, Social Media, Commercial Systems, Archived Data); City ICT Infrastructure (Fixed & Wireless Networks, Cloud Services). Security is shown alongside the layers.
  • 30. High-level Smart City Platform components. The same layered diagram as slide 29 (Application, Enablement, Data Orchestration, and Infrastructure layers, plus Security), overlaid with products: PCF (Pivotal Cloud Foundry); EMC Storage (Isilon and/or cloud-native software-defined storage); VMware vRealize Cloud Suite & Big Data Extensions; Pivotal Big Data Suite (Advanced Analytics, Applications at Scale, Data Processing): Greenplum Database, HAWQ, Spring XD, Spark, Redis, RabbitMQ, GemFire, Hadoop.
  • 31. Pivotal GPDB Delivers: Massively Parallel Analytics Performance; In-Database Analytical Extensions; Industry-Leading Load Speed; Rich SQL with Schema Agnosticism; Industry-Leading Workload Mgmt.; SAS Acceleration Options; Parallel Co-Processing with Hadoop; No-Forklift Scalability; Multi-Level Redundancy; Rich, Easy-to-Use Administration Tools; Big Data Backup; Comprehensive Security; Software-only or DCA.
  • 32. Isilon Scale-Out NAS. Simple to manage: single file system, single volume, global namespace. Massively scalable: scales from 16 TB to over 50 PB in a single cluster; 200 GB/s throughput, 3.75M IOPS. Unmatched efficiency: over 80% storage utilization, automated tiering and SmartDedupe. Enterprise data protection: efficient backup and disaster recovery, and N+1 through N+4 redundancy. Robust security and compliance options: RBAC, Access Zones, WORM data security, File System Auditing; Data At Rest Encryption with SEDs, STIG hardening; CAC/PIV smartcard authentication, FIPS OpenSSL support. Operational flexibility: multi-protocol support including NFS, SMB, HTTP, FTP and HDFS; object and cloud computing including OpenStack Swift.
  • 33. Hadoop Impact on Telemetry: Lots of Little Files (a.k.a. the small-files problem for Hadoop). Rio Smart Sensors - ESRI. NameNode = 512 GB of RAM. Each file consumes about 1 KB of NameNode RAM, so 512 GB / 1 KB = at most roughly 500M files, assuming no other processes on the box. Rio has 12.5K sensors for the 2016 Olympics; if each sensor sends one file every minute, that is 18M files in 1 day. EMC believes in storing metadata on SSD, which lets the NameNode scale out and get around the file-growth limits of a scale-up NameNode.
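The arithmetic on this slide can be checked with a short calculation. The figures (512 GB of NameNode RAM, about 1 KB per file, 12.5K sensors, one file per sensor per minute) come from the slide; treating GB and KB as binary units and the derived "days until full" figure are assumptions added for illustration.

```python
# Figures from the slide; binary units (GiB, KiB) are an assumption.
NAMENODE_RAM_BYTES = 512 * 1024**3
BYTES_PER_FILE = 1024
SENSORS = 12_500
FILES_PER_SENSOR_PER_DAY = 24 * 60  # one file per minute

# Upper bound on files the NameNode can track if all RAM held file metadata.
max_files = NAMENODE_RAM_BYTES // BYTES_PER_FILE

# Files produced per day by the whole sensor fleet.
files_per_day = SENSORS * FILES_PER_SENSOR_PER_DAY

# Days until the NameNode's metadata capacity is exhausted at that rate.
days_to_fill = max_files / files_per_day

print(f"max files: {max_files:,}")
print(f"files/day: {files_per_day:,}")
print(f"days to fill: {days_to_fill:.1f}")
```

Under these assumptions the ceiling is about 537M files (the slide rounds to 500M), and at 18M files per day a single scale-up NameNode would reach it in roughly a month.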