GROW WITH BIG DATA
Third Eye Consulting Services & Solutions LLC.
Crime Analysis &
Predictions System
(CAPS)
Public Safety & National Security team
at
led by
Sanjay Jacob, Parul Bhandari & Mahesh Punyamurthula
ORIGINALLY DEVELOPED FOR
CAPS – Problem Definition
Public Governments around the world need to:
1. Do more while spending the least.
2. Better manage existing resources.
3. Be proactive in battling crime.
4. Be at the right place at the right time
– to beat crime with the lowest impact.
5. Know what to do when and why.
CAPS – Problem Definition
Other Challenges for Public Governments:
1. Lack of technical knowledge and resources.
2. Lack of management resources to manage, monitor
and operate such systems.
3. Need to analyze disparate data sets spread across
various systems and trapped in different formats.
4. Reliance on outdated infrastructure & systems –
both stationary & mobile.
• Leverages Open Data initiatives by government bodies
worldwide.
• Based on Microsoft’s Big Data technologies stack.
• Capable of handling Big Data’s Velocity, Volume and Veracity.
• Easy to integrate, assemble and develop customized end-to-end
solutions.
• Analyze various types of data feeds - real time streaming &
static data.
• Provides comprehensive analytical capabilities.
• Predict crime patterns for efficient deployment of public
safety resources.
CAPS - Solution
• CAPS is a system to analyze & detect crime hotspots & predict
crime.
• Collects data from various data sources - crime data from
OpenData sites, US census data, social media, traffic & weather
data etc.
• Leverages Azure’s cloud and on-premise technologies for back-end
processing & desktop-based visualization tools.
CAPS - Solution
The police can use the system in two ways:
1. The system can alert that a crime is imminent (in the
next 4 hours) based on any new traffic or weather
event/s.
2. The police can run the system once a day and based
on the predictions, decide how to deploy resources
(policemen) in each community/district.
BENEFITS FOR THE LOCAL POLICE
TECHNICAL SECTION
• Azure HDInsight
• MapReduce
• Hive
• Stream Analytics
• Azure Queue
• Azure Storage
• SQL Azure
• SQL Server
• Power BI
• PowerQ&A
• PowerView
• PowerMap
TECHNOLOGIES USED
DATA COLLECTION LAYER
DATA COLLECTION
OPEN DATA - Static
CENSUS DATA - Static
WEATHER DATA
– Real Time
CRIME DATA - Static
TRAFFIC DATA
– Real Time
SOCIAL MEDIA DATA
– Real Time
ENTERPRISE DATA
– Real Time & Static
MACHINE DATA
– Real Time & Static
INTERNET OF THINGS
– Real Time & Static
ANY OTHER DATA
- Static
ANY OTHER DATA
– Real Time
ANY OTHER DATA
– Real Time & Static
DATA PROCESSING LAYER
Cloud or On Premise
PRESENTATION LAYER
The system can be further enhanced to include additional
data sources as available.
For example:
• Video Data
• Image Data
• Police Systems Data
ADDITIONAL DATA SOURCES
DATA COLLECTION – Windows
Data Sources - For Chicago
• Real-time Tweet streams ingested
from Twitter using Search APIs.
• Facebook data ingested using Graph
Search APIs.
• Traffic data ingested from MapQuest.
• Weather data ingested from Forecast.io.
• Data feed ingestion is automated and
captured using a custom C# code base.
Pre-Processor
• Tweets are fed into the Stream
Computing Layer for sentiment logic
processing.
• Facebook, Traffic & Weather data is
parsed from JSON to CSV at run time.
• All data is persisted on Azure Storage.
• Analyzed & summarized data is
persisted in SQL Azure.
Storage
• Analyzed Twitter data is pushed to
SQL Azure.
• Parsed Twitter/Facebook/Traffic/Weather
data is persisted in Azure Storage in
different containers.
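The JSON-to-CSV step in the pre-processor can be illustrated with a short sketch. The deck says this step is done in custom C# code; the Python below is only an illustration of the idea, and the field names in the sample payload (time, temp, summary) are assumptions, not the actual Forecast.io, MapQuest or Facebook schema.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with a dot."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def json_to_csv(json_text):
    """Convert a JSON array of objects into CSV text with a header row."""
    rows = [flatten(r) for r in json.loads(json_text)]
    # Collect columns in first-seen order so the header is stable.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical weather payload; the field names are illustrative only.
sample = (
    '[{"time": "2015-01-01T00:00", '
    '"temp": {"value": -3.5, "unit": "C"}, "summary": "Snow"}]'
)
print(json_to_csv(sample))
```

Flattening nested objects into dotted column names keeps each feed in a single flat file, which is the shape a Hive external table over blob storage would typically expect.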
DATA PROCESSING LAYER - Windows
• Windows Azure
• Windows HDInsight
• Stream Analytics
• Azure Queue
• Azure Storage
• SQL Azure
• SQL Server
PRESENTATION LAYER
DATA COLLECTION LAYER
DATA PROCESSING LAYER
DATA STORAGE & PROCESSING
STORAGE
• Processed & aggregated data is ingested into
SQL Azure.
• HDInsight blob storage provides a reliable
and scalable solution.
• All data is partitioned by date.
Sqoop Sqoop
STORAGE
• Calls script on pre-set
schedule to ingest
data into Hive tables.
• Checks periodically to
ensure normal system
operations.
• Inserts data
incrementally.
• Contains all data as
per the table
schemas.
• Enables HiveQL
execution when
requests come in
from Power BI
components.
SCHEDULER HIVE
SQL AZURE
HIVE Scheduled Jobs
• Daily scripts to create tables and insert data, scheduled
with cron jobs.
HIVE Tables
• Hold all data in full detail from all data sources.
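The daily, date-partitioned incremental load described on this slide might look roughly like the sketch below. It only builds the HiveQL text that a cron-scheduled script could submit. The table name, the partition column `dt` and the staging path are hypothetical, since the deck does not show the actual scripts or schemas.

```python
from datetime import date


def daily_load_statements(table, staging_path, day):
    """Build HiveQL that loads one day's files into that day's partition.

    Assumes (hypothetically) a table partitioned by a string column `dt`
    and one staging folder per day named after the ISO date.
    """
    partition = day.isoformat()
    return [
        # Make sure the partition exists; harmless if it already does.
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt='{partition}');",
        # Move the day's files into the partition (incremental insert).
        f"LOAD DATA INPATH '{staging_path}/{partition}' "
        f"INTO TABLE {table} PARTITION (dt='{partition}');",
    ]


# Hypothetical table and blob-storage path, for illustration only.
for statement in daily_load_statements(
    "crime_events", "wasb:///staging/crime", date(2015, 1, 2)
):
    print(statement)
```

Because each run touches only one partition, re-running a failed day does not disturb data already loaded for other dates.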
PRESENTATION LAYER – Windows
DATA PROCESSING LAYER
• Power BI
• PowerQ&A
• PowerView
• PowerMap
• Power Query
• PowerPivot
• Windows 8 Apps
• Mobile Apps
DATA COLLECTION LAYER
PRESENTATION LAYER
DATA PRESENTATION LAYER
• Excel 2013 is used as the platform and workbench for analyzing and mining
data, using functionality that is familiar to most power users.
• PowerPivot is the semantic layer that defines the relationships between data
and calculated measures.
• Data is stored in-memory as a columnar database for faster retrieval.
• Model data is saved as part of the Excel workbook, which makes sharing
these reports very easy.
• PowerMap provides an instant, overall picture of the trends happening across
geographies over..
• PowerView is a Silverlight add-in that provides powerful, interactive and
intuitive dashboards and reports built on top of PowerPivot’s data
model. It enables slicing/dicing and drilling up/down at any level of data. It is very
useful for identifying trends and root causes.
Real time
Data Sources
Data Collection Layer (C# custom code)
Data Processing Layer (Stream Computing Platform - Storm)
HDFS & Blob Storage (Azure)
Presentation Layer (Power BI)
Analytics (HDInsight Hive)
Analytics
(Stream Analytics & MapReduce)
SQL Azure
CLOUD MODEL
– Windows
• Cloud based data
processing &
transformations.
• Cloud based real
time & batch
analytics.
• Office 365’s PowerBI
components for
adhoc analytics.
• Enabled for Windows
8 based Mobile &
Desktop Apps.
Static
Data Sources
CLOUD BASED
INFRASTRUCTURE
Message Queue Layer (Azure Event Hubs)
Machine Learning Algorithms
(AzureML)
Real time
Data Sources
Data Collection Layer (C# custom code)
Data Processing Layer (Azure Stream Analytics)
HDFS & Blob Storage (Azure)
Presentation Layer (Power BI)
Analytics (HDInsight Hive)
Analytics
(Stream Analytics & MapReduce)
SQL Server
HYBRID MODEL
– Windows
Static
Data Sources
Message Queue Layer (Azure Event Hubs)
Machine Learning Algorithms
(AzureML)
• PowerBI components
for adhoc analytics.
• SQL Server based.
• Cloud based data
processing &
transformations.
• Cloud based real
time & batch
analytics.
• Enabled for Windows
8 based Mobile &
Desktop Apps.
CLOUD BASED
INFRASTRUCTURE
ON-PREMISE INFRA
DATA SOURCES – For Chicago
DATA | DESCRIPTION | SOURCE
Crime Data | Historic crime case data from 2001 to present | https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
Chicago districts | Chicago Police district address information | https://portal.chicagopolice.org/portal/page/portal/ClearPath/Communities/Districts
Chicago communities | Chicago community area mapping | http://en.wikipedia.org/wiki/Community_areas_in_Chicago
Socio-economic factors | Selected socio-economic indicators such as people below poverty, unemployment and per capita income for each community | https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2
Twitter | Tweets about Chicago | Twitter Streaming API
Facebook | Posts about Chicago | Facebook Graph Search API
Weather | Chicago weather data | Forecast.io
Traffic | Chicago traffic details | MapQuest
ANALYTICS
CRIME ANALYTICS
Analyze Crime Levels
• Filters (depending on data)
• Number of crimes
• Crime Types
• Location
• Date & Time
• Temperature
• Residents
• Graph Type
• Line
• Bar
• Pie Chart
• Table
• Bubble
PREDICTIONS
Name | Values | Comments
Community | Community ID | This is the key. The prediction is for a specific community at a specific date & time.
Date | Date |
Time Period | 1: 12am – 4am; 2: 4am – 8am; 3: 8am – 12pm; 4: 12pm – 4pm; 5: 4pm – 8pm; 6: 8pm – 12am | For convenience, we have broken up a day into 6 time slots. We can change this based on the supporting data.
Weather | 1 – Normal; 2 – Abnormal; 3 – Extreme | All weather conditions are categorized into these values. We picked suitable values for each of the weather types to get a good distribution.
Traffic Event | 1 – Normal; 2 – Abnormal; 3 – Extreme | All traffic conditions are categorized into these values. We picked suitable values for each of the traffic types to get a good distribution.
Traffic Event Distance from Police Station | 1 – Near; 2 – Far; 3 – Very Far | The assumption is that the farther an event is from a police station, the higher the chances of a crime. We picked suitable values for each to get a good distribution.
Unemployment Rate | 0 – 100 | This is the unemployment rate in that precinct.
Number of police stations in District | Number | Assumes that the propensity for crime is inversely proportional to the number of police stations.
Crime | 1 – Theft; 2 – Assault; 3 – Burglary; 4 – Narcotics; 5 – Battery; 6 – None | This is a placeholder category. The list can be anything that is (a) supported by the underlying data and (b) of interest to law enforcement.
FACTORS CONSIDERED FOR PREDICTING CRIME
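The factor table above can be read as the schema of one model input row. The sketch below shows one way to encode it, using the category codes and the six 4-hour time slots from the table. The class and function names (FeatureRow, build_row, time_period) and the string labels used as dictionary keys are assumptions made for illustration; the deck does not show how raw weather or traffic readings are bucketed into Normal/Abnormal/Extreme, so that step is taken as given.

```python
from dataclasses import dataclass
from datetime import datetime

# Category codes copied from the factor table.
LEVELS = {"normal": 1, "abnormal": 2, "extreme": 3}
DISTANCES = {"near": 1, "far": 2, "very far": 3}
CRIMES = {1: "Theft", 2: "Assault", 3: "Burglary",
          4: "Narcotics", 5: "Battery", 6: "None"}


def time_period(ts: datetime) -> int:
    """Map a timestamp to slot 1..6 (4-hour blocks starting at 12am)."""
    return ts.hour // 4 + 1


@dataclass(frozen=True)
class FeatureRow:
    community_id: int
    date: str
    time_period: int
    weather: int
    traffic_event: int
    traffic_distance: int
    unemployment_rate: float
    police_stations: int


def build_row(community_id, ts, weather, traffic, distance,
              unemployment_rate, police_stations) -> FeatureRow:
    """Encode already-categorized inputs into one feature row."""
    if not 0 <= unemployment_rate <= 100:
        raise ValueError("unemployment_rate must be between 0 and 100")
    return FeatureRow(
        community_id=community_id,
        date=ts.date().isoformat(),
        time_period=time_period(ts),
        weather=LEVELS[weather],
        traffic_event=LEVELS[traffic],
        traffic_distance=DISTANCES[distance],
        unemployment_rate=float(unemployment_rate),
        police_stations=police_stations,
    )


# Hypothetical values: community 25 at 5:30pm falls in slot 5 (4pm – 8pm).
row = build_row(25, datetime(2015, 3, 14, 17, 30),
                "abnormal", "normal", "far", 12.5, 2)
print(row)
```

The crime code (1 to 6) is the label to be predicted, so it is kept out of the feature row itself.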
• With the initial dataset, an initial prediction model is constructed.
• If any of the fields change value, the model is retrained. Some
of the fields change infrequently and others change on a
daily basis (e.g. social media, weather & traffic events). The model is
continuously updated with new data.
• The system periodically pulls in the latest fields (automatically) from
appropriate sources.
• Then the model runs against the new data to predict what kind of
crime is likely to be committed in each of the communities.
PREDICTION MODEL
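The train, retrain and predict cycle described above can be sketched with a deliberately simple stand-in model. The deck names AzureML for the machine learning layer but does not say which algorithm is used, so the frequency-count classifier below is an assumption chosen only to make the loop concrete. The feature tuple layout (community, time period, weather, traffic event) is also illustrative.

```python
from collections import Counter, defaultdict

NO_CRIME = 6  # code for "None" in the factor table


class FrequencyModel:
    """Toy classifier: predicts the most frequent past crime for a feature tuple."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def train(self, history):
        """Rebuild the model from scratch from (features, crime_code) pairs."""
        self._counts = defaultdict(Counter)
        for features, crime in history:
            self._counts[features][crime] += 1

    def predict(self, features):
        """Return the most common crime code seen for these features."""
        counts = self._counts.get(features)
        if not counts:
            return NO_CRIME  # nothing observed for this combination
        return counts.most_common(1)[0][0]


# Hypothetical history: (community, time_period, weather, traffic) -> crime code.
history = [
    ((25, 5, 2, 1), 1),
    ((25, 5, 2, 1), 1),
    ((25, 5, 2, 1), 5),
]
model = FrequencyModel()
model.train(history)
print(model.predict((25, 5, 2, 1)))

# New data arrives, so the model is retrained on the extended history.
history += [((25, 5, 2, 1), 5), ((25, 5, 2, 1), 5)]
model.train(history)
print(model.predict((25, 5, 2, 1)))
```

Retraining from the full history on every change is the simplest policy that matches the slide; a production system would more likely retrain on a schedule or update incrementally.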
CRIME PREDICTIONS
Predict Crime
• Filters (depending on data)
• Number of crimes
• Crime Types
• Location
• Date & Time
• Temperature
• Residents
• Graph Type
• Line
• Bar
• Pie Chart
• Table
• Bubble
CRIME PREDICTIONS
Predict Crime
• Filters (depending on data)
• Crime Types
• Location
• Date
• Time
• Temperature
• Traffic
• Distance to Police Station
• Weather
The system is fully extensible and future proof.
• Lessons learned
• Patterns detected
• Observations made
for one city can be used and extended for other cities
worldwide.
The backend infrastructure will also adjust accordingly.
EXTENSIBILITY
The Crime Analysis and Prediction System (CAPS):
• Detects, analyzes & predicts crime.
• Helps public governments battle crime better at lower
cost.
• Is based on Microsoft’s Big Data technologies – both cloud
and on-premise.
• Is built on the Azure platform, which can scale vertically
& horizontally.
• Is customizable & extensible to meet the needs of specific
business use cases.
SUMMARY
THANK YOU!

Crime Analysis & Prediction System

  • 1. GROW WITH BIG DATA Third Eye Consulting Services & Solutions LLC.
  • 3. Public Safety & National Security team at lead by Sanjay Jacob, Parul Bhandari & Mahesh Punyamurthula ORIGINALLY DEVELOPED FOR
  • 4. CAPS – Problem Definition Public Governments around the world need to: 1. Do more while spending the least. 2.Better manage existing resources. 3.Be proactive in battling crime. 4.Be at the right place at the right time – to beat crime with the lowest impact. 5.Know what to do when and why.
  • 5. CAPS – Problem Definition Other Challenges for Public Governments: 1. Lack of technical knowledge and resources. 2.Lack of management resources to manage, monitor and operate such systems. 3.Need to analyze disparate data sets spread across various systems and trapped in different formats. 4. Reliance on outdated infrastructure & systems – both stationary & mobile.
  • 6. • Leverages Open Data initiatives by government bodies worldwide. • Based on Microsoft’s Big Data technologies stack. • Capable of handling Big Data’s Velocity, Volume and Veracity. • Easy to integrate, assemble and develop customized end-to- end solutions. • Analyze various types of data feeds - real time streaming & static data. • Provides comprehensive analytical capabilities. • Predict crime patterns for efficient deployment of public safety resources. CAPS - Solution
  • 7. • CAPS is a system to analyze & detect crime hotspots & predict crime. • Collects data from various data sources - crime data from OpenData sites, US census data, social media, traffic & weather data etc. • Leverages Azure’s Cloud and on premise technologies for back- end processing & desktop based visualization tools. CAPS - Solution
  • 8. The police can use the system in two ways: 1. The system can alert that a crime is imminent (in the next 4 hours) based on any new traffic or weather event/s. 2. The police can run the system once a day and based on the predictions, decide how to deploy resources (policemen) in each community/district. BENEFITS FOR THE LOCAL POLICE
  • 10. • Azure HDInsight • MapReduce • Hive • Stream Analytics • Azure Queue • Azure Storage • SQL Azure • SQL Server • Power BI • PowerQ&A • PowerView • PowerMap TECHNOLOGIES USED
  • 11. DATA COLLECTION LAYER DATA COLLECTION OPEN DATA - Static CENSUS DATA - Static WEATHER DATA – Real Time CRIME DATA - Static TRAFFIC DATA – Real Time SOCIAL MEDIA DATA – Real Time ENTERPRISE DATA – Real Time & Static MACHINE DATA – Real Time & Static INTERNET OF THINGS – Real Time & Static ANY OTHER DATA - Static ANY OTHER DATA – Real Time ANY OTHER DATA – Real Time & Static DATA PROCESSING LAYER Cloud or On Premise PRESENTATION LAYER
  • 12. The system can be further enhanced to include additional data sources as available. For ex: • Video Data • Images Data • Police Systems Data ADDITIONAL DATA SOURCES
  • 13. DATA COLLECTION – Windows Data Sources - For Chicago  Real time Tweet streams ingested from Twitter using Search APIs  Facebook data ingested using Graph Search APIs.  Traffic data ingested from Mapquest.  Weather data ingested from Forcast.io  Data feed ingestion is automated and captured using C# custom code base. Pre-Processor  Tweets are feed into Stream Computing Layer for sentiment logic processing.  Facebook, Traffic & Weather data parsed from JSON to csv on run time.  All data is persisted on Azure Storage.  Analyzed & summarized data is persisted in SQL Azure. Storage  Analyzed Twitter data is pushed to Window Azure SQL  Parsed Twitter/Facebook/Traffic/Weather data is persisted in Azure Storage in different containers.
  • 14. DATA PROCESSING LAYER - Windows • Windows Azure • Windows HDInsight • Stream Analytics • Azure Queue • Azure Storage • SQL Azure • SQL Server PRESENTATION LAYER DATA COLLECTION LAYER DATA PROCESSING LAYER
  • 15. DATA STORAGE & PROCESSING STORAGE  Processed & Aggregated data ingested into SQL Azure.  HDInsight blob storage provides reliable and a scalable solution.  All data is partitioned on dates. Sqoop Sqoop STORAGE  Calls script on pre-set schedule to ingest data into Hive tables.  Checks periodically to ensure normal system operations  Inserts data incrementally  Contains all data as per the table schemas.  Enables HiveQL execution when requests come in from PowerBI components. SCHEDULER HIVE SQL AZURE HIVE Scheduled Jobs  Daily scripts to create table and insert data, scheduled with cron jobs. HIVE Tables  Have all data in full details from all data sources.
  • 16. PRESENTATION LAYER – Windows DATA PROCESSING LAYER • Power BI • PowerQ&A • PowerView • PowerMap • Power Query • PowerPivot • Windows 8 Apps • Mobile Apps DATA COLLECTION LAYER PRESENTATION LAYER
  • 18. DATA PRESENTATION LAYER  Excel 2013 is used as the platform and workbench for analyzing and mining data, using functionalities which are familiar to most power users.  PowerPivot is the semantic layer that defines the relationship between data and calculated measures.  Data is stored in-memory as a columnar database for faster retrievals.  Model data is saved along with Excel as a part of it, which makes sharing of these reports very easy.  PowerMap provides instant and overall picture of the trends happening across geographies over..  PowerView is a Silverlight Add-in that provides powerful interactive and intuitive dashboards and reports which are built on top of PowerPivot’s data model. It enables slicing/dicing, drilling-up/down of any level of data. It’s very useful to identify trends and root causes.
  • 19. Real time Data Sources Data Collection Layer (C# custom code) Data Processing Layer (Stream Computing Platform - Storm) HDFS & Blob Storage (Azure) Presentation Layer (Power BI) Analytics (HDInsight Hive) Analytics (Stream Analytics & MapReduce) SQL Azure CLOUD MODEL – Windows • Cloud based data processing & transformations. • Cloud based real time & batch analytics. • Office 365’s PowerBI components for adhoc analytics. • Enabled for Windows 8 based Mobile & Desktop Apps. Static Data Sources CLOUD BASED INFRASTRUCTURE Message Queue Layer (Azure Event Hubs) Machine Learning Algorithms (AzureML)
  • 20. Real time Data Sources Data Collection Layer (C# custom code) Data Processing Layer (Azure Stream Analytics) HDFS & Blob Storage (Azure) Presentation Layer (Power BI) Analytics (HDInsight Hive) Analytics (Stream Analytics & MapReduce) SQL Server HYBRID MODEL – Windows Static Data Sources Message Queue Layer (Azure Event Hubs) Machine Learning Algorithms (AzureML) • PowerBI components for adhoc analytics. • SQL Server based. • Cloud based data processing & transformations. • Cloud based real time & batch analytics. • Enabled for Windows 8 based Mobile & Desktop Apps. CLOUD BASED INFRASTRUCTURE ON-PREMISE INFRA
  • 21. DATA SOURCES – For Chicago (Data / Description / Source)
    • Crime Data: Historic crime case data from 2001 to present. Source: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
    • Chicago districts: Chicago Police districts address information. Source: https://portal.chicagopolice.org/portal/page/portal/ClearPath/Communities/Districts
    • Chicago communities: Chicago community area mapping. Source: http://en.wikipedia.org/wiki/Community_areas_in_Chicago
    • Socio economic factors: Selected socio economic indicators such as people below poverty, unemployment and per capita income for each community. Source: https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2
    • Twitter: Tweets about Chicago. Source: Twitter Streaming API
    • Facebook: Posts about Chicago. Source: Facebook Graph Search API
    • Weather: Chicago weather data. Source: Forecast.io
    • Traffic: Chicago traffic details. Source: MapQuest
  • 23. CRIME ANALYTICS: Analyze Crime Levels
    • Filters (depending on data): Number of crimes, Crime Types, Location, Date & Time, Temperature, Residents
    • Graph Type: Line, Bar, Pie Chart, Table, Bubble
  • 27. FACTORS CONSIDERED FOR PREDICTING CRIME (Name / Values / Comments)
    • Community. Values: Community ID. Comments: This is the key. The prediction is for a specific community for a specific date & time.
    • Date. Values: Date.
    • Time Period. Values: 1: 12am – 4am; 2: 4am – 8am; 3: 8am – 12pm; 4: 12pm – 4pm; 5: 4pm – 8pm; 6: 8pm – 12am. Comments: For convenience, we have broken up a day into 6 time slots. We can change this based on the supporting data.
    • Weather. Values: 1 – Normal; 2 – Abnormal; 3 – Extreme. Comments: All weather conditions are categorized into these values. We picked suitable values for each of the weather types to get a good distribution.
    • Traffic Event. Values: 1 – Normal; 2 – Abnormal; 3 – Extreme. Comments: All traffic conditions are categorized into these values. We picked suitable values for each of the traffic types to get a good distribution.
    • Traffic Event Distance from Police Station. Values: 1 – Near; 2 – Far; 3 – Very Far. Comments: The assumption is that the farther the event is from a police station, the higher the chances of a crime. We picked suitable values for each to get a good distribution.
    • Unemployment Rate. Values: 0 – 100. Comments: This is the unemployment rate in that precinct.
    • Number of police stations in District. Values: Number. Comments: Assumes that the propensity for crime is inversely proportional to the number of police stations.
    • Crime. Values: 1 – Theft; 2 – Assault; 3 – Burglary; 4 – Narcotics; 5 – Battery; 6 – None. Comments: This is a placeholder category. This list can be anything that is (a) supported by the underlying data and (b) of interest to law enforcement.
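The factor table on slide 27 can be read as the schema of one model input row. The following is a minimal Python sketch of that encoding, not the CAPS implementation: the names `time_period`, `FeatureRow`, `build_row` and the lookup dictionaries are hypothetical, and the only values taken from the slide are the category codes and the six 4-hour time slots.

```python
from dataclasses import dataclass
from datetime import datetime

# Category codes copied from the factor table; the dictionary names are illustrative.
LEVELS = {1: "Normal", 2: "Abnormal", 3: "Extreme"}   # weather and traffic event
DISTANCES = {1: "Near", 2: "Far", 3: "Very Far"}      # traffic event to police station
CRIMES = {1: "Theft", 2: "Assault", 3: "Burglary",
          4: "Narcotics", 5: "Battery", 6: "None"}    # prediction target labels


def time_period(ts: datetime) -> int:
    """Map a timestamp to one of six 4-hour slots (1 = 12am-4am, ..., 6 = 8pm-12am)."""
    return ts.hour // 4 + 1


@dataclass(frozen=True)
class FeatureRow:
    community_id: int
    date: str                 # ISO date string
    time_period: int          # 1..6
    weather: int              # key of LEVELS
    traffic_event: int        # key of LEVELS
    traffic_distance: int     # key of DISTANCES
    unemployment_rate: float  # 0..100
    police_stations: int


def build_row(community_id, ts, weather, traffic_event, traffic_distance,
              unemployment_rate, police_stations) -> FeatureRow:
    """Validate the coded inputs and assemble one model input row."""
    if weather not in LEVELS or traffic_event not in LEVELS:
        raise ValueError("weather and traffic_event must be 1, 2 or 3")
    if traffic_distance not in DISTANCES:
        raise ValueError("traffic_distance must be 1, 2 or 3")
    if not 0 <= unemployment_rate <= 100:
        raise ValueError("unemployment_rate must be between 0 and 100")
    return FeatureRow(community_id, ts.date().isoformat(), time_period(ts),
                      weather, traffic_event, traffic_distance,
                      unemployment_rate, police_stations)
```

Because the slide notes that the number of time slots may change with the supporting data, the slot width is kept in one place (`time_period`) so that it could be adjusted without touching the row type.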
  • 28. PREDICTION MODEL
    • With the initial dataset, an initial prediction model is constructed.
    • If any of the fields change value, then the model is retrained. Some of the fields will change infrequently and others will change on a daily basis (e.g. social media, weather & traffic events). The model is continuously updated/upgraded with new data.
    • The system periodically pulls in the latest fields (automatically) from appropriate sources.
    • Then the model runs against the new data to predict what kind of crime is likely to be committed in each of the communities.
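The retrain-then-predict cycle described on slide 28 can be sketched as follows. This is an illustration under stated assumptions, not the CAPS model: the slides name AzureML but not the algorithm, so a simple most-frequent-crime lookup stands in for it here, and `FrequencyModel`, `Predictor` and the tuple layout are hypothetical names chosen for the example.

```python
from collections import Counter, defaultdict

NONE_LABEL = 6  # "None" in the slide's crime category list


class FrequencyModel:
    """Stand-in model: predicts the most frequent past crime for a given context."""

    def __init__(self):
        self._by_context = defaultdict(Counter)
        self._by_community = defaultdict(Counter)

    def fit(self, history):
        """history: iterable of (community, period, weather, traffic, crime) tuples."""
        self._by_context.clear()
        self._by_community.clear()
        for community, period, weather, traffic, crime in history:
            self._by_context[(community, period, weather, traffic)][crime] += 1
            self._by_community[community][crime] += 1
        return self

    def predict(self, community, period, weather, traffic):
        # Try the exact context first, then fall back to the community as a whole.
        candidates = (self._by_context.get((community, period, weather, traffic)),
                      self._by_community.get(community))
        for counts in candidates:
            if counts:
                return counts.most_common(1)[0][0]
        return NONE_LABEL


class Predictor:
    """Retrains only when the pulled data differs from the last snapshot."""

    def __init__(self):
        self.model = FrequencyModel()
        self._seen = None
        self.retrain_count = 0

    def update(self, history):
        snapshot = tuple(history)
        if snapshot != self._seen:  # a field changed value, so retrain
            self.model.fit(snapshot)
            self._seen = snapshot
            self.retrain_count += 1

    def predict_all(self, conditions):
        """conditions: {community: (period, weather, traffic)} -> {community: crime code}."""
        return {c: self.model.predict(c, *ctx) for c, ctx in conditions.items()}
```

In use, a scheduler would call `update` with the latest pulled records and then `predict_all` with the current conditions for each community. Comparing whole snapshots is the simplest possible change check; a real system would more likely track per-source timestamps or versions.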
  • 29. CRIME PREDICTIONS: Predict Crime
    • Filters (depending on data): Number of crimes, Crime Types, Location, Date & Time, Temperature, Residents
    • Graph Type: Line, Bar, Pie Chart, Table, Bubble
  • 30. CRIME PREDICTIONS: Predict Crime
    • Filters (depending on data): Crime Types, Location, Date, Time, Temperature, Traffic, Distance to Police Station, Weather
  • 31. EXTENSIBILITY
    The system is fully extensible and future proof. Lessons learned, patterns detected and observations made for one city can be used and extended for other cities worldwide. The backend infrastructure will also adjust accordingly.
  • 32. SUMMARY
    The Crime Analysis and Prediction System (CAPS):
    • Detects, analyzes & predicts crime.
    • Helps public governments battle crime better at lower cost.
    • Is based on Microsoft’s Big Data technologies, both cloud and on premise.
    • Is built on the Azure platform, which can scale vertically & horizontally.
    • Is customizable & extensible to meet the needs of specific business use cases.