Slide 1. Modern Application Architecture using Data.gov
Devin Pinkston | Solutions Engineer
Ian Brooks | Solutions Engineer
Henry Sowell | Technical Director
© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Slide 2. Hortonworks Data Platform: Ongoing Innovation in Apache
Component categories: Data Management, Data Access, Governance & Integration, Operations, Security

Releases:
HDP 2.0    Oct 2013
HDP 2.1    April 2014
HDP 2.2    Dec 2014
HDP 2.3    Oct 2015
HDP 2.4    Mar 2016
HDP 2.5    Aug 2016
HDP 2.6*   1H2017

Component versions shown across these releases, lowest to highest:
Hadoop & YARN   2.2.0, 2.4.0, 2.6.0, 2.7.1, 2.7.3
Sqoop           1.4.4, 1.4.5, 1.4.6
Druid           0.9.2
Knox            0.4.0, 0.5.0, 0.6.0, 0.9.0, 0.11.0
Ranger          0.4.0, 0.5.0, 0.6.0, 0.7.0
Ambari          1.4.4, 1.5.1, 2.0.0, 2.1.0, 2.2.1, 2.4.0, 2.5.0
Kafka           0.8.1, 0.8.2, 0.9.0, 0.10.0, 0.10.1.0
ZooKeeper       3.4.5, 3.4.6
Flume           1.3.1, 1.4.0, 1.5.2
Solr****        4.7.2, 4.10.2, 5.2.1, 5.5.1
Slider          0.60.0, 0.80.0, 0.91.0
Atlas           0.5.0, 0.7.0, 0.8.0
Accumulo        1.5.1, 1.6.1, 1.7.0
Phoenix         4.0.0, 4.2.0, 4.4.0, 4.7.0
Storm           0.9.1, 0.9.3, 0.10.0, 1.0.1, 1.1.0
Falcon          0.5.0, 0.6.0, 0.6.1, 0.10.0
Tez             0.4.0, 0.5.2, 0.7.0
Hive            0.12.0, 0.13.0, 0.14.0, 1.2.1, 1.2.1 + 2.1***
Pig             0.12.0, 0.12.1, 0.14.0, 0.15.0, 0.16.0
Oozie           3.3.2, 4.0.0, 4.1.0, 4.2.0
Spark           1.2.1, 1.4.1, 1.6.0, 1.6.2 + 2.0**, 1.6.3 + 2.1**
HBase           0.96.1, 0.98.0, 0.98.4, 1.1.2
Zeppelin        0.6.0, 0.7.0

* HDP 2.6: shows the current Apache branches being used. Final component versions are subject to change based on the Apache release process.
** Spark 1.6.3 + Spark 2.1: HDP 2.6 supports both Spark 1.6.3 and Spark 2.1 as GA.
*** Hive 2.1 is GA within HDP 2.6.
**** Apache Solr is available as an add-on product, HDP Search.
Slide 3. How Do We Handle The Data?
Actionable Intelligence from Connected Data Platforms
Capturing perishable insights from data in motion
Ensuring rich, historical insights on data at rest
Necessary for modern data applications
Slide 4. Single Aggregation Site to Hadoop Cluster Architecture
[Architecture diagram. Labels recovered from the slide:]
Collection Sources: NiFi
Movement Across Networks
Aggregation Site: NiFi
Kafka
Storm
Content-based routing/enrichment
NiFi Put: ingest to Apache Accumulo
Core Hadoop Cluster: HDFS (Hadoop Distributed File System), YARN (Cluster Resource Management), Accumulo, Stream, Data Access
HCatalog: Shared Table & User Defined Metadata for All Workloads
Ambari: Provision, Manage and Monitor Cluster Resources
Incident REST API
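The slide names content-based routing/enrichment without showing how it works. The following is a minimal in-memory sketch of the idea only, not the NiFi or Storm configuration used in the talk. The route names (`geo-complete`, `needs-geocoding`, `unmatched`), the predicates, and the `(y, x)` format of the derived `location` field are all assumptions made for illustration; only the field names `x`, `y`, `address`, and `location` come from the deck.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Route = Tuple[str, Callable[[Record], bool]]

# Hypothetical routes: a destination name plus a predicate on record content.
ROUTES: List[Route] = [
    ("geo-complete", lambda r: bool(r.get("x")) and bool(r.get("y"))),
    ("needs-geocoding", lambda r: bool(r.get("address"))),
]
DEFAULT_ROUTE = "unmatched"  # hypothetical fallback destination


def enrich(record: Record) -> Record:
    """Return a copy with a derived 'location' field when coordinates exist."""
    enriched = dict(record)
    if enriched.get("x") and enriched.get("y") and "location" not in enriched:
        # Assumed format: "(y, x)"; the deck does not specify one.
        enriched["location"] = f"({enriched['y']}, {enriched['x']})"
    return enriched


def route(record: Record) -> str:
    """Pick the first destination whose predicate matches the record."""
    for destination, predicate in ROUTES:
        if predicate(record):
            return destination
    return DEFAULT_ROUTE
```

Routes are checked in order, so a record carrying both coordinates and an address goes to the first matching destination. In a real deployment this decision would live in the flow or topology definition instead of application code.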
Slide 5. Basic Accumulo Incident Structure
Row: <incident_epoch>

Column Family     | "Incident"                                              | "Geo"                            | "temporal"
Column Qualifier  | <cat>, <descript>, <pdDistrict>, <incidentNum>, <Pdid>  | <address>, <x>, <y>, <location>  | <dayOfWeek>, <date>, <time>
Value             | <value>                                                 | <value>                          | <value>
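To make the layout above concrete, here is a small sketch that flattens one incident record into (row, column family, column qualifier, value) entries. It is plain Python and does not call the Accumulo client API; the `Entry` type, the `to_entries` helper, and the sample values are hypothetical, while the family and qualifier names are taken from the slide.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    row: str        # <incident_epoch>
    family: str     # "Incident", "Geo", or "temporal"
    qualifier: str  # field name, e.g. "cat" or "address"
    value: str


# Qualifiers grouped by column family, as listed on the slide.
FAMILIES: Dict[str, List[str]] = {
    "Incident": ["cat", "descript", "pdDistrict", "incidentNum", "Pdid"],
    "Geo": ["address", "x", "y", "location"],
    "temporal": ["dayOfWeek", "date", "time"],
}


def to_entries(incident_epoch: str, record: Dict[str, str]) -> List[Entry]:
    """Flatten one incident record into one entry per known field."""
    entries = []
    for family, qualifiers in FAMILIES.items():
        for qualifier in qualifiers:
            if qualifier in record:  # skip fields the record does not carry
                entries.append(
                    Entry(incident_epoch, family, qualifier, record[qualifier])
                )
    return entries


if __name__ == "__main__":
    # Placeholder values, not real incident data.
    sample = {"cat": "EXAMPLE", "address": "EXAMPLE ST", "dayOfWeek": "Monday"}
    for entry in to_entries("1483228800", sample):
        print(entry)
```

Each entry corresponds to one cell under the row `<incident_epoch>`. Fields that are not listed under any column family are ignored by this sketch.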

Enabling Modern Application Architecture using Data.gov open government data


Editor's Notes

  • #4: Hortonworks: Powering the Future of Data