Realtime Computation with Storm
Brad Anderson
banderson@maprtech.com
@boorad
Realtime Computation with Storm
Definition & Overview
Interoperability
Use Cases
Stream Processing
CEP
Distributed RPC
Before Storm
Queues and workers
Example (simplified)
Storm
Guaranteed data processing
Horizontal scalability
Fault-tolerance
No intermediate message brokers!
Higher level abstraction than message passing
Concepts
streams
Unbounded sequence of tuples
[Diagram: a row of tuples, one after another]
spouts
Source of streams
spouts
public interface ISpout extends Serializable {
  void open(Map conf,
         TopologyContext context,
         SpoutOutputCollector collector);
  void close();
  void nextTuple();
  void ack(Object msgId);
  void fail(Object msgId);
}
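The ack and fail callbacks in the interface above are what make replay possible: Storm calls them with the message id the spout attached when it emitted the tuple. The following is a minimal plain-Java sketch of that bookkeeping, not Storm code. The class ReplaySpoutSketch, its in-memory queue, and the numeric ids are all assumptions made for illustration; a real spout would emit through the SpoutOutputCollector and read from an external source.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for a reliable spout; it does not use any Storm class.
class ReplaySpoutSketch {
    private final Queue<String> source = new ArrayDeque<>();   // messages not yet emitted
    private final Map<Long, String> pending = new HashMap<>(); // emitted, awaiting ack or fail
    private long nextId = 0;

    ReplaySpoutSketch(Iterable<String> messages) {
        for (String m : messages) {
            source.add(m);
        }
    }

    // Mirrors nextTuple(): emit one message and remember it under a fresh id.
    // Returns the id, or -1 when there is nothing left to emit.
    long nextTuple() {
        String msg = source.poll();
        if (msg == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, msg);
        return id;
    }

    // Mirrors ack(msgId): the tuple was fully processed, so forget it.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Mirrors fail(msgId): processing failed, so queue the message for replay.
    void fail(long msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            source.add(msg);
        }
    }

    int pendingCount() { return pending.size(); }
    int queuedCount() { return source.size(); }

    public static void main(String[] args) {
        ReplaySpoutSketch spout = new ReplaySpoutSketch(Arrays.asList("a", "b"));
        long first = spout.nextTuple();
        long second = spout.nextTuple();
        spout.ack(first);   // "a" is done
        spout.fail(second); // "b" goes back on the queue
        // prints "0 pending, 1 queued"
        System.out.println(spout.pendingCount() + " pending, " + spout.queuedCount() + " queued");
    }
}
```

Keeping the message until it is acked is the design choice that matters here: a failed tuple can only be replayed if the spout, or the queue behind it, still holds the original.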
bolts
Processes input streams and produces new streams
bolts
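// Package names are for Apache Storm 1.0+; releases before 1.0 used backtype.storm.*
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class DoubleAndTripleBolt extends BaseRichBolt {
  // BaseRichBolt.prepare() receives an OutputCollector
  private OutputCollector _collector;

  public void prepare(Map conf,
                      TopologyContext context,
                      OutputCollector collector) {
    _collector = collector;
  }

  public void execute(Tuple input) {
    int val = input.getInteger(0);
    // Anchor the new tuple to the input so a downstream failure replays it
    _collector.emit(input, new Values(val*2, val*3));
    _collector.ack(input);
  }

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("double", "triple"));
  }
}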
topologies
Network of spouts and bolts
Trident
Cascading for Storm
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(),
                             new Count(),
                             new Fields("count"))
        .parallelismHint(6);
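To make clear what the Trident pipeline above computes, here is a plain-Java sketch of the same three steps (split each sentence into words, group by word, keep a running count) applied to a small fixed list instead of a stream. It uses no Storm or Trident classes. WordCountSketch and countWords are hypothetical names, and it assumes Split breaks a sentence on spaces, which the slide does not show.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the word-count logic; not Trident code.
class WordCountSketch {

    // Split every sentence on spaces and count occurrences of each word.
    // A TreeMap keeps the words sorted so the printed result is stable.
    static Map<String, Long> countWords(List<String> sentences) {
        Map<String, Long> counts = new TreeMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                if (word.isEmpty()) {
                    continue; // skip empty strings produced by repeated spaces
                }
                counts.merge(word, 1L, Long::sum); // grouping and counting in one step
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("the cow jumped", "the moon");
        // prints {cow=1, jumped=1, moon=1, the=2}
        System.out.println(countWords(sentences));
    }
}
```

The Trident version differs in the parts this sketch leaves out: the input never ends, the work is spread over parallel tasks, and the counts live in a state backend rather than a local map.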
Interoperability
spouts
• Kafka (with transactions)
• Kestrel
• JMS
• AMQP
• Beanstalkd
bolts
• Functions
• Filters
• Aggregation
• Joins
• Talk to databases, Hadoop write-behind
Storm
[Diagram, shown in four builds. Labels: Raw Data, Queue, realtime processes, Hadoop, batch processes, Apps, Business Value. The second build adds the label "Parallel Cluster Ingest"; the final build omits the Queue.]
Use Cases
Twitter
[Diagram: URL → Tweeter → Follower → Distinct follower → Reach. One URL fans out to several tweeters and each tweeter to several followers; the distinct followers are combined into a single reach figure.]
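The reach computation in the Twitter diagram can be sketched in a few lines of plain Java: look up who tweeted the URL, collect each tweeter's followers, de-duplicate them, and count. This is a single-process illustration only. ReachSketch, the two lookup maps, and the sample names are assumptions, and it does not show how the steps are distributed across a cluster.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-process version of the reach computation.
class ReachSketch {

    // Reach = number of distinct followers of everyone who tweeted the URL.
    static int reach(String url,
                     Map<String, List<String>> tweetersByUrl,
                     Map<String, List<String>> followersByTweeter) {
        Set<String> distinctFollowers = new HashSet<>();
        List<String> tweeters =
            tweetersByUrl.getOrDefault(url, Collections.<String>emptyList());
        for (String tweeter : tweeters) {
            // The set drops followers already seen through another tweeter.
            distinctFollowers.addAll(
                followersByTweeter.getOrDefault(tweeter, Collections.<String>emptyList()));
        }
        return distinctFollowers.size();
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration.
        Map<String, List<String>> tweetersByUrl = new HashMap<>();
        tweetersByUrl.put("example.com/post", Arrays.asList("alice", "bob"));

        Map<String, List<String>> followersByTweeter = new HashMap<>();
        followersByTweeter.put("alice", Arrays.asList("carol", "dave"));
        followersByTweeter.put("bob", Arrays.asList("dave", "erin"));

        // carol, dave, erin are distinct; dave is counted once. Prints 3.
        System.out.println(reach("example.com/post", tweetersByUrl, followersByTweeter));
    }
}
```

The follower lookups are independent of one another, which is why this kind of query suits the parallel fan-out shown in the diagram.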
Heartbyte
Fleet Logistics
http://github.com/{tdunning | boorad}/mapr-spout
Brad Anderson
banderson@maprtech.com
@boorad
Thank you.
Editor's Notes

  • #3: C - Best accessible distributed realtime computation system going
        A - Learn about and start using Storm
        B - You will get a great new tool in your technology stack - interesting uses
  • #4: CEP - continuous
        Not HFT-grade
  • #6: scaling is painful
        poor fault tolerance
        coding is hard
  • #9: tweets, stock ticks, manufacturing machine data, sensor messages
  • #14: DAG
        runs continuously
  • #15: abstractions like Cascading, Hive, Pig make MR approachable
        code size reduction
  • #18: kestrel - via thrift
        kafka - transactional topologies, idempotency, process only once
        activemq
  • #20: current architecture
        data ingest tool for hadoop (avoid Flume madness)
  • #21: new architecture
  • #23: Trending Topics (stream processing of the firehose)
        computing the 'reach' of a URL (Dist RPC)
  • #25: Android devices, sampling geo every 5 seconds
        route optimization
        road tax reduction
        idle alerts
  • #26: C - Exciting times, much like Hadoop/NoSQL beginning
        A - Start tinkering with Storm, integrate into your workflows
        B - be more responsive in turning data into information