Clojure at BackType
How we learned to stop worrying and love the parentheses

Nathan Marz
BackType
@nathanmarz
BackType

Data Services (APIs)

Social Media Analytics Dashboard
APIs
• Conversational graph for url
• Comment search
• #Tweets / URL
• Influence scores
• Top sites
• Trending links stream
• etc.
URL Profiles
Site comparisons
Influencer Profiles
Twitter Account Analytics
Topic Analysis
BackType’s Challenges

Complex analytics on lots of data (> 30TB) in realtime
Clojure at BackType

• Cascalog
• ElephantDB
• Storm
Let’s build an app
Cascalog

Levels of abstraction, highest first:
• Cascalog: variables and logic
• Cascading: tuples, data workflows
• MapReduce: key/value pairs, aggregation
Cascalog basics

The “age” dataset

Cascalog basics

Parts of a query, as labeled on the slide:
• Define and execute a query
• Where to emit results
• Output variables
• “Predicates”: constrain the output variables
Predicates

• Input fields
• Output fields
• Fields can be constants or variables
• Variables are prefixed with ? or !
Predicates
• Functions
• Filters
• Aggregators
• Generators: finite sources of tuples
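The four predicate kinds can be modeled in plain Python. This is only a rough analogue, not Cascalog code: the dataset contents and the names `generator`, `is_under`, `decade`, and `count_by_decade` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical "age" dataset; the real one is not in the transcript.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]

def generator():
    """Generator: a finite source of tuples."""
    yield from AGE

def is_under(age, limit):
    """Filter: keeps a tuple only when the condition holds."""
    return age < limit

def decade(age):
    """Function: derives a new output field from an input field."""
    return (age // 10) * 10

def count_by_decade(rows):
    """Aggregator: reduces each group of tuples to a single value."""
    counts = defaultdict(int)
    for _person, age in rows:
        counts[decade(age)] += 1
    return dict(counts)

# Generator + filter: people younger than 30.
young = [person for person, age in generator() if is_under(age, 30)]

# Generator + function + aggregator: number of people per decade of age.
by_decade = count_by_decade(generator())
```

In a Cascalog query these roles appear as predicates inside one query rather than as separate function calls.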
Example #1: Generator, Filter
Example #2: Generator, Function
Example #3: Generator, Aggregator, Filter
Join example

• Triggers a join
• Joins are an implementation detail
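One way to read "joins are an implementation detail": the query only names variables, and rows from two sources are combined wherever they agree on the variables they share. The plain-Python sketch below models that idea. It is not Cascalog code, and the datasets and the `unify` helper are hypothetical.

```python
# Hypothetical datasets; each row binds named variables.
AGE = [{"person": "alice", "age": 28}, {"person": "bob", "age": 33}]
GENDER = [
    {"person": "alice", "gender": "f"},
    {"person": "bob", "gender": "m"},
    {"person": "emily", "gender": "f"},
]

def unify(left, right):
    """Combine two sets of bindings; rows must agree on every shared variable."""
    shared = sorted(set(left[0]) & set(right[0])) if left and right else []
    # Index the right side by the values of the shared variables (a hash join).
    index = {}
    for row in right:
        index.setdefault(tuple(row[v] for v in shared), []).append(row)
    out = []
    for row in left:
        for match in index.get(tuple(row[v] for v in shared), []):
            out.append({**row, **match})
    return out

# "person" appears in both sources, so the rows are joined on it.
joined = unify(AGE, GENDER)
```

The caller never says "join": sharing the name `person` is enough, and how the matching is done stays inside `unify`.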
Cascalog demo!
Composability

• “Predicate macro”
• Using a predicate macro: it expands to its underlying predicates
Contrast to Pig




Pig’s AVG is 300 lines of code
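The composability point can be illustrated with an average built from two smaller aggregations. The sketch below is plain Python under that assumption; the transcript does not show the macro from the slide, and the names `count`, `total`, and `average` are illustrative only.

```python
def count(values):
    """Aggregator: number of values in the group."""
    return len(values)

def total(values):
    """Aggregator: sum of the values in the group."""
    return sum(values)

def average(values):
    """Composite: built only from count and total, plus one division."""
    return total(values) / count(values)

ages = [28, 33, 40, 25]  # hypothetical data
avg_age = average(ages)
```

The composite adds no aggregation logic of its own. It reuses the two building blocks, which is what keeps it short.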
Let’s build an app
Graph Schema

[Diagram: example graph]
• Nodes: Alice, Bob, Tweet: 123, Tweet: 456
• Edges: Reactor (twice), Reaction
• Properties: Gender: female; Reshare: true; Content: RT @bob Data is fun!; Content: Data is fun!
ElephantDB

Generation of domain of data:
Key/Value pairs → Pre-shard and index in MapReduce → Shards 0 to 5 on the Distributed Filesystem
ElephantDB

Serving domain of data:
Shards 0 to 5 on the DFS → three ElephantDB Servers
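The generate-then-serve flow can be sketched in plain Python. This is a toy model, not the ElephantDB API: the MD5-based hashing and the names `shard_for`, `pre_shard`, and `lookup` are assumptions made for illustration.

```python
import hashlib

NUM_SHARDS = 6  # the slides show Shard 0 through Shard 5

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (MD5 is an assumption here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def pre_shard(pairs):
    """Batch step: partition key/value pairs into one index per shard."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for key, value in pairs:
        shards[shard_for(key)][key] = value
    return shards

def lookup(shards, key):
    """Serving step: go straight to the one shard that can hold the key."""
    return shards[shard_for(key)].get(key)

# Hypothetical key/value pairs: tweet counts per URL.
shards = pre_shard([("http://a.example", 12), ("http://b.example", 7)])
```

Because the batch job and the servers use the same `shard_for`, a server can answer a lookup without consulting any other shard.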
Storm

Stream Processing

 Distributed RPC
Stream processing

• Automatically distributes computation
• Horizontally scalable
• Fault-tolerant
• Guarantees processing of messages
Stream processing

Queue → Storm cluster → DBs
What is a query?

• Raw data → View
• Tweets → # Tweets for a URL
• Tweets → Influence Score for a person
Computing a query

• Raw data → Fully precompute view → DB → Query
• Raw data → Do a live compute from scratch → Query
• Raw data → Precompute subviews → DBs → Compute query from intermediate dbs → Query
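The difference between a live compute and a query over precomputed subviews can be sketched in plain Python. The data, the hourly bucketing, and the function names below are hypothetical and chosen only to make the trade-off concrete.

```python
from collections import Counter

# Hypothetical raw data: one (url, hour) pair per tweet.
TWEETS = [("u1", 0), ("u1", 0), ("u1", 1), ("u2", 1), ("u1", 3)]

def live_compute(tweets, url, start, end):
    """Live compute from scratch: scan all raw data at query time."""
    return sum(1 for u, h in tweets if u == url and start <= h <= end)

def precompute_subviews(tweets):
    """Batch step: tweet counts per (url, hour) bucket, to be stored in a DB."""
    return Counter(tweets)

def query_from_subviews(subviews, url, start, end):
    """Query step: combine a few precomputed buckets instead of raw data."""
    return sum(subviews[(url, h)] for h in range(start, end + 1))

subviews = precompute_subviews(TWEETS)
```

Both paths give the same answer for the same range. The subview path reads one bucket per hour in the range rather than every raw tweet.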
Distributed RPC

Application → Queue: “I want to know X, and return the results to me at Y”

Distributed RPC

[Diagram: Queue, Storm cluster, DBs, App queries]
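The request pattern "I want to know X, and return the results to me at Y" can be modeled in a single process with Python's standard `queue` module. This is a sketch of the message flow only, not Storm's API: the message fields, `submit`, `serve_one`, and the return-address scheme are all assumptions.

```python
import queue

requests = queue.Queue()  # the queue the application writes to
return_boxes = {}         # return address "Y" -> queue of results

def submit(request_id, args, return_address):
    """Application side: enqueue the question X with a return address Y."""
    return_boxes.setdefault(return_address, queue.Queue())
    requests.put({"id": request_id, "args": args, "return_to": return_address})

def serve_one(compute):
    """Cluster side: take one request, compute, send the result to Y."""
    req = requests.get()
    result = compute(req["args"])
    return_boxes[req["return_to"]].put((req["id"], result))

# Hypothetical precomputed view the computation reads from.
TWEET_COUNTS = {"http://a.example": 12}

submit("req-1", "http://a.example", "app-host-1")
serve_one(lambda url: TWEET_COUNTS.get(url, 0))
reply = return_boxes["app-host-1"].get()
```

Carrying the request id in the reply lets the application match results to questions when several requests are in flight.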
(BackType is hiring)
Questions?
