Using Onyx in anger
@sbelak
simon@goopti.com
Onyx: a masterless, cloud scale, fault tolerant, high
performance distributed computation system
… written entirely in Clojure
Onyx at
• In production for almost a year 

• ETL
• online machine learning
• offline (batch) machine learning
• ad-hoc analysis
Self-service infrastructure
for data scientists
1. Onyx at a glance
2. How Onyx rewired my brain
3. Putting “data is code” to work
Describing computation
with data
Onyx at a glance
Job = workflow + flow conditions + catalogue

Workflow:
[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

Flow conditions:
[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]
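The flow condition above refers to its predicate by keyword. A minimal sketch of what :my.ns/adult? might look like, assuming segments carry an :age key (not shown in the slides) and the four-argument predicate shape described in the Onyx documentation (event map, old segment, new segment, all new segments):

```clojure
;; Hypothetical predicate behind :flow/predicate :my.ns/adult?
;; (it would live in the my.ns namespace).
;; Assumption: each segment is a map with an integer :age key.
(defn adult?
  [event old-segment new-segment all-new-segments]
  ;; A missing :age is treated as 0, so such segments are not routed.
  (>= (:age new-segment 0) 18))

(adult? {} {:age 30} {:age 30} [{:age 30}]) ; => true
(adult? {} {:age 12} {:age 12} [{:age 12}]) ; => false
```

Because the predicate is an ordinary function, it can be exercised at the REPL without starting any Onyx peers.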
Catalogue
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]
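To make "job = workflow + catalogue" concrete, here is a sketch of a job as one plain map. The :workflow, :catalog and :task-scheduler keys follow the Onyx job map as I understand it; the batch-size value, the trimmed catalogue entries and the workflow that wires :in, :add-5 and :out together are illustrative assumptions, not taken from the slides.

```clojure
(def batch-size 10) ; assumed value; the slides leave batch-size unbound

(def workflow
  [[:in :add-5]
   [:add-5 :out]])

(def catalog
  [{:onyx/name :in
    :onyx/plugin :onyx.plugin.core-async/input
    :onyx/type :input
    :onyx/medium :core.async
    :onyx/batch-size batch-size
    :onyx/max-peers 1}
   {:onyx/name :add-5
    :onyx/fn :my/adder
    :onyx/type :function
    :onyx/batch-size batch-size
    :my/n 5
    :onyx/params [:my/n]}
   {:onyx/name :out
    :onyx/plugin :onyx.plugin.core-async/output
    :onyx/type :output
    :onyx/medium :core.async
    :onyx/batch-size batch-size
    :onyx/max-peers 1}])

;; The whole job is one data structure.
(def job
  {:workflow workflow
   :catalog catalog
   :task-scheduler :onyx.task-scheduler/balanced})

;; Plain-Clojure sanity check: every task named in the workflow
;; has a catalogue entry, and vice versa.
(def workflow-tasks (set (apply concat workflow)))
(def catalog-tasks (set (map :onyx/name catalog)))
(= workflow-tasks catalog-tasks) ; => true

;; With Onyx on the classpath and a peer-config map at hand, the job
;; would be submitted with something like:
;; (onyx.api/submit-job peer-config job)
```

Since the job is data, checks like the one above can run before anything is submitted.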
Vanilla Clojure function

(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))
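A sketch of how the catalogue entry and the function fit together. The resolve-params and run-task helpers are hypothetical; they only mimic the idea that values named in :onyx/params are passed as leading arguments, and are not Onyx's implementation.

```clojure
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(def add-5-entry
  {:onyx/name :add-5
   :onyx/fn :my/adder
   :onyx/type :function
   :my/n 5
   :onyx/params [:my/n]})

;; Hypothetical helper: look up each key listed in :onyx/params
;; in the catalogue entry itself.
(defn resolve-params [entry]
  (mapv entry (:onyx/params entry)))

;; Hypothetical helper: apply the task function to one segment,
;; with the resolved params first and the segment last.
(defn run-task [f entry segment]
  (apply f (conj (resolve-params entry) segment)))

(run-task adder add-5-entry {:x 1}) ; => {:x 6}
```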
Plugins (I/O): seq, async, Kafka, Datomic, SQL, S3, SQS, …
Parameter: :my/n, passed to the function via :onyx/params
Self-documenting: :onyx/doc
Computation entirely
described with data
data is code!
Everything can be run
locally!
Testing without
mocking
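Because tasks are plain functions over segments, a test can feed them a seq of maps directly. The sketch below only tests the task function with clojure.test; it does not start an Onyx environment, and the input segments are made up for illustration.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

;; The input "stream" is just a vector of segments: no broker, no mocks.
(def input-segments [{:x 1} {:x 2} {:x 3}])

(deftest adder-test
  (is (= [{:x 6} {:x 7} {:x 8}]
         (map (partial adder 5) input-segments))))

(run-tests)
```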
How Onyx rewired my
brain
It’s not about scaling,
but clean architecture
My goto architecture
[Diagram: Kafka, DB, Events, and three Onyx jobs]
Persist all events to S3
• time travel
• query with AWS Athena
Decomplect
everything
Computation graphs
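Since a workflow is a vector of edges, the computation graph can be inspected with ordinary collection functions. A small sketch using the workflow from earlier in the deck; the adjacency, roots and leaves helpers are hypothetical names introduced here.

```clojure
(require '[clojure.set :as set])

(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

;; Adjacency map: task -> set of downstream tasks.
(defn adjacency [edges]
  (reduce (fn [g [from to]]
            (update g from (fnil conj #{}) to))
          {}
          edges))

;; Tasks that nothing flows into.
(defn roots [edges]
  (set/difference (set (map first edges)) (set (map second edges))))

;; Tasks that nothing flows out of.
(defn leaves [edges]
  (set/difference (set (map second edges)) (set (map first edges))))

(adjacency workflow)
;; => {:input #{:processing-1 :processing-2}
;;     :processing-1 #{:output-1}
;;     :processing-2 #{:output-2}}
(roots workflow)  ; => #{:input}
(leaves workflow) ; => #{:output-1 :output-2}
```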
Putting “data is code”
to work
Interlude: queryable data descriptions with spec
• s/registry, s/form
• Build a graph (Datomic)
Interact with your type system!
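A minimal sketch of querying the spec registry, assuming clojure.spec.alpha (Clojure 1.9+). The :event/... specs are invented for illustration, and the graph is kept in a plain map rather than loaded into Datomic as the slide suggests.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical event specs, for illustration only.
(s/def :event/user-id pos-int?)
(s/def :event/amount number?)
(s/def :event/purchase (s/keys :req [:event/user-id :event/amount]))

;; s/form returns a spec as data; for an s/keys spec this is a list
;; headed by the symbol clojure.spec.alpha/keys.
(defn attributes
  "Required and optional keys of an s/keys spec, or nil for other specs.
   Simplification: ignores :req-un/:opt-un and and/or groupings."
  [spec-name]
  (let [form (s/form spec-name)]
    (when (and (seq? form) (= `s/keys (first form)))
      (let [{:keys [req opt]} (apply hash-map (rest form))]
        (into (vec req) opt)))))

;; Walk the registry and build spec -> attributes edges for one namespace.
(defn spec-graph [ns-str]
  (into {}
        (for [k (keys (s/registry))
              :when (and (keyword? k) (= ns-str (namespace k)))
              :let [attrs (attributes k)]
              :when attrs]
          [k attrs])))

(spec-graph "event")
;; => {:event/purchase [:event/user-id :event/amount]}
```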
code is data!
Case study: autogenerating materialised views
[Diagram: Kafka, Events, External data, Materialised views]
Automatic view generation
• Event & attribute ontology
• Manual (via spec)
• Inferred
• Statistical analysis (seasonality detection, outlier removal, …)
[Diagram: three Onyx jobs]
Automatic view generation
1. Walk spec registry
2. Apply rules
   1. Define new view (spec)
   2. Trigger Onyx job that creates the view
⤾
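The loop above can be sketched in plain Clojure. Everything here is hypothetical: the :event/... specs, the single rule, the shape of a view description, and trigger-view-job!, which merely records the request where a real system would submit an Onyx job. The sketch makes one pass over the "event" namespace and does not model feeding new views back into the walk.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical event specs.
(s/def :event/user-id pos-int?)
(s/def :event/amount number?)
(s/def :event/purchase (s/keys :req [:event/user-id :event/amount]))

;; Stand-in for submitting an Onyx job: it only records the view.
(def submitted-jobs (atom []))
(defn trigger-view-job! [view]
  (swap! submitted-jobs conj view))

(defn required-keys [spec-name]
  (let [form (s/form spec-name)]
    (when (and (seq? form) (= `s/keys (first form)))
      (:req (apply hash-map (rest form))))))

;; A rule maps a spec name to a view description, or nil if it
;; does not apply.
(defn total-per-user-rule [spec-name]
  (let [req (set (required-keys spec-name))]
    (when (and (req :event/user-id) (req :event/amount))
      {:view/name   (keyword "view" (str (name spec-name) "-total-per-user"))
       :view/source spec-name
       :view/keys   [:event/user-id :event/amount]})))

(defn generate-views! [rules]
  (doseq [spec-name (filter #(and (keyword? %) (= "event" (namespace %)))
                            (keys (s/registry)))
          rule rules
          :let [view (rule spec-name)]
          :when view]
    ;; Step 2.1: define the new view as a spec. s/def is a macro, so
    ;; the form is built as data and evaluated.
    (eval `(s/def ~(:view/name view) (s/keys :req ~(:view/keys view))))
    ;; Step 2.2: trigger the job that would materialise the view.
    (trigger-view-job! view)))

(generate-views! [total-per-user-rule])
(map :view/name @submitted-jobs) ; => (:view/purchase-total-per-user)
```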
Code is data or data is code?
Takeaways
Onyx is production ready
Everything should be
live and interactive
Computation graphs are
a great way to structure
data processing code
Queryable data and
computation descriptions
supercharge interactive
development and are a
great building block for
automation
Questions
@sbelak
simon@goopti.com
viebel.github.io/klipse/examples/onyx.html
onyxplatform.org
onyxplatform.org/jekyll/update/2017/02/08/Pyroclast-Preview-Simulation.html
