Doing data science with Clojure
@sbelak
simon@goopti.com
Design constraints
The analytics chasm
• < 2 min: ideal. Almost real-time, can be done during brainstorming without disrupting flow
• < 20 min: squeeze in somewhere in the day
• fail
• Project: roadmap ahoy!
Think in distributions, not numbers
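A minimal sketch of what this can look like in plain Clojure, assuming nothing beyond clojure.core. The `quantile` and `summarize` helpers and the `order-values` data are made up for illustration; the quantile is a simple index-based one without interpolation. A single mean hides the outlier, while the quantiles show where most of the values actually sit.

```clojure
;; Hypothetical helper: index-based quantile, no interpolation.
(defn quantile
  "Returns the element at position floor(q * n) of the sorted values, q in [0, 1]."
  [q xs]
  (let [sorted (vec (sort xs))
        n      (count sorted)
        idx    (min (dec n) (int (Math/floor (* q n))))]
    (nth sorted idx)))

;; Hypothetical helper: describe a sample by its spread, not by one number.
(defn summarize
  [xs]
  {:mean   (/ (reduce + xs) (double (count xs)))
   :min    (apply min xs)
   :p25    (quantile 0.25 xs)
   :median (quantile 0.5 xs)
   :p75    (quantile 0.75 xs)
   :max    (apply max xs)})

;; Made-up data with one extreme value.
(def order-values [10 12 11 13 12 11 10 500])

(summarize order-values)
;; => {:mean 72.375, :min 10, :p25 11, :median 12, :p75 13, :max 500}
```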
No throwaways
Sharing results
• Have one canonical version that is always current.
• Concentrate discussion in one place and make it
searchable and persistent.
• Include methodology (=code).
The environment
REPL vs. notebook
REPL vs. notebook+
(hacked) gorilla-repl.org + auto-refresh + hypothes.is
• Code hidden, but can be expanded
• Questions, comments, & annotations
• Shareable
• Periodically re-run to keep it fresh
• Tags (#alderaan #sales #growth) for discoverability
Wishlist/TODO
• Better editor (shaunlebron.github.io/parinfer/ ?)
• Embedded REPL
• Better exception reporting
• Browsable data structures

(tried and miserably failed: org-babel)
The tools
Data frame
• Data tends to be heterogeneous
• Clojure excels in structure manipulation/encoding
github.com/sbelak/huri
• No data structures, just functions over collections
• Composable (even DSLs — no macros!)
• Reasonably fast (transducers <3)
• Do-what-I-mean (auto-sort, liberal with inputs, …)
• Minimal buy-in
• Support reaching into nested structures everywhere
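The "just functions over collections" idea can be sketched with clojure.core alone. This is not Huri's API: the `orders` data and the `planet` and `total-by` helpers are hypothetical names chosen for illustration. The data is a vanilla vector of maps, nested values are reached with `get-in`, and the filtering and projection steps are composed as a transducer.

```clojure
;; Made-up data: a plain vector of maps, no special data frame type.
(def orders
  [{:id 1 :customer {:name "Leia" :planet "Alderaan"} :total 120.0}
   {:id 2 :customer {:name "Han"  :planet "Corellia"} :total 80.0}
   {:id 3 :customer {:name "Bail" :planet "Alderaan"} :total 200.0}])

;; Reach into the nested structure with get-in.
(defn planet [order]
  (get-in order [:customer :planet]))

;; Transducer: filter and project in one pass, without intermediate sequences.
(def alderaan-totals
  (comp (filter #(= "Alderaan" (planet %)))
        (map :total)))

(transduce alderaan-totals + 0 orders)
;; => 320.0

;; Hypothetical helper: sum :total per group, with any function as the key.
(defn total-by [keyfn rows]
  (reduce (fn [acc row]
            (update acc (keyfn row) (fnil + 0) (:total row)))
          {}
          rows))

(total-by planet orders)
;; => {"Alderaan" 320.0, "Corellia" 80.0}
```

Because the rows are ordinary maps, anything that already works on Clojure collections keeps working on them, which is where the minimal buy-in comes from.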
• Composable data structure based DSLs
• ->> and partial friendly
• Support reaching into nested structures everywhere
• Vanilla vector of maps: interoperability
• Provide curried versions where possible
Composability is key to quick iterating
• Provide curried versions where possible
• ->> and partial friendly
• encode computation in structure (comp, some-fn,
every-pred, data structure based DSLs, …)
• consistent API
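The points above can be illustrated with clojure.core only. This is a sketch, not Huri code: the `rows` data and the names `valid-sale?`, `notable?`, `valid-sales`, and `sum-of` are assumptions made for the example. Predicates are built from `comp`, `every-pred`, and `some-fn`, and helpers take the collection last so that they work with `partial` and `->>`.

```clojure
;; Made-up data.
(def rows
  [{:channel :web    :amount 40  :refunded? false}
   {:channel :mobile :amount 250 :refunded? false}
   {:channel :web    :amount 300 :refunded? true}
   {:channel :phone  :amount 15  :refunded? false}])

;; every-pred: all conditions must hold.
(def valid-sale?
  (every-pred (comp pos? :amount)
              (complement :refunded?)))

;; some-fn: at least one condition must hold.
(def notable?
  (some-fn (comp #(>= % 200) :amount)
           (comp #{:phone} :channel)))

;; Collection-last helpers are friendly to partial and ->>.
(def valid-sales (partial filter valid-sale?))

(defn sum-of [k coll]
  (transduce (map k) + 0 coll))

(->> rows
     valid-sales
     (filter notable?)
     (sum-of :amount))
;; => 265
```

Since the computation is encoded as ordinary values (composed functions), each piece can be tried on its own at the REPL and then recombined.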
Catching errors early → more context → easier debugging → faster iterating
<3 Bret Victor
Q: What about machine learning?
A: Farm it out to sklearn
huri.plot
• DSL on top of ggplot2 (via gg4clj)
• Targets Gorilla REPL
• Follows the rest of Huri’s design philosophy
• bar chart, scatter plot, line chart, box & violin plot,
heatmap, histogram
Wishlist/TODO
• (even) better structure manipulation (via Specter?)
• Interactive plots
• More transducer-compatible (online) math
functions
• Optimizing ->> (rewrite code on the fly to do more
with transducer composition)
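As an illustration of what a transducer-compatible (online) math function might look like, here is a sketch of a running mean written as a reducing function for `transduce`. The name `mean-rf` is hypothetical and not taken from Huri. It keeps only a count and the current mean, so it processes the input in one pass with constant memory.

```clojure
;; Hypothetical reducing function: incremental (online) mean.
(defn mean-rf
  ([] [0 0.0])                           ; init: [count mean]
  ([[n mean]] (when (pos? n) mean))      ; completion: nil for empty input
  ([[n mean] x]                          ; step: update the mean incrementally
   (let [k (inc n)]
     [k (+ mean (/ (- x mean) k))])))

;; Composes with any transducer; here only even numbers are averaged.
(transduce (filter even?) mean-rf (range 1 9))
;; => 5.0
```

The same function can be dropped at the end of any transducer pipeline, for example `(transduce (map :total) mean-rf rows)` over a vector of maps.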
Projects worth keeping an eye on
github.com/thi-ng/geom
github.com/yieldbot/vizard
zeppelin-project.org
github.com/aphyr/tesser
github.com/nathanmarz/specter
Questions
@sbelak
github.com/sbelak/huri
