Probabilistic Programming:
Why, What, How, When
Beau Cronin
@beaucronin
40 Action-Packed Minutes
‣ Why you should care - what’s wrong with what we’ve got?
‣ What probabilistic programming is, and what programs look like
‣ How you can get started today
‣ When will all of this be ready for production use?
Why?
We use data to learn about the world
                         Traditional Machine Learning    Hierarchical Bayesian Modeling
Scale                    Large                           Small
Tools & frameworks       Mature & Robust                 Immature & Spotty
Structure & Knowledge    Discard                         Keep & Leverage
Data Types               Homogeneous                     Heterogeneous
Philosophical Approach   Toolkit, Theory-light           Modeling, Theory-heavy
Why?
G = {V, E}
What order were these links added in?
What messages flow over this link?
What do we know about this user?
Why?
x1 x2 lat1 long1 t1 t2 t3 t4 address1
1 1.2 2 34.0 118.2 2.3 3.4 1.9 10.4 516 61st St,
2 0.1 1 40.7 73.9 -1.5 4.5 8.9 2305 Tustin
3 10.5 0 37.9 122.3 4.7 -2.5 -3.4 1 Market St.
4 8.3 -1 -22.9 43.2 4.2 5.6 1.6 9.5
5 4.9 5 -37.8 -145.0 1600 Pennsyl
6 1.5 1 3.4 4.0 4.6 5.2 650 7th St., S
Positive numbers
Categorical values
Locations
Time Series
Addresses
Missing values
Why?
Diverse Data
Most real datasets contain compositions of these and
more, but we routinely homogenize in preprocessing
Lorem Ipsum
Trees & Graphs
Time Series
Relations
Locations & Addresses
Images & Movies
Audio
Sets & Partitions
Text
Why?
Business Data Is Heterogeneous and Structured
Profile
    id: “abcdef”
    gender: “Male”
    dob: 1978-12-09
    twitter_id: 9458201
Page Views
    2014-01-21 18:41:04, “https://devcenter.heroku.com/articles/quickstart”, …
    2014-01-20 12:35:56, “https://devcenter.heroku.com/categories/java”, …
    2014-01-20 09:12:52, “https://devcenter.heroku.com/articles/ssl-endpoint”, …
Transactions
    Order Date | Order ID | Title | Category | ASIN/ISBN | Release Date | Condition | Seller | Per Unit Price
    1/5/13 | 002-1139353-0278652 | Under Armour Men's Resistor No Show Socks, pack of 6 Socks | Apparel | B003RYQJJW | | new | The Sock Company, Inc. | $21.99
    1/5/13 | 002-1139353-0278652 | Under Armour Men's Resistor No Show Socks, pack of 6 Socks | Apparel | B004UONNXI | | new | The Sock Company, Inc. | $21.99
    1/8/13 | 002-2593752-8837806 | CivilWarLand in Bad Decline | Paperback | 1573225797 | 1/31/97 | new | Amazon.com LLC | $8.4
    1/8/13 | 109-0985451-2187421 | Nothing to Envy: Ordinary Lives in North Korea | Paperback | 385523912 | 9/20/10 | new | Amazon.com LLC | $10.88
    1/12/13 | 109-8581642-2322617 | Excession | Mass Market Paperback | 553575376 | 2/1/98 | new | Amazon.com LLC | $7.99
Social Posts
    [
      {
        text: “key to compelling VR is…”,
        retweet_count: 3,
        favorites_count: 5,
        urls: [ ],
        hashtags: [ ],
        in_reply_to: 39823792801012
        …
      },
      {
        text: “@John4man really liked your piece”,
        retweets: 0,
        favorites: 0,
        …
      }
    ]
Followers
    [ 657693, 7588892, 9019482, …]
Relationship
    blocked: False
    want_retweets: True
    marked_spam: False
    since: 2013-09-13
Every Domain Is Heterogeneous
‣ Health data: doctor notes, lab results, imaging, family history, prescriptions
‣ Quantified self: motion sensors, heart rate, GPS tracks, self-reporting, sleep patterns
‣ Autonomous vehicles: LIDAR, cameras, maps, audio, gyros, telemetry, GPS
Why?
Mostly, no one even tries to jointly model these different kinds of data
Why?
A probabilistic programming system is…
a language + {compiler, interpreter}
or
a {library, framework} for an existing language
that
- includes random choices as native elements
- and provides a clean separation between probabilistic modeling and inference
- and may provide automated generation of inference solutions for a given program
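To make the separation between modeling and inference concrete, here is a minimal Python sketch. It is not taken from any of the systems discussed in this talk: the die-guessing model, the name `rejection_query`, and the choice of rejection sampling are all assumptions made for illustration. The model is an ordinary function that makes random choices, and the inference routine knows nothing about the model except how to run it.

```python
import random

def model(rng):
    """Generative model: pick a die (unknown), then roll it twice (observable)."""
    sides = rng.choice([4, 6, 8])                      # random choice as a native element
    rolls = (rng.randint(1, sides), rng.randint(1, sides))
    return sides, rolls

def rejection_query(model, condition, n_samples, seed=0):
    """Generic (and deliberately naive) inference: run the model many times
    and keep only the runs whose observable output matches the data."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        unknown, observed = model(rng)
        if condition(observed):
            counts[unknown] = counts.get(unknown, 0) + 1
    total = sum(counts.values())
    return {value: n / total for value, n in sorted(counts.items())}

# Which die most plausibly produced the rolls (3, 4)?
posterior = rejection_query(model, lambda rolls: rolls == (3, 4), 200_000)
print(posterior)
```

Swapping in a smarter inference routine would not require touching `model`, which is the point of the separation. Real systems replace the brute-force loop with far more efficient algorithms.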
What?
Probabilistic Programming Systems Model the World
‣ Programs directly represent the data generation process
‣ Measurement processes can be modeled directly, including their imperfections and the uncertainty that comes with them
‣ Philosophy
    ‣ DO: capture the essential aspects of real-world processes in a model
    ‣ DON’T: torture the data into the right form for an algorithm
What?
A Probability Model
✕ N
‣ Fixed: Constant values and structural assumptions
‣ Unknown: Variables that discriminate between hypotheses
‣ Observable: Data and potential data
What?
Obligatory Bayes’ Rule
Pr(H | D) ∝ Pr(D | H) Pr(H)
Pr(H | D, A) ∝ Pr(D | H, A) Pr(H | A)
where H = Hypotheses, D = Data, A = Assumptions
What?
First example: Deciding if a coin is fair based on flips

    # Assumptions
    fair-prior = .999

    # Unknowns
    fair-coin? = flip(fair-prior)

    if fair-coin?:
        weight = 0.5
    else:
        weight = 0.9

    # Observables
    observe(repeat(flip(weight), 10),
            [H, H, H, H, H, H, H, H, H, H])

    query(fair-coin?)
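The coin pseudocode above is small enough to solve exactly by enumerating the two hypotheses and applying Bayes' rule. The following Python sketch does that by hand. The variable names are illustrative, and a real probabilistic programming system would derive this computation from the model rather than have it written out.

```python
from fractions import Fraction

# Assumptions: prior belief that the coin is fair, and the two candidate weights
fair_prior = Fraction(999, 1000)
weights = {True: Fraction(1, 2), False: Fraction(9, 10)}

# Observables: ten heads in a row
observed = ["H"] * 10

def likelihood(weight, flips):
    """Probability of a specific flip sequence given P(heads) = weight."""
    p = Fraction(1)
    for flip in flips:
        p *= weight if flip == "H" else 1 - weight
    return p

# Pr(H | D) is proportional to Pr(D | H) Pr(H), evaluated for both hypotheses
joint = {
    True: fair_prior * likelihood(weights[True], observed),
    False: (1 - fair_prior) * likelihood(weights[False], observed),
}
posterior_fair = joint[True] / (joint[True] + joint[False])
print(float(posterior_fair))
```

Under these assumptions the posterior probability that the coin is fair drops from 0.999 to roughly 0.74 after ten straight heads.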
Probabilistic Programming Systems Are Diverse
‣ Library vs. stand-alone language
‣ Base language: Scala, Lisp, Python
‣ Manual, semi-, or fully-automated inference
‣ Modeling domain: directed/undirected graphical models, relational data, all programs
‣ Home field: cognitive science, programming languages, databases, Bayesian statistics, artificial intelligence
What?
PPSs Compared
System        Type          Language            Inference
BLOG          Stand-alone   Custom              Fully Auto
BUGS / JAGS   Stand-alone   Custom              Fully Auto
STAN          Hybrid        R, Python           Fully Auto
PyMC          Library       Python              Manual
Infer.net     Library       C#                  Semi-auto
Church        Stand-alone   Lisp                Fully Auto
Venture       Stand-alone   Javascript, Lisp    Semi-auto
Figaro        Library       Scala               Semi-auto
factorie      Library       Scala               Semi-auto
What?
infer.net
‣ A C# framework (also usable from F#)
‣ Developed at Microsoft Research (MSR)
‣ Under active development, with good tutorials and many well-documented examples
How?
Infer.net example: Is a new treatment effective?
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Clinical%20trial%20tutorial.aspx

    // Observations
    VariableArray<bool> controlGroup =
        Variable.Observed(new bool[] { false, false, true, false, false });
    VariableArray<bool> treatedGroup =
        Variable.Observed(new bool[] { true, false, true, true, true });
    Range i = controlGroup.Range; Range j = treatedGroup.Range;

    // Unknown
    Variable<bool> isEffective = Variable.Bernoulli(0.5);

    // Assumptions & Unknowns
    Variable<double> probIfTreated, probIfControl;
    using (Variable.If(isEffective))
    {
        // Model if treatment is effective
        probIfControl = Variable.Beta(1, 1);
        controlGroup[i] = Variable.Bernoulli(probIfControl).ForEach(i);
        probIfTreated = Variable.Beta(1, 1);
        treatedGroup[j] = Variable.Bernoulli(probIfTreated).ForEach(j);
    }

    using (Variable.IfNot(isEffective))
    {
        // Model if treatment is not effective
        Variable<double> probAll = Variable.Beta(1, 1);
        controlGroup[i] = Variable.Bernoulli(probAll).ForEach(i);
        treatedGroup[j] = Variable.Bernoulli(probAll).ForEach(j);
    }

    // Query
    InferenceEngine ie = new InferenceEngine();
    Console.WriteLine("Probability treatment has an effect = " + ie.Infer(isEffective));
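For a model this small, the same question can be answered in closed form, which gives a useful sanity check on what the inference engine is being asked to compute. The Python sketch below is an independent hand derivation, not a port of Infer.net. It assumes the same 50/50 prior and uniform Beta(1, 1) priors as the C# example, and it uses the standard result that a specific sequence with k successes in n Bernoulli trials has marginal probability k!(n-k)!/(n+1)! under a Beta(1, 1) prior. The function name `marginal_likelihood` is illustrative.

```python
from fractions import Fraction
from math import factorial

# Observations (same data as the C# example)
control = [False, False, True, False, False]
treated = [True, False, True, True, True]

def marginal_likelihood(outcomes):
    """P(this exact sequence) when the success rate has a Beta(1, 1) prior."""
    n, k = len(outcomes), sum(outcomes)
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

prior_effective = Fraction(1, 2)

# Effective: each group has its own success rate
evidence_effective = marginal_likelihood(control) * marginal_likelihood(treated)
# Not effective: both groups share a single success rate
evidence_not_effective = marginal_likelihood(control + treated)

numerator = prior_effective * evidence_effective
posterior_effective = numerator / (
    numerator + (1 - prior_effective) * evidence_not_effective
)
print(float(posterior_effective))
```

With these ten observations the exact posterior probability that the treatment has an effect is 77/102, or about 0.75. An engine that uses approximate inference may report a slightly different value.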
PyMC
‣ Python (duh)
‣ Go watch Thomas Wiecki’s talk from PyData NY
‣ http://twiecki.github.io/blog/2013/12/12/bayesian-data-analysis-pymc3/
‣ And read Bayesian Methods for Hackers by Cam Davidson-Pilon et al.
How?
Church
‣ A Lisp
‣ Originally created to model cognitive development and human reasoning
‣ Active inference research, several implementations
‣ Connection between functional purity / independence vs. stochastic
memoization / exchangeability
‣ Hypothesis space is possible program executions
‣ “Probabilistic Models of Cognition”
How?
Church example: Infinite Gaussian Mixture Model
modified from https://probmods.org/non-parametric-models.html

    ;stochastic memoization generator for class assignments
    ;sometimes return a previous symbol, sometimes create a new one
    ;(DPmem is Church's Dirichlet-process stochastic memoizer)
    (define class-distribution (DPmem 1.0 gensym))

    ;associate a class with an object via memoization
    (define object->class
      (mem (lambda (object) (class-distribution))))

    ;associate gaussian parameters with a class via memoization
    (define class->gaussian-parameters
      (mem (lambda (class) (list (gaussian 65 10) (gaussian 0 8)))))

    ;generate observed values for an object
    (define (observe object)
      (apply gaussian (class->gaussian-parameters (object->class object))))

    ;generate observations for some objects
    (map observe '(tom dick harry bill fred))
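For readers who do not read Lisp, the Python sketch below mirrors the generative process of the infinite Gaussian mixture example. It is an illustration, not a Church implementation. `dp_mem` is a hypothetical helper that imitates stochastic memoization with a Chinese restaurant process, `mem` is an ordinary memoizer, and taking the absolute value of the sampled standard deviation is an added assumption to keep it non-negative. Like the Church snippet, it only samples from the model and performs no inference.

```python
import random
from itertools import count

rng = random.Random(0)            # fixed seed so the run is repeatable
_fresh = count()

def gensym():
    """Return a brand-new class label each time it is called."""
    return f"class{next(_fresh)}"

def dp_mem(alpha, base):
    """Stochastic memoizer: reuse an earlier draw with probability
    proportional to how often it was drawn, or call `base` for a new
    value with probability proportional to `alpha`."""
    draws = []
    def sample():
        if rng.random() < alpha / (alpha + len(draws)):
            value = base()
        else:
            value = rng.choice(draws)   # uniform over past draws = count-weighted
        draws.append(value)
        return value
    return sample

def mem(fn):
    """Deterministic memoizer: the same arguments always give the same result."""
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return wrapper

class_distribution = dp_mem(1.0, gensym)
object_to_class = mem(lambda obj: class_distribution())
class_to_params = mem(lambda cls: (rng.gauss(65, 10), abs(rng.gauss(0, 8))))

def observe(obj):
    """Generate one observed value for an object from its class's Gaussian."""
    mean, std = class_to_params(object_to_class(obj))
    return rng.gauss(mean, std)

names = ["tom", "dick", "harry", "bill", "fred"]
observations = [observe(name) for name in names]
for name, value in zip(names, observations):
    print(name, object_to_class(name), round(value, 1))
```

The number of classes is not fixed in advance: each new object either joins an existing class or starts a new one, which is what makes the mixture "infinite".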
Church example: Cross-categorization (BayesDB)
https://probmods.org/non-parametric-models.html

    (define kind-distribution (DPmem 1.0 gensym))

    (define feature->kind
      (mem (lambda (feature) (kind-distribution))))

    (define kind->class-distribution
      (mem (lambda (kind) (DPmem 1.0 gensym))))

    (define feature-kind/object->class
      (mem (lambda (kind object)
        (sample (kind->class-distribution kind)))))

    (define class->parameters
      (mem (lambda (object-class) (first (beta 1 1)))))

    (define (observe object feature)
      (flip (class->parameters (feature-kind/object->class
                                 (feature->kind feature) object))))

    (observe 'eggs 'breakfast)
Churj?
Jurch?
How?
So Far
‣ Why
‣ What
‣ How
‣ When
What We Still Need
1. Basic CS: Improved compilers and run-times for more efficient
automatic inference
2. Tooling: Debuggers, optimizers, IDEs, visualization
3. Tribal knowledge: idioms, patterns, best practices
When?
When?
The Probabilistic Programming Revolution
Traditional Programming
• Application
• Code Libraries
• Programming Language
• Compiler
• Hardware
Probabilistic Programming
• Model: Code models capture how the data was generated using random variables to represent uncertainty
• Model Libraries: Libraries contain common model components: Markov chains, deep belief networks, etc.
• Probabilistic Programming Language: PPL provides probabilistic primitives & traditional PL constructs so users can express model, queries, and data
• Inference Engine: Inference engine analyzes probabilistic program and chooses appropriate solver(s) for available hardware
• Hardware: Hardware can include multi-core, GPU, cloud-based resources, GraphLab, UPSIDE/Analog Logic results, etc.
High-level programming languages facilitate building complex systems
Probabilistic programming languages facilitate building rich ML applications
Approved for Public Release; Distribution Unlimited
The Promise of Probabilistic Programming Languages
Probabilistic Programming could empower domain experts and ML experts
• Shorter: Reduce LOC by 100x for machine learning applications
    • Seismic Monitoring: 28K LOC in C vs. 25 LOC in BLOG
    • Microsoft MatchBox: 15K LOC in C# vs. 300 LOC in Fun
• Faster: Reduce development time by 100x
    • Seismic Monitoring: Several years vs. 1 hour
    • Microsoft TrueSkill: Six months for competent developer vs. 2 hours with Infer.Net
    • Enable quick exploration of many models
• More Informative: Develop models that are 10x more sophisticated
    • Enable surprising, new applications
    • Incorporate rich domain-knowledge
    • Produce more accurate answers
    • Require less data
    • Increase robustness with respect to noise
    • Increase ability to cope with contradiction
• With less expertise: Enable 100x more programmers
    • Separate the model (the program) from the solvers (the compiler), enabling domain experts without machine learning PhDs to write applications
Sources:
• Bayesian Data Analysis, Gelman, 2003
• Pattern Recognition and Machine Learning, Bishop, 2007
• Science, Tenenbaum et al., 2011
DISTRIBUTION STATEMENT F. Further dissemination only as directed by DARPA, (February 20, 2013) or higher DoD authority.
Optimizer: “What is happening when I run this?”
Profiler: “Where is the time and memory being used?”
Debugger: “What is the exact state of my program at each point in time?”
Visualization: “What is the hidden structure of my data, and how certain should I be?”
http://www.icg.tugraz.at/project/caleydo/
Probabilistic Programming Workflows?
Definition: Data Workflows
For example, Cascading and related projects implement the following components, based on 100% open source:
‣ Diagram stages: data sources, ETL, data prep, predictive model, end uses
‣ Lingual: DW → ANSI SQL
‣ Pattern: SAS, R, etc. → PMML
‣ business logic in Java, Clojure, Scala, etc.
‣ source taps for Cassandra, JDBC, Splunk, etc.
‣ sink taps for Memcached, HBase, MongoDB, etc.
cascading.org
adapted from Paco Nathan: Data Workflows for Machine Learning
Evolution of PPSs
When?
Bottom Line
‣ Go experiment and learn! - there are several good options
‣ But be realistic about the current state of the art
‣ And keep your ear to the ground - this area is moving fast
Parting Questions
‣ Which projects are good fits for probabilistic programming today?
‣ Exploration and prototyping vs. scaled production deployment?
‣ How long before we have the Python, Ruby, and even PHP of PPSs?
‣ Is there a unification with the log-centric view of big data processing?
‣ Can natively stochastic hardware provide compelling performance gains?
When?
Resources
‣ probabilistic-programming.org
‣ Probabilistic Programming and Bayesian Methods for Hackers
‣ Probabilistic Models of Cognition
‣ Mathematica Journal article
‣ Thomas Wiecki’s PyData talk on PyMC
People To Watch
Vikash Mansinghka (MIT)
Noah Goodman (Stanford)
David Wingate (Lyric Labs)
Avi Pfeffer (CRA)
Rob Zinkov (USC)
Andrew Gordon (MSR)
John Winn (MSR)
Dan Roy (Cambridge)
Languages and Systems
‣ PyMC
‣ infer.net
‣ STAN
‣ Figaro
‣ BLOG
‣ Church
‣ factorie
‣ BUGS / JAGS
@beaucronin

More Related Content

What's hot (20)

PDF
Brief Introduction to Boltzmann Machine
Arunabha Saha
 
PDF
Machine Learning: Introduction to Neural Networks
Francesco Collova'
 
PPT
Knowledge Representation in Artificial intelligence
Yasir Khan
 
PPTX
Text Classification
RAX Automation Suite
 
PDF
LSTM Basics
Akshay Sehgal
 
PDF
NLP using transformers
Arvind Devaraj
 
PDF
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
 
PPTX
Data Mining: Data cube computation and data generalization
Datamining Tools
 
PPTX
Introduction to Text Mining and Topic Modelling
David Paule
 
PDF
Feature Engineering
HJ van Veen
 
PPT
Logical Agents
Yasir Khan
 
PPT
Rule Based System
Suresh Sambandam
 
PPT
M4 Heuristics
guestd3d0fb
 
PPTX
Knowledge Representation, Inference and Reasoning
Sagacious IT Solution
 
PPTX
0.0 Introduction to theory of computation
Sampath Kumar S
 
PPT
Schemaless Databases
Dan Gunter
 
PPTX
Classification in data mining
Sulman Ahmed
 
PPTX
Language models
Maryam Khordad
 
PPT
AI Lecture 6 (logical agents)
Tajim Md. Niamat Ullah Akhund
 
PPTX
Knowledge representation in AI
Vishal Singh
 
Brief Introduction to Boltzmann Machine
Arunabha Saha
 
Machine Learning: Introduction to Neural Networks
Francesco Collova'
 
Knowledge Representation in Artificial intelligence
Yasir Khan
 
Text Classification
RAX Automation Suite
 
LSTM Basics
Akshay Sehgal
 
NLP using transformers
Arvind Devaraj
 
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
 
Data Mining: Data cube computation and data generalization
Datamining Tools
 
Introduction to Text Mining and Topic Modelling
David Paule
 
Feature Engineering
HJ van Veen
 
Logical Agents
Yasir Khan
 
Rule Based System
Suresh Sambandam
 
M4 Heuristics
guestd3d0fb
 
Knowledge Representation, Inference and Reasoning
Sagacious IT Solution
 
0.0 Introduction to theory of computation
Sampath Kumar S
 
Schemaless Databases
Dan Gunter
 
Classification in data mining
Sulman Ahmed
 
Language models
Maryam Khordad
 
AI Lecture 6 (logical agents)
Tajim Md. Niamat Ullah Akhund
 
Knowledge representation in AI
Vishal Singh
 

Similar to Probabilistic Programming: Why, What, How, When? (20)

PDF
Introduction to Model-Based Machine Learning
Daniel Emaasit
 
PDF
Introduction to Model-Based Machine Learning for Transportation
Daniel Emaasit
 
PDF
Graphical Models In Python | Edureka
Edureka!
 
PPTX
Jay Yagnik at AI Frontiers : A History Lesson on AI
AI Frontiers
 
PDF
Introduction to machine learning-2023-IT-AI and DS.pdf
SisayNegash4
 
ODP
Implementation of Variational Inference for Non-Parametric Hidden Markov Models
James McInerney
 
PDF
Introduction to Bayesian Analysis in Python
Peadar Coyle
 
PPTX
planning and decision making
AdengappaUnavu
 
PDF
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Feynman Liang
 
PDF
Machine Learning Foundations
Albert Y. C. Chen
 
PPT
ProbabilisticModeling20080411
Clay Stanek
 
PPTX
Learn from Example and Learn Probabilistic Model
Junya Tanaka
 
PDF
Strata 2014: Design Challenges for Real Predictive Platforms
Max Gasner
 
PDF
Striving to Demystify Bayesian Computational Modelling
Marco Wirthlin
 
PDF
Machine Learning: Past, Present and Future - by Tom Dietterich
BigML, Inc
 
PDF
Probably, Definitely, Maybe
James McGivern
 
ODP
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Marco Wirthlin
 
PPTX
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
MLconf
 
PDF
“Probabilistic Logic Programs and Their Applications”
diannepatricia
 
PPT
AML_030607.ppt
butest
 
Introduction to Model-Based Machine Learning
Daniel Emaasit
 
Introduction to Model-Based Machine Learning for Transportation
Daniel Emaasit
 
Graphical Models In Python | Edureka
Edureka!
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
AI Frontiers
 
Introduction to machine learning-2023-IT-AI and DS.pdf
SisayNegash4
 
Implementation of Variational Inference for Non-Parametric Hidden Markov Models
James McInerney
 
Introduction to Bayesian Analysis in Python
Peadar Coyle
 
planning and decision making
AdengappaUnavu
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Feynman Liang
 
Machine Learning Foundations
Albert Y. C. Chen
 
ProbabilisticModeling20080411
Clay Stanek
 
Learn from Example and Learn Probabilistic Model
Junya Tanaka
 
Strata 2014: Design Challenges for Real Predictive Platforms
Max Gasner
 
Striving to Demystify Bayesian Computational Modelling
Marco Wirthlin
 
Machine Learning: Past, Present and Future - by Tom Dietterich
BigML, Inc
 
Probably, Definitely, Maybe
James McGivern
 
Gentle Introduction: Bayesian Modelling and Probabilistic Programming in R
Marco Wirthlin
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
MLconf
 
“Probabilistic Logic Programs and Their Applications”
diannepatricia
 
AML_030607.ppt
butest
 
Ad

More from Salesforce Engineering (20)

PPTX
Locker Service Ready Lightning Components With Webpack
Salesforce Engineering
 
PPTX
Scaling HBase for Big Data
Salesforce Engineering
 
PPTX
Techniques to Effectively Monitor the Performance of Customers in the Cloud
Salesforce Engineering
 
PPTX
Predictive System Performance Data Analysis
Salesforce Engineering
 
PPTX
Apache HBase State of the Project
Salesforce Engineering
 
PPTX
Hit the Trail with Trailhead
Salesforce Engineering
 
PPTX
HBase/PHOENIX @ Scale
Salesforce Engineering
 
PPTX
Scaling up data science applications
Salesforce Engineering
 
PPTX
Containers and Security for DevOps
Salesforce Engineering
 
PPTX
Aspect Oriented Programming: Hidden Toolkit That You Already Have
Salesforce Engineering
 
PPTX
Monitoring @ Scale in Salesforce
Salesforce Engineering
 
PPTX
Performance Tuning with XHProf
Salesforce Engineering
 
PPTX
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
Salesforce Engineering
 
PPTX
Implementing a Content Strategy Is Like Running 100 Miles
Salesforce Engineering
 
PPTX
Salesforce Cloud Infrastructure and Challenges - A Brief Overview
Salesforce Engineering
 
PDF
Koober Preduction IO Presentation
Salesforce Engineering
 
PPTX
Finding Security Issues Fast!
Salesforce Engineering
 
PDF
Microservices
Salesforce Engineering
 
PPTX
Global State Management of Micro Services
Salesforce Engineering
 
PPTX
The Future of Hbase
Salesforce Engineering
 
Locker Service Ready Lightning Components With Webpack
Salesforce Engineering
 
Scaling HBase for Big Data
Salesforce Engineering
 
Techniques to Effectively Monitor the Performance of Customers in the Cloud
Salesforce Engineering
 
Predictive System Performance Data Analysis
Salesforce Engineering
 
Apache HBase State of the Project
Salesforce Engineering
 
Hit the Trail with Trailhead
Salesforce Engineering
 
HBase/PHOENIX @ Scale
Salesforce Engineering
 
Scaling up data science applications
Salesforce Engineering
 
Containers and Security for DevOps
Salesforce Engineering
 
Aspect Oriented Programming: Hidden Toolkit That You Already Have
Salesforce Engineering
 
Monitoring @ Scale in Salesforce
Salesforce Engineering
 
Performance Tuning with XHProf
Salesforce Engineering
 
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
Salesforce Engineering
 
Implementing a Content Strategy Is Like Running 100 Miles
Salesforce Engineering
 
Salesforce Cloud Infrastructure and Challenges - A Brief Overview
Salesforce Engineering
 
Koober Preduction IO Presentation
Salesforce Engineering
 
Finding Security Issues Fast!
Salesforce Engineering
 
Microservices
Salesforce Engineering
 
Global State Management of Micro Services
Salesforce Engineering
 
The Future of Hbase
Salesforce Engineering
 
Ad

Recently uploaded (20)

PPTX
Introduction to Internal Combustion Engines - Types, Working and Camparison.pptx
UtkarshPatil98
 
PDF
methodology-driven-mbse-murphy-july-hsv-huntsville6680038572db67488e78ff00003...
henriqueltorres1
 
PDF
Data structures notes for unit 2 in computer science.pdf
sshubhamsingh265
 
PPTX
Numerical-Solutions-of-Ordinary-Differential-Equations.pptx
SAMUKTHAARM
 
PDF
3rd International Conference on Machine Learning and IoT (MLIoT 2025)
ClaraZara1
 
PPTX
Water Resources Engineering (CVE 728)--Slide 3.pptx
mohammedado3
 
PDF
SERVERLESS PERSONAL TO-DO LIST APPLICATION
anushaashraf20
 
PDF
Electrical Engineer operation Supervisor
ssaruntatapower143
 
PPTX
GitOps_Without_K8s_Training_detailed git repository
DanialHabibi2
 
PPTX
Knowledge Representation : Semantic Networks
Amity University, Patna
 
PPTX
MODULE 05 - CLOUD COMPUTING AND SECURITY.pptx
Alvas Institute of Engineering and technology, Moodabidri
 
PDF
Submit Your Papers-International Journal on Cybernetics & Informatics ( IJCI)
IJCI JOURNAL
 
PPTX
Worm gear strength and wear calculation as per standard VB Bhandari Databook.
shahveer210504
 
PPTX
澳洲电子毕业证澳大利亚圣母大学水印成绩单UNDA学生证网上可查学历
Taqyea
 
PDF
aAn_Introduction_to_Arcadia_20150115.pdf
henriqueltorres1
 
PDF
20ES1152 Programming for Problem Solving Lab Manual VRSEC.pdf
Ashutosh Satapathy
 
PPTX
美国电子版毕业证南卡罗莱纳大学上州分校水印成绩单USC学费发票定做学位证书编号怎么查
Taqyea
 
PDF
Water Industry Process Automation & Control Monthly July 2025
Water Industry Process Automation & Control
 
PDF
Halide Perovskites’ Multifunctional Properties: Coordination Engineering, Coo...
TaameBerhe2
 
PDF
AN EMPIRICAL STUDY ON THE USAGE OF SOCIAL MEDIA IN GERMAN B2C-ONLINE STORES
ijait
 
Introduction to Internal Combustion Engines - Types, Working and Camparison.pptx
UtkarshPatil98
 
methodology-driven-mbse-murphy-july-hsv-huntsville6680038572db67488e78ff00003...
henriqueltorres1
 
Data structures notes for unit 2 in computer science.pdf
sshubhamsingh265
 
Numerical-Solutions-of-Ordinary-Differential-Equations.pptx
SAMUKTHAARM
 
3rd International Conference on Machine Learning and IoT (MLIoT 2025)
ClaraZara1
 
Water Resources Engineering (CVE 728)--Slide 3.pptx
mohammedado3
 
SERVERLESS PERSONAL TO-DO LIST APPLICATION
anushaashraf20
 
Electrical Engineer operation Supervisor
ssaruntatapower143
 
GitOps_Without_K8s_Training_detailed git repository
DanialHabibi2
 
Knowledge Representation : Semantic Networks
Amity University, Patna
 
MODULE 05 - CLOUD COMPUTING AND SECURITY.pptx
Alvas Institute of Engineering and technology, Moodabidri
 
Submit Your Papers-International Journal on Cybernetics & Informatics ( IJCI)
IJCI JOURNAL
 
Worm gear strength and wear calculation as per standard VB Bhandari Databook.
shahveer210504
 
澳洲电子毕业证澳大利亚圣母大学水印成绩单UNDA学生证网上可查学历
Taqyea
 
aAn_Introduction_to_Arcadia_20150115.pdf
henriqueltorres1
 
20ES1152 Programming for Problem Solving Lab Manual VRSEC.pdf
Ashutosh Satapathy
 
美国电子版毕业证南卡罗莱纳大学上州分校水印成绩单USC学费发票定做学位证书编号怎么查
Taqyea
 
Water Industry Process Automation & Control Monthly July 2025
Water Industry Process Automation & Control
 
Halide Perovskites’ Multifunctional Properties: Coordination Engineering, Coo...
TaameBerhe2
 
AN EMPIRICAL STUDY ON THE USAGE OF SOCIAL MEDIA IN GERMAN B2C-ONLINE STORES
ijait
 

Probabilistic Programming: Why, What, How, When?

  • 1. Probabilistic Programming: Why, What, How, When Beau Cronin @beaucronin
  • 2. 40 Action-Packed Minutes ‣ Why you should care - what’s wrong with what we’ve got? ‣ What probabilistic programming is, and what programs look like ‣ How you can get started today ‣ When will all of this be ready for production use?
  • 4. We use data to learn about the world Traditional! Machine Learning Hierarchical Bayesian Modeling Large Scale Small Mature & Robust Tools & frameworks Immature & Spotty Discard Structure & Knowledge Keep & Leverage Homogeneous Data Types Heterogeneous Toolkit, Theory-light Philosophical Approach Modeling, Theory-heavy Why?
  • 5. G = {V, E} What order were these links added in? What messages flow over this link? What do we know about this user? Why?
  • 6. x1 x2 lat1 long1 t1 t2 t3 t4 address1 1 1.2 2 34.0 118.2 2.3 3.4 1.9 10.4 516 61st St, 2 0.1 1 40.7 73.9 -1.5 4.5 8.9 2305 Tustin 3 10.5 0 37.9 122.3 4.7 -2.5 -3.4 1 Market St. 4 8.3 -1 -22.9 43.2 4.2 5.6 1.6 9.5 5 4.9 5 -37.8 -145.0 1600 Pennsyl 6 1.5 1 3.4 4.0 4.6 5.2 650 7th St., S Positive numbers Categorical values Locations Time Series AddressesMissing values Why?
  • 7. Diverse Data Most real datasets contain compositions of these and more, but we routinely homogenize in preprocessing Lorem Ipsum Trees & Graphs Time Series Relations Locations & Addresses Images & Movies Audio Sets & Partitions Text Why?
  • 8. Business Data Is Heterogeneous and Structured id: “abcdef” gender: “Male” dob: 1978-12-09 twitter_id: 9458201 Profile 2014-01-21 18:41:04, “https://devcenter.heroku.com/articles/quickstart”, … 2014-01-20 12:35:56, “https://devcenter.heroku.com/categories/java”, … 2014-01-20 09:12:52, “https://devcenter.heroku.com/articles/ssl-endpoint”, … Page Views Order Date Order ID Title Category ASIN/ISBN Release DateConditionSeller Per Unit Price 1/5/13 002-1139353-0278652 Under Armour Men's Resistor No Show Socks,pack of 6 SocksApparel B003RYQJJW new The Sock Company, Inc.$21.99 1/5/13 002-1139353-0278652 Under Armour Men's Resistor No Show Socks,pack of 6 SocksApparel B004UONNXI new The Sock Company, Inc.$21.99 1/8/13 002-2593752-8837806 CivilWarLand in Bad DeclinePaperback 1573225797 1/31/97 new Amazon.com LLC $8.4 1/8/13 109-0985451-2187421 Nothing to Envy: Ordinary Lives in North KoreaPaperback 385523912 9/20/10 new Amazon.com LLC$10.88 1/12/13 109-8581642-2322617 Excession Mass Market Paperback553575376 2/1/98 new Amazon.com LLC $7.99 Transactions [ { text: “key to compelling VR is…”, retweet_count: 3, favorites_count: 5, urls: [ ], hashtags: [ ], in_reply_to: 39823792801012 … }, { text: “@John4man really liked your piece”, retweets: 0, favorites: 0, … } ] Social Posts [ 657693, 7588892, 9019482, …] Followers blocked: False want_retweets: True marked_spam: False since: 2013-09-13 Relationship
  • 9. Every Domain Is Heterogeneous ‣ Health data: doctor notes, lab results, imaging, family history, prescriptions ‣ Quantified self: motion sensors, heart rate, GPS tracks, self- reporting, sleep patterns ‣ Autonomous vehicles: LIDAR, cameras, maps, audio, gyros, telemetry, GPS Why?
  • 10. Mostly, no one even tries to jointly model these different kinds of data Why?
  • 11. A probabilistic programming system is… a language + {compiler, interpreter} or that a {library, framework} for an existing language - includes random choices as native elements - and provides a clean separation between probabilistic modeling and inference - and may provide automated generation of inference solutions for a given program What?
  • 12. Probabilistic Programming Systems Model the World ‣ Programs directly represent the data generation process ‣ Measurement processes can be modeled directly, including their imperfections and the uncertainty that comes with them ‣ Philosophy ‣ DO: capture the essential aspects of real-world processes in a model ‣ DON’T: torture the data into the right form for an algorithm What?
  • 13. A Probability Model ✕ N Fixed Observable Unknown Constant values and ! structural assumptions Variables that discriminate between hypotheses Data and potential data What?
  • 14. Obligatory Bayes’ Rule Pr(H | D, A) ∝ Pr(D | H, A) Pr(H | A) Data Hypotheses Pr(H | D) ∝ Pr(D | H) Pr(H) Assumptions What?
  • 15. ! ! ! fair-prior = .999 ! fair-coin? = flip(fair-prior) ! if fair-coin?: weight = 0.5 else: weight = 0.9 ! observe(repeat(flip(weight), 10)), [H, H, H, H, H, H, H, H, H, H]) ! query(fair-coin?) First example: Deciding if a coin is fair based on flips Assumptions ! Unknowns ! Observables
  • 16. Probabilistic Programming Systems Are Diverse ‣ Library vs. stand-alone language ‣ Base language: Scala, Lisp, Python ‣ Manual, semi-, or fully-automated inference ‣ Modeling domain: directed/undirected graphical models, relational data, all programs ‣ Home field: cognitive science, programming languages, databases, Bayesian statistics, artificial intelligence What?
  • 17. PPSs Compared Type Language Inference BLOG Stand-alone Custom Fully Auto BUGS / JAGS Stand-alone Custom Fully Auto STAN Hybrid R, Python Fully Auto PyMC Library Python Manual Infer.net Library C# Semi-auto Church Stand-alone Lisp Fully Auto Venture Stand-alone Javascript, Lisp Semi-auto Figaro Library Scala Semi-auto factorie Library Scala Semi-auto What?
  • 18. infer.net ‣ A C# framework (also F#) ‣ Developed at MSR ‣ Under active development, with good tutorials and many well- documented examples How?
  • 19. VariableArray<bool> controlGroup = Variable.Observed(new bool[] { false, false, true, false, false }); VariableArray<bool> treatedGroup = Variable.Observed(new bool[] { true, false, true, true, true }); Range i = controlGroup.Range; Range j = treatedGroup.Range; ! Variable<bool> isEffective = Variable.Bernoulli(0.5); ! Variable<double> probIfTreated, probIfControl; using (Variable.If(isEffective)) { // Model if treatment is effective probIfControl = Variable.Beta(1, 1); controlGroup[i] = Variable.Bernoulli(probIfControl).ForEach(i); probIfTreated = Variable.Beta(1, 1); treatedGroup[j] = Variable.Bernoulli(probIfTreated).ForEach(j); } ! using (Variable.IfNot(isEffective)) { // Model if treatment is not effective Variable<double> probAll = Variable.Beta(1, 1); controlGroup[i] = Variable.Bernoulli(probAll).ForEach(i); treatedGroup[j] = Variable.Bernoulli(probAll).ForEach(j); } ! InferenceEngine ie = new InferenceEngine(); Console.WriteLine("Probability treatment has an effect = " + ie.Infer(isEffective)); Infer.net example: Is a new treatment effective? http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Clinical%20trial%20tutorial.aspx Observations Unknown Assumptions & Unknowns Query
  • 20. PyMC ‣ Python (duh) ‣ Go watch Thomas Wiecki’s talk from PyData NY ‣ http://twiecki.github.io/blog/2013/12/12/bayesian-data-analysis-pymc3/ ‣ And read Bayesian Methods for Hackers by Cam Davidson-Pilon et al. How?
  • 21. Church ‣ A Lisp ‣ Originally created to model cognitive development and human reasoning ‣ Active inference research, several implementations ‣ Connection between functional purity / independence vs. stochastic memoization / exchangeability ‣ Hypothesis space is possible program executions ‣ “Probabilistic Models of Cognition” How?
  • 22. ;stochastic memoization generator for class assignments ;sometimes return a previous symbol, sometimes create a new one (define class-distribution (DP-stochastic-mem 1.0 gensym)) ! ;associate a class with an object via memoization (define object->class (mem (lambda (object) (class-distribution)))) ! ;associate gaussian parameters with a class via memoization (define class->gaussian-parameters (mem (lambda (class) (list (gaussian 65 10) (gaussian 0 8))))) ! ;generate observed values for an object (define (observe object) (apply gaussian (class->gaussian-parameters (object->class object)))) ! ;generate observations for some objects (map observe '(tom dick harry bill fred)) modified from https://probmods.org/non-parametric-models.html Church example: Infinite Gaussian Mixture Model
  • 23. Church example: Cross-categorization (BayesDB)
    https://probmods.org/non-parametric-models.html

    (define kind-distribution (DPmem 1.0 gensym))

    (define feature->kind
      (mem (lambda (feature) (kind-distribution))))

    (define kind->class-distribution
      (mem (lambda (kind) (DPmem 1.0 gensym))))

    (define feature-kind/object->class
      (mem (lambda (kind object) (sample (kind->class-distribution kind)))))

    (define class->parameters
      (mem (lambda (object-class) (first (beta 1 1)))))

    (define (observe object feature)
      (flip (class->parameters (feature-kind/object->class (feature->kind feature) object))))

    (observe 'eggs 'breakfast)
  • 25. So Far ‣ Why ‣ What ‣ How ‣ When
  • 26. What We Still Need 1. Basic CS: Improved compilers and run-times for more efficient automatic inference 2. Tooling: Debuggers, optimizers, IDEs, visualization 3. Tribal knowledge: idioms, patterns, best practices When?
  • 27. When?
  • 28. The Probabilistic Programming Revolution
    Traditional Programming: Application • Code Libraries • Programming Language • Compiler • Hardware
    Probabilistic Programming: Model • Model Libraries • Probabilistic Programming Language • Inference Engine • Hardware
    Code models capture how the data was generated using random variables to represent uncertainty
    Libraries contain common model components: Markov chains, deep belief networks, etc.
    PPL provides probabilistic primitives & traditional PL constructs so users can express model, queries, and data
    Inference engine analyzes probabilistic program and chooses appropriate solver(s) for available hardware
    Hardware can include multi-core, GPU, cloud-based resources, GraphLab, UPSIDE/Analog Logic results, etc.
    High-level programming languages facilitate building complex systems
    Probabilistic programming languages facilitate building rich ML applications
    Approved for Public Release; Distribution Unlimited
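The separation between model and inference engine on this slide can be illustrated with a toy Python sketch: one model description and two interchangeable solvers that answer the same query. The coin model, the dictionary layout and the function names are all assumptions made for illustration, and no real inference engine is this simple.

```python
import random

# Model: is a coin fair (heads 0.5) or biased (heads 0.9)? Data are observed flips.
MODEL = {
    "prior": {"fair": 0.5, "biased": 0.5},
    "heads_prob": {"fair": 0.5, "biased": 0.9},
}


def likelihood(model, hypothesis, data):
    """Probability of the observed flips under one hypothesis."""
    p = model["heads_prob"][hypothesis]
    result = 1.0
    for heads in data:
        result *= p if heads else 1.0 - p
    return result


def solve_by_enumeration(model, data):
    """Exact solver: weigh every hypothesis by prior times likelihood."""
    weights = {h: pr * likelihood(model, h, data) for h, pr in model["prior"].items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def solve_by_likelihood_weighting(model, data, n_samples, rng):
    """Approximate solver: sample hypotheses from the prior, weight by likelihood."""
    hypotheses = list(model["prior"])
    prior_weights = [model["prior"][h] for h in hypotheses]
    weights = dict.fromkeys(hypotheses, 0.0)
    for _ in range(n_samples):
        h = rng.choices(hypotheses, weights=prior_weights)[0]
        weights[h] += likelihood(model, h, data)
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


data = [True, True, True, False, True, True]
exact = solve_by_enumeration(MODEL, data)
approx = solve_by_likelihood_weighting(MODEL, data, 20000, random.Random(0))
print("exact: ", exact)
print("approx:", approx)
```

The model never mentions how it will be solved, so a different solver can be swapped in without touching it. That is the division of labor the slide attributes to the inference engine.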
  • 29. The Promise of Probabilistic Programming Languages
    Probabilistic Programming could empower domain experts and ML experts
    • Shorter: Reduce LOC by 100x for machine learning applications
      • Seismic Monitoring: 28K LOC in C vs. 25 LOC in BLOG
      • Microsoft MatchBox: 15K LOC in C# vs. 300 LOC in Fun
    • Faster: Reduce development time by 100x
      • Seismic Monitoring: Several years vs. 1 hour
      • Microsoft TrueSkill: Six months for competent developer vs. 2 hours with Infer.Net
      • Enable quick exploration of many models
    • More Informative: Develop models that are 10x more sophisticated
      • Enable surprising, new applications
      • Incorporate rich domain-knowledge
      • Produce more accurate answers
      • Require less data
      • Increase robustness with respect to noise
      • Increase ability to cope with contradiction
    • With less expertise: Enable 100x more programmers
      • Separate the model (the program) from the solvers (the compiler), enabling domain experts without machine learning PhDs to write applications
    Sources: Bayesian Data Analysis, Gelman, 2003; Pattern Recognition and Machine Learning, Bishop, 2007; Science, Tenenbaum et al., 2011
    DISTRIBUTION STATEMENT F. Further dissemination only as directed by DARPA, (February 20, 2013) or higher DoD authority.
  • 31. Profiler “Where is the time and memory being used?”
  • 32. Debugger “What is the exact state of my program at each point in time?”
  • 33. Visualization “What is the hidden structure of my data, and how certain should I be?” http://www.icg.tugraz.at/project/caleydo/
  • 34. Probabilistic Programming Workflows? Definition: Data Workflows. For example, Cascading and related projects (cascading.org) implement the following components, based on 100% open source: [Diagram: data sources → ETL → data prep → predictive model → end uses; source taps for Cassandra, JDBC, Splunk, etc.; Lingual: DW → ANSI SQL; Pattern: SAS, R, etc. → PMML; business logic in Java, Clojure, Scala, etc.; sink taps for Memcached, HBase, MongoDB, etc.] Adapted from Paco Nathan: Data Workflows for Machine Learning
  • 36. Bottom Line ‣ Go experiment and learn! - there are several good options ‣ But be realistic about the current state of the art ‣ And keep your ear to the ground - this area is moving fast
  • 37. Parting Questions ‣ Which projects are good fits for probabilistic programming today? ‣ Exploration and prototyping vs. scaled production deployment? ‣ How long before we have the Python, Ruby, and even PHP of PPSs? ‣ Is there a unification with the log-centric view of big data processing? ‣ Can natively stochastic hardware provide compelling performance gains? When?
  • 38. Resources ‣ probabilistic-programming.org ‣ Probabilistic Programming and Bayesian Methods for Hackers ‣ Probabilistic Models of Cognition ‣ Mathematica Journal article ‣ Thomas Wiecki’s PyData talk on PyMC
  • 39. People To Watch ‣ Vikash Mansinghka (MIT) ‣ Noah Goodman (Stanford) ‣ David Wingate (Lyric Labs) ‣ Avi Pfeffer (CRA) ‣ Rob Zinkov (USC) ‣ Andrew Gordon (MSR) ‣ John Winn (MSR) ‣ Dan Roy (Cambridge)
  • 40. Languages and Systems ‣ PyMC ‣ infer.net ‣ STAN ‣ Figaro ‣ BLOG ‣ Church ‣ factor.ie ‣ BUGS / JAGS