SAME DATA.
BETTER RESULTS.
PAUL SALAZAR
PAUL@SKYTREE.NET
SKYTREE’S FOCUS
"PRODUCTION GRADE"
MACHINE LEARNING
Machine learning: the modern science of finding patterns and making predictions from data.
Also known as: multivariate statistics, data mining, pattern recognition, or advanced/predictive analytics.
Machine Learning Use Cases
Predict categories and classes: Classification
Predict values and numbers: Regression
Grouping and segmentation: Clustering
Detection and characterization: Density Estimation
Visualization and reduction: Dimension Reduction
Find similar items: Multidimensional Querying
Example Skytree Algorithms: Random Decision Forests, Gradient Boosting Machines, Nearest Neighbor, Kernel Density Estimation, K-means, Linear Regression, Support Vector Machine, 2-point Correlation, Decision Tree, Singular Value Decomposition, Range Search, Logistic Regression
Recommendations, Predictions, Outlier Detection
What are the current options for ML for Big Data?
1.  Just use a subset of the data!
–  e.g. just take the first 1,000 rows. Result to expect: Capture only the broadest patterns. → Lower accuracy.
2.  Just use a simple ML method!
–  e.g. use logistic regression instead of nonlinear SVM. Result to expect: Entire types of patterns cannot be found. → Lower accuracy.
3.  Just use simple parallelism/MapReduce!
–  i.e. replace all the for-loops with parallel ones. Result to expect: Only the simplest of ML methods (not O(N^2)/O(N^3)) can be significantly sped up this way. → See #2.
4.  Just throw it in the cloud!
–  i.e. somehow use the large compute power of the cloud. Result to expect: The cost of sending it to the cloud is even greater than the compute cost. → See #1. See also #3.
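Option 2 can be made concrete with a toy case. The sketch below is an illustration only, not taken from the deck: it uses a plain linear threshold rule as a stand-in for logistic regression and a 1-nearest-neighbor rule as a stand-in for a nonlinear method (the deck names nonlinear SVM; that is swapped out here to keep the example dependency-free). The data set, the grid of weights, and all function names are assumptions made for the example. On an XOR-style pattern, no linear rule gets more than 3 of the 4 points right, while the nonlinear rule recovers the pattern.

```python
from itertools import product

# Hypothetical XOR-style data: the class depends on the interaction of both inputs.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def linear_predict(w1, w2, b, x):
    """Linear decision rule: the kind of boundary logistic regression can learn."""
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def best_linear_accuracy(data, grid):
    """Try every (w1, w2, b) on a coarse grid and return the best accuracy found."""
    best = 0.0
    for w1, w2, b in product(grid, repeat=3):
        correct = sum(linear_predict(w1, w2, b, x) == y for x, y in data)
        best = max(best, correct / len(data))
    return best


def nearest_neighbor_predict(train, x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    closest = min(
        train,
        key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
    )
    return closest[1]


GRID = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0 (assumed search range)
linear_acc = best_linear_accuracy(DATA, GRID)

# Query points slightly offset from the training points, with their XOR labels.
QUERIES = [((0.1, 0.1), 0), ((0.1, 0.9), 1), ((0.9, 0.1), 1), ((0.9, 0.9), 0)]
nn_acc = sum(nearest_neighbor_predict(DATA, x) == y for x, y in QUERIES) / len(QUERIES)

if __name__ == "__main__":
    print("best linear accuracy:", linear_acc)
    print("1-nearest-neighbor accuracy:", nn_acc)
```

The grid search is only a convenience: XOR is not linearly separable, so widening or refining the grid would not change the linear result.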
Skytree’s Unique Differentiation:
Fundamental Technology Breakthrough
Complexity of State-of-the-Art Machine Learning methods:
1.  Querying: all-nearest-neighbors O(N^2)
2.  Density estimation: kernel density estimation O(N^2), kernel conditional density estimation O(N^3)
3.  Classification: logistic regression, decision tree, neural nets, nearest-neighbor classifier O(N^2), kernel discriminant O(N^2), support vector machine O(N^3)
4.  Regression: linear regression, LASSO, kernel regression O(N^2), regression tree, Gaussian process regression O(N^3)
5.  Dimension reduction: PCA, non-negative matrix factorization, kernel PCA O(N^3), maximum variance unfolding O(N^3); Gaussian graphical models, discrete graphical models
6.  Clustering: k-means, mean-shift O(N^2), hierarchical clustering O(N^3)
7.  Testing and matching: MST O(N^3), bipartite cross-matching O(N^3), n-point correlation 2-sample testing O(N^n), n = 2, 3, 4, …
►  Unfortunately, O(N^2) and O(N^3) are computationally prohibitive for big data.
Skytree has invented a way to reduce the complexity of the above methods from O(N^2) and O(N^3) to O(N) or O(N log N).
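The deck does not say how Skytree achieves this reduction, so the sketch below should not be read as Skytree's method. It only illustrates the general idea on item 1 (all-nearest-neighbors), using a textbook k-d tree: the brute-force version compares every pair of points, while the tree version prunes whole regions that cannot contain a closer neighbor, which for low-dimensional data typically brings the work to roughly N log N. All function names and the random test data are assumptions made for the example.

```python
import math
import random


def brute_force_all_nn(points):
    """All-nearest-neighbors by checking every pair: O(N^2) distance evaluations."""
    result = []
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)
                if d < best_d:
                    best_j, best_d = j, d
        result.append(best_j)
    return result


def build_kdtree(indices, points, depth=0):
    """Recursively split the point indices at the median of one coordinate."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {
        "index": indices[mid],
        "axis": axis,
        "left": build_kdtree(indices[:mid], points, depth + 1),
        "right": build_kdtree(indices[mid + 1:], points, depth + 1),
    }


def nearest(node, points, query_idx, best=(-1, math.inf)):
    """Return (index, distance) of the point nearest to points[query_idx], excluding itself."""
    if node is None:
        return best
    q = points[query_idx]
    i, axis = node["index"], node["axis"]
    if i != query_idx:
        d = math.dist(q, points[i])
        if d < best[1]:
            best = (i, d)
    diff = q[axis] - points[i][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, points, query_idx, best)
    # Visit the far side only if the splitting plane is closer than the best match so far.
    if abs(diff) < best[1]:
        best = nearest(far, points, query_idx, best)
    return best


def kdtree_all_nn(points):
    """All-nearest-neighbors using one k-d tree query per point."""
    tree = build_kdtree(list(range(len(points))), points)
    return [nearest(tree, points, i)[0] for i in range(len(points))]


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(1000)]
    same = brute_force_all_nn(pts) == kdtree_all_nn(pts)
    print("tree result matches brute force:", same)
```

Both functions return, for each point, the index of its nearest other point, so the brute-force version doubles as a correctness check for the tree version. The pruning benefit weakens as the number of dimensions grows.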
Performance
Up to 10,000x speedups (on one CPU)
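The 10,000x figure is the deck's own claim and is not derived here. As a rough sanity check on the order of magnitude only, the arithmetic below compares operation counts for an O(N^2) method and an O(N log N) method at an assumed N of one million, ignoring constant factors, memory effects, and implementation details.

```latex
\begin{align*}
N &= 10^{6} \\
N^{2} &= 10^{12} \\
N \log_2 N &\approx 10^{6} \times 19.93 \approx 2 \times 10^{7} \\
\frac{N^{2}}{N \log_2 N} &= \frac{N}{\log_2 N} \approx 5 \times 10^{4}
\end{align*}
```

Under these assumptions the gap in operation counts is in the tens of thousands, so a change of complexity class alone can plausibly account for speedups of this scale on a single CPU.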
How Does Skytree Do This?
Deep knowledge of algorithms
Drawing on the latest from academia
Smart programming
Efficient ways to compute O(N^2) and O(N^3) methods
Distributed systems
Take advantage of parallel computing speed
Team
EXECUTIVE TEAM
Martin Hack, CEO & Co-Founder: Sun, GreenBorder (Google)
Alexander Gray, PhD, CTO & Co-Founder: Leading Light for Large-Scale, Fast Algorithms
Paul Salazar, VP Sales: RedHat, Greenplum
Leland Wilkinson, PhD, VP Data Visualization: Creator of SYSTAT (SPSS/IBM)
Tim Marsland, PhD, VP Engineering: Sun Fellow, CTO Software, Apple, Oracle
BOARD OF DIRECTORS
Rick Lewis, USVP
Noah Doyle, Javelin Venture Partners
David Toth, Founder and CEO NetRatings (Nielsen)
TECH ADVISORY BOARD
Prof. Michael Jordan, UC Berkeley: machine learning ‘godfather’
Prof. David Patterson, UC Berkeley: systems (inventor RISC, RAID)
Prof. Pat Hanrahan, Stanford: data visualization (Tableau, Pixar)
Prof. James Demmel, UC Berkeley: high-performance computing
INVESTORS
USVP, Javelin Venture Partners, Scott McNealy, UPS
Product Overview
Skytree Adviser for Desktop: Data Science for Everyone
Skytree Server for Enterprises: Enterprise Machine Learning
Advanced Analytics:
•  Predict Categories/Classes
•  Detect Anomalies
•  Find Trends
•  Predict Values/Numbers
•  Identify Patterns
•  Find Outliers
Thank you for learning about Skytree
Read more at www.skytree.net
•  We’re hiring: check out our careers page.
•  Download Skytree Adviser for free.
•  Pick up a T-shirt.

Skytree big data london meetup - may 2013
