Fast Machine Learning
with
by Fujio Turner
@FujioTurner
Current & Future Problems
Churn Prediction Truth and Veracity
Recommendations Online Advertisement
News Aggregation
Scalability
Content Discovery/Search
Intelligent Learning Machine Learning for Medicine
Source: Abhishek Shivkumar
LexisNexis is a provider of legal,
tax, regulatory, news, business
information, and analysis to
legal, corporate, government,
accounting and academic
markets.
LexisNexis has been in
business since 1977 with over
30,000 employees worldwide. 
What is HPCC Systems?
Who is LexisNexis?
LexisNexis Risk is the division
of LexisNexis that focuses
on data, Big Data processing,
linking and vertical expertise.
It supports HPCC Systems
as an open source project
under the Apache 2.0 License.
http://hpccsystems.com/
Problems
Data from 10,000+
Different Sources
Different Needs
for the Data
Different Levels
of Proficiency
Lots of Data
Different Needs
for the Data
Different Levels
of Proficiency
A Lot of Data
Normalized / Denormalized
Structured / Unstructured
Data from 10,000+
Different Sources
DEDUP, JOIN, INDEX,
COUNT, REGEX, K-Means,
BETWEEN, GROUP, CASE, Custom
1 Easy Language (ECL)
or
SQL, R, Java, Python, C++, SAS
Reliable Data Distribution & Processing
System that scales to exabytes+
Solutions
Machine Learning Built-in
Regression
Linear Regression
Classification
Naive Bayes
Perceptron
Decision Trees
Logistic Regression
Clustering
K-Means
KD Trees
Agglomerative/Hierarchical
Association Analysis
AprioriN
EclatN
Rules
http://hpccsystems.com/ml
Michael Payne, of Clemson University,
on high-speed machine learning with
PB-BLAS in HPCC Systems.
http://youtu.be/s_HWlMwi6iI
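To make one of the listed clustering methods concrete, here is a minimal pure-Python sketch of K-Means (Lloyd's algorithm). It is only an illustration of the technique, not the HPCC Systems ML library or its ECL interface; the function names `kmeans`, `nearest`, and `squared_distance`, the fixed seed, and the sample points are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster points (a list of numeric tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to points[i].
    """
    rng = random.Random(seed)           # fixed seed keeps the sketch repeatable
    centroids = rng.sample(points, k)   # start from k distinct input points
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster in place
        if updated == centroids:        # no centroid moved: converged
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical sample data: two well-separated groups of points.
sample_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                 (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(sample_points, k=2)
print(centroids, labels)
```

On a cluster, the assignment step is the part that parallelizes naturally, since each point can be labeled independently of the others.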
“I’m sub-second
fast.”
“I can query all
or part of your
data.”
Thor Roxie
Single Threaded
Hard Disk
Index(optional)
Multi-Threaded
Hard Disk
Index(optional)
In-memory
SSD
Either/Both
Cluster Architecture
Sort
Count
Group
Classification
(ROXIE) 0.27 seconds to (THOR) a few hours
Country = ‘US’
Join
Index of
~/facebook_2013
Query is Completed in a Single Job
Asynchronously
~/facebook_2013
Country = ‘US’
~/twitter_2013
SORT
GROUP
DEDUP
JOIN
MERGE
BETWEEN
LENGTH
REGEX
ROUND
SUM
COUNT
TRIM
WHEN
AVE
CASE
NORMALIZE
DENORMALIZE
K-MEANS
more ….
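The query pictured on this slide (filter on Country = 'US', then join two datasets) can be sketched in plain Python to show what the named operations do. This is not ECL and not how Thor or Roxie execute a job; the record fields (`user_id`, `name`, `country`, `handle`), the sample rows, and the function name `us_users_with_handles` are hypothetical and stand in for the contents of ~/facebook_2013 and ~/twitter_2013.

```python
def us_users_with_handles(facebook_rows, twitter_rows):
    """Filter, DEDUP, JOIN and SORT two small in-memory datasets."""
    # Filter: keep only rows where country is 'US'.
    us_rows = [row for row in facebook_rows if row["country"] == "US"]

    # DEDUP: keep the first row seen for each user_id.
    seen = set()
    unique_rows = []
    for row in us_rows:
        if row["user_id"] not in seen:
            seen.add(row["user_id"])
            unique_rows.append(row)

    # JOIN: build a lookup on the second dataset, then match on user_id.
    handle_by_id = {row["user_id"]: row["handle"] for row in twitter_rows}
    joined = [
        {**row, "handle": handle_by_id[row["user_id"]]}
        for row in unique_rows
        if row["user_id"] in handle_by_id
    ]

    # SORT: order the result by name.
    joined.sort(key=lambda row: row["name"])
    return joined


# Hypothetical sample rows.
facebook_2013 = [
    {"user_id": 1, "name": "Bea", "country": "US"},
    {"user_id": 2, "name": "Ana", "country": "US"},
    {"user_id": 2, "name": "Ana", "country": "US"},   # duplicate row
    {"user_id": 3, "name": "Chen", "country": "CA"},  # filtered out
    {"user_id": 4, "name": "Dev", "country": "US"},   # no matching handle
]
twitter_2013 = [
    {"user_id": 1, "handle": "@bea"},
    {"user_id": 2, "handle": "@ana"},
    {"user_id": 3, "handle": "@chen"},
]

result = us_users_with_handles(facebook_2013, twitter_2013)
print(len(result), result)  # COUNT of joined rows, then the rows themselves
```

The dictionary lookup plays the role an index plays in the slide: the join does not rescan the second dataset for every row.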
+
http://www.youtube.com/watch?v=8SV43DCUqJg
Watch how to install
HPCC Systems
in 5 Minutes
Download HPCC Systems
Open Source
Community Edition
or
Source Code
https://github.com/hpcc-systems
http://hpccsystems.com/download/
+
Common Big Data Setup
What is Couchbase?
Open Source
Memcached Built-In w/ Replicas
Flexible Schema (JSON)
Key/Value & Distributed
Cross Data Center Replication
SQL++ (N1QL)
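The ideas of a distributed key/value store holding flexible JSON documents with replicas can be illustrated with a toy in-memory model. This is a sketch under stated assumptions and not Couchbase's actual data placement or SDK: the class `ToyCluster`, its methods, the bucket count of 64, and the rule "replica lives on the next node" are all invented for this example.

```python
import json
import zlib


class ToyCluster:
    """Toy model: each key is hashed to a bucket, each bucket has one
    active node and one replica node (assumed layout, for illustration)."""

    def __init__(self, node_names, num_buckets=64):
        if len(node_names) < 2:
            raise ValueError("need at least two nodes to hold a replica")
        self.node_names = list(node_names)
        self.num_buckets = num_buckets
        self.storage = {name: {} for name in self.node_names}

    def owners(self, key):
        """Return (active_node, replica_node) for a key."""
        bucket = zlib.crc32(key.encode("utf-8")) % self.num_buckets
        active = self.node_names[bucket % len(self.node_names)]
        replica = self.node_names[(bucket + 1) % len(self.node_names)]
        return active, replica

    def set(self, key, document):
        """Store a JSON-serializable document on the active and replica nodes."""
        payload = json.dumps(document)
        for node in self.owners(key):
            self.storage[node][key] = payload

    def get(self, key, failed_nodes=()):
        """Read from the active node, falling back to the replica."""
        for node in self.owners(key):
            if node not in failed_nodes and key in self.storage[node]:
                return json.loads(self.storage[node][key])
        return None


cluster = ToyCluster(["node-a", "node-b", "node-c"])
# Flexible schema: the two documents do not share the same fields.
cluster.set("user::1", {"name": "Ana", "country": "US"})
cluster.set("user::2", {"name": "Bea", "tags": ["mobile", "sync"]})

active_node, _ = cluster.owners("user::1")
# The document is still readable when its active node is marked as failed.
print(cluster.get("user::1", failed_nodes={active_node}))
```

Because every client can compute the owner of a key from the key itself, reads and writes go straight to the right node without a central lookup, which is the property the "Key/Value & Distributed" bullet points at.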
+
Sub-Millisecond
SQL++ (N1QL)
JSON
Distributed & Reliable
Distributed & Reliable
1 Language
Flexible Data Types
Ready for the Future
XDCR
Couchbase Mobile
Embedded JSON NoSQL Database
+ Sync Data Online / Offline
+ Sync & Channel Data Peer-To-Peer
+ Sync Data Peer-To-Peer (directly)
Couchbase Mobile + HPCC Systems
Process & Store Data to Scale
INSTALL in 5 Minutes
Download
Source Code
Learning More - Couchbase Server & Lite
http://couchbase.com/download
https://github.com/couchbase
Mountain View, CA
San Francisco, CA
https://www.youtube.com/user/CouchbaseVideo

Big Data - Fast Machine Learning at Scale + Couchbase
