BIPD Tech Tuesday Presentation - Qubole
Qubole
Click to Query your Big Data on the Cloud
A company like Facebook provides data infrastructure as a service (built by the founders of Qubole):
- More than 30% of the company uses this infrastructure every month
- Users range from developers and analysts to business analysts and business users
- Manages over an exabyte of data
- Has made the company more data-driven and agile in its use of data
- Building this infrastructure took the founders a team of more than 30 people, and the team that manages it today has more than 100 people
Diagram: Operations Analyst, Marketing Ops Analyst, Data Architect, Business Users, Product Support, Customer Support, Developer, Sales Ops, and Product Managers, all working from a shared Data Infrastructure
QUBOLE VISION: DATA FOR ALL, CLICK-TO-QUERY
• 170+ PB of data processed per month
• Clusters of 10 to 3,000 nodes daily
• 300,000 machines per month
• 20,000 jobs daily
AGILITY TIME-TO-INSIGHT CLICK-TO-QUERY
CONFIDENTIAL. SUBJECT TO NDA PROVISIONS.
Industries and Use Cases
Industries: Media & Advertising, Oil & Gas, Retail, Life Sciences, Financial Services, Security, Social Networking & Gaming
Use cases: Targeted Advertising, Seismic Analysis, Image and Video Processing, Customer Profile, Transaction Analysis, Genome Analysis, Monte Carlo Simulations, Risk Analysis, Fraud Detection, Anti-virus, Image Recognition, In-game Metrics, Usage Analysis, User Demographics
Workload Classifications: Predefined Reporting, Ad Hoc Analytics, Statistical Analytics, Predictive Analytics, Machine Learning, MapReduce, Streaming
Match Your Processing Engines to Your Workload Parameters: SQL, Data Pipeline, MapReduce, Spark, NoSQL Store
With Qubole:
• 10 to 1,000+ nodes in under 5 minutes
• Flexible: different nodes for different loads
• Data for all: usable by many
• Low TCO: only on when needed
Traditional Hadoop deployments:
• Extensive planning required; inflexible and static
• Not built for the cloud
• Need Hadoop experts to install, maintain, and use
• High TCO: always on
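The "only on when needed" versus "always on" contrast comes down largely to billed node-hours. The sketch below is a hypothetical illustration, not Qubole's pricing or TCO model: it counts compute node-hours only (no storage, staff, or licensing), assumes an average 730-hour month, and uses made-up cluster sizes and schedules.

```python
"""Hypothetical comparison of billed node-hours: an always-on cluster versus an
ephemeral cluster that runs only while work is scheduled. Inputs are illustrative."""

HOURS_PER_MONTH = 730  # assumed average month length (8,760 hours / 12)


def always_on_node_hours(nodes: int, hours: float = HOURS_PER_MONTH) -> float:
    """A static cluster bills every node for the entire period."""
    return nodes * hours


def ephemeral_node_hours(active_windows: list[tuple[int, float]]) -> float:
    """An ephemeral cluster bills only for the windows in which it is running.

    Each window is a (node_count, duration_hours) pair.
    """
    return sum(nodes * duration for nodes, duration in active_windows)


if __name__ == "__main__":
    # Illustrative inputs only, not measurements.
    static = always_on_node_hours(nodes=100)
    # 22 working days of a 100-node cluster for 8 hours,
    # plus a 300-node nightly batch for 2 hours on 30 days.
    windows = [(100, 8.0)] * 22 + [(300, 2.0)] * 30
    on_demand = ephemeral_node_hours(windows)
    print(f"always-on node-hours: {static:,.0f}")
    print(f"ephemeral node-hours: {on_demand:,.0f}")
```

With these made-up inputs the script prints 73,000 always-on node-hours against 35,600 ephemeral node-hours; the gap depends entirely on how many hours the workload actually needs the cluster.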
Architecture (diagram): users reach Qubole's AWS account through the Qubole UI (browser), the SDK, ODBC, or the REST API over HTTPS, and can also connect via SSH. Components shown across Qubole's AWS account and the customer's AWS account:
• Web tier (web servers)
• Encrypted result cache
• RDS – Qubole user and account configurations (encrypted credentials)
• Ephemeral Hadoop clusters managed by Qubole (master and slave nodes with encrypted HDFS)
• Amazon S3: data is read directly with no HDFS load step, with S3 Server-Side Encryption
• Default Hive metastore, or optionally a custom Hive metastore
• Other RDS or Redshift (optional)
• Data flow stays within the customer's AWS account
Encryption Options:
a) Qubole can encrypt the result cache
b) Qubole supports encryption of the ephemeral drives used for HDFS
c) Qubole supports S3 Server-Side Encryption
BUILT FOR CLOUD PERFORMANCE COST-EFFICIENT
Ephemeral Clusters:
• Auto-Scaling - both up and down
• Spot Instances - data management and back-fill
• VMs deployed with awareness of time
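Auto-scaling "both up and down" can be pictured as resizing the cluster to match the outstanding work. The following is a hypothetical policy for illustration only, not Qubole's actual algorithm: `ScalingPolicy`, `slots_per_node`, and the task counts are assumptions, and a production scheduler would also weigh factors this sketch ignores, such as when a node's billing period ends.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_nodes: int       # floor the cluster never shrinks below (assumed)
    max_nodes: int       # ceiling the cluster never grows beyond (assumed)
    slots_per_node: int  # concurrent tasks one node can run (assumed)


def target_cluster_size(pending_tasks: int, running_tasks: int,
                        policy: ScalingPolicy) -> int:
    """Size the cluster to the outstanding work, clamped to [min_nodes, max_nodes].

    The target grows when demand rises (scale up) and shrinks when demand
    falls (scale down).
    """
    demand = pending_tasks + running_tasks
    needed = math.ceil(demand / policy.slots_per_node)
    return max(policy.min_nodes, min(policy.max_nodes, needed))


def scaling_action(current_nodes: int, target_nodes: int) -> str:
    """Describe the change needed to move from the current size to the target."""
    if target_nodes > current_nodes:
        return f"add {target_nodes - current_nodes} node(s)"
    if target_nodes < current_nodes:
        return f"remove {current_nodes - target_nodes} node(s)"
    return "no change"
```

For example, with `ScalingPolicy(min_nodes=2, max_nodes=10, slots_per_node=4)`, 30 pending and 5 running tasks give a target of 9 nodes, so a 3-node cluster would add 6 nodes; once the queue drains, the target falls back to the 2-node floor.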
Demo
Why Qubole?
“Qubole has enabled more users within Pinterest to get to the
data and has made the data platform lot more scalable and
stable”

Mohammad Shahangian - Lead, Data Science and Infrastructure
Moved to Qubole from Amazon EMR for stability, and rapidly expanded big data usage by giving users beyond developers access to data.
User and Query Growth: rapid expansion of big data beyond developers (240 users in a 600-person company)
Use Cases: rapid expansion in use cases ranging from ETL and search to ad hoc querying and product analytics
Rock-solid infrastructure sees 50% fewer failures than Amazon Elastic MapReduce (EMR)
Enterprise-scale processing and data access
Why Qubole?
“We needed something that was reliable and easy to learn,
setup, use and put into production without the risk and high
expectations that comes with committing millions of dollars in
upfront investment. Qubole was that thing.”
Marc Rosen - Sr. Director, Data Analytics
Moved to big data on the cloud (from internal Oracle clusters) because getting to analysis was much quicker than operating the infrastructure in-house. Used to answer client queries and power client dashboards.
Chart: # Commands Per Month (number of queries), Aug-13 to Feb-14
Use Cases
Segment audiences by behavior, including user pathway and multi-dimensional recency analysis
Build customer profiles (both univariate and multivariate) across thousands of first-party (e.g., client CRM files) and third-party (e.g., demographic) segments
Simplify attribution insights showing the effects of upper-funnel prospecting on lower-funnel remarketing media strategies
