Jianqiang (Jay) Wang
Dec 6, 2014
Intro to Data Science and Candidate Projects for Bootcamp
About me
B.S. in Management Science; Ph.D. in Statistics;
Data scientist on Twitter ads ranking;
HP Labs: pricing & portfolio management, marketing;
USDA: yield forecasting with satellite & survey data;
Instructor at Colorado State University;
Innovation at the intersection of statistics, computer science,
and business
Applications in online advertising and e-commerce.
Ads on the Twitter platform
Ad-serving pipeline
Statistical Demand Modeling at HP
Introduce yourself
Name
Where are you from
Background
Current position
Expectations for the bootcamp (e-form)
What is data science?
Is the traffic on 101-N heavier on Wednesdays? Why?
Why is swipe-to-dismiss decreasing ad engagement?
Analytical: think like a data scientist
Find patterns in data
Tease out signals from noise
Educate engineers about variation (e.g., in conversion)
Delineate the effects of various factors
Hypothesize root causes and figure out the contribution of each
possibility (e.g., the swipe-to-dismiss image viewer)
Prediction, forecasting, optimization
Building data products
Rough time split: 70% data munging + EDA, 20% modeling, 10% visualization,
presentation & reporting
Data munging
Data
Transactional, web clicks and logs, sensor data (satellite,
wearable devices, ...), ...
Documents, emails, social feeds, ...
What questions should you ask about a data source?
Munging process:
Extracting from raw form
Filtering, selecting, transforming
Restructuring, aggregating, sinking
Techniques
SQL or similar, ETL tools in the data warehouse, Hadoop
MapReduce, dimension reduction, sampling, R (*apply, plyr), ...
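The extract, filter/transform, and aggregate steps above can be sketched in plain Python. This is a minimal illustration, not a production ETL job: the tab-separated log format, the field names, and the sample lines are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw web-log text: "ISO timestamp<TAB>user_id<TAB>event" per line.
RAW_LOG = """\
2014-12-01T08:15:00\tu1\tclick
2014-12-01T08:16:30\tu2\timpression
malformed line
2014-12-02T09:00:00\tu1\tclick
2014-12-02T09:05:00\tu3\tclick
"""


def extract(raw):
    """Extract: parse raw lines into dict records, skipping malformed ones."""
    records = []
    for line in raw.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # drop bad rows rather than failing the whole job
        ts, user, event = parts
        records.append({"ts": datetime.fromisoformat(ts), "user": user, "event": event})
    return records


def transform(records):
    """Filter to clicks and derive a calendar-day field."""
    return [dict(r, day=r["ts"].date().isoformat()) for r in records if r["event"] == "click"]


def aggregate(records):
    """Aggregate: number of clicks per day."""
    return Counter(r["day"] for r in records)


clicks_per_day = aggregate(transform(extract(RAW_LOG)))
print(sorted(clicks_per_day.items()))
```

At terabyte scale the same three steps would map onto SQL, an ETL tool, or MapReduce jobs; the structure stays the same.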
Techniques
Distribution & summary statistics: centrality, variation, outliers
Scatterplot, side-by-side boxplot, histogram
PCA, multidimensional scaling, projection pursuit, ...
Toolset
Hadoop & equivalents: read terabytes of data and aggregate
R, Python, Ruby, Excel, ...
Exploratory Data Analysis
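As a small illustration of the summary statistics listed above, the sketch below computes centrality and variation and flags outliers with Tukey's 1.5 × IQR fences (the rule behind boxplot whiskers). The session-length data is hypothetical, and `statistics.quantiles` requires Python 3.8+.

```python
import statistics


def summarize(values):
    """Centrality, variation, and Tukey-fence outliers for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default "exclusive" method)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # boxplot whisker limits
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "iqr": iqr,
        "outliers": [v for v in values if v < lo or v > hi],
    }


# Hypothetical daily session lengths (minutes) with one suspicious value
sessions = [12, 15, 14, 13, 16, 15, 14, 90]
print(summarize(sessions))
```

The gap between the mean and the median is itself a quick signal that a single extreme value is skewing the data.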
42 heads out of 100 coin flips: does this indicate the
coin is unfair?
Is the traffic on 101-N heavier on Wednesdays?
Techniques
A/B testing
Time series analysis
Toolset: statistical packages like R
Teasing out signal from noise
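The coin-flip question can be answered with an exact two-sided binomial test. The sketch below follows the convention used by R's `binom.test`: sum the probabilities of every outcome no more likely than the observed one. It uses only the standard library (`math.comb`, Python 3.8+).

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test p-value for k successes in n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # small relative tolerance so outcomes tied with the observed one are not lost to rounding
    return sum(q for q in pmf if q <= observed * (1 + 1e-7))


p_value = binom_two_sided_p(42, 100)
print(f"p = {p_value:.3f}")
```

The p-value comes out at roughly 0.13, so 42 heads in 100 flips is not strong evidence of an unfair coin at the usual 5% level: that much deviation from 50 happens fairly often by chance alone.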
Techniques
Regression
A/B testing
Contrast
Computer simulation
Toolset
Statistical packages like R
Experimentation framework: Twitter ads
Estimate the effects of various
factors
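For the A/B testing technique above, one common way to estimate a treatment effect on a conversion rate is a pooled two-proportion z-test. This is a minimal sketch with hypothetical counts; it is not the Twitter ads experimentation framework itself, and real frameworks typically add variance reduction and multiple-testing safeguards.

```python
from math import erf, sqrt


def two_proportion_ztest(x_a, n_a, x_b, n_b):
    """Pooled two-proportion z-test. Returns (lift, z, two-sided p-value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)  # conversion rate under H0: no difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # standard normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_b - p_a, z, p_value


# Hypothetical experiment: control converts 200/10,000, treatment 250/10,000
lift, z, p_value = two_proportion_ztest(200, 10_000, 250, 10_000)
print(f"lift = {lift:.4f}, z = {z:.2f}, p = {p_value:.4f}")
```

Here the 0.5-percentage-point lift gives z of about 2.4 and p of about 0.017; a regression with a treatment indicator would give an equivalent estimate while also adjusting for other factors.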
Techniques
Classification
Prediction/forecasting
Recommendation/ranking
Optimization
Toolset
R, Python, MLlib, Weka (Java), Vowpal Wabbit (VW, C++), ...
Mahout, Spark
Examples
Recommendations
Fluc food delivery: driver assignment, route optimization
Machine learning, optimization, ...
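To make the recommendation example concrete, here is a minimal user-based collaborative-filtering sketch: unseen items are scored by the ratings of other users, weighted by cosine similarity between rating vectors. The users, items, and ratings are hypothetical, and this is only one simple technique among the many used for recommendation and ranking.

```python
from math import sqrt

# Hypothetical user -> {item: rating} data
ratings = {
    "alice": {"pizza": 5, "sushi": 3, "tacos": 4},
    "bob": {"pizza": 4, "sushi": 1, "tacos": 5, "ramen": 2},
    "carol": {"sushi": 5, "ramen": 5},
    "dave": {"pizza": 5, "tacos": 5, "burgers": 4},
}


def cosine(u, v):
    """Cosine similarity of two sparse rating vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(user, k=2):
    """Score items the user hasn't rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("alice"))
```

At production scale the same idea is usually implemented with matrix factorization or approximate nearest-neighbor search, for example with Spark MLlib or Mahout from the toolset above.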
● Visualization of data analytics demand in the US
https://carterlin.shinyapps.io/brilent/
● Topsy: social search and analytics, drawing insights from the
entirety of Twitter data
● Placepicker: helps couples decide where to live
o Commute times, rent or house prices, safety, school quality,
walkability
● Tools for interactive visualization: R Shiny package, Tableau,
D3.js, Ruby/Python
Building Data Products
Healthcare: drug development; patient monitoring; electronic medical records
Utilities: smart grid optimization (generation, transmission, distribution, demand)
Retail & marketing: customer loyalty and churn analysis; targeted product and service offerings; product sentiment analysis; marketing campaign optimization
Financial services: fraud detection & prevention; anti-money laundering
Telecom: customer churn mitigation; geospatial analytics; call detail record (CDR) analysis
Analytics Use Cases by Industry
Crawl Twitter data in R (or Python)
User info
User tweets
User network
Search results
Text analytics and unsupervised learning, interactive
visualization
Organize Twitter users into groups based on the similarity of their tweets
Display search results on a chosen topic (e.g., iPhone 6) with sentiment
analysis
Phase 1: data crawling and parsing, word cloud and word frequencies;
Phase 2: several similarity metrics; extract sentiment from
tweets;
Mine Twitter on a Topic
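The Phase 1 and Phase 2 steps for this project (word frequencies, a similarity metric, and sentiment) can be sketched on already-parsed tweets. No Twitter API calls are shown here. The tweets, the stopword list, and the tiny sentiment lexicon are hypothetical; a real project would use a curated lexicon or a trained model, and Jaccard is just one of several possible similarity metrics.

```python
import re
from collections import Counter

# Hypothetical tweets, already crawled and parsed
tweets = {
    "u1": "Loving the new iPhone 6 camera, great battery too",
    "u2": "iPhone 6 battery is great, camera is amazing",
    "u3": "Bent my iPhone 6, terrible build quality",
}
STOPWORDS = {"the", "is", "my", "too", "a", "an"}
# Toy sentiment lexicon (hypothetical): +1 positive word, -1 negative word
LEXICON = {"loving": 1, "great": 1, "amazing": 1, "terrible": -1, "bent": -1}


def tokens(text):
    """Lowercase, keep alphanumeric words, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def jaccard(a, b):
    """Jaccard similarity of two token collections (treated as sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def sentiment(text):
    """Sum of lexicon scores: > 0 leans positive, < 0 leans negative."""
    return sum(LEXICON.get(w, 0) for w in tokens(text))


# Word frequencies across all tweets (input for a word cloud)
freq = Counter(w for t in tweets.values() for w in tokens(t))
print(freq.most_common(3))
print(jaccard(tokens(tweets["u1"]), tokens(tweets["u2"])), sentiment(tweets["u3"]))
```

A matrix of pairwise similarities like `jaccard` can then feed a clustering algorithm to organize users into groups.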
Anonymized bike trip data:
Trip start/end time
Trip start/end station
Rider type, and member gender & birth year
Visualization and prediction:
Where are riders going? When are they going there? How far do they
ride?
Top stations? Interesting usage patterns?
Similar: Hubway bike trip history (metro Boston)
Phase 1: exploratory data analysis, design doc for the visualization;
Phase 2: EDA; iterate on the design doc, simple examples using
interactive viz tools;
Chicago Divvy Bike Usage
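A first EDA pass on trip data often answers "top stations?", "when?", and "how long?" with simple aggregations. The sketch below uses a handful of hypothetical trip records. The field names and the timestamp format are illustrative assumptions, not Divvy's exact CSV schema.

```python
import statistics
from collections import Counter, defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format for this example

# Hypothetical trip records (field and station names are illustrative)
trips = [
    {"start": "2014-07-01 08:05", "end": "2014-07-01 08:20", "from": "Station A", "to": "Station B", "usertype": "Subscriber"},
    {"start": "2014-07-01 08:10", "end": "2014-07-01 08:22", "from": "Station A", "to": "Station C", "usertype": "Subscriber"},
    {"start": "2014-07-01 17:30", "end": "2014-07-01 17:48", "from": "Station B", "to": "Station A", "usertype": "Subscriber"},
    {"start": "2014-07-05 13:00", "end": "2014-07-05 13:45", "from": "Station C", "to": "Station C", "usertype": "Customer"},
    {"start": "2014-07-05 14:10", "end": "2014-07-05 15:05", "from": "Station A", "to": "Station B", "usertype": "Customer"},
]


def duration_min(trip):
    """Trip duration in minutes."""
    start = datetime.strptime(trip["start"], FMT)
    end = datetime.strptime(trip["end"], FMT)
    return (end - start).total_seconds() / 60


top_stations = Counter(t["from"] for t in trips).most_common(1)  # busiest start station
by_hour = Counter(datetime.strptime(t["start"], FMT).hour for t in trips)  # trips per start hour

durations = defaultdict(list)
for t in trips:
    durations[t["usertype"]].append(duration_min(t))
median_by_type = {k: statistics.median(v) for k, v in durations.items()}

print(top_stations, dict(by_hour), median_by_type)
```

Medians are used for trip duration because duration distributions tend to have long right tails that would distort a mean.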
Public dataset of the startup ecosystem
Companies: name, homepage, category, total funding;
Rounds: funding amount at each round (seed, A, B, ...);
Investments: investor info & amount raised at each round;
Acquisitions: acquisition and acquirer information.
Problems:
Interactive visualization: rounds of funding? What is the distribution of total
funding for each category? For each location?
Predict: total funding amount where it is missing, or whether a company is
acquired within k years (k = 2, 5, ...); other targets?
Phase 1: exploratory data analysis, design doc for the visualization,
and scope of prediction
Crunchbase Startup Data
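The sketch below shows two starting points for this project: a per-category funding summary that skips missing values, and a binary label for "acquired within k years" that could serve as a prediction target. The company rows and column names are hypothetical, not Crunchbase's actual schema.

```python
import statistics
from collections import defaultdict
from datetime import date

# Hypothetical company rows (column names are illustrative)
companies = [
    {"name": "A", "category": "mobile", "total_funding": 2.0e6, "founded": date(2008, 1, 1), "acquired": date(2012, 6, 1)},
    {"name": "B", "category": "mobile", "total_funding": 15.0e6, "founded": date(2009, 3, 1), "acquired": None},
    {"name": "C", "category": "ecommerce", "total_funding": 5.0e6, "founded": date(2010, 5, 1), "acquired": date(2011, 9, 1)},
    {"name": "D", "category": "ecommerce", "total_funding": None, "founded": date(2011, 2, 1), "acquired": None},
    {"name": "E", "category": "ecommerce", "total_funding": 40.0e6, "founded": date(2007, 7, 1), "acquired": None},
]


def funding_by_category(rows):
    """(median total funding, number of companies with known funding) per category."""
    groups = defaultdict(list)
    for r in rows:
        if r["total_funding"] is not None:  # missing funding is skipped, not treated as 0
            groups[r["category"]].append(r["total_funding"])
    return {c: (statistics.median(v), len(v)) for c, v in groups.items()}


def acquired_within(row, k):
    """Binary target: was the company acquired within k years of founding?"""
    if row["acquired"] is None:
        return False
    return (row["acquired"] - row["founded"]).days <= 365.25 * k


print(funding_by_category(companies))
print([acquired_within(r, 5) for r in companies])
```

One caveat for the prediction task: a recently founded company that has not been acquired yet may still be acquired within k years. Such rows are censored, so they should be filtered out or handled with survival-analysis methods rather than labeled as simple negatives.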
Predict monthly sales of consumer products following an
initial advertising campaign
Monthly online sales for the first 12 months after product
launch
Product and campaign features
EDA, statistical modeling, visualization
Phase 1: exploratory data analysis, 12-month sales curves or
time series
Phase 2: extract features from the 12-month sales curve,
predict with off-the-shelf methods;
Predicting Consumer Product
Sales Based on Features
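For Phase 2, one way to "extract features from the 12-month sales curve" is to summarize each curve with a few interpretable numbers that an off-the-shelf model can then relate to product and campaign features. The feature choices below (total, peak month, front-loading, decay) and the sales curve itself are illustrative assumptions, not a prescribed feature set.

```python
def curve_features(monthly_sales):
    """Summarize a 12-month post-launch sales curve into a few candidate features."""
    if len(monthly_sales) != 12:
        raise ValueError("expected 12 monthly values")
    total = sum(monthly_sales)
    peak = max(monthly_sales)
    return {
        "total": total,
        "peak_month": monthly_sales.index(peak) + 1,  # 1-based month of peak sales
        "front_load": sum(monthly_sales[:3]) / total if total else 0.0,  # share sold in months 1-3
        "decay": monthly_sales[-1] / peak if peak else 0.0,  # month-12 sales relative to the peak
    }


# Hypothetical launch curve: quick ramp after the campaign, then gradual decay
sales = [120, 300, 260, 200, 160, 130, 110, 95, 80, 70, 60, 55]
print(curve_features(sales))
```

Compressing each curve to a few shape descriptors makes curves comparable across products and gives the prediction step a small, interpretable target space.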
How does your smartphone know what you are doing now?
Activity labels: walking, walking upstairs/downstairs, sitting, standing, lying
Samsung Galaxy S II: acceleration and angular velocity
Subject identifier, time- and frequency-domain variables
Supervised machine learning, feature engineering
Phase 1: exploratory data analysis, off-the-shelf ML
Phase 2: more off-the-shelf ML and performance comparison;
ensemble methods?
Human Activity Recognition Using
Smartphone Data
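The feature-engineering plus supervised-learning loop for this project can be illustrated end to end in a few lines. In the sketch below, each window of tri-axial acceleration is reduced to the mean and standard deviation of its magnitude, and a nearest-centroid classifier serves as a deliberately simple stand-in for the off-the-shelf methods mentioned above. The windows and labels are hypothetical toy data; `math.dist` requires Python 3.8+.

```python
import math
import statistics


def window_features(ax, ay, az):
    """Time-domain features for one window: mean and std of acceleration magnitude."""
    mag = [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]
    return [statistics.mean(mag), statistics.pstdev(mag)]


def fit_centroids(X, y):
    """Nearest-centroid 'training': average feature vector per activity label."""
    by_label = {}
    for features, label in zip(X, y):
        by_label.setdefault(label, []).append(features)
    return {lab: [statistics.mean(col) for col in zip(*rows)] for lab, rows in by_label.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], features))


# Hypothetical labeled windows (ax, ay, az in units of g)
zeros = [0.0] * 4
train = [
    ((zeros, zeros, [1.0, 1.01, 0.99, 1.0]), "standing"),
    ((zeros, zeros, [1.0, 1.0, 1.0, 1.0]), "standing"),
    ((zeros, zeros, [0.6, 1.4, 0.7, 1.3]), "walking"),
    ((zeros, zeros, [0.5, 1.5, 0.6, 1.4]), "walking"),
]
X = [window_features(*w) for w, _ in train]
y = [label for _, label in train]
centroids = fit_centroids(X, y)
print(predict(centroids, window_features(zeros, zeros, [0.65, 1.35, 0.65, 1.35])))
```

On real data, this baseline would be compared against stronger off-the-shelf learners and ensembles, as planned for Phase 2; the magnitude's spread is what separates dynamic from static activities here.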
Resources
tryr.codeschool.com
Coursera classes
Intro to statistics
R/Python programming
Machine learning
Intro to data science
Web intelligence and big data (DS)
Books
The Statistical Sleuth
Big Data Governance (quality, privacy, application in various verticals)
Data Just Right (DS)
The Start-up of You
The 7 Habits of Highly Effective People
Glassdoor, CareerCup, ...
Brilent Online Learning
http://54.67.61.190:8080/olat
Questions
We cannot teach you passion and attitude, but we will
influence you with our passion and attitude.
