Introduction to Data Science
Outline
Data, Big Data and Challenges
Data Science
Introduction
Why Data Science
Data Scientists
What do they do?
Major/Concentration in Data Science
What courses to take.
Data All Around
Lots of data is being collected
and warehoused
Web data, e-commerce
Financial transactions, bank/credit
transactions
Online trading and purchasing
Social Network
How Much Data Do We Have?
Google processes 20 PB a day (2008)
Facebook has 60 TB of daily logs
eBay has 6.5 PB of user data + 50 TB/day
(5/2009)
1000 genomes project: 200 TB
Cost of 1 TB of disk: $35
Time to read 1 TB disk: 3 hrs
(100 MB/s)
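As a quick check of the read-time figure above, the sketch below redoes the arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant sequential rate of 100 MB/s; the function name is illustrative and not from the slides.

```python
def read_time_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to read size_bytes sequentially at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600


TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte

hours = read_time_hours(1 * TB, 100 * MB)
print(f"{hours:.2f} hours")  # about 2.78 hours, which the slide rounds to 3
```

The point of the slide survives the rounding: storage is cheap, but a single disk is slow to scan, so large data sets need parallel reads.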
Big Data
Big Data is any data that is expensive to manage
and hard to extract value from
Volume
The size of the data
Velocity
The latency of data processing relative to the
growing demand for interactivity
Variety and Complexity
The diversity of sources, formats, quality, and structures.
Types of Data We Have
Relational Data
(Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social Network, Semantic Web (RDF), …
Streaming Data
You can afford to scan the data once
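One way to read "scan the data once" is that any statistic has to be updated as each item arrives, without storing the stream. The sketch below uses Welford's online algorithm for the mean and variance; this is a standard single-pass technique chosen for illustration, not something the slides name, and the class and function names are hypothetical.

```python
from typing import Iterable


class RunningStats:
    """Single-pass mean and variance via Welford's online algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for x in stream:  # each element is seen exactly once and never stored
        stats.update(x)
    return stats


# Toy values, invented for illustration only.
stats = summarize(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(stats.n, stats.mean, stats.variance)
```

Memory use stays constant regardless of stream length, which is the property that matters when the data cannot be revisited.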
What To Do With These Data?
Aggregation and Statistics
Data warehousing and OLAP
Indexing, Searching, and Querying
Keyword-based search
Pattern matching (XML/RDF)
Knowledge discovery
Data Mining
Statistical Modeling
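To make the "Indexing, Searching, and Querying" item concrete, here is a minimal sketch of keyword-based search over an inverted index. It assumes Python 3.9 or later and whitespace tokenization; the documents and the function names `build_index` and `search` are invented for illustration.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased word to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Toy documents, invented for illustration only.
docs = {
    1: "data warehousing and OLAP",
    2: "keyword based search over text data",
    3: "statistical modeling of streaming data",
}
index = build_index(docs)
print(sorted(search(index, "data")))            # [1, 2, 3]
print(sorted(search(index, "streaming data")))  # [3]
```

Building the index costs one pass over the text; after that, a query touches only the posting sets for its words instead of rescanning every document.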
Big Data and Data Science
“… the sexy job in the next 10 years will be
statisticians,” Hal Varian, Google Chief Economist
The U.S. will need 140,000-190,000 predictive
analysts and 1.5 million managers/analysts by 2018
(McKinsey Global Institute, June 2011).
New Data Science institutes being created or
repurposed – NYU, Columbia, Washington, UCB,...
New degree programs, courses, boot-camps:
e.g., at Berkeley: Stats, I-School, CS, Astronomy…
One proposal (elsewhere) for an MS in “Big Data Science”
What is Data Science?
An area that manages, manipulates,
extracts, and interprets knowledge from
tremendous amounts of data
Data science (DS) is a multidisciplinary
field of study with the goal of addressing the
challenges in big data
Data science principles apply to all data –
big and small
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
What is Data Science?
Theories and techniques from many fields and
disciplines are used to investigate and analyze a
large amount of data to help decision makers in
many industries such as science, engineering,
economics, politics, finance, and education
Computer Science
Pattern recognition, visualization, data warehousing,
high-performance computing, databases, AI
Mathematics
Mathematical Modeling
Statistics
Statistical and Stochastic modeling, Probability.
Why is it sexy?
Gartner’s 2014 Hype Cycle
Real Life Examples
Companies learn your secrets, shopping
patterns, and preferences
For example, can we know if a woman is
pregnant, even if she doesn’t want us to
know? Target case study
Data science and elections (2008, 2012)
1 million people installed the Obama
Facebook app that gave access to info on
“friends”
Data Scientists
Data Scientist
The Sexiest Job of the 21st Century
They find stories, extract knowledge. They
are not reporters
Data Scientists
Data scientists are the key to realizing the
opportunities presented by big data. They
bring structure to it, find compelling
patterns in it, and advise executives on the
implications for products, processes, and
decisions
What do Data Scientists do?
National Security
Cyber Security
Business Analytics
Engineering
Healthcare
And more…
Concentration in Data Science
Mathematics and Applied Mathematics
Applied Statistics/Data Analysis
Solid Programming Skills (R, Python, Julia, SQL)
Data Mining
Database Storage and Management
Machine Learning and discovery




