Introduction to Data Science
Prepared by
S.L.Swarna AP/AI&DS
S.Santhiya AP/AI&DS
EXCEL ENGINEERING COLLEGE
Data All Around
• Data, Big Data and Challenges
• Data Science
– Introduction
– Why Data Science
• Data Scientists
– What do they do?
• Major/Concentration in Data Science
– What courses to take.
Data All Around
• Lots of data is being collected and warehoused
– Web data, e-commerce
– Financial transactions, bank/credit transactions
– Online trading and purchasing
– Social Network
How Much Data Do We Have?
• Google processes 20 PB a day (2008)
• Facebook has 60 TB of daily logs
• eBay has 6.5 PB of user data + 50 TB/day (5/2009)
• 1000 genomes project: 200 TB
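To get a feel for these volumes, a daily figure can be converted into a per-second rate. The sketch below is only illustrative arithmetic: it assumes decimal (SI) units, which the slide does not specify, and the function name is hypothetical.

```python
# Assumption: decimal (SI) units, i.e. 1 PB = 1000 TB and 1 TB = 1000 GB.
TB_PER_PB = 1000
GB_PER_TB = 1000
SECONDS_PER_DAY = 24 * 60 * 60


def pb_per_day_to_gb_per_second(pb_per_day: float) -> float:
    """Convert a volume in petabytes per day to gigabytes per second."""
    return pb_per_day * TB_PER_PB * GB_PER_TB / SECONDS_PER_DAY


# 20 PB a day is the Google figure quoted on the slide (2008).
google_rate = pb_per_day_to_gb_per_second(20)
print(f"20 PB/day is roughly {google_rate:.0f} GB per second")
```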
Types of Data We Have
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can afford to scan the data only once
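The single-scan constraint on streaming data shapes the algorithms that can be used. As a minimal sketch, the example below computes a mean and variance in one pass using Welford's update; the choice of this technique, the function names and the sample readings are assumptions made for illustration and do not come from the slides.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after a single pass."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def sensor_readings() -> Iterator[float]:
    """Hypothetical stream: a generator can be consumed only once."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


count, mean, variance = running_stats(sensor_readings())
print(count, mean, variance)
```

Because each value is used once and then discarded, memory use stays constant no matter how long the stream runs.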
What is Data Science?
• Data Science is about data gathering, analysis and decision-making.
• Data Science is about finding patterns in data through analysis and using them to make predictions about the future.
• By using Data Science, companies are able to make:
– Better decisions (should we choose A or B?)
– Predictive analyses (what will happen next?)
– Pattern discoveries (finding patterns, or perhaps hidden information, in the data)
Where is Data Science Needed?
Examples of where Data Science is needed:
• Route planning: to discover the best shipping routes
• To foresee delays for flights, ships, trains, etc. (through predictive analysis)
• To create promotional offers
• To find the best time to deliver goods
• To forecast a company's revenue for the next year
• To analyze the health benefits of training
• To predict who will win elections
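One of the examples above, forecasting next year's revenue, can be sketched with a straight-line trend fitted by least squares. This is only one simple way to forecast, not a method the slides prescribe; the revenue figures are made up for illustration and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly revenue in millions, for illustration only.
years = [2019, 2020, 2021, 2022, 2023]
revenue = [10.0, 11.5, 13.1, 14.2, 16.0]

slope, intercept = fit_line(years, revenue)
forecast_2024 = slope * 2024 + intercept
print(f"Forecast for 2024: {forecast_2024:.2f} million")
```

A linear trend assumes growth continues at the same pace; real forecasts usually need more data and a check of how well the model fits.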
How Does a Data Scientist Work?
• A Data Scientist requires expertise in several areas:
– Machine Learning
– Statistics
– Programming (Python or R)
– Mathematics
– Databases
• A Data Scientist must find patterns within the data. Before finding the patterns, he/she must organize the data in a standard format.
Here is how a Data Scientist works:
• Ask the right questions - To understand the business problem.
• Explore and collect data - From databases, web logs, customer feedback, etc.
• Extract the data - Transform the data to a standardized format.
• Clean the data - Remove erroneous values from the data.
• Find and replace missing values - Check for missing values and replace them with a suitable value (e.g. an average value).
• Normalize data - Scale the values to a practical range (e.g. 140 cm is smaller than 1.8 m, yet the number 140 is larger than 1.8, so bringing values to a common scale is important).
• Analyze data, find patterns and make future predictions.
• Represent the result - Present the result with useful insights in a way the "company" can understand.
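The data preparation steps above (standardize, clean, fill in missing values, normalize) can be sketched in a few lines of plain Python. The records, the rule that non-positive heights are erroneous, and all function names are assumptions chosen for illustration; min-max scaling is just one common way to normalize.

```python
from typing import List, Optional, Tuple

# Hypothetical raw records as (value, unit); None marks a missing measurement.
raw_heights: List[Tuple[Optional[float], str]] = [
    (140.0, "cm"), (1.8, "m"), (None, "cm"), (-5.0, "cm"), (165.0, "cm"),
]


def to_cm(value: Optional[float], unit: str) -> Optional[float]:
    """Extract step: express every height in centimetres."""
    if value is None:
        return None
    return value * 100.0 if unit == "m" else value


def drop_erroneous(values: List[Optional[float]]) -> List[Optional[float]]:
    """Clean step: discard impossible heights (assumed here: <= 0)."""
    return [v for v in values if v is None or v > 0]


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Replace missing values with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Normalize step: rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:  # avoid division by zero when all values are equal
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


standardized = [to_cm(value, unit) for value, unit in raw_heights]
cleaned = drop_erroneous(standardized)
filled = fill_missing(cleaned)
normalized = min_max_scale(filled)
print(normalized)
```

Converting units before scaling matters: once both heights are in centimetres, 1.8 m correctly ranks above 140 cm.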