www.edureka.co/data-science
Edureka’s Data Science Certification Training
What is Data Science
What Will You Learn Today?
• What is Data Science
• Need of Data Science
• Use case of Data Science
• Business Intelligence vs. Data Science
• Tools used in Data Science
• Lifecycle of Data Science
Need Of Data Science
[Diagram: Revolution of Technology; Unstructured Data; Data Storage; Data Flow; Lack of scientific insights; Lack of predictive analytics; Data Science (Prediction, Decision making, Pattern discovery)]
Need Of Data Science
[Figure: THEN vs. NOW]
Need Of Data Science
You can use Data Science to:
• Recommend the right product to the right customer to enhance business.
• Predict the characteristics of high lifetime value (LTV) customers and support customer segmentation.
• Build intelligence and ability in machines.
• Predict fraudulent transactions before they happen.
• Perform sentiment analysis to predict the outcome of elections.
What Is Data Science?
• Data Science is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data.
• Data Science is primarily used to make decisions and predictions.
Now, let's understand Data Science with the help of some use cases.
What Is Data Science?
• Basketball teams use data to track team strategies and the outcomes of matches.
• The following parameters are used for model building:
  ◦ Average pass time of the ball.
  ◦ Number of successful passes.
  ◦ Speed and accuracy of successful baskets.
  ◦ Area of the court that the player shadows on average.
• Models built with data science algorithms help discover patterns in how players play.
What Is Data Science?
• Amazon has a huge amount of consumer purchasing data.
• The data consists of consumer demographics (age, sex, location), purchasing history, and past browsing history.
• Based on this data, Amazon segments its customers, identifies patterns, and recommends the right product to the right customer at the right time.
What Is Data Science?
Google's self-driving car is a smart, driverless car.
• It collects data from its environment through sensors.
• It makes decisions such as when to speed up, when to slow down, when to overtake, and when to turn.
Use Cases Of Data Science
Skills Of Data Scientist
Role Of A Data Scientist
The Data Scientist is responsible for designing and creating processes and layouts for complex, large-scale data sets used for modeling, data mining, and research purposes.
Responsibilities
• Selecting features, and building and optimizing classifiers using machine learning techniques.
• Data mining using state-of-the-art methods.
• Extending the company’s data with third-party sources of information when needed.
• Processing, cleansing, and verifying the integrity of data for analysis.
• Building predictive models using machine learning algorithms.
BI Vs. Data Science
Characteristics | Business Intelligence | Data Science
Perspective | Looking backward | Looking forward
Data Sources | Structured (usually SQL, often a data warehouse) | Both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
Approach | Statistics and visualization | Statistics, machine learning, graph analysis, natural language processing (NLP)
Focus | Past and present | Present and future
Tools | Pentaho, Microsoft BI, QlikView, R | RapidMiner, BigML, Weka, R
Tools Used In Data Science
Commonly used tools by Data Scientists:
Data analysis: R, Spark, Python, SAS
Data warehousing: Hadoop, SQL, Hive
Data visualization: R, Tableau, Raw
Machine learning: Spark, Mahout, Azure ML Studio
Lifecycle Of Data Science
What if we could predict the occurrence of diabetes and take appropriate measures beforehand to prevent it?
Definitely! Let me take you through the steps to predict which patients are vulnerable.
Lifecycle Of Data Science
The six stages:
1. Discovery
2. Data Preparation
3. Model Planning
4. Model Building
5. Operationalize
6. Communicate Results
Discovery
• Discovery involves acquiring data from all the identified internal and external sources that can help answer the business question.
• This data could be:
  ◦ logs from web servers
  ◦ social media data
  ◦ census datasets
  ◦ data streamed from online sources via APIs
The doctor gets this data from the patient’s medical history.
Attributes:
npreg – Number of times pregnant
glucose – Plasma glucose concentration
bp – Blood pressure
skin – Triceps skinfold thickness
bmi – Body mass index
ped – Diabetes pedigree function
age – Age
income – Income
Income is an irrelevant attribute for predicting diabetes.
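As a rough illustration of the point about income, the sketch below holds one patient record in a plain Python dictionary and drops the attribute judged irrelevant. Only the attribute names come from the slide; the sample values and the helper name `drop_irrelevant` are made up for illustration.

```python
# Attribute names follow the slide; the values are invented sample data.
patient = {
    "npreg": 2, "glucose": 130, "bp": 70, "skin": 25,
    "bmi": 31.5, "ped": 0.45, "age": 34, "income": 52000,
}

# Hypothetical helper: keep every attribute except those judged irrelevant.
def drop_irrelevant(record, irrelevant=("income",)):
    return {name: value for name, value in record.items() if name not in irrelevant}

features = drop_irrelevant(patient)
print(sorted(features))
```

The original record is left unchanged, so the dropped attribute can still be inspected later if the judgment about its relevance is revisited.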
Data Preparation
• The data can have many inconsistencies, such as missing values, blank columns, abrupt values, and incorrect data formats, which need to be cleaned.
• You need to explore, preprocess, and condition the data prior to modeling.
• This will help you spot outliers and establish relationships between the variables.
This data has a lot of anomalies and needs cleansing before further analysis can be done.
We clean and preprocess this data by removing the outliers, filling in the null values, and normalizing the data types.
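The three cleaning steps can be sketched in a few lines of standard-library Python. This is only one possible approach under stated assumptions: the sample rows are invented, the `max_age` cut-off for outliers is an arbitrary choice, and filling a missing glucose value with the median is one common option rather than the method used in the slides.

```python
from statistics import median

# Invented sample rows; None marks a missing value, and bmi arrives as text.
rows = [
    {"glucose": 148, "bmi": "33.6", "age": 50},
    {"glucose": None, "bmi": "26.6", "age": 31},
    {"glucose": 183, "bmi": "23.3", "age": 32},
    {"glucose": 89, "bmi": "28.1", "age": 210},  # implausible age
]

def clean(rows, max_age=120):
    # Normalize data types: bmi should be a float, not a string.
    typed = [{**r, "bmi": float(r["bmi"])} for r in rows]
    # Remove outliers: drop rows whose age is outside a plausible range
    # (max_age is an assumed cut-off, not a rule from the slides).
    kept = [r for r in typed if 0 <= r["age"] <= max_age]
    # Fill null values: use the median of the known glucose readings.
    known = [r["glucose"] for r in kept if r["glucose"] is not None]
    fill = median(known)
    return [{**r, "glucose": fill if r["glucose"] is None else r["glucose"]} for r in kept]

cleaned = clean(rows)
print(cleaned)
```

Outliers are removed before the median is computed so that an implausible row does not influence the fill value.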
Model Planning
• Here, we determine the methods and techniques for drawing out the relationships between variables.
• Apply Exploratory Data Analysis (EDA) using various statistical formulas and visualization tools.
Use visualization techniques such as histograms, line graphs, and box plots to get a fair idea of the distribution of the data.
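To make the idea of looking at a distribution concrete, here is a small sketch that prints summary statistics and a text histogram using only the Python standard library. The glucose readings and the bin width are invented for illustration; in practice a plotting tool would normally draw the chart.

```python
from collections import Counter
from statistics import mean, median, stdev

# Invented glucose readings, used only to illustrate the idea.
glucose = [89, 95, 103, 110, 118, 126, 137, 148, 166, 183]
width = 25  # assumed bin width

# Summary statistics give a first feel for centre and spread.
print("mean:", mean(glucose), "median:", median(glucose), "stdev:", round(stdev(glucose), 1))

def histogram(values, bin_width):
    # Count how many values fall into each bin, keyed by the bin's lower edge.
    return Counter((v // bin_width) * bin_width for v in values)

counts = histogram(glucose, width)
for low in sorted(counts):
    # One '#' per reading in the bin.
    print(f"{low:>3}-{low + width - 1:<3} {'#' * counts[low]}")
```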
Model Building
• Develop datasets for training and testing purposes.
• Consider whether existing tools will suffice for running the models.
• Analyze various learning techniques, such as classification, association, and clustering, to build the model.
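Developing training and testing datasets usually means holding back part of the data so the model can be checked on records it has not seen. The sketch below shows one simple way to do that; the function name `train_test_split`, the 25% test fraction, and the fixed seed are assumptions chosen for illustration.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=42):
    # Shuffle a copy so the caller's list is left untouched,
    # then cut it into a training part and a held-out testing part.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(20))  # stand-ins for 20 patient records
train_set, test_set = train_test_split(records)
print(len(train_set), len(test_set))
```

Fixing the seed makes the split repeatable, which helps when comparing different learning techniques on the same data.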
[Figure: a decision tree based on the different attributes.]
Operationalize
• Deliver final reports, briefings, code, and technical documents.
• Implement a pilot project in a real-time production environment.
• Look for performance constraints, if any.
Communicate Results
• Identify all the key findings and communicate them to the stakeholders.
• Explain the model and results to the medical authorities.
• Determine whether the results of the project are a success or a failure based on the criteria developed.
• Diabetes Positive set:
  ◦ glucose > 154
  ◦ 127 < glucose <= 154 and bmi > 30.9
  ◦ glucose <= 127 and pregnant > 5
  ◦ glucose <= 127 and pregnant <= 5 and age > 28
  ◦ glucose <= 127 and pregnant <= 5 and age <= 28 and bmi > 30.9
• Diabetes Negative set:
  ◦ 127 < glucose <= 154 and bmi <= 30.9
  ◦ glucose <= 127 and pregnant <= 5 and age <= 28 and bmi <= 30.9
• We can use this decision tree result to tell whether a patient is vulnerable to diabetes.
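The rule sets above can be written as a small function. This is a sketch under two assumptions: conditions joined in a single rule are combined with logical AND, and a glucose value above 154 always falls in the positive set. The function name `is_diabetes_positive` is hypothetical, and the thresholds are copied from the slide rather than clinically validated.

```python
def is_diabetes_positive(glucose, bmi, pregnant, age):
    # Each branch mirrors one rule from the positive and negative sets.
    if glucose > 154:
        return True
    if glucose > 127:          # 127 < glucose <= 154
        return bmi > 30.9
    if pregnant > 5:           # glucose <= 127 from here on
        return True
    if age > 28:               # pregnant <= 5
        return True
    return bmi > 30.9          # pregnant <= 5 and age <= 28

print(is_diabetes_positive(glucose=140, bmi=32.0, pregnant=0, age=20))
print(is_diabetes_positive(glucose=120, bmi=25.0, pregnant=2, age=25))
```

Writing the tree as nested checks keeps every patient in exactly one of the two sets, which is a quick way to confirm that the listed rules do not overlap.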
Course Details
Go to www.edureka.co/data-science
Get Edureka Certified in Data Science Today!
What our learners have to say about us!
Shravan Reddy says- “I would like to recommend any one who
wants to be a Data Scientist just one place: Edureka. Explanations
are clean, clear, easy to understand. Their support team works
very well.. I took the Data Science course and I'm going to take
Machine Learning with Mahout and then Big Data and Hadoop”.
Gnana Sekhar says - “Edureka Data science course provided me a very
good mixture of theoretical and practical training. LMS pre recorded
sessions and assignments were very good as there is a lot of
information in them that will help me in my job. Edureka is my
teaching GURU now...Thanks EDUREKA.”
Balu Samaga says - “It was a great experience to undergo and get
certified in the Data Science course from Edureka. Quality of the
training materials, assignments, project, support and other
infrastructures are a top notch.”
What Is Data Science? Data Science Course - Data Science Tutorial For Beginners | Edureka
