INTRODUCTION TO
DATA SCIENCE
❖ Data Science is about data gathering, analysis and decision making.
❖ Data Science is about finding patterns in data through
analysis and using them to make predictions about the future.
❖ By using Data Science, companies are able to make:
⮚ Better decisions (should we choose A or B)
⮚ Predictive analysis (what will happen next?)
⮚ Pattern discoveries (find patterns or hidden information in the
data)
DATA SCIENCE
⮚ Statistics
⮚ Domain Expertise
⮚ Data Engineering
⮚ Visualization
⮚ Machine Learning
⮚ Advanced Computing
DATA SCIENCE COMPONENTS
❖ Statistics is an essential component of Data Science.
❖ It provides methods to collect and analyze large amounts of numerical
data to obtain useful and meaningful insights.
❖ There are two main categories of Statistics:
⮚ Descriptive Statistics (summarizing the data you have)
⮚ Inferential Statistics (drawing conclusions about a population from a sample)
STATISTICS
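The two categories of Statistics can be illustrated with a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical, and the confidence interval uses a normal approximation chosen for brevity (a t-distribution would be more accurate for so small a sample).

```python
import math
import statistics

# Hypothetical sample: daily order counts observed over ten days.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

# Descriptive statistics summarize the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics generalize from the sample to a wider population.
# Here: an approximate 95% confidence interval for the population mean,
# based on a normal approximation (an assumption made for simplicity).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * stdev / math.sqrt(len(sample))
interval = (mean - margin, mean + margin)

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
print(f"approx. 95% CI for the mean: ({interval[0]:.2f}, {interval[1]:.2f})")
```

The first three values describe only the ten observed days; the interval is a statement about the unobserved long-run average.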
❖ Visualization means representing data visually, for example as maps
and graphs, so that people can understand it easily.
❖ It makes a vast amount of data easier to take in at a glance.
❖ The main goal of data visualization is to make it easier to identify
patterns, trends, and outliers in large data sets.
VISUALIZATION
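As a rough illustration of how a visual makes an outlier easier to spot, the sketch below draws a text bar chart using only the Python standard library. The monthly figures and the `bar_chart` helper are hypothetical; in practice a plotting library would normally be used instead.

```python
# Hypothetical monthly figures; one month is deliberately unusual.
monthly_sales = {"Jan": 12, "Feb": 14, "Mar": 13, "Apr": 31, "May": 15, "Jun": 14}


def bar_chart(data):
    """Return one text bar per entry; the bar length equals the value."""
    return [f"{label:<4}{'#' * value} {value}" for label, value in data.items()]


for line in bar_chart(monthly_sales):
    print(line)
```

In the printed chart the April bar is visibly longer than the rest, which is harder to notice when reading the raw numbers.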
❖ Machine Learning acts as a backbone for data science. It means
training a machine on data so that it learns to perform a task, loosely
in the way a human learns from experience.
❖ Various algorithms are used to solve different problems. With the help
of Machine Learning, it becomes easier to make predictions about
future data.
For example:
❖ Social media platforms such as Facebook
MACHINE LEARNING
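One of the simplest learning algorithms is a straight-line fit. The sketch below, assuming hypothetical monthly sales figures, learns a slope and intercept from past data by ordinary least squares and uses them to predict the next month. It is only one of many possible algorithms, chosen because it fits in a few lines.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: sales observed in months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)

# Prediction for future data: month 6 has not been observed yet.
prediction = slope * 6 + intercept
print(f"predicted sales for month 6: {prediction:.1f}")
```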
❖ Domain expertise means specialized knowledge or skills in a
particular area.
❖ Data science is applied in many areas, and each needs its own domain
experts.
❖ The less we know about the problem, the more difficult it will be to
solve it.
DOMAIN EXPERTISE
❖ Data Engineering involves acquiring, storing, retrieving, and
transforming the data.
❖ The key to understanding data engineering lies in the engineering
part.
❖ Data engineers design and build pipelines that transform and
transport data so that it reaches the Data Scientists or
other end users in a highly usable format.
DATA ENGINEERING
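A data pipeline of the kind described here can be sketched as three small stages: acquire, transform, and store. The record layout, the stage names `extract`, `transform` and `load`, and the in-memory `warehouse` list are all assumptions made for illustration.

```python
# Hypothetical raw records as they might arrive from a source system.
RAW_EVENTS = [
    {"user": " Ana ", "amount": "19.99", "currency": "usd"},
    {"user": "raj", "amount": "", "currency": "EUR"},  # no amount recorded
    {"user": "Li", "amount": "5", "currency": "eur"},
]


def extract(source):
    """Acquire: yield raw records one at a time from the source."""
    yield from source


def transform(records):
    """Transform: skip records without an amount, then normalize types and casing."""
    for record in records:
        if not record["amount"].strip():
            continue
        yield {
            "user": record["user"].strip().title(),
            "amount": float(record["amount"]),
            "currency": record["currency"].upper(),
        }


def load(records, store):
    """Store: append to an in-memory list (a real pipeline would write to a database or file)."""
    store.extend(records)
    return store


warehouse = load(transform(extract(RAW_EVENTS)), [])
for row in warehouse:
    print(row)
```

Because each stage is a generator feeding the next, records flow through one at a time, which is one common way to keep memory use low on large inputs.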
❖ Advanced computing involves designing, writing, debugging, and
maintaining the source code of computer programs.
❖ Advanced computing capabilities are used to handle a growing
range of challenging science and engineering problems, many of
which are compute- and data-intensive.
ADVANCED COMPUTING
FACETS OF DATA
❖ Data science is focused on making sense of complex datasets and
on building predictive models from those data.
❖ As such, it encompasses a wide array of different activities, from the
upstream processes of acquiring, cleaning and integrating data to
downstream processes of analysis, modelling and prediction.
FACETS OF DATA
⮚ Structured
⮚ Unstructured
⮚ Natural Language
⮚ Machine-generated
⮚ Graph-based
⮚ Audio, video and images
⮚ Streaming
THE MAIN CATEGORIES OF
DATA
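To make some of these categories concrete, the sketch below shows what each might look like as a Python value. All names and values are hypothetical, and audio, video and images are omitted because they are binary formats.

```python
# Structured: every record has the same named fields.
structured = [
    {"id": 1, "name": "Ana", "age": 34},
    {"id": 2, "name": "Raj", "age": 51},
]

# Unstructured / natural language: free text with no fixed schema.
unstructured = "Thanks for the quick delivery, but the box arrived damaged."

# Machine-generated: produced automatically, for example a server log line.
machine_generated = "2024-01-01T12:00:00Z GET /index.html 200"

# Graph-based: entities and the relationships between them.
graph = {"Ana": ["Raj"], "Raj": ["Ana", "Li"], "Li": ["Raj"]}


def stream():
    """Streaming: values arrive one at a time rather than as a complete batch."""
    for reading in (21.5, 21.7, 22.0):
        yield reading


# Structured data can be queried by field name directly.
average_age = sum(row["age"] for row in structured) / len(structured)

# Graph data answers relationship questions.
friends_of_raj = graph["Raj"]

# A stream is consumed as it arrives; here we keep only the latest reading.
latest = None
for latest in stream():
    pass
```

The point of the distinction is practical: each category calls for different tools, from field lookups on structured records to text processing on natural language.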
There are many facets of data science, including:
⮚ Identifying the structure of data.
⮚ Cleaning, filtering, reorganizing, augmenting, and aggregating data.
⮚ Visualizing data.
⮚ Data analysis, statistics, and modelling.
⮚ Machine Learning.
⮚ Assembling data processing pipelines to link these steps.
⮚ Leveraging high-end computational resources for large-scale
problems.
FACETS OF DATA
DATA SCIENCE
PROCESS
⮚ Step 1: Frame the problem.
⮚ Step 2: Collect the raw data needed for your problem.
⮚ Step 3: Process the data for analysis.
⮚ Step 4: Explore the data.
⮚ Step 5: Perform in-depth analysis.
⮚ Step 6: Communicate results of the analysis.
DATA SCIENCE PROCESS
❖ The first thing you have to do before you solve a problem is to
define exactly what it is.
❖ You need to be able to translate data questions into something
actionable.
You should ask questions like the following:
❖ Who are the customers?
❖ Why are they buying our product?
❖ How do we predict if a customer is going to buy our product?
Frame the problem
❖ Once you have defined the problem, you need data to give you the
insights needed to work toward a solution.
❖ This part of the process involves thinking through what data you
need and finding ways to get that data, whether by querying internal
databases or purchasing external datasets.
❖ For example, you can export CRM data to a CSV file for further analysis.
Collect the raw data needed
for your problem
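Exporting records to CSV and reading them back can be sketched with Python's standard `csv` module. The CRM records and field names below are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical CRM records, standing in for the result of an internal database query.
crm_records = [
    {"customer_id": 1, "age_group": "18-34", "purchases": 3},
    {"customer_id": 2, "age_group": "55+", "purchases": 0},
]


def export_csv(records):
    """Write the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = export_csv(crm_records)

# Reading the export back for analysis. Note that CSV carries no type
# information, so every value comes back as a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows)
```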
❖ Now that you have all of the raw data, you’ll need to process it
before you can do any analysis.
❖ Oftentimes, data can be quite messy, especially if it hasn’t been
well-maintained.
❖ You may see errors that would corrupt your analysis, such as values set
to null though they really are zero, duplicate values, and missing values.
❖ It’s up to you to go through and check your data to make sure you’ll
get accurate insights.
Process the data for analysis
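The three kinds of error mentioned above can each be handled in a few lines. This is a minimal sketch on hypothetical rows; the rule that a null purchase count means zero, and the choice to drop rows with a missing age group, are assumptions that would need to be confirmed with someone who knows the data.

```python
# Hypothetical raw rows. None marks a null value.
raw_rows = [
    {"customer_id": 1, "age_group": "18-34", "purchases": 3},
    {"customer_id": 1, "age_group": "18-34", "purchases": 3},  # exact duplicate
    {"customer_id": 2, "age_group": "55+", "purchases": None},  # null that really means zero
    {"customer_id": 3, "age_group": None, "purchases": 2},  # genuinely missing value
]


def clean(rows):
    """Drop exact duplicates, treat null purchases as zero, drop rows missing age_group."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["customer_id"], row["age_group"], row["purchases"])
        if key in seen:
            continue  # duplicate row already kept once
        seen.add(key)
        if row["age_group"] is None:
            continue  # missing value that cannot be recovered here
        purchases = 0 if row["purchases"] is None else row["purchases"]
        cleaned.append({**row, "purchases": purchases})
    return cleaned


clean_rows = clean(raw_rows)
for row in clean_rows:
    print(row)
```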
❖ When your data is clean, you should start playing with it.
❖ The difficulty here isn’t coming up with ideas to test; it’s coming up
with ideas that are likely to turn into insights.
❖ For example, look for the patterns that best help explain why sales
are lower for a particular group.
Explore the data
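A common first exploration is to compare a measure across groups. The sketch below, using hypothetical cleaned rows and a hypothetical `average_by_group` helper, computes the average number of purchases per age group and picks out the group with the lowest average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cleaned rows.
rows = [
    {"age_group": "18-34", "purchases": 3},
    {"age_group": "18-34", "purchases": 5},
    {"age_group": "35-54", "purchases": 4},
    {"age_group": "55+", "purchases": 1},
    {"age_group": "55+", "purchases": 0},
]


def average_by_group(rows, group_key, value_key):
    """Return the mean of value_key for each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


averages = average_by_group(rows, "age_group", "purchases")

# The group with the lowest average is a candidate for closer investigation.
lowest = min(averages, key=averages.get)
print(averages, lowest)
```

A gap between groups is only a starting point; it suggests where to look, not why the gap exists.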
❖ This step of the process is where you will have to apply your
statistical, mathematical and technological knowledge and use
all of the data science tools at your disposal to crunch the data and
find every insight.
❖ You can now combine all of those qualitative insights with data from
your quantitative analysis to craft a story that moves people to
action.
Perform in-depth analysis
❖ It’s important that the VP of Sales understands why the insights you
uncovered matter.
❖ Ultimately, you’ve been called upon to create a solution through
the data science process.
❖ Proper communication will mean the difference
between action and inaction on your proposals.
❖ For example, you might start by explaining the reasons behind the
underperformance of the older demographic.
Communicate results of the
analysis