Chapter-4
Data Visualization
By Prof. Sangeeta Borde
Visualization:
• Definition: A graphical representation of data that makes information easier to analyze and
understand.
• Advantages:
1. Easier to analyze
2. Easier to detect trends, patterns, and outliers
Exploratory Data Analysis (EDA):
Exploratory Data Analysis (EDA) is a crucial initial step in data science projects.
It involves analyzing and visualising a dataset to understand its key characteristics,
uncover patterns, locate outliers, and identify relationships between variables.
EDA is normally carried out as a preliminary step before more formal
statistical analysis or modelling.
The following methods are involved in EDA:
1. Univariate analysis: summary statistics and visualizations for each
field in the raw dataset.
Univariate analysis focuses on a single variable to understand its
internal structure. It is primarily concerned with describing the data and
finding patterns existing in a single feature.
• Common techniques include:
• Histograms: Used to visualize the distribution of a variable.
• Box plots: Useful for detecting outliers and understanding the spread
and skewness of the data.
• Bar charts: Employed for categorical data to show the frequency of
each category.
• Summary statistics: Calculations like mean, median, mode, variance,
and standard deviation that describe the central tendency and
dispersion of the data.
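The summary statistics listed above can be sketched with Python's standard `statistics` module. The `scores` list is a hypothetical sample made up for illustration, and the variance and standard deviation shown are the sample (n - 1) versions, which is an assumption about which definition is wanted.

```python
import statistics

# Hypothetical sample: values of a single variable (illustrative only)
scores = [32, 45, 45, 60, 67, 70, 76, 81, 95]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "variance": statistics.variance(scores),   # sample variance (n - 1)
    "std_dev": statistics.stdev(scores),       # sample standard deviation
}

# Print each statistic rounded to two decimal places
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The mean, median, and mode describe central tendency, while the variance and standard deviation describe dispersion.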
2. Bivariate analysis:
• Bivariate analysis explores the relationship between two variables. It helps find
associations, correlations, and dependencies between pairs of variables.
• Scatter Plots: These are one of the most common tools used in bivariate analysis. A
scatter plot helps visualize the relationship between two continuous variables.
• Correlation Coefficient: This statistical measure (often Pearson’s correlation coefficient
for linear relationships) quantifies the degree to which two variables are related.
• Cross-tabulation: Also known as contingency tables, cross-tabulation is used to analyze
the relationship between two categorical variables. It shows the frequency distribution of
categories of one variable in rows and the other in columns, which helps in understanding
the relationship between the two variables.
• Line Graphs: In the context of time series data, line graphs can be used to compare two
variables over time. This helps in identifying trends, cycles, or patterns that emerge in the
interaction of the variables over the specified period.
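• Covariance: Covariance measures how two random variables change together. A positive
value means they tend to move in the same direction; a negative value means they tend to
move in opposite directions.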
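A minimal sketch of three of the bivariate measures above (covariance, Pearson's correlation coefficient, and cross-tabulation), written with the standard library only. The helper names `covariance` and `pearson` and all data values are hypothetical and chosen for illustration; in practice a library such as pandas would normally be used.

```python
from collections import Counter
from math import sqrt

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical paired observations of two continuous variables
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 70, 72]

print("covariance:", covariance(hours, marks))
print("pearson r:", round(pearson(hours, marks), 3))

# Cross-tabulation of two categorical variables as (row, column) counts
gender = ["F", "M", "F", "M", "F"]
result = ["pass", "pass", "fail", "pass", "pass"]
crosstab = Counter(zip(gender, result))
for (row, column), count in sorted(crosstab.items()):
    print(row, column, count)
```

Covariance depends on the units of the variables, whereas the correlation coefficient is scaled to lie between -1 and 1, which makes it easier to compare across pairs of variables.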
3. Multivariate analysis
• Multivariate analysis examines the relationships among
more than two variables in the dataset. It aims to understand
how variables interact with one another, which is crucial for
most statistical modeling techniques. Techniques include:
• Pair plots: Visualize relationships across several variables
simultaneously to capture a comprehensive view of
potential interactions.
• Principal Component Analysis (PCA): A dimensionality
reduction technique used to reduce the dimensionality of
large datasets, while preserving as much variance as
possible.
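The idea behind PCA can be sketched with NumPy's singular value decomposition. This is only an illustration under stated assumptions: the matrix `X` is made-up data, and the SVD route is one common way to compute principal components, not necessarily the one a given library uses.

```python
import numpy as np

# Hypothetical dataset: 6 observations of 3 correlated variables
X = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 1.0],
])

# 1. Centre each variable on its mean
X_centered = X - X.mean(axis=0)

# 2. SVD of the centred data; rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# 3. Share of the total variance carried by each component
explained_ratio = S**2 / np.sum(S**2)

# 4. Project onto the first two components (3 variables -> 2)
X_reduced = X_centered @ Vt[:2].T

print("explained variance ratio:", np.round(explained_ratio, 3))
print("reduced shape:", X_reduced.shape)
```

The explained variance ratio indicates how much information is kept when only the leading components are retained.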
Key aspects of EDA include:
• Distribution of Data: Examine the distribution of data points to understand their range, central
tendencies (mean, median), and dispersion (variance, standard deviation).
• Graphical Representations: Utilizing charts such as histograms, box plots, scatter plots, and
bar charts to visualize relationships within the data and distributions of variables.
• Outlier Detection: Identifying unusual values that deviate from other data points. Outliers can
influence statistical analyses and may represent genuine unique cases or result from
measurement error, data entry error, sampling error, etc.
• Correlation Analysis: Checking the relationships between variables to understand how they
might affect each other. This includes computing correlation coefficients and creating
correlation matrices.
• Handling Missing Values: Detecting and deciding how to address missing data points, whether
by imputation or removal, depending on their impact and the amount of missing data.
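Two of the aspects above, handling missing values and outlier detection, can be sketched as follows. The data are hypothetical, median imputation is just one possible choice, and the 1.5 × IQR rule is a common convention assumed here rather than something the slides prescribe.

```python
import statistics

# Hypothetical column with one missing value (None) and one extreme value
raw = [12, 15, 14, None, 13, 16, 15, 98, 14]

# Handle missing values: here, impute with the median of the observed values
observed = [v for v in raw if v is not None]
median = statistics.median(observed)
filled = [median if v is None else v for v in raw]

# Flag outliers with the 1.5 * IQR rule (an assumed convention)
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in filled if v < low or v > high]

print("median used for imputation:", median)
print("bounds:", (low, high))
print("outliers:", outliers)
```

Whether a flagged value is removed, corrected, or kept depends on whether it is an error or a genuine unusual case.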
Data visualization can help in:
• Making business analysis easier
• Decision making, such as sales prediction, product promotion, and
customer behavior analysis
• Improving response time: insights are available at a quick glance
• Greater simplicity
Visual Encoding:
• Encoding in data visualization means translating data into visual elements
on a chart or map.
• The attribute values signify important data characteristics such as
numerical, categorical, or ordinal data.
• Choosing an appropriate chart for the data is a challenging task.
• Purpose of the visualization and the corresponding chart types:
• 1. Distribution: Scatter chart, 3D area chart, Histogram
• 2. Relationship: Bubble chart, Scatter plot
• 3. Comparison: Bar, Line, Column, Area
• 4. Composition: Pie, Waterfall chart, Stacked column chart
• 5. Location: Bubble map
• 6. Connection: Matrix chart, Word cloud, Tube map
• These charts are used to show the data in the dataset accurately.
• To represent data that involves three or more variables, additional visual channels are used:
1. Shape 2. Size 3. Color 4. Orientation
5. Texture 6. Length 7. Angle
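As a rough sketch of encoding more than two variables in one chart, a bubble chart can map two variables to position, a third to marker size, and a fourth to colour. The variable names and values here are hypothetical and chosen only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data: four variables per city (illustrative values only)
income = [20, 35, 50, 65, 80]            # x position
spending = [15, 30, 38, 52, 60]          # y position
population = [1.2, 3.5, 0.8, 5.0, 2.1]   # encoded as marker size
region = [0, 1, 0, 2, 1]                 # encoded as marker colour

fig, ax = plt.subplots()
bubbles = ax.scatter(
    income, spending,
    s=[p * 100 for p in population],     # size channel
    c=region, cmap="viridis",            # colour channel
    alpha=0.7,
)
ax.set_xlabel("Income")
ax.set_ylabel("Spending")
ax.set_title("Bubble chart: position, size and colour")
# plt.show()  # uncomment to display the figure interactively
```

Position is usually the easiest channel to read precisely, so the most important variables are typically placed on the axes.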
The visualization tool is chosen based on the type of data.
• The following software is used for data visualization:
• 1. Tableau: Database integration, email integration, dashboard creation
• 2. Looker: Business intelligence platform
• 3. QlikView: Personalized data search, role-based access
• 4. MS Excel
• 5. Domo: Dashboard creation
• 6. Power BI: Affordability, web publishing
• 7. Plotly: Image storage
Tools for Performing Exploratory Data Analysis
• 1. Python Libraries
• Pandas: Provides extensive functions for data manipulation and analysis, including data
structure handling and time series functionality.
• Matplotlib: A plotting library for creating static, interactive, and animated visualisations
in Python.
• Seaborn: Built on Matplotlib, it provides a high-level interface for drawing attractive and
informative statistical graphics.
• Plotly: An interactive graphing library that offers more
sophisticated visualization capabilities.
• 2. R Packages
• ggplot2: A powerful tool for making complex plots from data in a data frame.
• dplyr: A grammar of data manipulation, providing a consistent set of verbs that help you
solve the most common data manipulation challenges.
• tidyr: Helps to tidy your data. Tidying your data means storing it in a consistent form that
matches the semantics of the dataset with the way it is stored.
Libraries:
• Matplotlib is a data visualization and 2-D plotting library for Python. It was
initially released in 2003 and is one of the most widely used plotting libraries in
the Python community. It provides an interactive environment across multiple
platforms. Matplotlib can be used in Python scripts, the Python and IPython shells,
Jupyter Notebook, web application servers, etc.
• Plotly is a free, open-source graphing library for building data
visualizations. Plotly (plotly.py) is built on top of the Plotly JavaScript library
(plotly.js) and can be used to create web-based data visualizations that can be
displayed in Jupyter notebooks, embedded in web applications using Dash, or saved as
individual HTML files. Plotly provides more than 40 unique chart types, including scatter plots,
histograms, line charts, bar charts, pie charts, error bars, box plots, multiple axes,
sparklines, dendrograms, and 3-D charts.
• Seaborn is a Python data visualization library that is based on
Matplotlib and closely integrated with NumPy and pandas data
structures. Its dataset-oriented plotting functions operate on
data frames and arrays containing whole datasets, and internally
perform the statistical aggregation and mapping needed to
produce informative plots.
It provides a high-level interface for creating attractive and informative
statistical graphics that are integral to exploring and understanding
data. Seaborn graphics include bar charts, histograms,
scatter plots, plots with error bars, etc. Seaborn also has various
tools for choosing colour palettes that can reveal patterns in the data.
ggplot and geoplotlib
• ggplot is a Python data visualization library based on
ggplot2, which was created for the R programming
language. ggplot can create data visualizations such as bar charts,
pie charts, histograms, scatter plots, error charts, etc. using a high-level
API. It also allows you to add different types of data visualization
components or layers to a single visualization.
• geoplotlib: Most data visualization libraries provide little
support for creating maps or using geographical data, which is
where geoplotlib is useful. It is a Python library that supports the
creation of geographical maps, with many different types
of maps available, such as dot-density maps, choropleths, symbol
maps, etc.
Basic Data visualization Tools:
• 1.Histogram
• 2.Bar chart/Graphs
• 3.Line plot
• 4.Scatter plot
Histogram:
• A histogram is a graphical representation of a grouped frequency
distribution with continuous classes. It is an area diagram
made up of a series of rectangles whose bases equal the
distances between class boundaries and whose areas are
proportional to the frequencies of the corresponding
classes.
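The grouping step behind a histogram can be sketched without any plotting library. The data and the class width of 20 are assumptions made for illustration; with equal class widths, the height of each bar is proportional to the class frequency.

```python
# Hypothetical marks to be grouped into classes of width 20
data = [32, 96, 45, 67, 76, 28, 79, 62, 43, 81, 70, 61]
width = 20
edges = list(range(0, 101, width))      # class boundaries 0, 20, ..., 100

# Count how many values fall in each class [lower, upper)
counts = [0] * (len(edges) - 1)
for value in data:
    index = min(value // width, len(counts) - 1)   # a value of 100 goes in the last class
    counts[index] += 1

# Print a text version of the histogram, one row per class
for lower, upper, count in zip(edges, edges[1:], counts):
    print(f"{lower:3d}-{upper:3d}: {'#' * count} ({count})")
```

A plotting library performs the same counting internally before drawing the rectangles.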
Histogram:
import matplotlib.pyplot as plt
# create data
data = [32, 96, 45, 67, 76, 28, 79, 62, 43, 81, 70,
61, 95, 44, 60, 69, 71, 23, 69, 54, 76, 67,
82, 97, 26, 34, 18, 16, 59, 88, 29, 30, 66,
23, 65, 72, 20, 78, 49, 73, 62, 87, 37, 68,
81, 80, 77, 92, 81, 52, 43, 68, 71, 86]
# create histogram
plt.hist(data)
# display histogram
plt.show()
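import random
import matplotlib.pyplot as plt

# build a list of 50 random integers between 0 and 100
arr1 = []
for i in range(50):
    arr1.append(random.randint(0, 100))
print(arr1)

# draw the values as a line plot with a circular marker at each point
plt.plot(arr1, marker='o')
plt.show()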
Bar Chart/Graph:
• A bar graph or bar chart is a graph that represents categorical data
with rectangular bars, which may be horizontal or vertical. A bar chart
with vertical bars is also called a column chart. The length of each bar
is proportional to the value it represents.
Components of Bar Chart:
• Chart Title: The name of the bar chart. It states what the chart
represents.
• Grid Lines: The vertical and horizontal gray lines are called grid lines.
• Bars: Each bar corresponds to a value. Bars may be horizontal or vertical. The longest bar represents
the largest value.
• Axis Title: A bar graph has two axis titles, one vertical and one horizontal. The two axes are
related to each other, and axis titles make the chart easier to understand. Suppose the vertical axis
represents expenses. We can then write Expenses (in rupees) on the vertical axis. The expenses may
be of different types, so we can write Types of expenses on the horizontal axis.
• Labels: The horizontal axis can be divided into categories. For example, types of expenses can be
categorized into medical, transport, office, etc.
• Legends: A legend specifies what each bar represents. It is also known as the key of a chart.
For example, if we write 2019 in place of Series 1, the blue bars in the
graph represent the data for the year 2019.
• Scale: The scale represents the values on the vertical axis. It may be in rupees, population, size, etc.
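The components listed above can be sketched in Matplotlib as follows. The expense categories, amounts, and years are hypothetical values chosen to match the slide's running example; they are not real data.

```python
import matplotlib.pyplot as plt

# Hypothetical expenses (in rupees) for two years; values are illustrative
categories = ["Medical", "Transport", "Office"]      # labels
expenses_2019 = [4000, 2500, 6000]
expenses_2020 = [4500, 2000, 6500]

positions = range(len(categories))
bar_width = 0.4

fig, ax = plt.subplots()
ax.bar([p - bar_width / 2 for p in positions], expenses_2019,
       width=bar_width, label="2019")                # bars + legend entry
ax.bar([p + bar_width / 2 for p in positions], expenses_2020,
       width=bar_width, label="2020")

ax.set_title("Expenses by type")                     # chart title
ax.set_xlabel("Types of expenses")                   # horizontal axis title
ax.set_ylabel("Expenses (in rupees)")                # vertical axis title
ax.set_xticks(list(positions))
ax.set_xticklabels(categories)                       # category labels
ax.grid(axis="y", color="gray", alpha=0.4)           # grid lines
ax.legend()                                          # legend (key)
# plt.show()  # uncomment to display the figure interactively
```

The scale on the vertical axis is chosen automatically from the data unless it is set explicitly.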
Types of Bar Chart:
• Vertical
• Horizontal
• Stacked
• 3D Bar
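A minimal sketch of three of these types (vertical, horizontal, and stacked) in Matplotlib, using hypothetical quarterly sales figures. The 3D variant is left out here to keep the example short.

```python
import matplotlib.pyplot as plt

# Hypothetical quarterly sales for two products (illustrative values only)
quarters = ["Q1", "Q2", "Q3", "Q4"]
product_a = [10, 14, 9, 12]
product_b = [6, 8, 11, 7]

fig, (ax_vertical, ax_horizontal, ax_stacked) = plt.subplots(1, 3, figsize=(12, 4))

ax_vertical.bar(quarters, product_a)          # vertical (column) chart
ax_vertical.set_title("Vertical")

ax_horizontal.barh(quarters, product_a)       # horizontal bars
ax_horizontal.set_title("Horizontal")

# Stacked: draw product B on top of product A via the bottom parameter
ax_stacked.bar(quarters, product_a, label="Product A")
ax_stacked.bar(quarters, product_b, bottom=product_a, label="Product B")
ax_stacked.set_title("Stacked")
ax_stacked.legend()

fig.tight_layout()
# plt.show()  # uncomment to display the figure interactively
```

A stacked chart shows both the total per category and how each series contributes to it.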
Line Plot, Scatter Plot, Area Plot/Chart
• Python's Matplotlib module is used for data visualization. pyplot,
a submodule of Matplotlib, provides a set of functions for creating
many kinds of charts. A line plot shows the relationship between two
sets of data, X and Y, with each plotted on its own axis.
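The three chart types named in this slide can be sketched side by side with pyplot. The X and Y values are hypothetical and serve only to show the calls involved.

```python
import matplotlib.pyplot as plt

# Hypothetical X and Y values (illustrative only)
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 4, 8, 7, 10]

fig, (ax_line, ax_scatter, ax_area) = plt.subplots(1, 3, figsize=(12, 4))

ax_line.plot(x, y, marker="o")            # line plot: points joined in order
ax_line.set_title("Line plot")

ax_scatter.scatter(x, y)                  # scatter plot: points only
ax_scatter.set_title("Scatter plot")

ax_area.fill_between(x, y, alpha=0.5)     # area plot: region under the line
ax_area.set_title("Area plot")

# Label both axes on every panel
for ax in (ax_line, ax_scatter, ax_area):
    ax.set_xlabel("X")
    ax.set_ylabel("Y")

fig.tight_layout()
# plt.show()  # uncomment to display the figure interactively
```

A line plot suits ordered data such as time series, while a scatter plot makes no assumption about the order of the points.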
Specialized Data Visualization Tools:
• BOX PLOT
• BUBBLE PLOT
• HEAT MAP
• VENN DIAGRAM
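Two of these specialized plots, the box plot and the heat map, can be sketched with Matplotlib and NumPy. The group scores are hypothetical, and using a correlation matrix as the heat map input is an assumption made for illustration. A Venn diagram is not shown because it typically needs a third-party package.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical scores for three groups (illustrative values only)
groups = [
    [55, 60, 62, 65, 70, 72, 95],
    [40, 45, 50, 52, 58, 60, 61],
    [68, 70, 75, 78, 80, 85, 88],
]

fig, (ax_box, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Box plot: median, quartiles, whiskers and any points beyond the whiskers
box = ax_box.boxplot(groups)
ax_box.set_title("Box plot")

# Heat map: colour encodes the correlation between each pair of groups
corr = np.corrcoef(groups)                 # 3 x 3 correlation matrix
image = ax_heat.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(image, ax=ax_heat)
ax_heat.set_title("Heat map")
# plt.show()  # uncomment to display the figure interactively
```

The colour bar acts as the legend for the heat map, linking each colour to a numeric value.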
