Softwares & Tools for Data Analytics
Lecture 6: Introduction to Data Visualisation
Myint Moe Chit
Lecture Outline
• Goals of visualisation
• Usefulness of visualisation
• Basic data visualisation with R and Python
• Visualising categorical variables
• Visualising numerical variables
• Visualising the relationship between variables
Visualisation in Data Analytics
"We cannot expect a small number of numerical values [summary statistics] to consistently convey the wealth of information that exists in data. Numerical reduction methods do not retain the information in the data."
(William Cleveland, The Elements of Graphing Data)
"The simple graph has brought more information to the data analyst's mind than any other device."
(John Tukey)
The use of graphics to examine data is called visualisation.
Visualisation in Data Analytics
An important step in the data science methodology is obtaining a visual
representation of the data.
This has multiple advantages:
• We are better at extracting information from visual cues, so a visual
representation is usually more intuitive than a textual representation.
• A visualisation provides a concise snapshot and summarisation of the
data.
The goal of data visualisation is to convey a story to the viewer. This story
could be in the form of general trends about the data or an insight.
A picture is worth a thousand words
This visualisation summarises the relationship between BMI and pulse and the corresponding health status. What do you discover?
“The greatest value of a picture is when it forces us to notice what we never expected to see.”
What Makes a Good Visualisation?
The McCandless Method
Four elements to achieve success in data visualisation:
1. Information: the data you are working with must be accurate
2. Story: a clear, compelling, interesting, and relevant concept
3. Goal: a specific objective or function for the visual
4. Visual form: an effective use of metaphor or visual expression
Source: https://www.informationisbeautiful.net/visualizations/what-makes-a-good-data-visualization/
Data Visualisation with R
• R includes a basic graphing function, plot(), but we will use the ggplot2 package.
> install.packages("ggplot2")
> library(ggplot2)
> ggplot(data = dta, aes(sex)) + geom_bar(fill = "blue")
The three main components of a ggplot command are:
• Data: dta is the dataset being summarised. We refer to this as the data component.
• Aesthetic mapping: The plot uses several visual cues to represent the information provided by the dataset. aes(sex) maps the sex variable from the dataset to the x-axis. We refer to this as the aesthetic mapping component.
• Geometry: geom_bar indicates that the plot is a bar graph. This is referred to as the geometry component.
To use ggplot2 you will have to learn several functions and arguments. These are hard to memorise, so we highly recommend you keep the ggplot2 cheat sheet handy.
Summarising a continuous numerical variable
• The first step in summarising a continuous numerical variable is to identify its distribution using a histogram or boxplot.
• A histogram groups the values into intervals (bins) and reveals the overall shape of the frequencies across those bins.
• Suppose we want to visualise the distribution of the weight of the respondents in our sample.
• A bar graph has little explanatory power here because the variable is continuous: almost every distinct value gets its own bar.
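The contrast between a bar graph and a histogram can be checked numerically. The sketch below is only an illustration: the simulated values and the name `weight` are assumptions, not the lecture's dataset. It compares how many bars a bar graph would need with the ten bins of a histogram.

```python
import numpy as np

# Hypothetical continuous measurements (kg); not the lecture's data.
rng = np.random.default_rng(0)
weight = rng.normal(loc=70, scale=12, size=200).round(1)

# A bar graph draws one bar per distinct value.
n_distinct = len(np.unique(weight))

# A histogram groups the values into a fixed number of bins instead.
counts, edges = np.histogram(weight, bins=10)

print("distinct values (bars needed):", n_distinct)
print("histogram bin counts:", counts.tolist())
```

With continuous data the number of distinct values is close to the sample size, so the bar graph shows mostly bars of height one, while the ten bin counts trace the shape of the distribution.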
Data Visualisation with R
To summarise a numerical variable with a histogram:
> ggplot(dta, aes(x = height)) +
    geom_histogram(bins = 10, fill = "blue")
We can add more arguments, for example to overlay a density curve:
> ggplot(dta, aes(x = height, y = ..density..)) +
    geom_histogram(bins = 10, fill = "blue") +
    geom_density(color = "red", size = 1.2)
(From ggplot2 3.4.0 onwards, after_stat(density) and linewidth are the preferred replacements for ..density.. and size.)
Using a boxplot to summarise a numerical variable:
> ggplot(dta, aes(weight)) + geom_boxplot() +
    coord_flip()
Note: For a variable with a right-skewed distribution and non-negative values (such as income or number of employees), we may need to use a logarithmic scale for the histogram or boxplot.
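The effect of a logarithmic scale on a right-skewed variable can be sketched as follows. The simulated `income` values and the helper `skewness` are assumptions introduced for illustration only; they are not part of the lecture's dataset or code.

```python
import numpy as np

# Hypothetical right-skewed, strictly positive incomes; not the lecture's data.
rng = np.random.default_rng(1)
income = rng.lognormal(mean=10, sigma=1, size=1000)

def skewness(x):
    """Sample skewness: the mean of the cubed standardised values."""
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

raw_skew = skewness(income)
# The logarithm needs values above zero; np.log1p is a common
# alternative when the data contain zeros.
log_skew = skewness(np.log(income))

print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

A skewness near zero after the transformation suggests that a histogram or boxplot on the log scale will spread the observations more evenly instead of compressing most of them into the first few bins.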
Data Visualisation with R
Use a stacked or clustered bar graph to visualise the association between two categorical variables.
> ggplot(dta, aes(sex, fill = status)) +
    geom_bar(position = "stack")    # stacked counts
> ggplot(dta, aes(sex, fill = status)) +
    geom_bar(position = "dodge")    # clustered (side-by-side) bars
> ggplot(dta, aes(sex, fill = status)) +
    geom_bar(position = "fill")     # stacked proportions, each bar scaled to 1
Use a multiple boxplot to visualise the association between a categorical and a numerical variable.
> ggplot(dta, aes(status, bmi)) +
    geom_boxplot() + coord_flip()
Use a scatterplot to visualise the association between two numerical variables.
> ggplot(dta, aes(bmi, pulse)) + geom_point() +
    stat_smooth(method = "lm")
Data Visualisation with Python
We can use the pandas, matplotlib and seaborn packages for visualisation in Python.
In pandas, append the plot accessor and a suitable graph type to a Series or DataFrame, for example plot.bar() for a bar chart of the counts of a categorical variable.
We can pass additional arguments, such as the colour of the bars or the font size, inside the brackets.
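A bar chart for a categorical variable might look like the sketch below. The small `dta` data frame is a made-up stand-in for the lecture's dataset, and the Agg backend is chosen only so that the script runs without a display; neither comes from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Hypothetical stand-in for the lecture's `dta` data frame.
dta = pd.DataFrame({"sex": ["F", "M", "F", "F", "M", "F"]})

# Count each category, then draw one bar per category.
counts = dta["sex"].value_counts()
ax = counts.plot.bar(color="blue", fontsize=12)
ax.set_xlabel("sex")
ax.set_ylabel("count")

print(counts)
```

Counting first with `value_counts()` matters here: calling `plot.bar()` directly on a column of labels would fail because there is nothing numeric to draw.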
Data Visualisation with Python
To summarise a numerical variable, use plot.hist():
dta["weight"].plot.hist()
To add a density line, use the seaborn package (imported as sns):
sns.histplot(dta["weight"], bins=12, color='k', kde=True)
Using a boxplot to summarise a numerical variable:
dta["weight"].plot.box()
dta.boxplot(column=["weight"])
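To make clear what a boxplot summarises, the sketch below computes the same quantities by hand for a few made-up weights (the values are assumptions, not the lecture's data). It uses the 1.5 × IQR rule, which is matplotlib's default for deciding where the whiskers end and which points are drawn as outliers.

```python
import numpy as np

# Hypothetical weights (kg) with one unusually large value.
weight = np.array([50, 55, 60, 62, 65, 68, 70, 72, 75, 120])

# The box spans the first and third quartiles; the line inside is the median.
q1, median, q3 = np.percentile(weight, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are plotted individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = weight[(weight < lower_fence) | (weight > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers.tolist())
```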
Data Visualisation with Python
To visualise the relationship between two categorical variables, append plot.bar() to the cross-tabulation of the two variables.
pd.crosstab(dta.sex, dta.status).plot.bar()                # clustered bars
pd.crosstab(dta.sex, dta.status).plot.bar(stacked=True)    # stacked counts
pd.crosstab(dta.sex, dta.status, normalize="index").plot.bar(stacked=True)    # stacked proportions
Use a multiple boxplot to visualise the association between a categorical and a numerical variable.
dta.boxplot(column=["pulse"], by="exercise", showmeans=True)
Use seaborn's regplot to visualise the association between two numerical variables with a trend line.
sns.regplot(x="BMI", y="pulse", data=dta)
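The difference between the plain cross-tabulation and the one with normalize="index" is easiest to see in the tables themselves. The sketch below uses a tiny made-up `dta` (the rows are assumptions, not the lecture's dataset) and prints both tables before any plotting.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's `dta` data frame.
dta = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "M", "M", "M"],
    "status": ["good", "poor", "good", "good", "poor", "poor", "poor"],
})

# Raw counts: one row per sex, one column per status.
counts = pd.crosstab(dta.sex, dta.status)

# normalize="index" divides each row by its total, so every row sums to 1.
shares = pd.crosstab(dta.sex, dta.status, normalize="index")

print(counts)
print(shares.round(2))
```

Plotting `shares` with `plot.bar(stacked=True)` therefore gives bars of equal height, which makes the groups comparable even when their sizes differ.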
Scatter plot with a categorical variable
We can also add a third dimension to a scatter plot by giving the dots a different colour for each group of a categorical variable. For example, we can assign a different colour to observations with a different health status.
In Python (seaborn):
sns.scatterplot(x="BMI", y="pulse", data=dta, hue="status")
In R (ggplot2):
ggplot(dta, aes(bmi, pulse, colour = status)) +
    geom_point(size = 3)
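What the hue and colour arguments do can be sketched with matplotlib alone: the data are split by the categorical variable and each group is drawn as its own set of points. The data frame below is made up for illustration, and the Agg backend is an assumption so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the lecture's `dta` data frame.
dta = pd.DataFrame({
    "bmi": [21.0, 24.5, 29.8, 31.2, 22.3, 27.9],
    "pulse": [62, 70, 84, 90, 66, 78],
    "status": ["good", "good", "poor", "poor", "good", "poor"],
})

fig, ax = plt.subplots()
# One scatter call per group; matplotlib assigns each call a new colour.
for status, group in dta.groupby("status"):
    ax.scatter(group["bmi"], group["pulse"], label=status)

ax.set_xlabel("bmi")
ax.set_ylabel("pulse")
ax.legend(title="status")
```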
Summary of the lecture
In this section, we covered:
• the goals of visualisation
• usefulness of visualisation
• how to visualise categorical variables
• how to visualise numerical variables
• visualising the relationship between variables
Thank you.
Any questions?
