Pandas: Library of Python
Table of contents
01 Introduction
• Pandas Getting Started
• Pandas Series
• Pandas DataFrames
• Pandas Read CSV
• Pandas Read JSON
• Pandas Analyzing Data
02 Data Cleaning
• Clean Data
• Clean Empty Cells
• Clean Wrong Format
• Clean Wrong Data
• Remove Duplicates
03 Correlations
04 Plotting
01 Introduction
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
Pandas Getting Started
Installation of Pandas:
C:\Users\Your Name>pip install pandas
Import Pandas:
import pandas
Import with alias:
import pandas as pd
Checking Pandas Version:
pd.__version__
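As a minimal sketch of the steps above, assuming pandas has already been installed with pip, the import and version check might look like this. The variable name `version` is only illustrative.

```python
import pandas as pd  # "pd" is the conventional alias

# __version__ is a plain string; its value depends on the installed release.
version = pd.__version__
print("pandas version:", version)
```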
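Pandas Series
● A Pandas Series is like a column in a table.
● It is a one-dimensional array holding data of any type.
● Each value is identified by a label; together the labels form the index.
● Labels can be replaced by assigning a new index; an existing Index object itself cannot be modified in place.
● If no labels are given, the values are labeled with integers starting at 0.
● Labels are used to access a specific value.
● Syntax: pd.Series()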
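The points above can be sketched as follows. The values and label names are made up for illustration and are not from the slides.

```python
import pandas as pd

# Without explicit labels, values get the default integer labels 0, 1, 2.
calories = pd.Series([420, 380, 390])
first = calories[0]            # value stored under label 0

# With an explicit index, values are accessed by their labels.
labeled = pd.Series([420, 380, 390], index=["day1", "day2", "day3"])
second = labeled["day2"]

# Relabel the Series by assigning a whole new index.
labeled.index = ["mon", "tue", "wed"]
print(labeled)
```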
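Pandas DataFrames
• A Pandas DataFrame is a table with rows and columns.
• It is a two-dimensional data structure.
• Rows and columns are both identified by labels (the row index and the column names).
• Labels can be changed by assigning a new index or new column names, or with rename().
• Syntax: pd.DataFrame()
• Pandas uses the loc attribute to return one or more specified rows by label.
• Use a single label, df.loc[0], for a single row; the result is a Series.
• Use a list of labels, df.loc[[0, 1]], for multiple rows; the result is a DataFrame.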
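A short sketch of row selection with loc, using a small made-up table. The column names `calories` and `duration` are assumptions for illustration.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

# A single label returns one row as a Series.
row = df.loc[0]

# A list of labels returns the selected rows as a DataFrame.
rows = df.loc[[0, 1]]

print(row)
print(rows)
```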
File Formats in Pandas
CSV
• Comma separated
• Rows and columns
• Simple data
• Easier to read
JSON
• Plain text
• Nested data
• Complex data
• Readable (but more complex to read)
Others
• SQL
• HTML
• Pickle
• Stata
Uses of CSV Files in Data Analysis:
• Collection
Store and maintain collected data in a simple, organized format.
• Sharing
Share datasets with others, as CSV files are widely compatible.
• Preprocessing
Easily read data for cleaning and preparation in Pandas.
• Export
Export analyzed or processed data from a DataFrame back to a CSV file for reporting or for use in other applications.
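Reading and exporting CSV data might look like the sketch below. The CSV text is hypothetical and is read from an in-memory buffer so the example is self-contained; with a real file, a path such as "data.csv" (an assumed name) would be passed to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,score\nAda,90\nLin,85\n"

# Preprocessing: load the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Export: with no path given, to_csv returns the CSV text as a string.
# index=False leaves the row labels out of the output.
exported = df.to_csv(index=False)
print(exported)
```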
Uses of JSON Files
• Collection
Store and maintain structured or nested data, commonly used in web applications, APIs, and databases.
• Sharing
Share complex data structures between applications, as JSON is compatible with many web and mobile platforms.
• Preprocessing
Easily read JSON data into Pandas for cleaning and organizing. JSON's structure supports hierarchical data, which can be processed into a structured format.
• Export
Export analyzed or processed data from Pandas back to JSON, maintaining the hierarchical format for use in APIs, databases, or further applications.
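The JSON uses above can be sketched in the same way. The records are invented for illustration; pd.json_normalize is one way to flatten nested data into columns, shown here as an option rather than the method the slides had in mind.

```python
import io
import pandas as pd

# Hypothetical JSON text standing in for a file or an API response.
json_text = '[{"name": "Ada", "score": 90}, {"name": "Lin", "score": 85}]'
df = pd.read_json(io.StringIO(json_text))

# Nested records can be flattened; nested keys become dotted column names.
nested = [{"name": "Ada", "address": {"city": "Paris"}}]
flat = pd.json_normalize(nested)

# Export back to JSON, one object per row.
records_json = df.to_json(orient="records")
print(records_json)
```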
02 Data Cleaning
Data cleaning in pandas involves preparing and correcting data by handling missing values, fixing data types, and removing inconsistencies for accurate analysis.
Why is data cleaning important?
• Ensures data accuracy and reliability
for analysis
• Improves data quality, which
enhances the validity of insights
• Prepares data for accurate modeling
and machine learning
• Reduces errors that could lead to
incorrect conclusions
Types of data cleaning
1. Fixing Structural Errors: Correcting inconsistencies in
data format and values.
2. Managing Unwanted Outliers: Handling extreme
values that skew the data.
3. Handling Missing Data: Addressing or filling in missing
values.
4. Removal of Unwanted Observations: Eliminating
irrelevant or incorrect data points.
Data Cleaning Workflow in Pandas
1. Import Libraries: Import pandas and load data.
2. Inspect Data: Use .head() and .info() to review data structure.
3. Handle Missing Values: Use dropna() or fillna() for missing data.
4. Fix Columns with Wrong Format: Use str.replace() to clean unwanted characters, then convert the cleaned column to a numeric type.
5. Save Cleaned Data: Export cleaned data to a new file.
This workflow helps prepare data for analysis and ensures data quality.
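The five steps can be sketched end to end as follows. The data, the column names `item` and `price`, and the choice to drop rather than fill the missing price are all assumptions made for illustration.

```python
import io
import pandas as pd

# 1. Import pandas and load data (hypothetical CSV text instead of a file).
raw = "item,price\npen,$1.50\nbook,\nbag,$20.00\n"
df = pd.read_csv(io.StringIO(raw))

# 2. Inspect the data.
print(df.head())
df.info()

# 3. Handle missing values: drop rows that have no price.
df = df.dropna(subset=["price"])

# 4. Fix the wrong format: strip the "$" sign, then convert to a numeric type.
cleaned_price = df["price"].str.replace("$", "", regex=False)
df = df.assign(price=pd.to_numeric(cleaned_price))

# 5. Save the cleaned data (returned as text here; pass a path to write a file).
cleaned_csv = df.to_csv(index=False)
print(cleaned_csv)
```

Passing regex=False treats "$" as a literal character instead of a regular-expression anchor.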
Data cleaning & removing duplicates
• Ensures data is accurate, consistent, and ready for analysis.
• Duplicated or wrong data can lead to incorrect results.
• Pandas provides methods for removing duplicates and cleaning data.
Syntax
1. Removing Missing Data
• Remove rows with missing values: Drop rows that contain any null values, using dropna().
• Remove rows with missing values in specific columns: Drop rows where certain columns have null values, using the subset argument of dropna().
• Fill missing values: Replace null with a specified value (e.g., 0) or with the mean of the column, using fillna().
• Forward fill: Propagate the last valid value forward, using ffill().
• Backward fill: Use the next valid value to fill missing values, using bfill().
2. Removing Duplicates
• Remove duplicate rows: Drop rows where all values are the same as another row, using drop_duplicates().
• Remove duplicates based on specific columns: Drop rows that have duplicate values in one or more selected columns, using the subset argument.
• Keep the first or last occurrence: Retain either the first or last duplicate entry and remove the others, using the keep argument.
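A compact sketch of the operations listed above, on a small made-up DataFrame. The column names and values are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Ada", "Lin", "Sam"],
    "score": [90.0, 90.0, None, 70.0],
})

# Missing data
no_nulls = df.dropna()                                   # drop rows with any null
no_null_scores = df.dropna(subset=["score"])             # only check "score"
filled_zero = df.fillna({"score": 0})                    # fill with a fixed value
filled_mean = df.fillna({"score": df["score"].mean()})   # fill with the column mean
forward = df.ffill()                                     # carry last valid value forward
backward = df.bfill()                                    # pull next valid value backward

# Duplicates
unique_rows = df.drop_duplicates()                       # full-row duplicates
unique_names_last = df.drop_duplicates(subset=["name"], keep="last")
```

Each call returns a new DataFrame and leaves `df` unchanged, so the results can be compared side by side.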
03 Correlation
Correlation is a statistical measure that shows how
two variables move in relation to each other. It helps
identify associations between variables and is widely
used in finance, data science, and machine learning
for feature selection, pattern recognition, and
understanding variable relationships.
Types of Correlation
• Positive: When one variable increases, the other variable tends to increase as well. For example, height and weight often have a positive correlation.
• Negative: When one variable increases, the other variable tends to decrease. For example, an increase in the price of a product might negatively correlate with sales quantity.
• Neutral: If there is no identifiable pattern between the two variables, they are uncorrelated.
Measuring Correlation Coefficients
• Correlation Coefficient (r): Measures the direction and strength of a relationship, ranging from -1 to +1.
• +1: Perfect positive correlation
• -1: Perfect negative correlation
• 0: No linear correlation
Types of Correlation Coefficients in Pandas:
• Pearson: Measures linear relationships (default).
• Spearman: Measures rank correlation, useful for monotonic relationships that are not linear.
• Kendall: Another rank-based method.
Calculating Correlation in Pandas
• corr() method: Computes the pairwise correlation between all numerical columns in a DataFrame.
• Specifying the method type: Pass the method argument to choose the correlation type ("pearson", "spearman", or "kendall").
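A minimal sketch of corr() on invented data follows. The column names are assumptions. method="kendall" is accepted in the same way but is left out of the runnable part because it may need SciPy to be installed.

```python
import pandas as pd

df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 65, 71, 80],
    "errors": [9, 7, 6, 4, 1],
})

# Pearson is the default method; the result is a square table
# with one row and one column per numeric column.
pearson = df.corr()

# Spearman works on ranks, so any strictly increasing or decreasing
# relationship gives +1 or -1.
spearman = df.corr(method="spearman")

print(pearson)
print(spearman)
```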
04 Pandas Plotting
• Pandas plotting is built on top of Matplotlib, which gives Pandas built-in support for easy data visualization.
• Its purpose is to visualize trends, distributions, and relationships in data for analysis.
• Syntax for general plotting: .plot()
• The plot type is chosen with the "kind" argument, while "x" and "y" select the columns to plot. Scatter plots require both "x" and "y".
Types of Pandas Plotting
• Line: Ideal for time series data. Syntax: df.plot.line()
• Bar: Suitable for categorical data. Syntax: df.plot.bar()
• Histogram: Displays distribution of numerical data. Syntax: df.plot.hist()
• Scatter: Used to show the relationship between two variables. Syntax: df.plot.scatter(x, y)
• Box: Highlights data distribution, central values, and outliers. Syntax: df.plot.box()
• Customize: Set color, labels, titles, font, style, and grid lines.
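The plotting calls above might be used as in the sketch below, assuming Matplotlib is installed alongside pandas. The data, column names, title, and color are made up for illustration, and the "Agg" backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line for on-screen plots
import pandas as pd

df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 58, 65, 71, 80],
})

# General form: choose the plot type with "kind" and the columns with "x" and "y".
ax_scatter = df.plot(kind="scatter", x="hours", y="score")

# Accessor form with some customization: title, color, and grid lines.
ax_line = df.plot.line(x="hours", y="score", title="Score by hours",
                       color="green", grid=True)
ax_line.set_xlabel("Hours studied")
ax_line.set_ylabel("Score")

# Histogram of a single numeric column.
ax_hist = df[["score"]].plot.hist(bins=5)
```

Each call returns a Matplotlib Axes object, so further styling can be done with ordinary Matplotlib methods, and `ax_line.figure.savefig(...)` could write the figure to a file.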

