Introduction To Pandas
By Dr. Sonali Sonavane
Introduction
• Pandas is a Python library used for working
with data sets.
• It has functions for analyzing, cleaning,
exploring, and manipulating data.
• The name "Pandas" refers to both "Panel
Data" and "Python Data Analysis". The library
was created by Wes McKinney in 2008.
What is Pandas?
• Pandas is a Python library.
• Pandas is used to analyze data.
Pandas Data Structure
• Series
• DataFrame
• Panel (deprecated in pandas 0.20 and removed in pandas 0.25)
Series
• Series is a one-dimensional, array-like structure
with homogeneous data. For example, the
following series is a collection of integers 10,
23, 56, …
Key Points
• Homogeneous data
• Size Immutable
• Values of Data Mutable
10 23 56 17 52 61 73 90 26 72
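The key points above can be checked with a short sketch. It uses the ten integers from the slide; the replacement value 11 is an arbitrary choice made only for illustration.

```python
import pandas as pd

# The integers shown on the slide
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

# Homogeneous data: the whole Series shares a single dtype
print(s.dtype)

# Values are mutable: overwrite the first element in place
s.iloc[0] = 11
print(s.iloc[0])

# The number of elements is unchanged by the assignment
print(len(s))
```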
DataFrame
• DataFrame is a two-dimensional array with
heterogeneous data.
Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable
Name   Age  Gender  Rating
Steve  32   Male    3.45
Lia    28   Female  4.6
Vin    45   Male    3.9
Katie  38   Female  2.78
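As a rough sketch, the table above can be built as a DataFrame to see the key points in action. The extra `Senior` column and its threshold of 40 are hypothetical, added only to show that the size can change.

```python
import pandas as pd

# The table from the slide, one dict key per column
df = pd.DataFrame({
    'Name': ['Steve', 'Lia', 'Vin', 'Katie'],
    'Age': [32, 28, 45, 38],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Rating': [3.45, 4.6, 3.9, 2.78],
})

# Heterogeneous data: each column carries its own dtype
print(df.dtypes)

# Size mutable: add a new (hypothetical) column derived from Age
df['Senior'] = df['Age'] > 40
print(df.shape)
```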
Panel
• Panel is a three-dimensional data structure with
heterogeneous data. It is hard to draw a panel
graphically, but it can be pictured as a container
of DataFrames. Panel was deprecated in pandas 0.20
and removed in pandas 0.25, so it is not available
in current releases.
Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable
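Since Panel is no longer part of pandas, one possible way to keep the "container of DataFrames" idea is to stack several DataFrames under an outer index level. This is only a sketch of that alternative, not something the slides prescribe; the frame names and values below are made up.

```python
import pandas as pd

# Two small DataFrames with made-up values, standing in for the items of a panel
item1 = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})
item2 = pd.DataFrame({'x': [5, 6], 'y': [7, 8]})

# Stack them; the dict keys become the outer level of a two-level row index
stacked = pd.concat({'Item1': item1, 'Item2': item2})
print(stacked)

# Select one of the stacked frames by its outer label
print(stacked.loc['Item2'])
```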
SERIES
pandas.Series
• A pandas Series can be created using the
following constructor:
pandas.Series(data, index, dtype, copy)
A Series can be created from various inputs, such as:
• Array
• Dict
• Scalar value or constant
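A minimal sketch of the constructor with its first three parameters spelled out as keywords. The values and labels are illustrative assumptions.

```python
import pandas as pd

# data: the values, index: the labels, dtype: the element type to store
s = pd.Series(data=[1, 2, 3], index=['x', 'y', 'z'], dtype='float64')
print(s)
```

The `copy` parameter controls whether the input data is copied; it is left at its default here.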
Create a Series from ndarray
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
# no index is given, so pandas labels the values 0, 1, 2, 3
s = pd.Series(data)
print(s)

import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
# an explicit index replaces the default labels
s = pd.Series(data,index=[100,101,102,103])
print(s)
Create a Series from dict
import pandas as pd
data = {'a' : 0., 'b' : 1., 'c' : 2.}
# the dict keys become the index labels
s = pd.Series(data)
print(s)

import pandas as pd
data = {'a' : 0., 'b' : 1., 'c' : 2.}
# values are pulled out in index order; 'd' is not in the dict, so it becomes NaN
s = pd.Series(data,index=['b','c','d','a'])
print(s)
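When the index contains a label that is missing from the dict, the result holds NaN at that label. The sketch below shows one way to detect and replace it; the fill value of -1.0 is an arbitrary placeholder.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])

# isna() marks the label that had no matching dict key
print(s.isna())

# fillna() returns a new Series with the gap replaced (placeholder value)
filled = s.fillna(-1.0)
print(filled)
```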
Create a Series from Scalar
import pandas as pd
# the scalar 5 is repeated once for every index label
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
Accessing Data from Series with Position
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve the first element by position
# (s[0] on a labelled Series is deprecated in recent pandas; iloc is explicit)
print(s.iloc[0])

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve the first three elements by position
print(s.iloc[:3])
Retrieve Data Using Label (Index)
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve a single element by label
print(s['a'])

import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve multiple elements by passing a list of labels
print(s[['a','c','d']])
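Position-based and label-based slicing differ in how they treat the end of the range, which the following sketch illustrates on the same Series. The missing label `'z'` and the default of 0 are assumptions for the example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Label slices include both endpoints
print(s.loc['b':'d'])

# Positional slices exclude the stop position
print(s.iloc[1:3])

# get() returns a default instead of raising KeyError for a missing label
print(s.get('z', 0))
```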
DataFrame
pandas.DataFrame
• A pandas DataFrame can be created using the
following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
• A pandas DataFrame can be created from various
inputs, such as:
– Lists
– dict
– Series
– NumPy ndarrays
– Another DataFrame
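Two of the listed inputs, a NumPy ndarray and another DataFrame, can be sketched as follows. The array values and column names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# A 2-D ndarray: each inner row becomes one DataFrame row
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['a', 'b'])

# Another DataFrame as input; copy=True requests an independent copy
df_copy = pd.DataFrame(df_from_array, copy=True)
df_copy.loc[0, 'a'] = 100

print(df_from_array)  # the source frame keeps its original value
print(df_copy)
```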
Create a DataFrame from Lists
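import pandas as pd
data = [1,2,3,4,5]
# a flat list becomes a single column labelled 0
df = pd.DataFrame(data)
print(df)

import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
# each inner list is one row; columns names the fields
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)

import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
# dtype=float cannot be applied to the string Name column in recent pandas,
# so convert only the Age column
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print(df)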
Create a DataFrame from Dict of ndarrays / Lists
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
# index supplies one row label per list element
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
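Create a DataFrame from List of Dicts
import pandas as pd
# each dict is one row; the dict keys become the column names
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)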
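In the list-of-dicts example, the first dict has no `'c'` key, so that cell is filled with NaN. A short sketch of checking this, and of using `columns` to keep only some keys:

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])

# 'first' has no 'c' key, so that cell is missing
print(df['c'].isna())

# columns selects which keys to keep (and their order)
subset = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
print(subset)
```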
Create a DataFrame from Dict of Series
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
# the row index is the union of both Series indexes; 'one' has no 'd', so it is NaN there
df = pd.DataFrame(d)
print(df)

import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# add a new column computed from two existing columns
df['three']=df['one']+df['two']
print(df['three'])
# select a single column by its name
print(df['one'])
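Column arithmetic is aligned on the row index, so a missing value in either operand yields NaN in the result. The sketch below shows this, along with `Series.add` and its `fill_value` argument as one way to treat missing values as 0. The column name `three_filled` is hypothetical.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Row 'd' has no value in 'one', so the sum is NaN there
df['three'] = df['one'] + df['two']

# add() with fill_value=0 treats the missing operand as 0 instead
df['three_filled'] = df['one'].add(df['two'], fill_value=0)
print(df)
```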
Column Deletion
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
# del removes the column in place
print("Deleting the first column using del:")
del df['one']
print(df)
# pop removes the column in place and returns it as a Series
print("Deleting another column using pop:")
df.pop('two')
print(df)
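Both `del` and `pop` modify the DataFrame in place. As a hedged alternative, `drop` returns a new DataFrame and leaves the original untouched; the small frame below uses made-up values.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

# drop() returns a new DataFrame; df itself still has all three columns
trimmed = df.drop(columns=['one'])
print(trimmed)

# pop() removes the column from df and hands it back as a Series
popped = df.pop('two')
print(popped)
print(df)
```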
Row Selection, Addition, and Deletion
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# loc selects a row by its label and returns it as a Series
print(df.loc['b'])
Output:
one 2.0
two 2.0
Name: b, dtype: float64
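The slide title also mentions addition and deletion of rows. A minimal sketch of all three operations follows, assuming two small frames with made-up values; `pd.concat` and `drop` are one common way to do this, not the only one.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# Selection by integer position
print(df.iloc[1])

# Addition: concatenate the rows of both frames and renumber the index
combined = pd.concat([df, df2], ignore_index=True)
print(combined)

# Deletion: drop the row labelled 0; a new DataFrame is returned
remaining = combined.drop(0)
print(remaining)
```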
Slice
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])
Output:
   one  two
c  3.0    3
d  NaN    4
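Row slices can also be written explicitly with `iloc` (positions) or `loc` (labels). The sketch below reuses the same DataFrame to show that a label slice includes its end label.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Positional slice: rows at positions 2 and 3
by_position = df.iloc[2:4]
print(by_position)

# Label slice: rows 'b' through 'c', end label included
by_label = df.loc['b':'c']
print(by_label)
```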
Read CSV Files
• A simple way to store big data sets is to use
CSV files (comma-separated values files).
• CSV files contain plain text in a well-known
format that can be read by almost any tool,
including pandas.
import pandas as pd
df = pd.read_csv('data.csv')
# to_string() renders the entire DataFrame rather than a truncated view
print(df.to_string())
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())     # first 5 rows by default
print(df.head(10))   # first 10 rows
print(df.tail())     # last 5 rows by default
df.info()            # prints its summary directly and returns None
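The examples above assume a file named data.csv exists. To try the same calls without a file, the sketch below reads CSV text from an in-memory buffer; the column names and rows are made up for illustration.

```python
import io
import pandas as pd

# Made-up CSV content standing in for data.csv
csv_text = "Name,Age\nTom,28\nJack,34\nSteve,29\nRicky,42\n"

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))  # first two rows
print(df.tail(1))  # last row
df.info()          # column names, non-null counts, and dtypes
```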
