2. Introduction
• Pandas is a Python library used for working
with data sets.
• It has functions for analyzing, cleaning,
exploring, and manipulating data.
• The name "Pandas" has a reference to both
"Panel Data", and "Python Data Analysis" and
was created by Wes McKinney in 2008.
3. What is Pandas?
• Pandas is a Python library.
• Pandas is used to analyze data.
5. Series
• Series is a one-dimensional array like structure
with homogeneous data. For example, the
following series is a collection of integers 10,
23, 56, …
Key Points
• Homogeneous data
• Size Immutable
• Values of Data Mutable
10 23 56 17 52 61 73 90 26 72
6. DataFrame
• DataFrame is a two-dimensional array with
heterogeneous data.
Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable
Name Age Gender Rating
Steve 32 Male 3.45
Lia 28 Female 4.6
Vin 45 Male 3.9
Katie 38 Female 2.78
7. Panel
• Panel is a three-dimensional data structure with
heterogeneous data. It is hard to represent the
panel in graphical representation. But a panel
can be illustrated as a container of DataFrame.
Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable
9. Pandas.Series
• A pandas Series can be created using the
following constructor −
pandas.Series( data, index, dtype, copy)
A series can be created using various inputs like
• Array
• Dict
• Scalar value or constant
10. Create a Series from ndarray
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print s
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print s
11. Create a Series from dict
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print s
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print s
12. Create a Series from Scalar
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print s
13. Accessing Data from Series with Position
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
#retrieve the first element
print s[0]
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
#retrieve the first three element
print s[:3]
14. Retrieve Data Using Label (Index)
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
#retrieve a single element
print s['a']
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
#retrieve multiple elements
print s[['a','c','d']]
16. pandas.DataFrame
• A pandas DataFrame can be created using the
following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)
• A pandas DataFrame can be created using various
inputs like −
– Lists
– dict
– Series
– Numpy ndarrays
– Another DataFrame
17. Create a DataFrame from Lists
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
Print(df)
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
Print(df)
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df =
pd.DataFrame(data,columns=['Name','Age'],dtype=float)
print(df)
18. Create a DataFrame from Dict of ndarrays / Lists
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
Print(df)
20. Create a DataFrame from Dict of Series
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
Print(df)
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
df['three']=df['one']+df['two']
print (df ['three'])
Print(df[one])
21. Column Deletion
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print ("Our dataframe is:")
print df
# using del function
print ("Deleting the first column using DEL function:")
del df['one']
print df
# using pop function
print ("Deleting another column using POP function:")
df.pop('two')
print df
22. Row Selection, Addition, and Deletion
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print df.loc['b']
Output:
one 2.0
two 2.0
Name: b, dtype: float64
23. Slice
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])
one two
c 3.0 3
d NaN 4
24. Read CSV Files
• A simple way to store big data sets is to use
CSV files (comma separated files).
• CSV files contains plain text and is a well know
format that can be read by everyone including
Pandas.
• import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())