Assignment 3
Perform Categorization of a Dataset
• In real-world datasets, text columns often contain
values that repeat many times.
• Features such as gender, country, and codes are
typically repetitive. These are examples of
categorical data.
• A categorical variable can take only a limited,
and usually fixed, number of possible values.
• Categorical data may also have a logical order (for
example, small < medium < large), but numerical
operations such as addition cannot be performed on it.
pandas provides Categorical as a dedicated data type.
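A minimal sketch of what these points mean in practice, assuming a small hypothetical `country` column: pandas stores each distinct value once in `.cat.categories`, keeps a compact integer code per row in `.cat.codes`, and refuses arithmetic on the column with a `TypeError`.

```python
import pandas as pd

# Hypothetical column with repetitive text values
country = pd.Series(["India", "France", "India", "Japan", "India"], dtype="category")

# Each distinct value is stored once; inferred categories are sorted
print(country.cat.categories.tolist())  # ['France', 'India', 'Japan']

# Each row holds a small integer code that points into the categories
print(country.cat.codes.tolist())  # [1, 0, 1, 2, 1]

# Numerical operations are not defined for categorical data
try:
    country + 1
    arithmetic_failed = False
except TypeError:
    arithmetic_failed = True
print(arithmetic_failed)  # True
```

Storing one small integer per row instead of a full string is what makes the categorical type compact for highly repetitive columns.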
Category Object Creation
import pandas as pd
s = pd.Series(["a", "b", "c", "a"], dtype="category")
print(s)
Output:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]
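In practice a categorical column is often created by converting an existing text column with `astype("category")` rather than built from scratch. A minimal sketch, assuming a hypothetical DataFrame with a `gender` column:

```python
import pandas as pd

# Hypothetical DataFrame with a repetitive text column
df = pd.DataFrame({"gender": ["F", "M", "M", "F", "M"]})

# Convert the existing column to the categorical dtype in place
df["gender"] = df["gender"].astype("category")

print(df["gender"].dtype)  # category
print(df["gender"].cat.categories.tolist())  # ['F', 'M']
print(df["gender"].cat.codes.tolist())  # [0, 1, 1, 0, 1]
```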
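Using pd.Categorical
• Syntax:
pandas.Categorical(values, categories=None, ordered=None)
• values: the data to encode; categories: the allowed
values (inferred from values if omitted); ordered: whether
the categories have a meaningful order.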
import pandas as pd
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
print(cat)
Output:
[a, b, c, a, b, c]
Categories (3, object): [a, b, c]
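The `ordered` parameter from the syntax above is not used in these slides. A minimal sketch, assuming hypothetical size labels, of how `ordered=True` makes comparisons and `min()`/`max()` follow the declared category order instead of alphabetical order:

```python
import pandas as pd

# ordered=True declares that the listed categories have a meaningful order
sizes = pd.Categorical(
    ["medium", "small", "large", "small"],
    categories=["small", "medium", "large"],
    ordered=True,
)

print(sizes.ordered)  # True
print(sizes.min(), sizes.max())  # small large

# Comparisons use the category order (small < medium < large),
# not alphabetical order (large < medium < small)
print((sizes > "small").tolist())  # [True, False, True, False]
```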
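import pandas as pd
cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], ['c', 'b', 'a'])
print(cat)
Output:
[a, b, c, a, b, c, NaN]
Categories (3, object): [c, b, a]
Here the second argument sets the categories explicitly, in the
order c, b, a. The value 'd' is not one of them, so it is stored as NaN.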
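Internally each value is stored as a position in the categories list, and a value outside the categories gets the code -1, which pandas displays as NaN. A minimal sketch that uses the codes to find which inputs were dropped (the `unmatched` name is only for illustration):

```python
import pandas as pd

values = ['a', 'b', 'c', 'a', 'b', 'c', 'd']
cat = pd.Categorical(values, categories=['c', 'b', 'a'])

# Codes index into ['c', 'b', 'a']; -1 marks a missing value
print(cat.codes.tolist())  # [2, 1, 0, 2, 1, 0, -1]

# Recover the original inputs that did not match any category
unmatched = [v for v, code in zip(values, cat.codes) if code == -1]
print(unmatched)  # ['d']
```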
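.describe() command on categorical data
import pandas as pd
import numpy as np
cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
df = pd.DataFrame({"cat": cat, "s": ["a", "c", "c", np.nan]})
print(df.describe())
print(df["cat"].describe())
Output of df.describe():
       cat  s
count    3  3
unique   2  2
top      c  c
freq     2  2
Output of df["cat"].describe():
count     3
unique    2
top       c
freq      2
Name: cat, dtype: object
count excludes the missing value, and unique counts only the
categories that actually occur (the unused category "b" is not counted).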
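Unlike `describe()`, `value_counts()` on a categorical column lists every declared category, including unused ones with a count of 0. A minimal sketch reusing the same data as the slide:

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
s = pd.Series(cat, name="cat")

# Every category appears, sorted by count; unused "b" shows 0
counts = s.value_counts()
print(counts.to_dict())  # {'c': 2, 'a': 1, 'b': 0}

# The missing value is excluded from count(), as in describe()
print(s.count(), s.isna().sum())  # 3 1
```

This makes `value_counts()` a convenient way to spot categories that were declared but never observed.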


Author: sonali sonavane
Data Preprocessing: Perform categorization of data
