Exploring Categorical Data

Last Updated : 25 Jul, 2022

Categorical Variable/Data (or Nominal variable): Such variables take on a fixed and limited number of possible values, for example grades, gender, or blood group type. The lexical order of category labels is not necessarily the same as their logical order: "one", "two", "three" sort alphabetically as "one", "three", "two". When an order is defined on the categories, sorting uses that logical order instead. Many categorical variables have no order at all. For example, gender is a categorical variable with the categories male and female, and there is no intrinsic ordering to these categories. A purely categorical variable is one that simply allows you to assign categories, but you cannot clearly order the categories.

Terms related to Variability Metrics :

Mode : Most frequently occurring value in the given data.

Example:

    Data = ["Car", "Bat", "Bat", "Car", "Bat", "Bat", "Bat", "Bike"]
    Mode = "Bat"

Expected Value : When working in machine learning, categories have to be associated with a numeric value so that the machine can process them. Given such values, the expected value is an average weighted by each category's probability of occurrence. It is calculated as follows:

1. Multiply each outcome by its probability of occurring.
2. Sum these values.

So it is the sum of values times their probability of occurrence, often used to sum up factor variable levels.

Bar Charts : Frequency of each category plotted as bars.
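Before plotting, the frequency counts, mode, and expected value described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it reuses the example data from the Mode definition, and the numeric values in `category_values` are hypothetical, chosen only so that the two-step calculation has something to work with.

```python
from collections import Counter

data = ["Car", "Bat", "Bat", "Car", "Bat", "Bat", "Bat", "Bike"]

# Frequency of each category (these counts are what a bar chart would plot).
counts = Counter(data)

# Mode: the category with the highest count.
mode = counts.most_common(1)[0][0]

# Probability of occurrence of each category.
total = len(data)
probabilities = {category: n / total for category, n in counts.items()}

# Hypothetical numeric value attached to each category (an assumption for
# illustration only; the article does not specify any such values).
category_values = {"Car": 4.0, "Bat": 1.0, "Bike": 2.0}

# Expected value: multiply each value by its probability, then sum.
expected_value = sum(category_values[c] * p for c, p in probabilities.items())

print("Counts:", dict(counts))
print("Mode:", mode)
print("Expected value:", expected_value)
```

With these assumed values the result is 4 × 0.25 + 1 × 0.625 + 2 × 0.125 = 1.875. A different assignment of numbers to categories would give a different expected value, so the result is only meaningful when the assigned values are.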
Loading Libraries -

Python3

    import matplotlib.pyplot as plt
    import numpy as np

Data -

Python3

    # Category labels and the number of vehicles observed in each category
    label = ['Car', 'Bike', 'Truck', 'Cycle', 'Jeeps', 'Ambulance']
    no_vehicle = [941, 854, 4595, 2125, 942, 509]

Indexing Data -

Python3

    # One integer position per category, used as the x coordinate of each bar
    index = np.arange(len(label))
    print("Total Labels:", len(label))
    print("Indexing:", index)

Output:

    Total Labels: 6
    Indexing: [0 1 2 3 4 5]

Bar Graph -

Python3

    plt.bar(index, no_vehicle)
    plt.xlabel('Type', fontsize=15)
    plt.ylabel('No of Vehicles', fontsize=15)
    # Replace the numeric positions with the category names
    plt.xticks(index, label, fontsize=10, rotation=30)
    plt.title('Number of Vehicles by Type')
    plt.show()

Output: (bar chart with one bar per vehicle type; bar height is the number of vehicles)

Pie Charts : Frequency of each category plotted as pie slices, or wedges. A pie chart is a circular graph in which the arc length of each slice is proportional to the quantity it represents.

Python3

    plt.figure(figsize=(8, 8))
    # autopct prints each wedge's share of the total as a percentage
    plt.pie(no_vehicle, labels=label, startangle=90, autopct='%.1f %%')
    plt.show()

Output: (pie chart with one wedge per vehicle type, labelled with its percentage share)

Author: mohit gupta_omg :)
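The percentages shown on the pie chart wedges can also be checked numerically. The sketch below assumes each share is simply the category count divided by the total count, which is how the proportional slices are described above. It repeats the `label` and `no_vehicle` lists so that it runs on its own.

```python
label = ['Car', 'Bike', 'Truck', 'Cycle', 'Jeeps', 'Ambulance']
no_vehicle = [941, 854, 4595, 2125, 942, 509]

# Total number of vehicles across all categories.
total = sum(no_vehicle)

# Share of each category as a percentage of the total.
shares = {name: 100 * count / total for name, count in zip(label, no_vehicle)}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f} %")
```

The category with the largest share here is also the mode of the underlying data, which ties the charts back to the metrics defined earlier.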