CLASSIFICATION: DECISION TREE
Sarvajanik College of Engineering & Technology
Computer (Shift-1), 7th Semester, Group 9 (Morning)
Prepared by:
Name Enrollment No
Yazad Dumasia 140420107015
Karan Gajjar 140420107016
Vievk Kadhiwala 140420107022
Darshan Koshiya 140420107025
Presented To:
Prof. Rakesh Patel
Prof. Snehal Gandhi
Introduction to Classification
• Classification is the task of assigning objects to one of several predefined
categories or classes.
• We are given a collection of records (the training set).
• Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as
possible.
• A test set is used to determine the accuracy of the model. Usually, the given
data set is divided into training and test sets, with the training set used to build
the model and the test set used to validate it.
Introduction to Classification
Training set:
Tid Attrib1 Attrib2 Attrib3 Class
1 Yes Large 120k No
2 No Medium 110k No
3 No Small 75k No
4 Yes Medium 130k No
5 No Large 85k Yes
6 No Medium 60k No
7 Yes Large 220k No
8 No Small 85k Yes
9 No Medium 75k No
10 No Small 90k Yes

Test set:
Tid Attrib1 Attrib2 Attrib3 Class
11 No Small 1300k ?
12 Yes Medium 90k ?
13 Yes Large 105k ?
14 No Small 90k ?
15 No Large 65k ?

[Figure: a learning algorithm learns a model from the training set (induction); the model is then applied to the test set to assign the unknown class labels (deduction).]
• Examples of classification techniques:
- Decision Tree
- Neural Network
- Rule-Based
- Naïve Bayes classifier, etc.
• Classification techniques are most suited for predicting data sets with binary
or nominal categories. They are less effective for ordinal categories since
they do not consider the implicit order among the categories.
Classification techniques
The following are examples of cases where the data analysis task is classification:
• A bank loan officer wants to analyze the data in order to know which customers (loan applicants)
are risky and which are safe.
• A marketing manager at a company needs to analyze a customer profile to predict whether that
customer will buy a new computer.
• In both of the above examples, a model or classifier is constructed to predict the categorical
labels. These labels are risky or safe for the loan application data and yes or no for the marketing data.
Example of Classification
•A basic algorithm for learning decision trees is outlined below.
•During tree construction, attribute selection measures are used to
select the attribute that best partitions the tuples into distinct
classes.
•When decision trees are built, many of the branches may reflect
noise or outliers in the training data.
•Tree pruning attempts to identify and remove such branches, with
the goal of improving classification accuracy on unseen data.
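As a concrete (and purely illustrative) example of pruning, scikit-learn supports cost-complexity pruning through the ccp_alpha parameter. The data set and the alpha value below are assumptions; the slides do not prescribe this particular pruning method.

```python
# Illustrative sketch of tree pruning via cost-complexity pruning (scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

# Pruning removes noisy branches; it usually trades a little training fit
# for better accuracy on unseen (test) data and a much smaller tree.
print(unpruned.get_n_leaves(), unpruned.score(X_te, y_te))
print(pruned.get_n_leaves(), pruned.score(X_te, y_te))
```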
DECISION TREE
• Classification and Regression Tree (CART-style) criteria:
• Twoing
• Gini
• Entropy-based algorithms:
• ID3
• C4.5
• Construction of a decision tree from the training data typically follows a top-down strategy.
DECISION TREE
Example to generate a decision tree
The data set has five attributes.
► There is a special attribute: the attribute class is the class label.
► The attributes temp (temperature) and humidity are numerical attributes.
► The other attributes are categorical, that is, they cannot be ordered.
► Based on the training data set, we want to find a set of rules that tell us, from the
values of outlook, temperature, humidity and wind, whether or not to play golf.
From the given sample data we can observe that:
Outlook = {Sunny, Overcast, Rainy}
Temperature = {Hot, Mild, Cool}
Humidity = {High, Normal}
Wind speed = {False, True}
Training data set for Play Golf
Day Outlook Temp Humidity Wind speed Play ball
1 Rainy Hot High False No
2 Rainy Hot High True No
3 Overcast Hot High False Yes
4 Sunny Mild High False Yes
5 Sunny Cool Normal False Yes
6 Sunny Cool Normal True No
7 Overcast Cool Normal True Yes
8 Rainy Mild High False No
9 Rainy Cool Normal False Yes
10 Sunny Mild Normal False Yes
11 Rainy Mild Normal True Yes
12 Overcast Mild High True Yes
13 Overcast Hot Normal False Yes
14 Sunny Mild High True No
• Consider the following training data set.
• There are three attributes, namely, age, pin code
and class.
• The attribute class is used as the class label.
The attribute age is a numeric attribute, whereas
pin code is a categorical one.
Though the domain of pin code is numeric, no ordering can be defined among pin code
values. You cannot derive any useful information from the fact that one pin code is
greater than another.
Concept of Categorical Attributes
ID AGE PINCODE CLASS
1 30 395003 C1
2 25 395003 C1
3 21 395013 C2
4 43 395003 C1
5 18 395013 C2
6 33 395013 C1
7 29 395013 C1
8 55 395003 C2
9 48 395003 C1
Another Example
The figure gives a decision tree for the training data.
The splitting attribute at the root is pin code, and the splitting criterion here is pincode = 395003.
Similarly, for the left child node, the splitting criterion is age <= 48 (the splitting attribute is age).
Although the right child node has the same splitting attribute, the splitting criterion is different.

Pincode = 395003 {1-9}
  Left child: Age <= 48; [1,2,4,8,9] → C1: [1,2,4,9], C2: [8]
  Right child: Age <= 21; [3,5,6,7] → C2: [3,5], C1: [6,7]

At the root level, we have 9 records. The associated splitting criterion is pincode = 395003.
As a result, we split the records into two subsets: records 1, 2, 4, 8 and 9 go to the left child
node and the remaining records to the right node.
The process is repeated at every node.
• Invented by J. Ross Quinlan in 1975.
• Each node corresponds to a splitting attribute.
• Each arc is a possible value of that attribute.
• At each node the splitting attribute is selected to be the most informative among the
attributes not yet considered in the path from the root.
• Entropy is used to measure how informative a node is.
Iterative Dichotomizer 3 (ID3)
• Used to generate a decision tree from a given data set by employing a
top-down, greedy search to test each attribute at every node of the tree.
• The resulting tree is used to classify future samples.
ID3 (Iterative Dichotomiser 3): Basic Idea
• Entropy, as it relates to machine learning, is a measure of the randomness in the information
being processed.
• The higher the entropy, the harder it is to draw any conclusions from that information.
• Flipping a coin is an example of an action that provides information that is random. For a coin
that has no affinity for 'heads' or 'tails', the outcome of any number of tosses is difficult to
predict. Why?
• Because there is no relationship between the flip and the outcome. This is the essence of
entropy.
What is entropy in a decision tree?
Entropy
 In order to define information gain precisely, we need to discuss entropy
first.
 A formula to calculate the homogeneity of a sample.
 A completely homogeneous sample has entropy of 0 (leaf node).
 An equally divided sample has entropy of 1.
 The formula for entropy is:
 where p(I) is the proportion of S belonging to class I.
 ∑ is over total outcomes. Log2 is log base 2.
Entropy(S) = - ∑ p(I) log2 p(I)
Examples to understand entropy:
Example 1
 If S is a collection of 14 examples with 9 YES and 5 NO examples then
 Entropy(S) = - (9/14) Log2 (9/14) - (5/14) Log2 (5/14)
= 0.940
Example 2: A collection S consists of 20 data examples:
13 Yes : 7 No
Entropy(S) = – (13/20) Log2(13/20)– (7/20) Log2(7/20)
Entropy(S) = 0.934
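The two worked examples can be verified with a short Python helper (added here for illustration; the function name and the rounding are ours):

```python
import math

def entropy(*counts):
    """Entropy of a class distribution given raw counts, e.g. entropy(9, 5)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy(9, 5), 3))    # 0.94  (Example 1)
print(round(entropy(13, 7), 3))   # 0.934 (Example 2)
```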
• Information gain is the amount of information that is gained by knowing the value of
the attribute: it is the entropy of the distribution before the split minus the entropy of
the distribution after it. The largest information gain is equivalent to the smallest
entropy after the split.
What is information gain in data mining?
Play Golf counts by Outlook:
Outlook    Yes  No  Total
Sunny       3    2    5
Overcast    4    0    4
Rainy       2    3    5
Total (all records)  14

E(Play Golf, Outlook) = P(Sunny)*E(3,2) + P(Overcast)*E(4,0) + P(Rainy)*E(2,3)
= (5/14)*0.971 + (4/14)*0.0 + (5/14)*0.971
= 0.693
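The same numbers can be reproduced in code (an illustrative sketch; the entropy helper from the earlier snippet is repeated so this runs on its own):

```python
import math

def entropy(*counts):  # same helper as in the sketch above
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Weighted entropy after splitting the 14 Play Golf records on Outlook,
# then the information gain relative to the unsplit table (9 Yes, 5 No).
e_split = (5/14) * entropy(3, 2) + (4/14) * entropy(4, 0) + (5/14) * entropy(2, 3)
print(round(e_split, 3))                  # 0.694 (the slide rounds to 0.693)
print(round(entropy(9, 5) - e_split, 3))  # 0.247 = Gain(Play Golf, Outlook)
```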
Movie Example
Film Country of Origin Big Star Genre Success
1 United States Yes Science Fiction True
2 United States No Comedy False
3 United States Yes Comedy True
4 India No Comedy True
5 India Yes Science Fiction False
6 India Yes Romance False
7 Rest of World Yes Comedy False
8 Rest of World No Science Fiction False
9 India Yes Comedy True
10 United States Yes Comedy True
Entropy of Table
Is the Film a Success?
Entropy(5 Yes, 5 No) = –[ (5/10) Log2(5/10) + (5/10) Log2(5/10)]
Entropy(Success) = 1
Split – Country of Origin
Film Country of Origin Big Star Genre Success
1 United States Yes Science Fiction True
2 United States No Comedy False
3 United States Yes Comedy True
4 United States Yes Comedy True
Film Country of Origin Big Star Genre Success
1 India No Comedy True
2 India Yes Science Fiction False
3 India Yes Romance False
4 India Yes Comedy True
Film Country of Origin Big Star Genre Success
1 Rest of World Yes Comedy False
2 Rest of World No Science Fiction False
Gain – Country of Origin
Where is the film from?
Entropy(USA) = – (3/4) Log2(3/4) – (1/4) Log2(1/4)
Entropy(USA) = 0.811
Entropy(India) = – (2/4) Log2(2/4) – (2/4) Log2(2/4)
Entropy(India) = 1
Entropy(Rest of World) = – (0/2) Log2(0/2) – (2/2) Log2(2/2)
Entropy(Rest of World) = 0
Gain(Origin) = 1 – (4/10 *0.811 + 4/10*1 + 2/10*0) = 0.276
Split – Big Star
Film Country of Origin Big Star Genre Success
1 United States Yes Science Fiction True
2 United States Yes Comedy True
3 India Yes Science Fiction False
4 India Yes Romance False
5 Rest of World Yes Comedy False
6 India Yes Comedy True
7 United States Yes Comedy True
Film Country of Origin Big Star Genre Success
1 United States No Comedy False
2 India No Comedy True
3 Rest of World No Science Fiction False
Gain – Big Star
Is there a Big Star in the film?
Entropy(Yes) = – (4/7) Log2(4/7) – (3/7) Log2(3/7)
Entropy(Yes) = 0.985
Entropy(No) = – (1/3) Log2(1/3) – (2/3) Log2(2/3)
Entropy(No) = 0.918
Gain(Star) = 1 – (7/10 *0.985 + 3/10*0.918) = 0.0351
Split – Genre
Film Country of Origin Big Star Genre Success
1 United States Yes Science Fiction True
2 India Yes Science Fiction False
3 Rest of World No Science Fiction False
Film Country of Origin Big Star Genre Success
1 United States No Comedy False
2 United States Yes Comedy True
3 India No Comedy True
4 Rest of World Yes Comedy False
5 India Yes Comedy True
6 United States Yes Comedy True
Film Country of Origin Big Star Genre Success
1 India Yes Romance False
Gain – Genre
What genre is the film?
Entropy(SciFi) = – (1/3) Log2(1/3) – (2/3) Log2(2/3)
Entropy(SciFi) = 0.918
Entropy(Com) = – (4/6) Log2(4/6) – (2/6) Log2(2/6)
Entropy(Com) = 0.918
Entropy(Rom) = – (0/1) Log2(0/1) – (1/1) Log2(1/1)
Entropy(Rom) = 0
Gain(Genre) = 1 – (3/10 *0.918 + 6/10*0.918+ 1/10*0) = 0.1738
Compare Gains…
Gain(Origin) = 0.276
Gain(Star) = 0.0351
Gain(Genre) = 0.1738
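These gains can be recomputed from the movie table with a small script (an illustrative sketch; the attribute encoding is ours, and the exact values differ slightly from the slides because the slides round intermediate entropies):

```python
import math
from collections import defaultdict

def entropy(*counts):  # same helper as in the earlier sketches
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (origin, star, genre, success) for the 10 films in the table above.
films = [
    ("US", "Yes", "SciFi", True),     ("US", "No", "Comedy", False),
    ("US", "Yes", "Comedy", True),    ("India", "No", "Comedy", True),
    ("India", "Yes", "SciFi", False), ("India", "Yes", "Romance", False),
    ("RoW", "Yes", "Comedy", False),  ("RoW", "No", "SciFi", False),
    ("India", "Yes", "Comedy", True), ("US", "Yes", "Comedy", True),
]

def gain(column):
    """Information gain of splitting the films on the given attribute column."""
    groups = defaultdict(list)
    for row in films:
        groups[row[column]].append(row[-1])
    after = sum(len(g) / len(films) * entropy(g.count(True), g.count(False))
                for g in groups.values())
    total = [row[-1] for row in films]
    return entropy(total.count(True), total.count(False)) - after

print(round(gain(0), 3))  # Origin   -> 0.275  (the slides round to 0.276)
print(round(gain(1), 4))  # Big Star -> 0.0349 (slides: 0.0351)
print(round(gain(2), 4))  # Genre    -> 0.1735 (slides: 0.1738)
```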
First Split: Origin
All Movies → three new tables: United States, India, Rest of World.
New Table – United States
Film Country of Origin Big Star Genre Success
1 United States Yes Science Fiction True
2 United States No Comedy False
3 United States Yes Comedy True
4 United States Yes Comedy True
Entropy(3 Yes, 1 No) = – (3/4) Log2(3/4) – (1/4) Log2(1/4)
Entropy(Success) = 0.811
Split – Big Star
Film Country of Origin Big Star Genre Success
1 United States Yes Science Fiction True
2 United States Yes Comedy True
3 United States Yes Comedy True
Film Country of Origin Big Star Genre Success
1 United States No Comedy False
Gain – Big Star
Is there a Big Star in the film?
Entropy(Yes) = – (3/3) Log2(3/3) – (0/3) Log2(0/3)
Entropy(Yes) = 0
Entropy(No) = – (0/1) Log2(0/1) – (1/1) Log2(1/1)
Entropy(No) = 0
Gain(Star) = 0.811 – (3/4 *0 + 1/4*0) = 0.811
Split – Genre
Film Country of Origin Big Star Genre Success
1 United States Yes Science Fiction True
Film Country of Origin Big Star Genre Success
1 United States No Comedy False
2 United States Yes Comedy True
3 United States Yes Comedy True
Gain – Genre
What genre is the film?
Entropy(SciFi) = – (1/1) Log2(1/1) – (0/1) Log2(0/1)
Entropy(SciFi) = 0
Entropy(Com) = – (2/3) Log2(2/3) – (1/3) Log2(1/3)
Entropy(Com) = 0.918
Gain(Genre) = 0.811 – (1/4 *0 + 3/4*0.918) = 0.1225
Compare Gains…
Gain(Star) = 0.811
Gain(Genre) = 0.1225
Split: Star
[Tree diagrams: after the first split on Origin, each branch's new table is split in turn. In the
United States branch the split on Big Star gives Star → Success and No Star → Failure; the India
branch is split further on Big Star and then Genre, while the Rest of World branch is already pure
(Failure). The process continues until every leaf is pure.]
A comedy from India, with a big star…
Day Outlook Temp Humidity Wind speed Play ball
1 Rainy Hot High False No
2 Rainy Hot High True No
3 Overcast Hot High False Yes
4 Sunny Mild High False Yes
5 Sunny Cool Normal False Yes
6 Sunny Cool Normal True No
7 Overcast Cool Normal True Yes
8 Rainy Mild High False No
9 Rainy Cool Normal False Yes
10 Sunny Mild Normal False Yes
11 Rainy Mild Normal True Yes
12 Overcast Mild High True Yes
13 Overcast Hot Normal False Yes
14 Sunny Mild High True No
Solution: Training data set for Play Golf
Step 1: Calculate entropy of the target.
Entropy(Play Golf) = Entropy( Nos_of_No , Nos_of_Yes )
= Entropy(5,9)
= Entropy(0.36 , 0.64)
= – (0.36) Log2 (0.36) – (0.64) Log2 (0.64)
= 0.94
Entropy(Play Golf) = 0.94
Step 2: The dataset is then split on the different attributes. The entropy for each branch
is calculated and added proportionally to get the total entropy for the split. The
resulting entropy is subtracted from the entropy before the split. The result is the
Information Gain, or decrease in entropy.
Play Golf counts by Outlook (Yes/No): Sunny 3/2, Overcast 4/0, Rainy 2/3 → Gain = 0.247
Play Golf counts by Temp. (Yes/No): Hot 2/2, Mild 4/2, Cool 3/1 → Gain = 0.029
Play Golf counts by Humidity (Yes/No): High 3/4, Normal 6/1 → Gain = 0.152
Play Golf counts by Windy (Yes/No): False 6/2, True 3/3 → Gain = 0.048
Gain(T, X) = Entropy(T) - Entropy(T, X)
Gain(Play Golf, Outlook)
= Entropy(Play Golf) - Entropy(Play Golf, Outlook)
= 0.940 - 0.693
= 0.247
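All four gains can be checked with a short helper (an illustrative sketch; the count tuples below are taken directly from the tables above):

```python
import math

def entropy(*counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(target_counts, branch_counts):
    """Entropy before the split minus the weighted entropy of the branches."""
    n = sum(target_counts)
    after = sum(sum(branch) / n * entropy(*branch) for branch in branch_counts)
    return entropy(*target_counts) - after

target = (9, 5)  # 9 Yes, 5 No
print(round(gain(target, [(3, 2), (4, 0), (2, 3)]), 3))  # Outlook  -> 0.247
print(round(gain(target, [(2, 2), (4, 2), (3, 1)]), 3))  # Temp     -> 0.029
print(round(gain(target, [(3, 4), (6, 1)]), 3))          # Humidity -> 0.152
print(round(gain(target, [(6, 2), (3, 3)]), 3))          # Windy    -> 0.048
```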
Step 3: Choose attribute with the largest information
gain as the decision node, divide the dataset by its
branches and repeat the same process on every
branch.
Play Golf counts by Outlook (Yes/No): Sunny 3/2, Overcast 4/0, Rainy 2/3 → Gain = 0.247

Split on Outlook into three branches: Sunny, Overcast and Rainy.

Sunny branch:
Outlook Temp Humidity Windy Play Golf
Sunny Mild High False Yes
Sunny Cool Normal False Yes
Sunny Cool Normal True No
Sunny Mild Normal False Yes
Sunny Mild High True No
Overcast branch:
Outlook Temp Humidity Windy Play Golf
Overcast Hot High False Yes
Overcast Cool Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy branch:
Outlook Temp Humidity Windy Play Golf
Rainy Hot High False No
Rainy Hot High True No
Rainy Mild High False No
Rainy Cool Normal False Yes
Rainy Mild Normal True Yes
Step 4a: A branch with entropy of 0 is a leaf node.
Temp Humidity Windy Play Golf
Hot High False Yes
Cool Normal True Yes
Mild High True Yes
Hot Normal False Yes
Outlook → Sunny | Overcast | Rainy; the Overcast branch has entropy 0, so it becomes a leaf: Play = Yes.
Step 4b: A branch with entropy more than 0 needs further splitting.
Temp Humidity Windy Play Golf
Mild High False Yes
Cool Normal False Yes
Cool Normal True No
Mild Normal False Yes
Mild High True No
Outlook → Sunny | Overcast | Rainy; the Sunny branch is split further on Windy: False → Yes, True → No.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches,
until all data is classified.
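A compact recursive ID3 sketch is shown below (illustrative only; the dict-based row encoding and attribute names are ours, not part of the slides). With the Play Golf rows it should reproduce the Outlook-rooted tree derived above.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, attrs, target="Play"):
    """Recursive ID3 sketch: rows are dicts, attrs the candidate splitting attributes."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:              # pure branch -> leaf (Step 4a)
        return labels[0]
    if not attrs:                          # nothing left to split on -> majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        groups = defaultdict(list)
        for r in rows:
            groups[r[a]].append(r[target])
        after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        return entropy(labels) - after

    best = max(attrs, key=gain)            # attribute with the largest information gain
    tree = {best: {}}
    for value in {r[best] for r in rows}:  # one branch per observed value (Step 4b)
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attrs if a != best], target)
    return tree

# Usage (rows encoded as dicts from the Play Golf table above), e.g.:
# rows = [{"Outlook": "Rainy", "Temp": "Hot", "Humidity": "High",
#          "Windy": False, "Play": "No"}, ...]
# print(id3(rows, ["Outlook", "Temp", "Humidity", "Windy"]))
```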
Decision Tree to Decision Rules
A decision tree can easily be transformed into a set of rules by mapping each path from the root
node to a leaf node into one rule.
Therefore the final decision tree is:

Outlook
  Rainy → Humidity: High → No, Normal → Yes
  Overcast → Yes
  Sunny → Wind speed: False → Yes, True → No
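This root-to-leaf mapping can also be sketched in code, assuming the nested-dict tree format produced by the ID3 sketch above (an illustration, not part of the slides):

```python
def tree_to_rules(tree, conditions=()):
    """Emit one IF-THEN rule per root-to-leaf path of a nested-dict decision tree."""
    if not isinstance(tree, dict):             # a leaf holds the class label
        cond = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
        return [f"IF {cond} THEN Play = {tree}"]
    rules = []
    for attr, branches in tree.items():
        for value, subtree in branches.items():
            rules.extend(tree_to_rules(subtree, conditions + ((attr, value),)))
    return rules

# For the final Play Golf tree this produces rules such as:
#   IF Outlook = Overcast THEN Play = Yes
#   IF Outlook = Sunny AND Windy = False THEN Play = Yes
#   IF Outlook = Rainy AND Humidity = High THEN Play = No
```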
• A decision tree construction process is concerned with identifying the splitting attributes and
the splitting criterion at every level of the tree.
• Major strengths are:
• Decision trees are able to generate understandable rules.
• They are able to handle both numerical and categorical attributes.
• They provide a clear indication of which fields are most important for prediction or
classification.
• Weaknesses are:
• The process of growing a decision tree is computationally expensive: at each node, each
candidate splitting field must be examined before its best split can be found.
• Some decision tree algorithms can only deal with binary-valued target classes.
Advantages and Shortcomings of Decision Tree Classifications
DECISIONTREE APPLICATIONS
• Business Management
In recent decades, many organizations have created their own databases to enhance their customer services. Decision trees are a
possible way to extract useful information from databases and they have already been employed in many applications in the domain of
business and management. In particular, decision tree modelling is widely used in customer relationship management and fraud
detection, which are presented in subsections below.
• Customer Relationship Management : A frequently used approach to manage customers’ relationships is to investigate how
individuals access online services. Such an investigation is mainly performed by collecting and analyzing individuals’ usage data and
then providing recommendations based on the extracted information.
• Fraudulent Statement Detection: Another widely used business application is the detection of Fraudulent Financial Statements
(FFS).
• Engineering :
The other important application domain that decision trees can support is engineering. In particular, decision trees are widely used in
energy consumption and fault diagnosis, which are described in subsections below.
• Energy Consumption : Energy consumption concerns how much electricity has been used by individuals. The investigation of
energy consumption becomes an important issue as it helps utility companies identify the amount of energy needed.
• Fault Diagnosis: Another widely used application in the engineering domain is the detection of faults, especially the
identification of a faulty bearing in rotating machinery.
• Healthcare Management: As decision tree modelling can be used for making predictions, an increasing number of studies
investigate the use of decision trees in healthcare management.
Conclusion
 The decision tree method is a powerful statistical tool for classification, prediction,
interpretation, and data manipulation that has several potential applications in medical
research. Using decision tree models to describe research findings has the following
advantages:
 • Simplifies complex relationships between input variables and target variables by
dividing original input variables into significant subgroups.
 • Easy to understand and interpret.
 • Non-parametric approach without distributional assumptions.
 • Easy to handle missing values without needing to resort to imputation.
 • Easy to handle heavily skewed data without needing to resort to data transformation.
 • Robust to outliers.
QUESTIONS & DISCUSSION