Classification : Decision
Tree (DT)
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2021)
Outline
 What is decision tree (DT) algorithm
 Why we need DT
 Pros and Cons of DT
 Information Theory
 Some Issues in DT
 Assignment II
Decision Tree (DT)
 Decision trees: splitting datasets one feature at a time.
 The decision tree is one of the most commonly used classification techniques.
 It has decision blocks (rectangles) and
 termination blocks (ovals).
 The right and left arrows are called branches.
 The kNN algorithm can do a great job of classification, but it doesn't lead to
any major insight about the data.
Decision Tree (DT)
A decision Tree
Decision Tree (DT)
 The best part of the DT (decision tree) algorithm is that humans can easily
understand the data:
 The Decision Tree algorithm:
 Takes a set of data (training examples),
 Builds a decision tree (model), and draws it.
 It can also be re-represented as sets of if-then rules to improve human readability.
 The DT does a great job of distilling data into knowledge.
 It takes a set of unfamiliar data and extracts a set of rules.
 DT is often used in expert system development.
Decision Tree (DT)
 The DT can be expressed using the following expression:
 (Outlook = Sunny ∧ Humidity = Normal) → Yes
 ∨ (Outlook = Overcast) → Yes
 ∨ (Outlook = Rain ∧ Wind = Weak) → Yes
Decision Tree (DT)
 The pros and cons of DT:
 Pros of DT:
 Computationally cheap to use,
 Easy for humans to understand the learned results,
 Missing values OK (robust to errors),
 Can deal with irrelevant features.
 Cons of DT:
 Prone to overfitting.
 Works with: numeric values, nominal values.
Decision Tree (DT)
 Appropriate problems for DT learning:
 Instances are represented by attribute-value pairs (a fixed set of attributes and
their values),
 The target function has discrete output values,
 Disjunctive descriptions may be required,
 The training data may contain errors,
 The training data may contain missing attribute values.
Decision Tree (DT)
 The mathematics that is used by DT to split the dataset is called
information theory:
 The first decision you need to make is:
 Which feature should be used to split the data?
 You need to try every feature and measure which split will give the
best result.
 Then split the dataset into subsets.
 The subsets will then traverse down the branches of the decision
node.
 If the data on the branch is the same, stop; else repeat the splitting.
Decision Tree (DT)
Pseudo-code for the splitting function
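The original figure is not reproduced here; a plausible reconstruction of the splitting pseudo-code, following the description on the previous slide, is:

    check whether every example in the dataset has the same class label
        if so, return that class label (a leaf node)
        otherwise:
            find the feature that best splits the data
            split the dataset into subsets, one per value of that feature
            create a decision node for that feature
            for each subset, call this procedure recursively and
                attach the returned subtree to the decision node
            return the decision node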
Decision Tree (DT)
 General approach to decision trees:
 Collect: Any method.
 Prepare: The ID3 algorithm works only on nominal values, so any
continuous values will need to be quantized.
 Analyze: Any method. You should visually inspect the tree after it
is built.
 Train: Construct a tree data structure. (DT)
 Test: Calculate the error rate with the learned tree.
 Use: This can be used in any supervised learning task. Often, to
better understand the data.
Decision Tree (DT)
 We would like to classify the following animals into two
classes:
Fish and not Fish
Marine animal data
Decision Tree (DT)
 Need to decide whether we should split the data based on the
first feature or the second feature:
To better organize the unorganized data.
One way to do this is to measure the information.
Measure the information before and after the split.
 Information theory is a branch of science that is concerned with
quantifying information.
 The change in information before and after the split is known
as the information gain.
Decision Tree (DT)
 The split with the highest information gain is the best option.
The measure of information of a set is known as the Shannon
entropy, or simply entropy.
 The change in information before and after the split is known
as the information gain.
Decision Tree (DT)
 To calculate entropy, you need the expected value of the information over
all possible values of our class.
 This is given by:
H(S) = − Σ (i = 1 to n) p(x_i) · log2 p(x_i)
 Where n is the number of classes and p(x_i) is the proportion of examples belonging to class i.
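As an illustration, a minimal Python sketch of this entropy computation (assuming each example is a list whose last element is the class label; the data layout is an assumption for illustration, not something fixed by the slides):

    from math import log2
    from collections import Counter

    def calc_entropy(dataset):
        # Shannon entropy of the class labels; the label is assumed to be
        # the last element of each example.
        total = len(dataset)
        label_counts = Counter(example[-1] for example in dataset)
        entropy = 0.0
        for count in label_counts.values():
            p = count / total            # proportion of examples in this class
            entropy -= p * log2(p)
        return entropy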
Decision Tree (DT)
 The higher the entropy, the more mixed up the data.
 Another common measure of disorder in a set is the Gini
impurity:
the probability that an item chosen at random from the set would be
misclassified if it were labeled at random according to the class distribution.
Calculate the Shannon entropy of a dataset.
Split the dataset on a given feature.
Choose the best feature to split on (sketched below).
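A minimal sketch of the splitting and feature-selection steps, reusing the calc_entropy function from the previous sketch (the function names and data layout are illustrative assumptions):

    def split_dataset(dataset, feature_index, value):
        # Keep the examples whose feature at feature_index equals value,
        # removing that feature column (ID3 does not reuse an attribute).
        subset = []
        for example in dataset:
            if example[feature_index] == value:
                subset.append(example[:feature_index] + example[feature_index + 1:])
        return subset

    def choose_best_feature(dataset):
        # Return the index of the feature whose split gives the highest
        # information gain relative to the current entropy.
        num_features = len(dataset[0]) - 1     # last column is the class label
        base_entropy = calc_entropy(dataset)
        best_gain, best_feature = 0.0, 0
        for i in range(num_features):
            new_entropy = 0.0
            for value in {example[i] for example in dataset}:
                subset = split_dataset(dataset, i, value)
                new_entropy += len(subset) / len(dataset) * calc_entropy(subset)
            gain = base_entropy - new_entropy  # information gain of feature i
            if gain > best_gain:
                best_gain, best_feature = gain, i
        return best_feature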
Decision Tree (DT)
 Recursively building the tree.
Start with the dataset and split it based on the best attribute.
The data will traverse down the branches of the tree to another
node.
This node will then split the data again (recursively).
Stop under the following conditions: we run out of attributes, or the
instances in a branch all belong to the same class (see the sketch below).
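A sketch of the recursion described above, building on the calc_entropy, split_dataset and choose_best_feature sketches from the earlier slides (the nested-dict tree representation and the tiny example dataset are assumptions for illustration):

    def majority_class(class_list):
        # Most common class label, used when no attributes are left to split on.
        return Counter(class_list).most_common(1)[0][0]

    def create_tree(dataset, feature_names):
        class_list = [example[-1] for example in dataset]
        if class_list.count(class_list[0]) == len(class_list):
            return class_list[0]                 # all examples share one class: leaf
        if len(dataset[0]) == 1:
            return majority_class(class_list)    # ran out of attributes
        best = choose_best_feature(dataset)
        best_name = feature_names[best]
        tree = {best_name: {}}
        remaining = feature_names[:best] + feature_names[best + 1:]
        for value in {example[best] for example in dataset}:
            tree[best_name][value] = create_tree(split_dataset(dataset, best, value),
                                                 remaining)
        return tree

    # Tiny made-up dataset in the spirit of the marine-animal example:
    # [can survive without surfacing?, has flippers?, label]
    data = [[1, 1, 'fish'], [1, 1, 'fish'], [1, 0, 'not fish'],
            [0, 1, 'not fish'], [0, 1, 'not fish']]
    print(create_tree(data, ['no surfacing', 'flippers']))
    # -> {'no surfacing': {0: 'not fish', 1: {'flippers': {0: 'not fish', 1: 'fish'}}}}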
Decision Tree (DT)
Table 2: Example training sets
Decision Tree (DT)
Figure 3: Data path while splitting
Decision Tree (DT)
 ID3 uses the information gain measure to select among the
candidate attributes.
Start with the dataset and split it based on the best attribute.
Given a collection S, containing positive and negative examples
of some target concept,
the entropy of S relative to this Boolean classification is:
Entropy(S) = − p+ · log2(p+) − p− · log2(p−)
where p+ is the proportion of positive examples in S and p− the proportion of negative examples.
Decision Tree (DT)
 Example:
The target attribute is PlayTennis (yes/no).
Table 3: Example training sets
Decision Tree (DT)
 Suppose S is a collection of 14 examples of some Boolean
concept, including 9 positive and 5 negative examples.
 Then the entropy of S relative to this Boolean classification is:
Entropy(S) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) = 0.940
Decision Tree (DT)
 Note that the entropy is 0 if all members of S belong to the
same class.
 For example: if all the members are positive (p+ = 1), then p− = 0, and
Entropy(S) = −1·log2(1) − 0·log2(0) = 0 (taking 0·log2(0) to be 0).
 Note that the entropy is one (1) when the collection contains an
equal number of positive and negative examples.
 If the collection contains unequal numbers of positive and
negative examples, the entropy is between 0 and 1.
Decision Tree (DT)
 Suppose S is a collection of training-example days described by the
attribute Wind, which can have the values Weak and Strong.
 The information gain is the measure used by ID3 to select the
best attribute at each step in growing the tree.
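The formula from the original slide is not reproduced here; the standard ID3 definition is:
Gain(S, A) = Entropy(S) − Σ over v in Values(A) of (|S_v| / |S|) · Entropy(S_v)
where Values(A) is the set of possible values of attribute A and S_v is the subset of S for which attribute A has value v.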
Decision Tree (DT)
Information gain of the two attributes: Humidity and Wind.
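The worked figure is not reproduced here; assuming the usual PlayTennis counts from Table 3, the two gains work out as follows (consistent with the values quoted on the next slide):
Humidity: High → [3+, 4−], Entropy = 0.985; Normal → [6+, 1−], Entropy = 0.592
Gain(S, Humidity) = 0.940 − (7/14)(0.985) − (7/14)(0.592) ≈ 0.151
Wind: Weak → [6+, 2−], Entropy = 0.811; Strong → [3+, 3−], Entropy = 1.000
Gain(S, Wind) = 0.940 − (8/14)(0.811) − (6/14)(1.000) ≈ 0.048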
Decision Tree (DT)
 Example:
ID3 determines the information gain for each attribute
(Outlook, Temperature, Humidity and Wind),
then selects the one with the highest information gain.
The information gain values for all four attributes are:
Gain (S, Outlook) = 0.246
Gain (S, Humidity) = 0.151
Gain (S, Wind) = 0.048
Gain (S, Temperature) = 0.029
Outlook provides greater information gain than the other attributes.
Decision Tree (DT)
 Example:
According to the information gain measure, the Outlook attribute is
selected as the root node.
Branches are created below the root for each of its possible
values. (Sunny, Overcast, and Rain)
Decision Tree (DT)
 The partially learned decision tree resulting from the first step of ID3.
Decision Tree (DT)
 The Overcast descendant has only positive examples and
therefore becomes a leaf node with classification Yes.
 The other two nodes will be expanded by selecting the attribute
with the highest information gain relative to the new subsets.
Decision Tree (DT)
 Decision Tree learning can be:
Classification tree: the target variable takes a finite set of values.
Regression tree: the target variable takes continuous values.
There are many specific Decision Tree algorithms:
ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification and Regression Trees)
CHAID (Chi-squared Automatic Interaction Detector)
MARS (Multivariate Adaptive Regression Splines): extends DTs to handle numerical data better
Decision Tree (DT)
 Different Decision Tree algorithms use different metrics for
measuring the “best attribute” :
Information gain: used by ID3, C4.5 and C5.0
Gini impurity: used by CART
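For reference, the Gini impurity of a set S with class proportions p_i is:
Gini(S) = 1 − Σ_i (p_i)²
A pure node (all examples in one class) has Gini impurity 0; like entropy, it is largest when the classes are evenly mixed.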
Decision Tree (DT)
 ID3 in terms of its search space and search strategy:
ID3's hypothesis space of all decision trees is a complete space of
finite discrete-valued functions.
ID3 maintains only a single current hypothesis as it searches
through the space of decision trees.
ID3 in its pure form performs no backtracking in its search (a form of
backtracking can be added later by post-pruning the decision tree).
ID3 uses all training examples at each step in the search to make
statistically based decisions regarding how to refine its current
hypothesis (making it much less sensitive to errors in individual examples).
Decision Tree (DT)
 Inductive bias in Decision Tree learning (ID3) :
Inductive bias is the set of assumptions a learner uses to generalize
beyond the training data.
ID3 selects in favor of shorter trees over longer ones (a breadth-first
style preference).
It selects trees that place the attributes with the highest information
gain closest to the root.
Decision Tree (DT)
 Issues in Decision Tree learning:
How deeply to grow the decision tree,
Handling continuous attributes,
Choosing an appropriate attribute selection measure,
Handling training data with missing attribute values,
Handling attributes with differing costs, and
Improving computational efficiency.
 ID3 was extended to address most of these issues, resulting in C4.5.
Decision Tree (DT)
 Avoiding overfitting the data:
 Noisy data and too few training examples are problems.
 Overfitting is a practical problem for decision trees and many other
learning algorithms.
 Overfitting was found to decrease the accuracy of the learned tree
by 10-25%.
 Approaches to avoid overfitting:
 Stop growing the tree before it overfits (direct, but less practical).
Allow the tree to overfit, and then post-prune it (the most successful approach
in practice).
Decision Tree (DT)
 Incorporating continuous-valued attributes:
In the initial definition of ID3, the attributes and the target value must
take discrete sets of values.
The attributes tested in the decision nodes of the tree must therefore be
discrete-valued:
 Create a new Boolean attribute by thresholding the continuous value, or
 use multiple intervals rather than just two (a threshold-finding sketch follows).
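A minimal sketch of the usual threshold-finding procedure for a single continuous attribute (candidate thresholds are midpoints between adjacent sorted values whose class labels differ); it reuses the calc_entropy sketch from earlier, and the names are illustrative assumptions:

    def best_threshold(values, labels):
        # Return the threshold t maximizing the information gain of the
        # Boolean test "attribute > t" for one continuous attribute.
        pairs = sorted(zip(values, labels))
        base = calc_entropy([[label] for _, label in pairs])
        best_gain, best_t = 0.0, None
        for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
            if l1 == l2:
                continue                        # only split where the class changes
            t = (v1 + v2) / 2
            left = [[l] for v, l in pairs if v <= t]
            right = [[l] for v, l in pairs if v > t]
            gain = (base
                    - len(left) / len(pairs) * calc_entropy(left)
                    - len(right) / len(pairs) * calc_entropy(right))
            if gain > best_gain:
                best_gain, best_t = gain, t
        return best_t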
Decision Tree (DT)
 Alternative measure for selecting attributes:
Information gain favors attributes with many values.
One alternative measure that has been used successfully is the
gain ratio.
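For reference, the gain ratio used by C4.5 is defined as:
SplitInformation(S, A) = − Σ_i (|S_i| / |S|) · log2(|S_i| / |S|), where S_1 … S_c are the subsets produced by splitting S on the c values of attribute A
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
SplitInformation is large for attributes with many evenly populated values, so dividing by it penalizes exactly the attributes that plain information gain favors.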
Decision Tree (DT)
 Handling training examples with missing attribute values:
Assign it the most common value among the training examples at
node n.
Assign a probability to each of the possible values of the attribute.
The second approach is used in C4.5.
Decision Tree (DT)
 Handling attributes with different costs:
Prefer low-cost attributes over high-cost attributes.
ID3 can be modified to take costs into account by introducing a cost
term into the attribute selection measure:
for example, divide the gain by the cost of the attribute.
Question & Answer
Thank You !!!
Assignment II
 Answer the given questions by considering the following set of training examples.
Assignment II
(a) What is the entropy of this collection of training examples with respect to the target function classification?
 (b) What is the information gain of a2 relative to these training examples?
Decision Tree (DT)
 Do some research on the following Decision Tree algorithms:
ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification and Regression Trees)
CHAID (Chi-squared Automatic Interaction Detector)
MARS (Multivariate Adaptive Regression Splines): extends DTs to handle numerical data better