What is ID3 in a decision tree?
This article is part 2 of 4 in the series of articles, Machine Learning with Decision Tree Techniques.
Engineers are very familiar with decision trees as a way to break down products hierarchically and to create procedural instructions. Data scientists also want to create procedural instructions, but automatically from the data. Applied in this way, decision trees are a form of machine learning: the machine itself should find a way to assign an object to a class.
The ID3 algorithm
Understanding the ID3 algorithm is worthwhile because it is the foundation of many later algorithms built on it. With its iterative and recursive approach, it is also quite easy to follow, but its effect should not be underestimated. The procedure can be broken down into three essential steps, of which the first provides the actual effect (with all its advantages and disadvantages):
- Step 1: Select the attribute with the highest information gain
Examine all attributes (features) of the data set and determine which attribute best classifies the data.
- Step 2: Create a node with that attribute
If the results under this node are unambiguous (one unique class value), store it in this node and return.
- Step 3: Continue the process recursively
Otherwise, split the data into n subsets, one per value of the chosen attribute, and repeat these steps for each of the subsets.
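The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the data set is assumed to be a list of dictionaries with categorical attributes, and the toy weather data at the end is made up for demonstration.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the whole set minus the weighted entropies per attribute value."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

def id3(rows, labels, attrs):
    """Recursive ID3: returns a nested dict (inner node) or a class label (leaf)."""
    if len(set(labels)) == 1:       # step 2: unambiguous result -> leaf, return
        return labels[0]
    if not attrs:                   # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    # step 1: attribute with the highest information gain
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    # step 3: one subset per value of the chosen attribute, recurse
    for value in set(row[best] for row in rows):
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [lab for r, lab in zip(rows, labels) if r[best] == value]
        tree[best][value] = id3(sub_rows, sub_labels,
                                [a for a in attrs if a != best])
    return tree

# hypothetical toy data: two weather features and a "play?" label
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
]
labels = ["yes", "yes", "yes", "no"]
print(id3(rows, labels, ["outlook", "windy"]))
```

The result is a nested dictionary: the root splits on one attribute, each branch either ends in a class label or splits again.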
Information gain - and how to calculate it
The information gain Gain(S, A) of an attribute A in the sense of the ID3 algorithm is the difference between the entropy H(S) (see Part 1 of the series of articles: Entropy, a measure of impurity in data) of the entire data set S and the sum of the weighted entropies of the attribute for each individual value v that occurs in the attribute:

Gain(S, A) = H(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · H(S_v)

where S_v is the subset of S in which attribute A takes the value v.
How the calculation of the information gain works will be shown in Part 3 of this series of articles (to be published shortly).
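Until then, here is a minimal numeric sketch. The class distribution (9 positive, 5 negative examples) and the binary split are assumed purely for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# hypothetical data set: 9 "yes" and 5 "no" examples
labels = ["yes"] * 9 + ["no"] * 5
h_s = entropy(labels)                  # entropy of the whole set, ~0.940

# assume an attribute splits the set into two subsets:
weak   = ["yes"] * 6 + ["no"] * 2      # 8 examples, entropy ~0.811
strong = ["yes"] * 3 + ["no"] * 3      # 6 examples, entropy 1.0

# information gain = H(S) minus the weighted entropies of the subsets
gain = h_s - (8 / 14) * entropy(weak) - (6 / 14) * entropy(strong)
print(round(gain, 3))  # ~0.048
```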
The advantages of the ID3 algorithm - and the disadvantages
The algorithm is the basis for many other algorithms. In its simplicity it has certain advantages - which probably make it the most popular decision tree algorithm - but it also has a number of disadvantages that should be considered.
Watch out for overfitting - and avoid it
Decision trees generated from data tend to overfit. This means that the trees can adapt to the training data to the point of fitting it perfectly, while retaining no (or only an inadequate) generalizing description. New data, which can have a higher diversity than the training data, can then no longer be classified at a reasonable error rate.
Beware of key columns!
Some attributes force overfitting: for example, if an attribute such as "customer ID" (a unique number per customer) is included, we can expect an entropy of 0 for each individual value of the attribute, because every ID describes a unique case (customer, customer group, etc.). It follows that the information gain for this attribute is maximal. The tree would become enormously wide, which is not helpful, because each value (each ID) would get its own branch leading to a unique result. The tree cannot be used for new data (new customer numbers) because it no longer represents a general description but is merely an image of the training data.
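This effect is easy to demonstrate. In the sketch below (toy data, entirely made up), a unique `customer_id` column achieves the maximum possible gain, while a genuine feature does not:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy of the whole set minus the weighted entropies per attribute value."""
    n = len(labels)
    g = entropy(labels)
    for v in set(r[attr] for r in rows):
        sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
        g -= (len(sub) / n) * entropy(sub)
    return g

# hypothetical data: a unique customer_id next to a real feature
rows = [{"customer_id": i, "segment": s}
        for i, s in enumerate(["a", "a", "b", "b"])]
labels = ["buys", "buys", "skips", "buys"]

# every ID value isolates exactly one row, so each subset has entropy 0:
# the gain equals the full entropy of the label column - the maximum possible
print(information_gain(rows, labels, "customer_id"))
print(information_gain(rows, labels, "segment"))
```

The ID attribute "wins" the split in step 1 even though it carries no generalizable information - which is exactly why key columns should be excluded before training.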
Pruning - shorten the tree afterwards
Particularly large trees are not good trees and are a sign of overfitting. One way to reduce their size is to recalculate the information gains and shorten branches (generalization) whose information gain is too low. For pruning, it is often not the entropy or the Gini coefficient but the classification error that is used as the measure of impurity.
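The three common impurity measures are easy to compare side by side. A minimal sketch, using the same assumed 9-to-5 class distribution as above:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def classification_error(labels):
    """1 minus the share of the majority class - often used for pruning decisions."""
    n = len(labels)
    return 1 - Counter(labels).most_common(1)[0][1] / n

labels = ["yes"] * 9 + ["no"] * 5
print(round(entropy(labels), 3))               # ~0.940
print(round(gini(labels), 3))                  # ~0.459
print(round(classification_error(labels), 3))  # ~0.357
```

All three are 0 for a pure node and maximal for a 50/50 split; the classification error reacts less sensitively to small shifts in the class distribution, which makes it a pragmatic criterion for deciding whether cutting a branch actually hurts.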
Random forests as an overfitting panacea
Random forests base the decision on class membership on a collective vote of several decision trees. This kind of "democratic" machine learning is also called ensemble learning. If several decision trees with different structures are used for a common classification, the overfitting of the individual trees is usually reduced.
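A minimal sketch with scikit-learn (assuming the library is available; the synthetic data set is made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# synthetic toy data, purely for demonstration
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample with random feature subsets;
# the majority vote of the ensemble dampens the overfitting of any single tree
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```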
Benjamin Aunkofer is lead data scientist at DATANOMIQ and university lecturer for data science and data strategy. In addition, he works as Interim Head of Business Intelligence and gives seminars / workshops on BI, data science and machine learning for companies.