Identification trees are used in applications ranging from medical diagnosis to project management. Like the nearest neighbor algorithm, ID trees belong to the family of supervised learning algorithms.
Identification tree is a representation
Being a decision tree, an ID tree has a root node from which it sorts data instances down to the leaves. This sorting is a form of classification: each leaf node represents a class. Each internal node in the tree specifies a test of some attribute of the instance, and its branches correspond to the possible values of that attribute.
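To make the structure concrete, here is a minimal Python sketch of such a tree. The class names `Node` and `Leaf`, the `classify` helper and the toy attributes (`wheels`, `roof`) are all made up for illustration; they do not come from any library.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # the class assigned to every instance that reaches this leaf


@dataclass
class Node:
    attribute: str  # the attribute this node tests
    # one branch per possible value of the tested attribute
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, instance):
    """Sort an instance down the tree until it lands in a leaf."""
    while isinstance(tree, Node):
        tree = tree.branches[instance[tree.attribute]]
    return tree.label


# Hypothetical toy tree: the root tests "wheels", one branch then tests "roof".
toy_tree = Node("wheels", {
    "two": Leaf("bike"),
    "four": Node("roof", {"yes": Leaf("car"), "no": Leaf("convertible")}),
})

print(classify(toy_tree, {"wheels": "four", "roof": "no"}))  # convertible
```

Keeping leaves and test nodes as separate types means the traversal only has to ask one question at each step: is this still a test, or already a class?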
From words to action…
Enough definitions; let’s move on to a practical example. Imagine we want to identify whether a vehicle is electric or regular based on properties such as speed, size, price and battery type.
We’ve got a table of observations in front of us. Our goal is to train a model that makes robust predictions. We can do this by focusing on the features that really matter.
Step 1: Build “tests” for each property of the vehicle and select the root node
Once the tests for each property are built, we select as the root node the test that yields the highest number of subsets with homogeneous instances.
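Step 1 can be sketched in a few lines of Python. The table in the code is a hypothetical stand-in for our observations: the values are invented for illustration and arranged so that the battery test comes out on top. The helper names `run_test` and `homogeneous_count` are assumptions of this sketch as well.

```python
from collections import defaultdict

# Hypothetical observations (invented for this sketch, not real measurements).
COLUMNS = ("speed", "size", "price", "battery", "label")
ROWS = [
    ("fast", "small",  "high", "Li-ion",    "electric"),  # row 1
    ("slow", "large",  "high", "Lead-acid", "regular"),   # row 2
    ("slow", "small",  "low",  "Li-ion",    "regular"),   # row 3
    ("fast", "medium", "high", "Lead-acid", "regular"),   # row 4
    ("slow", "large",  "high", "Li-ion",    "electric"),  # row 5
    ("fast", "small",  "low",  "NiMH",      "regular"),   # row 6
    ("slow", "medium", "high", "NiMH",      "regular"),   # row 7
    ("fast", "large",  "low",  "Li-ion",    "regular"),   # row 8
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def run_test(rows, attribute):
    """Split the rows by the attribute's value and collect the labels per subset."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row["label"])
    return dict(subsets)


def homogeneous_count(rows, attribute):
    """Count the subsets in which every instance carries the same label."""
    return sum(len(set(labels)) == 1
               for labels in run_test(rows, attribute).values())


ATTRIBUTES = ["speed", "size", "price", "battery"]
scores = {a: homogeneous_count(DATA, a) for a in ATTRIBUTES}
root = max(scores, key=scores.get)

print(scores)  # {'speed': 0, 'size': 1, 'price': 1, 'battery': 2}
print(root)    # battery
```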
Step 2: Extend the heterogeneous leaf of the root node
Our test on “Battery type” gives the best outcome, so we select it as the root node. Then we extend its heterogeneous Li-ion branch, running the tests as before with the following restrictions:
- Test only the remaining properties (Price, Size, Speed)
- Include only the remaining rows of data (i.e. 1, 3, 5, 8)
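Here is a rough Python sketch of this second step. The rows are hypothetical (invented for illustration and arranged so that rows 1, 3, 5 and 8 are the Li-ion ones), and `homogeneous_count` is a helper name assumed for this sketch.

```python
# Hypothetical observations (invented for this sketch, not real measurements).
COLUMNS = ("speed", "size", "price", "battery", "label")
ROWS = [
    ("fast", "small",  "high", "Li-ion",    "electric"),  # row 1
    ("slow", "large",  "high", "Lead-acid", "regular"),   # row 2
    ("slow", "small",  "low",  "Li-ion",    "regular"),   # row 3
    ("fast", "medium", "high", "Lead-acid", "regular"),   # row 4
    ("slow", "large",  "high", "Li-ion",    "electric"),  # row 5
    ("fast", "small",  "low",  "NiMH",      "regular"),   # row 6
    ("slow", "medium", "high", "NiMH",      "regular"),   # row 7
    ("fast", "large",  "low",  "Li-ion",    "regular"),   # row 8
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def homogeneous_count(rows, attribute):
    """Count the subsets in which every instance carries the same label."""
    labels_by_value = {}
    for row in rows:
        labels_by_value.setdefault(row[attribute], set()).add(row["label"])
    return sum(len(labels) == 1 for labels in labels_by_value.values())


# Restriction 2: keep only the rows that fell into the heterogeneous branch.
branch_rows = [row for row in DATA if row["battery"] == "Li-ion"]
row_numbers = [i + 1 for i, row in enumerate(DATA) if row["battery"] == "Li-ion"]

# Restriction 1: test only the properties that have not been used yet.
remaining = ["price", "size", "speed"]
scores = {a: homogeneous_count(branch_rows, a) for a in remaining}
best = max(scores, key=scores.get)

print(row_numbers)  # [1, 3, 5, 8]
print(scores)       # {'price': 2, 'size': 0, 'speed': 0}
print(best)         # price
```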
This time the price test does best: it splits the remaining rows into homogeneous subsets only, leaving no leaf with instances that differ. This gives us our model for predicting whether a vehicle is electric or not:
- Look at the vehicle’s “Battery type” and check that it is “Li-ion”
- The “price” of the car must be high
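Written as code, the two rules above boil down to a tiny function. This is only a sketch: the function name `is_electric` and the dictionary keys `battery` and `price` are assumptions made for illustration.

```python
def is_electric(vehicle):
    """Apply the two rules of the learned tree to one vehicle (a dict)."""
    # Rule 1: the battery type must be Li-ion.
    if vehicle["battery"] != "Li-ion":
        return False
    # Rule 2: among Li-ion vehicles, only the high-priced ones are electric.
    return vehicle["price"] == "high"


print(is_electric({"battery": "Li-ion", "price": "high"}))  # True
print(is_electric({"battery": "Li-ion", "price": "low"}))   # False
```

Note that speed and size never appear: the tree found them unnecessary for this decision.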
How to run the identification tree algorithm in a better way?
In the real world we are unlikely to find perfectly homogeneous sets while building a decision tree, so there must be a better way to select which attribute to test at each node of the tree.
The total inhomogeneity, or “disorder”, in the subsets of each test can be calculated with a formula borrowed from information theory:
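Written out, assuming the standard entropy-based definition of disorder (the symbols are this sketch's own notation), the average disorder of a test is:

$$
\text{Average disorder} = \sum_{b} \frac{n_b}{n_t} \left( \sum_{c} -\frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right)
$$

Here $n_b$ is the number of instances in branch $b$, $n_t$ is the total number of instances in all branches, and $n_{bc}$ is the number of instances in branch $b$ that belong to class $c$. A perfectly homogeneous branch contributes 0, a branch split evenly between two classes contributes its full weight $n_b / n_t$, and the test with the lowest average disorder is the one to pick.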
Let’s repeat the building of our prediction model using this formula. See the illustration below:
As you may have noticed, we arrived at the same solution as before. However, unlike the first approach, this one also works for imperfect examples.
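Here is a minimal Python sketch of the disorder-based selection. It assumes the usual entropy-based average disorder (each subset's entropy weighted by the share of rows it holds). The rows are hypothetical values invented for illustration, and `average_disorder` is a helper name made up for this sketch.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical observations (invented for this sketch, not real measurements).
COLUMNS = ("speed", "size", "price", "battery", "label")
ROWS = [
    ("fast", "small",  "high", "Li-ion",    "electric"),
    ("slow", "large",  "high", "Lead-acid", "regular"),
    ("slow", "small",  "low",  "Li-ion",    "regular"),
    ("fast", "medium", "high", "Lead-acid", "regular"),
    ("slow", "large",  "high", "Li-ion",    "electric"),
    ("fast", "small",  "low",  "NiMH",      "regular"),
    ("slow", "medium", "high", "NiMH",      "regular"),
    ("fast", "large",  "low",  "Li-ion",    "regular"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def average_disorder(rows, attribute):
    """Weighted entropy of the subsets produced by testing one attribute."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row["label"])
    disorder = 0.0
    for labels in subsets.values():
        size = len(labels)
        # Entropy of this subset; it is 0 when all labels agree.
        entropy = -sum((n / size) * log2(n / size)
                       for n in Counter(labels).values())
        disorder += (size / len(rows)) * entropy
    return disorder


# Root: the test with the lowest average disorder over all rows.
attributes = ["speed", "size", "price", "battery"]
root = min(attributes, key=lambda a: average_disorder(DATA, a))

# Next level: repeat on the Li-ion branch with the remaining attributes.
li_ion_rows = [row for row in DATA if row["battery"] == "Li-ion"]
remaining = [a for a in attributes if a != root]
second = min(remaining, key=lambda a: average_disorder(li_ion_rows, a))

print(root, second)  # battery price
```

Unlike counting homogeneous subsets, this score still ranks the tests sensibly when no subset is perfectly clean, because a nearly pure subset adds only a little disorder.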
Identification trees are robust to irrelevant attributes (i.e. noise) and are fast at prediction. On the internet you can find plenty of research and articles about decision trees and their real-world applications.
Suggested literature and useful sources:
1. “Learning: Identification Trees, Disorder”, MIT OpenCourseWare