Decision Tree

Decision trees are widely used for classification and regression tasks. The basic intuition behind a decision tree is to map out all possible decision paths in the form of a tree. The tree is built by splitting the training data set into distinct nodes, where a node may contain all of the data or just a single category of it. Splitting amounts to testing an attribute and branching the cases based on the results of the test: each internal node corresponds to a test, each branch corresponds to an outcome of that test, and each leaf node assigns a classification.
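As a toy sketch of this structure (the attributes outlook and humidity and the threshold 70 are invented purely for illustration), a tiny decision tree is just a set of nested attribute tests:

def classify(outlook, humidity):
    # Root node: test the 'outlook' attribute.
    if outlook == "sunny":
        # Internal node: test the 'humidity' attribute.
        if humidity > 70:
            return "don't play"  # leaf node: classification
        return "play"            # leaf node: classification
    return "play"                # leaf node: classification

print(classify("sunny", 85))  # don't play
print(classify("rainy", 40))  # play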
A decision tree is thus a structure that allows us to split the dataset into branches and then make simple decisions at each level, arriving at the final decision by walking down the tree. Decision trees are produced by training algorithms that identify how to split the data in the best possible way. Any decision process starts at the root node at the top of the tree, and each internal node is essentially a decision rule. The training algorithm constructs these rules from the relationship between the input data and the target labels in the training data, so that the values in the input data can be used to estimate the value of the output.
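As a minimal sketch of training and querying such a tree, here is one common implementation, scikit-learn's DecisionTreeClassifier, run on a made-up toy dataset:

from sklearn.tree import DecisionTreeClassifier

# Made-up training data: each row is [age, income]; the label says
# whether the person bought the product (1) or not (0).
X = [[25, 30000], [35, 60000], [45, 80000],
     [20, 20000], [50, 90000], [30, 40000]]
y = [0, 1, 1, 0, 1, 0]

# The training algorithm searches, at each node, for the attribute and
# threshold that best separate the labels, building the rules top-down.
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, y)

# Prediction walks the new point down the tree from root to leaf.
print(clf.predict([[40, 70000]]))  # e.g. [1]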
Now that we understand the basic concept of decision trees, the next step is to understand how the trees are constructed automatically. We need algorithms that can build the optimal tree from our data, and to understand how they do this, we first need the concept of entropy.
In this context, Entropy refers to information entropy and not thermodynamic
entropy. Entropy is basically a measure of uncertainty. One of the main goals
of a decision tree is to reduce uncertainty as we move from the root node
towards the leaf nodes. When we see an unknown data point, we are completely
uncertain about the output. By the time we reach the leaf node, we are certain
about the output. This means we need to construct the decision tree so that uncertainty, and hence entropy, is reduced at each level as we progress down the tree.
Entropy is defined by the formula

Entropy = −Σᵢ p(Aᵢ) log p(Aᵢ)

where the sum runs over every value of i and p(Aᵢ) is the probability of the value Aᵢ.
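A quick sketch of this formula in plain Python (the label lists are made up, and log base 2 is a common choice, giving entropy in bits):

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy = -sum over i of p(A_i) * log2(p(A_i)),
    # where p(A_i) is the relative frequency of value A_i.
    n = len(labels)
    return -sum((count / n) * log2(count / n)
                for count in Counter(labels).values())

print(entropy(["yes", "no", "yes", "no"]))    # 1.0  -> maximum uncertainty
print(entropy(["yes", "yes", "yes", "yes"]))  # -0.0 -> zero uncertainty

Note how a node whose labels are evenly split has high entropy, while a node containing a single category has zero entropy; a good split moves us from the first situation toward the second.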