
Decision Tree: Theory


Decision Tree: Decision trees are widely used for classification and regression tasks. The basic intuition behind a decision tree is to map out all possible decision paths in the form of a tree. The tree is built by repeatedly splitting the training data set into distinct nodes; the root node holds all of the data, and each split ideally moves us towards nodes that contain only one category of data. At each step, the tree tests an attribute and branches the cases based on the result of that test: each internal node corresponds to a test, each branch corresponds to one outcome of the test, and each leaf node assigns a classification.
A decision tree is a structure that allows us to split the dataset into branches and make a simple decision at each level, so that we arrive at the final decision by walking down the tree. Decision trees are produced by training algorithms that identify how to split the data in the best possible way. Any decision process starts at the root node at the top of the tree, and each node in the tree is essentially a decision rule. The training algorithm constructs these rules from the relationship between the input data and the target labels in the training data; the values in a new input are then used to estimate the value of its output, as the sketch below illustrates.
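As a concrete illustration of this training-and-walking process, here is a minimal sketch using scikit-learn (my choice of library, not something named in this post) to train a small tree on the classic Iris dataset and print its decision rules:

# A hedged sketch, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# The training algorithm picks, at each node, the attribute test that
# splits the data best (criterion="entropy" uses information entropy).
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)

# Each printed line is one decision rule; predicting for a new sample
# means walking from the root to a leaf by answering these tests.
print(export_text(clf, feature_names=load_iris().feature_names))
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))  # one sample -> one leaf -> one class

The max_depth=3 limit here is purely for readability of the printed rules; in practice it is a tuning parameter.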
Now that we understand the basic concept of decision trees, the next question is how the trees are constructed automatically. We need algorithms that can build the optimal tree from our data, and to understand them we first need the concept of entropy. In this context, entropy refers to information entropy, not thermodynamic entropy. Entropy is essentially a measure of uncertainty. One of the main goals of a decision tree is to reduce uncertainty as we move from the root node towards the leaf nodes: when we see an unknown data point, we are completely uncertain about its output, but by the time we reach a leaf node we should be certain about it. This means we need to construct the decision tree so that the uncertainty, that is, the entropy, decreases at each level as we progress down the tree.

Entropy is defined by the formula

H = − Σᵢ p(Aᵢ) log₂ p(Aᵢ)

where the sum runs over every class Aᵢ and p(Aᵢ) is the proportion of samples belonging to class Aᵢ (the base-2 logarithm, commonly used here, gives entropy in bits).
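To make the formula concrete, here is a short Python sketch (standard library only) that computes entropy from a list of class labels and shows how a split reduces it. The gain calculation at the end is the standard information-gain criterion; the labels and split are made-up examples:

import math
from collections import Counter

def entropy(labels):
    """Information entropy H = -sum(p_i * log2(p_i)) over the classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# A pure node has no uncertainty; a 50/50 split has the maximum of 1 bit.
print(entropy(["yes", "yes", "yes", "yes"]))   # -0.0 (i.e. zero bits)
print(entropy(["yes", "yes", "no", "no"]))     # 1.0

# Information gain of a candidate split = parent entropy minus the
# size-weighted entropy of the children; the tree picks the attribute
# test with the largest gain, i.e. the largest drop in uncertainty.
parent = ["yes", "yes", "yes", "no", "no", "no"]
left, right = ["yes", "yes", "yes"], ["no", "no", "no"]
gain = entropy(parent) - (len(left) / len(parent)) * entropy(left) \
                       - (len(right) / len(parent)) * entropy(right)
print(gain)  # 1.0: this split removes all uncertainty

This is exactly the sense in which a good split "reduces entropy": the weighted entropy of the child nodes is lower than the entropy of the parent.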
