
Ensembles of Decision Trees: Random Forests

The main drawback of decision trees is that they tend to overfit the training data. Random forests are one way to address this problem. A random forest is essentially a collection of decision trees, where each tree is slightly different from the others. The idea behind random forests is that each tree might do a relatively good job of predicting, but will likely overfit on part of the data. If we build many trees, all of which work well but overfit in different ways, we can reduce the amount of overfitting by averaging their results.

To implement this strategy, we need to build many decision trees. Each tree should do an acceptable job of predicting the target, and should also be different from the other trees. Random forests get their name from injecting randomness into the tree building to ensure each tree is different. There are two ways in which the trees in a random forest are randomized: by selecting the data points used to build each tree and by selecting the features considered in each split test. A short example of both follows below.
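As a minimal sketch, the two sources of randomness map directly onto parameters of scikit-learn's RandomForestClassifier: bootstrap controls the resampling of data points for each tree, and max_features controls how many features each split test considers. The two-moons dataset and the specific parameter values below are illustrative choices, not prescriptions from the text.

# A minimal sketch of building a random forest with scikit-learn.
# The dataset and parameter values are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=100, noise=0.25, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# n_estimators: how many randomized trees to build and average.
# bootstrap=True: each tree is trained on a bootstrap sample of the data points.
# max_features: how many features are considered at each split test.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

print("Accuracy on training set: {:.3f}".format(forest.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(forest.score(X_test, y_test)))

Lowering max_features makes the individual trees more different from one another (more randomness per split), while a larger value makes each tree closer to an ordinary decision tree fit on its bootstrap sample.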

