Random Forests (RF) is one of the most popular machine learning classifiers due to its capability to obtain good classification results with a relatively reduced number of training samples and because it relies on a reduced number of user-defined parameters.
In addition, RF can easily address the Hughes phenomenon due to the two randomness dimensions: first, it randomly selects samples to create a user-defined number of decision trees and, second, a randomly user-defined number of variables are used for splitting the decision tree nodes.
In this lecture, we will be introducing the concept of ensemble classifier by emphasizing the main methods used to build an ensemble: bootstrapping and bagging. We will explain how RF classifier is built and how the final decision is taken. The concept of out-of-the-bag will be explained in detail especially in connection with its role for internal evaluation of the RF accuracy. We will also explain how RF calculates the variable importance, a measure that can be used to discard the less relevant input variables.
In the end, special attention is given to the proximity measure used to identify either outliers or subclasses in the training samples.