Classification: Random Forests

Classification: Random Forests 2020-11-30T12:39:35+00:00

Product Description

Random Forests (RF) is one of the most popular machine learning classifiers because it achieves good classification results with a relatively small number of training samples and relies on only a few user-defined parameters.

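In addition, RF can readily handle the Hughes phenomenon (the loss of classification accuracy that occurs when the number of input variables is large relative to the number of training samples) thanks to two sources of randomness: first, it randomly selects training samples to build a user-defined number of decision trees and, second, it uses a user-defined number of randomly selected variables to split each decision tree node.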
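The two sources of randomness can be illustrated with a minimal sketch in plain Python. The function names `bootstrap_indices` and `random_feature_subset`, the parameter name `m_try`, and the toy sizes are assumptions made for illustration only; they are not taken from the lecture material.

```python
import random

def bootstrap_indices(n_samples, rng):
    """First source of randomness: draw n_samples indices with replacement."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]

def random_feature_subset(n_features, m_try, rng):
    """Second source of randomness: pick m_try distinct candidate variables for a split."""
    return sorted(rng.sample(range(n_features), m_try))

rng = random.Random(0)
# Hypothetical toy sizes: 10 training samples, 6 input variables, 3 trees,
# 2 candidate variables per node split.
n_samples, n_features, n_trees, m_try = 10, 6, 3, 2

for t in range(n_trees):
    in_bag = bootstrap_indices(n_samples, rng)
    # Samples never drawn for this tree are its out-of-bag samples.
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    candidates = random_feature_subset(n_features, m_try, rng)
    print(f"tree {t}: in-bag={sorted(in_bag)} "
          f"out-of-bag={out_of_bag} split candidates={candidates}")
```

Because each bootstrap sample is drawn with replacement, some training samples appear several times in a tree's sample and others not at all, so every tree sees a slightly different view of the data.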

In this lecture, we introduce the concept of an ensemble classifier, emphasizing the main methods used to build an ensemble: bootstrapping and bagging. We explain how the RF classifier is built and how the final decision is made. The out-of-bag (OOB) concept is explained in detail, especially its role in the internal evaluation of RF accuracy. We also explain how RF calculates variable importance, a measure that can be used to discard the less relevant input variables.
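As a rough sketch of these ideas, the following plain-Python example bags one-split trees (decision stumps), combines them by majority vote, estimates the error from out-of-bag samples, and measures a permutation-based variable importance. It is a simplification under stated assumptions, not the lecture's implementation: real RF grows full trees and draws a new variable subset at every node, and the toy data, labels, and all function names here are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """Fit a one-split tree: best (feature, threshold) among `features` by training errors."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = (sum(lab != left_lab for lab in left)
                      + sum(lab != right_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, thr, left_lab, right_lab)
    return best[1:]

def predict_stump(stump, row):
    f, thr, left_lab, right_lab = stump
    return left_lab if row[f] <= thr else right_lab

def fit_forest(X, y, n_trees, m_try, rng):
    """Bagging: each stump is trained on its own bootstrap sample and variable subset."""
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]
        features = rng.sample(range(n_features), m_try)
        stump = fit_stump([X[i] for i in in_bag], [y[i] for i in in_bag], features)
        forest.append((stump, set(in_bag)))
    return forest

def predict(forest, row):
    """Final decision: majority vote over all trees."""
    return Counter(predict_stump(s, row) for s, _ in forest).most_common(1)[0][0]

def oob_error(forest, X, y):
    """Each sample is judged only by the trees that did not train on it."""
    wrong = scored = 0
    for i, (row, lab) in enumerate(zip(X, y)):
        votes = Counter(predict_stump(s, row) for s, in_bag in forest if i not in in_bag)
        if votes:
            scored += 1
            wrong += votes.most_common(1)[0][0] != lab
    return wrong / scored if scored else float("nan")

def permutation_importance(forest, X, y, feature, rng):
    """Mean drop in per-tree OOB accuracy after shuffling one variable's values."""
    drops = []
    for stump, in_bag in forest:
        oob = [i for i in range(len(X)) if i not in in_bag]
        if not oob:
            continue
        shuffled = [X[i][feature] for i in oob]
        rng.shuffle(shuffled)
        base = sum(predict_stump(stump, X[i]) == y[i] for i in oob)
        perm = 0
        for i, value in zip(oob, shuffled):
            row = list(X[i])
            row[feature] = value
            perm += predict_stump(stump, row) == y[i]
        drops.append((base - perm) / len(oob))
    return sum(drops) / len(drops) if drops else float("nan")

# Hypothetical toy data: variable 0 separates the classes, variable 1 is noise.
X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.4, 2], [0.6, 3], [0.7, 5], [0.8, 1], [0.9, 4]]
y = ["water"] * 4 + ["forest"] * 4

rng = random.Random(0)
forest = fit_forest(X, y, n_trees=25, m_try=1, rng=rng)
print("prediction for [0.15, 3]:", predict(forest, [0.15, 3]))
print("OOB error:", oob_error(forest, X, y))
for f in range(2):
    print(f"importance of variable {f}:", permutation_importance(forest, X, y, f, rng))
```

The OOB error needs no separate test set because every sample is left out of roughly a third of the bootstrap samples. A variable whose shuffling barely changes the OOB accuracy is a candidate for removal.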

Finally, special attention is given to the proximity measure, which is used to identify outliers or subclasses in the training samples.
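The proximity idea can be sketched as follows, assuming that the proximity of two samples is the fraction of trees in which they end up in the same terminal node. The input `leaves` is a hypothetical table of terminal-node ids per sample and tree, and the outlier score is a simplified raw score (number of samples divided by the sum of squared proximities to the other samples of the same class) without any further normalisation.

```python
def proximity_matrix(leaves):
    """leaves[i][t] is the terminal node that sample i reaches in tree t."""
    n, n_trees = len(leaves), len(leaves[0])
    return [[sum(a == b for a, b in zip(leaves[i], leaves[j])) / n_trees
             for j in range(n)]
            for i in range(n)]

def outlier_scores(prox, labels):
    """Raw outlier score per sample; large values mean low proximity to its own class."""
    n = len(labels)
    scores = []
    for i in range(n):
        total = sum(prox[i][j] ** 2
                    for j in range(n)
                    if j != i and labels[j] == labels[i])
        scores.append(n / total if total > 0 else float("inf"))
    return scores

# Hypothetical terminal-node ids for 4 training samples in 3 trees.
leaves = [
    [1, 4, 2],
    [1, 4, 2],
    [1, 4, 3],
    [5, 7, 9],  # never shares a terminal node with the other samples
]
labels = ["crop", "crop", "crop", "crop"]

prox = proximity_matrix(leaves)
scores = outlier_scores(prox, labels)
for i, score in enumerate(scores):
    print(f"sample {i}: outlier score = {score}")
```

Samples with a high score rarely share terminal nodes with the rest of their class, which makes them candidates for outliers. Groups of samples with high mutual proximity inside one class may point to subclasses.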

Learning outcomes

  • Define the concept of ‘ensemble classifier’.

  • Explain the concepts of bagging and bootstrapping.

  • Describe how variable importance is calculated in the Random Forest classifier.

  • Explain the role of the proximity measure in removing outliers from the training sample set.

  • List the main advantages and disadvantages of the RF classifier.

BoK concepts

Links to concepts from the EO4GEO Body of Knowledge used in this course:

Material preview

Ownership

Designed and developed by: Mariana Belgiu, University of Twente.

Contributor: Anurag Kulshrestha.

License: Creative Commons Attribution-ShareAlike.

Education level

Language

English

Access

Find below a direct link to the HTML presentation.

access the presentation

Find below a link to the GitHub repository where you can download the presentation.

download the presentation