Classification: Decision Trees

Decision Trees (DT) are non-parametric classifiers that have gained popularity in many domains because their structure is explicit and easy to interpret.

A DT is built by recursively splitting each tree node according to a statistical criterion such as Gini impurity or information gain (for classification problems), or variance reduction (for regression problems).
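As a rough illustration of the two classification criteria named above, the following minimal sketch computes Gini impurity, entropy, and information gain for one candidate split. The function names and the toy land-cover labels are assumptions made for this example; they are not taken from the lecture material.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_gini(children):
    """Size-weighted average Gini impurity of the child nodes of a split."""
    n = sum(len(ch) for ch in children)
    return sum(len(ch) / n * gini(ch) for ch in children)


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted


# Hypothetical toy labels at one node (illustration only).
parent = ["water"] * 4 + ["forest"] * 4

# Split A separates the classes perfectly; split B leaves both children mixed.
split_a = [["water"] * 4, ["forest"] * 4]
split_b = [["water"] * 3 + ["forest"], ["water"] + ["forest"] * 3]

print(gini(parent))                       # 0.5
print(information_gain(parent, split_a))  # 1.0
print(weighted_gini(split_b))             # 0.375
```

Under either criterion split A would be preferred: it has the lower weighted Gini impurity and the higher information gain.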

This lecture introduces the main steps required to build a DT. Two procedures for selecting the best variables for splitting the tree nodes, Gini impurity and information gain, are discussed with the help of a practical example. The second part of the lecture explains the concepts of overfitting and underfitting and introduces two ways to build an optimal DT, i.e. a DT that does not overfit: (1) stop growing the DT early, before it overfits, and (2) prune the tree, i.e. reduce its size.
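The first of these two solutions, stopping growth early, can be sketched with a small recursive tree builder. This is a minimal sketch under stated assumptions: it handles a single numeric feature, uses weighted Gini impurity to pick thresholds, and the names `build`, `max_depth`, and `min_samples` as well as the toy data are hypothetical choices for this example. Pruning an already grown tree is not covered here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (threshold, weighted Gini) of the best split x <= t, or None."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best


def build(xs, ys, depth=0, max_depth=None, min_samples=2):
    """Grow a tree recursively; max_depth and min_samples stop growth early."""
    majority = Counter(ys).most_common(1)[0][0]
    # Stop if the node is pure, the depth limit is reached, or too few samples remain.
    if (gini(ys) == 0.0
            or (max_depth is not None and depth >= max_depth)
            or len(ys) < min_samples):
        return {"leaf": majority}
    split = best_threshold(xs, ys)
    if split is None:  # all feature values identical, nothing to split on
        return {"leaf": majority}
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build([x for x, _ in left], [y for _, y in left],
                      depth + 1, max_depth, min_samples),
        "right": build([x for x, _ in right], [y for _, y in right],
                       depth + 1, max_depth, min_samples),
    }


def count_leaves(node):
    """Number of leaf nodes, used here as a simple measure of tree size."""
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


# Hypothetical toy data: one feature, two classes, with two overlapping samples.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "a", "b", "a", "b", "b", "b"]

print(count_leaves(build(xs, ys)))               # 4
print(count_leaves(build(xs, ys, max_depth=1)))  # 2
```

The unrestricted tree keeps splitting until every leaf is pure, which here means isolating the two overlapping samples. With the depth limit of 1 the tree stays at two leaves and accepts one misclassified training sample, which is the trade-off early stopping makes.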

The lecture ends by listing the main advantages and disadvantages of DTs.

Learning outcomes

Explain how information gain and Gini impurity are calculated.
Present the main advantages and disadvantages of the decision tree classifier.
Describe the concepts of overfitting and underfitting.
Define the main solutions that can be applied to avoid overfitting in decision trees.

BoK concepts

Links to concepts from the EO4GEO Body of Knowledge used in this course:

- AM | Analytical Methods
  - AM10 | Data mining
    - AM10-2 | Data mining approaches
- IP | Image processing and analysis
  - IP3 | Image understanding
    - IP3-4 | Image classification
      - IP3-4-5 | Decision trees
      - IP3-4-7 | Machine learning
      - IP3-4-9 | Sampling strategies

Ownership

Designed and developed by: Mariana Belgiu, University of Twente.

License: Creative Commons Attribution-ShareAlike.

Education level

EQF 6

Language

English

Creation date

2020-06-20

Access

Find below a direct link to the HTML presentation.

access the presentation

Find below a link to the GitHub repository where you can download the presentation.

download the presentation
