Forest Cover Type Classifier
Forest cover generally refers to the relative (in percent) or absolute (in square kilometres/square miles) land area that is covered by forests. According to the Food and Agriculture Organization, a forest is defined as land spanning more than 0.5 hectares with trees higher than 5 meters and a canopy cover of more than 10 percent, or trees able to reach these thresholds in situ; it does not include land that is predominantly under agricultural or urban land use. Forest cover is one category of terrestrial land cover, where land cover refers to the observed physical features, both natural and man-made, that occupy the earth's immediate surface. At the Landsat pixel scale (30 m × 30 m spatial resolution), forest cover is defined as 25% or greater canopy closure for trees taller than 5 m.
Objective
Build a machine learning model to predict the forest cover type.
Code and Resources Used
Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, xgboost, lightgbm
Data Source: Blackard, Jock (1998). Covertype. UCI Machine Learning Repository.
Data Link: https://doi.org/10.24432/C50K5N
Data Dictionary
Elevation: Elevation in meters
Aspect: Aspect in degrees azimuth
Slope: Slope in degrees
Horizontal_Distance_To_Hydrology: Horz Dist to nearest surface water features
Vertical_Distance_To_Hydrology: Vert Dist to nearest surface water features
Horizontal_Distance_To_Roadways: Horz Dist to nearest roadway
Hillshade_9am: Hillshade index at 9am, summer solstice
Hillshade_Noon: Hillshade index at noon, summer solstice
Hillshade_3pm: Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points: Horz Dist to nearest wildfire ignition points
Wilderness_Area (4 binary columns): Wilderness area designation
Soil_Type (40 binary columns): Soil Type designation
Cover_Type (7 types): Forest Cover Type designation
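As a minimal loading sketch (an assumption on my part, since the original project may instead read the CSV from the UCI link above), scikit-learn's bundled fetcher provides the same dataset with the 54 feature columns and the Cover_Type target described in the dictionary:

```python
# Minimal sketch: load the Covertype dataset via scikit-learn's bundled fetcher.
# The original project may instead download the data from the UCI link above.
from sklearn.datasets import fetch_covtype

covtype = fetch_covtype(as_frame=True)   # features and target as pandas objects
X, y = covtype.data, covtype.target      # 54 feature columns, Cover_Type labels 1-7

print(X.shape)           # (581012, 54)
print(y.value_counts())  # class distribution across the 7 cover types
```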
Model Building
Split the data into train and test sets with a test size of 20%.
I tried five different models and evaluated them using accuracy; a sketch of this setup follows the list of models below.
Models:
- K Nearest Neighbour Classifier (K = 5)
- Decision Tree Classifier
- Gradient Boosting Classifier
- Random Forest Classifier
- XGBoost Classifier
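The snippet below is a sketch of the split-and-compare loop, assuming scikit-learn defaults for all hyperparameters apart from K = 5; the original notebook's exact settings and random seeds are not reproduced here.

```python
# Sketch of the 80/20 split and the five-model comparison (assumed hyperparameters).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# 80/20 train/test split, stratified to preserve the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "K Nearest Neighbour Classifier (K = 5)": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Gradient Boosting Classifier": GradientBoostingClassifier(random_state=42),
    "Random Forest Classifier": RandomForestClassifier(random_state=42),
    "XGBoost Classifier": XGBClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    if "XGBoost" in name:
        # XGBoost expects zero-based labels, so shift Cover_Type from 1-7 to 0-6
        model.fit(X_train, y_train - 1)
        preds = model.predict(X_test) + 1
    else:
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
    results[name] = accuracy_score(y_test, preds)
```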
Model performance
The Random Forest model outperformed the other approaches on the held-out test set.
Model | Accuracy
---|---
Random Forest Classifier | 96.20%
K Nearest Neighbour Classifier (K = 5) | 95.11%
Decision Tree Classifier | 93.33%
XGBoost Classifier | 92.46%
Gradient Boosting Classifier | 81.64%
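Continuing the sketch above, the `results` dictionary (an assumed name, not taken from the original code) can be collected into a ranked comparison table like the one shown:

```python
# Assemble the per-model accuracies into a ranked table (continues the sketch above).
import pandas as pd

summary = (
    pd.DataFrame(list(results.items()), columns=["Model", "Accuracy"])
    .sort_values("Accuracy", ascending=False)
    .reset_index(drop=True)
)
summary["Accuracy"] = summary["Accuracy"].map("{:.2%}".format)
print(summary)
```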