Forest Cover Type Classifier

Forest cover generally refers to the relative (in percent) or absolute (in square kilometres or square miles) land area that is covered by forests. According to the Food and Agriculture Organization, a forest is defined as land spanning more than 0.5 hectares with trees higher than 5 meters and a canopy cover of more than 10 percent, or trees able to reach these thresholds in situ. It does not include land that is predominantly under agricultural or urban land use. Forest cover is one category of terrestrial land cover. Land cover is the observed physical features, both natural and manmade, that occupy the earth’s immediate surface … forest cover is defined as 25% or greater canopy closure at the Landsat pixel scale (30-m × 30-m spatial resolution) for trees >5 m in height.

Objective

Build a machine learning model to predict the forest cover type.

Code and Resources Used

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, xgboost, lightgbm
Data Source: Blackard, Jock. (1998). Covertype. UCI Machine Learning Repository.
Data Link: https://doi.org/10.24432/C50K5N

Data Dictionary

Elevation: Elevation in meters

Aspect: Aspect in degrees azimuth

Slope: Slope in degrees

Horizontal_Distance_To_Hydrology: Horizontal distance to the nearest surface water feature

Vertical_Distance_To_Hydrology: Vertical distance to the nearest surface water feature

Horizontal_Distance_To_Roadways: Horizontal distance to the nearest roadway

Hillshade_9am: Hillshade index at 9am, summer solstice

Hillshade_Noon: Hillshade index at noon, summer solstice

Hillshade_3pm: Hillshade index at 3pm, summer solstice

Horizontal_Distance_To_Fire_Points: Horizontal distance to the nearest wildfire ignition point

Wilderness_Area (4 binary columns): Wilderness area designation

Soil_Type (40 binary columns): Soil Type designation

Cover_Type (7 types): Forest Cover Type designation
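The data dictionary above implies 55 columns in total: 10 quantitative features, 4 wilderness-area flags, 40 soil-type flags, and the target. As a rough sketch, the column names can be generated rather than typed out. The numbered names (`Wilderness_Area1`, `Soil_Type1`, and so on), the column order, and the file name in the final comment are assumptions for illustration, not something stated in this post.

```python
# Quantitative features, in the order listed in the data dictionary.
QUANTITATIVE = [
    "Elevation",
    "Aspect",
    "Slope",
    "Horizontal_Distance_To_Hydrology",
    "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am",
    "Hillshade_Noon",
    "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]


def covertype_columns():
    """Return the assumed column names for the Covertype table."""
    # Hypothetical naming scheme for the one-hot (binary) columns.
    wilderness = [f"Wilderness_Area{i}" for i in range(1, 5)]
    soil = [f"Soil_Type{i}" for i in range(1, 41)]
    return QUANTITATIVE + wilderness + soil + ["Cover_Type"]


columns = covertype_columns()
print(len(columns))

# With pandas, a headerless copy of the data (file name assumed) could be read as:
#   df = pd.read_csv("covtype.data", header=None, names=columns)
```

Generating the binary column names this way keeps the 4 + 40 one-hot columns consistent and makes it easy to select them as a group later.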

Model Building

The data was split into training and test sets, with 20% of the rows held out for testing.
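An 80/20 split like this one can be sketched with scikit-learn's `train_test_split`. The synthetic arrays below stand in for the real features and labels, and the `stratify` and `random_state` arguments are assumptions added for reproducibility and class balance; the post does not say whether they were used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 54 features, labels 1..7.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 54))
y = rng.integers(1, 8, size=1000)

# Hold out 20% of the rows for testing.
# stratify=y keeps class proportions similar in both parts (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```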

I tried five different models and evaluated them using Accuracy.

Models:

  • K Nearest Neighbour Classifier (K = 5)
  • Decision Tree Classifier
  • Gradient Boosting Classifier
  • Random Forest Classifier
  • XGBoost Classifier
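A comparison of this kind could look roughly like the sketch below. It is not the code behind the results in this post: it runs on a small synthetic dataset, uses default hyperparameters apart from K = 5, and the helper names `build_models` and `evaluate` are made up for illustration. XGBoost is treated as optional so the sketch still runs where it is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return the candidate classifiers, keyed by display name."""
    models = {
        "K Nearest Neighbour Classifier (K = 5)": KNeighborsClassifier(n_neighbors=5),
        "Decision Tree Classifier": DecisionTreeClassifier(random_state=seed),
        "Gradient Boosting Classifier": GradientBoostingClassifier(random_state=seed),
        "Random Forest Classifier": RandomForestClassifier(random_state=seed),
    }
    try:
        # Optional dependency. Recent XGBoost versions expect labels 0..n-1,
        # so Cover_Type (1..7) would likely need shifting or encoding first.
        from xgboost import XGBClassifier

        models["XGBoost Classifier"] = XGBClassifier(random_state=seed)
    except Exception:  # missing package or broken native install
        pass
    return models


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model and return test accuracies, best first."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Small synthetic problem standing in for the real data.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scores = evaluate(build_models(), X_train, X_test, y_train, y_test)
for name, acc in scores.items():
    print(f"{name}: {acc:.2%}")
```

Accuracy here is the fraction of test rows whose predicted class matches the true class. Scores on the synthetic data will not resemble the figures reported for the real dataset.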

Model performance

The Random Forest model outperformed the other approaches on the test and validation sets, reaching 96.20% accuracy.

| Model | Accuracy |
|---|---|
| Random Forest Classifier | 96.20% |
| K Nearest Neighbour Classifier (K = 5) | 95.11% |
| Decision Tree Classifier | 93.33% |
| XGBoost Classifier | 92.46% |
| Gradient Boosting Classifier | 81.64% |
This post is licensed under CC BY 4.0 by the author.