Forest Cover Type Classifier
Forest cover generally refers to the relative (in percent) or absolute (in square kilometres/square miles) land area that is covered by forests. According to the Food and Agriculture Organization, a forest is defined as land spanning more than 0.5 hectares with trees higher than 5 meters and a canopy cover of more than 10 percent, or trees able to reach these thresholds in situ; it does not include land that is predominantly under agricultural or urban land use. Forest cover is one category of terrestrial land cover, where land cover refers to the observed physical features, both natural and man-made, that occupy the earth's immediate surface. At the Landsat pixel scale (30 m × 30 m spatial resolution), forest cover is defined as 25% or greater canopy closure for trees taller than 5 m.
Objective
Build a machine learning model to predict the forest cover type.
Code and Resources Used
Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, xgboost, lightgbm
Data Source: Blackard, Jock (1998). Covertype. UCI Machine Learning Repository.
Data Link: https://doi.org/10.24432/C50K5N
Data Dictionary
Elevation: Elevation in meters
Aspect: Aspect in degrees azimuth
Slope: Slope in degrees
Horizontal_Distance_To_Hydrology: Horz Dist to nearest surface water features
Vertical_Distance_To_Hydrology: Vert Dist to nearest surface water features
Horizontal_Distance_To_Roadways: Horz Dist to nearest roadway
Hillshade_9am: Hillshade index at 9am, summer solstice
Hillshade_Noon: Hillshade index at noon, summer solstice
Hillshade_3pm: Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points: Horz Dist to nearest wildfire ignition points
Wilderness_Area (4 binary columns): Wilderness area designation
Soil_Type (40 binary columns): Soil Type designation
Cover_Type (7 types): Forest Cover Type designation
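As a minimal loading sketch (an assumption on my part, since the original project may instead read the CSV from the UCI link above), scikit-learn's bundled fetcher provides the same dataset with the 54 feature columns and the Cover_Type target described in the dictionary:

```python
# Minimal sketch: load the Covertype dataset via scikit-learn's bundled fetcher.
# The original project may instead download the data from the UCI link above.
from sklearn.datasets import fetch_covtype

covtype = fetch_covtype(as_frame=True)   # features and target as pandas objects
X, y = covtype.data, covtype.target      # 54 feature columns, Cover_Type labels 1-7

print(X.shape)           # (581012, 54)
print(y.value_counts())  # class distribution across the 7 cover types
```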
Model Building
Split the data into train and test sets with a test size of 20%.
I tried five different models and evaluated them using accuracy; a sketch of this setup follows the list of models below.
Models:
- K Nearest Neighbour Classifier (K = 5)
- Decision Tree Classifier
- Gradient Boosting Classifier
- Random Forest Classifier
- XGBoost Classifier
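The snippet below is a sketch of the split-and-compare loop, assuming scikit-learn defaults for all hyperparameters apart from K = 5; the original notebook's exact settings and random seeds are not reproduced here.

```python
# Sketch of the 80/20 split and the five-model comparison (assumed hyperparameters).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# 80/20 train/test split, stratified to preserve the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "K Nearest Neighbour Classifier (K = 5)": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Gradient Boosting Classifier": GradientBoostingClassifier(random_state=42),
    "Random Forest Classifier": RandomForestClassifier(random_state=42),
    "XGBoost Classifier": XGBClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    if "XGBoost" in name:
        # XGBoost expects zero-based labels, so shift Cover_Type from 1-7 to 0-6
        model.fit(X_train, y_train - 1)
        preds = model.predict(X_test) + 1
    else:
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
    results[name] = accuracy_score(y_test, preds)
```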
Model performance
The Random Forest model outperformed the other approaches on the held-out test set.
Model | Accuracy
---|---
Random Forest Classifier | 96.20%
K Nearest Neighbour Classifier (K = 5) | 95.11%
Decision Tree Classifier | 93.33%
XGBoost Classifier | 92.46%
Gradient Boosting Classifier | 81.64%
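Continuing the sketch above, the `results` dictionary (an assumed name, not taken from the original code) can be collected into a ranked comparison table like the one shown:

```python
# Assemble the per-model accuracies into a ranked table (continues the sketch above).
import pandas as pd

summary = (
    pd.DataFrame(list(results.items()), columns=["Model", "Accuracy"])
    .sort_values("Accuracy", ascending=False)
    .reset_index(drop=True)
)
summary["Accuracy"] = summary["Accuracy"].map("{:.2%}".format)
print(summary)
```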