NBA Longevity Classification

The National Basketball Association (NBA) is a professional basketball league in North America. The league is composed of 30 teams (29 in the United States and 1 in Canada) and is one of the four major professional sports leagues in the United States and Canada. It is the premier men’s professional basketball league in the world.

Problem Statement

Career longevity depends on various factors for players in any sport, and NBA rookies are no exception. These factors include the number of games played and other in-game statistics recorded for the player.

Objective

Using machine learning techniques, determine whether a player's career will flourish (i.e., last at least five years).

Code and Resources Used

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, xgboost, lightgbm
Data Source: data.world (https://data.world/exercises/logistic-regression-exercise-1)

Data Dictionary

The values for the given attributes are averaged over all games played by the player.

GP: Games Played

MIN: Minutes Played

PTS: Points per game

FGM: Field goals made

FGA: Field goal attempts

FG%: Field goal percentage

3P Made: 3-pointers made

3PA: 3-point attempts

3P%: 3-point percentage

FTM: Free throws made

FTA: Free throw attempts

FT%: Free throw percentage

OREB: Offensive rebounds

DREB: Defensive rebounds

REB: Rebounds

AST: Assists

STL: Steals

BLK: Blocks

TOV: Turnovers

Target: 0 if career years played < 5, 1 if career years played >= 5

EDA

I looked at the distributions of the data. Below are a few highlights.

Dataset Shape: (1101, 20)

| Name    | dtypes  | Missing | Uniques |
|---------|---------|---------|---------|
| GP      | float64 | 0       | 274     |
| MIN     | float64 | 0       | 514     |
| PTS     | float64 | 0       | 392     |
| FGM     | float64 | 0       | 289     |
| FGA     | float64 | 0       | 366     |
| FG%     | float64 | 0       | 480     |
| 3P Made | float64 | 0       | 132     |
| 3PA     | float64 | 0       | 216     |
| 3P%     | float64 | 0       | 349     |
| FTM     | float64 | 0       | 258     |
| FTA     | float64 | 0       | 285     |
| FT%     | float64 | 0       | 554     |
| OREB    | float64 | 0       | 243     |
| DREB    | float64 | 0       | 281     |
| REB     | float64 | 0       | 306     |
| AST     | float64 | 0       | 279     |
| STL     | float64 | 0       | 219     |
| BLK     | float64 | 0       | 201     |
| TOV     | float64 | 0       | 241     |
| Target  | int64   | 0       | 2       |
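
A summary like the one above (shape, dtypes, missing values, unique counts) can be produced with pandas. This is a minimal sketch; the local file name nba_rookies.csv is an assumption, not part of the post.

```python
import pandas as pd

# Load the rookie statistics (the local file name is an assumption).
df = pd.read_csv("nba_rookies.csv")
print("Dataset Shape:", df.shape)

# Per-column summary: dtype, missing values, and unique value counts.
summary = pd.DataFrame({
    "Name": df.columns,
    "dtypes": df.dtypes.values,
    "Missing": df.isnull().sum().values,
    "Uniques": df.nunique().values,
})
print(summary)
```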

(Figures: feature distributions and the correlation matrix.)
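
The correlation-matrix figure could be reproduced with seaborn; the snippet below is a sketch that assumes the DataFrame df from the previous step.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Heatmap of pairwise correlations between the features.
plt.figure(figsize=(12, 10))
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation matrix")
plt.tight_layout()
plt.show()
```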

Data Pre-Processing

  1. Applied a quantile transformation to the features using scikit-learn's QuantileTransformer, as sketched below.
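
A minimal sketch of this step, assuming the DataFrame df loaded earlier; the output_distribution and random_state arguments are assumptions, not settings stated in the post.

```python
from sklearn.preprocessing import QuantileTransformer

# Separate features and target (column names follow the data dictionary above).
X = df.drop(columns=["Target"])
y = df["Target"]

# Map each feature onto a normal distribution via its empirical quantiles.
qt = QuantileTransformer(output_distribution="normal", random_state=42)
X_transformed = qt.fit_transform(X)
```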

Model Building

Split the data into train and test sets with a test size of 20%.
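
A sketch of the split, using the transformed features from the preprocessing step; the stratify and random_state arguments are assumptions, not stated in the post.

```python
from sklearn.model_selection import train_test_split

# 80/20 train/test split, stratified on the target to preserve class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X_transformed, y, test_size=0.2, stratify=y, random_state=42
)
```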

I tried three different models and evaluated them using accuracy; a sketch of the comparison appears after the list below.

Models:

  • Logistic Regression
  • XGBoost Classifier
  • Light Gradient Boosting Classifier
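
A minimal sketch of the comparison with default hyperparameters (the post does not state the exact settings used):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "XGBoost Classifier": XGBClassifier(eval_metric="logloss"),
    "LGBM Classifier": LGBMClassifier(),
}

# Fit each model on the training set and report its test-set accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")
```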

Model performance

The Light Gradient Boosting model outperformed the other approaches on the test set.

| Model               | Accuracy |
|---------------------|----------|
| LGBM Classifier     | 74.66%   |
| XGBoost Classifier  | 71.04%   |
| Logistic Regression | 70.58%   |

This post is licensed under CC BY 4.0 by the author.