NBA Longevity Classification

The National Basketball Association (NBA) is a professional basketball league in North America. The league is composed of 30 teams (29 in the United States and 1 in Canada) and is one of the four major professional sports leagues in the United States and Canada. It is the premier men’s professional basketball league in the world.

Problem Statement

Career longevity depends on various factors for players in any sport, and NBA rookies are no exception. These factors include the number of games played and other in-game statistics recorded for the player.

Objective

Using machine learning techniques, determine whether a player's career will flourish (i.e., last at least five years).

Code and Resources Used

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, xgboost, lightgbm
Data Source: data.world (https://data.world/exercises/logistic-regression-exercise-1)

Data Dictionary

The values for the given attributes are averaged over all games played by the player.

GP: Games Played

MIN: Minutes Played

PTS: Points per game

FGM: Field goals made

FGA: Field goal attempts

FG%: Field goal percentage

3P Made: 3-pointers made

3PA: 3-point attempts

3P%: 3-point percentage

FTM: Free throws made

FTA: Free throw attempts

FT%: Free throw percentage

OREB: Offensive rebounds

DREB: Defensive rebounds

REB: Rebounds

AST: Assists

STL: Steals

BLK: Blocks

TOV: Turnovers

Target: 0 if career years played < 5, 1 if career years played >= 5

EDA

I looked at the distributions of the data. Below are a few highlights.

Dataset Shape: (1101, 20)

| Name    | dtypes  | Missing | Uniques |
|---------|---------|---------|---------|
| GP      | float64 | 0       | 274     |
| MIN     | float64 | 0       | 514     |
| PTS     | float64 | 0       | 392     |
| FGM     | float64 | 0       | 289     |
| FGA     | float64 | 0       | 366     |
| FG%     | float64 | 0       | 480     |
| 3P Made | float64 | 0       | 132     |
| 3PA     | float64 | 0       | 216     |
| 3P%     | float64 | 0       | 349     |
| FTM     | float64 | 0       | 258     |
| FTA     | float64 | 0       | 285     |
| FT%     | float64 | 0       | 554     |
| OREB    | float64 | 0       | 243     |
| DREB    | float64 | 0       | 281     |
| REB     | float64 | 0       | 306     |
| AST     | float64 | 0       | 279     |
| STL     | float64 | 0       | 219     |
| BLK     | float64 | 0       | 201     |
| TOV     | float64 | 0       | 241     |
| Target  | int64   | 0       | 2       |
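
A summary like the one above (shape, dtypes, missing values, unique counts) can be produced with pandas. This is a minimal sketch; the local file name nba_rookies.csv is an assumption, not part of the post.

```python
import pandas as pd

# Load the rookie statistics (the local file name is an assumption).
df = pd.read_csv("nba_rookies.csv")
print("Dataset Shape:", df.shape)

# Per-column summary: dtype, missing values, and unique value counts.
summary = pd.DataFrame({
    "Name": df.columns,
    "dtypes": df.dtypes.values,
    "Missing": df.isnull().sum().values,
    "Uniques": df.nunique().values,
})
print(summary)
```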

(Figures: feature distributions and the correlation matrix.)
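
The correlation-matrix figure could be reproduced with seaborn; the snippet below is a sketch that assumes the DataFrame df from the previous step.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Heatmap of pairwise correlations between the features.
plt.figure(figsize=(12, 10))
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation matrix")
plt.tight_layout()
plt.show()
```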

Data Pre-Processing

  1. Applied a quantile transformation to the features using scikit-learn's QuantileTransformer, as sketched below.
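
A minimal sketch of this step, assuming the DataFrame df loaded earlier; the output_distribution and random_state arguments are assumptions, not settings stated in the post.

```python
from sklearn.preprocessing import QuantileTransformer

# Separate features and target (column names follow the data dictionary above).
X = df.drop(columns=["Target"])
y = df["Target"]

# Map each feature onto a normal distribution via its empirical quantiles.
qt = QuantileTransformer(output_distribution="normal", random_state=42)
X_transformed = qt.fit_transform(X)
```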

Model Building

Split the data into train and test sets with a test size of 20%.
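
A sketch of the split, using the transformed features from the preprocessing step; the stratify and random_state arguments are assumptions, not stated in the post.

```python
from sklearn.model_selection import train_test_split

# 80/20 train/test split, stratified on the target to preserve class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X_transformed, y, test_size=0.2, stratify=y, random_state=42
)
```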

I tried three different models and evaluated them using accuracy; a sketch of the comparison appears after the list below.

Models:

  • Logistic Regression
  • XGBoost Classifier
  • Light Gradient Boosting Classifier
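
A minimal sketch of the comparison with default hyperparameters (the post does not state the exact settings used):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "XGBoost Classifier": XGBClassifier(eval_metric="logloss"),
    "LGBM Classifier": LGBMClassifier(),
}

# Fit each model on the training set and report its test-set accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")
```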

Model performance

The Light Gradient Boosting model outperformed the other approaches on the test set.

| Model               | Accuracy |
|---------------------|----------|
| LGBM Classifier     | 74.66%   |
| XGBoost Classifier  | 71.04%   |
| Logistic Regression | 70.58%   |

This post is licensed under CC BY 4.0 by the author.