Using ML Clustering Algorithms
To Classify NBA Player Positions

December 2019

Introduction

As more big men extend their shooting range and the small-ball revolution takes hold, the NBA’s traditional five positions are arguably becoming obsolete. Players well over the average point guard height of 6’3″, such as Giannis Antetokounmpo and even LeBron James, are defying old-school conventions by playing point guard/forward, and my goal was to let the machines decide whether or not we still need positional delineations.

Machine learning is a sub-discipline of artificial intelligence and computer science that deals with training models to make predictions based on input data. Clustering is an unsupervised ML method that divides data into smaller groups, or clusters, without relying on predefined labels.

DBSCAN Clustering

The first algorithm, DBSCAN (density-based spatial clustering of applications with noise), groups players based on the density of their various statistics. In a 2D graph, this corresponds to forming clusters from data points that lie close to each other. For the NBA dataset, the model sorted players into 10 clusters based on points, assists, offensive/defensive rebounds, steals, blocks, field goal attempts, field goal percentage, three-point attempts, three-point percentage, free throw attempts, and free throw percentage. The 3D graph to the right shows the different clusters, differentiated by color.
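To make the density idea concrete, here is a minimal pure-Python sketch of DBSCAN. It is not the code behind the results in this post: the parameter names `eps` (neighborhood radius) and `min_samples` (neighbors needed to count as a dense "core" point), the Euclidean distance, and the three-stat toy players are all assumptions chosen for illustration. Note that DBSCAN is not told how many clusters to find; the count falls out of these two settings.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels


# Made-up per-game stat lines (points, assists, rebounds); not real data.
players = [
    (25.0, 8.0, 5.0), (26.0, 7.5, 5.5), (24.5, 8.5, 4.5),  # ball-dominant guards
    (8.0, 1.0, 10.0), (7.5, 1.5, 11.0), (9.0, 1.0, 10.5),  # rebounding bigs
    (40.0, 15.0, 15.0),                                    # one-of-a-kind outlier
]
labels = dbscan(players, eps=2.0, min_samples=2)
print(labels)
```

In this toy run the guards and bigs form two clusters and the outlier is labeled noise (-1), which is roughly how a density-based model can end up isolating superstars.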

Final Positional Clusters (DBSCAN Model)

K-Means Clustering

This algorithm requires the number of clusters as an input (thus the “k” in “k-means”), and I used 10 to maintain consistency with the prior DBSCAN model. The 3D graph to the left shows clusters generally similar to those produced by the DBSCAN algorithm. (Arbitrarily fixing the number of positions that players could be classified into might’ve altered the final results, but this can be ignored given the scope of this project.)
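For readers who want to see what supplying “k” involves, below is a minimal pure-Python sketch of the standard k-means loop: assign each player to the nearest centroid, move each centroid to the mean of its members, and repeat. It is an illustration under assumptions, not the code used for this project; the function name `kmeans`, the random initialization, the fixed `seed`, and the toy stat lines are hypothetical.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=100, seed=0):
    """Return (labels, centroids) after running the basic k-means loop."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen players

    def assign(cents):
        # Assignment step: each point joins its nearest centroid.
        return [min(range(k), key=lambda c: dist(p, cents[c])) for p in points]

    for _ in range(iterations):
        labels = assign(centroids)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster's centroid alone
        if updated == centroids:
            break  # converged: nothing moved
        centroids = updated
    return assign(centroids), centroids


# Made-up per-game stat lines (points, assists, rebounds); not real data.
players = [
    (25.0, 8.0, 5.0), (26.0, 7.5, 5.5), (24.5, 8.5, 4.5),  # ball-dominant guards
    (8.0, 1.0, 10.0), (7.5, 1.5, 11.0), (9.0, 1.0, 10.5),  # rebounding bigs
]
labels, centroids = kmeans(players, k=2)
print(labels)
```

Because every player must land in one of the k groups, there is no "noise" label here, which is one reason a fixed k can push dissimilar players into the same cluster.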

Final Positional Clusters (K-Means Model)

Hierarchical Clustering

Similar to k-means, this algorithm takes the number of clusters as an input, and I used 10 clusters again for the same reason mentioned above. As its name suggests, this model builds a hierarchy among data points and gradually groups them into clusters. Each of the 10 clusters is displayed to the right.
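One common way to build such a hierarchy is bottom-up (agglomerative) clustering: start with every player as a separate cluster and repeatedly merge the two closest clusters until the requested number remains. The post does not say which variant or linkage rule was used, so the sketch below assumes the agglomerative form with average linkage and Euclidean distance; the function name `agglomerative` and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(points, n_clusters):
    """Bottom-up clustering with average linkage; returns one label per point."""
    clusters = [[i] for i in range(len(points))]  # every point starts alone

    def linkage(a, b):
        # Average pairwise distance between the members of clusters a and b.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (x < y) and merge y into x.
        x, y = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters.pop(y)
        clusters[x].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Made-up per-game stat lines (points, assists, rebounds); not real data.
players = [
    (25.0, 8.0, 5.0), (26.0, 7.5, 5.5), (24.5, 8.5, 4.5),  # ball-dominant guards
    (8.0, 1.0, 10.0), (7.5, 1.5, 11.0), (9.0, 1.0, 10.5),  # rebounding bigs
    (40.0, 15.0, 15.0),                                    # one-of-a-kind outlier
]
labels = agglomerative(players, n_clusters=3)
print(labels)
```

With three clusters requested, the outlier is the last point left unmerged and keeps a cluster to itself, which hints at how a hierarchical model can produce single-player clusters.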

Final Positional Clusters (Hierarchical Model)

Conclusion

For the most part, all three models pass the eyeball test and yield reasonable player classifications, but closer examination raises some questions. The DBSCAN model effectively identifies superstars and three-point shooters, but bench players are scattered across more than half a dozen clusters. The k-means model seems to distinguish between starters and bench players, yet it produces six role-player categories with no substantial or apparent differences between them. Lastly, the hierarchical model may be the most accurate, roughly separating players into the five traditional positions while also including more specific clusters such as rebounding machines; that said, devoting two separate clusters to Kevin Love and Scott Machado alone is slightly confusing.

Natural forwards like Nikola Jokić are beginning to play like point guards and tiptoe the line between guards and forwards, and guards like Russell Westbrook and James Harden are surely confusing models by filling the stat sheet in ways unfamiliar for their position. As a result, there are clear disadvantages to each model used above; for example, in the k-means model, prescribing the number of clusters may force the algorithm to group two drastically different players together in order to best fit all the other clusters. Regardless, the high degree of variation across the three models strongly suggests that in the modern NBA, positions simply matter less.