NBA play-by-play data and RAPM in R, plus new career RAPM
Posted: Wed Feb 07, 2024 11:59 am
I recently became aware of several additional guides to working with NBA play-by-play data in R.
Ramiro Bentes put together this guide: https://nbainrstats.netlify.app/post/ad ... play-data/
It's a whole sequence of posts with a lot of good R code.
Building off this, Ahmed Cheema built a couple of versions of RAPM using R:
https://www.thespax.com/nba/calculating ... asketball/
https://www.thespax.com/nba/quantifying ... ince-1997/
One note: the RAPM values there look more compressed (a narrower spread between the best and worst players) than I would expect.
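One common reason RAPM values come out compressed is the size of the ridge penalty (lambda). I don't know what penalty or weighting those posts used, so this is only an illustration of the mechanism, not a diagnosis. It's a minimal sketch assuming plain stint-level weighted ridge regression: home players get +1, away players get -1, the target is the home margin per 100 possessions, and rows are weighted by possessions. The linked guides are in R; this is Python with no packages so it runs anywhere. The players, stints and lambda values are made up.

```python
# Minimal RAPM sketch. Assumption: plain stint-level weighted ridge regression;
# this is NOT the code from the linked posts. Players, stints and lambdas are made up.


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rapm(players, stints, lam):
    """Return {player: rating} from beta = (X'WX + lam*I)^-1 X'Wy.

    Each stint is (home_lineup, away_lineup, home_margin_per_100, possessions).
    Home players get +1, away players get -1, rows are weighted by possessions.
    """
    X, y, w = [], [], []
    for home, away, margin, poss in stints:
        row = [0.0] * len(players)
        for name in home:
            row[players.index(name)] = 1.0
        for name in away:
            row[players.index(name)] = -1.0
        X.append(row)
        y.append(margin)
        w.append(float(poss))
    p = len(players)
    xtwx = [[sum(wk * xk[i] * xk[j] for wk, xk in zip(w, X)) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xtwy = [sum(wk * xk[i] * yk for wk, xk, yk in zip(w, X, y)) for i in range(p)]
    return dict(zip(players, solve(xtwx, xtwy)))


# Made-up toy data: four players, every stint is a 2-on-2 split of the same four.
PLAYERS = ["A", "B", "C", "D"]
STINTS = [
    (("A", "B"), ("C", "D"), 10.0, 20),
    (("A", "C"), ("B", "D"), 4.0, 15),
    (("A", "D"), ("B", "C"), 6.0, 10),
    (("B", "C"), ("A", "D"), -5.0, 12),
]

if __name__ == "__main__":
    for lam in (1.0, 100.0, 1000.0):
        ratings = rapm(PLAYERS, STINTS, lam)
        spread = max(ratings.values()) - min(ratings.values())
        print(f"lambda={lam:g}: spread={spread:.2f}", {k: round(v, 2) for k, v in ratings.items()})
```

As lambda grows, the ratings shrink toward zero, so the same data can look "compressed" or "spread out" depending on that one choice. In this toy every player is on the floor in every stint, so the player columns sum to zero and lambda = 0 would leave the system singular; the penalty is what pins the ratings down at all.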
Also, Jerry Engelmann recently posted some new career RAPM data on Twitter. Career RAPM is tricky because a player's impact changes with age, and a single career number has to absorb all of those aging effects.
https://twitter.com/JerryEngelmann/stat ... 3153179776
And subsequently:
https://twitter.com/JerryEngelmann/stat ... 9741604261
The first one doesn't use aging curves at all, so it will misjudge players at the beginning or end of their careers: the regression still treats them as superstars because that's what they were at their peak.
The second one should be better, but it will still have problems with players who don't follow a standard aging curve.
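I don't know exactly how the second version models aging. One way to put an aging curve into a career RAPM is to add shared age columns to the design matrix, so each player's coefficient is read as his level at a reference age and the regression no longer has to credit a 36-year-old with his peak value. This is a minimal sketch under several assumptions, all hypothetical: a quadratic curve in (age - PEAK_AGE) with PEAK_AGE = 27, a lighter ridge penalty on the two age terms than on the players, Python rather than R, and made-up stints where the same player appears at different ages (different seasons).

```python
# Minimal career-RAPM-with-aging sketch. Assumptions (all hypothetical): a player's
# impact in a stint = career coefficient + a shared quadratic aging curve in
# (age - PEAK_AGE). This is NOT Engelmann's method; data and penalties are made up.

PEAK_AGE = 27.0  # hypothetical centering age for the shared aging curve


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def design_row(players, home, away):
    """Player columns get +1 (home) / -1 (away); the last two columns hold the signed
    sums of (age - PEAK_AGE) and (age - PEAK_AGE)**2 over the players on the floor."""
    row = [0.0] * (len(players) + 2)
    for lineup, sign in ((home, 1.0), (away, -1.0)):
        for name, age in lineup:
            d = age - PEAK_AGE
            row[players.index(name)] += sign
            row[-2] += sign * d
            row[-1] += sign * d * d
    return row


def career_rapm(players, stints, lam_player, lam_age):
    """Weighted ridge fit with a separate penalty for the two aging-curve columns.

    Each stint is (home, away, home_margin_per_100, possessions), where home and away
    are sequences of (player, age_during_stint) pairs. Returns the per-player career
    coefficients and the [linear, quadratic] aging terms.
    """
    X = [design_row(players, home, away) for home, away, _, _ in stints]
    y = [margin for _, _, margin, _ in stints]
    w = [float(poss) for _, _, _, poss in stints]
    p = len(X[0])
    pen = [lam_player] * len(players) + [lam_age, lam_age]
    xtwx = [[sum(wk * xk[i] * xk[j] for wk, xk in zip(w, X)) + (pen[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    xtwy = [sum(wk * xk[i] * yk for wk, xk, yk in zip(w, X, y)) for i in range(p)]
    beta = solve(xtwx, xtwy)
    return dict(zip(players, beta[:len(players)])), beta[len(players):]


# Made-up toy stints spread across several seasons, so each player shows up at different ages.
PLAYERS = ["A", "B", "C", "D"]
STINTS = [
    ((("A", 22), ("B", 30)), (("C", 25), ("D", 34)), 3.0, 20),
    ((("A", 26), ("C", 28)), (("B", 34), ("D", 27)), 8.0, 18),
    ((("A", 31), ("D", 29)), (("B", 24), ("C", 33)), -2.0, 15),
    ((("B", 27), ("C", 22)), (("A", 28), ("D", 36)), 5.0, 12),
    ((("A", 35), ("B", 26)), (("C", 30), ("D", 23)), -6.0, 16),
]

if __name__ == "__main__":
    ratings, aging = career_rapm(PLAYERS, STINTS, lam_player=100.0, lam_age=1.0)
    print("career coefficients at age", PEAK_AGE, {k: round(v, 2) for k, v in ratings.items()})
    print("aging terms (linear, quadratic):", [round(v, 3) for v in aging])
```

Because every player shares the same two aging terms, a player whose career doesn't follow that shape (an early peak, a late bloomer) gets the mismatch folded into his career coefficient or left in the residual, which is the issue with players who don't age along a standard curve.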