NBA play by play data and RAPM in R, plus new career RAPM

DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Post by DSMok1 »

I recently became aware of several additional guides to working with NBA play-by-play data in R.

Ramiro Bentes put together this guide: https://nbainrstats.netlify.app/post/ad ... play-data/

It's a whole sequence of posts with a lot of good R code.

Building off this, Ahmed Cheema built a couple of versions of RAPM using R:
https://www.thespax.com/nba/calculating ... asketball/
https://www.thespax.com/nba/quantifying ... ince-1997/
I note that the RAPM values found there appear to be more compressed than I would expect.

Also, Jerry Engelmann recently posted some new career RAPM data on Twitter. Career RAPM is problematic because a player's true level changes with age, which distorts a single career-long estimate.
https://twitter.com/JerryEngelmann/stat ... 3153179776
And subsequently:
https://twitter.com/JerryEngelmann/stat ... 9741604261
The first one doesn't use aging curves at all, which will cause problems for players at the beginning or end of their careers: the regression still treats them as superstars because they were superstars at their peak.

The second one should be better, but it will also have issues with players who don't age along a standard curve.
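To make the aging issue concrete, here is a minimal sketch of one way an aging curve could enter a career regression: subtract the assumed aging effect of everyone on the floor from each stint's margin, then fit the usual ridge regression on the adjusted margins. This is not necessarily what either tweet does. The quadratic curve, the peak age of 27, the curvature of 0.05, and the function names are all hypothetical placeholders.

```python
def aging_curve(age, peak_age=27.0, curvature=0.05):
    """Hypothetical aging curve: rating relative to peak (always <= 0).

    The quadratic shape and both parameters are assumptions for illustration.
    """
    return -curvature * (age - peak_age) ** 2


def age_adjusted_margin(margin, home_ages, away_ages):
    """Remove the assumed aging effect from a stint's home-minus-away margin."""
    home_effect = sum(aging_curve(a) for a in home_ages)
    away_effect = sum(aging_curve(a) for a in away_ages)
    return margin - (home_effect - away_effect)


# Toy 2-on-2 stint: a 20-year-old on the home side, a 35-year-old on the away side.
adjusted = age_adjusted_margin(5.0, home_ages=[27, 20], away_ages=[27, 35])
print(round(adjusted, 2))
```

A player whose career doesn't follow the assumed curve still gets a biased estimate, which is the remaining weakness noted above.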
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: NBA play by play data and RAPM in R, plus new career RAPM

Post by J.E. »

Death, taxes, and people getting weird (wrong?) results when running RAPM in R
My first guess was that it's just missing a rubber-band effect adjustment. But they're so compressed, it doesn't seem like that would fix the whole issue.
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: NBA play by play data and RAPM in R, plus new career RAPM

Post by DSMok1 »

J.E. wrote: Thu Feb 08, 2024 11:03 am Death, taxes, and people getting weird (wrong?) results when running RAPM in R
My first guess was that it's just missing a rubber-band effect adjustment. But they're so compressed, it doesn't seem like that would fix the whole issue.
Could it just be that the wrong lambda was selected? He indicated that he was using a prior similar to the one I created that you used.
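For anyone who wants to see how much lambda alone can compress the ratings, here is a rough sketch on made-up data (pure Python, no libraries). Everything in it is an assumption for illustration: the eight "true" ratings, the 2-on-2 stints, the noise level, and the three lambda values. It is not the article's code or data.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(X, y, lam):
    """Ridge estimate: solve (X'X + lam * I) beta = X'y."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(xtx, xty)


random.seed(0)
true_rating = [4.0, 2.0, 1.0, 0.0, 0.0, -1.0, -2.0, -4.0]  # made-up values
X, y = [], []
for _ in range(400):
    on_court = random.sample(range(8), 4)
    row = [0.0] * 8
    for player in on_court[:2]:
        row[player] = 1.0   # "home" side
    for player in on_court[2:]:
        row[player] = -1.0  # "away" side
    X.append(row)
    y.append(sum(r * t for r, t in zip(row, true_rating)) + random.gauss(0.0, 10.0))

# Root-mean-square rating for each lambda: larger lambda pulls ratings toward zero.
rms_by_lambda = {}
for lam in (1.0, 100.0, 10000.0):
    beta = ridge(X, y, lam)
    rms_by_lambda[lam] = math.sqrt(sum(b * b for b in beta) / len(beta))
    print(f"lambda={lam:>8}: RMS rating = {rms_by_lambda[lam]:.3f}")
```

The size of the ridge solution always shrinks as lambda grows, so a lambda that is too large for the units of the target would produce exactly this kind of compression.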
nbacouchside
Posts: 151
Joined: Sun Jul 14, 2013 4:58 am

Re: NBA play by play data and RAPM in R, plus new career RAPM

Post by nbacouchside »

J.E. wrote: Thu Feb 08, 2024 11:03 am Death, taxes, and people getting weird (wrong?) results when running RAPM in R
My first guess was that it's just missing a rubber-band effect adjustment. But they're so compressed, it doesn't seem like that would fix the whole issue.
It looks like the lineups tutorial is in R, but the actual lineups were pulled using Python and the RAPM calculation was also done in Python. So we can't blame R this time!

From the spax article:
I began this project by scraping play-by-play data for every regular season and postseason game since 1997. Then I used the ideas in this tutorial (applying it to Python) to get lineup data for each possession in the play-by-play. I was successfully able to do this for every dataset except for the 1997 regular season, which contained a lot of missing information. The data used in final RAPM calculations is almost entirely complete from the 1997 postseason to Game 6 of the 2021 Finals.

In order to address the greater importance of the postseason, I doubled playoff possessions to increase their weight in calculations. At the end of the data collection process, I had compiled 859,049 stints across 5,972,736 possessions.
(emphasis mine)
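On the playoff doubling: in a ridge regression, duplicating the playoff rows is the same as giving those rows a weight of 2. A minimal sketch with a single hypothetical player, where the closed form is easy to check by hand. The stint values, the lambda of 2.0, and the function name are made up, and I don't know which of the two approaches the article actually used.

```python
def ridge_1d(x, y, lam, w=None):
    """One-coefficient weighted ridge: beta = sum(w*x*y) / (sum(w*x*x) + lam)."""
    w = w if w is not None else [1.0] * len(x)
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x)) + lam
    return num / den


# Hypothetical stints: x = +1 if the player is on the home side, -1 if away;
# y = stint point margin; the last two stints are playoff stints.
x = [1.0, -1.0, 1.0, 1.0, -1.0]
y = [3.0, -1.0, 0.0, 6.0, -4.0]
is_playoff = [False, False, False, True, True]

# Option A: literally duplicate the playoff rows.
x_dup = x + [xi for xi, po in zip(x, is_playoff) if po]
y_dup = y + [yi for yi, po in zip(y, is_playoff) if po]
beta_dup = ridge_1d(x_dup, y_dup, lam=2.0)

# Option B: keep one row per stint and give playoff rows weight 2.
weights = [2.0 if po else 1.0 for po in is_playoff]
beta_w = ridge_1d(x, y, lam=2.0, w=weights)

print(beta_dup, beta_w)
```

Note that lambda stays fixed here while the effective amount of data grows, so upweighting also slightly reduces how hard the affected ratings are shrunk.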