Creating an SPM

SkyJuke · Post by **SkyJuke** » Sat Jul 20, 2024 12:21 pm

Hey all,

I'm currently creating a Statistical Plus-Minus (SPM) model for personal use. I plan on building different variants of the model, starting from raw box score data (for use in the game Basketball GM) and eventually incorporating play-by-play data and other advanced metrics. Right now, I have some questions on how to improve model performance.

My current iteration uses raw box score data, and I am regressing players' average stats against their 2-year RAPM from the 1998 season to 2024 (data provided by JE). My current R² with the entire dataset is approximately 0.6. Here are my questions:

Length of RAPM: What size of RAPM (e.g., 2-year, 3-year) have you found to be the best when producing SPM models?
Averaging Stats: Is using a player's average stats across a period a promising approach for the X variables in the regression?
Cross-Validation: I'm using Sklearn's Ridge CV to produce my results. Should I leave out some samples of my data when training the model to improve performance?
Normalization: Does normalizing data every two years improve model performance?
Cutoff Threshold: Since my basic version is meant to work in a game and I'm only concerned with the best seasons, should I limit my regression to only players with a specific cutoff of RAPM?
Non-linear Variables: Reading through other models that have their methodologies online, it seems non-linear variables could improve performance. What non-linear stats or methods have you found to be most effective?
Play-by-Play Data: When I try to tackle a more advanced SPM model, where can I find publicly available play-by-play data?

If there are any other things I could improve on that I didn't mention, please let me know.

Mike G · Post by **Mike G** » Sat Jul 20, 2024 1:39 pm

Maybe work up a boxscore based plus-minus for a few years, find RAPM for those years, and get an avg discrepancy for players. Whether they are ascendant in their career or on the decline, they may consistently look better in one or the other.

For example, suppose Jrue Holiday has mediocre production but excellent RAPM. Both may be in decline, but there may be a consistent difference from year to year. Whatever his current season shows statistically, tack on his avg 'bonus' PM.
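A minimal sketch of that per-player "bonus" idea, using pandas. The player labels, column names, and numbers are made up for illustration; the only logic taken from the post is "average the RAPM minus box-score gap over past seasons, then add it to the current box-score estimate".

```python
import pandas as pd

# Hypothetical per-season values (points per 100 possessions); not real data.
history = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B"],
    "season": [2021, 2022, 2023, 2021, 2022, 2023],
    "spm":    [1.0, 0.5, 0.0, 3.0, 3.5, 2.5],
    "rapm":   [3.0, 2.5, 2.0, 2.0, 2.5, 1.5],
})

# Average gap between RAPM and the box-score estimate for each player.
history["gap"] = history["rapm"] - history["spm"]
bonus = history.groupby("player")["gap"].mean()

# Current-season box-score estimates, adjusted by each player's historical gap.
# Players with no history get a bonus of zero.
current = pd.Series({"A": -0.5, "B": 2.0}, name="spm")
adjusted = current + bonus.reindex(current.index).fillna(0.0)
print(adjusted)
```

Here player A is consistently 2 points better by RAPM than by box score, so his current -0.5 becomes 1.5. Player B is consistently 1 point worse, so his 2.0 becomes 1.0.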

J.E. · Post by **J.E.** » Sun Jul 21, 2024 8:09 am

A lot of these questions depend on what your goal is.
Since it doesn't appear that you're trying to build an SPM that best predicts future lineup performance, not all of these questions have a one-size-fits-all answer.

Averaging Stats: Is using a player's average stats across a period a promising approach for the X variables in the regression?

Generally it's better to put in a "best guess" of the player's "true ability". The easiest example is that a player who made 2/2 threes should probably not be in the SPM regression with 100% 3p%.
Can use Bayesian inference, Kalman filter, or Ridge regression (on the single stats) to get more accurate predictions
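To make the 2/2 example concrete, here is a rough sketch of one simple form of Bayesian shrinkage: the posterior mean of a beta-binomial model. The prior mean (0.36) and prior weight (240 attempts) are assumed values for illustration, not numbers from this thread. In practice they would be estimated from league data.

```python
def shrunk_rate(makes, attempts, prior_mean=0.36, prior_attempts=240):
    """Posterior mean of a make rate under a Beta prior (beta-binomial model).

    prior_mean and prior_attempts are illustrative assumptions: the prior acts
    like `prior_attempts` extra shots taken at the `prior_mean` rate.
    """
    return (makes + prior_mean * prior_attempts) / (attempts + prior_attempts)


small_sample = shrunk_rate(2, 2)      # 2/2 from three
large_sample = shrunk_rate(200, 500)  # 40% on 500 attempts
print(round(small_sample, 3), round(large_sample, 3))
```

With these assumed settings the 2/2 shooter enters the regression at roughly 36.5% rather than 100%, while the 500-attempt shooter keeps most of his observed edge over the prior.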

When I try to tackle a more advanced SPM model, where can I find publicly available play-by-play data?

You can get a lot of NBA PlayByPlay through https://github.com/swar/nba_api

Non-linear variables have generally been proven dangerous, at times creating junk estimates for out-of-sample data

Without knowing the use case it's hard to say in regards to normalization and cutoffs

SkyJuke · Post by **SkyJuke** » Mon Jul 22, 2024 7:48 am

Thanks for the replies.

Generally, it's better to put in a "best guess" of the player's "true ability." The most straightforward example is that a player who made 2/2 threes should probably not be in the SPM regression with 100% 3p%.
Can use Bayesian inference, Kalman filter, or Ridge regression (on the single stats) to get more accurate predictions

I haven't heard of Bayesian inference or the Kalman filter, but I'll read up on them and see if they improve performance.

Without knowing the use case, it's hard to say in regards to normalization and cutoffs

I'll give two use cases. I probably intend only to use the more straightforward box-only metric to rank the best seasons from the best players in the league.
Secondly, I would create a more general SPM, similar to LEBRON and EPM, where I produce a plus-minus evaluation for every player in the league using play-by-play data.
Does RAPM length matter (above three years)? Three years seems standard, but it still has some errors. Longer windows would have problems with aging but might increase accuracy due to better RAPM results.

Since it doesn't appear that you're trying to build an SPM that best predicts future lineup performance,

Are there any current SPMs that aim to do this, and where can I see more readings?
Thanks for the heads-up on non-linear interactions. I've seen the old BPM have this problem, which led to the infamous 2017 Westbrook numbers.

Maybe work up a boxscore-based plus-minus for a few years, find RAPM for those years, and get an average discrepancy for players. Whether they are ascendant in their career or on the decline, they may consistently look better in one or the other.

For example, suppose Jrue Holiday has mediocre production but excellent RAPM. Both may decline, but there may be a consistent difference from year to year. Whatever his current season shows statistically, tack on his avg 'bonus' PM.

Thanks for this approach. I will try to see how the results look.

An additional question would be when creating any NBA metric, is the R^2 or any statistical measure of fit enough to test how good your metric is? Or do you need to involve more "eye test" to see whether these values match conventional wisdom? For example, let's say I had access to some RAPM points at the rim and wanted to create a metric based on this.
A higher R^2 value means I match closely with the "truth," but if these results don't match what we generally know (like Gilbert being a few spots from where you'd expect him to be), do you discard such a metric?

My final question is for anyone who has tried incorporating playoff RAPM as their target variable. Is there any noticeable change between playoff and regular season values that prevents you from using your regular season model for your post-season one? I question any playoff RAPM, as stars play too many minutes, which might lead to high collinearity. Unless someone has solved this issue, you'd need multiple years of data (which leads to aging effects) to get any reasonable sample. I would like to see if that is the case.

Also, I wish there were more available spasms, lol. It doesn't seem like a too daunting task; maybe the reward just isn't worth it.

SkyJuke · Post by **SkyJuke** » Mon Jul 22, 2024 8:00 am

Also, I wish there were more available spasms, lol. It doesn't seem like a too daunting task; maybe the reward just isn't worth it.

I meant to say SPMs here, lol.
I just realized you co-created ESPN's RPM. Is there any reason it was updated based on what you and Steve worked on and then taken off the website entirely? Also, since it seems you are no longer collaborating with ESPN, will we ever get it un-black-boxed, or does it remain a mystery? Also, are your methods any different from the more modern AIOs (LEBRON, RAPTOR (RIP), EPM, etc.)?

J.E. · Post by **J.E.** » Tue Jul 23, 2024 2:12 am

On the cutoff, I think clearly the answer is no. You want negative examples in your dataset just as much as you want positive ones.
Normalization: Unless you meant some other kind of normalization, yes, you should probably be z-scoring X
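As a hedged sketch of what z-scoring X looks like alongside sklearn's RidgeCV (mentioned earlier in the thread): the features below are synthetic stand-ins on deliberately different scales, and the alpha grid is an arbitrary choice. Putting the scaler inside a pipeline means it is re-fit on whatever training data the pipeline is fit on.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for box-score features on very different scales
# (e.g. a per-100 counting stat vs. a shooting percentage); not real data.
X = np.column_stack([
    rng.normal(20.0, 6.0, 300),
    rng.normal(0.55, 0.05, 300),
    rng.normal(5.0, 2.0, 300),
])
y = 0.2 * (X[:, 0] - 20) + 10 * (X[:, 1] - 0.55) + rng.normal(0, 1.0, 300)

# Z-score every column, then fit a ridge model whose penalty now treats
# all features on a comparable scale.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 13)))
model.fit(X, y)

scaled = model.named_steps["standardscaler"].transform(X)
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
print(model.named_steps["ridgecv"].alpha_)
```

Without the scaling step, the ridge penalty would shrink a coefficient on a 0-to-1 percentage far harder than one on a per-100 counting stat, simply because of units.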

Three years seems standard

says who

but it still has some errors

There isn't an RAPM version on this planet that doesn't "have errors". It's probably unwise to try to subjectively determine whether one version has more errors than another, and even worse to subjectively potentially call any version "error-free"

fwiw, ~8 years seems to be the sweet spot

Are there any current SPMs that aim to do this, and where can I see more readings?

I think that goal is implied with almost any SPM, including BPM, which is well documented on bbr.com

An additional question would be when creating any NBA metric, is the R^2 or any statistical measure of fit enough to test how good your metric is?

You should be using leave-one-out out-of-sample mean-squared prediction error, instead of in-sample mean-squared error (the latter being equivalent to R²)
Don't forget to weigh observations. It's more important to correctly predict LeBron's SPM/RAPM impact than, say, Jalen Smith's
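One possible way to compute the two error measures side by side, sketched with sklearn on synthetic data. The fixed `alpha`, the feature matrix, and the `possessions` weights are all assumptions for illustration. The weighting here is applied when scoring the errors, which is one reading of "weigh observations".

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(1)
n = 120
X = rng.normal(size=(n, 4))                      # synthetic features
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(0, 1.0, n)
possessions = rng.integers(500, 6000, n)         # hypothetical importance weights

model = Ridge(alpha=10.0)

# In-sample error: the model has already seen every target it predicts.
in_sample_pred = model.fit(X, y).predict(X)
in_sample_mse = np.average((y - in_sample_pred) ** 2, weights=possessions)

# Leave-one-out: each row is predicted by a model fit on all the other rows.
loo_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
loo_mse = np.average((y - loo_pred) ** 2, weights=possessions)

print(in_sample_mse, loo_mse)
```

For a ridge fit with a fixed penalty, the leave-one-out error cannot be smaller than the in-sample error, so the gap between the two numbers gives a rough sense of how optimistic the in-sample fit is.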

I'm always just 2 months away from being 2 months away from reviving my site, which would then include a bunch more documentation

DSMok1 · Post by **DSMok1** » Tue Jul 23, 2024 5:22 pm

Here are my thoughts on some of your questions:

Length of RAPM: What size of RAPM (e.g., 2-year, 3-year) have you found to be the best when producing SPM models?
Long enough to reduce noise and short enough that the single player response value isn't messing with their changing performance due to aging. Depends on if you're using a prior-informed version that helps account for aging. The better the prior, the shorter the duration that can work. I'd still use a minimum of 2 years. Without informed prior, at least 3 years I'd say.

Averaging Stats: Is using a player's average stats across a period a promising approach for the X variables in the regression?
If your target variable is averaged anyway, you probably could get away with that?

Cross-Validation: I'm using Sklearn's Ridge CV to produce my results. Should I leave out some samples of my data when training the model to improve performance?
If you don't test out of sample you end up with an overfit and non-robust model.

Normalization: Does normalizing data every two years improve model performance?
I don't think this is a good idea, personally.

Cutoff Threshold: Since my basic version is meant to work in a game and I'm only concerned with the best seasons, should I limit my regression to only players with a specific cutoff of RAPM?
I'm not sure. Do realize that low-minutes RAPM values have less real data and more of whatever the RAPM prior is.

Non-linear Variables: Reading through other models that have their methodologies online, it seems non-linear variables could improve performance. What non-linear stats or methods have you found to be most effective?
Based on my Westbrook experience--be very, very careful with nonlinear terms. It can improve fit and performance for 95% of players (normal players) and be nonsensical for 1% of players.

Are there any current SPMs that aim to do this (predict future lineup performance), and where can I see more readings?
BPM explicitly does not attempt to predict the future. The objective of BPM is to accurately divide up credit for the team performance to the players; if a player has a crazy/unsustainable 3-point run, that is not regressed to the mean.

DARKO is explicitly predictive and I highly recommend it.

SkyJuke · Post by **SkyJuke** » Wed Jul 24, 2024 7:35 pm

fwiw, ~8 years seems to be the sweet spot

My current confusion with using longer RAPM (which has less noise and should have less collinearity), even when age-adjusted, is how to get accurate predictor variables for it. Averaging values seems to make some intuitive sense, but I'm not sure if that's right. If I'm using raw 8-year RAPM, averaging seems right, idk, but if I use an age-adjusted version, I should come up with age-adjusted box-score numbers, which should fit the RAPM values almost 1-to-1. Interesting to think about and come up with.

To build the SPM model for EPM, a 10-year RAPM sample from 2004 to 2013 was calculated and used for offense and a 4-year RAPM from 2018 to 2021 for defense. The sample for defense was smaller and more recent to allow for player-tracking data to be used. DEPM values from 2014 to 2017 use a slightly different model based on the player-tracking data that were available at the time, and before 2014, no player-tracking data was used.

This from EPM is my confusion. Using a 10-year RAPM sample seems perfectly fine to me, but if I were to use players' average stats that have been padded/regressed, I think this would lose some of the peak play we see in some players. I'm not sure, and I might just be rambling.

Narrow regression basis. A very-long-term Regularized Adjusted Plus/Minus was used as basis for developing the original BPM. This made the issues with capturing outliers significantly worse, as no player was elite or an outlier for the entire 14 years captured in the regression basis

Some of these questions have been asked and answered before, but I want to find my own truth.

The Regression Basis
One of the unique things about this analysis is the regression basis that was used. It is not a simple long-term Regularized Adjusted Plus/Minus (RAPM).

Four 5-year long RAPM regressions (covering 1996-97 to 2015-16)
Bayesian prior-informed, using a prior that is based only on team quality and minutes per game in the given season. This eliminates the need for an aging adjustment within the regression to handle year-to-year aging shifts.
The prior also solves the issue of low-minutes players regressing towards league-average.
Special thanks to James Brocato of the Dallas Mavericks for running this Bayesian RAPM specially for the development of this new version of BPM!
Since these are only 5-year-long sets, a smaller portion of a player’s career is captured, including higher highs and (to some extent) lower lows.

Here is a sample of the dataset. Yes, Steve Nash was ridiculous on offense, and no, the box score still can’t fully capture that fact.

I don't think this is a good idea, personally.

Yeah, normalizing values across years didn't seem to improve model performance when I tested it, and it was annoying to code.

DARKO is explicitly predictive, and I highly recommend it.

Yeah, DARKO's methodology seems sound, but I'll have to find a way to get time-decay RAPM and a proper decay rate.

One question I have is: Is it better to build a model for RAPM and one for O-RAPM and then subtract to get your D-RAPM values? Maybe I should just test that out. Or build an O-RAPM metric and a D-RAPM metric and sum(weighted), both of which will produce my final metric.
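One fact that may help here: a ridge (or OLS) fit is linear in the target, so under specific conditions the "subtract" route and the "fit defense directly" route give identical coefficients. The sketch below checks this on synthetic data. It assumes the same features, the same fixed `alpha`, no sample weights that differ between fits, and the convention that total RAPM equals O-RAPM plus D-RAPM. If the offensive and defensive models use different features or penalties (which is often the point of splitting them), the equivalence no longer holds.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))                    # synthetic box-score features
o_rapm = X @ rng.normal(size=5) + rng.normal(0, 1.0, 200)
d_rapm = X @ rng.normal(size=5) + rng.normal(0, 1.0, 200)
total_rapm = o_rapm + d_rapm                     # assumed sign convention


def fit(y):
    # Same features and same fixed penalty for every target.
    return Ridge(alpha=5.0).fit(X, y)


total, off, deff = fit(total_rapm), fit(o_rapm), fit(d_rapm)

# Defense obtained by subtraction vs. defense fit directly.
print(np.allclose(total.coef_ - off.coef_, deff.coef_))
print(np.isclose(total.intercept_ - off.intercept_, deff.intercept_))
```

So the choice only matters once the two sides are allowed to differ, for example separate feature sets, separately tuned penalties, or different RAPM samples for offense and defense.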

The choice of RAPM length also seems to affect the scale of individual plus-minus values. I do not know how many points per 100 possessions the best player in the league truly impacts on his team in a season, but I want mine to be as close to that.

I'm always just 2 months away from being 2 months away from reviving my site, which would then include a bunch more documentation

Hopefully, that comes out soon.

I'm not going to blog about this, so I'll post my processes here if people don't mind; I also heard there is/was an APBR metric competition; if I ever get around to finishing this, maybe I'll join that.

Thanks for the replies. I think I'm ready to improve on what I have right now, and after this, I should be able to tackle adding an on/off component, which might improve performance. Just for documentation reasons, my next step would be figuring out whether I want to adjust box score stats early on in the season, how to add an on/off component, and whether to adjust that. It seems using this SPM as a prior for season RAPM values has been what's tested; I'll try that out. I'm not exactly sure I want to run RAPM every season for a personal project, so I might just come up with an RAPM adjacent stat like Ben Taylor's Augmented Plus Minus and what was done in RAPTOR.

Also, thanks to those who've published their methodologies for creating these metrics; unfortunately, most of these sites are very old.

SkyJuke · Post by **SkyJuke** » Wed Jul 24, 2024 11:16 pm

If you don't test out of sample you end up with an overfit and non-robust model.

leave-one-out out-of-sample mean-squared prediction error

One question I have about these remarks specifically: I know using in-sample R^2 is bad, and I made a mistake doing this, but I'm using the cross_val_score function in Python from sklearn, and I wonder if this is the right thing to do to test performance. Or should I fit a model to some number of years and test on years outside that sample? I tried LOO, but it seemed computationally intensive.
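A hedged sketch of both options using sklearn. `cross_val_score` accepts a `groups` array, so whole seasons can be held out together with `LeaveOneGroupOut`. Separately, `RidgeCV` with its default `cv=None` selects alpha using an efficient form of leave-one-out, which avoids refitting the model once per row. The data, the `season` labels, and the alpha values are synthetic assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 4))                      # synthetic features
y = X @ np.array([1.0, 0.5, -0.5, 0.0]) + rng.normal(0, 1.0, n)
season = rng.integers(2015, 2025, n)             # hypothetical season label per row

# Hold out one whole season at a time; sklearn reports negated MSE.
scores = cross_val_score(
    Ridge(alpha=10.0), X, y,
    groups=season, cv=LeaveOneGroupOut(),
    scoring="neg_mean_squared_error",
)
season_mse = -scores
print(season_mse.round(3))

# Default cv=None in RidgeCV: alpha chosen by efficient leave-one-out.
alpha = RidgeCV(alphas=np.logspace(-2, 3, 11)).fit(X, y).alpha_
print(alpha)
```

Holding out full seasons answers a slightly different question than leave-one-out (how the model transfers to years it never saw), so the two numbers need not agree.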

Don't forget to weigh observations. It's more important to correctly predict LeBron's SPM/RAPM impact than, say, Jalen Smith's

On weighing observations, a simple way to do this is to check players with more minutes played and weigh them more. It seems their RAPM data is more real, right?
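A minimal sketch of minutes-based weighting at fit time, using the `sample_weight` argument of sklearn's `Ridge.fit`. Whether minutes are the right weight is an open question in this thread; the synthetic data below simply assumes low-minute targets are noisier.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 3))                              # synthetic features
minutes = rng.integers(200, 3000, n).astype(float)       # hypothetical minutes played

# Assumption for the demo: targets for low-minute players carry more noise.
noise = rng.normal(0, 1.0, n) * np.sqrt(3000.0 / minutes)
y = X @ np.array([1.0, -1.0, 0.5]) + noise

unweighted = Ridge(alpha=1.0).fit(X, y)
weighted = Ridge(alpha=1.0).fit(X, y, sample_weight=minutes)
print(unweighted.coef_.round(3), weighted.coef_.round(3))
```

The same `minutes` array could also be passed as weights when averaging out-of-sample squared errors, so that the evaluation and the fit emphasize the same players.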

Crow · Post by **Crow** » Mon Jul 29, 2024 1:18 am

You say "creating a Statistical Plus-Minus (SPM) model for personal use". Sharing it at some point, even just here, even once, for just a prior period, or temporarily, might provoke useful comment by showing the product of the methods chosen and allowing comparisons with other metrics and subjective assessments.

But do as you wish.
