Hi, this is my first time posting here. Please let me know if this is not the place to post questions like this.
I've just been tinkering around with generating some RAPM numbers using Ryan Davis's python code https://github.com/rd11490/NBA_Tutorial ... pm/rapm.py (relatedly, is this code considered good for RAPM calculation? Are there any needed adjustments, like are the lambda values good?).
However, the RAPM numbers I'm getting are similar but still somewhat different when I input a dataframe with rows grouped by stints vs individual possessions. Namely, when using stints, the variance of the resulting RAPMs is considerably larger than when using possessions. Is this normal? Or does it indicate I'm making an error somewhere?
Here's the top 20 in 2014-16 3-year RAPM (with a rubber band effect of -0.35 pts/100 per 1-point lead) when I group by stints:
https://imgur.com/gallery/jhdfkKF
Now here's the top 20 when grouping by individual possessions:
https://imgur.com/gallery/Oq4YZy4
As you can see, the high-end players have higher RAPMs with the stint-based calculation. The low-end players show the same pattern, with more extreme RAPMs when using stints.
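For reference, here's roughly how I'm building the stint rows from the possession rows (toy data with made-up column names, not the tutorial's exact code):

import pandas as pd

# One row per possession; "stint_id" marks an unbroken ten-man unit
poss = pd.DataFrame({
    "stint_id": [0, 0, 0, 1, 1],
    "points":   [2, 0, 3, 0, 2],
})

# Group into stints: total points, a possession count to weight by,
# and a per-100 response for the regression
stints = (poss.groupby("stint_id")["points"]
              .agg(total_points="sum", possessions="count")
              .reset_index())
stints["pts_per_100"] = 100 * stints["total_points"] / stints["possessions"]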
Re: RAPM results different when using individual possessions vs stints
gnomp wrote: ↑Sat Apr 20, 2024 3:57 pm
https://github.com/rd11490/NBA_Tutorial ... pm/rapm.py (relatedly, is this code considered good for RAPM calculation? Are there any needed adjustments, like are the lambda values good?)

There are a couple of things in that implementation that I'm not a huge fan of, but it's not immediately clear how much, if at all, they hurt the final results:
- why use points/100 instead of just points? That'll just lead to lambda values not being comparable to the reference implementation (Joe Sill, 2010)
- I'd rather not fit an intercept, for fear of the underlying Ridge implementation including the intercept column in the lambda penalty
- all the "lambda to alpha" (and vice versa) conversion is totally unnecessary; see the sketch after this list
Using stints has the drawback - and I seem to make this point on a bi-annual basis - that you have to be very careful with weighting observations, which is something most people seem to get wrong
Further, using stints makes it harder, if not impossible, to implement things like the rubber band effect
Re: RAPM results different when using individual possessions vs stints
Thanks for the reply JE.
Wouldn't the intercept be interpreted as the expected pts/poss of a lineup of five players with 0 O-RAPM vs five players with 0 D-RAPM? Is that the wrong interpretation? The point about the intercept being subjected to lambda is interesting... maybe that's why the intercept of ~103 in my results slightly undershoots the league-wide average O-rating during that 2014-16 period.
I did attempt to still incorporate a rubber band effect into my stints, using the value of -3.5 pts/100 per 10-point lead that I believe you found about a decade ago (I think you're using a slightly different rubber band value now though). Before I grouped the individual possessions into stints, I just subtracted the rubber band value from the points column according to the lead for that row. So if a team with a 10-point lead scored two points in a given possession, I set the points value for that row equal to 2.035 points. I only then grouped possessions into stints (this was all done pre-processing), summing the (rubber band-adjusted) points values for all included rows/possessions and weighting the stint by how many possessions were included. If I'm not mistaken, that should properly account for the differing rubber band effect over the course of any one stint.
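In code, that pre-processing step looks roughly like this (toy rows; column names are made up for the example):

import pandas as pd

# Toy possession rows; "lead" is the offense's lead at the start of
# the possession
poss = pd.DataFrame({"points": [2, 0, 3], "lead": [10, -4, 0]})

# -3.5 pts/100 per 10-point lead = -0.0035 pts per possession per point of lead
RB_PER_POINT_OF_LEAD = -3.5 / 100 / 10

poss["adj_points"] = poss["points"] - RB_PER_POINT_OF_LEAD * poss["lead"]
# lead = 10, points = 2  ->  2 - (-0.035) = 2.035, as described above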
Re: RAPM results different when using individual possessions vs stints
gnomp wrote:
Wouldn't the intercept be interpreted as the expected pts/poss of a lineup of five players with 0 O-RAPM vs five players with 0 D-RAPM? Is that the wrong interpretation?

That's not incorrect, but you can get practically the same effect by simply subtracting the average pts/poss from your Y
gnomp wrote:
I only then grouped possessions into stints (this was all done pre-processing), summing the (rubber band-adjusted) points values for all included rows/possessions and weighting the stint by how many possessions were included.

Ok yeah, that should work
In the past, there was sometimes only stint data available. If that was your original source, incorporating the rubber band effect is tricky. But if you start from individual possessions and group them, you should be good
My suggestion would be to start with a small matrix, maybe (10, 10), filled with seeded random 1/0s, and play around with weights & intercepts until the result from the weighted (stint) version is equal to the non-weighted (per-possession) one
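Something along these lines (a rough sketch with random placeholder data; with the same alpha and possession-count weights, the two fits should agree to numerical precision):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)

# Small random 1/0 design as suggested: 10 distinct "stints" x 10 players
X_stint = rng.integers(0, 2, size=(10, 10)).astype(float)
n_poss = rng.integers(1, 20, size=10)              # possessions per stint

# Expand each stint back out into one row per possession
X_rows, y_rows = [], []
for x, n in zip(X_stint, n_poss):
    X_rows.append(np.tile(x, (n, 1)))
    y_rows.append(rng.normal(1.1, 1.0, size=n))    # per-possession points
X_poss = np.vstack(X_rows)
y_poss = np.concatenate(y_rows)
y_stint = np.array([y.mean() for y in y_rows])     # stint mean pts/poss

# Same alpha for both fits; weight each stint by its possession count
a = Ridge(alpha=50.0, fit_intercept=False).fit(X_poss, y_poss)
b = Ridge(alpha=50.0, fit_intercept=False).fit(
    X_stint, y_stint, sample_weight=n_poss)
print(np.allclose(a.coef_, b.coef_))               # True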
Re: RAPM results different when using individual possessions vs stints
I'm happy to say I finally figured out the issue. It was this little function:
def lambda_to_alpha(lambda_value, samples):
    # Rescale a lambda into a scikit-learn Ridge alpha; note that this
    # ties the penalty strength to the number of rows (samples)
    return (lambda_value * samples) / 2.0
You were right that converting lambdas to alphas in this way is unnecessary; worse, since samples = train_x.shape[0], it shrinks the regularization parameter when the rows are stints rather than possessions (far fewer rows means a smaller alpha, less shrinkage, and hence the wider spread I saw). Now I just use an alpha of 2200 (found using CV on 5-year RAPMs) and get virtually identical results with stints or possessions. I also removed the intercept. Thanks for the help.
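To make that concrete (row counts invented, purely for scale):

def lambda_to_alpha(lambda_value, samples):
    return (lambda_value * samples) / 2.0

# Same lambda, very different effective regularization once rows are stints:
print(lambda_to_alpha(0.05, 200_000))   # 5000.0 with possession-level rows
print(lambda_to_alpha(0.05, 30_000))    # 750.0 with stint rows -> less shrinkage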
In trying to come up with a replacement-level prior to correct for the fact that low-minute players are regressed towards 0, I ran RAPM with all players with less than 1000 possessions played over a 5-year period assigned to the same special player ID. This cumulative "replacement player" was spectacularly bad, like -6 RAPM (again with alpha=2200 over 5-year samples). I saw you posted a version of RAPM on twitter yesterday using that kind of prior. Is this similar to the methodology you use when calculating a replacement-level prior?
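For what it's worth, the pooling step was essentially this (toy frame; the real cutoff was 1000 possessions over the 5-year sample):

import pandas as pd

# Toy long-format frame: one row per (possession, player) appearance
df = pd.DataFrame({"player_id": [1, 1, 1, 2, 2, 2, 3]})

REPLACEMENT_ID = -1     # synthetic "replacement player" ID
MIN_POSS = 3            # stand-in for the real 1000-possession cutoff

counts = df["player_id"].value_counts()
low_min = counts[counts < MIN_POSS].index
df.loc[df["player_id"].isin(low_min), "player_id"] = REPLACEMENT_ID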
I am a bit confused by your point of using points instead of points per 100. If I have a stint of 10 possessions compared to a stint of a single possession, don't you have to divide the former's points by 10 to make everything apples-to-apples? And I'm curious what you mean by "comparable to the reference implementation"?
Re: RAPM results different when using individual possessions vs stints
Within any regression the scale of your response variable should be the same across all observations, so you are correct that every stint of N possessions should be standardised. I prefer to standardise to per100, but per possession will net you the same result, albeit with a different lambda (which is what JE is referring to regarding the reference (OG) RAPM).
Hopefully it goes without saying that the number of possessions in a stint becomes the weight for that stint once it has been standardised. It isn't at all complicated to understand why from an intuition perspective, but as JE says, people often get this wrong.
I believe what JE is advocating for is a row for every individual possession. That alleviates any weighting issue, and makes it possible to handle the current score line at the start of each possession as a feature, if the whole rubber band thing is something you care about. The most significant downside, however, is the enormous matrix you get in return. I wouldn't advocate for it myself, but JE and I are pretty divergent on RAPM.
I do have a model that works at the start of every possession, but it wasn't built to estimate RAPM; it was built for real-time outcome prediction, and I have found no advantage in using it to attempt to estimate player impact.
Re: RAPM results different when using individual possessions vs stints
gnomp wrote: ↑Wed Jun 19, 2024 3:19 pm
In trying to come up with a replacement-level prior to correct for the fact that low-minute players are regressed towards 0, I ran RAPM with all players with less than 1000 possessions played over a 5-year period assigned to the same special player ID. This cumulative "replacement player" was spectacularly bad, like -6 RAPM (again with alpha=2200 over 5-year samples). I saw you posted a version of RAPM on twitter yesterday using that kind of prior. Is this similar to the methodology you use when calculating a replacement-level prior?

Yes, I did something like this, though I had more than one bucket, and then fit a polynomial through the buckets' coefficients
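Roughly like this (bucket midpoints and coefficients below are invented, just to show the shape of it):

import numpy as np

# Midpoints of possession-count buckets and the RAPM coefficient each
# pooled "bucket player" received (all values invented for illustration)
bucket_poss = np.array([250.0, 750.0, 1500.0, 3000.0])
bucket_rapm = np.array([-6.0, -4.2, -2.5, -1.1])

# Fit a low-order polynomial through the bucket coefficients to get a
# smooth prior as a function of possessions played
coefs = np.polyfit(bucket_poss, bucket_rapm, deg=2)
prior_at_1200 = np.polyval(coefs, 1200.0)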
The largest "spread" this season in regards to who's benefitting or suffering most from "individual strength of schedule", solely stemming from rubber band effect (ignoring opponents), is around 4.5 points/100. Maybe it's a matter of personal preference, but I'd rather not give players like James Wiseman an undeserved 2 point boost by removing the rubber-band adjustmentThis alleviates any weights issue, and makes it possible to handle the current score line at the start of each possession as a feature, if the whole rubber band thing is something you care about. The most significant downside, however, is the enormous matrix you get in return
In regards to matrix size, I can run 28-year RAPM with 16GB of RAM on macOS (but not on other operating systems, fwiw). I start combining rows & weighting when necessary for other analyses, but not yet for this one
Re: RAPM results different when using individual possessions vs stints
I probably phrased that poorly. I don't use an RAPM-based metric any more; I have developed other methods, and I haven't found a rubber band adjustment useful in them.
I would genuinely be interested in somebody comparing the predictive quality of an RAPM metric which includes the adjustment, and one which does not.
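Schematically, the test I have in mind (all data below is random placeholder, just to show the shape of the comparison):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 50)).astype(float)
y_raw = rng.normal(1.1, 1.0, size=5000)     # pts per possession
lead = rng.integers(-20, 21, size=5000)     # lead at possession start
y_adj = y_raw - (-0.0035) * lead            # rubber-band-adjusted target

# Train on each target, but score both against raw out-of-sample points
X_tr, X_te, yr_tr, yr_te, ya_tr, ya_te = train_test_split(
    X, y_raw, y_adj, random_state=0)
for y_tr, label in [(yr_tr, "without adjustment"), (ya_tr, "with adjustment")]:
    m = Ridge(alpha=2200.0, fit_intercept=False).fit(X_tr, y_tr)
    print(label, np.mean((yr_te - m.predict(X_te)) ** 2))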
Re: RAPM results different when using individual possessions vs stints
Crow wrote: ↑Fri Jun 28, 2024 3:30 pm
Is scaling appropriate? Done or not in popular versions?

I'm gonna go ahead and say "inappropriate"
https://www.cryptbeam.com/rapm/
The year-to-year swings of the penalization parameter (lambda) are actually extremely small.
If you were to put a prior on your lambda (yes..) then it becomes even more obvious that it's fairly stable.
Even if we saw wild year-to-year swings, they'd mostly be a function of the noise in the underlying data. That noise can come from obvious things such as an increased three-point rate. Another potential source of lambda swings - small year-to-year fluctuations in league-wide variance of player impact - is also a fairly normal thing to expect, I would argue
Bottom line: lambda is what it is because it leads to the best out-of-sample predictions. Arbitrarily changing it for "prettier" end results defeats the original purpose of this technique and, by definition, leads to worse predictions
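In code terms, that's all the tuning there should be (placeholder data; the alpha grid is illustrative):

import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 30)).astype(float)
y = rng.normal(1.1, 1.0, size=2000)

# Pick alpha purely by cross-validated prediction error, then leave it alone
model = RidgeCV(alphas=np.logspace(1, 4, 25), fit_intercept=False, cv=5)
model.fit(X, y)
print(model.alpha_)   # whatever predicts best, not what looks "prettiest"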