Flaws with RAPM

xkonk · Post by **xkonk** » Thu Dec 31, 2020 7:40 pm

The tail-of-population idea has two other problems I can think of.
1) It assumes that the NBA cleanly lops off the ~450 best basketball players, which it doesn't. It likely does get the top, I don't know, 100? I don't have a strong opinion. But below a certain number there is certainly some randomness and churn that factors into who's in the league and who isn't. So if you look at the number of people that should be at the left end of the tail, there are fewer of them in the NBA than in the population.

2) Selection can in general distort (and presumably does in this case) distributions and relationships between variables. The tail-of-population idea is based on the entirety of humans (or humans that play basketball, or something else large), but looking at point differentials in the NBA selects on who gets into the NBA, which distorts the relationship between basketball talent and point differential. It would be an example of Berkson's paradox https://en.wikipedia.org/wiki/Berkson's_paradox . Another example is the fairly well-known fact that SAT scores do a good job of predicting who gets into graduate school but do not do a good job of predicting (and can in fact be negatively correlated with) success in grad school.
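To make the selection effect concrete, here is a minimal simulation sketch. It is not from the thread: the two independent traits, the cutoff of 3.0, and the population size are all illustrative assumptions. It only shows that selecting on a combination of two unrelated traits can induce a negative correlation among those selected.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)

# Two independent traits per person (hypothetical, e.g. "skill" and "size").
population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200_000)]

# Assumed selection rule: only people whose combined score clears a high bar get in.
CUTOFF = 3.0
selected = [(a, b) for a, b in population if a + b > CUTOFF]

r_all = pearson(*zip(*population))   # close to zero by construction
r_sel = pearson(*zip(*selected))     # clearly negative among the selected

print(f"whole population: r = {r_all:+.3f}")
print(f"selected group ({len(selected)} people): r = {r_sel:+.3f}")
```

The traits are unrelated in the full population, yet inside the selected group a high value on one tends to come with a lower value on the other, simply because both were not needed to clear the bar.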

DSMok1 · Post by **DSMok1** » Thu Dec 31, 2020 7:49 pm

Agreed with Jerry.

My research (back in 2010) showed that the players above league average were basically matching the tail end of the huge normal distribution. The top 100-150 players.

Here's the graphic I had developed back then:

rainmantrail · Post by **rainmantrail** » Fri Jan 01, 2021 3:50 am

xkonk wrote: ↑Thu Dec 31, 2020 7:40 pm Another example is the fairly well-known fact that SAT scores do a good job of predicting who gets into graduate school but do not do a good job of predicting (and can in fact be negatively correlated with) success in grad school.

This is a different problem though. There are two entirely different data generation processes going on here. Getting into graduate school is almost entirely based on one's academic history and SAT (or similar) scores. So it makes perfect sense that SAT scores do a good job of predicting who gets in because that decision is literally based on it. But once you get into graduate school, making it through is an entirely different thing. The far more relevant factors in getting through grad school are whether or not someone also has a full-time job, whether they have a spouse and/or kids at home, whether or not they are paying their own tuition, and how strong their work ethic is. The SAT scores and academic history should be viewed more as gate-keepers, or minimum requirements having been met. To expect that they would be predictive of whether or not someone completes their degree would be foolish. The brightest students generally are the ones who don't "need" to finish school. In my graduate program, most of the top students already had great careers, and had little time for homework. It was the students in the middle of the pack and those struggling to understand the deeper concepts who were most dedicated to completing the program because they needed that degree more than the top students who already had great careers.

rainmantrail · Post by **rainmantrail** » Fri Jan 01, 2021 4:10 am

J.E. wrote: ↑Thu Dec 31, 2020 6:44 pm You're taking an (originally tail-end-of-population) subset of people, but then you're providing them with resources that aren't available to the outside world...

I think the end result is a mix of tail-end and gaussian

Distributions of various simple BoxScore stats will tell the same story. None of the histograms look like a pure tail-end distribution

This would be expected though, even if the true underlying distribution of player values were the right tail of a Gaussian. It wouldn't follow that simple box score stats like PPG, RPG, APG, etc. would also follow this right-tailed distribution. The data generation process for those metrics is quite different.
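A rough sketch of that point, under loudly stated assumptions: talent is the top 450 of a hypothetical pool of 200,000 standard-normal draws, and the "observed stat" is just talent plus independent Gaussian noise. Real box score stats are generated very differently, so this only illustrates that a tail-shaped input need not produce a tail-shaped histogram.

```python
import random

def skewness(xs):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)

POOL = 200_000   # assumed size of the talent pool (hypothetical)
ROSTER = 450     # roughly the league size mentioned earlier in the thread
NOISE_SD = 1.0   # assumed noise added by the stat-generating process

# "True" talent: the extreme right tail of a normal distribution.
talent = sorted(rng.gauss(0, 1) for _ in range(POOL))[-ROSTER:]

# A noisy observed stat built on top of that talent.
observed = [t + rng.gauss(0, NOISE_SD) for t in talent]

skew_talent = skewness(talent)
skew_observed = skewness(observed)

print(f"skewness of talent:        {skew_talent:.2f}")
print(f"skewness of observed stat: {skew_observed:.2f}")
```

The talent values are strongly right-skewed, while the noisy stat looks much closer to symmetric, which is consistent with histograms that do not look like a pure tail.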

J.E. wrote: ↑Thu Dec 31, 2020 6:44 pm
Either way, BoxScore priors are a simple way to reasonably deal with it

This was my plan too. It should certainly help. Although the L2 norm is still going to try to squish it back into a distribution that doesn't apply, the effect should be far less intense given good priors. But then, we're really relying heavily on those priors to be accurate. It's a delicate balance. The approach I plan to take with my predictive models is to build as strong of an RAPM or some sort of adjusted RAPM type metric as I can and use that output as my target variable for another model. Probably something similar to what Daniel is doing with his BPM metrics, but I'm going to try to incorporate as much player tracking data and player meta-data as I can. Ultimately, I'm interested in trying to build something that is more predictive than RAPM or BPM. If I'm going to accomplish that, then I'll need to be able to explain more of the variance than those models are capable of. Especially with respect to defense. I'll also need my statistical modeling assumptions to be as true as possible. I think it's doable, but it's a heavy lift. I just think this is a fun project to work on, and I'm joining the party years after much of this stuff has already been thought through. But I do think there is more that can be done, and I'm hoping to build on that.
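For what the L2 penalty does with and without a prior, the one-coefficient case can be worked out by hand. The sketch below is only that special case, with made-up observations and a made-up penalty weight `lam`; a real RAPM fit solves the same kind of problem jointly for all players.

```python
def shrink_toward_prior(observations, prior, lam):
    """Minimise sum((y - b)**2) + lam * (b - prior)**2 over b.

    Setting the derivative to zero gives b = (sum(y) + lam * prior) / (n + lam).
    """
    n = len(observations)
    return (sum(observations) + lam * prior) / (n + lam)

# Hypothetical per-stint net ratings for one player (raw mean is 5.5).
obs = [4.0, 6.0, 5.0, 7.0]
lam = 4.0  # assumed penalty weight

# Plain ridge: the implicit prior is 0, so the estimate is pulled toward 0.
toward_zero = shrink_toward_prior(obs, prior=0.0, lam=lam)    # 22 / 8 = 2.75

# Ridge around a box-score prior of +4.0: the pull is toward 4.0 instead.
toward_prior = shrink_toward_prior(obs, prior=4.0, lam=lam)   # 38 / 8 = 4.75

print(toward_zero, toward_prior)
```

With the same data and the same penalty, the estimate moves from 2.75 to 4.75 purely because of the prior, which is why the accuracy of the priors matters so much.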

vzografos · Post by **vzografos** » Fri Jan 01, 2021 5:07 am

ok not really into the whole RAPM measures (not metrics!) but since you mentioned boxscores and skewed distributions....when I was looking into specific individual boxscore stats (continuous) the model that fit best was the Weibull model. It can adjust for left and right skewness and it can approximate the Gaussian distribution as well.

Not sure if it is relevant to your discussion (since I really don't want to read the whole thread from 10 years ago) but I just thought I'd mention it.

For the discrete boxscores I seem to remember that a Poisson or Binomial model fitted the best.
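For anyone who wants to try the Weibull idea, here is a minimal maximum-likelihood fit using only the standard library. The data are synthetic draws from `random.weibullvariate` (shape 2, scale 10 are arbitrary choices), not real box score stats, and `fit_weibull` is a hypothetical helper name.

```python
import math
import random

def fit_weibull(xs, lo=0.05, hi=50.0, iters=60):
    """Maximum-likelihood Weibull fit for positive data; returns (shape, scale).

    The shape k solves sum(x^k * ln x) / sum(x^k) - 1/k - mean(ln x) = 0.
    The left side increases with k, so simple bisection finds the root.
    """
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(k):
        powers = [x ** k for x in xs]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid   # root lies below mid
        else:
            lo = mid   # root lies above mid
    shape = (lo + hi) / 2
    scale = (sum(x ** shape for x in xs) / len(xs)) ** (1.0 / shape)
    return shape, scale

rng = random.Random(2)
# Synthetic sample: weibullvariate(alpha, beta) takes scale first, then shape.
sample = [rng.weibullvariate(10.0, 2.0) for _ in range(5000)]

shape_hat, scale_hat = fit_weibull(sample)
print(f"shape = {shape_hat:.2f}, scale = {scale_hat:.2f}")
```

A fitted shape well below roughly 3.6 indicates right skew and a shape above it indicates left skew, which is the flexibility being described here.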

rainmantrail · Post by **rainmantrail** » Fri Jan 01, 2021 5:49 am

vzografos wrote: ↑Fri Jan 01, 2021 5:07 am ok not really into the whole RAPM measures (not metrics!) but since you mentioned boxscores and skewed distributions....when I was looking into specific individual boxscore stats (continuous) the model that fit best was the Weibull model. It can adjust for left and right skewness and it can approximate the Gaussian distribution as well.

Not sure if it is relevant to your discussion (since I really don't want to read the whole thread from 10 years ago) but I just thought I'd mention it.

For the discrete boxscores I seem to remember that a Poisson or Binomial model fitted the best.

That sounds like a good approach. The Weibull is pretty flexible.

Yes, I meant 'measures', not 'metrics'.

xkonk · Post by **xkonk** » Mon Jan 04, 2021 9:37 pm

rainmantrail wrote: ↑Fri Jan 01, 2021 3:50 am The far more relevant factors in getting through grad school are whether or not someone also has a full-time job, whether they have a spouse and/or kids at home, whether or not they are paying their own tuition, and how strong their work ethic is.

The SAT issue is just an example of the problem; it applies to a wide variety of situations. But I am curious why you think that high school academic history/SAT scores aren't dependent on a student's job/financial situation, family issues, work ethic, etc.?

vzografos · Post by **vzografos** » Mon Jan 04, 2021 9:53 pm

xkonk wrote: ↑Mon Jan 04, 2021 9:37 pm But I am curious why you think that high school academic history/SAT scores aren't dependent on a student's job/financial situation, family issues, work ethic, etc.?

Look at Trump

rainmantrail · Post by **rainmantrail** » Tue Jan 05, 2021 6:02 am

xkonk wrote: ↑Mon Jan 04, 2021 9:37 pm But I am curious why you think that high school academic history/SAT scores aren't dependent on a student's job/financial situation, family issues, work ethic, etc.?

I'm not sure if you're being serious or not here. Assuming you're not joking, there is a world of difference between a typical high school student and a graduate school student with respect to real life challenges. High school students don't have careers, spouses, and children to worry about. Graduate students often do.

colts18 · Post by **colts18** » Wed Jan 27, 2021 2:38 am

colts18 wrote: ↑Wed Dec 30, 2020 7:37 pm I was watching a game recently and saw a bench player come in the game late in the quarter. He gets fouled then heads to the free throw line for 2. That play made me realize another flaw of RAPM. That bench player is getting credit from his teammates drawing fouls and getting the team into the penalty. Bench player comes in and gets 2 free throws despite the fact that he had nothing to do with the 5 earlier fouls drawn that got his team in the bonus.

Things like this will artificially boost the offense of some players and depress the Offensive RAPMs of the players who don't get the benefit of playing in the penalty. The same thing can happen on the defensive side of the ball, where the bench player gets a bad RAPM because his teammates fouled a lot.

I found an 82games article that confirms the point that I was trying to make.

www.82games.com/bonus.htm

Teams have a 102 Offensive Rating early in the quarter when they aren't in foul trouble, but that O Rating jumps to 112 when in the bonus. I believe that RAPM needs to be adjusted for time in the bonus. If not, players will get artificially boosted or downgraded for being in the penalty, which they have no control over.

A good example of this is the Stockton/Malone Jazz. Stockton was put on a pitch count late in his career. Jerry Sloan devised a scheme where Stockton would leave with about 6 minutes left in the 1st quarter then come back at the start of the 2nd quarter. While that happened, Malone stayed on the court for the rest of the 1st quarter. Malone would rest at the start of the 2nd quarter while Stockton was in the game. They both played at the end of the 2nd quarter. In the 3rd quarter they would repeat this with Stockton subbing out early and Malone resting at the start of the 4th quarter.

In effect, that meant Malone was on the court at the end of all 4 quarters, which meant he got the offensive boost of being in the penalty quite a bit. Stockton, on the other hand, was not on the court at the end of the 1st or 3rd quarter, so he only got to play in the penalty in the 2nd and 4th quarters. According to the pbpstats website, Malone played 41% of his possessions in the penalty while Stockton played 28% in the penalty. RAPM should downgrade Malone's offensive numbers while simultaneously boosting Stockton's offensive numbers. However, Malone's Defensive RAPM should get boosted to account for playing in the penalty a lot, while Stockton's Defensive RAPM should take a small hit to account for the effect. According to the RAPM that J.E. put out from the 01-03 period, Stockton had a balanced profile, good on offense and defense. Malone had a really good offensive RAPM, but his D RAPM sucked. His defense should get upgraded to reflect the difficult situation he was in.
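As a back-of-the-envelope check on the size of this effect, the two sets of numbers quoted above can be combined. This is a rough sketch that assumes the league-wide 82games ratings (102 outside the bonus, 112 in it) apply unchanged to the Jazz and that nothing else differs between the two players' minutes.

```python
ORTG_NO_BONUS = 102.0   # offensive rating outside the bonus (82games figure)
ORTG_BONUS = 112.0      # offensive rating in the bonus (82games figure)

def blended_ortg(bonus_share):
    """Context-only offensive rating for a given share of possessions in the bonus."""
    return bonus_share * ORTG_BONUS + (1 - bonus_share) * ORTG_NO_BONUS

malone = blended_ortg(0.41)     # about 106.1
stockton = blended_ortg(0.28)   # about 104.8
gap = malone - stockton         # about 1.3 points per 100 possessions

print(f"Malone: {malone:.1f}  Stockton: {stockton:.1f}  gap: {gap:.1f}")
```

Under those assumptions, the substitution pattern alone is worth about 1.3 points per 100 possessions of team offense while Malone is on the court rather than Stockton, before either player does anything.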

Here is a visualization of Malone and Stockton's playing time which shows how their substitution patterns affected the amount of time they played in the bonus.

https://www.pbpstats.com/wowy-combo-pla ... ds=304,252

DSMok1 · Post by **DSMok1** » Wed Jan 27, 2021 2:54 pm

That's a very interesting point! I wonder how that adjustment could be made.

J.E. · Post by **J.E.** » Wed Jan 27, 2021 5:41 pm

You can just add another dummy variable for "offensive team in the bonus"
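A toy version of the dummy-variable idea, as a sketch only: two offensive players who are identical by construction, noise-free synthetic possessions, an intercept, and a near-zero ridge penalty so the numbers are easy to read. The stint counts and the `ridge` helper are made up for illustration; the 1.02 and 0.10 points per possession mirror the 102 vs 112 ratings quoted above.

```python
def ridge(rows, targets, lam):
    """Solve (X'X + lam*I) beta = X'y with Gaussian elimination (partial pivoting)."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(a[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (b[i] - s) / a[i][i]
    return beta

BASE, BONUS = 1.02, 0.10  # points per possession outside / extra inside the bonus

# Hypothetical stints: (A on court, B on court, offense in bonus, possessions).
# A plays 40% of his possessions in the bonus, B only 15%.
stints = [(1, 0, 1, 30), (1, 0, 0, 20),
          (0, 1, 1, 5), (0, 1, 0, 45),
          (1, 1, 1, 10), (1, 1, 0, 40)]

naive_rows, full_rows, ys = [], [], []
for a_on, b_on, bonus, n in stints:
    for _ in range(n):
        naive_rows.append([1.0, a_on, b_on])          # intercept + players
        full_rows.append([1.0, a_on, b_on, bonus])    # ... + bonus dummy
        ys.append(BASE + BONUS * bonus)               # neither player adds anything

_, a_naive, b_naive = ridge(naive_rows, ys, lam=1e-6)
_, a_full, b_full, bonus_coef = ridge(full_rows, ys, lam=1e-6)

print(f"without dummy: A - B = {a_naive - b_naive:+.3f}")
print(f"with dummy:    A - B = {a_full - b_full:+.3f}, bonus = {bonus_coef:.3f}")
```

Without the dummy, A looks about 0.05 points per possession better than B purely because of when he plays; with it, the gap vanishes and the bonus column absorbs the 0.10. A realistic penalty would shrink all of these coefficients, so the clean numbers here are an artifact of the tiny `lam`.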

DSMok1 · Post by **DSMok1** » Wed Jan 27, 2021 6:46 pm

J.E. wrote: ↑Wed Jan 27, 2021 5:41 pm You can just add another dummy variable for "offensive team in the bonus"

Have you explored this topic, Jerry?

colts18 · Post by **colts18** » Thu Jan 28, 2021 9:19 pm

DSMok1 wrote: ↑Wed Jan 27, 2021 2:54 pm That's a very interesting point! I wonder how that adjustment could be made.

It can be a variable like HCA. "In the Penalty" or "Not in the Penalty" for both offense and defense.

DSMok1 · Post by **DSMok1** » Thu Jan 28, 2021 9:44 pm

colts18 wrote: ↑Thu Jan 28, 2021 9:19 pm
DSMok1 wrote: ↑Wed Jan 27, 2021 2:54 pm That's a very interesting point! I wonder how that adjustment could be made.
It can be a variable like HCA. "In the Penalty" or "Not in the Penalty" for both offense and defense.

Yes, but how do we "credit" the point value of being in the penalty to individual players with that scheme?

I guess it could be added as a post-processing step, based on who drew the fouls (on offense) or committed the fouls (on defense).
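One way such a post-processing step might look, purely as a sketch: take the estimated value of a bonus possession (for instance the coefficient on a bonus dummy), multiply it by the possessions played in the bonus, and split that among players in proportion to the fouls they drew before the bonus started. The function name, the proportional split, and the numbers are all hypothetical; the defensive version would debit the players who committed the fouls instead.

```python
from collections import Counter

def credit_bonus_value(fouls_drawn_by, bonus_possessions, value_per_possession):
    """Split the points attributed to the bonus among the players who drew the fouls.

    fouls_drawn_by: one entry per team foul drawn before the bonus began.
    Returns {player: points credited}; empty if no fouls were recorded.
    """
    counts = Counter(fouls_drawn_by)
    total_fouls = sum(counts.values())
    if total_fouls == 0:
        return {}
    total_value = bonus_possessions * value_per_possession
    return {player: total_value * n / total_fouls for player, n in counts.items()}

# Hypothetical quarter: Player A drew 3 of the fouls, B and C one each,
# and the team then had 8 offensive possessions in the bonus.
credits = credit_bonus_value(
    ["Player A", "Player A", "Player B", "Player A", "Player C"],
    bonus_possessions=8,
    value_per_possession=0.10,   # assumed value of a bonus possession
)

for player, points in credits.items():
    print(f"{player}: {points:+.2f} points")
```

The credited amounts add up to the total value attributed to the bonus, so the adjustment only moves credit between players rather than creating any.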
