Bayesian RAPM
Posted: Fri Sep 21, 2018 10:16 pm
by Jinxed
This Topic has been split off from viewtopic.php?f=2&t=9517
DSMok1 wrote: ↑Thu Sep 20, 2018 2:02 pm
I am working with a 20 year RAPM dataset (1997-2017) to develop a new BPM formulation.
With a pure linear statistical plus minus, here are the residuals:
A negative residual would indicate the player produced more than the box score would indicate (non-box-score contribution) while a positive residual indicates the player produced less than the box score would indicate.
What I see is that post players are the hardest to measure with box score statistics. Much of their contribution comes from post defense--or lack thereof. More generally, defense is not shown in the box score. Players elite on defense are underrated with box scores and players bad on defense are overrated (particularly if they are post players). Jason Collins and Nick Collison vs. Hassan Whiteside and Carlos Rogers.
Is this 20 year dataset age adjusted?
Re: Estimated non boxscore RPM value
Posted: Mon Sep 24, 2018 11:14 am
by DSMok1
Jinxed wrote: ↑Fri Sep 21, 2018 10:16 pm
Is this 20 year dataset age adjusted?
It's better than age adjusted. It uses a Bayesian prior based on playing time each season.
Re: Estimated non boxscore RPM value
Posted: Tue Sep 25, 2018 12:14 am
by xkonk
DSMok1 wrote: ↑Mon Sep 24, 2018 11:14 am
Jinxed wrote: ↑Fri Sep 21, 2018 10:16 pm
Is this 20 year dataset age adjusted?
It's better than age adjusted. It uses a Bayesian prior based on playing time each season.
Shouldn't it be both? Young guys who play a lot of minutes may be good now, but they might just be predicted to be good in the future. Unless the prior is based on the previous season's minutes played.
Re: Estimated non boxscore RPM value
Posted: Tue Sep 25, 2018 11:41 am
by DSMok1
xkonk wrote: ↑Tue Sep 25, 2018 12:14 am
DSMok1 wrote: ↑Mon Sep 24, 2018 11:14 am
It's better than age adjusted. It uses a Bayesian prior based on playing time each season.
Shouldn't it be both? Young guys who play a lot of minutes may be good now, but they might just be predicted to be good in the future. Unless the prior is based on the previous season's minutes played.
I looked into an age term along with the MPG term and the age term added nothing when the MPG term was done properly (note--it's not just MPG--also takes into account overall team strength). Age had 0 correlation with the residual.
Here's a group of players, the minutes-weighted average vanilla RAPM (which regresses low minutes players toward 0), the average Prior, and average total result:
Re: Estimated non boxscore RPM value
Posted: Tue Sep 25, 2018 3:41 pm
by Crow
Minutes adjusted for "overall team strength".... sounds good initially. Can an adjustment for position strength take it further? Position broadly defined by PG or G, Wing or Forward, Big or C groupings.
But are you ok with the team quality adjustment changing the raw data? A standard adjustment may not apply equally to all players in all situations. The same goes for luck adjustments and others (shot defense) used in various places. Probably helpful on average but for super serious evaluation it may be desirable to see metrics raw and adjusted.
Re: Estimated non boxscore RPM value
Posted: Tue Sep 25, 2018 5:04 pm
by DSMok1
Crow wrote: ↑Tue Sep 25, 2018 3:41 pm
Minutes adjusted for "overall team strength".... sounds good initially. Can an adjustment for position strength take it further? Position broadly defined by PG or G, Wing or Forward, Big or C groupings.
But are you ok with the team quality adjustment changing the raw data? A standard adjustment may not apply equally to all players in all situations. The same goes for luck adjustments and others (shot defense) used in various places. Probably helpful on average but for super serious evaluation it may be desirable to see metrics raw and adjusted.
I looked at the overall distribution of error and didn't see any broad patterns, so I felt it best to keep the prior as simple as possible.
RAPM simply pulls all players toward 0, so using a basic MPG & team strength prior helps RAPM pull players toward a more realistic value for them. This particularly helps the overall distribution--very low minutes players always congregate near 0 with vanilla RAPM, whereas low to mid-minute players end up typically well below 0 (and below their true value). Adding the prior makes the least difference for the very high minute players.
Re: Estimated non boxscore RPM value
Posted: Tue Oct 09, 2018 5:57 pm
by tarrazu
DSMok1 wrote: ↑Mon Sep 24, 2018 11:14 am
Jinxed wrote: ↑Fri Sep 21, 2018 10:16 pm
Is this 20 year dataset age adjusted?
It's better than age adjusted. It uses a Bayesian prior based on playing time each season.
How are you implementing the prior? Is this within the glmnet framework?
Re: Estimated non boxscore RPM value
Posted: Tue Oct 09, 2018 6:34 pm
by DSMok1
tarrazu wrote: ↑Tue Oct 09, 2018 5:57 pm
How are you implementing the prior? Is this within the glmnet framework?
No, it is implemented in pre & post processing, not within the ridge regression itself. Essentially the prior is subtracted from the observed lineup results, then the regression is run and the prior is added back in. The idea is to capture an appropriate rough estimate of each player's expected value, based solely on MPG and how good the team is. The result is not extremely sensitive to small changes in the prior.
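One way to read this pre & post processing, sketched under the simplifying assumption of a single fixed prior per player (the approach described here uses a season-specific prior and adds back its average, so this is only an approximation of it): subtracting the prior, running ridge, and adding the prior back is the same as a ridge regression that shrinks each player toward the prior instead of toward 0. The symbols y (observed stint margins), X (the +1/-1 lineup matrix), \lambda (ridge penalty), and \mu (vector of priors) are notation introduced for illustration.

```latex
\hat{\beta} = \arg\min_{\beta} \; \lVert y - X\beta \rVert^2 + \lambda \lVert \beta - \mu \rVert^2

\text{With } \delta = \beta - \mu \text{ and } \tilde{y} = y - X\mu:

\hat{\delta} = \arg\min_{\delta} \; \lVert \tilde{y} - X\delta \rVert^2 + \lambda \lVert \delta \rVert^2,
\qquad \hat{\beta} = \hat{\delta} + \mu
```

Here \tilde{y} is the prior-adjusted observation (pre-processing), \hat{\delta} is what the ordinary ridge regression returns, and adding \mu back is the post-processing step.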
Re: Estimated non boxscore RPM value
Posted: Wed Oct 10, 2018 1:29 am
by Crow
Could you walk thru one player estimate step by step? I don't understand how the Bayesian RAPM table and your words sync up. What exactly is the 3rd column and how is it calculated? Same for 5th and final columns. The table itself is not sufficient to do the calculation of the final column by simple mathematical operators, is it?
Re: Estimated non boxscore RPM value
Posted: Wed Oct 10, 2018 12:53 pm
by DSMok1
Crow wrote: ↑Wed Oct 10, 2018 1:29 am
Could you walk thru one player estimate step by step? I don't understand how the Bayesian RAPM table and your words sync up. What exactly is the 3rd column and how is it calculated? Same for 5th and final columns. The table itself is not sufficient to do the calculation of the final column by simple mathematical operators, is it?
You have 10 players on the court. Team A outscores Team B by +20 per 100 possessions played (the stint was much shorter than 100 possessions, of course).
However, Team A had their starters in and Team B was playing scrubs. Each player has their own Bayesian Prior based on how many MPG they are playing that season and how good the team as a whole is. Suppose Team A has an average Bayesian Prior of +1 for each player. Conversely, Team B has an average Bayesian Prior of -1 for those 5 players.
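In the pre-processing step, the observed production is adjusted by subtracting out the priors. Team A's five priors sum to +5 and Team B's sum to -5, so the margin expected from the priors alone is (+5) - (-5) = +10. The adjusted observation would be +20 - (+10) = +10.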
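The same stint adjustment as a few lines of Python. This is only a sketch of the arithmetic in the example; the variable names are made up for illustration.

```python
# Hypothetical per-player priors for the ten players on the floor
# (values from the example: +1 each for Team A, -1 each for Team B).
team_a_priors = [1.0] * 5
team_b_priors = [-1.0] * 5

observed_margin = 20.0  # Team A's margin per 100 possessions in this stint

# Margin the priors alone would predict: Team A's total minus Team B's total.
expected_margin = sum(team_a_priors) - sum(team_b_priors)

# Adjusted observation that the regression is then run on.
adjusted_margin = observed_margin - expected_margin
print(adjusted_margin)  # 10.0
```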
Then, the full RAPM would be run on the season or group of seasons as a whole, using these adjusted observations.
The RAPM results for each player are then added to the average prior for that player over the duration of the RAPM--this is the post-processing step.
For example: in the 20 year RAPM, Kobe Bryant had a wide range of Bayesian Priors, with an average prior over the duration of +3.5 (as seen above). Each observation was adjusted based on the specific Bayesian Prior for the given season. The RAPM was then run on the adjusted observations, yielding a rating for Kobe of -0.1. Then, his average prior was added back in (the +3.5). Thus his total Bayesian RAPM was +3.4.
For most players that have a very large number of observations, like Kobe, the addition of the Bayesian Prior only has a minor impact. For players with fewer minutes, it can be very significant. It also makes a significant difference if there is a lot of collinearity with a few players and RAPM has trouble disentangling them. (Jimmy Butler, for instance, jumped from +1.2 to +3.7 with the addition of his prior.)
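A minimal end-to-end sketch of the three steps (subtract priors, run ridge, add priors back), in plain Python so it runs without extra libraries. Everything in it is an assumption for illustration: the toy 2-on-2 stints, the prior values, the penalty, and the function names are invented, and it uses one fixed prior per player rather than a prior that changes each season. It is not the actual code behind the table.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge(x, y, lam):
    """Closed-form ridge: solve (X'X + lam * I) beta = X'y."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)


def bayesian_rapm(x, y, prior, lam):
    """Subtract the prior's predicted margin, fit ridge, add the prior back."""
    expected = [sum(xi * pi for xi, pi in zip(row, prior)) for row in x]
    adjusted = [yi - ei for yi, ei in zip(y, expected)]  # pre-processing
    delta = ridge(x, adjusted, lam)                      # regression on adjusted data
    return [d + p for d, p in zip(delta, prior)]         # post-processing


# Toy data: 4 players, three stints; +1 = on the court for Team A, -1 = Team B.
x = [[1, 1, -1, -1],
     [1, -1, 1, -1],
     [1, -1, -1, 1]]
y = [12.0, 4.0, 6.0]             # observed margins per 100 possessions
prior = [2.0, 0.5, -0.5, -2.0]   # hypothetical MPG/team-strength priors

print(ridge(x, y, lam=5.0))                 # vanilla: shrinks toward 0
print(bayesian_rapm(x, y, prior, lam=5.0))  # shrinks toward each prior
```

With a very large penalty the vanilla ratings collapse to 0 while the prior-informed ratings collapse to the priors, which is the behavior described above for low-minute players.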
Re: Estimated non boxscore RPM value
Posted: Wed Oct 10, 2018 3:32 pm
by Crow
I still can't follow the table from starting data to end values.
Which columns get added/subtracted step by step on the table? It doesn't seem to flow left to right but rather right to left to somehow all the way to the right. There is no column for Kobe's -0.1. You can compute it but his case is confusing as there are 2 columns with 3.4. Can you walk thru the TABLE figures for someone else without this similarity so I can see what column is subtracted from which at the end? I can't follow the table for Duncan or anyone. Is the BPM column used for anything? Why is it ordered the way it is?
Maybe it is just me, but I can't follow the table at all.
Re: Estimated non boxscore RPM value
Posted: Wed Oct 10, 2018 4:55 pm
by DSMok1
Crow wrote: ↑Wed Oct 10, 2018 3:32 pm
I still can't follow the table from starting data to end values.
Which columns get added/subtracted step by step on the table? It doesn't seem to flow left to right but rather right to left to somehow all the way to the right. There is no column for Kobe's -0.1. You can compute it but his case is confusing as there are 2 columns with 3.4. Can you walk thru the TABLE figures for someone else without this similarity so I can see what column is subtracted from which at the end? I can't follow the table for Duncan or anyone. Is the BPM column used for anything? Why is it ordered the way it is?
Maybe it is just me, but I can't follow the table at all.
The table doesn't have any of the calculations. Here are the columns:
- RAPM (That's vanilla RAPM, no prior)
- BPM (Box Plus/Minus--for information only)
- net_rtg_adj (Average team quality -- used in the prior)
- Re_MPG (Average minutes per game -- used in the prior)
- Prior_Tot (The actual average prior used for that player)
- BayRAPM_tot (The Bayesian RAPM estimate for the player)
Re: Bayesian RAPM
Posted: Wed Oct 10, 2018 8:10 pm
by Crow
Well, I can use the table in pieces and I'll just let understanding the details go this time.
Re: Bayesian RAPM
Posted: Thu Oct 11, 2018 12:29 am
by xkonk
I have lots of questions, but not sure how much Daniel wants to get into it.
Re: Bayesian RAPM
Posted: Thu Oct 11, 2018 2:20 pm
by DSMok1
xkonk wrote: ↑Thu Oct 11, 2018 12:29 am
I have lots of questions, but not sure how much Daniel wants to get into it.
I'll try to answer the best I can! I'm not exactly an expert on the mathematical nuances, though.