Bayesian RAPM

Jinxed
Posts: 25
Joined: Mon Jun 13, 2011 9:53 pm

Bayesian RAPM

Post by Jinxed » Fri Sep 21, 2018 10:16 pm

This Topic has been split off from viewtopic.php?f=2&t=9517

DSMok1 wrote:
Thu Sep 20, 2018 2:02 pm
I am working with a 20-year RAPM dataset (1997-2017) to develop a new BPM formulation.

With a purely linear statistical plus/minus, here are the residuals:


A negative residual would indicate the player produced more than the box score would indicate (non-box-score contribution) while a positive residual indicates the player produced less than the box score would indicate.
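A minimal sketch of how residuals like these could be computed. It assumes an ordinary least-squares fit of RAPM on a hypothetical matrix of per-player box-score rates; the post does not say which stats or fitting details were used. The sign convention matches the paragraph above: residual = box-score estimate minus RAPM, so a negative value marks non-box-score contribution.

```python
import numpy as np

def spm_residuals(box: np.ndarray, rapm: np.ndarray) -> np.ndarray:
    """Hypothetical helper: fit RAPM on box-score stats with OLS and return
    (box-score estimate - RAPM) for each player.

    box:  (n_players, n_stats) matrix of box-score rates (assumed inputs)
    rapm: (n_players,) RAPM values
    """
    # Add an intercept column so the fit is not forced through zero.
    design = np.column_stack([np.ones(len(box)), box])
    coef, *_ = np.linalg.lstsq(design, rapm, rcond=None)
    box_estimate = design @ coef
    # Negative residual: the player produced more than the box score indicates.
    return box_estimate - rapm
```

With an intercept in the fit, the residuals sum to zero, so they describe who beats or trails the box-score estimate relative to everyone else in the sample.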

What I see is that post players are the hardest to measure with box score statistics. Much of their contribution comes from post defense--or lack thereof. More generally, defense is not captured in the box score: elite defenders are underrated by box scores and poor defenders are overrated (particularly if they are post players). Compare Jason Collins and Nick Collison with Hassan Whiteside and Carlos Rogers.
Is this 20-year dataset age-adjusted?

DSMok1
Posts: 850
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Estimated non boxscore RPM value

Post by DSMok1 » Mon Sep 24, 2018 11:14 am

Jinxed wrote:
Fri Sep 21, 2018 10:16 pm
Is this 20-year dataset age-adjusted?
It's better than age-adjusted. It uses a Bayesian prior based on playing time each season.
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
GodismyJudgeOK.com/DStats/
Twitter.com/DSMok1

xkonk
Posts: 287
Joined: Fri Apr 15, 2011 12:37 am

Re: Estimated non boxscore RPM value

Post by xkonk » Tue Sep 25, 2018 12:14 am

DSMok1 wrote:
Mon Sep 24, 2018 11:14 am
Jinxed wrote:
Fri Sep 21, 2018 10:16 pm
Is this 20-year dataset age-adjusted?
It's better than age-adjusted. It uses a Bayesian prior based on playing time each season.
Shouldn't it be both? Young guys who play a lot of minutes may not be good yet; they might be getting those minutes because they are expected to be good in the future. That is, unless the prior is based on the previous season's minutes played.

DSMok1

Re: Estimated non boxscore RPM value

Post by DSMok1 » Tue Sep 25, 2018 11:41 am

xkonk wrote:
Tue Sep 25, 2018 12:14 am
DSMok1 wrote:
Mon Sep 24, 2018 11:14 am
It's better than age-adjusted. It uses a Bayesian prior based on playing time each season.
Shouldn't it be both? Young guys who play a lot of minutes may not be good yet; they might be getting those minutes because they are expected to be good in the future. That is, unless the prior is based on the previous season's minutes played.
I looked into adding an age term alongside the MPG term, and the age term added nothing when the MPG term was done properly (note: it's not just MPG; it also takes into account overall team strength). Age had zero correlation with the residual.
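One way the age check could be run, as a sketch only: regress RAPM on MPG and team strength (a hypothetical stand-in for the actual prior fit, whose exact form isn't given in the thread), then correlate age with what is left over. A correlation near zero would mean an age term has nothing further to explain.

```python
import numpy as np

def age_residual_correlation(mpg, team_rtg, age, rapm) -> float:
    """Hypothetical check: Pearson correlation between age and the residual
    of an OLS fit of RAPM on [intercept, MPG, team rating]."""
    mpg, team_rtg, age, rapm = (np.asarray(a, dtype=float) for a in (mpg, team_rtg, age, rapm))
    design = np.column_stack([np.ones(len(mpg)), mpg, team_rtg])
    coef, *_ = np.linalg.lstsq(design, rapm, rcond=None)
    # Residual here is RAPM minus the MPG/team-strength fit.
    residual = rapm - design @ coef
    return float(np.corrcoef(age, residual)[0, 1])
```

If RAPM really did depend on age beyond MPG and team strength, the correlation would come out clearly positive or negative; a value near 0 matches what is reported above.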

Here's a group of players with their minutes-weighted average vanilla RAPM (which regresses low-minutes players toward 0), their average prior, and their average total result:

Crow
Posts: 5566
Joined: Thu Apr 14, 2011 11:10 pm

Re: Estimated non boxscore RPM value

Post by Crow » Tue Sep 25, 2018 3:41 pm

Minutes adjusted for "overall team strength"... sounds good initially. Can an adjustment for position strength take it further? Position could be broadly defined by PG/G, Wing/F, and Big/C groupings.

But are you OK with the team quality adjustment changing the raw data? A standard adjustment may not apply equally to all players in all situations, as with the luck adjustments and others (e.g., shot defense) used in various places. Probably helpful on average, but for super serious evaluation it may be desirable to see metrics both raw and adjusted.

DSMok1

Re: Estimated non boxscore RPM value

Post by DSMok1 » Tue Sep 25, 2018 5:04 pm

Crow wrote:
Tue Sep 25, 2018 3:41 pm
Minutes adjusted for "overall team strength"... sounds good initially. Can an adjustment for position strength take it further? Position could be broadly defined by PG/G, Wing/F, and Big/C groupings.

But are you OK with the team quality adjustment changing the raw data? A standard adjustment may not apply equally to all players in all situations, as with the luck adjustments and others (e.g., shot defense) used in various places. Probably helpful on average, but for super serious evaluation it may be desirable to see metrics both raw and adjusted.
I looked at the overall distribution of error and didn't see any broad patterns, so I felt it best to keep the prior as simple as possible.

RAPM simply pulls all players toward 0, so using a basic MPG and team strength prior helps RAPM pull players toward a more realistic value for them. This particularly helps the overall distribution: very low-minutes players always congregate near 0 with vanilla RAPM, whereas low- to mid-minute players typically end up well below 0 (and below their true value). Adding the prior makes the least difference for very high-minute players.
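Why subtracting the prior before the ridge step and adding it back afterward changes the shrinkage target from 0 to the prior: a short derivation using standard ridge-regression identities (not taken from the thread). It assumes a single prior value p_j per player collected in the vector p (the actual implementation uses season-specific priors), stint weights W such as possessions, a stint design matrix X, per-100 margins y, and penalty lambda.

```latex
% Vanilla RAPM: weighted ridge regression, every player shrunk toward 0
\hat{\beta}_{\mathrm{RAPM}} = \arg\min_{\beta}\; (y - X\beta)^\top W (y - X\beta) + \lambda \lVert \beta \rVert^2

% With a prior p: ridge on the adjusted observations y - Xp, then add p back
\hat{\delta} = \arg\min_{\delta}\; (y - Xp - X\delta)^\top W (y - Xp - X\delta) + \lambda \lVert \delta \rVert^2,
\qquad \hat{\beta} = p + \hat{\delta}

% Substituting \beta = p + \delta: ridge shrinkage toward p instead of 0
\hat{\beta} = \arg\min_{\beta}\; (y - X\beta)^\top W (y - X\beta) + \lambda \lVert \beta - p \rVert^2
            = \left(X^\top W X + \lambda I\right)^{-1}\left(X^\top W y + \lambda p\right)
```

For a player with many possessions, the X^T W X term dominates lambda I and the data decide the rating; for a player with few possessions, the estimate stays close to p. That is consistent with the prior mattering least for very high-minute players.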
tarrazu
Posts: 58
Joined: Mon Aug 04, 2014 5:02 pm

Re: Estimated non boxscore RPM value

Post by tarrazu » Tue Oct 09, 2018 5:57 pm

DSMok1 wrote:
Mon Sep 24, 2018 11:14 am
Jinxed wrote:
Fri Sep 21, 2018 10:16 pm
Is this 20-year dataset age-adjusted?
It's better than age-adjusted. It uses a Bayesian prior based on playing time each season.
How are you implementing the prior? Is this within the glmnet framework?

DSMok1

Re: Estimated non boxscore RPM value

Post by DSMok1 » Tue Oct 09, 2018 6:34 pm

tarrazu wrote:
Tue Oct 09, 2018 5:57 pm
How are you implementing the prior? Is this within the glmnet framework?
No, it is implemented in pre- and post-processing, not within the ridge regression itself. Essentially, the prior is subtracted from the observed lineup results, the regression is run, and then the prior is added back in. The idea is to capture an appropriate rough estimate of each player's expected value, based solely on MPG and how good the team is. The result is not extremely sensitive to small changes in the prior.
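A minimal sketch of that pre-/post-processing wrapped around a ridge solve. Assumptions, none of them from the post: one prior per player, a stint design matrix with +1 for Team A players and -1 for Team B players, possession weights, and a hypothetical penalty `lam`. It leaves out the home-court/intercept term and season-specific priors, and solves the ridge system directly with NumPy rather than through glmnet.

```python
import numpy as np

def bayesian_rapm(X, y, possessions, prior, lam):
    """Hypothetical sketch of Bayesian RAPM via pre/post-processing.

    X:           (n_stints, n_players); +1 Team A on court, -1 Team B, 0 otherwise
    y:           (n_stints,) Team A margin per 100 possessions
    possessions: (n_stints,) stint lengths, used as regression weights
    prior:       (n_players,) prior rating for each player
    lam:         ridge penalty (assumed value)
    """
    # Pre-processing: remove the margin the priors already predict for each stint.
    y_adj = y - X @ prior
    # Weighted ridge on the adjusted observations; this part shrinks toward 0.
    XtW = X.T * possessions                      # X^T W with W = diag(possessions)
    A = XtW @ X + lam * np.eye(X.shape[1])
    delta = np.linalg.solve(A, XtW @ y_adj)
    # Post-processing: add the prior back in.
    return prior + delta
```

Because the prior never enters the regression itself, the middle step could be swapped for any weighted ridge solver without changing the pre- and post-processing around it.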
Crow

Re: Estimated non boxscore RPM value

Post by Crow » Wed Oct 10, 2018 1:29 am

Could you walk thru one player estimate step by step? I don't understand how the Bayesian RAPM table and your words sync up. What exactly is the 3rd column, and how is it calculated? Same for the 5th and final columns. The table itself is not sufficient to calculate the final column with simple arithmetic, is it?

DSMok1

Re: Estimated non boxscore RPM value

Post by DSMok1 » Wed Oct 10, 2018 12:53 pm

Crow wrote:
Wed Oct 10, 2018 1:29 am
Could you walk thru one player estimate step by step? I don't understand how the Bayesian RAPM table and your words sync up. What exactly is the 3rd column, and how is it calculated? Same for the 5th and final columns. The table itself is not sufficient to calculate the final column with simple arithmetic, is it?
You have 10 players on the court. Team A outscores Team B by +20 per 100 possessions played (the stint was much shorter than 100 possessions, of course).

However, Team A had their starters in and Team B was playing scrubs. Each player has their own Bayesian Prior based on how many MPG they are playing that season and how good the team as a whole is. Suppose Team A has an average Bayesian Prior of +1 for each player. Conversely, Team B has an average Bayesian Prior of -1 for those 5 players.

In the pre-processing step, the observed production is adjusted by subtracting out the priors. Team A's five priors sum to +5 and Team B's sum to -5, so the adjusted observation would be +20 - (+5) + (-5) = +10.
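The same arithmetic as a tiny sketch. The 10-possession, +2-point stint used to produce the +20 per 100 figure is a hypothetical illustration (the post only says the stint was much shorter than 100 possessions), and both function names are made up for this example.

```python
def per_100(margin_points: float, possessions: float) -> float:
    """Convert a stint's point margin to a per-100-possession rate."""
    return 100.0 * margin_points / possessions

def adjusted_observation(observed_per100, team_a_priors, team_b_priors):
    """Pre-processing: remove the margin the priors already predict.

    The expected margin is Team A's summed priors minus Team B's summed priors,
    so Team A's total is subtracted and Team B's total is added back.
    """
    expected = sum(team_a_priors) - sum(team_b_priors)
    return observed_per100 - expected

observed = per_100(2, 10)          # hypothetical stint: +2 over 10 possessions
print(observed)                    # 20.0
print(adjusted_observation(observed, [1.0] * 5, [-1.0] * 5))  # 10.0
```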

Then, the full RAPM would be run on the season or group of seasons as a whole, using these adjusted observations.

The RAPM results for each player are then added to the average prior for that player over the duration of the RAPM--this is the post-processing step.

For example: in the 20-year RAPM, Kobe Bryant had a wide range of Bayesian Priors, with an average prior over the duration of +3.5 (as seen above). Each observation was adjusted based on the specific Bayesian Prior for the given season. The RAPM was then run on the adjusted observations, yielding a rating for Kobe of -0.1. Then his average prior (+3.5) was added back in, giving a total Bayesian RAPM of +3.4.
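A sketch of the post-processing step. The post says the average prior over the RAPM window is added back, but not how seasons are weighted in that average; possession weighting here is an assumption, and the three-season numbers are hypothetical, not Kobe's actual priors. The last line reproduces the Kobe figures quoted above.

```python
def average_prior(season_priors, season_possessions):
    """Possession-weighted average of season-by-season priors (weighting assumed)."""
    total = sum(season_possessions)
    if total <= 0:
        raise ValueError("need a positive number of possessions")
    return sum(p * n for p, n in zip(season_priors, season_possessions)) / total

def bayesian_rapm_total(adjusted_rapm, avg_prior):
    """Post-processing: add the average prior back to the prior-adjusted RAPM."""
    return adjusted_rapm + avg_prior

# Hypothetical three-season player.
print(average_prior([2.0, 4.0, 4.5], [1000, 2000, 2000]))   # 3.8
# Kobe's figures from the post: -0.1 on adjusted data, average prior +3.5.
print(round(bayesian_rapm_total(-0.1, 3.5), 1))              # 3.4
```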

For most players that have a very large number of observations, like Kobe, the addition of the Bayesian Prior has only a minor impact. For players with fewer minutes, it can be very significant. It also makes a significant difference when there is a lot of collinearity among a few players and RAPM has trouble disentangling them. (Jimmy Butler, for instance, jumped from +1.2 to +3.7 with the addition of the prior.)
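A toy illustration of the collinearity point, using made-up numbers: two hypothetical teammates who only ever appear together. Vanilla RAPM cannot tell them apart and splits the credit evenly. With priors, the split follows the priors, while the pair's combined rating barely moves. The solver uses the closed form for ridge shrinkage toward a prior, (X^T W X + lam*I)^-1 (X^T W y + lam*p), which is a standard identity and not something stated in the thread.

```python
import numpy as np

def ridge_toward_prior(X, y, w, prior, lam):
    """Weighted ridge regression that shrinks toward `prior` instead of 0."""
    XtW = X.T * w                                   # X^T W with W = diag(w)
    A = XtW @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, XtW @ y + lam * prior)

# Two hypothetical players who are always on the floor together (identical columns).
X = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
y = np.array([8.0, 8.0, 8.0])        # margin per 100 for each stint
w = np.array([10.0, 10.0, 10.0])     # possessions per stint
lam = 1.0                            # hypothetical penalty

vanilla = ridge_toward_prior(X, y, w, np.zeros(2), lam)
bayes = ridge_toward_prior(X, y, w, np.array([4.0, 0.0]), lam)
print(vanilla, vanilla.sum())
print(bayes, bayes.sum())
```

In the vanilla run both players get the same rating. With priors of +4 and 0, the two ratings end up exactly 4 apart, and their sum stays close to the vanilla sum. The data only pin down the pair's total, so the prior decides how that total is divided.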
Crow

Re: Estimated non boxscore RPM value

Post by Crow » Wed Oct 10, 2018 3:32 pm

I still can't follow the table from starting data to end values.

Which columns get added/subtracted, step by step, on the table? It doesn't seem to flow left to right but rather right to left, and then somehow all the way to the right. There is no column for Kobe's -0.1. You can compute it, but his case is confusing as there are 2 columns with 3.4. Can you walk thru the TABLE figures for someone else without this similarity so I can see which column is subtracted from which at the end? I can't follow the table for Duncan or anyone. Is the BPM column used for anything? Why is it ordered the way it is?

Maybe it is just me, but I can't follow the table at all.

DSMok1

Re: Estimated non boxscore RPM value

Post by DSMok1 » Wed Oct 10, 2018 4:55 pm

Crow wrote:
Wed Oct 10, 2018 3:32 pm
I still can't follow the table from starting data to end values.

Which columns get added/subtracted, step by step, on the table? It doesn't seem to flow left to right but rather right to left, and then somehow all the way to the right. There is no column for Kobe's -0.1. You can compute it, but his case is confusing as there are 2 columns with 3.4. Can you walk thru the TABLE figures for someone else without this similarity so I can see which column is subtracted from which at the end? I can't follow the table for Duncan or anyone. Is the BPM column used for anything? Why is it ordered the way it is?

Maybe it is just me, but I can't follow the table at all.
The table doesn't have any of the calculations. Here are the columns:
  • RAPM (That's vanilla RAPM, no prior)
  • BPM (Box Plus/Minus--for information only)
  • net_rtg_adj (Average team quality -- used in the prior)
  • Re_MPG (Average minutes per game -- used in the prior)
  • Prior_Tot (The actual average prior used for that player)
  • BayRAPM_tot (The Bayesian RAPM estimate for the player)
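Since the table has no column for the RAPM run on prior-adjusted data, that value can be recovered from the last two columns using the post-processing step described earlier in the thread (BayRAPM_tot = adjusted RAPM + Prior_Tot). A small sketch follows; the dictionary keys mirror the column names above, and Kobe's values are the ones quoted in the thread. This hidden value is not the vanilla RAPM column, which comes from a separate regression with no prior.

```python
def hidden_adjusted_rapm(row: dict) -> float:
    """RAPM from the regression on prior-adjusted observations: BayRAPM_tot - Prior_Tot."""
    return row["BayRAPM_tot"] - row["Prior_Tot"]

kobe = {"Prior_Tot": 3.5, "BayRAPM_tot": 3.4}   # figures quoted in the thread
print(round(hidden_adjusted_rapm(kobe), 1))      # -0.1
```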
Crow

Re: Bayesian RAPM

Post by Crow » Wed Oct 10, 2018 8:10 pm

Well, I can use the table in pieces and I'll just let understanding the details go this time.

xkonk

Re: Bayesian RAPM

Post by xkonk » Thu Oct 11, 2018 12:29 am

I have lots of questions, but not sure how much Daniel wants to get into it.

DSMok1

Re: Bayesian RAPM

Post by DSMok1 » Thu Oct 11, 2018 2:20 pm

xkonk wrote:
Thu Oct 11, 2018 12:29 am
I have lots of questions, but not sure how much Daniel wants to get into it.
I'll try to answer the best I can! I'm not exactly an expert on the mathematical nuances, though.