Possible Priors for Bayesian RAPM

Post by **DSMok1** » Thu Oct 04, 2012 5:27 pm

I have been pondering the weak points of APM and RAPM and have come to the conclusion that perhaps the best approach, with no obvious drawbacks, would be to use an informed prior (rather than a set value) with MPG as the only independent variable. Assuming coaches are rational, they will in general play their best players more. The advantage of this approach over regressing toward a single fixed point is that low-minutes players will not trend higher than mid-minutes players.
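
In ridge-regression terms, the proposal swaps RAPM's single shrinkage target for a player-specific one. Below is a minimal sketch, assuming the usual RAPM setup (a stint-by-player design matrix `X` with +1/-1 entries, a per-stint margin `y`, optional possession weights) and a prior mean computed from each player's MPG. The function name `ridge_toward_prior` and all toy numbers are illustrative, not J.E.'s actual code or data:

```python
import numpy as np


def ridge_toward_prior(X, y, prior_mean, lam, weights=None):
    """Ridge regression that shrinks each coefficient toward its own prior mean.

    Minimizes  sum_i w_i * (y_i - X_i @ beta)**2 + lam * ||beta - prior_mean||**2,
    whose closed-form solution is (X'WX + lam*I)^-1 (X'Wy + lam*prior_mean).
    A constant prior_mean gives ordinary RAPM regressed toward a set value.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    XtW = X.T * w  # multiplies stint i's column of X.T by its weight w_i
    lhs = XtW @ X + lam * np.eye(p)
    rhs = XtW @ y + lam * np.asarray(prior_mean, dtype=float)
    return np.linalg.solve(lhs, rhs)


# Toy usage: 3 players, 6 stints (+1 = on court for one side, -1 = for the other).
X_toy = np.array([[1, -1, 0], [1, 0, -1], [0, 1, -1],
                  [1, -1, 0], [0, 1, -1], [1, 0, -1]])
y_toy = np.array([4.0, 6.0, -1.0, 2.0, 1.0, 5.0])  # per-stint margins (made up)
possessions = np.array([10, 8, 12, 6, 9, 11])      # per-stint weights (made up)
mpg_prior_mean = np.array([1.0, -1.0, -3.0])        # hypothetical MPG-based prior
print(ridge_toward_prior(X_toy, y_toy, mpg_prior_mean, lam=20.0, weights=possessions))
```

In the toy design every row sums to zero, so `X'X` alone is singular; the penalty term is what makes the system solvable, which is the collinearity fix RAPM relies on.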

I plotted J.E.'s 12-year average RAPM vs. MPG, and got the following charts:

Please look at the better, interactive version of those charts to get more of an idea of what is going on:
http://public.tableausoftware.com/views ... Dashboard1

Here is the data on the linear regressions:

Trend Lines Model

A linear trend model is computed for sum of DRAPM given sum of MPG. The model may be significant at p <= 0.05.

Model formula: ( MPG + intercept )
Number of observations: 653
DF (degrees of freedom): 2
Residual DF: 651
SSE (sum squared error): 2175.67
MSE (mean squared error): 3.34205
R-Squared: 0.0115163
Standard error: 1.82813
p (significance): 0.0060518

A linear trend model is computed for sum of ORAPM given sum of MPG. The model may be significant at p <= 0.05.

Model formula: ( MPG + intercept )
Number of observations: 653
DF (degrees of freedom): 2
Residual DF: 651
SSE (sum squared error): 1549.59
MSE (mean squared error): 2.38033
R-Squared: 0.193412
Standard error: 1.54283
p (significance): < 0.0001

A linear trend model is computed for sum of RAPM given sum of MPG. The model may be significant at p <= 0.05.

Model formula: ( MPG + intercept )
Number of observations: 653
DF (degrees of freedom): 2
Residual DF: 651
SSE (sum squared error): 2909.32
MSE (mean squared error): 4.46901
R-Squared: 0.16907
Standard error: 2.114
p (significance): < 0.0001

Individual trend lines:

| Pane (r,c) | p | Equation |
|---|---|---|
| (1,1) | < 0.0001 | RAPM = 0.157742*MPG - 4.63461 |
| (2,1) | < 0.0001 | ORAPM = 0.124976*MPG - 3.67126 |
| (3,1) | 0.0060518 | DRAPM = 0.0326414*MPG - 0.955727 |
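
Read as a prior, each trend line turns a player's MPG into a prior mean. The sketch below just plugs in the coefficients reported above; note they were fit to 12-year averages, so applying them to a single season is an assumption. The helper names `mpg_prior` and `break_even_mpg` are hypothetical:

```python
# Slope and intercept of each linear trend line reported above (value vs. MPG).
TREND_LINES = {
    "RAPM": (0.157742, -4.63461),
    "ORAPM": (0.124976, -3.67126),
    "DRAPM": (0.0326414, -0.955727),
}


def mpg_prior(mpg, component="RAPM"):
    """Prior mean for a player averaging `mpg` minutes per game."""
    slope, intercept = TREND_LINES[component]
    return slope * mpg + intercept


def break_even_mpg(component="RAPM"):
    """MPG at which the prior mean crosses zero."""
    slope, intercept = TREND_LINES[component]
    return -intercept / slope


for mpg in (5, 15, 30):
    print(mpg, round(mpg_prior(mpg), 2))
# 5 -3.85
# 15 -2.27
# 30 0.1
print(round(break_even_mpg(), 1))  # 29.4
```

This reproduces the rough figures given later in the thread: low-minutes players are pulled toward about -4, and 30 MPG players toward about 0.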

Post by **Crow** » Thu Oct 04, 2012 5:36 pm

Nearly all of the correlation with minutes comes from offensive RAPM, with very little from defensive RAPM. That is not surprising, given that coaches are aware of standard boxscore statistics, which provide a fairly good (though not complete) view of offensive impact, and probably pay little or no attention to defensive RAPM, or to defensive RAPM that contradicts the message of boxscore defensive statistics. I don't know how much awareness of, or regard for, position counterparts' offensive statistics (effectively a player's own defensive stats) coaches have on average.

Post by **bbstats** » Fri Oct 05, 2012 4:55 pm

Seems to reinforce the notion that coaches don't sub in players optimally for defense.

And if that wasn't an official notion until now, may it be known as such.

Post by **DSMok1** » Mon Oct 08, 2012 10:59 am

bbstats wrote: Seems to reinforce the notion that coaches don't sub in players optimally for defense.

And if that wasn't an official notion until now, may it be known as such.

This is also the case for baseball offense vs. defense. If the talent spread is wider on O than D (and quality more easily measured as well), then this is the distribution that you would expect to see.
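
The argument can be illustrated with a toy simulation: if offensive talent is spread more widely than defensive talent, and coaches also perceive offense with less noise, minutes end up tracking offense more closely than defense. Every parameter below is invented for illustration; nothing is estimated from NBA data, and `minutes_correlations` is a hypothetical name:

```python
import numpy as np


def minutes_correlations(n=20_000, sd_off=2.0, sd_def=1.0,
                         noise_off=0.5, noise_def=1.5, seed=1):
    """Correlation of simulated MPG with true offensive and defensive value.

    All spreads and noise levels are hypothetical illustration values.
    """
    rng = np.random.default_rng(seed)
    off = rng.normal(0.0, sd_off, n)  # true offensive impact
    dfn = rng.normal(0.0, sd_def, n)  # true defensive impact
    # The coach perceives each side with its own measurement error.
    perceived = (off + rng.normal(0.0, noise_off, n)
                 + dfn + rng.normal(0.0, noise_def, n))
    # Minutes rise with perceived value, capped to the 0-48 range.
    mpg = np.clip(24.0 + 6.0 * perceived, 0.0, 48.0)
    return np.corrcoef(mpg, off)[0, 1], np.corrcoef(mpg, dfn)[0, 1]


corr_off, corr_def = minutes_correlations()
print(f"corr(MPG, O) = {corr_off:.2f}, corr(MPG, D) = {corr_def:.2f}")
```

Setting the two spreads and the two noise levels equal makes the correlations roughly equal, so in this toy model the asymmetry comes entirely from the assumed differences in spread and measurability.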

Post by **schtevie** » Mon Oct 08, 2012 12:36 pm

Daniel, I (and I expect others) would profit were you to offer a brief discussion of the relevant criteria (and tradeoffs) for selecting the "best" possible prior for (R)APM.

My sense is that bootstrapping year-one estimates and incorporating information from position-adjusted aging curves, such that out-year results "best" conform to such "known" characteristics, would be a better way to proceed.

Post by **DSMok1** » Mon Oct 08, 2012 1:15 pm

schtevie wrote: Daniel, I (and I expect others) would profit were you to offer a brief discussion of the relevant criteria (and tradeoffs) for selecting the "best" possible prior for (R)APM.

My sense is that bootstrapping year-one estimates and incorporating information from position-adjusted aging curves, such that out-year results "best" conform to such "known" characteristics, would be a better way to proceed.

Certainly.

The big issue with APM is its tremendous noise level: multicollinearity dominates small samples. RAPM helps with the multicollinearity problem greatly, at the expense of biasing the estimates toward the fixed prior. If a player has few minutes, the prior will dominate.
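
Why the prior dominates for low-minutes players is easiest to see in the idealized case with no collinearity at all (an orthogonal design, which real lineup data never is). There, each player's ridge estimate is a weighted average of the unpenalized APM estimate and the prior mean, with weight n / (n + λ) on the data. In this sketch `n` stands for the player's exposure (for example, possessions) and the λ value is a made-up number:

```python
def shrunk_estimate(apm_estimate, n, prior_mean, lam):
    """Ridge estimate for one player in the no-collinearity (orthogonal design) case.

    The data get weight n / (n + lam); the prior gets the rest.
    """
    weight = n / (n + lam)
    return weight * apm_estimate + (1.0 - weight) * prior_mean


LAM = 2000.0  # hypothetical penalty, in the same units as n
for n in (200, 5000):
    print(n, round(shrunk_estimate(5.0, n, prior_mean=-1.9, lam=LAM), 2))
# 200 -1.27
# 5000 3.03
```

With the same +5.0 unpenalized estimate, the low-exposure player lands close to the -1.9 prior while the high-exposure player keeps most of his estimate.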

Read my writeup on APM and stabilization here: http://godismyjudgeok.com/DStats/2011/n ... ilization/

Now, if you use multi-year samples, you will solve many of the issues, but are no longer measuring what happened in a given year.

What I am looking for here is a prior that will help with the collinearity, while introducing the minimum of bias, and allow a Bayesian RAPM to be meaningful and as little-skewed as possible for a single season, with no input from other seasons.

The most logical and least-biasing prior (in my opinion) would be MPG, which reflects the coach's perception of the player's ability. Obviously, an imbalanced team will cause problems here, since multiple good players could play the same position while another position would force bad players onto the floor.

However, I think the chart above indicates that a prior of this sort, if properly weighted (to have optimal cross-validation within season), has some promise compared to basic RAPM. Players with few minutes played would not skew towards, say, -1.9, or whatever the RAPM prior happens to be. They would instead skew toward, say, -4, while players with 30 MPG might skew somewhat toward 0.
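
The "properly weighted" step could be done by choosing the penalty λ with k-fold cross-validation within the season, then comparing the best out-of-sample error of the MPG prior against that of a constant prior. Below is a minimal sketch with unweighted stints and synthetic data; the function names are hypothetical, and in practice the folds would be built from the actual season's stints:

```python
import numpy as np


def ridge_toward_prior(X, y, prior_mean, lam):
    """Unweighted ridge regression shrinking coefficients toward prior_mean."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * prior_mean)


def cv_error(X, y, prior_mean, lam, k=5, seed=0):
    """Mean squared out-of-fold prediction error for one value of lam."""
    rng = np.random.default_rng(seed)  # same seed -> same folds for every lam
    folds = np.array_split(rng.permutation(len(y)), k)
    sse = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        beta = ridge_toward_prior(X[train], y[train], prior_mean, lam)
        resid = y[test_idx] - X[test_idx] @ beta
        sse += float(resid @ resid)
    return sse / len(y)


def pick_lambda(X, y, prior_mean, grid, k=5, seed=0):
    """Return the grid value with the lowest CV error, plus all errors."""
    errors = [cv_error(X, y, prior_mean, lam, k, seed) for lam in grid]
    return grid[int(np.argmin(errors))], errors


# Synthetic season: true values follow the MPG trend line plus noise.
rng = np.random.default_rng(42)
n_stints, n_players = 400, 30
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players), p=[0.15, 0.7, 0.15])
mpg = rng.uniform(5, 36, n_players)
mpg_based_prior = 0.157742 * mpg - 4.63461
true_beta = mpg_based_prior + rng.normal(0, 1.5, n_players)
y = X @ true_beta + rng.normal(0, 10, n_stints)

grid = [1.0, 10.0, 100.0, 1000.0]
for name, prior in [("constant", np.full(n_players, -1.9)), ("MPG", mpg_based_prior)]:
    best, errs = pick_lambda(X, y, prior, grid)
    print(name, best, round(min(errs), 2))
```

On real data, a lower best CV error for the MPG prior than for the constant prior would be evidence for the idea; here the MPG structure is built into the synthetic data, so the comparison only demonstrates the mechanics.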

I believe this is a promising prior that warrants more research.

Post by **schtevie** » Tue Oct 09, 2012 5:49 pm

Daniel, thanks for the prompt reply, and the reference to your writeup. A few comments:

DSMok1 wrote: ...Now, if you use multi-year samples, you will solve many of the issues, but are no longer measuring what happened in a given year.

What I am looking for here is a prior that will help with the collinearity, while introducing the minimum of bias, and allow a Bayesian RAPM to be meaningful and as little-skewed as possible for a single season, with no input from other seasons.

What I take from looking at the series of yearly results that Jeremias has kindly generated is that it is really, really important to begin with the best possible prior for year 1. Otherwise, you waste several years of data simply playing catch-up. This effect can also be seen clearly in the recently posted multi-year data, where Shaq apparently peaks at age 32, John Stockton at 40, etc., and where implausibly low initial estimates obtain for the likes of Tim Duncan and Dirk Nowitzki, amongst others (implausible, that is, in terms of what we believe about ordinary aging effects). We know that such results are incorrect, and not just individually, given that there is some kind of adding-up constraint.

The issue is how to best deal with it. And in that context I am not sure what you mean by "as little-skewed as possible for a single season" and "with no input from other seasons". To my mind, the "integrity" of 2001 estimates is expendable if information contained therein can be used to create the best possible prior for 2002.
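
Carrying each season's estimates forward as the next season's prior, after an aging adjustment, can be sketched as below. The aging curve is a hypothetical placeholder (a real one would be estimated and, per the suggestion above, position-adjusted), and players with no prior-season estimate fall back to a supplied default, such as an MPG-based prior. All names here are illustrative:

```python
def hypothetical_aging_delta(age):
    """Placeholder aging curve: expected one-year change in RAPM from this age.

    These numbers are invented for illustration, not estimated.
    """
    if age < 27:
        return 0.3
    if age < 30:
        return 0.0
    return -0.3


def next_season_prior(estimates, ages, aging_delta, fallback):
    """Prior means for season t+1.

    estimates: player -> season-t RAPM estimate (missing for rookies)
    ages:      player -> age in season t
    fallback:  player -> prior mean to use when there is no season-t estimate
    """
    return {
        player: (estimates[player] + aging_delta(age)
                 if player in estimates else fallback(player))
        for player, age in ages.items()
    }


priors = next_season_prior(
    estimates={"vet": 2.0, "young": -1.0},
    ages={"vet": 33, "young": 23, "rookie": 20},
    aging_delta=hypothetical_aging_delta,
    fallback=lambda player: -2.0,  # e.g. an MPG-based prior in practice
)
print(priors)
```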

DSMok1 wrote: The most logical, and least biasing prior (in my opinion) would be MPG, indicating the coach's perceptions of the player's abilities. Obviously, an imbalanced team will have issues with this, since multiple good players could play the same position while another position would necessitate bad players playing.

However, I think the chart above indicates that a prior of this sort, if properly weighted (to have optimal cross-validation within season) has some promise compared to basic RAPM. Players with few minutes played would not skew towards, say, -1.9, or whatever the RAPM prior happens to be. They would instead skew toward, say, -4, while players with 30 MPG might skew somewhat toward 0.

I believe this is a promising prior that warrants more research.

Again, it's not exactly clear to me why we should care about a "least biasing prior" when RAPM, by construction, provides biased estimates, especially at the extremes, but MPG is an interesting idea. However, wouldn't an age- and position-adjusted MPG be better? John Stockton played fewer minutes in 2001 not because he wasn't still really good, but because he was old. Similarly, Shaq's 2001 minutes were near the top of the league, which is rather aberrational for a center.

Post by **DSMok1** » Tue Oct 09, 2012 6:29 pm

Certainly, there is an important place for maximally informative priors, such as prior seasons with an aging curve properly applied. I've been advocating that approach for years now as the best overall method for estimating true talent.

I was just brainstorming how to use data from only 1 season and get the best estimate from that data alone, with a minimum of bias.

Post by **Crow** » Thu Oct 18, 2012 4:48 pm

Could a minutes-based prior be usefully adjusted by something else to improve its awareness of defensive impacts without adding unwanted / unacceptable bias? Height? Draft pick number?
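
Adding height or draft pick amounts to replacing the one-variable trend line with a small multiple regression, which could be fit separately for ORAPM and DRAPM so that a covariate like height can inform the defensive prior specifically. A sketch, assuming per-player arrays of MPG, height and draft pick plus existing RAPM values to fit against; the function names and the synthetic data are hypothetical, and whether any covariate actually helps would have to be checked out of sample:

```python
import numpy as np


def fit_prior_model(features, target):
    """Least-squares fit of target (e.g. DRAPM) on covariates plus an intercept.

    features: (n_players, n_covariates) array, e.g. columns MPG, height, draft pick
    Returns coefficients [intercept, b_1, ..., b_k].
    """
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def prior_means(coef, features):
    """Prior mean for each player implied by the fitted coefficients."""
    design = np.column_stack([np.ones(features.shape[0]), features])
    return design @ coef


# Synthetic illustration only: the "true" relationship below is invented.
rng = np.random.default_rng(7)
features = np.column_stack([
    rng.uniform(5, 36, 200),   # MPG
    rng.normal(79, 3.5, 200),  # height in inches
    rng.integers(1, 61, 200),  # draft pick
])
drapm = (-1.0 + 0.03 * features[:, 0] + 0.05 * (features[:, 1] - 79)
         + rng.normal(0, 1.5, 200))
coef = fit_prior_model(features, drapm)
print(np.round(coef, 3))
print(np.round(prior_means(coef, features[:5]), 2))
```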

APBRmetrics
