Let's say that this year is easily LeBron's best year in an objective sense; Naismith came down and told us so, and LeBron will do worse in future seasons (and did worse in past ones). An explanatory metric (like WP, WS, or PER) with any kind of accuracy would give him a very high rating; used to predict future performance, it would do relatively poorly, because LeBron will do worse. A predictive metric like RAPM, or any metric that uses regression to the mean or data from previous seasons, would give him a comparatively low rating; used to predict future performance, it would do relatively well. Explanatory models will be sensitive to outlier-type seasons (or aging, or changes in player role or team philosophy, or any of the other reasons a player's production changes across seasons), whereas predictive models will smooth them out a bit (a toy sketch of this trade-off follows the quote below).

v-zero wrote:
If they are accurately measuring what happened, and they are representative of a measurement of some underlying statistical quantity, then their mean is an unbiased estimate of that quantity (assuming a few things hold). That implies they should be the best unbiased estimate of themselves for future prediction, so if they fail to predict, then they are also failing to accurately explain.
I.e., if you claim to measure how well a player played, then that measurement should also predict how well that player plays in the future, unless the player's performance is drawn entirely at random; if game-to-game performance isn't entirely random, then accurate measurements of it should lead to better future predictions. Ergo, if WP predicts badly in comparison to others, it is because it is less able to accurately measure players' level of play from game to game, and hence it fails because it is flawed, not because it creates outliers. Maybe it does create outliers, but if it does, it is because it is flawed.
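Here is a minimal numerical sketch of that explanatory/predictive split. All the numbers are invented (talent spread, noise level, a league mean of zero), and the shrinkage is the generic reliability-weighted regression to the mean, not any particular metric's actual adjustment:

```python
# Toy regression-to-the-mean sketch; all numbers are invented, not real NBA data.
import numpy as np

rng = np.random.default_rng(0)
n_players = 10_000
league_mean = 0.0
talent_sd, noise_sd = 2.0, 3.0   # assumed spread of true talent vs. single-season noise

talent = rng.normal(league_mean, talent_sd, n_players)
season1 = talent + rng.normal(0.0, noise_sd, n_players)  # observed rating, year 1
season2 = talent + rng.normal(0.0, noise_sd, n_players)  # observed rating, year 2

# "Explanatory" estimate: the raw season-1 rating, an unbiased measure of that season.
# "Predictive" estimate: season 1 shrunk toward the league mean by its reliability,
# i.e. the share of rating variance that is signal rather than noise.
reliability = talent_sd**2 / (talent_sd**2 + noise_sd**2)
shrunk = league_mean + reliability * (season1 - league_mean)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

print("explaining season 1 -> raw:", mse(season1, season1), "shrunk:", mse(shrunk, season1))
print("predicting season 2 -> raw:", mse(season1, season2), "shrunk:", mse(shrunk, season2))
```

The raw rating "explains" its own season perfectly (zero error) yet loses on next-season prediction to the deliberately biased, shrunk estimate; being an unbiased measurement is not the same as being the best predictor.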
I think you have glossed over the flip side of your second paragraph, which is that a prediction of future performance should also be a measure of how well a player has already played. Yet the Sport Skeptic work (and maybe Neil has corroborating work?) shows that RAPM is relatively poor at describing the data it came from; i.e., player performance for 2010 does not sum to team performance for 2010. That is because RAPM is, by definition, a biased estimate (relevant to your first paragraph): its ridge penalty shrinks player estimates toward the mean, deliberately trading in-sample fit for lower out-of-sample error. This is exactly how it's supposed to work; it is supposed to predict out of sample, not explain in sample.
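To make that bias trade-off concrete, here is a minimal OLS-versus-ridge sketch. The design matrix, noise level, and penalty are all invented, and this is generic ridge regression rather than an actual RAPM fit on stint data:

```python
# OLS vs. ridge on simulated data; dimensions, noise, and penalty are invented,
# and this is generic ridge regression, not actual RAPM stint data.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, p = 100, 2000, 50
beta_true = rng.normal(0.0, 0.5, p)           # weak true signal per feature

X_train = rng.normal(0.0, 1.0, (n_train, p))
X_test = rng.normal(0.0, 1.0, (n_test, p))
y_train = X_train @ beta_true + rng.normal(0.0, 5.0, n_train)
y_test = X_test @ beta_true + rng.normal(0.0, 5.0, n_test)

# Unbiased least-squares fit vs. the deliberately biased (shrunk) ridge fit.
beta_ols = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
lam = 100.0                                   # ridge penalty, chosen by assumption
beta_ridge = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p),
                             X_train.T @ y_train)

def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))

print("in-sample     -> OLS:", mse(X_train, y_train, beta_ols),
      "ridge:", mse(X_train, y_train, beta_ridge))
print("out-of-sample -> OLS:", mse(X_test, y_test, beta_ols),
      "ridge:", mse(X_test, y_test, beta_ridge))
```

OLS wins in-sample by construction, since it minimizes exactly that error; on noisy data like this, the shrunk ridge estimate wins out of sample, which is the sense in which RAPM "fails" to explain the very season it was fit on.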
None of which is to say that RAPM is inherently flawed or that WP is as good as the WoW guys claim. It's to point out that they (and models like them) have different purposes.