Let's say that this year is easily LeBron's best year in an objective sense; Naismith came down and told us so, and LeBron will do worse in future seasons (and did worse in past ones). An explanatory metric (like WP, WS, or PER) with any kind of accuracy would give him a very high rating; used to predict future performance, it would do relatively poorly, because LeBron will do worse. A predictive metric like RAPM, or any metric that uses regression to the mean or data from previous seasons, would give him a comparatively low rating; used to predict future performance, it would do relatively well. Explanatory models will be sensitive to outlier-type seasons (or aging, or changes in player role or team philosophy, or any of the other reasons a player's production changes across seasons), whereas predictive models will smooth them out a bit (a toy sketch of this trade-off follows the quote below).

v-zero wrote:
If they are accurately measuring what happened, and they are representative of a measurement of some underlying statistical quantity, then their mean is an unbiased estimate of that quantity (assuming a few things hold). That implies they should be the best unbiased estimate of themselves for future prediction, so if they fail to predict, then they are also failing to accurately explain.
I.e., if you claim to measure how well a player played, then that measurement should also predict how well that player plays in the future, unless the player's performance is drawn entirely at random; if game-to-game performance isn't entirely random, then accurate measurements of it should lead to better future predictions. Ergo, if WP predicts badly in comparison to others, it is because it is less able to accurately measure players' level of play from game to game, and hence it fails because it is flawed, not because it creates outliers. Maybe it does create outliers, but if it does, it is because it is flawed.
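Here is a minimal numerical sketch of that explanatory/predictive split. All the numbers are invented (talent spread, noise level, a league mean of zero), and the shrinkage is the generic reliability-weighted regression to the mean, not any particular metric's actual adjustment:

```python
# Toy regression-to-the-mean sketch; all numbers are invented, not real NBA data.
import numpy as np

rng = np.random.default_rng(0)
n_players = 10_000
league_mean = 0.0
talent_sd, noise_sd = 2.0, 3.0   # assumed spread of true talent vs. single-season noise

talent = rng.normal(league_mean, talent_sd, n_players)
season1 = talent + rng.normal(0.0, noise_sd, n_players)  # observed rating, year 1
season2 = talent + rng.normal(0.0, noise_sd, n_players)  # observed rating, year 2

# "Explanatory" estimate: the raw season-1 rating, an unbiased measure of that season.
# "Predictive" estimate: season 1 shrunk toward the league mean by its reliability,
# i.e. the share of rating variance that is signal rather than noise.
reliability = talent_sd**2 / (talent_sd**2 + noise_sd**2)
shrunk = league_mean + reliability * (season1 - league_mean)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

print("explaining season 1 -> raw:", mse(season1, season1), "shrunk:", mse(shrunk, season1))
print("predicting season 2 -> raw:", mse(season1, season2), "shrunk:", mse(shrunk, season2))
```

The raw rating "explains" its own season perfectly (zero error) yet loses on next-season prediction to the deliberately biased, shrunk estimate; being an unbiased measurement is not the same as being the best predictor.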
I think you have glossed over the flip side of your second paragraph, which is that a prediction of future performance should also be a measure of how well a player has already played. Yet the Sport Skeptic work (and maybe Neil has corroborating work?) shows that RAPM is relatively poor at describing the data it came from; i.e., player performance for 2010 does not sum to team performance for 2010. That is because RAPM is, by definition, a biased estimate (relevant to your first paragraph): its ridge penalty shrinks player estimates toward the mean, deliberately trading in-sample fit for lower out-of-sample error. This is exactly how it's supposed to work; it is supposed to predict out of sample, not explain in sample.
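To make that bias trade-off concrete, here is a minimal OLS-versus-ridge sketch. The design matrix, noise level, and penalty are all invented, and this is generic ridge regression rather than an actual RAPM fit on stint data:

```python
# OLS vs. ridge on simulated data; dimensions, noise, and penalty are invented,
# and this is generic ridge regression, not actual RAPM stint data.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, p = 100, 2000, 50
beta_true = rng.normal(0.0, 0.5, p)           # weak true signal per feature

X_train = rng.normal(0.0, 1.0, (n_train, p))
X_test = rng.normal(0.0, 1.0, (n_test, p))
y_train = X_train @ beta_true + rng.normal(0.0, 5.0, n_train)
y_test = X_test @ beta_true + rng.normal(0.0, 5.0, n_test)

# Unbiased least-squares fit vs. the deliberately biased (shrunk) ridge fit.
beta_ols = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
lam = 100.0                                   # ridge penalty, chosen by assumption
beta_ridge = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p),
                             X_train.T @ y_train)

def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))

print("in-sample     -> OLS:", mse(X_train, y_train, beta_ols),
      "ridge:", mse(X_train, y_train, beta_ridge))
print("out-of-sample -> OLS:", mse(X_test, y_test, beta_ols),
      "ridge:", mse(X_test, y_test, beta_ridge))
```

OLS wins in-sample by construction, since it minimizes exactly that error; on noisy data like this, the shrunk ridge estimate wins out of sample, which is the sense in which RAPM "fails" to explain the very season it was fit on.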
None of which is to say that RAPM is inherently flawed or that WP is as good as the WoW guys claim. It's to point out that they (and models like them) have different purposes.