Neil Paine wrote: I plan to do an even more thorough examination of this when I have time (which seems like never), but here is the evidence that Statistical Plus/Minus metrics (whether ASPM/BPM or SPM) are the best of the boxscore metrics...
Let me preface by saying the real test for any boxscore stat should be
how it predicts team performance out of sample. It's
long been known that any boxscore stat can boast a high correlation with team W% in-sample just by employing a team adjustment (like BPM does) or
otherwise setting things up so that points scored/allowed and possessions employed/acquired add up at the individual level to team totals. What matters is how well a metric predicts the performance of a future team, given who its players are and how well those players have performed in the metric in the past.
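To make that concrete, here is a minimal sketch of how such an out-of-sample test might be set up, assuming a minutes-weighted aggregation of each player's past metric values (the post doesn't spell out the exact procedure, and all names and data structures below are placeholders):
Code:
import numpy as np

def predicted_rating(roster, metric, minutes, year, lag):
    """Minutes-weighted average of each player's metric from `lag` years prior."""
    num = den = 0.0
    for player in roster:
        # Players with no data from `lag` years back (e.g. rookies) are
        # assumed league-average, i.e. 0 in plus/minus units.
        past = metric.get((player, year - lag), 0.0)
        mins = minutes[(player, year)]
        num += past * mins
        den += mins
    return num / den

def out_of_sample_r(rosters, metric, minutes, actual, lag):
    """Correlation between predicted and actual team performance at a given lag.

    rosters: {(team, year): [players]}; actual: {(team, year): team rating}.
    """
    preds = [predicted_rating(players, metric, minutes, year, lag)
             for (team, year), players in rosters.items()]
    actuals = [actual[key] for key in rosters]
    return np.corrcoef(preds, actuals)[0, 1]
Run at lags of 1 through 4, a harness like this produces one row per metric, like the tables below.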
I looked at metrics from that perspective here, and found that over the 2001-2012 period, ASPM did better than any other boxscore metric at predicting out-of-sample team performance, with its advantage growing the further out of sample you go (using data from 2 and 3 years prior to predict Year Y). Over the summer, I also ran the same test using data from 1978-2014 for my SPM metric, Daniel's old ASPM (a version behind the current BPM), PER, WS/48, and a plus/minus estimate constructed from Basketball on Paper's ORtg/%Poss/DRtg Skill Curves. In the tables below, Year-N means the metric was measured N years before the season being predicted.
(Just to expound on the BoP metric: I set up a fake 5-man unit with the player and 4 average teammates. The teammates' ORtg changed based on the player's %Poss, like this. So if I use 25% of possessions, the remaining 75% is split among my four average teammates, 18.8% each. Since league-average usage is 20%, a tradeoff of 1.2 means each teammate gains 1.2*(20-18.8) points of ORtg.)
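Here is a worked version of that fake-lineup arithmetic, as I understand it from the description above (the 106.0 league-average ORtg baseline and the example player's numbers are made up for illustration):
Code:
def lineup_ortg(player_ortg, player_usage, tradeoff=1.2,
                avg_ortg=106.0, avg_usage=20.0):
    # The four average teammates split whatever usage the player doesn't take.
    teammate_usage = (100.0 - player_usage) / 4.0        # 25% -> 18.75% each
    # Skill-curve adjustment: using fewer possessions than the 20% average
    # raises a teammate's ORtg by `tradeoff` points per point of usage shed.
    teammate_ortg = avg_ortg + tradeoff * (avg_usage - teammate_usage)
    # Lineup ORtg is the possession-weighted average of all five players.
    return (player_ortg * player_usage + 4 * teammate_ortg * teammate_usage) / 100.0

# The example from the post: at 25% usage, each teammate falls to 18.75%
# and gains 1.2 * (20 - 18.75) = 1.5 points of ORtg.
print(lineup_ortg(player_ortg=110.0, player_usage=25.0))   # 108.125 with these inputs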
Code:
+-----------+--------+--------+--------+--------+
| Metric | Year-1 | Year-2 | Year-3 | Year-4 |
+-----------+--------+--------+--------+--------+
| SPM-1 | .776 | .662 | .593 | .532 |
| ASPM-1 | .763 | .647 | .577 | .511 |
| PER-1 | .663 | .598 | .538 | .485 |
| bop_1.2-1 | .743 | .614 | .528 | .465 |
| WS48-1 | .734 | .598 | .515 | .462 |
+-----------+--------+--------+--------+--------+
Btw, I came to the 1.2 tradeoff by running the same test over 1977-2014 (I didn't include ASPM in this test because it didn't extend back to 1977). Each BoP number represents the usage-efficiency tradeoff for that version of the metric (a sketch of this sweep follows the table):
Code:
+-----------+--------+--------+--------+--------+
| Metric    | Year-1 | Year-2 | Year-3 | Year-4 |
+-----------+--------+--------+--------+--------+
| SPM-1 | .775 | .658 | .590 | .534 |
| PER-1 | .660 | .594 | .532 | .486 |
| bop_1.2-1 | .741 | .609 | .521 | .465 |
| bop_1.3-1 | .741 | .609 | .521 | .464 |
| bop_1.1-1 | .741 | .608 | .520 | .465 |
| bop_1.4-1 | .740 | .609 | .521 | .463 |
| bop_1.0-1 | .741 | .607 | .519 | .465 |
| bop_0.9-1 | .741 | .606 | .518 | .465 |
| bop_0.8-1 | .740 | .604 | .516 | .464 |
| WS48-1 | .733 | .593 | .507 | .463 |
+-----------+--------+--------+--------+--------+
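For what it's worth, here is a self-contained sketch of the kind of sweep that generates those bop_0.8 through bop_1.4 variants, repeating the fake-lineup function from the earlier sketch so it runs on its own; in the real test, each variant's player estimates would then be scored out of sample like any other metric:
Code:
def lineup_ortg(player_ortg, player_usage, tradeoff,
                avg_ortg=106.0, avg_usage=20.0):
    # Same fake 5-man unit as in the earlier sketch.
    teammate_usage = (100.0 - player_usage) / 4.0
    teammate_ortg = avg_ortg + tradeoff * (avg_usage - teammate_usage)
    return (player_ortg * player_usage + 4 * teammate_ortg * teammate_usage) / 100.0

# One hypothetical high-usage player, evaluated under each candidate tradeoff.
# Each tradeoff yields a different version of the BoP metric (bop_0.8 ...
# bop_1.4), and each version gets its own out-of-sample test, i.e. one row
# of the table above.
for tradeoff in (0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4):
    ortg = lineup_ortg(player_ortg=112.0, player_usage=28.0, tradeoff=tradeoff)
    print(f"bop_{tradeoff}: lineup ORtg = {ortg:.2f}")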
In any event, no matter how many times I look at which metric does the best job of predicting future team performance, the Statistical Plus/Minus metrics are always in a class by themselves, particularly as you use data from further back relative to the season being predicted. I still need to plug Daniel's new BPM into this framework, but I would be surprised if it didn't perform much better than PER and Win Shares/48 in a similar test.