I ran the test last year, so no games from this season were predicted. RPM only had 3 years of data then, while RAPM had 16 years. I found that in-sample data was not an issue for BPM in 2001-16, so all games were predicted for BPM.
Prediction Target: 1984/85-2015/16, 38658 games (regular season and playoffs)
Source Data: 1983/84-2014/15 (regular season only; playoff games were excluded after I found that the high intensity of post-season games undesirably punishes the metric scores of players who take part in them)
Tested Metrics: All popular public advanced metrics: RPM (2627 games predicted, 2 years), RAPM (19345 games predicted, 15 years), BPM, WS and PER. WP was dropped very quickly because of its very subpar performance. On top of those, MPG, USG and 2 different non-empirical simple linear box-score metrics were included.
Retrodiction Method: Each game was predicted from the previous year's metric scores of the players who took part in it. For each player I used the previous regular season's per-possession or per-minute score, depending on the metric's formula. Each team was then assigned a Total Metric Score by weighting those scores with the actual minutes or possessions played in the game, and the team with the higher total was predicted as the winner. On the rare occasions (2-6 games) where the two totals were equal, the home team was picked as the winner because of HCA. Players below 250 MP in the previous year and rookies were assigned average values. Then each game's unique roster turnover rate was calculated, depending on newly founded teams, signings, trades, rookies etc. and, most importantly, the in-game minutes for that particular game. E.g.: 90% RT minutes from team A and 80% RT minutes from team B make the game's turnover score 85%. For those who are curious, there were only 6 games where the roster turnover was 100%, meaning both teams had 100% roster turnover and all minutes came from players completely new to those teams. The average roster turnover rate for games was around 33%.
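To make the method concrete, here is a minimal Python sketch of one game's prediction and its roster turnover score. It is not my actual code: the PlayerGame fields, the function names and the LEAGUE_AVERAGE value of 0.0 are assumptions for illustration only, and it covers just the per-minute case.

```python
from dataclasses import dataclass
from typing import List, Optional

LEAGUE_AVERAGE = 0.0      # assumed "average value"; depends on the metric's scale
MIN_PRIOR_MINUTES = 250   # players below this in the previous year get the average


@dataclass
class PlayerGame:
    minutes: float               # minutes actually played in this game
    prior_rate: Optional[float]  # previous season's per-minute score, None for rookies
    prior_minutes: float         # previous season's total minutes
    new_to_team: bool            # True if the player was not on this team last season


def team_score(players: List[PlayerGame]) -> float:
    """Total Metric Score: prior per-minute score weighted by in-game minutes."""
    total = 0.0
    for p in players:
        rate = p.prior_rate
        if rate is None or p.prior_minutes < MIN_PRIOR_MINUTES:
            rate = LEAGUE_AVERAGE
        total += rate * p.minutes
    return total


def predict_winner(home: List[PlayerGame], away: List[PlayerGame]) -> str:
    """Higher total wins; an exact tie goes to the home team (HCA)."""
    return "home" if team_score(home) >= team_score(away) else "away"


def team_turnover(players: List[PlayerGame]) -> float:
    """Share of the team's in-game minutes played by players new to the team."""
    total = sum(p.minutes for p in players)
    new = sum(p.minutes for p in players if p.new_to_team)
    return new / total


def game_turnover(home: List[PlayerGame], away: List[PlayerGame]) -> float:
    """Game roster turnover: the average of the two teams' turnover shares."""
    return (team_turnover(home) + team_turnover(away)) / 2
```

With 90% of team A's minutes and 80% of team B's minutes coming from new players, game_turnover gives 0.85, which matches the example above.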
PA/RT: Prediction Accuracy / Roster Turnover Graph.
Y axis: Prediction Accuracy
X axis: Roster Turnover.
In parentheses: Game count. Full means the complete data (38658 games, or 19345 games for RAPM comparisons and 2627 games for RPM comparisons). Half and quarter are self-explanatory. Percentages represent the roster turnover rate. Top 500, 400, 300 etc. means the games with the highest roster turnover rates. There were lots of ties between games at the cutoff positions, and in those cases all tied games were included. That's why half, quarter, top 300, top 200 etc. may actually contain slightly more games than the name suggests.
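The tie handling for the "Top N" groups can be sketched like this in Python. Again this is only an illustration under my own assumptions: each game is represented as a hypothetical (roster_turnover, predicted_correctly) pair, and the function names are made up.

```python
from typing import List, Tuple

Game = Tuple[float, bool]  # (roster turnover rate, was the prediction correct?)


def top_n_with_ties(games: List[Game], n: int) -> List[Game]:
    """Return the n games with the highest turnover, plus any games tied with the n-th."""
    if n <= 0:
        return []
    if n >= len(games):
        return list(games)
    ranked = sorted(games, key=lambda g: g[0], reverse=True)
    cutoff = ranked[n - 1][0]  # turnover of the n-th ranked game
    return [g for g in ranked if g[0] >= cutoff]


def accuracy(games: List[Game]) -> float:
    """Prediction accuracy: share of games whose winner was predicted correctly."""
    return sum(1 for _, correct in games if correct) / len(games)
```

If two games share the turnover rate at the cutoff, both are kept, so a "Top 2" group can end up holding 3 games.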
As promised:
RPM. The RPM comparison became problematic because of the small sample. I couldn't go to high roster turnover rates because the sample became very small and randomness took over. Still, the graph shows the trend quite clearly, and since RPM and BPM follow an extremely similar path, it can be concluded that RPM suffers the same consequences. If I could go to higher roster turnover rates, both metrics would suffer and MPG would prevail.
RPM and BPM following an extremely similar path when the comparison is limited to the RPM data size.
And BPM-MPG on the whole data (38658 games) once again.
PER, WS, USG and simple linear box-score metrics will come later.
In short, PER is a bit better. WS is worse. Simple linear box-score metrics do shine, but it's still not enough.
As for Usage, I included it because Kevin Pelton once said something along the lines of player usage rate translating very well to other teams and scenarios, and that this is why PER was closing the gap on BPM as the roster turnover rate grew. It couldn't be more wrong. Usage was a lot worse than the other metrics at any roster turnover rate.
Edit: For those who are curious, blends made it worse. In the best-case scenario, you could only get marginal improvements if you optimize for ONLY 1 RT LEVEL by sacrificing the trend, the graph and pretty much everything else. That marginal improvement at one exact roster turnover rate is completely artificial. Blends suffer a lot from overfitting. But I see some still confuse this with predicting next year's team wins, which is a completely different thing, and one that RPM is directly and BPM is indirectly optimized for.