I have been preparing for some time now to reconstruct Box Plus/Minus (BPM), with the goal of addressing its major existing issues.
Here are some issues that have been identified (I will add more to this post as more are brought forward):
- Poor handling of outliers on offense
- Mishandling of interaction terms (related to the first)
- Poor estimation of defense
- Poor handling of blocks (as shown by college BPM being dominated by block%)
The rebuilt metric should also stay within these constraints:
- Box score stats only (i.e. anything that can be calculated from the stats we have going back to the 1980s).
- No play-by-play (PbP) stats, not even things like "assisted by" ratios.
- Nothing so complex that it couldn't be done by someone with Excel and a good knowledge of math.
- Focus on Explanation, not Prediction. What happens should be credited to the team. No luck adjustment. (A good explanatory stat can be converted to a predictive stat with appropriate regression to the mean.)
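As a rough illustration of the last point, an explanatory rating can be turned into a predictive one by shrinking it toward a prior in proportion to how much evidence (here, minutes played) backs it up. This is a minimal sketch, not part of the current or planned BPM. It assumes a prior of 0.0, which is league average since BPM is scaled relative to the league, and a hypothetical stabilization constant `k` of 1000 minutes. Both the function name `regress_to_mean` and the value of `k` are illustrative assumptions that would need to be fit to real data.

```python
def regress_to_mean(observed: float, minutes: float,
                    prior: float = 0.0, k: float = 1000.0) -> float:
    """Shrink an explanatory rating toward a prior (hypothetical helper).

    The weight on the observed value is minutes / (minutes + k), so a player
    with exactly k minutes is regressed halfway to the prior. The default
    k = 1000 is an illustrative assumption, not an estimated constant.
    """
    if minutes < 0 or k <= 0:
        raise ValueError("minutes must be >= 0 and k must be > 0")
    weight = minutes / (minutes + k)
    # Weighted average of the observed rating and the prior (league average = 0.0).
    return weight * observed + (1.0 - weight) * prior


# A +8.0 rating over 1000 minutes is pulled halfway to average (4.0);
# the same rating over 3000 minutes keeps three quarters of its value (6.0).
print(regress_to_mean(8.0, 1000.0))
print(regress_to_mean(8.0, 3000.0))
```

The weighted-average form keeps the conversion simple enough to reproduce in a spreadsheet, in line with the Excel constraint above.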
Here is a sample of the data:
In this thread, I'd like to gather ideas from the public on ways to reformulate the metric so that it meets these goals as comprehensively as possible.
This will be an ongoing effort for some time.
For reference, the current BPM formulation is written up at https://www.basketball-reference.com/about/bpm.html