mewfert wrote: ↑Tue May 07, 2019 3:59 am
Couple of random ideas to make BPM potentially more translatable & cross-comparable to NCAA & other leagues:
1) For the minutes variable, explicitly tie it somehow to the length of the game for a particular league. Thinking about NBA vs. NCAA, 48 vs. 40 minutes. Should the minutes input get stretched out by 20% for NCAA? Related: the 4 additional games for the "regressed" MPG could also be tied to some standard of league season length. Since NCAA teams are playing half as many games, this regression has twice as much effect.
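To make sure I am reading the suggestion right, here is roughly what that normalization would look like (a sketch only: the 48-vs-40 stretch and the 4-game constant come from the quote above, while the exact form of the regression is my assumption, not the published BPM formula):

```python
# Hypothetical league-length normalization for the minutes inputs.

NBA_GAME_MIN = 48      # regulation game length in the NBA
NBA_SEASON_GAMES = 82  # NBA regular-season length

def normalized_mpg(mpg, league_game_min):
    """Stretch per-game minutes so a full game counts the same in any league,
    e.g. 30 MPG in a 40-minute NCAA game maps to 36 NBA-equivalent minutes."""
    return mpg * NBA_GAME_MIN / league_game_min

def regressed_mpg(total_minutes, games_played, league_season_games, nba_pad_games=4):
    """Regress MPG toward zero by padding the denominator with extra games,
    scaling the padding to season length so it pulls proportionally as hard in a
    30-game NCAA season as 4 games do over an 82-game NBA season."""
    pad = nba_pad_games * league_season_games / NBA_SEASON_GAMES
    return total_minutes / (games_played + pad)

# NCAA example: 900 minutes over 30 games of 40 minutes each
print(normalized_mpg(30.0, 40))                        # 36.0 NBA-equivalent MPG
print(regressed_mpg(900, 30, league_season_games=30))  # padded by ~1.5 games, not 4
```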
The MPG variable is a little unusual, because it doesn't represent anything at all from a game-state perspective. In fact, it's more of a prior than anything, representative of "the little things that a coach picks up on that aren't shown in the box score." It's highly correlated with a number of the other inputs.
I would love to come up with some better way to handle playing time. Playing time very clearly adds information beyond the box score statistics, and the regression fits reflect that. That is why it was included.
That said, MPG should not be included in a true linear "BPM-style" statistic (i.e., one that could be used on a one-game sample). It stabilizes much more slowly than the other statistics, since it is measured per game rather than per possession.
I welcome thoughts on the meaning, interpretation, and usefulness of playing time as an input.
mewfert wrote: ↑Tue May 07, 2019 3:59 am
2) For the team adjustment, rather than using pure efficiency margin use something like standard deviations above the mean. Problem here is that the EM "spread" in college is much wider than in NBA, which I believe has the effect of ultimately goosing the BPM for players on better teams. Also, different methods of calculating adjusted efficiencies can give different spreads. For example, in NCAA the BB Reference adjusted ratings have a significantly wider spread than the Kenpom ratings: 2019 Virginia is +39 at Basketball Reference and +34 at Kenpom; the 20th ranked team is +24 versus +20; similar differences at the other end.
Anyhow, thanks for doing this and looking forward to seeing what you come up with!
This is an interesting discussion. At its core, a pure efficiency margin should be straightforward to replicate--as long as you are not using priors. Once priors are included, or garbage time is excluded, it gets much less replicable.
The very center of the BPM method is that the player values should sum to the team's overall efficiency differential, which is the best unbiased measure of the team's overall strength.
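In code form, that constraint is just a minutes-weighted sum plus a uniform team adjustment. The sketch below is a simplified illustration of the idea, not the published BPM coefficients:

```python
def apply_team_adjustment(raw_bpm, minutes_pct, team_efficiency_diff):
    """Shift every player's raw box-score estimate by the same constant so that the
    minutes-weighted player BPMs sum to the team's efficiency differential.
    minutes_pct is each player's share of team minutes, summing to 5 (five on the
    floor), which is what makes a uniform shift close the gap exactly."""
    weighted_sum = sum(b * m / 5 for b, m in zip(raw_bpm, minutes_pct))
    adjustment = team_efficiency_diff - weighted_sum
    return [b + adjustment for b in raw_bpm]

# Toy roster: raw estimates imply +3.0 per 100, the team is actually +5.0 per 100,
# so every player gets shifted up by +2.0.
print(apply_team_adjustment([5.0, 3.0, -1.0], [2.0, 2.0, 1.0], 5.0))  # [7.0, 5.0, 1.0]
```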
I do not necessarily think that a properly constructed BPM will have the same issue of over-valuing players on good teams. Rather, I think poor handling of a few of the input variables is interacting badly with the outlier statistics that some of the good NCAA teams compile.
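For concreteness, the standard-deviation version of the team adjustment that mewfert describes would look something like the sketch below. Only the +39/+34 Virginia figures come from the quote; the league distributions are invented purely to show that teams with different raw spreads can land at similar z-scores:

```python
import statistics

def z_score_rating(team_em, league_ems):
    """Express a team's efficiency margin in standard deviations above the league
    mean, so ratings from systems with different spreads become comparable."""
    mean = statistics.mean(league_ems)
    sd = statistics.pstdev(league_ems)
    return (team_em - mean) / sd

# Invented league distributions for illustration (only the 39 and 34 are from the post)
bbref_like = [39, 24, 20, 15, 10, 5, 0, -5, -10, -15, -20, -24]
kenpom_like = [34, 20, 17, 13, 8, 4, 0, -4, -8, -13, -17, -20]
print(z_score_rating(39, bbref_like))   # ~1.9 standard deviations above the mean
print(z_score_rating(34, kenpom_like))  # ~2.0 -- similar despite the wider raw spread
```

Whether that rescaling is the right prior to impose is a separate question from the input-handling issue above.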