Reconstructing Box Plus/Minus

Adam H
Posts: 1
Joined: Sun Mar 31, 2019 3:47 pm

Re: Reconstructing Box Plus/Minus

Post by Adam H » Fri May 17, 2019 1:30 am

DSMok1 wrote:
Tue May 14, 2019 8:40 pm
One of the first things I have done with the new BPM framework is to run a fully linear analysis similar to Nathan Walker's SPR and Kevin Ferrigan's DRE. I still am running the "team adjustment" at the end to make it match the team rating.
Did you use per 100 possessions or per game data to determine the coefficients? I have not thought about this, nor put in anywhere near the amount of work you have, but it appears to me that per game stats more accurately reflect single year RAPM than per 100 possessions stats or advanced stats (other than TS%). I do not know if that remains true for multi-year RAPM.

DSMok1
Posts: 902
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Reconstructing Box Plus/Minus

Post by DSMok1 » Fri May 17, 2019 2:53 am

Crow wrote:
Fri May 17, 2019 1:19 am
These are not the classic 4 Factors but it would be good to regularly offer these 4 components (or some other variation) for GmBPM2 and season level BPM2. Maybe BRef could also offer BPM2 on its splits page. More detail would weaken the case that it is a hard-to-interpret black box. If BPM2 components were divided like the 4 factors it would ease comparison to RPM. DRE and SPR could be useful with splits as well, preferably standardized.




What is the difference between contribution and points generated?

In rough terms and on average, what % of total credit is funneling to players via the team adjustment? Any consideration of making the team adjustment more discriminating than the same amount for everybody (whether on court or off when things happen)?
Contribution is the per 100 possessions GmBPM rating times the percentage of possessions actually played. Points created then translates that to the actual number of possessions played in this game, based on the pace of the game.
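
A minimal sketch of that arithmetic, under my own reading of the two definitions. The function names, the possession counts, and the assumption that "percentage of possessions played" means player possessions divided by team possessions are all illustrative, not taken from the actual spreadsheet.

```python
def contribution(gm_bpm_per100: float, player_poss: float, team_poss: float) -> float:
    """Per-100 rating scaled by the share of team possessions the player was on court for."""
    return gm_bpm_per100 * (player_poss / team_poss)


def points_created(gm_bpm_per100: float, player_poss: float, team_poss: float) -> float:
    """Contribution converted from a per-100 basis to this game's actual possession count."""
    return contribution(gm_bpm_per100, player_poss, team_poss) * team_poss / 100.0


# Hypothetical game line: +8.0 per 100 possessions, on court for 72 of the team's 96 possessions.
c = contribution(8.0, 72, 96)
p = points_created(8.0, 72, 96)
print(c, p)
```

Written this way, the team possession count cancels in the second step, so points created is just the per-100 rating times the player's own possessions divided by 100.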

I have not come up with any better way to split up the remaining team adjustment, which includes the intercept of the regression and any unassigned credit as well. It seems that some way could be developed where the post players could get more of the credit or debit for defense and perimeter players less, but I haven't figured out a good way mathematically.

Taking it another direction, it would be very informative and a better approach to actually do this on every stint during the game where the lineups remained the same. That way defense in particular could be better assigned to the players that actually were on the court. Then you could just sum it up at the end of the game to come up with a cumulative rating for that game.
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
GodismyJudgeOK.com/DStats/
Twitter.com/DSMok1

DSMok1
Posts: 902
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Reconstructing Box Plus/Minus

Post by DSMok1 » Fri May 17, 2019 2:55 am

Adam H wrote:
Fri May 17, 2019 1:30 am

Did you use per 100 possessions or per game data to determine the coefficients? I have not thought about this, nor put in anywhere near the amount of work you have, but it appears to me that per game stats more accurately reflect single year RAPM than per 100 possessions stats or advanced stats (other than TS%). I do not know if that remains true for multi-year RAPM.
RAPM is denominated in terms of 100 possessions, and I have used per 100 possessions data for all of this GmBPM regression. In theory, that should match up as well as anything, I would think.

colts18
Posts: 304
Joined: Fri Aug 31, 2012 1:52 am

Re: Reconstructing Box Plus/Minus

Post by colts18 » Fri May 17, 2019 5:13 am

DSMok1 wrote:
Thu May 16, 2019 3:20 pm
Interesting GmBPM (that's the linear BPM) evaluation:

What if we let the points from 3 pointers, 2 pointers, and free throws each have a different value? Remember, this regression attributes value to the player AND ALL LEFTOVER VALUE TO THE REST OF THE TEAM. So--maybe some scoring shows more for the individual player, while other scoring is more generic (anybody could do it). Also, some scoring may be more valuable from a spacing perspective.

Does that make sense?

The results from the same GmBPM linear model:

https://docs.google.com/spreadsheets/d/ ... k841J7nXi8

Interestingly, points from free throws are worth the same as before, as are free throw attempts.

2 pointers have less value and 2 point attempts are less penalized. In other words, this regression is indicating 2 pointers don't matter as much to the player.

Conversely, 3 pointers matter considerably more! Made 3 pointers are worth more and missed 3 pointers are penalized more heavily!

This is very interesting to me. What would it look like if we had the data to split out at-rim from 2 point jumpers?

P.S. The rest of the coefficients in the GmBPM regression did not change significantly at all.
P.P.S. This helps elite shooters the most, depresses the value of 2pt scorers and bad 3 point shooters. In other words, it really helps Stephen Curry.
Have you thought about adding a height variable to the 3 point part of the regression? I would imagine that 3 pointers would be more valuable offensively for big men while not as valuable for smaller players. It could add accuracy so that the proper players are credited for being great 3 point shooters.

The same could be done for rebounds. Big men who don't rebound are almost always bad defenders. Defensive rebounding could mean more for big men than it does for PGs. A Russell Westbrook defensive rebound is not as valuable as a defensive rebound from a center.

DSMok1
Posts: 902
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Reconstructing Box Plus/Minus

Post by DSMok1 » Fri May 17, 2019 10:03 am

Well, for this linear GmBPM model, I won't be adding a height or any other interactions.

However, for the full new box plus minus model, these things certainly need to be explored further. I am leaning towards using a position indicator rather than height, since I think that is more applicable to what is happening on the court. I also want this regression to be portable to non-NBA contexts that may not have height available as an input.

Mike G
Posts: 4413
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Reconstructing Box Plus/Minus

Post by Mike G » Fri May 17, 2019 12:45 pm

...maybe some scoring shows more for the individual player, while other scoring is more generic (anybody could do it). Also, some scoring may be more valuable from a spacing perspective...
In an earlier table, you gave these values:
3fg 3.11
3fga -0.82
2fg 1.25
2fga -0.48
ft 0.74
fta -0.21

Adding the value of each make to the cost of the attempt, we get these net values for a made shot:

pts  value   %
3    2.29   0.76
2    0.77   0.39
1    0.53   0.53
The final column is the fraction of the points attributed to the individual scorer, if I have interpreted this correctly.
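
The arithmetic behind that little table can be sketched as follows. The dictionary layout and function names are mine; the coefficients are the ones listed above, and "fraction" is taken to mean net value divided by points scored.

```python
# Coefficients as listed above: value of a make and cost of an attempt, by shot type.
coef = {
    "3pt": {"pts": 3, "make": 3.11, "attempt": -0.82},
    "2pt": {"pts": 2, "make": 1.25, "attempt": -0.48},
    "ft": {"pts": 1, "make": 0.74, "attempt": -0.21},
}


def net_value(c):
    """A made shot earns the make coefficient and also pays the attempt cost."""
    return c["make"] + c["attempt"]


def scorer_share(c):
    """Net value of a make as a fraction of the points it puts on the board."""
    return net_value(c) / c["pts"]


table = {k: (net_value(c), scorer_share(c)) for k, c in coef.items()}
for shot, (net, share) in table.items():
    print(f"{shot:>3}  {net:5.2f}  {share:5.3f}")
```

A missed shot, by the same logic, is worth just the attempt coefficient on its own.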
Should more credit be given to the assist man than to those who have perhaps set a screen or spaced the floor? Or perhaps done nothing useful?

Is a missed shot just as bad when teams are shooting .640 or .460? Or if they are rebounding especially well?
On a team with few scorers and good rebounding, it seems a low shooting% is not as detrimental. I'm thinking Westbrook/Thunder, where you have a few guys that shoot high% but are not good shot creators; and they get a lot of OReb. Then a 46% shot is not so bad as with the Warriors who shoot 60%.

Different playoff series may require different parameters to successfully assign individual credit. A 60% shooter who doesn't rebound may be a hero in one environment and a pariah in another.

DSMok1
Posts: 902
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Reconstructing Box Plus/Minus

Post by DSMok1 » Fri May 17, 2019 1:51 pm

Mike G wrote:
Fri May 17, 2019 12:45 pm
...maybe some scoring shows more for the individual player, while other scoring is more generic (anybody could do it). Also, some scoring may be more valuable from a spacing perspective...
In an earlier table, you gave these values:
3fg 3.11
3fga -0.82
2fg 1.25
2fga -0.48
ft 0.74
fta -0.21

Adding the value of each make to the cost of the attempt, we get these net values for a made shot:

pts  value   %
3    2.29   0.76
2    0.77   0.39
1    0.53   0.53
The final column is the fraction of the points attributed to the individual scorer, if I have interpreted this correctly.
Should more credit be given to the assist man than to those who have perhaps set a screen or spaced the floor? Or perhaps done nothing useful?

Is a missed shot just as bad when teams are shooting .640 or .460? Or if they are rebounding especially well?
On a team with few scorers and good rebounding, it seems a low shooting% is not as detrimental. I'm thinking Westbrook/Thunder, where you have a few guys that shoot high% but are not good shot creators; and they get a lot of OReb. Then a 46% shot is not so bad as with the Warriors who shoot 60%.

Different playoff series may require different parameters to successfully assign individual credit. A 60% shooter who doesn't rebound may be a hero in one environment and a pariah in another.
Nicely done, Mike!

The logical thing is absolutely to extend this to assisted vs. unassisted, and also to split 2 pointers into at-rim vs. midrange.

That said, that data is outside of what I want to use for an historical BPM model, since it is only available since around 2000. Lots of directions for research!

Regarding team effects--at a seasonal level, BPM always judges shooting vs. the rest of the shooters on the team. That can't be done within a single game. It wouldn't be stable enough. Stability is an issue here.

Mike G
Posts: 4413
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Reconstructing Box Plus/Minus

Post by Mike G » Fri May 17, 2019 5:21 pm

... at a seasonal level, BPM always judges shooting vs. the rest of the shooters on the team. That can't be done within a single game. It wouldn't be stable enough.
I do it for playoff series. Since you have Team A vs Team B, league averages aren't even relevant as parameters. No additional "team adjustment" is required; team and player credits stabilize in unison.

Yeah, one game is sometimes bizarre, 2 is much better, and then the operation is pretty smooth.
The big adjustment is with rebounds; apparently there are team rebounds not assigned to any player, but just as important to the teams, as they result in a possession.
Team turnovers are an issue, as are assists granted liberally to one or both teams.

You could try estimating assisted % of points. Once you make these corrections, correlations improve.

DSMok1
Posts: 902
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Reconstructing Box Plus/Minus

Post by DSMok1 » Tue May 21, 2019 5:31 pm

An update on this "GmBPM2" Linear model effort:

I bootstrapped the standard errors of the regression coefficients shown above (the one with shooting from various locations split out) and found that the shooting terms were all highly correlated and had very large standard errors.

I decided to combine fg2a and fg3a, using fga instead. This removes a lot of the issue, without a penalty on the R^2.

The updated GmBPM2 linear model is below. The R^2 is 0.646.



Also included lower in that sheet is an interesting output from the bootstrapping procedure: a table of correlations between the coefficient estimates. It shows how the coefficients vary vs. one another between the various bootstrap iterations of the regression. For example, if my coefficient for FGA goes up, my coefficients for FG2 and FG3 go down (which makes sense). More interesting is how some of the other coefficients relate to one another.
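
For anyone wanting to try the same thing, here is a minimal sketch of a row-resampling bootstrap for OLS coefficients, giving both the standard errors and the correlation table between coefficient estimates. Everything in it is an assumption for illustration: the data are synthetic, the columns `fg` and `fga` are stand-ins rather than the real GmBPM inputs, and the actual procedure used for the sheet may differ.

```python
import numpy as np


def bootstrap_ols(X, y, n_boot=500, seed=0):
    """Refit OLS on rows resampled with replacement; return one coefficient row per refit."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    coefs = np.empty((n_boot, A.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap sample of row indices
        coefs[b] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
    return coefs


# Synthetic stand-in data: makes are strongly tied to attempts, as in real box scores.
rng = np.random.default_rng(42)
n = 400
fga = rng.normal(15.0, 4.0, n)
fg = 0.45 * fga + rng.normal(0.0, 1.5, n)
y = 1.2 * fg - 0.5 * fga + rng.normal(0.0, 2.0, n)  # made-up "true" coefficients

coefs = bootstrap_ols(np.column_stack([fg, fga]), y)
se = coefs.std(axis=0, ddof=1)           # bootstrap standard errors
corr = np.corrcoef(coefs, rowvar=False)  # correlations between coefficient estimates
print(se)
print(corr)
```

With correlated inputs like these, the off-diagonal entry for the two shooting terms comes out negative, which is the same pattern described above: when one coefficient drifts up in a resample, the other drifts down to compensate.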

Here is a look at this updated GmBPM2 Linear model Top 50:

Mike G
Posts: 4413
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Reconstructing Box Plus/Minus

Post by Mike G » Wed May 22, 2019 1:16 am

Still no sign of Westbrook '17: BPM = 15.6
And Harden peaked in 2015?

DSMok1
Posts: 902
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Reconstructing Box Plus/Minus

Post by DSMok1 » Wed May 22, 2019 1:20 am

Mike G wrote:
Wed May 22, 2019 1:16 am
Still no sign of Westbrook '17: BPM = 15.6
And Harden peaked in 2015?
I'm sorry, I should have clarified the dates. This is only 1997 through 2016, which is the 20 year sample I am working with.

eminence
Posts: 137
Joined: Sun Sep 10, 2017 8:20 pm

Re: Reconstructing Box Plus/Minus

Post by eminence » Thu Jun 13, 2019 1:52 pm

Without too many details I think shooting changes are a pretty good idea, as I usually found 3pt spacers to be the most underrated archetype (though BPM handled them better than most box-score stats).

vzografos
Posts: 23
Joined: Thu Sep 06, 2018 10:42 am

Re: Reconstructing Box Plus/Minus

Post by vzografos » Thu Jun 13, 2019 3:21 pm

DSMok1 wrote:
Fri Apr 12, 2019 4:55 pm
  1. Box score stats only (i.e. anything that can be calculated from the stats we have from the 80s.)
  2. No PbP stats, not even things like "assisted by" ratios.
  3. Nothing super complex that can't be done by someone with Excel and a good knowledge of math.
  4. Focus on Explanation, not Prediction. What happens should be credited to the team. No luck adjustment. (A good explanatory stat can be converted to a predictive stat with appropriate regression to the mean.)

How are you getting on?

Not sure what your 4th goal actually means, but if I understand correctly you are trying to make a new "feature" (or call it metric) that has similar performance to the old Box +/- but without its shortcomings (as you mention them), right?

DSMok1
Posts: 902
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Reconstructing Box Plus/Minus

Post by DSMok1 » Thu Jun 13, 2019 4:47 pm

vzografos wrote:
Thu Jun 13, 2019 3:21 pm
DSMok1 wrote:
Fri Apr 12, 2019 4:55 pm
  1. Box score stats only (i.e. anything that can be calculated from the stats we have from the 80s.)
  2. No PbP stats, not even things like "assisted by" ratios.
  3. Nothing super complex that can't be done by someone with Excel and a good knowledge of math.
  4. Focus on Explanation, not Prediction. What happens should be credited to the team. No luck adjustment. (A good explanatory stat can be converted to a predictive stat with appropriate regression to the mean.)

How are you getting on?

Not sure what your 4th goal actually means, but if I understand correctly you are trying to make a new "feature" (or call it metric) that has similar performance to the old Box +/- but without its shortcomings (as you mention them), right?
Gradually working on this project!

Yes, the idea is to maintain the existing general concept of BPM (i.e. historic applicability, general structure) and significantly improve the handling of outliers.

Thus far I have focused on the linear version of BPM, currently called GmBPM. It should be very stable and should handle outlier numbers very well.

Then I intend to build upon that framework and add nonlinear terms as appropriate to help handle nuances while hopefully not destroying applicability to outlier values (like the existing BPM did).

DSMok1
Posts: 902
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: Reconstructing Box Plus/Minus

Post by DSMok1 » Thu Jun 13, 2019 5:11 pm

Recently, I evaluated another angle to the linear GmBPM regression.

I allowed a different intercept/constant for each position, just to see what would happen to the coefficients.

Interestingly, only one coefficient changed much. The AST coefficient jumped from 0.43 to 0.60. The coefficients for all of the other terms stayed almost exactly the same.

The constants for the positions were PG = 0 (baseline), SG = 1.06, SF = 1.62, PF = 1.70, C = 1.56.
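
To make the setup concrete, here is a small sketch of a regression with a separate constant per position, using 0/1 dummy columns with PG as the baseline. The data are synthetic and noise-free: the target is built from the AST coefficient and position constants quoted above plus an invented overall intercept of -2.0, purely to show that the design matrix recovers them. It is not a reproduction of the actual regression, and the helper name is hypothetical.

```python
import numpy as np

POSITIONS = ["PG", "SG", "SF", "PF", "C"]


def design_matrix(box, pos):
    """Columns: intercept, box-score terms, then one 0/1 dummy per non-PG position."""
    n = box.shape[0]
    dummies = np.column_stack([(pos == p).astype(float) for p in POSITIONS[1:]])
    return np.column_stack([np.ones(n), box, dummies])


rng = np.random.default_rng(0)
n = 200
pos = np.array(POSITIONS)[np.arange(n) % 5]  # every position appears equally often
ast = rng.normal(5.0, 2.0, n)                # synthetic assists per 100 possessions

offsets = {"PG": 0.0, "SG": 1.06, "SF": 1.62, "PF": 1.70, "C": 1.56}
y = -2.0 + 0.60 * ast + np.array([offsets[p] for p in pos])  # invented intercept of -2.0

X = design_matrix(ast.reshape(-1, 1), pos)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta)  # order: intercept, AST, then SG, SF, PF, C offsets relative to PG
```

Leaving PG out of the dummy columns is what makes it the baseline: its constant is absorbed into the overall intercept, and every other position's constant is read as a difference from PG.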

What are thoughts as to the reasons behind this behavior? I've got a few ideas I am in the process of exploring.
