Possible Steps to improve SPM
Posted: Fri Oct 10, 2014 3:29 pm
In the last couple of days/weeks I've been trying to "improve" my SPM (Statistical Plus Minus). As most of you probably know, SPM serves as a prior ("starting point") for xRAPM.
I would like to improve my SPM's accuracy, since that will, in turn, lead to better xRAPM accuracy. No SPM will ever be perfect, and to some degree the RAPM part of xRAPM will be able to rectify some of the most egregious errors of SPM, but I think there's more room for improvement in the SPM world than in the RAPM world, so that's what I'm working on.
Why are inaccurate SPM ratings a problem for xRAPM? Imagine the (offensive) SPM for two random players A and B (who play the same position on one team) to be +0 (A) and +4 (B), when in reality A is actually +4 and B is +0. This case can easily occur when one player is more of a "stat stuffer" who fails to do all the little things (think Ricky Davis), and the other does all the things not recorded in the box score, like setting good screens (think Battier, Nick Collison, etc.).
To get xRAPM, we feed those "somewhat off" SPM values into RAPM. Even if we're lucky and lineups with player A perform 4 points better than lineups with player B (they might not, because of noise), it will take some time and data before A's rating approaches +4 and B's rating approaches +0. With just half a season, both players might sit at +2. Because the SPMs are off, players who primarily play with B will be expected to have a higher +/- than players who play with A. Since their lineups probably won't actually post a higher +/- with B (because A is the better player), their xRAPM ratings will (incorrectly) suffer. As you can see, SPM values that are off not only give a wrong impression of the player in question, but possibly also of his teammates (and, to a small degree, opponents).
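For intuition, the RAPM part can be thought of as a ridge regression that shrinks each player's coefficient toward his SPM prior instead of toward zero; with a single player this collapses to a weighted average of the observed data and the prior. A minimal sketch (the λ value and possession counts are invented for illustration, not my actual settings):

```python
# One-player toy model of "prior + data" shrinkage.
# estimate = (n * observed_mean + lam * prior) / (n + lam)
# is exactly the 1-D solution of ridge regression toward a prior.

def shrunk_estimate(observed_mean, n, prior, lam):
    """Prior-weighted average: more possessions pull the estimate toward the data."""
    return (n * observed_mean + lam * prior) / (n + lam)

lam = 3000.0        # hypothetical regularization strength (in possessions)
true_impact = 4.0   # player A's real per-100 impact
wrong_prior = 0.0   # SPM mistakenly says +0

for n in (1000, 5000, 20000, 80000):
    est = shrunk_estimate(true_impact, n, wrong_prior, lam)
    print(f"{n:6d} possessions -> estimate {est:+.2f}")
```

With 1,000 possessions the estimate is only +1.0; even after 80,000 possessions it hasn't fully reached +4. That's the cost of a bad prior, and why improving SPM directly pays off.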
To test my own SPM for glaring weaknesses I came back to an idea I had a couple of years back:
- For every year in the dataset, compute SPM ratings
- Use SPM ratings from year X-1 and lineup data from year X to compute "expected team points scored while on court" and "expected opponent points scored while on court" for every player
- Compute the difference between expected and actual points for every player
- Sum up the differences over the entire sample to identify players that are regularly over- or underrated by SPM. (You could instead look at things on a year-to-year basis, but for now I'll work with the totals.)
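The offensive half of the steps above can be sketched roughly as follows. The data structures and the league-average constant are assumptions for illustration (my actual expected-points computation also handles pace, the defensive side, etc.):

```python
# Hedged sketch: spm_prev[player] is an offensive SPM rating in points per
# 100 possessions from year X-1; stints is a list of
# (offense_lineup, possessions, points_scored) tuples from year X.

from collections import defaultdict

LEAGUE_AVG = 106.0  # assumed league-average points per 100 possessions

def residuals_for_year(spm_prev, stints):
    """Actual minus expected points, credited to every player on the floor."""
    diff = defaultdict(float)
    for lineup, poss, points in stints:
        # expected points for this stint from last year's SPM ratings
        expected = (LEAGUE_AVG + sum(spm_prev.get(p, 0.0) for p in lineup)) * poss / 100
        for p in lineup:
            diff[p] += points - expected
    return diff

# toy example: one 200-possession stint by a lineup whose SPM ratings sum to +5
spm_2010 = {"A": 2.0, "B": 3.0, "C": 0.0, "D": 0.0, "E": 0.0}
stints_2011 = [(("A", "B", "C", "D", "E"), 200, 230.0)]
print(dict(residuals_for_year(spm_2010, stints_2011)))
```

Summing these per-player residuals across all years (and doing the analogous thing for opponent points) produces exactly the kind of over-/underrated list shown below.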
To give one example from the table below: since 2002, Dirk Nowitzki's lineups have scored roughly 1,000 points more than SPM expected them to. Nowitzki is therefore responsible for a large chunk of SPM's out-of-sample error. Our goal should be, to some degree at least, to find ways for SPM to better separate the Nowitzkis from the Vince Carters.
Here's the list of the most under- and overrated players, on offense, since 2002. (Please note that you could create such a list of "most blatant errors" for any NBA rating system humankind will ever create.)
╔══════════════════════╦══════════════════════╗
║ Name ║ Actual-Exp (offense) ║
╠══════════════════════╬══════════════════════╣
║ Dirk Nowitzki ║ 1016 ║
║ Steve Nash ║ 731 ║
║ Kobe Bryant ║ 619 ║
║ Kevin Garnett ║ 483 ║
║ Mike Conley ║ 478 ║
║ Chris Paul ║ 472 ║
║ Hedo Turkoglu ║ 470 ║
║ Zach Randolph ║ 459 ║
║ Eduardo Najera ║ 441 ║
║ Carlos Boozer ║ 394 ║
║ .. ║ ║
║ Antoine Walker ║ -354 ║
║ Jacque Vaughn ║ -368 ║
║ Ricky Davis ║ -379 ║
║ Paul Pierce ║ -405 ║
║ Jason Collins ║ -408 ║
║ Shareef Abdur-Rahim ║ -411 ║
║ Gerald Wallace ║ -412 ║
║ Mark Blount ║ -417 ║
║ Richard Jefferson ║ -458 ║
║ Vince Carter ║ -592 ║
╚══════════════════════╩══════════════════════╝
Here's the pitfall: we probably shouldn't chase these errors at all costs. If we start adding terms to SPM that don't actually make sense from a basketball standpoint, we might reduce in-sample error, but probably not out-of-sample error. But if we find a glaring weakness in (any kind of) SPM, we should obviously try to attack and improve it.
My first reaction after seeing Nowitzki and Nash at the top was to add numbers from a shot metric like the one EvanZ created, where a player's shooting efficiency from specific spots on the court is compared to the average efficiency from those spots. Nowitzki and Nash generally scored well in that metric. I don't have shot location data, though, so my next thought was to use 3P%*2P%*FT% (Nash is part of the 50/40/90 club, and Nowitzki came close), or possibly 3P%*2P%*FT%*(FGA+FTA*0.5). The list above was actually generated after adding these terms (Nowitzki's error was even higher before). Nowitzki, unfortunately, is not easy to separate using box score stats: he doesn't excel at rebounding, assists, blocks, or steals.
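For concreteness, here is the candidate interaction term from the paragraph above, in both its efficiency-only and volume-scaled forms. The stat lines in the example are invented for illustration, not real numbers:

```python
# Candidate SPM feature: 3P% * 2P% * FT%, optionally scaled by
# shooting volume (FGA + 0.5 * FTA).

def shooting_term(p3, p2, ft, fga=None, fta=None):
    """Efficiency interaction term; scaled by volume if FGA/FTA are given."""
    term = p3 * p2 * ft
    if fga is not None and fta is not None:
        term *= fga + 0.5 * fta  # volume-scaled variant
    return term

# a 50/40/90-style shooter (made-up per-game numbers)
print(shooting_term(0.40, 0.50, 0.90))                  # efficiency only
print(shooting_term(0.40, 0.50, 0.90, fga=15, fta=4))   # with volume
```

The multiplicative form only rewards players who are good at all three splits at once, which is exactly what should lift a Nash or a Nowitzki relative to a merely average shooter.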
Another problem is that Kobe Bryant and Vince Carter sit at completely opposite ends of the spectrum even though their statistical profiles are similar.
So, if anyone has ideas about possible interaction terms, or stats that might be extractable from play-by-play, which could help lower the SPM rating of the overrated and raise the rating of the underrated, let me know. It can't be '+/-', though, as that's something the RAPM part of xRAPM takes care of anyway. The plan is to add useful terms to SPM, recompute the list of most under-/overrated players, and then look for new terms. It's a process that can go on forever, and at some point it should draw on other sources of data.