bbstats' regression based on Statistical +/- and Net +/-

Crow · Post by **Crow** » Thu Jun 02, 2011 7:44 pm

I am not bothered by threads with multiple discussions within them but maybe it would be somewhat easier to have this in its own thread, since it is worthy of independent recognition and focus and would be easier to find in the future from its own thread.

bbstats
Post subject: Re: 2011 Finals - Mia vs Dal
Posted: Thu Jun 02, 2011 6:55 am

Joined: Thu Apr 21, 2011 1:25 pm
Posts: 20
I thought it would be interesting to make a regression based on Statistical +/- and Net +/- to analyze single-game stats. Using the two in tandem, I can explain the variation in 2011 RAPM with an R^2 of 0.8.

So I will be doing this for each game of the finals. Here is game 1. Small sample size on the Off-court or On-court possessions leads to some very noisy Net +/- numbers, but I'm very satisfied with what this looks like for game 1.

EDIT: Main storyline here: Dirk had the best game, but couldn't overcome Bosh+James. Bibby actually looked better than Chalmers just by advanced stats, but Plus-Minus tells a much different story.

Here's the basic formula:
Adj Rtg = 0.16*s_ORAPM + 0.374*s_DRAPM + 0.253*NetPlusMinus - 0.081

where
NetPlusMinus = Efficiency Margin On Court Per 100 - Efficiency Margin Off Court Per 100
which you can derive from
= PlusMinus*100/(Min%*Pace) - (TeamMOV - PlusMinus)*100/((1-Min%)*Pace)
where Min% is the share of game time the player was on court, so a full 48 minutes would be Min%=1.00, not 0.2 (48 of the team's 240 player-minutes).
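
The On minus Off definition above can be sketched in Python as follows. This is only an illustration: the off-court margin is taken as the team's margin minus the player's plus-minus, which is what that definition implies, and treating Pace as the team's possessions for the game is an assumption, as are the function and parameter names.

```python
def net_plus_minus(plus_minus, team_mov, min_share, pace):
    """On-court minus off-court efficiency margin, per 100 possessions.

    plus_minus: player's raw plus-minus for the game (points)
    team_mov:   team's final margin of victory (points)
    min_share:  fraction of game time on court (48 of 48 minutes = 1.0)
    pace:       team possessions for the game (assumed interpretation)
    """
    if not 0.0 < min_share < 1.0:
        # With no on-court or no off-court time, one side is undefined.
        raise ValueError("min_share must be strictly between 0 and 1")
    on_court = plus_minus * 100.0 / (min_share * pace)
    # Margin while the player sat = team margin - player's plus-minus.
    off_court = (team_mov - plus_minus) * 100.0 / ((1.0 - min_share) * pace)
    return on_court - off_court


example = net_plus_minus(plus_minus=9, team_mov=8, min_share=0.75, pace=80)
```

In this made-up example the player is +15 per 100 on court and the team is -5 per 100 without him, so the net is +20. With a small `min_share` or a small `1 - min_share` the denominators shrink, which is where the single-game noise mentioned earlier comes from.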

These are all in basketball-reference format, so eFG% & TS% are out of 1, Usg% etc are out of 100.
s_ORAPM = (ORTG*Usg%/100)*1.063 - Usg%*0.946 + Stl%*0.41 + eFG%*13.56 - TS%*21.6 + TOV%*0.41
(R^2 of 0.58 against 3-year RAPM; all coefficients significant at p<0.01 except TOV%, at p=0.025. The counter-intuitive coefficients basically act as corrections to the ORTG*Usg%/100 term.)
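
For anyone who wants to plug numbers in, here is a minimal Python sketch of the s_ORAPM formula with the posted coefficients. The function and argument names are made up for illustration, and it assumes the basketball-reference scaling described above (eFG% and TS% out of 1, the other percentages out of 100).

```python
def s_orapm(ortg, usg, stl, efg, ts, tov):
    """Statistical offensive estimate from the posted coefficients.

    ortg:          individual offensive rating
    usg, stl, tov: Usg%, Stl%, TOV% on a 0-100 scale
    efg, ts:       eFG% and TS% on a 0-1 scale
    """
    return (
        (ortg * usg / 100.0) * 1.063
        - usg * 0.946
        + stl * 0.41
        + efg * 13.56
        - ts * 21.6
        + tov * 0.41
    )


# Hypothetical stat line, for illustration only.
sample = s_orapm(ortg=110, usg=20, stl=2, efg=0.50, ts=0.55, tov=12)
```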

where higher is better (just like JE's RAPM numbers):
s_DRAPM=STL%*0.71 + BLK%*0.24 - DRTG*0.036 + ORTG*0.0855 + TRB%*3.28 - ORB%*1.7 - DRB%*1.55 + TOV%*0.038 - TS%*14.55
(R^2 of 0.36 against 3-year RAPM; all coefficients significant at p<0.01 except TOV%, at p=0.058.)
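
A similar sketch for s_DRAPM, plus the Adj Rtg combination from the basic formula. Names are again hypothetical, and the rebound, steal, block and turnover percentages are assumed to be on the 0-100 scale, with TS% out of 1.

```python
def s_drapm(stl, blk, drtg, ortg, trb, orb, drb, tov, ts):
    """Statistical defensive estimate from the posted coefficients.

    Higher is better. Percentages are on a 0-100 scale except ts (0-1).
    """
    return (
        stl * 0.71
        + blk * 0.24
        - drtg * 0.036
        + ortg * 0.0855
        + trb * 3.28
        - orb * 1.7
        - drb * 1.55
        + tov * 0.038
        - ts * 14.55
    )


def adj_rtg(s_o, s_d, net_pm):
    """Blend the two statistical estimates with Net +/- per the basic formula."""
    return 0.16 * s_o + 0.374 * s_d + 0.253 * net_pm - 0.081


# Hypothetical inputs, for illustration only.
d = s_drapm(stl=2, blk=1, drtg=105, ortg=110, trb=10, orb=5, drb=15, tov=12, ts=0.55)
rating = adj_rtg(s_o=5.0, s_d=2.0, net_pm=20.0)
```

Note how much of a single-game rating can come from the Net +/- term: in the made-up call above it contributes 5.06 of the 6.527 total.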

EvanZ
Post subject: Re: 2011 Finals - Mia vs Dal
Posted: Thu Jun 02, 2011 7:22 am

Joined: Thu Apr 14, 2011 3:41 pm
Posts: 87
Location: Hotlanta
What are the R^2 values for your statistical +/- and net +/- ratings applied to RAPM separately?

bbstats
Post subject: Re: 2011 Finals - Mia vs Dal
Posted: Thu Jun 02, 2011 8:16 am

Joined: Thu Apr 21, 2011 1:25 pm
Posts: 20
I don't have any 3-year values for RAPM, but here are the R^2s that I can speak of.

3-year RAPM:

R^2 of the stats-based estimate against ORAPM is 0.582846219
R^2 of the stats-based estimate against DRAPM is 0.362107365

Using the stats-based ORAPM and DRAPM estimates (0.9*O + 0.88*D) to predict total RAPM has an R^2 of 0.47790076

I don't have 3-year Net +/-, although just averaging Net ratings would do this.
However, 1-year Net +/- has an R^2 value of 0.7429 against 1-year RAPM.
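
The posts don't say exactly how these R^2 values were computed, so the following is only a generic sketch of checking a fixed-weight prediction such as 0.9*O + 0.88*D against observed RAPM, using the 1 - SS_res/SS_tot definition. The function names are made up, and for a freshly fitted regression this value would coincide with the squared correlation.

```python
def combined_estimate(o, d):
    """Fixed-weight blend of the offensive and defensive estimates."""
    return 0.9 * o + 0.88 * d


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variance")
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot
```

Running `r_squared` once on the statistical estimate alone, once on Net +/- alone, and once on the blend is one way to answer the "separately" question on a common set of players.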

EDIT: update:

Against 1-year RAPM, I can get a better overall R^2 (0.845597007) with the following:

| Variable | Coefficient | P-value |
|---|---|---|
| TS% | -5.428736055 | 0.00200997 |
| ORB% | -1.275170356 | 7.45836E-15 |
| DRB% | -1.217492048 | 1.24842E-13 |
| TRB% | 2.462921227 | 6.36515E-14 |
| TOV% | 0.037129144 | 0.000861596 |
| USG% | 0.015780302 | 0.069906937 |
| ORtg | 0.067946978 | 1.74526E-10 |
| DRtg | -0.047036021 | 7.28146E-13 |
| Netper100 | 0.254334109 | 1.6014E-101 |
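
A rough sketch of applying these 1-year coefficients to a stat line. No intercept is listed in the post, so this returns only the weighted sum and is best read as a relative score between players rather than a RAPM estimate. The dictionary keys and function name are made up, and the same scaling as the earlier formulas is assumed (TS% out of 1, other percentages out of 100).

```python
ONE_YEAR_COEFFS = {
    "TS%": -5.428736055,
    "ORB%": -1.275170356,
    "DRB%": -1.217492048,
    "TRB%": 2.462921227,
    "TOV%": 0.037129144,
    "USG%": 0.015780302,
    "ORtg": 0.067946978,
    "DRtg": -0.047036021,
    "Netper100": 0.254334109,
}


def one_year_score(stats):
    """Weighted sum of a player's stats using the posted coefficients.

    stats: dict with one value per key in ONE_YEAR_COEFFS.
    The intercept is omitted because the post does not give one.
    """
    missing = set(ONE_YEAR_COEFFS) - set(stats)
    if missing:
        raise KeyError(f"missing stats: {sorted(missing)}")
    return sum(coef * stats[name] for name, coef in ONE_YEAR_COEFFS.items())
```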

Sorry, didn't mean to turn this into a development thread. Back to the playoffs!

EvanZ
Post subject: Re: 2011 Finals - Mia vs Dal
Posted: Thu Jun 02, 2011 11:26 am

Joined: Thu Apr 14, 2011 3:41 pm
Posts: 87
Location: Hotlanta
The p-value on USG...

bbstats
Post subject: Re: 2011 Finals - Mia vs Dal
Posted: Thu Jun 02, 2011 12:08 pm

Joined: Thu Apr 21, 2011 1:25 pm
Posts: 20
Yeah, no bueno. But I wasn't very comfortable having ORTG without USG.

Anywho, I get better 1-year p-values and R^2 by using only ORTG, DRTG, Net +/-, and Min%*Team Efficiency Margin.

Crow · Post by **Crow** » Thu Jun 02, 2011 7:48 pm

Would be interesting to see this new metric matched up against your other recent metric proposals and Rhuidean's and others.

I wonder if it might be worthwhile to adopt a standard APBRmetric player ID and have that in metric files for easier comparisons. And maybe get back to posting files into the Yahoo APBRmetric file group Ed Kupfer set up.
Ideally a super metric spreadsheet could be constructed and kept updated.

EvanZ · Post by **EvanZ** » Thu Jun 02, 2011 7:58 pm

Seems like bball-value's PID is becoming the de facto standard.

Crow · Post by **Crow** » Thu Jun 02, 2011 8:01 pm

A regression based on Statistical +/- and Net +/- does seem like an approach worth checking to see how good it is.

s_ORAPM = (ORTG*Usg%/100)*1.063 - Usg%*0.946 + Stl%*0.41 + eFG%*13.56 - TS%*21.6 + TOV%*0.41

Do you have a table that shows the impact of usage changes with all else equal? That would seem quite useful.

Would it be fair to say that being x standard deviations above average on eFG% helps more than being the same number of standard deviations above average on TS%, or does the first term change this outcome? Does this model feature have sufficient support?

How about a table here too?

s_DRAPM=STL%*0.71 + BLK%*0.24 - DRTG*0.036 + ORTG*0.0855 + TRB%*3.28 - ORB%*1.7 - DRB%*1.55 + TOV%*0.038 - TS%*14.55

Do you have a table that shows the impact of rebound% changes with all else equal? That would seem quite useful as well, say by position averages for these rebound %s.

I might produce these 3 tables myself later if you haven't already or choose not to.

bbstats · Post by **bbstats** » Fri Jun 03, 2011 1:26 am

I really, really need to get Excel on my laptop. I only have my Chrome notebook and this little netbook...can't do much right now. I can try to do your suggestions later, Crow.

It is probably more interesting to point out that I introduced some new variables into the 1-year regression that forced most of the advanced stats to have incredibly high p-values. After throwing out all but DRTG and ORTG, I still have a way higher R^2 than before. I would completely change over to a system like this (especially if I did a longer study) if not for the fact that it tells us very little about what the player did, specifically. The regression is based on ORTG, DRTG, Net +/-, and Min%*TeamEffMargin (I get an R^2 of .85 against 1-year RAPM for players with 500+ minutes).

Although this seems to work better (higher R^2), it totally botches some of the top rankings: Paul Pierce and Kevin Garnett jump out to the top (from their great Net ratings).

EvanZ · Post by **EvanZ** » Fri Jun 03, 2011 1:38 am

Maybe Google Spreadsheet? Not sure how advanced it is, though.

Crow · Post by **Crow** » Mon Jun 06, 2011 7:55 am

All else held equal, a 1 percentage point change in usage leads to about a .2 change in ORAPM.
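
As a quick check on the usage figure, here is a small Python sketch of the usage slope implied by the posted s_ORAPM formula when ORTG and everything else is held fixed. Only the two usage terms matter, so the slope depends on the player's ORTG; the function name and the ORTG values are illustrative assumptions.

```python
def usage_slope(ortg):
    """Change in s_ORAPM per +1 percentage point of Usg%, with ORTG fixed.

    Derived from the two usage terms: (ORTG*Usg%/100)*1.063 - Usg%*0.946.
    """
    return ortg * 1.063 / 100.0 - 0.946


# Hypothetical ORTG levels, for illustration only.
slopes = {ortg: usage_slope(ortg) for ortg in (100, 107, 110)}
```

At an ORTG of 107 the slope is about 0.19, in line with the roughly .2 figure. It crosses zero near an ORTG of 89, so under this formula extra usage only helps players above that efficiency level.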

A 1 percentage point change in both eFG% and TS% would increase ORAPM by at least .5.

A player with the exact average rebound %s in the league for all players would receive just a tiny .01 positive bump in the formula for that. I guess it is well centered.
