supersub15
PostPosted: Mon May 17, 2010 10:22 am Post subject: Finding a ratio of offensive output to usage
This idea was kicked around in a thread on RealGM, and I thought it'd be a good idea to discuss it here.
Basically, we want to show a player's efficiency in producing points for his team when he has the ball in his hands, compared to his usage level.
The poster came up with the following formula (using Hollinger's Usage Rate formula):
{[(PPG + APG) - TOPG] * (lg_pace/team_pace)} / adj_UsgR
The Usage Rate is adjusted to each player's actual minutes per game, instead of the per40 used by Hollinger.
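As a rough sketch (the function and argument names here are my own, not the poster's), the ratio can be computed like this:

```python
def output_to_usage_ratio(ppg, apg, topg, lg_pace, team_pace, adj_usg_rate):
    """Pace-adjusted offensive output (points + assists - turnovers)
    per unit of minutes-adjusted usage rate."""
    output = (ppg + apg) - topg
    return output * (lg_pace / team_pace) / adj_usg_rate
```

A player whose pace-adjusted output exceeds his usage lands above 1:1; one below, under it.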
You end up with results like this:
LeBron - 1.13:1
Carmelo - 0.95:1
Kobe - 0.97:1
Wade - 1.02:1
M. Ellis - 0.89:1
The original thread is here: http://forums.realgm.com/boards/viewtop ... 25&start=0
Thoughts?
Jon Nichols
Joined: 18 Aug 2005
Posts: 370
PostPosted: Mon May 17, 2010 10:49 am
Interesting idea, but why not just use something like Offensive Rating/Usage Rate? Better yet, standardize them with z-scores. Or perhaps you are trying to ignore offensive rebounds?
If you want to go with just points, assists, and turnovers, weighting them differently might make sense.
DSMok1
Joined: 05 Aug 2009
Posts: 592
Location: Where the wind comes sweeping down the plains
PostPosted: Mon May 17, 2010 11:56 am
Jon Nichols wrote:
Interesting idea, but why not just use something like Offensive Rating/Usage Rate? Better yet, standardize them with z-scores. Or perhaps you are trying to ignore offensive rebounds?
If you want to go with just points, assists, and turnovers, weighting them differently might make sense.
I agree with this. Assists do not equal points, necessarily...
bgassassin
Joined: 08 May 2010
Posts: 5
PostPosted: Thu May 27, 2010 1:27 am
Looks like I missed this. LOL.
I'll discuss it further with you all later since my goal was to come here for a second round of testing.
DSMok1
Joined: 05 Aug 2009
Posts: 592
Location: Where the wind comes sweeping down the plains
PostPosted: Thu May 27, 2010 9:19 am
I just calculated a formula that is similar to this, as part of my latest Statistical Plus/Minus regressions.
The results are output in terms of points/100poss added or subtracted from the team efficiency differential.
I start with numbers that are already pace-independent so as to avoid having to use a pace adjustment.
Here we go:
((TS%*2*(1-1.357*TOV%))-0.878)*USG%*90.131-0.228
What's going on here? First, TS% is converted into points per possession by multiplying by 2. Then the turnovers are removed--but a turnover costs more than 1 point; it's actually 1.357 (according to the regression). Then a points-per-possession threshold is applied, above which additional usage helps and below which additional usage hurts. That is the 0.878. Then the result is multiplied by the actual USG%, and then by a coefficient to scale it into actual points. 0.228 is the league-wide average contribution; subtracting it out makes the league total sum to 0. All terms are as defined by Basketball-Reference; thus USG% does not include assists.
If you want to add in the assist term, add in AST%*13.442-1.951.
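Here is a sketch of the formula in Python (function name and scale handling are my own choices, not DSMok1's code): TS% is passed as a fraction, while TOV%, USG%, and AST% are passed on Basketball-Reference's 0-100 scale and converted to fractions internally, which is what reproduces the table values below.

```python
def scoring_value(ts_pct, tov_pct, usg_pct, ast_pct=None):
    """DSMok1's scoring contribution in points/100 possessions.
    ts_pct is a fraction (e.g. 0.599); tov_pct, usg_pct, ast_pct are on
    Basketball-Reference's 0-100 scale and are converted to fractions here."""
    tov, usg = tov_pct / 100.0, usg_pct / 100.0
    value = ((ts_pct * 2 * (1 - 1.357 * tov)) - 0.878) * usg * 90.131 - 0.228
    if ast_pct is not None:  # optional assist term
        value += (ast_pct / 100.0) * 13.442 - 1.951
    return value
```

Chris Paul's 2008-09 line (0.599, 13.5, 27.5, 54.5) comes out to about 7.64, matching the table to rounding.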
Top players over the past 5 years in this rating (including the assist term):
Code:
Player               Season    Tm    TS%    AST%   TOV%   USG%   Sco+Ast Value
Chris Paul           2008-09   NOH   0.599  54.5   13.5   27.5   7.65
LeBron James         2009-10   CLE   0.604  41.8   12.3   33.5   7.32
LeBron James         2008-09   CLE   0.591  38.0   11.0   33.8   6.82
Chris Paul           2007-08   NOH   0.576  52.2   12.1   25.7   6.81
Dwyane Wade          2008-09   MIA   0.574  40.3   11.6   36.2   6.16
Chauncey Billups     2005-06   DET   0.602  39.3   12.0   22.9   5.79
Steve Nash           2006-07   PHO   0.654  50.1   21.0   22.9   5.74
Chauncey Billups     2007-08   DET   0.619  34.7   13.0   23.0   5.43
LeBron James         2007-08   CLE   0.568  37.3   11.4   33.5   5.33
Amare Stoudemire     2007-08   PHO   0.656   7.5   10.3   28.2   5.21
Steve Nash           2005-06   PHO   0.632  44.4   19.0   23.3   5.06
Jose Calderon        2007-08   TOR   0.607  42.3   14.2   16.8   5.06
LeBron James         2005-06   CLE   0.568  32.8   10.7   33.6   5.06
Tony Parker          2008-09   SAS   0.556  40.1   11.6   31.7   4.90
Dirk Nowitzki        2006-07   DAL   0.605  17.8    9.5   28.9   4.81
Steve Nash           2007-08   PHO   0.641  47.3   21.6   22.0   4.74
Kobe Bryant          2005-06   LAL   0.559  24.1    9.0   38.7   4.68
Dwyane Wade          2009-10   MIA   0.562  36.4   12.2   34.9   4.61
Kobe Bryant          2006-07   LAL   0.580  25.5   10.9   33.6   4.60
Steve Nash           2009-10   PHO   0.615  50.9   21.4   22.9   4.56
Deron Williams       2008-09   UTA   0.573  47.8   16.5   24.7   4.51
Dirk Nowitzki        2005-06   DAL   0.589  14.7    7.9   30.0   4.50
Jose Calderon        2008-09   TOR   0.613  41.0   16.7   16.9   4.41
Brandon Roy          2008-09   POR   0.573  25.4    9.0   27.4   4.40
Allen Iverson        2005-06   PHI   0.543  34.9   10.2   35.8   4.38
As you can see, players who handle the ball a lot and shoot a lot at high efficiency rate well. Post players will appear once the value of offensive rebounds is included; still, Dirk and Amare show up on this list because of their ridiculous efficiency.
An interesting player on this list is Jose Calderon from the last 2 years, with his high efficiency shooting and very high assist percentage.
supersub15
Joined: 21 Sep 2006
Posts: 273
PostPosted: Fri May 28, 2010 6:43 am
Nice formula. Why would offensive rebounds matter? Aren't attempts off of Orebs already included in the TS%?
Mike Dunleavy and Jermaine O'Neal - 2009-2010: 0.00
Does that mean that they are at the right usage?
Mike G
Joined: 14 Jan 2005
Posts: 3547
Location: Hendersonville, NC
PostPosted: Fri May 28, 2010 7:49 am
What about a term like
(TS%/rtTS%)^e
where rtTS% is "the rest of the team's TS%" ?
The exponent may or may not be 1.
DWade's TS%/rt this year was 1.038; LeBron 1.058; Nash 1.046
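A sketch of Mike G's term (helper names are mine), using the standard definition TS% = PTS / (2 * (FGA + 0.44 * FTA)):

```python
def ts_pct(pts, fga, fta):
    """True shooting percentage from season totals."""
    return pts / (2 * (fga + 0.44 * fta))

def ts_ratio(player_totals, team_totals, exponent=1.0):
    """(TS% / rest-of-team TS%) ** e, where each argument is (PTS, FGA, FTA)
    and rest-of-team excludes the player's own totals."""
    p_pts, p_fga, p_fta = player_totals
    t_pts, t_fga, t_fta = team_totals
    rest = ts_pct(t_pts - p_pts, t_fga - p_fga, t_fta - p_fta)
    return (ts_pct(p_pts, p_fga, p_fta) / rest) ** exponent
```

The totals in the test below are made up for illustration; an efficient player on an average team comes out a bit above 1, in line with the Wade/LeBron/Nash figures quoted.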
_________________
36% of all statistics are wrong
DSMok1
Joined: 05 Aug 2009
Posts: 592
Location: Where the wind comes sweeping down the plains
PostPosted: Fri May 28, 2010 9:06 am
Mike G wrote:
What about a term like
(TS%/rtTS%)^e
where rtTS% is "the rest of the team's TS%" ?
The exponent may or may not be 1.
DWade's TS%/rt this year was 1.038; LeBron 1.058; Nash 1.046
That doesn't adjust for the value of the actual usage itself (which, if well over 20%, can be quite valuable.)
DSMok1
Joined: 05 Aug 2009
Posts: 592
Location: Where the wind comes sweeping down the plains
PostPosted: Fri May 28, 2010 10:26 am
supersub15 wrote:
Nice formula. Why would offensive rebounds matter? Aren't attempts off of Orebs already included in the TS%?
Mike Dunleavy and Jermaine O'Neal - 2009-2010: 0.00
Does that mean that they are at the right usage?
I don't think it is possible to know the right usage. What 0 means is that they neither added to nor subtracted from the team with their shooting, turnovers, and assists. However, I suspect Jermaine O'Neal added something with offensive rebounds. Offensive rebounds do benefit the team, since they save a possession.
Rhuidean
Joined: 11 Mar 2010
Posts: 40
Location: East Bay, CA
PostPosted: Fri May 28, 2010 10:37 am
Pretty cool formula, DSMok1. I got pointed here from the Raptors forum on RealGM. A few questions:
1) How exactly did you set up the SPM part of the regression, to learn the weights? Did you regress the player ratings onto a linear function of the TS%, USG%, and AST%?
I'm thinking you'd set up a formula like:
Player_Rating = ((TS%*2*(1-C_1*TOV%))-0.878)*USG% + AST%*C_2 + C_3
The right-hand side is linear in C_1, C_2, and C_3, so you can set it up as a linear regression problem and find the appropriate weights. I assume that this is what you did? Did you include other variables in your regression problem? If you did, this might give misleading weights, I think.
2) Why is AST% added afterwards? The basic flavor of the formula seems to be to look at my Points per Possession minus the league average PPP, then scale up by my usage. This makes a lot of intuitive sense. But then you add on assists afterwards, rather than incorporating this into PPP.
3) The .878 I guess is a league average PPP rating. The main immediate use of your formula seems to be for evaluating efficiency vs. usage in a team context. So perhaps using a team average (instead of the .878) would be useful?
Thanks for the cool post.
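The regression setup in question 1 is easy to sketch: move the terms with no free coefficient to the left-hand side, and what remains is linear in C_1, C_2, C_3. A toy version with synthetic data (my own code, not DSMok1's actual procedure):

```python
import numpy as np

def fit_c123(ts, tov, usg, ast, rating):
    """Fit C_1, C_2, C_3 in
        rating = ((TS*2*(1 - C_1*TOV)) - 0.878)*USG + C_2*AST + C_3.
    Expanding: rating - (2*TS - 0.878)*USG = C_1*(-2*TS*TOV*USG) + C_2*AST + C_3,
    which is ordinary least squares in the three coefficients."""
    ts, tov, usg, ast, rating = map(np.asarray, (ts, tov, usg, ast, rating))
    known = (2 * ts - 0.878) * usg                       # no free coefficient here
    X = np.column_stack([-2 * ts * tov * usg, ast, np.ones_like(ts)])
    coef, *_ = np.linalg.lstsq(X, rating - known, rcond=None)
    return coef                                          # [C_1, C_2, C_3]
```

On noiseless synthetic data the fit recovers the generating coefficients exactly; real APM targets would of course be noisy.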
DSMok1
Joined: 05 Aug 2009
Posts: 592
Location: Where the wind comes sweeping down the plains
PostPosted: Fri May 28, 2010 10:54 am
Rhuidean wrote:
Pretty cool formula, DSMok1. I got pointed here from the Raptors forum on RealGM. A few questions:
1) How exactly did you set up the SPM part of the regression, to learn the weights? Did you regress the player ratings onto a linear function of the TS%, USG%, and AST%?
I'm thinking you'd set up a formula like:
Player_Rating = ((TS%*2*(1-C_1*TOV%))-0.878)*USG% + AST%*C_2 + C_3
The right-hand side is linear in C_1, C_2, and C_3, so you can set it up as a linear regression problem and find the appropriate weights. I assume that this is what you did? Did you include other variables in your regression problem? If you did, this might give misleading weights, I think.
2) Why is AST% added afterwards? The basic flavor of the formula seems to be to look at my Points per Possession minus the league average PPP, then scale up by my usage. This makes a lot of intuitive sense. But then you add on assists afterwards, rather than incorporating this into PPP.
3) The .878 I guess is a league average PPP rating. The main immediate use of your formula seems to be for evaluating efficiency vs. usage in a team context. So perhaps using a team average (instead of the .878) would be useful?
Thanks for the cool post.
1) This formula is part of an experimental Statistical Plus/Minus formula regressed against the 6-year average adjusted plus/minus results Ilardi posted a while back. You are correct on how the regression was done, though I had a lot more terms (including rebounding and defensive stats as well). The overall formula approximates each player's adjusted plus/minus. This equation is just a subset.
2) I would prefer to add AST% within the part of the equation scaled for usage, but didn't have access to appropriate numbers--I just used basketball-reference's advanced stats.
The perfect form, I think, would be something like:
(PPP_shooting - C_1*TOV% - C_2*Assisted% + C_3*AST% - PPPThreshold)*(AdjustedUsage)*C_4 - C_5
(I'm not quite sure how to implement the assists exactly, but you get the idea).
What that would do is give credit for shooting percentage, assess losses for turnovers AND for how often the player's shots were assisted by others, while adding in how often they assisted others. And the Adjusted Usage would have to be of the form: (FGA+TO-C_6*ASSISTED+C_7*ASSISTS)/Team Total. Hopefully C_6 and C_7 would be the same.
Can we figure out how to do this?
3) The 0.878 was calculated by the regression; I believe it is below the league average PPP adjusted for turnovers--isn't it? Interesting to ponder whether context is critical or not. Wouldn't we be comparing to the opponent's average, not the team's average?
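For concreteness, the "perfect form" above can be written down directly. Every coefficient and the threshold below are placeholders passed in by the caller, not fitted values:

```python
def ideal_form(ppp_shooting, tov_pct, assisted_pct, ast_pct,
               fga, tov, assisted, assists, team_total,
               c1, c2, c3, c4, c5, c6, c7, ppp_threshold):
    """DSMok1's proposed form: efficiency (net of turnovers, discounted for
    being assisted, credited for assisting) over a threshold, scaled by an
    assist-aware usage share. All coefficients here are hypothetical."""
    adj_usage = (fga + tov - c6 * assisted + c7 * assists) / team_total
    margin = (ppp_shooting - c1 * tov_pct - c2 * assisted_pct
              + c3 * ast_pct - ppp_threshold)
    return margin * adj_usage * c4 - c5
```

The open question in the thread is exactly which data would pin down c2, c3, c6, and c7.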
Mike G
Joined: 14 Jan 2005
Posts: 3547
Location: Hendersonville, NC
PostPosted: Fri May 28, 2010 11:04 am
DSMok1 wrote:
Mike G wrote:
What about a term like
(TS%/rtTS%)^e
where rtTS% is "the rest of the team's TS%" ?
The exponent may or may not be 1.
DWade's TS%/rt this year was 1.038; LeBron 1.058; Nash 1.046
That doesn't adjust for the value of the actual usage itself (which, if well over 20%, can be quite valuable.)
I completely agree.
Not suggesting you should drop Usg%, but maybe use this term in place of (TS%*2).
It's another stab at context-independence.
Rhuidean
Joined: 11 Mar 2010
Posts: 40
Location: East Bay, CA
PostPosted: Fri May 28, 2010 12:01 pm
DSMok1 wrote:
1) This formula is part of an experimental Statistical Plus/Minus formula regressed against the 6-year average adjusted plus/minus results Ilardi posted a while back. You are correct on how the regression was done, though I had a lot more terms (including rebounding and defensive stats as well). The overall formula approximates each player's adjusted plus/minus. This equation is just a subset.
The problem with adding the additional terms is that you'll get tremendous "leakage" from one variable to the next, no? For example, let's say that you also included 3-point percentage as a variable...there will be leakage from this variable to TS%. The more variables you have, the harder it is for the regression to select "good" solutions. And in this case, things like defensive rebounding are truly irrelevant, and should just be lumped in with the noise.
Also, perhaps it should just be restricted to offensive +/-, since this is primarily an offensive stat?
Quote:
2) I would prefer to add AST% within the part of the equation scaled for usage, but didn't have access to appropriate numbers--I just used basketball-reference's advanced stats.
The perfect form, I think, would be something like:
(PPP_shooting - C_1*TOV% - C_2*Assisted% + C_3*AST% - PPPThreshold)*(AdjustedUsage)*C_4 - C_5
(I'm not quite sure how to implement the assists exactly, but you get the idea).
What that would do is give credit for shooting percentage, assess losses for turnovers AND for how often the player's shots were assisted by others, while adding in how often they assisted others. And the Adjusted Usage would have to be of the form: (FGA+TO-C_6*ASSISTED+C_7*ASSISTS)/Team Total. Hopefully C_6 and C_7 would be the same.
Can we figure out how to do this?
Yeah, I like that latter formula a lot. Hrm, I think that is pretty doable; we just have a polynomial. We can expand the polynomial and then clump terms, defining new variables as needed.
I did the algebra and uploaded a copy of it onto Google Docs:
https://docs.google.com/fileview?id=0B8 ... NmNh&hl=en
The only problem with the approach I suggest is that you won't necessarily learn weights that satisfy the equality properties you want (see the Google Doc if confused). I'm not sure if this can be enforced algorithmically in a clean way, since I'm pretty sure the constraints are not convex (but maybe there is a way to transform them and make things nice; I don't see it yet...)
However, it might be possible that using this approach I suggest, the weights (approximately) satisfy the desired properties.
If anyone knows of a better way to deal with this, please lemme know.
Quote:
3) The 0.878 was calculated by the regression; I believe it is below the league average PPP adjusted for turnovers--isn't it? Interesting to ponder whether context is critical or not. Wouldn't we be comparing to the opponent's average, not the team's average?
Interesting, I did not know that .878 was calculated by regression. That is a pretty damn cool formula, then. I guess the calculations I just uploaded on Google Docs don't take that into account, but I guess you could modify things slightly and get a similar linear regression problem.
DSMok1
Joined: 05 Aug 2009
Posts: 592
Location: Where the wind comes sweeping down the plains
PostPosted: Fri May 28, 2010 12:22 pm
Rhuidean wrote:
The problem with adding the additional terms is that you'll get tremendous "leakage" from one variable to the next, no? For example, let's say that you also included 3-point percentage as a variable...there will be leakage from this variable to TS%. The more variables you have, the harder it is for the regression to select "good" solutions. And in this case, things like defensive rebounding are truly irrelevant, and should just be lumped in with the noise.
Also, perhaps it should just be restricted to offensive +/-, since this is primarily an offensive stat?
I was very aware of the leakage potential, so I only used variables in the regression that are close to orthogonal. The variables I used:
TS% TOV% USG% TeamDrtg ORB% DRB% AST% STL% BLK% MPG
TeamOrtg was explored, but it's not quite significant in the latest regression. Team DRtg was significant because steals and blocks only capture a portion of the impact on defense. In practice, though, I will sum up all players' impacts and adjust the total sum to the adjusted team efficiency differential.
Quote:
Quote:
2) I would prefer to add AST% within the part of the equation scaled for usage, but didn't have access to appropriate numbers--I just used basketball-reference's advanced stats.
The perfect form, I think, would be something like:
(PPP_shooting - C_1*TOV% - C_2*Assisted% + C_3*AST% - PPPThreshold)*(AdjustedUsage)*C_4 - C_5
(I'm not quite sure how to implement the assists exactly, but you get the idea).
What that would do is give credit for shooting percentage, assess losses for turnovers AND for how often the player's shots were assisted by others, while adding in how often they assisted others. And the Adjusted Usage would have to be of the form: (FGA+TO-C_6*ASSISTED+C_7*ASSISTS)/Team Total. Hopefully C_6 and C_7 would be the same.
Can we figure out how to do this?
Yeah, I like that latter formula a lot. Hrm, I think that is pretty doable, we just have a polynomial. We can expand the polynomial and then clump terms, defining new variables as needed.
I did the algebra and uploaded a copy of it onto Google Docs:
https://docs.google.com/fileview?id=0B8 ... NmNh&hl=en
The only problem with the approach I suggest is that you won't necessarily learn weights that satisfy the equality properties you want (see the Google Doc if confused). I'm not sure if this can be enforced algorithmically in a clean way, since I'm pretty sure the constraints are not convex (but maybe there is a way to transform them and make things nice; I don't see it yet...)
However, it might be possible that using this approach I suggest, the weights (approximately) satisfy the desired properties.
If anyone knows of a better way to deal with this, please lemme know.
Quote:
3) The 0.878 was calculated by the regression; I believe it is below the league average PPP adjusted for turnovers--isn't it? Interesting to ponder whether context is critical or not. Wouldn't we be comparing to the opponent's average, not the team's average?
Interesting, I did not know that .878 was calculated by regression. That is a pretty damn cool formula, then. I guess the calculations I just uploaded on Google Docs don't take that into account, but I guess you could modify things slightly and get a similar linear regression problem.
Here is the 6-year APM data I am using: https://spreadsheets.google.com/ccc?key ... JeVE&hl=en
Obviously, there are many local minima in a regression like that (that's not purely linear), so I tried many starting points and chose the final result that minimized the error best.
The trouble is getting ahold of the proper data, particularly formulating the AST% and making sure it's on the same baseline as the other input terms.
Rhuidean
Joined: 11 Mar 2010
Posts: 40
Location: East Bay, CA
PostPosted: Fri May 28, 2010 12:46 pm
DSMok1 wrote:
I was very aware of the leakage potential, so I only used variables in the regression that are close to orthogonal. The variables I used:
TS% TOV% USG% TeamDrtg ORB% DRB% AST% STL% BLK% MPG
TeamOrtg was explored, but it's not quite significant in the latest regression. Team DRtg was significant because steals and blocks only capture a portion of the impact on defense. In practice, though, I will sum up all players' impacts and adjust the total sum to the adjusted team efficiency differential.
Hrm, is there a reason not to simply look at the offensive component of APM and then remove those defensive variables entirely (TeamDrtg, DRB%, BLK%, STL%)?
I guess this is something that one should test in some way, but my instinct is to believe that looking at offensive APM can only give sharper results. I dunno.
Also, what is "adjusted team-efficiency differential" in this context? Or what do you mean when you say you make this adjustment? I've been messing around with SPM myself, and I make no adjustments to the weights I get, so I am unfamiliar with this concept. Like, I solve the APM regression problem to get the initial ratings, then solve a second regression problem to get the weights and offset...I make no adjustments on either. What adjustments are you making on the weights and/or offset from the second regression?
Quote:
Obviously, there are many local minima in a regression like that (that's not purely linear), so I tried many starting points and chose the final result that minimized the error best.
The trouble is getting ahold of the proper data, particularly formulating the AST% and making sure it's on the same baseline as the other input terms.
In the formulation I had, I've reduced it to a linear regression problem entirely. So there should be no local minima. Of course, the trick is that I've thrown away equality constraints between variables. In other words, the objective function is convex, but the constraint set is not (at least, I'm pretty sure it is not.)
However, if we can solve the unconstrained version and get a solution that lies exactly in the constraint set, we've achieved an epic win (btw, this is actually a very common strategy for dealing with non-convex problems...formulate as a convex optimization problem with non-convex constraints, and then drop constraints.)
My hope is that running something like the formulation from the Google docs would give weightings that are very close to being in the constraint set.
But I have yet to actually run any experiments and see what the solution looks like, though.
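The drop-the-constraints strategy is easy to demo on a toy problem: generate data whose interaction weight is exactly the product of the two main weights, fit an unconstrained linear model, and check how nearly the dropped constraint w_3 = w_1*w_2 holds. (Toy variables, not the actual SPM inputs.)

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 0.7, -1.3                               # coupled ground-truth weights
x = rng.normal(size=(200, 2))
y = a * x[:, 0] + b * x[:, 1] + (a * b) * x[:, 0] * x[:, 1]

# Relaxation: treat the interaction weight as a free third coefficient
X = np.column_stack([x[:, 0], x[:, 1], x[:, 0] * x[:, 1]])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

constraint_gap = abs(w[2] - w[0] * w[1])       # dropped constraint: w3 == w1*w2
```

With noiseless data the gap is essentially zero; with real, noisy targets the gap's size tells you how badly the relaxation hurt.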
DSMok1
Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains
PostPosted: Fri May 28, 2010 12:59 pm
On the link in the last post, I have 3 regressions: SPM, OSPM, and DSPM. They're all there and easily adjusted using Solver in Excel.
The output of the SPM formula is in pts/100Poss. A team's roster's SPM*%Min should equal the team's efficiency differential. It won't, because of things not captured by the regression, but a small addition or subtraction to all players' SPMs will make this the case. Also, the stats cannot account for strength of schedule, so I make a similar adjustment to compensate for it. Sum of team players' SPMs*%Min = Team Adjusted Differential, calculated on the team scale directly.
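That uniform shift has a closed form. A sketch (my own code, not DSMok1's exact bookkeeping; it assumes %Min is on whatever scale makes the weighted sum comparable to the team differential):

```python
def adjust_to_team(spm, pct_min, team_adj_diff):
    """Add one constant to every player's SPM so that
    sum(SPM_i * %Min_i) equals the team's adjusted efficiency differential."""
    raw = sum(s * m for s, m in zip(spm, pct_min))
    shift = (team_adj_diff - raw) / sum(pct_min)
    return [s + shift for s in spm]
```

Every player moves by the same amount, so the ordering within the team is preserved.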
"formulate as a convex optimization problem with non-convex constraints, and then drop constraints." That's what I did.
Rhuidean
Joined: 11 Mar 2010
Posts: 40
Location: East Bay, CA
PostPosted: Fri May 28, 2010 1:41 pm
DSMok1 wrote:
On the link in the last post, I have 3 regressions: SPM, OSPM, and DSPM. They're all there and easily adjusted using Solver in Excel.
The output of the SPM formula is in pts/100Poss. A team's roster's SPM*%Min should equal the team's efficiency differential. It won't, because of things not captured by the regression, but a small addition or subtraction to all players' SPMs will make this the case. Also, the stats cannot account for strength of schedule, so I make a similar adjustment to compensate for it. Sum of team players' SPMs*%Min = Team Adjusted Differential, calculated on the team scale directly.
"formulate as a convex optimization problem with non-convex constraints, and then drop constraints." That's what I did.
Cool, I was confused by your local minima remark, that is why I brought that up.
It seems that your approach does a much better job of predicting OSPM than DSPM...OSPM has an R^2 of .547. I'm not too familiar with this particular aspect of SPM in general...is this quantity high, compared to more classical SPM approaches that just regress pace-adjusted box scores? I'm just trying to get a sense of how good this value is.
Hrm, so looking at the upper right of your spreadsheet, a few questions:
1) How close are the coefficients you get to satisfying the constraints you dropped?
2) I'm a bit puzzled...how did you get direct coefficients for something like turnovers? Like, shouldn't you get a bunch of coefficients for polynomials involving basic variables like turnover rate, assist rate, etc? Or are you using a different regression formula? What does yours look like?
3) Kind of interesting how the PPP threshold is much smaller in the case of your OSPM regression. Only .752. I guess that is much lower than the previous value.
4) Makes sense that the dependence on MPG is small...seems like a "nuisance" variable.
I don't completely understand your OSPM regression...is there any way you can break down what you are doing further? I'd probably understand it more immediately if I had Excel or knew how to use its Solver, but it doesn't appear that you are using this variable-substitution approach; instead you seem to be solving a regression more akin to the initial "ideal" formula you suggested? Or am I missing something?
edit: several typos
DSMok1
Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains
PostPosted: Fri May 28, 2010 2:21 pm
Rhuidean wrote:
DSMok1 wrote:
On the link in the last post, I have 3 regressions: SPM, OSPM, and DSPM. They're all there and easily adjusted using Solver in Excel.
The output of the SPM formula is in pts/100Poss. A team's roster's SPM*%Min should equal the team's efficiency differential. It won't because of things not captured by the regression, but a small addition or subtraction to all players' SPMs will make this the case. Also, the stats cannot account for strength of schedule, so I make a similar adjustment to compensate for strength of schedule. Sum of team player's SPMs*%min = Team Adjusted Differential, calculated on the team scale directly.
"formulate as a convex optimization problem with non-convex constraints, and then drop constraints." That's what I did.
Cool, I was confused by your local minima remark, that is why I brought that up.
It seems that your approach does a much better job of predicting OSPM than DSPM...OSPM has an R^2 of .547. I'm not too familiar with this particular aspect of SPM in general...is this quantity high, compared to more classical SPM approaches that just regress pace-adjusted box scores? I'm just trying to get a sense of how good this value is.
Hrm, so looking at the upper right of your spreadsheet, a few questions:
1) How close are the coefficients you get to satisfying the constraints you dropped?
2) I'm a bit puzzled...how did you get direct coefficients for something like turnovers? Like, shouldn't you get a bunch of coefficients for polynomials involving basic variables like turnover rate, assist rate, etc? Or are you using a different regression formula? What does yours look like?
3) Kind of interesting how the PPP threshold is much smaller in the case of your OSPM regression. Only .752. I guess that is much lower than the previous value.
4) Makes sense that the dependence on MPG is small..seems like a "nuisance" variable.
I don't completely understand your OSPM regression...is there any way you can break down what you are doing further? I'd probably understand it more immediately if I had Excel or knew how to use its Solver, but it doesn't appear that you are using this variable-substitution approach; instead you seem to be solving a regression more akin to the initial "ideal" formula you suggested? Or am I missing something?
edit: several typos
It is not easy to compare R^2 values from different regressions, because the APM values regressed to have varying amounts of error. The seminal work on offensive and defensive SPM is here: http://sonicscentral.com/apbrmetrics/vi ... .php?t=327 . Rosenbaum was using 1-Year APMs, I think, so there was a ton of random and measurement error--dropping the R^2 value significantly. The 6-Yr APMs I used have less random and measurement error, so the R^2 will be higher.
Of course, the R^2 is also biased--the values of APM's that have a higher measurement error are weighted less in my regression, but not the simple R^2 calc I did after the fact.
1) The constraints were not "hard" in this case--I simply tried out several initial guesses, some of which led to obviously spurious (and high residual) results.
2) This is a simple linear regression, except for the scoring part discussed above: APM = A*ORB% + B*DRB% + C*BLK% + ... + Intercept. Each player had an APM and stderr. For the whole population, I minimized the sum of the squared residuals, weighted by 1/stderr^2, where stderr was the error associated with the original APM calculation.
3) Interestingly, if I included the scoring variables in the DSPM calculation, you would see where the difference came from--DSPM shows a negative value for scoring that is basically the difference between OSPM and SPM. In other words, a bad scorer that is playing must be a good defender.
4) MPG is significant, so not exactly a nuisance--it indicates that there are some positive attributes valued by coaches that are not picked up by the other stats.
What do you use, if not Excel? R? Solver is basically a trial-and-adjustment algorithm that can maximize or minimize a cell by varying up to 200 other cells. I used it to minimize the sum of the squared residuals by varying the coefficients in my SPM regression.
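The Solver objective described here--minimize squared residuals weighted by 1/stderr^2--also has a direct closed form via rescaling, so no iterative search is actually required for the linear part. A sketch with numpy (my own code, not the spreadsheet):

```python
import numpy as np

def weighted_least_squares(X, apm, stderr):
    """Minimize sum(((X @ coef - apm) / stderr)^2), i.e. weights 1/stderr^2:
    scale each row by 1/stderr and solve ordinary least squares.
    X should already include an intercept column."""
    w = 1.0 / np.asarray(stderr, dtype=float)
    Xw = np.asarray(X, dtype=float) * w[:, None]
    yw = np.asarray(apm, dtype=float) * w
    coef, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return coef
```

Players with noisier APM estimates are automatically down-weighted, exactly as described in point 2 above.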
Rhuidean
Joined: 11 Mar 2010
Posts: 40
Location: East Bay, CA
PostPosted: Fri May 28, 2010 2:36 pm
DSMok1 wrote:
It is not easy to compare R^2 values from different regressions, because the APM values regressed to have varying amounts of error. The seminal work on offensive and defensive SPM is here: http://sonicscentral.com/apbrmetrics/vi ... .php?t=327 . Rosenbaum was using 1-Year APMs, I think, so there was a ton of random and measurement error--dropping the R^2 value significantly. The 6-Yr APMs I used have less random and measurement error, so the R^2 will be higher.
Of course, the R^2 is also biased--the values of APM's that have a higher measurement error are weighted less in my regression, but not the simple R^2 calc I did after the fact.
1) The constraints were not "hard" in this case--I simply tried out several initial guesses, some of which led to obviously spurious (and high residual) results.
2) This is a simple linear regression, except for the scoring part discussed above: APM = A*ORB% + B*DRB% + C*BLK% + ... + Intercept. Each player had an APM and stderr. For the whole population, I minimized the sum of the squared residuals, weighted by 1/stderr^2, where stderr was the error associated with the original APM calculation.
3) Interestingly, if I included the scoring variables in the DSPM calculation, you would see where the difference came from--DSPM shows a negative value for scoring that is basically the difference between OSPM and SPM. In other words, a bad scorer that is playing must be a good defender.
4) MPG is significant, so not exactly a nuisance--it indicates that there are some attributes valued by coaches that are not picked up by the other stats that are positive.
What do you use, if not Excel? R? Solver is basically a trial-and-adjustment algorithm that can maximize or minimize a cell by varying up to 200 other cells. I used it to minimize the sum of the squared residuals by varying the coefficients in my SPM regression.
I don't know how to use R. I usually use Matlab with the CVX package (http://cvxr.com/cvx/). Really makes life pleasant. There is also a Python variant of CVX called CVXMOD (http://cvxmod.net/), which is supposed to be pretty good and, most importantly, is free. I've not tried it before, though.
And if you are only solving least-squares-type optimization problems, you could just use Octave, which is basically a GNU version of Matlab.
I think sometime this evening I'll try to set up a prediction problem using this polynomial approach and report my findings. Maybe if I play around with that I can understand your approach a bit better.
Back to top
View user's profile Send private message Send e-mail
bgassassin
Joined: 08 May 2010
Posts: 5
PostPosted: Mon May 31, 2010 3:19 pm Post subject: Reply with quote
Jon Nichols wrote:
Interesting idea, but why not just use something like Offensive Rating/Usage Rate? Better yet, standardize them with z-scores. Or perhaps you are trying to ignore offensive rebounds?
If you want to go with just points, assists, and turnovers, weighting them differently might make sense.
Yes. I avoided offensive rebounds because they aren't among the components of usage rate, so including them would just inflate the number. I also did not want to weight the terms, because the idea was to look at true output.
DSMok1 wrote:
I agree with this. Assists do not equal points, necessarily...
Not quite following this, but the idea was to look at efficiency in the literal sense. I wanted to show how efficient/effective a player was by comparing his touches to the actual outcome of those touches. That's why I decided to use a ratio, and I wanted to do it without making things complex.
Here's an example comparing Boston's starters vs the Lakers' starters in the regular season.
Code:
Boston
Rondo 1.08:1
Allen 1.08:1
Pierce 1.05:1
Garnett 1.04:1
Perkins .84:1
Lakers
Fisher .97:1
Bryant .97:1
Artest .94:1
Gasol 1.04:1
Bynum .98:1
Back to top
View user's profile Send private message
Mike G
Joined: 14 Jan 2005
Posts: 3629
Location: Hendersonville, NC
PostPosted: Tue Jun 01, 2010 9:41 am Post subject: Reply with quote
bga, sorry I haven't found time to examine this very closely at all.
But what do you think about dropping the :1 after every entry?
We know it's a ratio; extra notations just amount to visual clutter.
_________________
`
36% of all statistics are wrong
Back to top
View user's profile Send private message Send e-mail
bgassassin
Joined: 08 May 2010
Posts: 5
PostPosted: Tue Jun 01, 2010 7:08 pm Post subject: Reply with quote
Laughing
I can do that.
Back to top
View user's profile Send private message
bgassassin
Joined: 08 May 2010
Posts: 5
PostPosted: Thu Jun 10, 2010 2:39 am Post subject: Reply with quote
I decided to see how this would look for an individual game. This is for Game 3. Kinda figured some would skew higher since it was one game as opposed to a full season.
Code:
Boston
Rondo 1.32
R. Allen 0.12
Pierce 1.22
Garnett 1.15
Perkins 0.49
Wallace 0.83
T. Allen 1.54
Robinson 1.25
Davis 0.88
Lakers
Fisher 0.93
Bryant 0.92
Artest 0.38
Gasol 1.00
Bynum 0.71
Odom 1.77
Brown 1.33
Walton 2.26
Farmar 0.60
Vujacic 2.27
Find ratio of offensive output to usage - supersub15, 2010
Given the recent brief discussion of offensive impact and usage, I thought I'd bump this for possible reads/re-reads and perhaps new comments. I don't have any immediately, but thought others might. There are other threads on these topics, of course; I just picked this more recent one to mention.