Appr. 5.x year reg. adj. +/- (J.E., 2010)
Posted: Fri Apr 15, 2011 12:55 am
recovered page 1 of 5
DSMok1
PostPosted: Wed Dec 15, 2010 3:50 pm Post subject: Reply with quote
back2newbelf wrote:
This season only:
http://www.docdroid.net/5ka/2011.xls.html
Format: Offense(100 possessions)|Defense(100 possessions)|Sum
That early in the season it is obviously of limited use and will produce funny results
So far we have:
offensive player of the year: Hedo Turkoglu (he is also the worst defender though)
defensive player of the year: Darrell Arthur
While it kind of agrees with the media on the MVP race (Nowitzki/Garnett/Ginobili/super-friends all look good), it couldn't disagree more on the Rookie of the Year race, putting Jeff Adrien, Landry Fields and Evan Turner at the top. John Wall is supposedly the 9th worst of all players, Griffin the 7th worst.
From looking at the top-rated players, one would think the best basketball age is 35.
Also, Shane Battier is suddenly listed as a horrible defender, and Chuck Hayes, who used to rock this rating, is the 10th worst player in the league.
I haven't figured out how to do this, but would it be possible to use ASPM ratings as a Bayesian prior?
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
mathayus
PostPosted: Sun Dec 26, 2010 4:16 pm Post subject: Reply with quote
back2newbelf wrote:
While it kind of agrees with the media on the MVP race (Nowitzki/Garnett/Ginobili/super-friends all look good), it couldn't disagree more on the Rookie of the Year race, putting Jeff Adrien, Landry Fields and Evan Turner at the top. John Wall is supposedly the 9th worst of all players, Griffin the 7th worst.
In my experience, star rookies very rarely look impressive by +/- metrics. This has led me to conclude that if we truly gave the ROY to the MVP of rookies, in most years it would go to a player who happened to fill a niche on a successful team instead of the big name rookies.
While there would be nothing inherently wrong with that, the big name rookies are actually the ones who tend to go on and become stars by +/- metrics, so focusing on the volume statistics instead of +/- statistics for rookies does serve a useful purpose.
back2newbelf
PostPosted: Sun Dec 26, 2010 6:08 pm Post subject: Reply with quote
mathayus wrote:
In my experience, star rookies very rarely look impressive by +/- metrics. This has led me to conclude that if we truly gave the ROY to the MVP of rookies, in most years it would go to a player who happened to fill a niche on a successful team instead of the big name rookies.
While there would be nothing inherently wrong with that, the big name rookies are actually the ones who tend to go on and become stars by +/- metrics, so focusing on the volume statistics instead of +/- statistics for rookies does serve a useful purpose.
Good point.
I think what also needs to be done is to never use single-season (R)APM to judge rookies. The top rookies will usually play heavy minutes on very bad teams. When RAPM "doesn't know" that the players the rookie is currently playing with already sucked the year before, it puts part of the blame on him. This can be avoided with multi-season (R)APM.
back2newbelf
PostPosted: Tue Jan 18, 2011 12:01 pm Post subject: Reply with quote
I think we have fixed the error that made lambda so huge. At least it looks more sane now, at around 2500 for a single season.
Single-season approximated RAPM is now published on http://stats-for-the-nba.appspot.com/ and will probably be updated every two weeks or so.
The site also contains data from the latest multi-year analysis, which included coaches and tried different lambdas for offense and defense for both players and coaches. They were found to be: Offense: 2500, Defense: 7500, Offense (Coach): 6000, Defense (Coach): 4500.
Unfortunately, the difference in error on the test sets between this method and using players only with just one lambda is minimal.
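For context, here is a minimal, hypothetical sketch of the kind of ridge ("lambda") regression being described, assuming a stint-level design matrix has already been built; the data layout and function below are illustrative, not the actual code behind the site.
Code:
# Minimal sketch, not the site's code. Each row of X is one stint: +1 for the
# home players on the floor, -1 for the away players (an assumed encoding);
# y is the home point margin per 100 possessions for that stint.
import numpy as np
from sklearn.linear_model import Ridge

def approximate_rapm(X, y, lam=2500.0):
    # lam ~ 2500 is the single-season value mentioned above; the ridge penalty
    # shrinks every player's coefficient toward 0, which is what separates
    # RAPM from ordinary (unregularized) adjusted +/-.
    model = Ridge(alpha=lam, fit_intercept=True)
    model.fit(X, y)
    return model.coef_  # one rating per player, in points per 100 possessions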
DSMok1
PostPosted: Tue Jan 18, 2011 12:17 pm Post subject: Reply with quote
back2newbelf wrote:
I think we have fixed the error that made lambda so huge. At least it looks more sane now, at around 2500 for a single season.
Single-season approximated RAPM is now published on http://stats-for-the-nba.appspot.com/ and will probably be updated every two weeks or so.
The site also contains data from the latest multi-year analysis, which included coaches and tried different lambdas for offense and defense for both players and coaches. They were found to be: Offense: 2500, Defense: 7500, Offense (Coach): 6000, Defense (Coach): 4500.
Unfortunately, the difference in error on the test sets between this method and using players only with just one lambda is minimal.
Thanks a lot for this data!
Could you please post the standard errors for each estimate as well? The lack of standard errors makes it very difficult to use this data for additional research!
I find it interesting, though expected, that the lambdas broke down the way they did: players regress far more to the mean on defense, since that is a more unstable measure, while coaches have more of an impact on defense than on offense.
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
deepak
PostPosted: Tue Jan 18, 2011 10:44 pm Post subject: Reply with quote
back2newbelf wrote:
I think we have fixed the error that made lambda so huge. At least it looks more sane now, at around 2500 for a single season.
Single-season approximated RAPM is now published on http://stats-for-the-nba.appspot.com/ and will probably be updated every two weeks or so.
The site also contains data from the latest multi-year analysis, which included coaches and tried different lambdas for offense and defense for both players and coaches. They were found to be: Offense: 2500, Defense: 7500, Offense (Coach): 6000, Defense (Coach): 4500.
Unfortunately, the difference in error on the test sets between this method and using players only with just one lambda is minimal.
Appreciate it.
I got the following correlation table between your current-season RAPM and various per-minute boxscore statistics:
Code:
        Age    MPG   GmSc    USG    ORB    DRB    PPR  BLK+STL    PTS    OFF    DEF
OFF   0.122  0.405  0.530  0.206 -0.049  0.039  0.243   -0.005  0.371  1.000 -0.009
DEF   0.139 -0.074 -0.025 -0.152  0.031  0.132 -0.017    0.171 -0.117 -0.009  1.000
all boxscore stats are per 40 minutes
GmSc = PTS + 0.4 * FG - 0.7 * FGA - 0.4*(FTA - FT) + 0.7 * ORB + 0.3 * DRB +
STL + 0.7 * AST + 0.7 * BLK - 0.4 * PF - TOV
USG = FGA + 0.44*FTA + TOV
PPR = 0.7*AST - TOV
Question: Are coaches overly biased towards offensive players, or does RAPM overrate the value of defensive players?
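A hypothetical sketch of how a table like the one above could be reproduced, assuming a merged per-player table of per-40-minute boxscore stats and the two RAPM columns already exists; the column names are illustrative, not deepak's actual ones.
Code:
# Hypothetical sketch; df holds one row per player with per-40-minute boxscore
# stats plus the OFF/DEF RAPM values. Column names are assumed.
import pandas as pd

def rapm_boxscore_correlations(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Formulas from the post (per 40 minutes):
    df["GmSc"] = (df.PTS + 0.4*df.FG - 0.7*df.FGA - 0.4*(df.FTA - df.FT)
                  + 0.7*df.ORB + 0.3*df.DRB + df.STL + 0.7*df.AST
                  + 0.7*df.BLK - 0.4*df.PF - df.TOV)
    df["USG"] = df.FGA + 0.44*df.FTA + df.TOV
    df["PPR"] = 0.7*df.AST - df.TOV
    df["BLK+STL"] = df.BLK + df.STL
    cols = ["Age", "MPG", "GmSc", "USG", "ORB", "DRB", "PPR", "BLK+STL", "PTS", "OFF", "DEF"]
    return df[cols].corr().loc[["OFF", "DEF"]]  # Pearson correlations, as in the table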
page 2
back2newbelf
PostPosted: Wed Jan 19, 2011 1:40 pm Post subject: Reply with quote
deepak wrote:
cool table! Could you do it for GmSc without DRebs, steals and blocks?
Ilardi
PostPosted: Wed Jan 19, 2011 5:44 pm Post subject: Reply with quote
back2newbelf wrote:
deepak wrote:
cool table! Could you do it for GmSc without DRebs, steals and blocks?
And PER?
Crow
PostPosted: Wed Jan 19, 2011 5:48 pm Post subject: Reply with quote
Deepak,
I'd also be interested in seeing the correlations for offensive and defensive splits of Game Score.
And I'd like to see how far you could push the correlation with multi-season Adjusted +/- by optimizing these linear weights, with an additional variable to capture shot defense, the residual not captured by Game Score. What is the average absolute value of that shot defense variable?
Adjusted +/- is not perfect; it is an estimate with error. But a Game Score optimized for maximum correlation with Adjusted +/- could be worth seeing as an intermediate product between the existing linear-weight boxscore metric and Adjusted +/-. The weights may or may not be stable through different periods, but they would suggest whether the linear weights should be changed to get closer over the long run (and maybe also where the Adjusted +/- errors might be higher). While I am suggesting doing it with the simple Game Score, ideally such a comparison would be done with newer, probably better boxscore (and play-by-play) based metrics.
Back2newbelf,
Might your friend be interested in doing a RAPM run that just looked at when the top 8-16 teams play in regular season games against each other? I think it would be useful to see where values from that split vary considerably from the league-wide RAPM.
Any interest in say a 3 season playoffs only run? Or a run where the playoff data was included with regular season data but had a somewhat higher or significantly higher weight?
Or how about preparing up to date multi-season Adjusted +/- splits down to the 4 Factor level?
acollard
PostPosted: Wed Jan 19, 2011 7:27 pm Post subject: Reply with quote
Crow wrote:
Adjusted +/- is not perfect; it is an estimate with error. But a Game Score optimized for maximum correlation with Adjusted +/- could be worth seeing as an intermediate product between the existing linear-weight boxscore metric and Adjusted +/-. The weights may or may not be stable through different periods, but they would suggest whether the linear weights should be changed to get closer over the long run (and maybe also where the Adjusted +/- errors might be higher). While I am suggesting doing it with the simple Game Score, ideally such a comparison would be done with newer, probably better boxscore (and play-by-play) based metrics.
This is pretty similar to DSMok1's suggestion of using ASPM as a Bayesian prior, and both seem pretty great. I think you could get a more consistent result if you used a statistical approach as a jumping-off point for Adjusted +/-.
On the other hand, one of the coolest things about Adjusted +/- is the guys who stand out for seemingly unclear reasons, without a lot of box score stats to back it up. Sometimes it's garbage, but other times it can point toward some surprising truth. This quality would be largely diminished if you somehow averaged or weighted Adjusted +/- with a statistical approach.
I'm also unsure how I feel about using coaches in the +/- formula. Does it make anyone else uneasy? I feel like there just isn't enough data for most or all of them for it to be useful. What's the highest number of coaches a team has had in the past five years? 3? And what about the most teams a coach has coached? 2? It seems like the coach variable, as it's used, may absorb a lot of other things about the team that correlate with the coach (team environment, chemistry, medical staff, player synergies, home crowd, etc.) that may or may not have much to do with the coach himself.
Crow
Joined: 20 Jan 2009
Posts: 746
PostPosted: Wed Jan 19, 2011 8:18 pm Post subject: Reply with quote
Any form of metric with Adjusted +/- weighted with or informed by "a statistical approach" could and I think should be a 3rd leg to the 2 "pure" approaches / metrics. You don't have to dispose of the originals and I wouldn't.
I'd be glad to see the data with coaches at least once, though I'd probably use the version without coaches more, given the concerns you raised. Having both would, again, allow the comparisons and help spot aberrant / potentially interesting stuff.
It also would be great to have, in public on a regular basis, player pair Adjusted +/-. Comparing the individual ratings with the pair would be suggestive about specific player to player interactions / impacts.
Conceivably you could have player / coach Adjusted +/- pairs too.
The errors might be too high for many to make much of player / opposing player or opposing coach pairs except maybe in a 4-6 season version, but it is also conceivable. Again, if you are searching for interesting things, it might be fun to at least see and maybe more than that.
Last edited by Crow on Thu Jan 20, 2011 2:18 pm; edited 1 time in total
back2newbelf
Joined: 21 Jun 2005
Posts: 236
PostPosted: Thu Jan 20, 2011 6:13 am Post subject: Reply with quote
Crow wrote:
Or how about preparing up to date multi-season Adjusted +/- splits down to the 4 Factor level?
I want to do this sometime with TS%, OReb% and Tov%
Quote:
It also would be great to have, in public on a regular basis, player pair Adjusted +/-. Comparing the individual ratings with the pair would be suggestive about specific player to player interactions / impacts.
Conceivably you could have player / coach Adjusted +/- pairs too.
This is also on my to-do-list
acollard wrote:
I'm also unsure how I feel about using coaches in the +/- formula. Does it make anyone else uneasy? I feel like there just isn't enough data for most or all of them for it to be useful. What's the highest number of coaches a team has had in the past five years? 3? And what about the most teams a coach has coached? 2?
Player trades help here. A coach might have coached just two teams, but there's a good chance he has coached 40+ players.
One big problem with including coaches is probably aging. Kuester has to work with an older (and probably worse) Ben Wallace and Hamilton than the coaches before him did, but the algorithm thinks they're the same players and punishes Kuester for it.
Crow
Joined: 20 Jan 2009
Posts: 746
PostPosted: Thu Jan 20, 2011 2:21 pm Post subject: Reply with quote
Appreciate the shared public data so far and look forward to the additional variations and extensions of Adjusted +/- that you and your friend decide to prepare.
acollard
Joined: 22 Sep 2010
Posts: 49
Location: MA
PostPosted: Thu Jan 20, 2011 5:54 pm Post subject: Reply with quote
back2newbelf wrote:
One big problem with including coaches is probably aging. Kuester has to work with an older (and probably worse) Ben Wallace and Hamilton than the coaches before him did, but the algorithm thinks they're the same players and punishes Kuester for it.
Isn't this the same problem for player adjusted +/- over long periods? Players who play with aging superstars or soon-to-be superstars are devalued, and perhaps players who played with a now-declining superstar back when he was still in his prime are overvalued?
I wonder whether adjusted +/- could use aging curves in some way to help avoid errors like this? It seems like it could be useful.
back2newbelf
Joined: 21 Jun 2005
Posts: 236
PostPosted: Fri Jan 21, 2011 6:50 am Post subject: Reply with quote
Updated this year's ranking and added a 2-year ranking: http://stats-for-the-nba.appspot.com/2-year-ranking
George Hill looks surprisingly good in the one-year ranking. Going by 82games, he does have a good On/Off rating, and, sorted by minutes, 2 of his top 3 five-man units do not involve Ginobili, who has the Spurs' best On/Off.
acollard wrote:
Isn't this the same problem for player adjusted +/- over long periods? Players who play with aging superstars or soon-to-be superstars are devalued, and perhaps players who played with a now-declining superstar back when he was still in his prime are overvalued?
Yes, that's true.
Quote:
I wonder whether adjusted +/- could use aging curves in some way to help avoid errors like this? It seems like it could be useful.
I'm sure it's possible, and I will probably do this sometime when I'm less busy.
DSMok1
Joined: 05 Aug 2009
Posts: 547
Location: Where the wind comes sweeping down the plains
PostPosted: Fri Jan 21, 2011 7:03 am Post subject: Reply with quote
You'd probably have to generate the aging curves ahead of time, apply them in a "preprocessing" phase, and then run the regression. I generated a fairly good aging curve for ASPM, which should look about the same as one for APM. It's at http://sonicscentral.com/apbrmetrics/vi ... php?t=2652 .
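A hypothetical sketch of that preprocessing idea, with an assumed toy aging curve and data layout (these are not DSMok1's actual numbers or code): remove the expected age effect from each stint's margin before regressing, so the resulting coefficients are age-neutral.
Code:
# Hypothetical sketch of age-curve "preprocessing" before the (R)APM regression.
import numpy as np

def age_effect(age, peak=27.0, slope=0.25):
    # toy curve in points per 100 possessions; a real ASPM/APM aging curve
    # would replace this assumption
    return -slope * abs(age - peak)

def age_adjusted_margin(y, X_signed, ages):
    # X_signed: stints x players, +1/-1 for players on the floor; ages: one per player
    expected = X_signed @ np.array([age_effect(a) for a in ages])
    return y - expected  # regress this adjusted margin on X_signed as usual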
_________________
GodismyJudgeOK.com/DStats
EvanZ
Joined: 22 Nov 2010
Posts: 188
PostPosted: Fri Jan 21, 2011 8:38 am Post subject: Reply with quote
I put the 2yr data up as a .csv file on GoogleDocs. Added rank as the first column:
https://spreadsheets.google.com/pub?key ... output=csv
_________________
http://www.thecity2.com
http://www.ibb.gatech.edu/evan-zamir
Crow
Joined: 20 Jan 2009
Posts: 746
PostPosted: Fri Jan 21, 2011 3:25 pm Post subject: Reply with quote
Thanks for the 2 season RAPM.
It would be handy to have the team identifiers in the file too for sorting, though users can cobble it together.
I can't recall for sure if Joe Sill used age curves in his RAPM. I think he might have. I also think I recall Steve Ilardi talking about doing it in his next private version of Adjusted +/-.
Crow
Joined: 20 Jan 2009
Posts: 746
PostPosted: Fri Jan 21, 2011 4:37 pm Post subject: Reply with quote
I matched up this 2-season RAPM with basketballvalue's 2-season traditional APM for the 328 players with values in both. The r2 was .56, lower than I'd hoped to see.
I also looked at the r2 for just the guys at +4 or better, and it was .25. Between +4 and -4, it was .18. Below -4, .04.
What do you make of this?
What should be done?
back2newbelf
Joined: 21 Jun 2005
Posts: 236
PostPosted: Fri Jan 21, 2011 5:02 pm Post subject: Reply with quote
Crow wrote:
What should be done?
About what? It's not exactly my goal to produce numbers that correlate well with traditional APM
Ilardi
Joined: 15 May 2008
Posts: 257
Location: Lawrence, KS
PostPosted: Fri Jan 21, 2011 5:47 pm Post subject: Reply with quote
Crow wrote:
I matched up this 2-season RAPM with basketballvalue's 2-season traditional APM for the 328 players with values in both. The r2 was .56, lower than I'd hoped to see.
I also looked at the r2 for just the guys at +4 or better, and it was .25. Between +4 and -4, it was .18. Below -4, .04.
What do you make of this?
What should be done?
Crow, your R^2 of .56 implies a zero-order correlation (r) between RAPM and APM of .75, with both estimates derived from only 1.5 years of data. That's surprisingly high, imho.
The lower R^2 numbers at higher/lower values of APM are most likely a "truncated range" phenomenon.
page 3
Crow
Joined: 20 Jan 2009
Posts: 821
PostPosted: Fri Jan 21, 2011 5:48 pm Post subject: Reply with quote
back2newbelf,
I accept that it is not exactly your goal to produce numbers that correlate well with traditional APM. That is not the goal; the goal is estimating true impact.
Nonetheless I wanted to ask a few simple open-ended questions to possibly hear further thoughts on the metric comparison from whomever wanted to share them.
Currently I'll look at and weigh the estimates from 2 year or longer traditional APM or RAPM and preferably RAPM.
That these versions can vary a fair amount is something I've been noting case by case for awhile when I find it. It is to be expected to a degree but I do think there was value to checking the correlation.
If an even better version of APM can be constructed with comparison and discussion, I'd say great.
Steve,
Yes, the r was .747. I reported the r2 because I had gotten the impression that was preferred. Maybe I have some previous r's and r2's for metric comparisons scrambled in my mind, but I was under the impression that an r2 of .56 was pretty good but not real strong, and since these metrics are at the core the same type of method, I thought it might be higher. Maybe my expectations were too high, and I will consider your reaction. Perhaps the correlation would be even higher over a longer time period, as you suggest, or if other authors' versions of traditional APM or RAPM were used. I don't fully understand the lambda value issue, but that may be part of it, as is the minute cutoff choice.
Just noting what I see, since I don't recall a recent comparison of traditional APM and RAPM values, especially at the 2-year level, and any comparative discussion at Joe's site is gone. Maybe there is some dialog here that could be dug up. But there probably is still room for it to continue.
I was thinking it might not be surprising for the truncated ranges to have lower correlations, but thanks for the reinforcement. I am not surprised that the correlation was stronger in the top segment than in the middle or the bottom, but I thought that might be worth noting too.
Crow
Joined: 20 Jan 2009
Posts: 821
PostPosted: Sat Jan 22, 2011 3:09 pm Post subject: Reply with quote
Players whose 1.5 season traditional APM is 5 or more points higher than this RAPM
Fields, Landry 11.5
Collins, Jason 10.1
Nash, Steve 9.6
Aldridge, LaMarcus 9.4
Dooling, Keyon 8.2
Bass, Brandon 8.2
Nowitzki, Dirk 8.0
Wallace, Gerald 8.0
Gasol, Pau 7.3
Rose, Derrick 7.1
Johnson, Amir 7.0
West, David 6.8
Lopez, Brook 6.4
Fesenko, Kyrylo 6.2
Favors, Derrick 5.9
Chandler, Tyson 5.9
Dunleavy, Mike 5.8
Carter, Vince 5.7
Paul, Chris 5.4
Dorsey, Joey 5.4
Brockman, Jon 5.3
Johnson, Wesley 5.2
Young, Thaddeus 5.1
Gordon, Ben 5.0
Players whose 1.5 season traditional APM is 5 or more points lower than this RAPM
Belinelli, Marco -5.0
Cousins, DeMarcus -5.0
Jones, Solomon -5.0
Wall, John -5.1
Landry, Carl -5.2
Forbes, Gary -5.2
Bayless, Jerryd -5.3
Vasquez, Greivis -5.3
Nocioni, Andres -5.3
Jack, Jarrett -5.4
Jackson, Stephen -5.6
Outlaw, Travis -5.6
Krstic, Nenad -5.8
Williams, Terrence -5.8
Andersen, Chris -5.8
Splitter, Tiago -5.9
Livingston, Shaun -6.0
Boykins, Earl -6.1
Collison, Darren -6.2
Ridnour, Luke -6.3
Matthews, Wes -6.3
Marion, Shawn -6.3
Moon, Jamario -6.3
Chalmers, Mario -6.4
Williams, Jawad -6.4
Milicic, Darko -6.5
Graham, Stephen -6.5
Bledsoe, Eric -7.1
Carney, Rodney -7.2
Maggette, Corey -7.2
Sanders, Larry -7.3
Armstrong, Hilton -7.5
Bryant, Kobe -7.5
Dragic, Goran -7.6
Evans, Maurice -7.6
Webster, Martell -7.7
Powell, Josh -7.7
Bell, Raja -7.8
McGuire, Dominic -7.8
Gortat, Marcin -7.9
Telfair, Sebastian -7.9
Douglas-Roberts, Chris -8.2
Ellington, Wayne -8.5
Hill, Jordan -8.5
Richardson, Jason -8.8
Mason, Roger -9.9
Erden, Semih -9.9
Brown, Kwame -10.0
Law, Acie -10.1
Arroyo, Carlos -10.4
Monroe, Greg -11.1
Batum, Nicolas -11.4
Felton, Raymond -11.8
Thabeet, Hasheem -11.8
House, Eddie -12.1
Hayward, Gordon -12.4
About 25% of the players compared are in one of the 2 groups, so 75% of the estimates are within 5 points of each other. More than 50% were within 3 points of each other. About 40% within 2 points.
back2newbelf
Joined: 21 Jun 2005
Posts: 274
PostPosted: Sun Jan 23, 2011 7:38 am Post subject: Reply with quote
What's not accounted for in adjusted +/-?
Forcing a good opposing player to the bench because he just fouled you!
I split all players into 3 groups according to their 2-year rating (above +1.0, between +1.0 and -1.0, below -1.0), then used basketballgeek.com's 2009/2010 data to compute how many times a player was fouled by players from each group:
http://stats-for-the-nba.appspot.com/fouling
minimum 50 possessions.
The analysis is far from perfect. One problem is that garbage time players can, for the most part, only be fouled by other garbage time players. Thus they will never look good in the "being fouled by >+1.0" category.
One other problem is that all fouls get treated the same, when in reality it's probably better to make someone pick up his second foul in the 1st quarter, rather than make him pick up his fourth foul with 10 seconds to play in the game
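A hypothetical sketch of the grouping-and-counting step, assuming a play-by-play table of fouls and a dict of 2-year ratings already exist (the column names and layout are made up, not the site's actual code).
Code:
import pandas as pd

def rating_group(rapm):
    if rapm > 1.0:
        return ">+1.0"
    if rapm < -1.0:
        return "<-1.0"
    return "+1.0 to -1.0"

# fouls: one row per personal foul, with columns "fouler" and "fouled_player"
# ratings: dict of player -> 2-year RAPM
def fouls_drawn_by_group(fouls: pd.DataFrame, ratings: dict) -> pd.DataFrame:
    grp = fouls["fouler"].map(lambda p: rating_group(ratings.get(p, 0.0)))
    fouls = fouls.assign(fouler_group=grp)
    return fouls.pivot_table(index="fouled_player", columns="fouler_group",
                             values="fouler", aggfunc="count", fill_value=0)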
DSMok1
Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains
PostPosted: Mon Jan 24, 2011 4:42 pm Post subject: Reply with quote
Would it be possible to do this at a lineup level?
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
back2newbelf
Joined: 21 Jun 2005
Posts: 274
PostPosted: Mon Jan 24, 2011 5:10 pm Post subject: Reply with quote
DSMok1 wrote:
Would it be possible to do this at a lineup level?
You need to be a little more clear. Are you talking about fouling? Do you want the lineups that get fouled by certain players, or the players that get fouled by certain lineups? Something else?
_________________
http://stats-for-the-nba.appspot.com/
DSMok1
Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains
PostPosted: Mon Jan 24, 2011 5:32 pm Post subject: Reply with quote
back2newbelf wrote:
DSMok1 wrote:
Would it be possible to do this at a lineup level?
You need to be a little more clear. Are you talking about fouling? Do you want the lineups that get fouled by certain players, or the players that get fouled by certain lineups? Something else?
No, sorry, I meant the RAPM at the lineup level, like Basketball Value does, but with the lambda-based ridge regression applied.
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
back2newbelf
Joined: 21 Jun 2005
Posts: 274
PostPosted: Tue Jan 25, 2011 11:53 am Post subject: Reply with quote
Now with approximated Euroleague RAPM at http://stats-for-the-nba.appspot.com/euroleague-ranking (last season and this season combined). Optimal lambda was, again, 3000.
Rubio looks pretty good
Thanks to http://www.in-the-game.org for providing the data
DSMok1 wrote:
No, sorry, I meant the RAPM at the lineup level, like Basketball Value does, but with the lambda-based ridge regression applied.
Certainly possible, but not exactly at the top of my to-do list (that would be adj. four factors and player pairs)
_________________
http://stats-for-the-nba.appspot.com/
DSMok1
Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains
PostPosted: Tue Jan 25, 2011 12:10 pm Post subject: Reply with quote
Excellent once again! I'm sure we're getting repetitive saying that over and over...
:D
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
Crow
Joined: 20 Jan 2009
Posts: 821
PostPosted: Tue Jan 25, 2011 12:41 pm Post subject: Reply with quote
Rubio is unimpressive to me on individual statistical measures (70th in the hoopsstats ranking), but this RAPM has him at +1.9, 11th best in the Euroleague.
He is helping optimize his teammates further on offense and also contributing a bit on defense.
Would he optimize NBA teammates on offense at the same level, or more, or less? I'd think his defensive impact would be smaller in the NBA than in the Euroleague, but that is a surface reaction. Not that the test is coming that soon or will be that important, but I wanted to touch on it given recent articles.
If RAPM were done for additional earlier Euroleague seasons, then some Euroleague-RAPM-to-NBA-RAPM comparisons could be done now for guys who came over here. I guess it could be done the other way too. I'm not sure how heavily I'd weight general league-to-league translation projections in a specific player's evaluation, but it would be good to see the averages and to gather as many examples as possible: see what it says, what you think it says, and what results you (and others) get with one approach or another over time.
back2newbelf
Joined: 21 Jun 2005
Posts: 274
PostPosted: Fri Jan 28, 2011 5:43 am Post subject: Reply with quote
I think I found a way to compute standard errors via bootstrapping. Not 100% sure if this is correct though.
Bootstrap sample: from our n observations, take n independent draws with replacement.
Then use a Monte Carlo algorithm:
(1) using a random number generator, independently draw a large number (B) of bootstrap samples
(2) for each bootstrap sample, evaluate the statistic of interest
(3) calculate the standard deviation of the B values
Right now it's only available for the 2-year ranking http://stats-for-the-nba.appspot.com/2-year-ranking
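A minimal sketch of that bootstrap, assuming the stint matrix X and response y already exist and that the statistic of interest is each player's ridge coefficient (the layout and lambda value are assumptions, not the site's code).
Code:
import numpy as np
from sklearn.linear_model import Ridge

def bootstrap_se(X, y, lam=3000.0, B=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    coefs = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)      # n draws with replacement
        model = Ridge(alpha=lam).fit(X[idx], y[idx])
        coefs.append(model.coef_)             # statistic of interest: each player's RAPM
    return np.std(coefs, axis=0)              # per-player standard error estimate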
_________________
http://stats-for-the-nba.appspot.com/
DSMok1
Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains
PostPosted: Fri Jan 28, 2011 10:27 am Post subject: Reply with quote
I don't think it's working right... :(
The standard errors should be highest on the players with the least data... but it is reversed here. The players that have the least data have their results dominated by the lambda, and thus return a low stdev on the bootstrapping.
What should happen is that the players with essentially no data should have a standard error equal to the standard deviation of the overall distribution of NBA players, or, I should say, one based on the lambda. It's a Bayesian deal: the prior is 0, and the lambda defines the spread (I'm not sure how to convert lambda to a standard deviation). Then the player data is applied, narrowing the standard error of the estimate.
That's about all I know... Oh, the errors should probably be in the range of 1.5 for the best-known players, ranging up to something like 6 or 7 for players with no data.
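One way to make the lambda-to-spread conversion concrete, using the standard Bayesian reading of ridge regression (a textbook identity, not something stated in the thread; sigma here is the per-observation noise of whatever response the regression actually uses, which the posts don't specify):
Code:
# Ridge as a Gaussian prior: minimizing ||y - Xb||^2 + lam*||b||^2 gives the MAP
# estimate when each coefficient has prior b_i ~ N(0, tau^2) and the noise is
# N(0, sigma^2), with lam = sigma^2 / tau^2. The implied prior spread is then
#     tau = sigma / sqrt(lam)
# A player with essentially no data keeps roughly that prior spread as his
# standard error; more data shrinks it.
import math

def implied_prior_sd(sigma, lam):
    return sigma / math.sqrt(lam)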
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
gabefarkas
Joined: 31 Dec 2004
Posts: 1313
Location: Durham, NC
PostPosted: Tue Feb 01, 2011 2:12 pm Post subject: Reply with quote
Are you resampling with replacement, or without?
back2newbelf
Joined: 21 Jun 2005
Posts: 274
PostPosted: Wed Feb 02, 2011 3:43 pm Post subject: Reply with quote
gabefarkas wrote:
Are you resampling with replacement, or without?
With replacement. Why do you ask?
_________________
http://stats-for-the-nba.appspot.com/
gabefarkas
Joined: 31 Dec 2004
Posts: 1313
Location: Durham, NC
PostPosted: Wed Feb 02, 2011 9:45 pm Post subject: Reply with quote
back2newbelf wrote:
gabefarkas wrote:
Are you resampling with replacement, or without?
With replacement. Why do you ask?
That's how bootstrapping is supposed to be done, from what I remember. I initially thought maybe that was the issue you were facing, but I guess not.
back2newbelf
Joined: 21 Jun 2005
Posts: 274
PostPosted: Wed Feb 09, 2011 7:14 am Post subject: Reply with quote
I did a test of how many years one should use to get the best prediction results.
I split this season's data into several (N) parts and computed player values on N-1 parts, N times, always leaving out just one part. Then, using the computed player values, I computed the error on the part that was left out (N times, because N parts were left out).
Then I did the same thing but included data from prior seasons. All of this older data is used to compute player values, combined with the parts from the current season, always removing one part from the current season as described above.
If I use just this season, the error on out-of-sample 2010/2011 data is bigger than if I include 2009/2010. Including 2008/2009 on top of 09/10 improves the error further, and it's actually best when I include 07/08 too. From there on, it always gets worse as I include older data.
From best to worst:
3.x year
4.x year
2.x year
5.x year
1.x year
0.x year
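A rough sketch of the leave-one-part-out test described above, with an assumed data layout (this is not the actual script): older seasons are always fully included, and only the current season is split into N parts.
Code:
import numpy as np
from sklearn.linear_model import Ridge

def cv_error(X_cur, y_cur, X_old, y_old, lam=2500.0, n_parts=10, seed=0):
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_parts, size=len(y_cur))  # assign current-season rows to parts
    errs = []
    for k in range(n_parts):
        train = folds != k
        X_train = np.vstack([X_old, X_cur[train]])     # all older data + N-1 current parts
        y_train = np.concatenate([y_old, y_cur[train]])
        model = Ridge(alpha=lam).fit(X_train, y_train)
        pred = model.predict(X_cur[~train])            # error on the held-out part
        errs.append(np.mean((pred - y_cur[~train]) ** 2))
    return np.mean(errs)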
_________________
http://stats-for-the-nba.appspot.com/
PostPosted: Wed Dec 15, 2010 3:50 pm Post subject: Reply with quote
back2newbelf wrote:
This season only:
http://www.docdroid.net/5ka/2011.xls.html
Format: Offense(100 possessions)|Defense(100 possessions)|Sum
That early in the season it is obviously of limited use and will produce funny results
So far we have:
offensive player of the year: Hedo Turkoglu (he is also the worst defender though)
defensive player of the year: Darrell Arthur
While it kind of agrees with the media on the MVP race, Nowitzki/Garnett/Ginobli/super-friends all look good, it couldn't disagree more on the Rookie-of-the-year-race, putting Jeff Adrian, Landry Fields and Evan Turner at the top. John Wall is supposed to be 9th worst of all players, Griffin 7th worst
From looking at the top rated players one would think the best basketball age is 35
Also, Shane Battier is suddenly listed as a horrible defender and Chuck Hayes, who used to rock this rating, is the 10th worst player in the league
I haven't figured out how to do this, but would it be possible to use ASPM ratings as a Bayesian prior?
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
Back to top
View user's profile Send private message Send e-mail Visit poster's website
mathayus
PostPosted: Sun Dec 26, 2010 4:16 pm Post subject: Reply with quote
back2newbelf wrote:
While it kind of agrees with the media on the MVP race, Nowitzki/Garnett/Ginobli/super-friends all look good, it couldn't disagree more on the Rookie-of-the-year-race, putting Jeff Adrian, Landry Fields and Evan Turner at the top. John Wall is supposed to be 9th worst of all players, Griffin 7th worst
In my experience, star rookies very rarely look impressive by +/- metrics. This has led me to conclude that if we truly gave the ROY to the MVP of rookies, in most years it would go to a player who happened to fill a niche on a successful team instead of the big name rookies.
While there would be nothing inherently wrong with that, the big name rookies are actually the ones who tend to go on and become stars by +/- metrics, so focusing on the volume statistics instead of +/- statistics for rookies does serve a useful purpose.
Back to top
View user's profile Send private message
back2newbelf
PostPosted: Sun Dec 26, 2010 6:08 pm Post subject: Reply with quote
mathayus wrote:
In my experience, star rookies very rarely look impressive by +/- metrics. This has led me to conclude that if we truly gave the ROY to the MVP of rookies, in most years it would go to a player who happened to fill a niche on a successful team instead of the big name rookies.
While there would be nothing inherently wrong with that, the big name rookies are actually the ones who tend to go on and become stars by +/- metrics, so focusing on the volume statistics instead of +/- statistics for rookies does serve a useful purpose.
Good point.
I think what also needs to be done is to never use single-season (R)APM to judge rookies. The top rookies will usually play heavy minutes on very bad teams. When RAPM "doesn't know" that the players the rookie is currently playing with already sucked the year before it puts part of the blame on him. This can be avoided with multi-season (R)APM
Back to top
View user's profile Send private message
back2newbelf
PostPosted: Tue Jan 18, 2011 12:01 pm Post subject: Reply with quote
I think we have the error fixed that made lambda so huge. At least it looks more sane now, being around ~2500 for a single season.
Single season approximated RAPM now gets published on http://stats-for-the-nba.appspot.com/ and probably updated every two weeks or so.
The site also contains data from the latest multiyear analysis which included coaches and tried different lambdas for offense and defense for both players and coaches. They were found to be: Offense: 2500, Defense: 7500, Offense(Coach): 6000, Defense(Coach): 4500.
Unfortunately the difference in error on the test sets between this method and using players only with just one lamdba is minimal.
Back to top
View user's profile Send private message
DSMok1
PostPosted: Tue Jan 18, 2011 12:17 pm Post subject: Reply with quote
back2newbelf wrote:
I think we have the error fixed that made lambda so huge. At least it looks more sane now, being around ~2500 for a single season.
Single season approximated RAPM now gets published on http://stats-for-the-nba.appspot.com/ and probably updated every two weeks or so.
The site also contains data from the latest multiyear analysis which included coaches and tried different lambdas for offense and defense for both players and coaches. They were found to be: Offense: 2500, Defense: 7500, Offense(Coach): 6000, Defense(Coach): 4500.
Unfortunately the difference in error on the test sets between this method and using players only with just one lamdba is minimal.
Thanks a lot for this data!
Could you please post the standard errors for each estimate as well? The lack of standard errors makes it very difficult to use this data for additional research!
I find it interesting and expected that the Lambdas broke down the way they did: players regress far more to the mean on defense, as that is a more unstable measure, while coaches have more of an impact on defense than offense.
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
Back to top
View user's profile Send private message Send e-mail Visit poster's website
deepak
PostPosted: Tue Jan 18, 2011 10:44 pm Post subject: Reply with quote
back2newbelf wrote:
I think we have the error fixed that made lambda so huge. At least it looks more sane now, being around ~2500 for a single season.
Single season approximated RAPM now gets published on http://stats-for-the-nba.appspot.com/ and probably updated every two weeks or so.
The site also contains data from the latest multiyear analysis which included coaches and tried different lambdas for offense and defense for both players and coaches. They were found to be: Offense: 2500, Defense: 7500, Offense(Coach): 6000, Defense(Coach): 4500.
Unfortunately the difference in error on the test sets between this method and using players only with just one lamdba is minimal.
Appreciate it.
I got the following correlation table between your current season RAPM and some various per-minute boxscore statistics:
Code:
Age MPG GmSc USG ORB DRB PPR BLK+STL PTS OFF DEF
OFF 0.122 0.405 0.530 0.206 -0.049 0.039 0.243 -0.005 0.371 1.000 -0.009
DEF 0.139 -0.074 -0.025 -0.152 0.031 0.132 -0.017 0.171 -0.117 -0.009 1.000
all boxscore stats are per 40 minutes
GmSc = PTS + 0.4 * FG - 0.7 * FGA - 0.4*(FTA - FT) + 0.7 * ORB + 0.3 * DRB +
STL + 0.7 * AST + 0.7 * BLK - 0.4 * PF - TOV
USG = FGA + 0.44*FTA + TOV
PPR = 0.7*AST - TOV
Question: Are coaches overly biased towards offensive players, or does RAPM overrate the value of defensive players?
page 2
Author Message
back2newbelf
PostPosted: Wed Jan 19, 2011 1:40 pm Post subject: Reply with quote
deepak wrote:
cool table! Could you do it for GmSc without DRebs, steals and blocks?
Back to top
View user's profile Send private message
Ilardi
PostPosted: Wed Jan 19, 2011 5:44 pm Post subject: Reply with quote
back2newbelf wrote:
deepak wrote:
cool table! Could you do it for GmSc without DRebs, steals and blocks?
And PER?
Back to top
View user's profile Send private message
Crow
PostPosted: Wed Jan 19, 2011 5:48 pm Post subject: Reply with quote
Deepak,
I'd also be interested in seeing the correlations for offensive and defensive splits of Game Score.
And seeing how far you could possibly up the correlation with multi-season Adjusted +/- by optimizing these linear weights, with an additional variable for capturing the residual uncaptured in Game Score shot defense. What is the average absolute value of that shot defense variable?
Adjusted +/- is not perfect, it is an estimate with error. But an optimized to max correlation with Adjusted +/- Game Score could be worth seeing as an intermediate product between the existing linear weight boxscore based metric and Adjusted +/-. The weights may or may not be stable thru different periods but it would be suggestive about whether the linear weights should be changed to try to get closer over the long-run (and maybe also where the Adjusted +/- errors might be higher?). While I am suggesting here doing it with the simple Game Score, ideally such a comparison would be done with newer / probably better boxscore (and play by play) based metrics.
Back2newbelf,
Might your friend be interested in doing a RAPM run that just looked at when the top 8-16 teams play in regular season games against each other? I think it would be useful to see where values from that split vary considerably from the league-wide RAPM.
Any interest in say a 3 season playoffs only run? Or a run where the playoff data was included with regular season data but had a somewhat higher or significantly higher weight?
Or how about preparing up to date multi-season Adjusted +/- splits down to the 4 Factor level?
Back to top
View user's profile Send private message
acollard
PostPosted: Wed Jan 19, 2011 7:27 pm Post subject: Reply with quote
Crow wrote:
Adjusted +/- is not perfect, it is an estimate with error. But an optimized to max correlation with Adjusted +/- Game Score could be worth seeing as an intermediate product between the existing linear weight boxscore based metric and Adjusted +/-. The weights may or may not be stable thru different periods but it would be suggestive about whether the linear weights should be changed to try to get closer over the long-run (and maybe also where the Adjusted +/- errors might be higher?). While I am suggesting here doing it with the simple Game Score, ideally such a comparison would be done with newer / probably better boxscore (and play by play) based metrics.
This is a pretty similar suggestion to DSMOK1's suggestion of using ASPM as a Bayesian prior, and both seem pretty great. I think you could get a more consistent result if you used a statistical approach as a jumping off point for Adjusted +/-.
On the other hand, one of the coolest and best things about Adjusted +/- is the guys who stand out for seemingly unclear reasons, without a lot of box score stats to back it up. Sometimes, its garbage, but other times, it can point toward some surprising truth. This quality would be largely diminished if you somehow averaged or weighted Adjusted +/- with a statistical approach.
I'm also unsure how I feel about using coaches in the +/- formula. Does it make anyone else uneasy? I feel like there just isn't enough data with most or all of them for it to be useful. What's the highest number of coaches a team has had in the past five years? 3? And what about the most amount of teams a coach has coached with? 2? It seems like the coach variable as its used may have a lot of other things about the team he coaches that correlate with him like team environment, chemistry, medical staff, player synergies, home crowd, etc. that may or may not have a lot to do with the coach himself.
Back to top
View user's profile Send private message
Crow
Joined: 20 Jan 2009
Posts: 746
PostPosted: Wed Jan 19, 2011 8:18 pm Post subject: Reply with quote
Any form of metric with Adjusted +/- weighted with or informed by "a statistical approach" could and I think should be a 3rd leg to the 2 "pure" approaches / metrics. You don't have to dispose of the originals and I wouldn't.
I'd glad to see the data with coaches at least once, though I'd probably use without coaches more given the concerns you raised. Having both would, again, allow the comparisons and help spot aberrant / potentially interesting stuff.
It also would be great to have, in public on a regular basis, player pair Adjusted +/-. Comparing the individual ratings with the pair would be suggestive about specific player to player interactions / impacts.
Conceivably you could have player / coach Adjusted +/- pairs too.
The errors might be too high for many to make much of player / opposing player or opposing coach pairs except maybe in a 4-6 season version, but it is also conceivable. Again, if you are searching for interesting things, it might be fun to at least see and maybe more than that.
Last edited by Crow on Thu Jan 20, 2011 2:18 pm; edited 1 time in total
Back to top
View user's profile Send private message
back2newbelf
Joined: 21 Jun 2005
Posts: 236
PostPosted: Thu Jan 20, 2011 6:13 am Post subject: Reply with quote
Crow wrote:
Or how about preparing up to date multi-season Adjusted +/- splits down to the 4 Factor level?
I want to do this sometime with TS%, OReb% and Tov%
Quote:
It also would be great to have in public on a regular basis would be player pair Adjusted +/-. Comparing the individual ratings with the pair would be suggestive about specific player to player interactions / impacts.
Conceivably you could have player / coach Adjusted +/- pairs too.
This is also on my to-do-list
allocard wrote:
I'm also unsure how I feel about using coaches in the +/- formula. Does it make anyone else uneasy? I feel like there just isn't enough data with most or all of them for it to be useful. What's the highest number of coaches a team has had in the past five years? 3? And what about the most amount of teams a coach has coached with? 2?
Player trades help here. A coach might have just coached two teams but there's a good possibility he coached 40+ players.
One big problem with including coaches is probably aging. Kuester has to work with an older (and probably worse) Ben Wallace and Hamilton than coaches before him, but the algorithm thinks they're the same player and punishes Kuester for it
Back to top
View user's profile Send private message
Crow
Joined: 20 Jan 2009
Posts: 746
PostPosted: Thu Jan 20, 2011 2:21 pm Post subject: Reply with quote
Appreciate the shared public data so far and look forward to the additional variations and extensions of Adjusted +/- that you and your friend decide to prepare.
Back to top
View user's profile Send private message
acollard
Joined: 22 Sep 2010
Posts: 49
Location: MA
PostPosted: Thu Jan 20, 2011 5:54 pm Post subject: Reply with quote
[quote=back2newbelf]One big problem with including coaches is probably aging. Kuester has to work with an older (and probably worse) Ben Wallace and Hamilton than coaches before him, but the algorithm thinks they're the same player and punishes Kuester for it[/quote]
Isn't this the same problem for player adjusted +/- over long periods? Players who play with aging superstars or soon to be superstars are devalued, and perhaps players who played with a now declining superstar who was still in his prime are overvalued?
I wonder if adjusted +/- would or could be able to use aging curves in any way, to help avoid errors like this? It seems like it could be useful.
Back to top
View user's profile Send private message
back2newbelf
Joined: 21 Jun 2005
Posts: 236
PostPosted: Fri Jan 21, 2011 6:50 am Post subject: Reply with quote
Updated this years' ranking and added 2 year ranking http://stats-for-the-nba.appspot.com/2-year-ranking
George Hill looks suprisingly good in the one-year-ranking. Going by 82games, he does have a good On/Off rating and, sorted by minutes, 2 of his top 3 5-man-units do not involve Ginobili who has the Spurs' best On/Off
acollard wrote:
Isn't this the same problem for player adjusted +/- over long periods? Players who play with aging superstars or soon to be superstars are devalued, and perhaps players who played with a now declining superstar who was still in his prime are overvalued?
yes that's true
Quote:
I wonder if adjusted +/- would or could be able to use aging curves in any way, to help avoid errors like this? It seems like it could be useful.
I'm sure it's possible and I will probably do this sometime when I'm less busy
Back to top
View user's profile Send private message
DSMok1
Joined: 05 Aug 2009
Posts: 547
Location: Where the wind comes sweeping down the plains
PostPosted: Fri Jan 21, 2011 7:03 am Post subject: Reply with quote
You'd probably have to generate the aging curves ahead of time, apply them in a "preprocessing" phase, and then run the regression. I generated a fairly good aging curve for ASPM, which should look the same as APM. It's at http://sonicscentral.com/apbrmetrics/vi ... php?t=2652 .
_________________
GodismyJudgeOK.com/DStats
Back to top
View user's profile Send private message Visit poster's website
EvanZ
Joined: 22 Nov 2010
Posts: 188
PostPosted: Fri Jan 21, 2011 8:38 am Post subject: Reply with quote
I put the 2yr data up as a .csv file on GoogleDocs. Added rank as the first column:
https://spreadsheets.google.com/pub?key ... output=csv
_________________
http://www.thecity2.com
http://www.ibb.gatech.edu/evan-zamir
Back to top
View user's profile Send private message
Crow
Joined: 20 Jan 2009
Posts: 746
PostPosted: Fri Jan 21, 2011 3:25 pm Post subject: Reply with quote
Thanks for the 2 season RAPM.
It would be handy to have the team identifiers in the file too for sorting, though users can cobble it together.
I can't recall for sure if Joe Sill used age curves in his RAPM. I think he might have. I also think I recall Steve Ilardi talking about doing it in his next private version of Adjusted +/-.
Back to top
View user's profile Send private message
Crow
Joined: 20 Jan 2009
Posts: 746
PostPosted: Fri Jan 21, 2011 4:37 pm Post subject: Reply with quote
I matched up this 2 season RAPM with basketballvalue's 2 season traditional APM for 328 players with values in both. The r2 was .56, lower than I'd hoped to see.
I also look at the r2 from just guys +4 or better and it was .25. Between +4 and -4, it was .18. Less than -4, .04.
What do you make of this?
What should be done?
Back to top
View user's profile Send private message
back2newbelf
Joined: 21 Jun 2005
Posts: 236
PostPosted: Fri Jan 21, 2011 5:02 pm Post subject: Reply with quote
Crow wrote:
What should be done?
About what? It's not exactly my goal to produce numbers that correlate well with traditional APM
Back to top
View user's profile Send private message
Ilardi
Joined: 15 May 2008
Posts: 257
Location: Lawrence, KS
PostPosted: Fri Jan 21, 2011 5:47 pm Post subject: Reply with quote
Crow wrote:
I matched up this 2 season RAPM with basketballvalue's 2 season traditional APM for 328 players with values in both. The r2 was .56, lower than I'd hoped to see.
I also look at the r2 from just guys +4 or better and it was .25. Between +4 and -4, it was .18. Less than -4, .04.
What do you make of this?
What should be done?
Crow, your R^2 of .56 implies a zero-order correlation (r) between RAPM and APM of .75, with both estimates derived from only 1.5 years of data. That's surprisingly high, imho.
The lower R^2 numbers at higher/lower values of APM is most likely a "truncated range" phenomenon.
page 3
Author Message
Crow
Joined: 20 Jan 2009
Posts: 821
PostPosted: Fri Jan 21, 2011 5:48 pm Post subject: Reply with quote
back2newbelf,
I accept that it is not exactly your goal to produce numbers that correlate well with traditional APM. That is not the goal, the goal is estimating true impact.
Nonetheless I wanted to ask a few simple open-ended questions to possibly hear further thoughts on the metric comparison from whomever wanted to share them.
Currently I'll look at and weigh the estimates from 2 year or longer traditional APM or RAPM and preferably RAPM.
That these versions can vary a fair amount is something I've been noting case by case for awhile when I find it. It is to be expected to a degree but I do think there was value to checking the correlation.
If an even better version of APM can be constructed with comparison and discussion, I'd say great.
Steve,
Yes the r was .747. I reported the r2 because I had gotten the impression that was preferred. Maybe I have some previous r's and r2's reported for metric comparisons scrambled in my mind but I was under the impression that an r2 of .56 was pretty good but not real strong and since these metrics are at the core the same type method I thought it might be higher. Maybe my expectations were too high and I will consider your reaction. Perhaps the correlation would be even higher for a longer time period as you suggest or if other authored versions of traditional APM or RAPM are used. I don't fully understand the lamba value issue but that may be part of it as is the minute cutoff choice.
Just noting what I see since I don't recall a recent comparison of traditional and RAPM values, especially at the 2 year level, and any comparative discussion at Joe's site is gone. Maybe there is some dialog here that could be dug up. But there probably is still room for it to continue.
I was thinking it might not be surprising for the truncated ranges to have lower correlations for the segments but thanks for the reinforcement. I am not surprised that the correlations were stronger in the top segment than the middle or the bottom, but I thought that might be worthing noting too.
Back to top
View user's profile Send private message
Crow
Joined: 20 Jan 2009
Posts: 821
PostPosted: Sat Jan 22, 2011 3:09 pm Post subject: Reply with quote
Players whose 1.5 season traditional APM is 5 or more points higher than this RAPM
Fields, Landry 11.5
Collins, Jason 10.1
Nash, Steve 9.6
Aldridge, LaMarcus 9.4
Dooling, Keyon 8.2
Bass, Brandon 8.2
Nowitzki, Dirk 8.0
Wallace, Gerald 8.0
Gasol, Pau 7.3
Rose, Derrick 7.1
Johnson, Amir 7.0
West, David 6.8
Lopez, Brook 6.4
Fesenko, Kyrylo 6.2
Favors, Derrick 5.9
Chandler, Tyson 5.9
Dunleavy, Mike 5.8
Carter, Vince 5.7
Paul, Chris 5.4
Dorsey, Joey 5.4
Brockman, Jon 5.3
Johnson, Wesley 5.2
Young, Thaddeus 5.1
Gordon, Ben 5.0
Players whose 1.5 season traditional APM is 5 or more points lower than this RAPM
Belinelli, Marco -5.0
Cousins, DeMarcus -5.0
Jones, Solomon -5.0
Wall, John -5.1
Landry, Carl -5.2
Forbes, Gary -5.2
Bayless, Jerryd -5.3
Vasquez, Greivis -5.3
Nocioni, Andres -5.3
Jack, Jarrett -5.4
Jackson, Stephen -5.6
Outlaw, Travis -5.6
Krstic, Nenad -5.8
Williams, Terrence -5.8
Andersen, Chris -5.8
Splitter, Tiago -5.9
Livingston, Shaun -6.0
Boykins, Earl -6.1
Collison, Darren -6.2
Ridnour, Luke -6.3
Matthews, Wes -6.3
Marion, Shawn -6.3
Moon, Jamario -6.3
Chalmers, Mario -6.4
Williams, Jawad -6.4
Milicic, Darko -6.5
Graham, Stephen -6.5
Bledsoe, Eric -7.1
Carney, Rodney -7.2
Maggette, Corey -7.2
Sanders, Larry -7.3
Armstrong, Hilton -7.5
Bryant, Kobe -7.5
Dragic, Goran -7.6
Evans, Maurice -7.6
Webster, Martell -7.7
Powell, Josh -7.7
Bell, Raja -7.8
McGuire, Dominic -7.8
Gortat, Marcin -7.9
Telfair, Sebastian -7.9
Douglas-Roberts, Chris -8.2
Ellington, Wayne -8.5
Hill, Jordan -8.5
Richardson, Jason -8.8
Mason, Roger -9.9
Erden, Semih -9.9
Brown, Kwame -10.0
Law, Acie -10.1
Arroyo, Carlos -10.4
Monroe, Greg -11.1
Batum, Nicolas -11.4
Felton, Raymond -11.8
Thabeet, Hasheem -11.8
House, Eddie -12.1
Hayward, Gordon -12.4
About 25% of the players compared are in one of the 2 groups, so 75% of the estimates are within 5 points of each other. More than 50% were within 3 points of each other. About 40% within 2 points.
Back to top
View user's profile Send private message
back2newbelf
Joined: 21 Jun 2005
Posts: 274
PostPosted: Sun Jan 23, 2011 7:38 am Post subject: Reply with quote
What's not accounted for in adjusted +/-?
Forcing good opponent players to the bench because he just fouled you!
I split all players into 3 groups according to their 2-year rating: [>+1.0, +1.0> & >-1.0, <-1.0], then used basketballgeek.com's 2009/2010 data to compute how many times a player was fouled by players of the different groups
http://stats-for-the-nba.appspot.com/fouling
minimum 50 possessions.
The analysis is far from perfect. One problem is that garbage time players can, for the most part, only be fouled by other garbage time players. Thus they will never look good in the "being fouled by >+1.0" category.
One other problem is that all fouls get treated the same, when in reality it's probably better to make someone pick up his second foul in the 1st quarter, rather than make him pick up his fourth foul with 10 seconds to play in the game
Back to top
View user's profile Send private message
DSMok1
Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains
PostPosted: Mon Jan 24, 2011 4:42 pm Post subject: Reply with quote
Would it be possible to do this at a lineup level?
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
Back to top
View user's profile Send private message Send e-mail Visit poster's website
back2newbelf
Joined: 21 Jun 2005
Posts: 274
PostPosted: Mon Jan 24, 2011 5:10 pm Post subject: Reply with quote
DSMok1 wrote:
Would it be possible to do this at a lineup level?
You need to be a little more clear. Are you talking about fouling? Do you want the lineups that get fouled by certain players, or the players that get fouled by certain lineups? Something else?
_________________
http://stats-for-the-nba.appspot.com/
DSMok1
PostPosted: Mon Jan 24, 2011 5:32 pm Post subject: Reply with quote
back2newbelf wrote:
DSMok1 wrote:
Would it be possible to do this at a lineup level?
You need to be a little more clear. Are you talking about fouling? Do you want the lineups that get fouled by certain players, or the players that get fouled by certain lineups? Something else?
No, sorry, I meant the RAPM at the lineup level, like Basketball Value does, but with the lambda-based ridge regression applied.
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
back2newbelf
PostPosted: Tue Jan 25, 2011 11:53 am Post subject: Reply with quote
Euroleague approximated RAPM is now up at http://stats-for-the-nba.appspot.com/euroleague-ranking (last season and this season combined). Optimal lambda was, again, 3000.
Rubio looks pretty good
Thanks to http://www.in-the-game.org for providing the data
DSMok1 wrote:
No, sorry, I meant the RAPM at the lineup level, like Basketball Value does, but with the lambda-based ridge regression applied.
Certainly possible, but not exactly at the top of my to-do list (that would be adj. four factors and player pairs)
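For what it's worth, a lineup-level version would mostly just change the design matrix: one indicator column per five-man unit instead of one per player, with the same ridge penalty. A generic sketch of the ridge solve (an assumed setup, not the site's actual code):

Code:
# Ridge regression at the lineup level: beta = (X'X + lambda*I)^-1 X'y
import numpy as np

def lineup_rapm(X, y, lam=3000.0):
    """X: (observations x lineups) matrix, +1 for the offensive unit, -1 for the defensive unit.
    y: points scored per possession (or per 100 possessions) for each observation."""
    n_cols = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_cols), X.T @ y)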
_________________
http://stats-for-the-nba.appspot.com/
DSMok1
PostPosted: Tue Jan 25, 2011 12:10 pm Post subject: Reply with quote
Excellent once again! I'm sure we're getting repetitive saying that over and over...
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
Crow
PostPosted: Tue Jan 25, 2011 12:41 pm Post subject: Reply with quote
Rubio is unimpressive to me on individual statistical measures (70th in the hoopsstats ranking), but this RAPM has him at +1.9, 11th best in the Euroleague.
He is helping optimize his teammates on offense and also contributing a bit on defense.
Would he optimize NBA teammates on offense at the same level, or more, or less? I'd think his defensive impact would be smaller in the NBA than in the Euroleague, but that's a surface reaction. Not that the test is coming that soon or will be that important, but I wanted to touch on it given recent articles.
If RAPM were computed for additional earlier Euroleague seasons, then Euroleague-RAPM-to-NBA-RAPM comparisons could be done now for the guys who have already come over. I guess it could be done the other way too. I'm not sure how heavily I'd weight general league-to-league translation projections in a specific player's evaluation, but it would be good to see the averages and to gather as many examples as possible: see what the data says, what you think it says, and what results you (and others) get with one approach or another over time.
back2newbelf
PostPosted: Fri Jan 28, 2011 5:43 am Post subject: Reply with quote
I think I found a way to compute standard errors via bootstrapping. Not 100% sure if this is correct though.
Bootstrap sample: from our n observations, take n independent draws with replacement.
Then use a Monte Carlo algorithm:
(1) Using a random number generator, independently draw a large number (B) of bootstrap samples.
(2) For each bootstrap sample, evaluate the statistic of interest.
(3) Calculate the standard deviation of the resulting values.
Right now it's only available for the 2-year ranking http://stats-for-the-nba.appspot.com/2-year-ranking
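A minimal sketch of the whole procedure, with fit_rapm(X, y, lam) assumed to be the existing ridge solver that returns one coefficient per player (a placeholder name):

Code:
# Bootstrap standard errors: resample the n observations with replacement,
# refit the ridge regression B times, take the per-player standard deviation.
import numpy as np

def bootstrap_se(X, y, fit_rapm, lam=3000.0, B=200, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    estimates = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # n independent draws with replacement
        estimates.append(fit_rapm(X[idx], y[idx], lam))
    return np.std(np.vstack(estimates), axis=0)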
_________________
http://stats-for-the-nba.appspot.com/
DSMok1
PostPosted: Fri Jan 28, 2011 10:27 am Post subject: Reply with quote
I don't think it's working right...
The standard errors should be highest for the players with the least data, but it's reversed here. The players with the least data have their results dominated by the lambda, and thus return a low stdev under the bootstrapping.
What should happen is that the players with essentially no data should have a standard error equal to the standard deviation of the overall distribution of NBA players, or, I should say, one based on the lambda. It's a Bayesian deal: the prior is 0, and the lambda defines the spread (I'm not sure how to convert lambda to a standard deviation). Then the player data is applied, narrowing the standard error of the estimate.
That's about all I know... Oh, the errors should probably range from about 1.5 for the best-known players up to something like 6 or 7 for players with no data.
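For reference, one standard way to read the conversion: ridge regression with penalty lambda is the MAP estimate under a N(0, tau^2) prior on each coefficient when the per-observation noise standard deviation is sigma and lambda = sigma^2 / tau^2, so the implied prior spread is tau = sigma / sqrt(lambda). A tiny illustration, with sigma only a rough guess:

Code:
# Implied prior standard deviation from the ridge penalty.
import math

def prior_sd(lam, sigma):
    return sigma / math.sqrt(lam)

# e.g. prior_sd(3000, 110) -> about 2.0, if sigma is roughly 110 on the
# per-possession points-per-100 scale (a guess, not a measured value).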
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
gabefarkas
PostPosted: Tue Feb 01, 2011 2:12 pm Post subject: Reply with quote
Are you resampling with replacement, or without?
back2newbelf
PostPosted: Wed Feb 02, 2011 3:43 pm Post subject: Reply with quote
gabefarkas wrote:
Are you resampling with replacement, or without?
With replacement. Why do you ask?
_________________
http://stats-for-the-nba.appspot.com/
gabefarkas
PostPosted: Wed Feb 02, 2011 9:45 pm Post subject: Reply with quote
back2newbelf wrote:
gabefarkas wrote:
Are you resampling with replacement, or without?
With replacement. Why do you ask?
That's how bootstrapping is supposed to be done, from what I remember. I initially thought maybe that was the issue you were facing, but I guess not.
back2newbelf
PostPosted: Wed Feb 09, 2011 7:14 am Post subject: Reply with quote
I did a test on how many years one should use to get best prediction results.
I split this season's data into several (N) parts and computed player values on N-1 parts, N times, always leaving out one part. Then, using the computed player values, I computed the error on the part that was left out (N times, because N parts were left out).
Then I did the same thing but included data from prior seasons. All of this older data is used to compute player values, combined with the parts from the running season, always removing one part from the running season as described above.
If I use just this season, the error on out-of-sample 2010/2011 data is bigger than if I include 2009/2010. Including 2008/2009 on top of 09/10 reduces the error further, and it's actually best when I include 07/08 too. From there on, it always gets worse when I include older data.
From best to worst:
3.x year
4.x year
2.x year
5.x year
1.x year
0.x year
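For reference, a sketch of that leave-one-part-out scheme, with fit_rapm and mse as assumed helper functions (placeholders, not the actual code):

Code:
# Hold out one chunk of the running season at a time, train on the rest plus any
# number of full prior seasons, and average the out-of-sample error.
import numpy as np

def season_cv_error(current_parts, prior_seasons, fit_rapm, mse, lam=3000.0):
    """current_parts: list of (X, y) chunks from the running season.
    prior_seasons: list of (X, y) pairs for whole earlier seasons to include in training."""
    errors = []
    for i, (X_test, y_test) in enumerate(current_parts):
        train = [p for j, p in enumerate(current_parts) if j != i] + list(prior_seasons)
        X_train = np.vstack([X for X, _ in train])
        y_train = np.concatenate([y for _, y in train])
        beta = fit_rapm(X_train, y_train, lam)
        errors.append(mse(X_test @ beta, y_test))
    return float(np.mean(errors))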
_________________
http://stats-for-the-nba.appspot.com/