One Year RAPM and Weighted Ridge Regression

Bobbofitos · Post by **Bobbofitos** » Sat Jan 05, 2013 4:43 am

bchaikin wrote:like how Dwight Howard's defense is different post-surgery

for howard Synergy shows excellent man defense, a very low PPP and eFG% allowed, almost as low as Cs joakim noah and marc gasol. he's also blocking shots at his highest rate in 3 seasons...

Do you think he's been better or worse on defense this season relative to last year?

Chilltown · Post by **Chilltown** » Sun Jan 06, 2013 6:14 pm

I think non-prior RAPM can be very helpful, as both Kevin and PerMaximum have pointed out. Understanding single-season performance is a decidedly different goal from predicting out-of-sample performance.

As for doubting the validity of RAPM because Ronnie Price comes out on top, answer me this, PerMaximum:

If player A posts an RAPM of +8.5 in 900 minutes and player B posts an RAPM of +5 in 2500 minutes, who has contributed more to the team?

No APM metric can be viewed in isolation.
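One rough way to make this comparison concrete is to scale each per-100-possession rating by playing time. The sketch below is only an illustration: the pace constant `POSS_PER_MINUTE` and the helper `points_above_average` are hypothetical, and it assumes the rating applies uniformly across all of a player's minutes.

```python
# Hypothetical pace: possessions a team plays per minute of game time.
# This number is an assumption for illustration, not taken from the thread.
POSS_PER_MINUTE = 1.9


def points_above_average(rapm_per_100, minutes, poss_per_minute=POSS_PER_MINUTE):
    """Convert a per-100-possession rating into total net points over the minutes played."""
    possessions = minutes * poss_per_minute
    return rapm_per_100 * possessions / 100.0


player_a = points_above_average(8.5, 900)    # +8.5 in 900 minutes
player_b = points_above_average(5.0, 2500)   # +5.0 in 2500 minutes
print(f"A: {player_a:.1f}  B: {player_b:.1f}")
```

Under these assumptions player B adds more total value despite the lower rate. The calculation also ignores how noisy a 900-minute estimate is likely to be.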

DSMok1 · Post by **DSMok1** » Sun Jan 06, 2013 8:50 pm

Chilltown wrote: If player A posts an RAPM of +8.5 in 900 minutes and player B posts an RAPM of +5 in 2500 minutes, who has contributed more to the team?

No APM metric can be viewed in isolation.

If player A posts an RAPM of +8.5 in 900 minutes, I'd guess the RAPM is not accurate.

jsill · Post by **jsill** » Sun Jan 06, 2013 11:06 pm

I get Kobe, Dirk, Paul Pierce, Manu, KG as my top 5 for 1-year 2007-2008 RAPM, and that top 5 doesn't change much for different choices of the regularization parameter as long as it's reasonable. If Ronnie Price is ranking number 1, there is probably an error either in the implementation or the data.

permaximum · Post by **permaximum** » Mon Jan 07, 2013 1:59 am

I was away for a while, sorry that I couldn't answer some questions.

EvanZ wrote:
permaximum wrote: Actually, I was looking for an explanation of player performances in a season instead of a prediction of future performances. That's why I wanted 1-year RAPM. But the results were very disappointing. So this Ronnie Price joke definitely confirms RAPM is very bad for explaining performances in a season. As for prediction, what's the target? +/- of next season?
Ronnie Price only played 10 mpg that season. Who's the highest rated player that played > 30 mpg? It's probably going to be more meaningful.

I used the data from basketballvalue.com. The regression response was margin per 100 possessions (which means I did not produce separate ratings for offense and defense). I ran 10-fold cross-validation (I tried other variations, but 10-fold is good enough, I think) for possession-weighted ridge regression. The selected lambda was 108.9681. Calculations were done with R 2.15.2 and the glmnet package, version 1.8.5.
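For readers who want to see the shape of this procedure, here is a minimal sketch of possession-weighted ridge regression with k-fold cross-validation. It is not the R/glmnet code used above: it is written in Python with NumPy, runs on made-up stint data, and uses a plain closed-form ridge solution. The function names `weighted_ridge` and `cv_lambda` are hypothetical, and the penalty here is on a different scale than glmnet's lambda, so a value like 108.9681 most likely would not carry over directly.

```python
import numpy as np


def weighted_ridge(X, y, w, lam):
    """Solve min_b sum_i w_i (y_i - x_i.b)^2 + lam * ||b||^2 in closed form."""
    Xw = X * w[:, None]                       # each row scaled by its weight
    A = X.T @ Xw + lam * np.eye(X.shape[1])   # X'WX + lam*I
    return np.linalg.solve(A, Xw.T @ y)       # (X'WX + lam*I)^-1 X'Wy


def cv_lambda(X, y, w, lambdas, k=10, seed=0):
    """Pick the lambda with the lowest possession-weighted held-out squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for lam in lambdas:
        sse, wsum = 0.0, 0.0
        for test_idx in folds:
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False
            beta = weighted_ridge(X[train], y[train], w[train], lam)
            resid = y[test_idx] - X[test_idx] @ beta
            sse += np.sum(w[test_idx] * resid ** 2)
            wsum += np.sum(w[test_idx])
        errors.append(sse / wsum)
    return lambdas[int(np.argmin(errors))], errors


# Synthetic stint data (made up): 5 of 30 players on each side per stint.
rng = np.random.default_rng(1)
n_stints, n_players = 2000, 30
true_rating = rng.normal(0, 2, n_players)
X = np.zeros((n_stints, n_players))
for row in X:
    on_court = rng.choice(n_players, 10, replace=False)
    row[on_court[:5]] = 1.0     # home players
    row[on_court[5:]] = -1.0    # away players
poss = rng.integers(1, 20, n_stints).astype(float)   # possessions per stint
# Margin per 100 possessions; noise shrinks as a stint gets longer.
y = X @ true_rating + rng.normal(0, 100, n_stints) / np.sqrt(poss)

lambdas = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
best_lam, cv_errors = cv_lambda(X, y, poss, lambdas)
ratings = weighted_ridge(X, y, poss, best_lam)
ratings -= ratings.mean()       # center at the league average
```

Weighting by possessions keeps one-possession stints, whose per-100 margins are extreme, from dominating the fit.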

I think 1500+ minutes would be a nice line for a season. So here's top 20 1500+. Values are centered at the league average.

2007/08 RAPM (qualified: 1500+ minutes)
1. Kevin Garnett 3.84
2. Manu Ginobili 3.33
3. Thaddeus Young 3.22
4. Paul Pierce 3.00
5. Eduardo Najera 3.00
6. Dirk Nowitzki 2.95
7. Steve Nash 2.88
8. Chris Bosh 2.65
9. Peja Stojakovic 2.63
10. Rasheed Wallace 2.47
11. Jamario Moon 2.39
12. Chuck Hayes 2.36
13. Tim Duncan 2.35
14. Pau Gasol 2.35
15. Chauncey Billups 2.30
16. Kobe Bryant 2.19
17. Rajon Rondo 2.19
18. Andrei Kirilenko 2.08
19. Kendrick Perkins 2.05
20. Josh Howard 2.00
----------------------------
25. Chris Paul
26. Dwight Howard
36. LeBron James
44. Amare Stoudemire

I don't think I can really argue with this list, although there are a few odd names near the top and low values for Paul, Howard, James and Stoudemire.

@Chilltown

I get what you mean, which is why I decided to analyze only players who played 1500+ minutes in a season.

@jsill

I would like to see your values. And where did you get your data?

DSMok1 · Post by **DSMok1** » Mon Jan 07, 2013 12:22 pm

permaximum wrote: @jsill

I would like to see your values. And where did you get your data?

Joe Sill is the one who originally developed RAPM: http://www.sloansportsconference.com/?p=2798 . (BTW, Joe -- is the Sloan paper no longer available online, or could I just not find it?)

permaximum · Post by **permaximum** » Mon Jan 07, 2013 6:18 pm

DSMok1 wrote:
permaximum wrote: @jsill

I would like to see your values. And where did you get your data?
Joe Sill is the one who originally developed RAPM: http://www.sloansportsconference.com/?p=2798 . (BTW, Joe -- is the Sloan paper no longer available online, or could I just not find it?)

Then I assume the developer of RAPM didn't use corrupted data, so there are probably errors in basketballvalue.com's data. I'm curious what the RAPM values look like in Crow's RAPM files.

DSMok1 · Post by **DSMok1** » Mon Jan 07, 2013 6:53 pm

permaximum wrote:
DSMok1 wrote:
permaximum wrote: @jsill

I would like to see your values. And where did you get your data?
Joe Sill is the one who originally developed RAPM: http://www.sloansportsconference.com/?p=2798 . (BTW, Joe -- is the Sloan paper no longer available online, or could I just not find it?)
Then I assume the developer of RAPM didn't use corrupted data, so there are probably errors in basketballvalue.com's data. I'm curious what the RAPM values look like in Crow's RAPM files.

One thing I would check is how your data is dealing with few-possession intervals--if a lineup was only on for 1 offensive possession and 0 defensive possessions, what happens to the data?

I would seriously consider also deriving both offensive and defensive ratings (doubling the response variables, and solving the problem I just mentioned).

jsill · Post by **jsill** » Mon Jan 07, 2013 7:34 pm

DSMok1 wrote: Joe Sill is the one who originally developed RAPM: http://www.sloansportsconference.com/?p=2798 . (BTW, Joe -- is the Sloan paper no longer available online, or could I just not find it?)

Yeah, at one point the paper was available for download on the site and now it's not; why, I couldn't tell you. They have the papers from the last 2 years available for download but not the 2010 papers. Someone at the conference last year told me they had to harass the Sloan folks repeatedly via email in order to get the paper. If anyone wants a copy of the paper, send me an email at joe_sill at yahoo and I'll send you a copy.

They also recorded video of the talk but apparently the tapes for the research videos for that year were misplaced or something. I like how it says on the site "Video was not recorded for this talk"... not true... it was recorded but then somehow lost. It's probably just a random thing but it all seems a little odd to me.

I gathered the data myself by crawling the web. That's probably about as much as I'm able to get into the details of what I did, but again, anyone is welcome to get a copy of the paper from me.

Crow · Post by **Crow** » Mon Jan 07, 2013 8:53 pm

permaximum wrote: I'm curious what the RAPM values look like in Crow's RAPM files.

I have located the files. They would need to be moved into Excel or something similar to work with them. I can send them to one or more people if there is a clear request with an e-mail address via private message.

permaximum · Post by **permaximum** » Tue Jan 08, 2013 1:28 pm

DSMok1 wrote:One thing I would check is how your data is dealing with few-possession intervals--if a lineup was only on for 1 offensive possession and 0 defensive possessions, what happens to the data?

I would seriously consider also deriving both offensive and defensive ratings (doubling the response variables, and solving the problem I just mentioned).

Only 1 offensive or defensive possession means 1 possession total for both lineups. What you point out is better, but I just wanted to quickly check how uninformed 1-year RAPM performs. I don't think separating it into defense and offense will lead to considerably different results, but I will try it anyway and see what happens.

permaximum · Post by **permaximum** » Thu Jan 10, 2013 1:37 am

@DSMok1

I just decided to separate it into defensive and offensive RAPM and compare the results to the values in Crow's files, and I ran into the thing you mentioned. I prepared the bbv.com data for the regression according to this article (http://www.countthebasket.com/blog/2008 ... lus-minus/), but there are two problems in the article that skew the results a bit. In fact, I realized it was wrong from the beginning once I read it carefully, and I can't lie, it made me angry.

Still, I couldn't find an "easy" way to get the data ready for the regression. It looks like I should separate the predictors (players) for defense and offense and use both offensive and defensive rating as the response. Getting the data ready for that means considerable time for me. Is there an easy way I'm missing, or is that the only way?

DSMok1 · Post by **DSMok1** » Thu Jan 10, 2013 2:53 am

permaximum wrote:@DSMok1

I just decided to separate it into defensive and offensive RAPM and compare the results to the values in Crow's files, and I ran into the thing you mentioned. I prepared the bbv.com data for the regression according to this article (http://www.countthebasket.com/blog/2008 ... lus-minus/), but there are two problems in the article that skew the results a bit. In fact, I realized it was wrong from the beginning once I read it carefully, and I can't lie, it made me angry.

Still, I couldn't find an "easy" way to get the data ready for the regression. It looks like I should separate the predictors (players) for defense and offense and use both offensive and defensive rating as the response. Getting the data ready for that means considerable time for me. Is there an easy way I'm missing, or is that the only way?

I think your last paragraph is the only obvious solution. Twice as many predictors, and twice as many rows/ratings. Could you just cycle through and for each line, split into two lines and appropriately code it?

I haven't done it, personally.
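The row-splitting idea described above could look roughly like the following sketch. Everything about the data layout is an assumption for illustration: the stint field names (`home_players`, `home_poss`, `home_pts`, and so on) and the helper `split_stint` are hypothetical, and players are assumed to be identified by integer indices. Each stint becomes up to two rows, one per team's offensive possessions, and a side with zero possessions simply produces no row, which sidesteps the few-possession issue raised earlier.

```python
def split_stint(stint, n_players):
    """Turn one stint into up to two rows: one per team's offensive possessions.

    Columns 0..n_players-1 are offensive indicators; columns
    n_players..2*n_players-1 are defensive indicators.
    """
    rows = []
    sides = [("home", "away"), ("away", "home")]
    for off, dfn in sides:
        poss = stint[f"{off}_poss"]
        if poss == 0:
            continue  # no possessions on this side: nothing to rate
        x = [0] * (2 * n_players)
        for p in stint[f"{off}_players"]:
            x[p] = 1                      # player is on offense
        for p in stint[f"{dfn}_players"]:
            x[n_players + p] = 1          # player is on defense
        rating = 100.0 * stint[f"{off}_pts"] / poss   # points per 100 possessions
        rows.append({"x": x, "y": rating, "weight": poss})
    return rows


# Made-up stint: home had 1 offensive possession, away had none.
example = {
    "home_players": [0, 1, 2, 3, 4],
    "away_players": [5, 6, 7, 8, 9],
    "home_poss": 1, "away_poss": 0,
    "home_pts": 2, "away_pts": 0,
}
rows = split_stint(example, n_players=10)
```

With this coding a negative defensive coefficient indicates a good defender, since the response is points scored by the offense.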

J.E. · Post by **J.E.** » Sun Jan 13, 2013 2:10 pm

Kevin Pelton wrote:Despite its limitations, I wish there was single-season RAPM (or even APM) available to answer questions that multi-season RAPM isn't designed to answer, like how Dwight Howard's defense is different post-surgery.

1y RAPM through Jan. 2nd has Howard ranked 403rd out of 436. Obviously many worse players have higher ratings because of a lack of data (one of the biggest problems of vanilla RAPM, if not the biggest).
I would argue that, to see the effect of injury for a specific player, it's probably better to do a multiyear RAPM analysis and simply treat the player that recently came back from injury as an entirely new player, then compare the ratings of the two players (Howard_before_injury with Howard_after_injury).
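As a sketch of the relabeling step this suggestion implies, the snippet below splits one player into two identities at a cutoff date before the regression is run. The stint layout (`date` and `players` fields), the helper `split_player`, and the cutoff date itself are all hypothetical placeholders rather than anything taken from J.E.'s actual pipeline.

```python
from datetime import date


def split_player(stints, player, cutoff, before_label, after_label):
    """Relabel one player as two distinct players, split at a cutoff date."""
    relabeled = []
    for stint in stints:
        label = before_label if stint["date"] < cutoff else after_label
        players = [label if p == player else p for p in stint["players"]]
        relabeled.append({**stint, "players": players})
    return relabeled


# Made-up stints on either side of a placeholder cutoff date.
stints = [
    {"date": date(2012, 3, 1), "players": ["Howard", "Nelson"]},
    {"date": date(2012, 12, 1), "players": ["Howard", "Bryant"]},
]
result = split_player(
    stints, "Howard", date(2012, 7, 1),
    "Howard_before_injury", "Howard_after_injury",
)
```

The two labels then get separate columns in the design matrix, so the multiyear fit estimates a separate rating for each.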

permaximum · Post by **permaximum** » Sun Jan 13, 2013 11:49 pm

DSMok1 wrote:
permaximum wrote:@DSMok1

I just decided to separate it into defensive and offensive RAPM and compare the results to the values in Crow's files, and I ran into the thing you mentioned. I prepared the bbv.com data for the regression according to this article (http://www.countthebasket.com/blog/2008 ... lus-minus/), but there are two problems in the article that skew the results a bit. In fact, I realized it was wrong from the beginning once I read it carefully, and I can't lie, it made me angry.

Still, I couldn't find an "easy" way to get the data ready for the regression. It looks like I should separate the predictors (players) for defense and offense and use both offensive and defensive rating as the response. Getting the data ready for that means considerable time for me. Is there an easy way I'm missing, or is that the only way?
I think your last paragraph is the only obvious solution. Twice as many predictors, and twice as many rows/ratings. Could you just cycle through and for each line, split into two lines and appropriately code it?

I haven't done it, personally.

Actually, it took 15 minutes to get the data ready with simple formulas in Excel, but filling formulas into 50 million cells gave me a headache because of limited memory. I did it anyway. I checked J.E.'s vanilla RAPM list for each year (the files Crow gave me), and I can say the results are very similar as far as player placement goes. My values are just lower, probably because of the higher lambda I used. My minutes cutoff for players was 495 minutes for 2010-11 (I used the top 75% in minutes played). I don't know J.E.'s cutoff; that could be a second reason why my values are lower, even though player placement is very close.
