Early 2013-14 stat-based observations
Re: Early 2013-14 stat-based observations
from talkingpractice's website:
"Individual Player Value (“IPV”) is a stabilized in-season RAPM model which uses a robust machine learning based SPM metric (“FORPM”) as a prior for RAPM. There is no previous season information used, to put it on par with other in-season metrics such as NPI RAPM, PER, WS, or EZPM. The choice of a FORPM metric as prior (using an ensemble consisting of random forest regressions and gradient boosting), rather than a traditional SPM metric, was made in part to eliminate discretion in variable selection, with the goal of making IPV a pure metric. In addition, a properly specified FORPM model (fit to SRS) performs much better out of sample than more plain vanilla regression-based models (especially with regard to ‘defense’). Due to not using any previous year info or an aging/experience curve, these values should be considered as descriptive more so than as predictive. The model is based on ‘basketball’, and not on ‘offense’ nor on ‘defense’, and as such there is only one coefficient for each player. This is again both for purity of the metric, and due to this approach being more predictive out of sample. Individual Player Values here are not meant to imply player rankings, nor are they meant to imply that they are the players value if he were to be traded, or have his role changed on his team."
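To make the "SPM as a prior for RAPM" idea a bit more concrete, here is a minimal sketch, assuming the simplest possible setup: a ridge regression whose penalty pulls each player's coefficient toward a prior value instead of toward zero. This is not talkingpractice's actual code. The function names, the toy stint matrix, the margins and the prior values are all made up for illustration, and a real RAPM fit would use possession-weighted stints and a proper sparse solver.

```python
# Illustrative only: toy ridge regression that shrinks toward a prior.
# All names and numbers are hypothetical; this is not IPV's implementation.

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_with_prior(X, y, prior, lam):
    """Minimise ||y - X b||^2 + lam * ||b - prior||^2.

    Closed form: (X'X + lam * I) b = X'y + lam * prior.
    """
    p = len(prior)
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) + lam * prior[i]
           for i in range(p)]
    return solve(xtx, xty)


# Toy stints: +1 = on the floor for the home side, -1 = away side, 0 = off.
X = [[1, 1, -1], [1, -1, 0], [-1, 1, 1], [0, 1, -1]]
y = [4.0, 1.0, -2.0, 3.0]   # made-up point margins per 100 possessions
prior = [2.0, 0.5, -1.0]    # stand-in for a box-score (SPM-style) prior

light = ridge_with_prior(X, y, prior, lam=0.1)   # data dominate
heavy = ridge_with_prior(X, y, prior, lam=1e6)   # prior dominates
```

With a small penalty the plus/minus data drive the estimates; with a very large one they collapse onto the prior, which is the sense in which a prior can "stabilize" an in-season fit.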
I looked up random forests and saw that it is considered a very helpful approach overall. I did see this caveat on the Wikipedia page:
"This method of determining variable importance has some drawbacks. For data including categorical variables with different number of levels, random forests are biased in favor of those attributes with more levels. Methods such as partial permutations can be used to solve the problem. If the data contain groups of correlated features of similar relevance for the output, then smaller groups are favored over larger groups." I wondered how this would affect your application. Would attributes "with more levels" be scoring, rebounding and assists? Would "groups of correlated features of similar relevance for the output" be FGA or usage, FG%, FTA, 3 pt FGA?
Is there any good reason to use or not use random forest regressions for the RAPM model?
"The model is based on ‘basketball’, and not on ‘offense’ nor on ‘defense’, and as such there is only one coefficient for each player."
This statement seems important to highlight. It is consistent with Mike G's perspective that offensive stats should be seen in the context of a player's / team's defensive stats, and with some stuff I have seen about prior action impact on next plays. As much as I have been interested in offensive / defensive splits of RAPM, if a random forest regression for an overall RAPM model had noticeably less estimated error than the parts, I would want to keep that in mind.
I am stretching from what I quickly read and do not by any means fully understand, but is there any heightened value in using a random forest regression (SPM or RAPM) to think about / find player nearest neighbors / "similars"? Does anyone else who has worked on player similarity models have opinions or questions about this?
Are there any other statistical outputs generated by random forest regression runs that seem important besides the overall SPM (or RAPM) outputs?
This article looked somewhat interesting. Is a basketball player or his stats a "deformable object"?
http://www.google.com/url?sa=t&rct=j&q= ... 8121,d.cWc
There was some work with "facial mapping" a while ago. http://www.countthebasket.com/blog/2008 ... off-faces/
Re: Early 2013-14 stat-based observations
Might LeBron find motivation in seeing that Durant is about to take 'his' MVP award?...his numbers are what they are. He decided to punt on defense this season, for the obvious reason(s).
Is it the end of the LeBron era, as default MVP?
He's got 4, tied with Wilt. Still behind Jordan and Russell, and 2 short of Kareem.
Is that good enough?
Re: Early 2013-14 stat-based observations
Looking at LeBron's boxscore stats against playoff teams (at hoopsstats.com), I see that his level of performance has been modestly lower overall than against lottery teams. He has not "gotten up" for them. Blocks and steals are close to the same level. Durant has done modestly better on boxscore stats than James against playoff teams, but Miami has a better win% against playoff-level teams. The two teams are about even percentage-wise against top-10 teams, with the Thunder having played many more such games so far.
LeBron should get into top 3 all-time on scoring. Could get #1 if he plays 8+ more seasons. Durant has a very good chance of passing him.
LeBron has a 20% edge over Durant on career win shares per 48 and is 5th all-time (with C Paul slightly ahead of him). The gap will probably narrow as Durant adds more seasons to offset his very weak and modest first two seasons. LeBron only had one season below .200. Durant is currently 20th all-time and one of 21 over .200 for his career (including 6 current players). http://www.basketball-reference.com/lea ... areer.html Wade has slipped below that (at .153 this season).
Re: Early 2013-14 stat-based observations
All-time regular season totals through age 28 -- LeBron is the only non-center among the top 5 in FTA; the only non-guard among the top 19 in Ast (Garnett is #20) or the top 9 in Stl (Erving #10)
In some of these categories, there are just a few players with half of LeBron's totals, thru age 28.
Minutes            Points             FT Attempts      3pt Attempts
30,374 LeBron      21,081 LeBron      6844 Dwight      3388 A Walker
29,583 Garnett     19,296 Kobe        6617 LeBron      3211 B Davis
28,379 Kobe        19,000 Jordan      6417 Wilt        3187 Allen
27,924 Marbury     18,837 Wilt        6397 Moses       3072 Lewis
27,694 Moses       17,846 Carmelo     6390 Shaq        3026 LeBron

Assists            Steals             Turnovers        Win Shares
7037 Magic         1594 Jordan        2759 Isiah       152.6 LeBron
6985 Isiah         1477 Isiah         2525 LeBron      131.7 Kareem
5880 Marbury       1414 Paul          2488 Magic       130.4 Jordan
5829 Paul          1390 Drexler       2422 Moses       120.0 Wilt
5768 Oscar         1326 Magic         2298 Theus       117.6 Oscar
5409 Kidd          1323 LeBron
5302 LeBron        1309 Payton
http://bkref.com/tiny/yWQME
And in playoffs, he ranks as high or higher all around:
Minutes            Points             FT Attempts      3pt Attempts
5954 LeBron        3871 LeBron        1365 LeBron      605 LeBron
5686 Magic         3184 Jordan        1295 Shaq        438 Kobe
5085 Kobe          3053 Kobe          960 Jordan       339 Horry
5016 Parker        2956 Shaq          947 Duncan       333 Billups
4416 Worthy        2741 Magic         898 Magic        324 Durant

Assists            Steals             Turnovers        Win Shares
1800 Magic         297 Magic          518 Magic        29.5 LeBron
924 LeBron         240 Cheeks         482 LeBron       24.9 Magic
845 Rondo          236 LeBron         390 Parker       20.6 Jordan
839 Isiah          219 Jordan         374 Kobe         19.7 Duncan
820 K Johnson      216 Isiah          349 Pippen       17.8 Shaq
682 Parker         210 Pippen         344 Duncan       16.5 Erving
So here are leaders (RS) thru age 25. Durant's 2013-14 totals will all be counted in this age range.
Minutes            Points             FT Made          3FG Made         Win Shares
22,108 LeBron      15,251 LeBron      3674 Durant      882 Arenas       103.3 LeBron
20,401 Dwight      13,511 Durant      3650 LeBron      785 Durant       83.3 Kareem
19,910 Garnett     12,711 Carmelo     3239 Carmelo     771 LeBron       79.8 Dwight
19,557 McGrady     12,423 McGrady     3093 Kobe        770 B Gordon     79.8 Durant
19,273 Marbury     12,215 Kobe        3085 Dwight      731 M Miller     76.4 Paul
19,125 Durant      11,662 Erving      3032 Dantley     724 A Walker     74.9 Erving
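Since minutes and Win Shares sit side by side in the thru-age-25 list, the per-48 rate mentioned earlier in the thread can be worked out directly from it. A small sketch, assuming the usual definition (Win Shares divided by minutes, times 48) and using only the players for whom the list gives both figures; the helper name is my own:

```python
def ws_per_48(win_shares, minutes):
    """Win Shares per 48 minutes: career WS scaled to one 48-minute game."""
    return 48.0 * win_shares / minutes


# (minutes, win shares) through age 25, copied from the list above
through_25 = {
    "LeBron": (22108, 103.3),
    "Dwight": (20401, 79.8),
    "Durant": (19125, 79.8),
}

rates = {name: ws_per_48(ws, mins) for name, (mins, ws) in through_25.items()}

# Print from highest rate to lowest
for name, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {rate:.3f}")
```

On these totals LeBron comes out around .224 through age 25 and Durant right around .200, keeping in mind that Durant's 2013-14 figures are still in progress.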
Re: Early 2013-14 stat-based observations
Crow wrote:
I looked up random forests and saw that is considered a very helpful approach overall. I did see this caveat at the wikipedia page:
"This method of determining variable importance has some drawbacks. For data including categorical variables with different number of levels, random forests are biased in favor of those attributes with more levels. Methods such as partial permutations can be used to solve the problem. If the data contain groups of correlated features of similar relevance for the output, then smaller groups are favored over larger groups." I wondered how this would affect your application. Would attributes "with more levels" be scoring, rebounding and assists? Would "groups of correlated features of similar relevance for the output" be FGA or usage, FG%, FTA, 3 pt FGA?
We think we've handled these issues in how we specified the forest and underlying data, but I'm also sure that our result isn't perfect. In addition (and to clarify), we used gradient boosting and a localized regression as well, and the final FORPM model is an ensemble of these three models. One caveat here (and I've discussed this a bit on Twitter) is that the additional benefit to out of sample prediction from these various things is not extreme. There's definitely a benefit, but it's not a game changer in any way.
Crow wrote:
"The model is based on ‘basketball’, and not on ‘offense’ nor on ‘defense’, and as such there is only one coefficient for each player."
I'm confident that a "whole" model describes reality better than these artificial splits, but of course I'm wrong a lot too, so who knows. So much depends on context, on the player's role, etc. For example, so much of Dirk's allegedly good defense over the years (in RAPM models) is just the effects of his particular style of offense (I think that this is what you called "prior action impact on next plays"). So much depends on whether a guy is told to go for ORBs or get back on D, etc. None of these confusions get in the way when looking at a complete player, in the complete game. We publish the RAPM-informed RAPM numbers now showing both Off and Def as that's what everyone wanted to see, but I still claim that they're inferior to the whole coef version in terms of out-of-sample predictive ability. We don't even really calculate IPV in the splits other than to look at them for curiosity/interest's sake.
Crow wrote:
I am stretching from what I quickly read and do not by any means fully understand but is there any heightened value from using a random forest regression (SPM or RAPM) to think about / find player nearest neighbors / "similars"? Does anyone else who has worked on player similarity models have opinions or questions about this?
I don't know much about similarity models, but I do think what you said is interesting.
Mike G wrote:
Might LeBron find motivation in seeing that Durant is about to take 'his' MVP award?
Is it the end of the LeBron era, as default MVP?
He's got 4, tied with Wilt. Still behind Jordan and Russell, and 2 short of Kareem.
Is that good enough?
The whole LeCoast thing is really making the rounds now. I 'get' why he's made that decision, and the franchise must be involved in it in some way (and we see this with Wade too), but it's already starting to feel a bit like a Kobe thing to me, where he reached a point and decided that he had to give up one side of the court. I guess maybe LeCoast can change some of this around in the playoffs when he presumably stops coasting. But for me at least, if he's going to 'decide' (or otherwise) to let his D go from a huge strength to a clear weakness, then I'm giving up (for now) my preconceived notion that he's the default MVP. I'm still intensely confident that he's the best player (by far) when he's giving full effort. But the lack of full effort really changes things, imo.
I do really think that lots of your recent arguments (about not rushing to already name him as the best player ever, until we see how he plays in the 2nd half of his career) have proven very very true, given what's happened this year thus far.
Re: Early 2013-14 stat-based observations
LeBron's already in the 2nd half of his career, perhaps well into it. With almost 32,000 minutes, he's at 56% of the longest ever, that of Kareem.
He's played longer minutes than Frazier, Nance, Howell, Pettit, Cousy, McHale, Schayes, Beaty, ..
Another 3000 minutes (middle of next season) and he'll have passed Ben Wallace, Lanier, Jerry Lucas, Mullin, Eddie Jones, Billups, Magic, Chet Walker, Schrempf, Divac, and Baylor
At age 31, he'll be passing Cummings, Laimbeer, Hornacek, Dantley, Robinson, Bird, Hill, Cheeks, Porter, Gervin, Unseld, Greer, Perkins, Mutombo, Sikma, Drexler, Iverson, ..
Some of these played 'forever', and others ended rather abruptly.
Re: Early 2013-14 stat-based observations
Using RAPM for similarity studies could be done at several levels: a sort by position and overall rating; offensive / defensive splits; RAPM at the factor level; or any of these combined with boxscore stats. I assume there are also other statistics available that describe the detailed pattern of team +/- game by game and minute by minute, and that player similarity could be taken down to that level to get at overall consistency of impact: at clutch time, against good teams, against good opponents, with and without star teammates, etc.
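As a rough illustration of the simplest version of this (nearest neighbors on a few rating columns), here is a minimal sketch. It assumes each player is described by a short vector of numbers, standardizes each column so no single stat dominates, and ranks other players by Euclidean distance. The players, the three columns and every value are hypothetical placeholders, not real RAPM output, and the function names are my own.

```python
import math


def zscores(rows):
    """Standardise each column (mean 0, sd 1) so the stats share one scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def nearest(players, target, k=2):
    """Return the k players closest to `target` in standardised stat space."""
    names = list(players)
    z = dict(zip(names, zscores([players[n] for n in names])))
    dists = [(math.dist(z[target], z[n]), n) for n in names if n != target]
    return [n for _, n in sorted(dists)[:k]]


# Hypothetical players: (offensive rating, defensive rating, usage %)
players = {
    "Player A": (3.0, 1.0, 28.0),
    "Player B": (2.8, 0.9, 27.0),
    "Player C": (-1.0, 2.5, 15.0),
    "Player D": (0.2, -0.5, 20.0),
}

similars = nearest(players, "Player A", k=2)
print(similars)
```

The same function would work unchanged with more columns (factor-level ratings, clutch-time splits, and so on); the choice of columns and how to weight them is where the real judgment comes in.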
Re: Early 2013-14 stat-based observations
Defensive player of year?
I saw that Ibaka's defensive RAPM estimate has fallen from about 10th last season to about 60th this season. Will the eye test guys notice the same thing or go on past memories?
Based on his very high RAPM estimate and the team's defensive rating, I'd probably give it to Hibbert. I think it will go to him.
Re: Early 2013-14 stat-based observations
Crow wrote:
Defensive player of year?
I saw that Ibaka's defensive RAPM estimate has fallen from about 10th last season to about 60th this season. Will the eye test guys notice the same thing or go on past memories?
Based on his very high RAPM estimate and the team's defensive rating, I'd probably give it to Hibbert. I think it will go to him.
Hibbert probably will get it. It's fairly deserved.
The guy that has flown under the radar, to an extent, is Andrew Bogut. He's been spectacular.
http://pointsperpossession.com/
@PPPBasketball