Early 2013-14 stat-based observations

Crow · Post by **Crow** » Sun Jan 19, 2014 5:35 pm

from talkingpractice's website:

"Individual Player Value (“IPV”) is a stabilized in-season RAPM model which uses a robust machine learning based SPM metric (“FORPM”) as a prior for RAPM. There is no previous season information used, to put it on par with other in-season metrics such as NPI RAPM, PER, WS, or EZPM. The choice of a FORPM metric as prior (using an ensemble consisting of random forest regressions and gradient boosting), rather than a traditional SPM metric, was made in part to eliminate discretion in variable selection, with the goal of making IPV a pure metric. In addition, a properly specified FORPM model (fit to SRS) performs much better out of sample than more plain vanilla regression-based models (especially with regard to ‘defense’). Due to not using any previous year info or an aging/experience curve, these values should be considered as descriptive more so than as predictive. The model is based on ‘basketball’, and not on ‘offense’ nor on ‘defense’, and as such there is only one coefficient for each player. This is again both for purity of the metric, and due to this approach being more predictive out of sample. Individual Player Values here are not meant to imply player rankings, nor are they meant to imply that they are the players value if he were to be traded, or have his role changed on his team."

I looked up random forests and saw that it is considered a very helpful approach overall. I did see this caveat on the Wikipedia page:
"This method of determining variable importance has some drawbacks. For data including categorical variables with different number of levels, random forests are biased in favor of those attributes with more levels. Methods such as partial permutations can be used to solve the problem. If the data contain groups of correlated features of similar relevance for the output, then smaller groups are favored over larger groups." I wondered how this would affect your application. Would attributes "with more levels" be scoring, rebounding and assists? Would "groups of correlated features of similar relevance for the output" be FGA or usage, FG%, FTA, 3 pt FGA?

Is there any good reason to use or not use random forest regressions for the RAPM model?
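One way to see the "more levels" caveat is a minimal sketch, not anything from the IPV/FORPM code: give a tree-style split search two features that are both pure noise, one with 2 levels and one with 20, and compare how much variance the best single split appears to remove. The function names, sample sizes, and the use of ordered threshold splits (as a stand-in for categorical splits) are all assumptions made for illustration.

```python
import random


def best_split_gain(feature, target):
    """Largest fraction of target variance removed by one threshold split."""
    n = len(target)
    mean = sum(target) / n
    total = sum((t - mean) ** 2 for t in target)
    best = 0.0
    # Every distinct value except the smallest is a candidate threshold,
    # so both sides of each split are guaranteed to be non-empty.
    for threshold in sorted(set(feature))[1:]:
        left = [t for f, t in zip(feature, target) if f < threshold]
        right = [t for f, t in zip(feature, target) if f >= threshold]
        within = 0.0
        for side in (left, right):
            m = sum(side) / len(side)
            within += sum((t - m) ** 2 for t in side)
        best = max(best, 1.0 - within / total)
    return best


def mean_spurious_gain(levels, trials=200, n=60, seed=0):
    """Average best-split gain when the feature is unrelated to the target."""
    rng = random.Random(seed)
    gains = []
    for _ in range(trials):
        feature = [rng.randrange(levels) for _ in range(n)]
        target = [rng.gauss(0.0, 1.0) for _ in range(n)]
        gains.append(best_split_gain(feature, target))
    return sum(gains) / trials


print("2-level noise feature: ", mean_spurious_gain(2))
print("20-level noise feature:", mean_spurious_gain(20))
```

The 20-level feature gets many more candidate splits, so its best split looks better by chance even though neither feature carries any signal. That is the mechanism behind the importance bias, and it is why permutation-style importance is usually suggested as a check.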

"The model is based on ‘basketball’, and not on ‘offense’ nor on ‘defense’, and as such there is only one coefficient for each player."
This statement seems important to highlight. And it is consistent with Mike G's perspective that offensive stats should be seen in the context of a player's / team's defensive stats and some stuff I have seen about prior action impact on next plays. As much as I have been interested in offensive / defensive splits of RAPM, if a random forest regression for an overall RAPM model had noticeably less estimated error than the parts, I would want to keep that in mind.
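For readers unfamiliar with what "a prior for RAPM" with one coefficient per player means mechanically, here is a minimal sketch assuming the standard ridge formulation: shrink each player's coefficient toward a prior value (for example an SPM-style rating) instead of toward zero. This is not the IPV implementation; the toy stint matrix, margins, prior values, and function names are all hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rapm_with_prior(x, y, prior, lam):
    """Ridge regression shrunk toward `prior` instead of toward zero.

    Minimizes ||y - X b||^2 + lam * ||b - prior||^2, whose solution is
    b = prior + (X'X + lam I)^-1 X'(y - X prior).
    """
    k = len(prior)
    # Residual margin left unexplained by the prior ratings alone.
    resid = [yi - sum(v * p for v, p in zip(row, prior)) for row, yi in zip(x, y)]
    xtx = [[sum(row[i] * row[j] for row in x) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xtr = [sum(row[i] * r for row, r in zip(x, resid)) for i in range(k)]
    delta = solve(xtx, xtr)
    return [p + d for p, d in zip(prior, delta)]


# Hypothetical toy data: one column per player (+1 on one side, -1 on the
# other, 0 off the floor) and one overall coefficient per player.
x = [[1, -1, 0], [1, 0, -1], [0, 1, -1], [1, 1, -1]]
y = [4.0, 6.0, 1.0, 5.0]      # made-up net margin for each stint
prior = [2.0, 0.0, -1.0]      # made-up SPM-style prior ratings

for lam in (0.1, 10.0, 1000.0):
    print(lam, [round(b, 2) for b in rapm_with_prior(x, y, prior, lam)])
```

With a small penalty the estimates follow the plus/minus data; with a large penalty they collapse onto the prior. An in-season model with few games would sit closer to the prior end.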

I am stretching from what I quickly read and do not by any means fully understand but is there any heightened value from using a random forest regression (SPM or RAPM) to think about / find player nearest neighbors / "similars"? Does anyone else who has worked on player similarity models have opinions or questions about this?

Are there any other statistical outputs generated by random forest regression runs that seem important besides the overall SPM (or RAPM) outputs?

This article looked somewhat interesting. Is a basketball player or his stats a "deformable object"?
http://www.google.com/url?sa=t&rct=j&q= ... 8121,d.cWc

There was some work with "facial mapping" a while ago. http://www.countthebasket.com/blog/2008 ... off-faces/

Mike G · Post by **Mike G** » Mon Jan 20, 2014 1:46 pm

...his numbers are what they are. He decided to punt on defense this season, for the obvious reason(s).

Might LeBron find motivation in seeing that Durant is about to take 'his' MVP award?
Is it the end of the LeBron era, as default MVP?

He's got 4, tied with Wilt. Still behind Jordan and Russell, and 2 short of Kareem.
Is that good enough?

Crow · Post by **Crow** » Mon Jan 20, 2014 5:51 pm

Looking at LeBron's boxscore stats against playoff teams (at hoopsstats.com), I see that his level of performance has been modestly lower overall than against lottery teams. He has not "gotten up" for them. Blocks and steals are close to the same level. Durant has done modestly better on boxscore stats than James against playoff teams. But Miami has a better win% against playoff-level teams. The two are about even against top-10 teams percentage-wise, with the Thunder having played many more such games so far.

LeBron should get into top 3 all-time on scoring. Could get #1 if he plays 8+ more seasons. Durant has a very good chance of passing him.

LeBron has a 20% edge on career win shares per 48 and is 5th all-time (with C Paul slightly ahead of him). The gap will probably narrow as Durant adds more seasons to offset the very weak and modest first two seasons. LeBron only had one season below .200. Durant is currently 20th all-time and one of 21 over .200 for career (including 6 current players). http://www.basketball-reference.com/lea ... areer.html Wade has slipped below that (at .153 this season).

Mike G · Post by **Mike G** » Tue Jan 21, 2014 10:09 pm

All-time regular season totals through age 28 --


Minutes            Points            FT Attempts    3pt Attempts
30,374  LeBron    21,081  LeBron    6844  Dwight   3388  A Walker
29,583  Garnett   19,296  Kobe      6617  LeBron   3211  B Davis
28,379  Kobe      19,000  Jordan    6417  Wilt     3187  Allen
27,924  Marbury   18,837  Wilt      6397  Moses    3072  Lewis
27,694  Moses     17,846  Carmelo   6390  Shaq     3026  LeBron


Assists          Steals          Turnovers       Win Shares
7037  Magic     1594  Jordan    2759  Isiah     152.6  LeBron
6985  Isiah     1477  Isiah     2525  LeBron    131.7  Kareem
5880  Marbury   1414  Paul      2488  Magic     130.4  Jordan
5829  Paul      1390  Drexler   2422  Moses     120.0  Wilt
5768  Oscar     1326  Magic     2298  Theus     117.6  Oscar 
5409  Kidd      1323  LeBron
5302  LeBron    1309  Payton

LeBron is the only non-center among the top 5 in FTA; the only non-guard among the top 19 in Ast (Garnett is #20) or the top 9 in Stl (Erving #10)
http://bkref.com/tiny/yWQME

And in playoffs, he ranks as high or higher all around:


Minutes          Points          FT Attempts     3pt Attempts
5954  LeBron    3871  LeBron    1365  LeBron    605  LeBron
5686  Magic     3184  Jordan    1295  Shaq      438  Kobe
5085  Kobe      3053  Kobe       960  Jordan    339  Horry
5016  Parker    2956  Shaq       947  Duncan    333  Billups
4416  Worthy    2741  Magic      898  Magic     324  Durant


Assists          Steals         Turnovers      Win Shares
1800  Magic     297  Magic     518  Magic     29.5  LeBron
924   LeBron    240  Cheeks    482  LeBron    24.9  Magic
845   Rondo     236  LeBron    390  Parker    20.6  Jordan
839   Isiah     219  Jordan    374  Kobe      19.7  Duncan
820  K Johnson  216  Isiah     349  Pippen    17.8  Shaq
682   Parker    210  Pippen    344  Duncan    16.5  Erving

In some of these categories, there are just a few players with half of LeBron's totals, thru age 28.

So here are leaders (RS) thru age 25. Durant's 2013-14 totals will all be counted in this age range.


Minutes            Points             FT Made          3FG Made         Win Shares
22,108  LeBron     15,251  LeBron     3674  Durant     882  Arenas      103.3  LeBron
20,401  Dwight     13,511  Durant     3650  LeBron     785  Durant       83.3  Kareem
19,910  Garnett    12,711  Carmelo    3239  Carmelo    771  LeBron       79.8  Dwight
19,557  McGrady    12,423  McGrady    3093  Kobe       770  B Gordon     79.8  Durant
19,273  Marbury    12,215  Kobe       3085  Dwight     731  M Miller     76.4  Paul
19,125  Durant     11,662  Erving     3032  Dantley    724  A Walker     74.9  Erving

talkingpractice · Post by **talkingpractice** » Wed Jan 22, 2014 3:43 pm

Crow wrote:I looked up random forests and saw that it is considered a very helpful approach overall. I did see this caveat on the Wikipedia page:
"This method of determining variable importance has some drawbacks. For data including categorical variables with different number of levels, random forests are biased in favor of those attributes with more levels. Methods such as partial permutations can be used to solve the problem. If the data contain groups of correlated features of similar relevance for the output, then smaller groups are favored over larger groups." I wondered how this would affect your application. Would attributes "with more levels" be scoring, rebounding and assists? Would "groups of correlated features of similar relevance for the output" be FGA or usage, FG%, FTA, 3 pt FGA?

We think we've handled these issues in how we specified the forest and underlying data, but I'm also sure that our result isn't perfect. In addition (and to clarify), we used gradient boosting and a localized regression as well, and the final FORPM model is an ensemble of these three models. One caveat here (and I've discussed this a bit on Twitter) is that the additional benefit to out of sample prediction from these various things is not extreme. There's definitely a benefit, but it's not a game changer in any way.

Crow wrote:"The model is based on ‘basketball’, and not on ‘offense’ nor on ‘defense’, and as such there is only one coefficient for each player."

I'm confident that a "whole" model describes reality better than these artificial splits, but ofc I'm wrong a lot too, so who knows. So much depends on context, on the player's role, etc..... eg so much of Dirk's allegedly good defense over the years (in RAPM models) is just the effects of his particular style of offense (I think that this is what you called "prior action impact on next plays"). So much depends on if a guy is told to go for ORBs or get back on D, etc. None of these confusions get in the way when looking at a complete player, in the complete game. We publish the RAPM-informed RAPM numbers now showing both Off and Def as that's what everyone wanted to see, but I still claim that they're inferior to the whole coef version in terms of oos predictive ability. We don't even really calculate IPV in the splits other than to look at them for curiosity/interest's sake.

Crow wrote:I am stretching from what I quickly read and do not by any means fully understand but is there any heightened value from using a random forest regression (SPM or RAPM) to think about / find player nearest neighbors / "similars"? Does anyone else who has worked on player similarity models have opinions or questions about this?

I don't know much about similarity models, but I do think what you said is interesting.

Mike G wrote:
Might LeBron find motivation in seeing that Durant is about to take 'his' MVP award?
Is it the end of the LeBron era, as default MVP?
He's got 4, tied with Wilt. Still behind Jordan and Russell, and 2 short of Kareem.
Is that good enough?

The whole LeCoast thing is really making the rounds now. I 'get' why he's made that decision, and the franchise must be involved in it in some way (and we see this with Wade too)..... but it's already starting to feel like a Kobe thing to me a bit, where he reached a point and decided that he had to give up one side of the court. I guess maybe LeCoast can change some of this around in the playoffs when he presumably stops coasting. But for me at least, if he's going to 'decide' (or otherwise) to let his D go from a huge strength to a clear weakness, then I'm giving up (for now) my preconceived notion that he's the default MVP. I'm still intensely confident that he's the best player (by far) when he's giving full effort. But the lack of full effort really changes things, imo.

I do really think that lots of your recent arguments (about not rushing to already name him as the best player ever, until we see how he plays in the 2nd half of his career) have proven very very true, given what's happened this year thus far.

Mike G · Post by **Mike G** » Wed Jan 22, 2014 5:00 pm

LeBron's already in the 2nd half of his career, perhaps well into it. With almost 32,000 minutes, he's at 56% of the longest ever, that of Kareem.
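As a quick check on the arithmetic, the two figures in the post imply a total for the longest career. This is a sketch using only the rounded numbers above; the implied total is derived, not looked up.

```python
lebron_minutes = 32_000    # "almost 32,000" in the post, so slightly rounded up
share_of_longest = 0.56    # the post's 56%

# Career length implied by those two numbers.
implied_longest_career = lebron_minutes / share_of_longest
print(round(implied_longest_career))

# Past the halfway mark only if he exceeds half of that total.
print(lebron_minutes > implied_longest_career / 2)
```

So the "2nd half" claim holds if LeBron's career ends up no longer than the longest one on record, which is the implicit assumption.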

He's played more minutes than Frazier, Nance, Howell, Pettit, Cousy, McHale, Schayes, Beaty, ..
Another 3000 minutes (middle of next season) and he'll have passed Ben Wallace, Lanier, Jerry Lucas, Mullin, Eddie Jones, Billups, Magic, Chet Walker, Schrempf, Divac, and Baylor

At age 31, he'll be passing Cummings, Laimbeer, Hornacek, Dantley, Robinson, Bird, Hill, Cheeks, Porter, Gervin, Unseld, Greer, Perkins, Mutombo, Sikma, Drexler, Iverson, ..
Some of these played 'forever', and others ended rather abruptly.

Crow · Post by **Crow** » Wed Jan 22, 2014 6:34 pm

Using RAPM for similarity studies could be done at several levels: a sort by position & overall rating, offensive / defensive splits, RAPM at factor level, or any of these combined with boxscore stats. I assume there are also other statistics available that describe the detailed pattern of team +/- game by game and minute by minute, and that player similarity could be taken down to that level to get at overall degree of impact consistency, at clutch time, against good teams, against good opponents, with and without star teammates, etc.
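A minimal sketch of the simplest version of this idea, assuming each player is described by a short vector (here offensive RAPM, defensive RAPM, and usage) and that "similar" means close in standardized Euclidean distance. The player labels, numbers, and function names are all made up for illustration; a real study would use whichever splits or factor-level values are available.

```python
import math


def standardize(rows):
    """Z-score each column so no single stat dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column has zero spread to avoid dividing by zero.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def nearest_neighbors(players, target, k=2):
    """Return the k players closest to `target` in standardized stat space."""
    names = list(players)
    z = dict(zip(names, standardize([players[n] for n in names])))
    others = [n for n in names if n != target]
    others.sort(key=lambda n: math.dist(z[n], z[target]))
    return others[:k]


# Hypothetical values: [offensive RAPM, defensive RAPM, usage %]
players = {
    "Player A": [4.0, 1.0, 30.0],
    "Player B": [3.8, 0.9, 29.0],
    "Player C": [-1.0, 3.0, 15.0],
    "Player D": [-0.8, 2.8, 16.0],
    "Player E": [0.0, 0.0, 20.0],
}

print(nearest_neighbors(players, "Player A"))
```

Adding clutch-time, opponent-quality, or with/without-teammate splits would just mean more columns in each vector, though noisy columns would then need down-weighting.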

Crow · Post by **Crow** » Fri Feb 07, 2014 8:34 pm

Defensive player of year?

I saw that Ibaka's defensive RAPM estimate has fallen from about 10th last season to about 60th this season. Will the eye test guys notice the same thing or go on past memories?

Based on his very high RAPM estimate and the team's defensive rating, I'd probably give it to Hibbert. I think it will go to him.

Bobbofitos · Post by **Bobbofitos** » Fri Feb 07, 2014 9:39 pm

Crow wrote:Defensive player of year?

I saw that Ibaka's defensive RAPM estimate has fallen from about 10th last season to about 60th this season. Will the eye test guys notice the same thing or go on past memories?

Based on his very high RAPM estimate and the team's defensive rating, I'd probably give it to Hibbert. I think it will go to him.

Hibbert probably will get it. It's fairly deserved.

The guy that has flown under the radar, to an extent, is Andrew Bogut. He's been spectacular.
