APBRmetrics

Posted: **Mon Nov 10, 2014 5:25 am**

HoopDon wrote: BPM (RAPM scale but with box-score data only) ranked Anthony Davis as roughly the 18th-best player last year (I believe). So the most reliable box-score metric liked him a lot better than RPM, but still didn't think he was "amazing".

My work had him somewhere between the 7th-best player (HnI, a production metric that ignores missed games) and the 14th-best player (overall WAR) in the NBA last season, depending on which of my metrics you look at.

Posted: **Mon Nov 10, 2014 8:46 am**

To improve prediction, RPM becomes a very good metric for estimating the impact of "average players"... This leads to better prediction overall, but makes it a worse metric for players who have a few unusually good or bad seasons in their careers. So players with fewer than 4 seasons will never be truly valued by RPM, thanks to those priors... Also, let's not forget that previous years' data is included as part of the calculation.

Check these xRAPM values for Garnett and the 03 class (LBJ, Wade, Bosh and Carmelo). Look at the pattern for the first 4-7 years. 14YR is 14-year RAPM.

Year     KG    LBJ   Wade  Carmelo   Bosh
2001    5.5
2002    6.1
2003    9.7
2004   10.3    0.0    0.3     -1.9   -1.0
2005    8.2    4.4    3.0     -0.6    2.4
2006    8.1    6.3    6.9     -0.1    3.5
2007    8.6    7.4    6.6      0.2    4.8
2008    8.4    8.2    3.7      0.1    6.1
2009    6.1   11.3    8.4      2.4    4.9
2010    3.6   11.9    9.4      2.1    5.4
2011    5.2    8.1    5.9      1.9    3.5
2012    5.9    9.5    6.1      1.5    2.5
2013    4.1   10.1    4.5      2.3    1.9
2014    3.5    7.9    3.6      2.4    3.8
Mean   6.66   7.74   5.31     0.94   3.44
14YR    9.7    9.5    5.6      1.7    3.1
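A quick way to sanity-check the Mean row is to recompute it from the yearly values. A minimal Python sketch, with the numbers copied from the table and seasons before a player's debut simply left out (not counted as zero). Note that an unweighted mean of single seasons is not the same quantity as 14-year RAPM, which is one regression over all the pooled possessions, so the Mean and 14YR rows are not expected to match.

```python
# Yearly xRAPM values copied from the table; seasons before a player's
# debut are omitted rather than counted as zero.
xrapm = {
    "KG":      [5.5, 6.1, 9.7, 10.3, 8.2, 8.1, 8.6, 8.4, 6.1, 3.6, 5.2, 5.9, 4.1, 3.5],
    "LBJ":     [0.0, 4.4, 6.3, 7.4, 8.2, 11.3, 11.9, 8.1, 9.5, 10.1, 7.9],
    "Wade":    [0.3, 3.0, 6.9, 6.6, 3.7, 8.4, 9.4, 5.9, 6.1, 4.5, 3.6],
    "Carmelo": [-1.9, -0.6, -0.1, 0.2, 0.1, 2.4, 2.1, 1.9, 1.5, 2.3, 2.4],
    "Bosh":    [-1.0, 2.4, 3.5, 4.8, 6.1, 4.9, 5.4, 3.5, 2.5, 1.9, 3.8],
}


def season_mean(values):
    """Unweighted mean of single-season values (each season counts once)."""
    return sum(values) / len(values)


means = {player: round(season_mean(v), 2) for player, v in xrapm.items()}
print(means)
```

With these inputs the recomputed means agree with the Mean row (6.66, 7.74, 5.31, 0.94, 3.44).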

BTW, every set of yearly NPI-RAPM (simple vanilla RAPM) values and rankings I've seen so far differs from the others to some degree. It's especially prominent in the values.

I'm not a statistician, and I don't and won't make money from these things, but it's surprising that those "pros" don't share the data and the calculation process behind their metrics. That's the first thing you do with these things. For example, how do they choose the "k" for cross-validation? A small change in lambda changes the results greatly. How exactly do they do the ridge regression? Was lambda chosen according to mean squared error or sum of squared errors?
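On the mean-vs-sum question: the two conventions give the same coefficients, they just put lambda on different scales. Minimizing RSS/(2n) + (lambda/2)·||b||² (the scaled form that glmnet documents for its Gaussian ridge case, alpha = 0) has the same minimizer as RSS + n·lambda·||b||². For the cross-validation error itself, dividing the held-out squared errors by a fixed count doesn't move the argmin either. The numpy sketch below checks the first point on random toy data, not real stint data; it ignores intercepts and glmnet's default standardization, so glmnet's reported lambda values won't map onto it one-to-one.

```python
import numpy as np


def ridge_sum_form(X, y, lam):
    """Minimize ||y - Xb||^2 + lam * ||b||^2 (sum-of-squares convention)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_mean_form(X, y, lam):
    """Minimize ||y - Xb||^2 / (2n) + (lam / 2) * ||b||^2 (mean convention).

    Setting the gradient to zero gives (X'X / n + lam I) b = X'y / n.
    """
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.choice([-1.0, 0.0, 1.0], size=(n, p))  # toy +1/-1/0 "on court" matrix
y = rng.normal(size=n)

lam_mean = 0.5
b_mean = ridge_mean_form(X, y, lam_mean)
b_sum = ridge_sum_form(X, y, n * lam_mean)  # same fit once lambda is rescaled by n
print(np.allclose(b_mean, b_sum))
```

So "which lambda" only makes sense once the loss convention and any standardization are stated alongside it.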

I especially don't believe 1-year NPI-RAPM results if I don't see 4-5 players with very low possessions on top. This means either they use a cutoff (which they shouldn't) or priors one way or another.

For xRAPM or RPM, it's a mystery. In theory it sounds legit, but how do we know if there's a human error in the process or not?

On the other hand, I'm also curious: with 5 years of data (as xRAPM and RPM use), how would a different metric do at prediction?

Posted: **Mon Nov 10, 2014 10:44 am**

I'm surprised this has gone on so long without stating some obvious things.

- Box score metrics are terrible at defense. You cannot judge a defender based on blocks/steals/rebounds. That's silly.
- Pelicans were 27th in defensive rating last year. He was the lead big man defender on an awful defense.
- Their defense wasn't appreciably different when he was off the court. That is a distressing sign for a "star" big man defender on a team without depth.
- His offense, while surprisingly good, is nothing earth-shattering either. He's no Dirk here.

Few possessions end with a blocked shot, and even then it's likely the other team will still have the ball anyway. You need to judge a defender on how he guards every shot, not just the blocks.

Posted: **Mon Nov 10, 2014 1:04 pm**

permaximum wrote: To improve prediction, RPM becomes a very good metric for estimating the impact of "average players"... This leads to better prediction overall, but makes it a worse metric for players who have a few unusually good or bad seasons in their careers. So players with fewer than 4 seasons will never be truly valued by RPM, thanks to those priors... Also, let's not forget that previous years' data is included as part of the calculation.

Can you explain the difference between "prior" and "previous years' data" in the calculation of RPM? You seem to know it, and I would really appreciate it if you could explain.

permaximum wrote: Check these xRAPM values for Garnett and the 03 class (LBJ, Wade, Bosh and Carmelo). Look at the pattern for the first 4-7 years. 14YR is 14-year RAPM.

Uh? A similar pattern is seen in NPI or box-score-based approaches overall. The fact is, rookies aren't particularly good players, and players also tend to get better over time. So what exactly is your issue here? If you want to say that using the same rookie prior isn't the best solution, you will very likely see agreement all over the place, but so far using such a rookie prior is still better than not using one.

Overall, the more possessions a rookie plays, the less influence the rookie prior has. For players with a lot of minutes in their first season, RPM should reflect their real value a bit better. If that value is then adjusted according to a reasonably proper aging curve, there shouldn't be much of an issue with the values of younger players.
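The "more possessions, less prior" effect can be sketched with the textbook prior-informed ridge, which shrinks coefficients toward a prior vector b0 instead of toward zero: minimize ||y - Xb||² + lambda·||b - b0||², whose closed form is b = (X'X + lambda·I)⁻¹(X'y + lambda·b0). In the single-player toy below (made-up numbers, toy units) the estimate is a weighted average of what the player's own possessions say and the prior, and the prior's weight lambda/(n + lambda) falls as possessions n grow. This illustrates the general idea only; it is not JE's actual prior, lambda, or data.

```python
import numpy as np


def prior_informed_ridge(X, y, lam, b0):
    """Minimize ||y - Xb||^2 + lam * ||b - b0||^2.

    Closed form: b = (X'X + lam I)^{-1} (X'y + lam * b0).
    With b0 = 0 this reduces to ordinary no-prior (NPI) ridge.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * b0)


lam = 2000.0              # hypothetical penalty, for illustration only
prior = np.array([-2.0])  # hypothetical "rookie prior"
raw_rating = 3.0          # what the player's own possessions say (toy units)

for n in (500, 2000, 8000):
    X = np.ones((n, 1))          # one player on court for n possessions
    y = np.full(n, raw_rating)   # noise-free toy outcomes
    est = prior_informed_ridge(X, y, lam, prior)[0]
    prior_weight = lam / (n + lam)
    print(f"n={n:5d}  estimate={est:5.2f}  prior weight={prior_weight:.2f}")
```

With these toy numbers the estimate moves from -1.0 at 500 possessions (prior weight 0.8) to 2.0 at 8,000 possessions (prior weight 0.2).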

permaximum wrote: I especially don't believe 1-year NPI-RAPM results if I don't see 4-5 players with very low possessions on top.

Why? How far a player's coefficient can move away from 0 depends on the chosen lambda and the number of possessions. With a bigger lambda, players with very few possessions will likely stay around 0. So, this conclusion:

permaximum wrote: This means either they use a cutoff (which they shouldn't) or priors one way or another.

Isn't necessarily true. Nonetheless, eliminating low-possession players, for example by packing them all together as one artificial player, isn't such a bad solution. Most of those low-possession players are basically D-League material anyway. Unless they are injured, the most likely reason for them not getting minutes is: they suck.
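Packing low-possession players into one artificial player amounts to merging their columns in the stint design matrix. A minimal sketch, assuming a hypothetical layout where each row is a stint, each column a player, and entries are +1 (on court for one side), -1 (other side) or 0. The function name and the `min_poss` cutoff are illustrative, and the cutoff is a free parameter, which is exactly the "where do you cut" question.

```python
import numpy as np


def pool_low_possession_players(X, possessions, names, min_poss):
    """Merge every player below `min_poss` into one 'REPLACEMENT' column.

    X           : stints x players matrix with +1 / -1 / 0 entries (hypothetical layout)
    possessions : possessions played by each player, in column order
    names       : player names, in column order
    Returns the reduced matrix and the matching column names.
    """
    possessions = np.asarray(possessions)
    keep = possessions >= min_poss
    # Signs add up per stint: two pooled players on the same side give +2.
    pooled = X[:, ~keep].sum(axis=1, keepdims=True)
    X_new = np.hstack([X[:, keep], pooled])
    new_names = [name for name, k in zip(names, keep) if k] + ["REPLACEMENT"]
    return X_new, new_names


# Toy example: players B and D fall below a hypothetical 100-possession cutoff.
X = np.array([[1, 1, -1, 0],
              [1, 0, -1, 1],
              [0, 1, -1, 1]], dtype=float)
X_new, cols = pool_low_possession_players(X, [900, 40, 1200, 25],
                                          ["A", "B", "C", "D"], min_poss=100)
print(cols)
print(X_new)
```

The regression itself is unchanged by this step; only the columns it sees are different.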

Posted: **Mon Nov 10, 2014 2:08 pm**

mystic wrote: Can you explain the difference between "prior" and "previous years' data" in the calculation of RPM? You seem to know it, and I would really appreciate it if you could explain.

RPM's priors are probably derived from a formula that includes adjustments such as age, court, game score, rookie or not, SPM and previous years' RAPM data. However, I guess it also weights previous years' RPM, for example: RPM15-Final = RPM12*0.10 + RPM13*0.20 + RPM14*0.30 + RPM15*0.40.

That's the difference.
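The weighting guessed at here is just a weighted average over four seasons. A sketch using the illustrative weights from the post (0.10, 0.20, 0.30, 0.40, which sum to 1); these weights are permaximum's guess, not a documented part of RPM, and the ratings passed in are made-up numbers.

```python
def weighted_multiyear(ratings, weights=(0.10, 0.20, 0.30, 0.40)):
    """Combine seasons, oldest first, with the given weights.

    The default weights are the guess quoted in the thread, not RPM's
    documented method. Weights are normalized, so they need not sum to 1.
    """
    if len(ratings) != len(weights):
        raise ValueError("need exactly one weight per season")
    total = sum(weights)
    return sum(r * w for r, w in zip(ratings, weights)) / total


# Hypothetical ratings for 2012, 2013, 2014, 2015 (oldest first)
print(round(weighted_multiyear([2.0, 3.0, 4.0, 6.0]), 2))
```

Here 0.2 + 0.6 + 1.2 + 2.4 gives 4.4, so the most recent season pulls the combined value above the plain four-season mean of 3.75.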

mystic wrote: Uh? A similar pattern is seen in NPI or box-score-based approaches overall. The fact is, rookies aren't particularly good players, and players also tend to get better over time. So what exactly is your issue here? If you want to say that using the same rookie prior isn't the best solution, you will very likely see agreement all over the place, but so far using such a rookie prior is still better than not using one.

Overall, the more possessions a rookie plays, the less influence the rookie prior has. For players with a lot of minutes in their first season, RPM should reflect their real value a bit better. If that value is then adjusted according to a reasonably proper aging curve, there shouldn't be much of an issue with the values of younger players.

No. It's not this prominent in SPMs or NPI-RAPM. I also think you didn't get why I used KG in the table above...

mystic wrote: Why? How far a player's coefficient can move away from 0 depends on the chosen lambda and the number of possessions. With a bigger lambda, players with very few possessions will likely stay around 0. So, this conclusion:

No. That's not what I find... It's the same for every year (2005-12); here's 2007-08 RAPM as an example. Increasing lambda doesn't change things either. I used BBV data and calculated it in R 3.1.2 with the cv.glmnet function from the glmnet 1.9.8 package. I used 50-fold cross-validation because lambda stabilizes at that point, and lambda.min came out as "71.88326". I can share the regression-ready data too if anyone wants it. Unlike most users here, I'm not holding anything back.

Player               Possessions (O+D)   O-RAPM   D-RAPM    RAPM  MVP Score
James, Jerome                       20   -10.45    39.21   28.75          3
Langford, Keith                     36    -0.71    16.66   15.95          3
McRoberts, Josh                    116    -1.74    16.78   15.04          9
Woods, Loren                        60     8.60     2.64   11.24          3
Randolph, Shavlik                  104     8.41    -0.27    8.15          4
Diaz, Guillermo                     70    10.35    -2.65    7.70          3
Dupree, Ronald                      76    -1.74     8.76    7.02          3
Martin, Darrick                    530     1.71     4.56    6.27         17
Price, Ronnie                     2238     1.34     3.43    4.77         53
Harrington, Othella                636    -1.64     5.86    4.23         13
Garnett, Kevin                    8836     1.53     2.58    4.12        182
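For readers without R, the lambda search that cv.glmnet performs can be approximated with plain K-fold cross-validation over a grid of penalties. This numpy sketch runs on synthetic data, not the BBV play-by-play used here, and uses the sum-of-squares ridge convention with no intercept or standardization. Because glmnet scales its objective and standardizes by default, the lambda values are on a different scale and will not reproduce 71.88326. K = 50 mirrors the post and is otherwise arbitrary; the grid and data sizes are assumptions.

```python
import numpy as np


def ridge(X, y, lam):
    """Minimize ||y - Xb||^2 + lam * ||b||^2 (no intercept, no standardization)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def kfold_lambda(X, y, lambdas, k=50, seed=0):
    """Return the lambda with the lowest held-out squared error, plus all errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cv_err = []
    for lam in lambdas:
        sq_err = 0.0
        for test_idx in folds:
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False
            b = ridge(X[train], y[train], lam)
            sq_err += np.sum((y[test_idx] - X[test_idx] @ b) ** 2)
        # Mean over all held-out rows; using the sum instead picks the same lambda.
        cv_err.append(sq_err / len(y))
    return lambdas[int(np.argmin(cv_err))], cv_err


rng = np.random.default_rng(1)
n_stints, n_players = 2000, 40
X = rng.choice([-1.0, 0.0, 1.0], size=(n_stints, n_players), p=[0.1, 0.8, 0.1])
true_b = rng.normal(scale=2.0, size=n_players)
y = X @ true_b + rng.normal(scale=12.0, size=n_stints)  # very noisy outcomes

lambdas = [1.0, 10.0, 100.0, 1000.0, 10000.0]
best, errors = kfold_lambda(X, y, lambdas, k=50)
print("chosen lambda:", best)
```

The chosen value depends on the grid, the fold split and the seed, which is one reason two people running "the same" RAPM can report different lambdas.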

mystic wrote: Isn't necessarily true. Nonetheless, eliminating low-possession players, for example by packing them all together as one artificial player, isn't such a bad solution. Most of those low-possession players are basically D-League material anyway. Unless they are injured, the most likely reason for them not getting minutes is: they suck.

Agreed that those players generally suck. However, if somebody is eliminating them or using one replacement low-minute player for all of them, they should point that out up front. It's no longer NPI-RAPM after that. And how would you decide the cutting point?

Posted: **Mon Nov 10, 2014 2:57 pm**

permaximum wrote: That's the difference.

So what you want to say is that you "guess" what the prior looks like and then call part of the prior "previous season's data"? Did I get that correctly?

permaximum wrote: No. Not this prominent in SPMs or NPI-RAPM.

Nonetheless, we see a similar pattern for players: they usually look worse as rookies than in later years, and they decline toward the end of their careers. In RPM the process is smoother, that's all.

permaximum wrote: I also think you didn't get why I used KG in the above table either...

I don't even get why you use specific players in the first place, because it should be rather obvious that the results have to be looked at as a whole. You want to point out that some players may get over- or underrated? Yeah, everyone will agree. The question is: should a metric be judged by random outliers or by its overall performance in terms of prediction and explanation?

permaximum wrote: No. That's not what I find...

Well, that may not be what you found, but that's what the math will tell you.
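To make "what the math will tell you" concrete for the no-prior case, here is a one-player toy: if all of a player's n possessions point to the same raw rating, ridge with no prior minimizes n·(raw - b)² + lambda·b² and returns raw·n/(n + lambda). A 20-possession player is therefore pulled almost to zero unless lambda is small. The ratings and lambdas are hypothetical, lambda here is on the sum-of-squares scale rather than glmnet's, and in a real multi-player regression the shrinkage also depends on teammates and opponents, so this is one-dimensional intuition only.

```python
def npi_shrunk_coefficient(raw, n_poss, lam):
    """One-player ridge with no prior: minimizes n*(raw - b)^2 + lam*b^2.

    Setting the derivative to zero gives b = raw * n / (n + lam).
    """
    return raw * n_poss / (n_poss + lam)


raw = 30.0  # hypothetical huge raw rating from a tiny sample
for lam in (100.0, 1000.0, 5000.0):
    for n in (20, 2000):
        b = npi_shrunk_coefficient(raw, n, lam)
        print(f"lambda={lam:6.0f}  n={n:5d}  coefficient={b:6.2f}")
```

In this toy, the 20-possession player keeps 5.0 of his 30-point raw rating at lambda = 100 but only about 0.12 at lambda = 5000, while a 2000-possession player keeps most of his raw rating at the two smaller lambdas.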

permaximum wrote: Agreed that those players generally suck. However, if somebody is eliminating them or using one replacement low-minute player for all of them, they should point that out up front. It's no longer NPI-RAPM after that. And how would you decide the cutting point?

Actually, it would still be NPI-RAPM, because "NPI" only means that no prior was used for the ridge regression, nothing else. It's a good point that someone should probably explain how the raw data was arranged, but it would still be NPI-RAPM. But I want to point out that there are reasons people hold back such information: usually they aren't getting paid just for calculating these numbers, but probably use the advantage they gain to make money. Giving up that advantage may cost them money. Also, the audience is usually pretty small, and the effort is mostly not worth it (I speak from experience: detailed explanations are not often appreciated, and some reactions even get hostile), especially because follow-up questions usually come up and answering them costs a lot of time. In an ideal world we would all openly discuss different approaches to the issue at hand, but we don't live in such an environment.

Well, how to decide on the "cutting point" is actually a good question, and you may get different answers from different people. I, for example, use a non-linear weighting scheme; I won't go into detail, but it really helps with the explanatory and predictive power of the metric.

Posted: **Mon Nov 10, 2014 3:11 pm**

JE has engaged in a fair amount of discussion about the details of his method and how it has changed over time. Eli Witus gave a step-by-step description of the basic method. I don't know how thoroughly everyone here has read the back threads on RAPM, but there is a lot there. Not everything, maybe, but perhaps JE and others will say more if there are specific questions.

Posted: **Mon Nov 10, 2014 3:18 pm**

Crow wrote:JE has engaged in a pretty good amount of discussion about details about his method and changes to it over time.

Indeed, and there is even a thread in which he explains his SPM approach and what the prior looks like (at least for xRAPM, but from what I gathered that didn't change for RPM). You can find information about what lambda he uses and what raw data goes in. Maybe a nice-looking overview is missing, but overall he gave a lot of information that should help in understanding the RPM approach and interpreting the results. Anyway, that is not true for other publicly available RAPM-like results.

But I guess the amount of criticism grows with popularity, which seems pretty unfair toward JE in my opinion.

Posted: **Mon Nov 10, 2014 3:22 pm**

I still haven't done, and won't be doing, a full study of where Davis ranked last season on the scale of what I would do if asked by a team; but on brief consideration of the various metrics, I'd probably rank Davis at about the 7th-10th most impactful PF last season. His rank will be higher this season. Looking forward to seeing the RPM estimate at mid-season.

Posted: **Mon Nov 10, 2014 10:44 pm**

mystic wrote: So what you want to say is that you "guess" what the prior looks like and then call part of the prior "previous season's data"? Did I get that correctly?

Yes, I "guess" the priors because it's not stated anywhere what they are and how they are calculated. No, previous seasons' RPM data is not a prior. That data should be used after the computation. That's what I think the keeper of the metric is doing with RPM anyway.

mystic wrote: Nonetheless, we see a similar pattern for players: they usually look worse as rookies than in later years, and they decline toward the end of their careers. In RPM the process is smoother, that's all.

That's an artificial smoothness thanks to priors.

mystic wrote: I don't even get why you use specific players in the first place, because it should be rather obvious that the results have to be looked at as a whole. You want to point out that some players may get over- or underrated? Yeah, everyone will agree. The question is: should a metric be judged by random outliers or by its overall performance in terms of prediction and explanation?

Kevin Garnett was the one I chose because his RAPM value is the best in 14-year RAPM and he was already in his prime in 2001, six seasons into his career. Just like the 03 class rookies, he shows the exact same pattern over the next 4 years.

mystic wrote: Well, that may not be what you found, but that's what the math will tell you.

In theory, yes. In practice, no. If you could show me some results with detailed steps to reproduce them, we would come to an agreement.

mystic wrote: Actually, it would still be NPI-RAPM, because "NPI" only means that no prior was used for the ridge regression, nothing else. It's a good point that someone should probably explain how the raw data was arranged, but it would still be NPI-RAPM. But I want to point out that there are reasons people hold back such information: usually they aren't getting paid just for calculating these numbers, but probably use the advantage they gain to make money. Giving up that advantage may cost them money. Also, the audience is usually pretty small, and the effort is mostly not worth it (I speak from experience: detailed explanations are not often appreciated, and some reactions even get hostile), especially because follow-up questions usually come up and answering them costs a lot of time. In an ideal world we would all openly discuss different approaches to the issue at hand, but we don't live in such an environment.

Well, I meant vanilla RAPM: the original RAPM with no adjustments of any kind. I think you knew what I meant, but still...

If people are afraid to go into detail because it costs them time and money, the audience will always be small and we won't see much improvement any time soon. Even here I see posts criticizing RPM or xRAPM quite regularly, and that says something about the future of the metric.

Posted: **Tue Nov 11, 2014 5:07 am**

You surely realize that plays with a block or steal are less than 10% of all plays.

yes, but we are not talking about all plays here, are we? we are talking about plays involving anthony davis - and the fact is he was involved in a lot of plays with steals and blocked shots compared to the league-average PF (or PF/C). in 2013-14 he got 26% more steals per minute than the league-average PF did, and 3 times as many blocked shots. that's a lot of defensive stops attributable to just one player, and it doesn't even include the defensive stops he induced without a steal or block...

Any metric/methodology (RPM included) will have anomalies, and Anthony Davis may be one who is underrated.

i can understand a player who plays few minutes having a spuriously very good or very bad rating, but what does it say about the efficacy of a rating system if it produces an anomaly, if that is the case here, for a player who led an nba team in:

- minutes played, and...
- points, and...
- rebounds, and...
- steals, and...
- blocked shots...

Of course, the opposite may also be true, and he's not as good as our eyes/lesser metrics would suggest.

lesser metrics? what lesser metrics? he just put up a single season where his combination of scoring, shooting, rebounding, and shot blocking has been attained by just 8 other players, all HOFs, and he did this at the age of just 20-21...

Regardless, picking out the anomalies to try and discredit the methodology (which has proven to be much better than any other publicly available methodology) is silly.

why is it that every time someone questions a plus/minus methodology's rating, a proponent says you are trying to discredit the methodology itself? i simply asked why his rating was so low considering his excellent combination of box score stats...

Also, Synergy's defensive numbers are not nearly reliable enough to be used as some sort of "check" for RPM/RAPM.

the Synergy data is charted from games, and data is what it is - data. RPM/RAPM is not data, it is a calculation on data...

I'm surprised this has gone on so long without stating some obvious things.

let's see how obvious they are...

Box score metrics are terrible at defense. You cannot judge a defender based on blocks/steals/rebounds. That's silly.

last season (2013-14) portland allowed a 48.8% eFG% on defense, while minnesota allowed 51.8% - that's 3 percentage points higher/worse. both teams rebounded similarly (27.4% vs 28.0% off reb %, 74.4% vs 74.7% def reb %). you know which team had the better overall defense? minnesota, 105.4 pts/100poss allowed to 106.3 pts/100poss allowed for portland. you know why? the t-wolves forced 4.4 more TO/100poss, of which 3.1 were steals and 1.1 were non-steal forced turnovers...

a steal and a blocked shot whose rebound is retrieved by the defense are defensive stops - if you don't include them in evaluating player or team defense you are being silly...

Pelicans were 27th in defensive rating last year. He was the lead big man defender on an awful defense.

in 1999-00 the atlanta hawks finished with a W-L record of 28-54 and had the league's 4th-worst defense (106.6 pts/100poss allowed). dikembe mutombo led them in shot blocking and rebounding, and he wasn't even close to being the offensive player that year that anthony davis was in 13-14. he received the 3rd most votes for DPOY that season - wonder what his RPM was for that season...

Their defense wasn't appreciably different when he was off the court. That is a distressing sign for a "star" big man defender on a team without depth.

davis was on the floor for close to 3/4 of the team's minutes, so the time he was off the floor is a sample about 3 times smaller, and thus much more apt to be skewed...
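The sample-size point can be made concrete: if the per-possession noise is about the same on and off the court, the standard error of a per-possession average scales like 1/sqrt(n), so a sample 3 times smaller carries about sqrt(3) ≈ 1.73 times the standard error. A rough sketch that assumes independent possessions with equal variance, which real lineup data only approximately satisfies; the season possession count and per-possession spread are made-up values.

```python
import math


def rating_standard_error(sd_per_poss, possessions):
    """Standard error of points per 100 possessions, assuming i.i.d. possessions."""
    return 100.0 * sd_per_poss / math.sqrt(possessions)


sd = 1.1             # hypothetical spread of points allowed per possession
total_poss = 7000    # hypothetical team defensive possessions in a season
on_poss = 0.75 * total_poss    # about 3/4 of the time with davis on the floor
off_poss = 0.25 * total_poss   # the remaining quarter

se_on = rating_standard_error(sd, on_poss)
se_off = rating_standard_error(sd, off_poss)
print(f"on: +/-{se_on:.1f}  off: +/-{se_off:.1f}  ratio: {se_off / se_on:.2f}")
```

The ratio of the two standard errors is sqrt(3) regardless of the made-up sd and season length, so an on/off difference inherits most of its noise from the off-court side.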

i seem to remember a while back someone touting adjusted plus/minus and trying to claim chris paul was a poor defender one year (07-08) and a good defender the next (08-09), based on what happened in the few minutes (1/3 of the minutes he played) when he was off the floor - yet in both seasons his team's defense with him on the floor was pretty much the same, and paul was named to the all-defensive team both years...

His offense, while surprisingly good, is nothing earth-shattering either. He's no Dirk here.

if you consider lebron james and carmelo anthony SFs (both actually played a lot of PF last season), that leaves five PFs that in 13-14 scored 20+ pts/g - kevin love, blake griffin, lamarcus aldridge, dirk nowitzki, and anthony davis. of the five, nowitzki was the most efficient on offense, but among those five davis was next best in offensive efficiency. so he was closer to nowitzki in scoring and offensive efficiency last season than were love, griffin, or aldridge...

Few possessions end with a blocked shot, and even then it's likely the other team will still have the ball anyway. You need to judge a defender on how he guards every shot, not just the blocks.

a generalization i agree with, but we are being specific here about anthony davis...

here's a list of the players that since 1977-78 had a single season of 2000+ minutes played, with 4.0%+ BS, 3.0+ ST/100min, and 13.0+ reb/48min (anthony davis last year was at 4.7% BS, 3.8 ST/100min, and 13.7 reb/48min):

kareem abdul-jabbar
benoit benjamin
marcus camby
anthony davis
patrick ewing
dwight howard
ervin johnson
george t. johnson
hakeem olajuwon
robert parish
david robinson
ben wallace

care to pick out the poor overall defenders among this group?...

Posted: **Tue Nov 11, 2014 8:09 am**

permaximum wrote: Yes, I "guess" the priors because it's not stated anywhere what they are and how they are calculated. No, previous seasons' RPM data is not a prior. That data should be used after the computation. That's what I think the keeper of the metric is doing with RPM anyway.

JE described how he calculates the prior here:

viewtopic.php?f=2&t=8025&start=75#p14694

And unless something significantly changed (and from a look at the numbers, it does not seem that way), RPM is just the coefficients coming out of the regression when using the respective prior. No further inclusion of previous season data happens.

permaximum wrote: That's an artificial smoothness thanks to priors.

And something necessary given the issues provided by the raw data.

permaximum wrote: Kevin Garnett was the one I chose because his RAPM value is the best in 14-year RAPM and he was already in his prime in 2001, six seasons into his career. Just like the 03 class rookies, he shows the exact same pattern over the next 4 years.

I bet you can find other players who happened to play in 2001 with a similar pattern, and others with a different pattern... also, it may be better to look at normalized values instead. Overall I get what you wanted to show, but it really doesn't make much sense in the first place. Understand what regression means and how the coefficients have to be interpreted as a whole set.

permaximum wrote: In theory, yes. In practice, no.

Given that the same math is used in "theory" and "practice", I'm incredibly sure that the math works the same in "practice"...

permaximum wrote: If you could show me some results with detailed steps to reproduce them, we would come to an agreement.

It makes much more sense to read (or listen to) a math lecture on ridge regression. Just saying ...

permaximum wrote: Well, I meant vanilla RAPM: the original RAPM with no adjustments of any kind. I think you knew what I meant, but still...

I probably "knew" what you meant, but preparing the incoming raw data differently does not change the mathematical algorithm used to solve the problem. And given that prior or no prior is a property of the algorithm used, we can safely say that "packing a bunch of low-possession players together" will not change that underlying algorithm; therefore the results should be called either NPI or PI.

If you understand why such a thing was done in the first place, maybe you can start to see that your criticism is not as useful as you may believe it is. I understand your desire to get that information, and I would wholeheartedly agree that in general the information regarding the methods ranges from somewhat lacking to non-existent, but especially in the case of JE, who took a great deal of time to explain and discuss how he arrived at his xRAPM (and subsequently RPM), I think the criticism is misplaced.

Posted: **Tue Nov 11, 2014 11:37 am**

bchaikin wrote: that leaves five PFs that in 13-14 scored 20+ pts/g - kevin love, blake griffin, lamarcus aldridge, dirk nowitzki, and anthony davis. of the five, nowitzki was the most efficient on offense, but among those five davis was next best in offensive efficiency. so he was closer to nowitzki in scoring and offensive efficiency last season than were love, griffin, or aldridge...

How is one closer in scoring and offensive efficiency? These aren't on the same scale.

PF'14    Pts/G    TS%    TO%   Ast%  ORtg  DRtg
Love      26.1   .591   10.3   21.4   120   104
Griffin   24.1   .583   11.9   19.2   114   103
Nowitzki  21.7   .603    7.5   14.2   120   108
Davis     20.8   .582    8.3    8.2   119   104
Aldridge  23.2   .507    7.2   13.0   108   104

http://bkref.com/tiny/lPVLQ
Davis holds his own in both ORtg and DRtg. If you are including TO in your offensive efficiency, are you also including assists?

bchaikin wrote: (anthony davis last year was at 4.7% BS, 3.8 ST/100min, and 13.7 reb/48min)

Do you think the fact that Davis had 86% more blocks at home (vs away) might mean his listed block% is actually about 43% higher than his real blocks ratio?
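The arithmetic behind the 43% figure, assuming equal home and away minutes and treating the away rate as the "true" one (that assumption is the question being asked, not an established fact): if home blocks run 1.86 times away blocks, the season rate is the average of the two, (1.86 + 1)/2 = 1.43 times the away rate. The function name and the `home_share` parameter are illustrative.

```python
def season_vs_away_ratio(home_over_away, home_share=0.5):
    """Season block rate relative to the away rate.

    home_over_away : home rate divided by away rate (1.86 = 86% more at home)
    home_share     : fraction of minutes played at home (0.5 assumed)
    """
    return home_share * home_over_away + (1.0 - home_share) * 1.0


ratio = season_vs_away_ratio(1.86)
print(f"listed rate is {100 * (ratio - 1):.0f}% above the away rate")
```

If home and away minutes were not split evenly, `home_share` would move the figure somewhat away from 43%.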

Posted: **Tue Nov 11, 2014 5:14 pm**

How is one closer in scoring and offensive efficiency? These aren't on the same scale.

pts/g  pts/40  pts/0ptposs  player
 21.7    26.4         2.90  nowitzki
 20.8    23.6         2.65  davis
 26.1    28.7         2.42  love
 24.1    27.0         2.31  griffin
 23.2    25.7         2.24  aldridge

If you are including TO in your offensive efficiency

yep...

are you also including Assists?

nope - why would i? pts/0ptposs measures a player's own scoring efficiency...

Do you think the fact that Davis had 86% more blocks at home (vs away) might mean his listed block% is actually about 43% higher than his real blocks ratio?

in one breath you are asking me if i include assists in measuring offensive efficiency, and in the next you are telling me some blocks aren't real? there are far more "phantom" assists awarded every year than there are phantom blocks...

as far as i know, blocks are never awarded to a player who actually blocks the shot but it happens to go in. so if a player blocks a shot, or prevents it from going in with a near block (harvey pollack's intimidations), what's the difference? call it a blocked shot, a shot defended, a shot prevented, it's all semantics...

i seem to remember you bringing this up about ben wallace, too. but wallace has four DPOY awards and was all-defensive 1st team 5 times, so what difference does it make how the forced miss is registered in a ledger? call it whatever you like...

oh, and i wonder how RPM will rate anthony davis this year - he is having a season of epic proportions, but the team on/off for him is little different...

Posted: **Tue Nov 11, 2014 5:58 pm**

Davis currently has a +30 team on/off plus-minus, but his counterpart defense is still basically the same as last year. His individual offense is up 4-5 points. Usage is up just 1 percentage point. RPM is probably going to be up a lot. Using Neil's APM estimator method and guesses about the impact of the new prior and the ridge regression, I'll guess that the early RPM estimate will be around +3.5 to 4.

Shots at RPM
