Well, well, well...

Post by **Kevin Pelton** » Thu Apr 10, 2014 12:11 am

knarsu3 wrote: Just something along the lines of lower bounds for each player (since that's what most people care about). So we can say with 95% confidence that x players are above +4 (which would be their lower bound), +2, or whatever splits work.

Agree that lower bound is more often important, but with someone like Anthony Davis we'd want to know the upper bound on what RPM considers his possible value, right?

Debating the actual replacement level used by RPM is reasonable, but I don't think we need to fight the battle over the concept behind it again here.

knarsu3 · Post by **knarsu3** » Thu Apr 10, 2014 1:18 am

Kevin Pelton wrote:
knarsu3 wrote: Just something along the lines of lower bounds for each player (since that's what most people care about). So we can say with 95% confidence that x players are above +4 (which would be their lower bound), +2, or whatever splits work.
Agree that lower bound is more often important, but with someone like Anthony Davis we'd want to know the upper bound on what RPM considers his possible value, right?

Debating the actual replacement level used by RPM is reasonable, but I don't think we need to fight the battle over the concept behind it again here.

Right, for younger players like Anthony Davis we probably would want to know the upper bound as well. I'd definitely be in favor of ESPN reporting both bounds but if it's too much information for fans to handle, I think at least reporting the lower bound would add a lot of information while still making it not too hard for fans to interpret.

Crow · Post by **Crow** » Thu Apr 10, 2014 3:26 am

On RPM, Irving has by far the worst defensive impact estimate among the top 40 PGs. Redick is the worst among SGs.

knarsu3 · Post by **knarsu3** » Thu Apr 10, 2014 4:48 am

I was doing a bit of digging to explain the high defensive RPM for Taj Gibson (I had looked at this a while ago). There's what I found about his ability to contest shots near the basket (info is here: http://blog.cacvantage.com/2013/12/the- ... -shot.html), but he's also great at defending the three: http://public.tableausoftware.com/profi ... ree/Sheet2, which includes a very high 74.6% contest rate on threes. So basically, he's excellent at defending threes and very good at protecting the basket, which should pretty much explain his defensive RPM. Some of the other Vantage stats help explain why it is so high (great screen defender, help defender and double teamer). I'm putting together a table of Vantage metrics that looks at some of these "unexpected" (though not to APBR) high RPMers to see why.

colts18 · Post by **colts18** » Thu Apr 10, 2014 12:35 pm

In xRAPM, the prior is box score based stats. So if I understand it correctly, 2014 xRAPM uses 2013 box score stats as a prior?

DSMok1 · Post by **DSMok1** » Thu Apr 10, 2014 12:45 pm

colts18 wrote:In xRAPM, the prior is box score based stats. So if I understand it correctly, 2014 xRAPM uses 2013 box score stats as a prior?

xRAPM uses multiyear data, weighted differently: X*2012, Y*2013, Z*2014. I believe J.E. uses a similar schema for both the box score data and the APM lineup data, with different weights for each, chosen to maximize out-of-sample accuracy within 2014. I believe he uses 2014 box score data for 2014, and may use a lower weight of 2013 box score data as well.

knarsu3 · Post by **knarsu3** » Thu Apr 10, 2014 6:25 pm

Crow wrote: There are several ways one could go. But given what I assume about the estimated errors based on past reporting by JE, I might cite players with at least a 60-70% (or 67%) chance of being over a certain value (or under a negative one). The 95% confidence level seems too demanding for this circumstance. A lower threshold would keep the level cited (exceeded or below) closer to the best estimate than a 95% confidence interval would. The goal IMO is to understand the most likely level, with an understanding that one level higher or lower are the next most likely, but at much lower likelihoods.

I was thinking about this and I'd kind of like to see both. I think 95% lower bound would make sense because it's basically standard to report that. But a 60-70% bound would also be good for the reasons you stated. Would fans be able to interpret that? I would think anyone who has taken a basic probability class would understand. But the way of thinking in probabilities/estimates as opposed to exact numbers is much more difficult.
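To make the two options concrete, here is a minimal sketch of how a one-sided lower bound could be computed from a point estimate and its standard error. It assumes the estimation error is roughly normal, which is an assumption and not something confirmed about RPM; the function name and the player numbers are made up for illustration.

```python
from statistics import NormalDist

def lower_bound(estimate: float, std_error: float, confidence: float) -> float:
    """One-sided lower bound: the level the true rating exceeds with
    probability `confidence`, assuming normally distributed error."""
    z = NormalDist().inv_cdf(confidence)
    return estimate - z * std_error

# Hypothetical player: rating of +5.0 with a standard error of 1.5 (made-up values).
est, se = 5.0, 1.5
lb95 = lower_bound(est, se, 0.95)  # the stricter, "standard" bound
lb67 = lower_bound(est, se, 0.67)  # the looser bound, closer to the estimate

print(f"95% lower bound: {lb95:+.2f}")
print(f"67% lower bound: {lb67:+.2f}")
```

The looser bound sits much closer to the point estimate, which is the trade-off being discussed: the 95% figure is more conservative, the 67% figure says more about the most likely level.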

xkonk · Post by **xkonk** » Thu Apr 10, 2014 10:01 pm

knarsu3 wrote: Would fans be able to interpret that? I would think anyone who has taken a basic probability class would understand. But the way of thinking in probabilities/estimates as opposed to exact numbers is much more difficult.

As someone currently teaching a statistics class, I don't think they would interpret it correctly. And I think the chance of anyone understanding decreases dramatically if they aren't currently in the class.

jbrocato23 · Post by **jbrocato23** » Thu Apr 10, 2014 11:21 pm

DSMok1 wrote:
colts18 wrote:In xRAPM, the prior is box score based stats. So if I understand it correctly, 2014 xRAPM uses 2013 box score stats as a prior?
xRAPM uses multiyear data, weighted differently: X*2012, Y*2013, Z*2014. I believe J.E. uses a similar schema for both the box score data and the APM lineup data, with different weights for each, chosen to maximize out-of-sample accuracy within 2014. I believe he uses 2014 box score data for 2014, and may use a lower weight of 2013 box score data as well.

Are you sure? I was under the impression it's calculated along these lines:

2014 xRAPM = 2014 RAPM informed by 2013 xRAPM w aging curve then regressed toward the mean * 0.65 + 2014 JE SPM * 0.35

where the weights (I represented them as 0.35 & 0.65 but they could vary based on findings) are found via cross validation
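As a rough illustration of "weights found via cross validation", here is a toy sketch that picks a single blend weight by minimizing error against held-out values. It only assumes the line above is a weighted blend of two inputs; the exact grouping of terms isn't spelled out in the post, so the inputs are named generically. All names and numbers are hypothetical, and a simple grid search on one held-out set stands in for a real cross-validation procedure.

```python
def blend(first_term, spm_term, w):
    """Weighted blend: w * first_term + (1 - w) * spm_term, per player."""
    return [w * a + (1.0 - w) * b for a, b in zip(first_term, spm_term)]

def mse(pred, target):
    """Mean squared error between predictions and held-out values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def best_weight(first_term, spm_term, target, steps=20):
    """Grid-search w in [0, 1] for the lowest held-out error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: mse(blend(first_term, spm_term, w), target))

# Made-up ratings for four players (not real data).
first_term = [4.0, 1.0, -2.0, 0.5]
spm_term = [6.0, 0.0, -1.0, 2.5]
# Toy held-out target built with a 0.65 / 0.35 mix so the search has a known answer.
held_out = [0.65 * a + 0.35 * b for a, b in zip(first_term, spm_term)]

w = best_weight(first_term, spm_term, held_out)
print(f"chosen weight on first term: {w:.2f}, on SPM term: {1 - w:.2f}")
```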

Crow · Post by **Crow** » Fri Apr 11, 2014 4:56 am

I can understand not pursuing a complicated presentation of the data on an on-going basis but I think there is still a place for further introductory explanation. And some rationale for turning the data from apparent two decimal place accuracy to something less precise, whether it be the still way exaggerated apparent accuracy of one decimal place data or something even less exaggerated like accuracy to nearest 0.5 or .0 or within nearest 1 or 2 pts.
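For what it's worth, the coarser presentation is trivial to produce. A minimal sketch, with a hypothetical helper name and made-up ratings:

```python
def round_to_step(value: float, step: float = 0.5) -> float:
    """Round a rating to the nearest multiple of `step`.
    Note: Python's round() sends exact halves to the even neighbour."""
    return round(value / step) * step

ratings = [3.47, -1.26, 0.12]                          # made-up two-decimal ratings
to_half = [round_to_step(r, 0.5) for r in ratings]     # nearest 0.5
to_whole = [round_to_step(r, 1.0) for r in ratings]    # nearest 1

print(to_half)
print(to_whole)
```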

mystic · Post by **mystic** » Fri Apr 11, 2014 10:56 am

jbrocato23 wrote:
DSMok1 wrote:
colts18 wrote:In xRAPM, the prior is box score based stats. So if I understand it correctly, 2014 xRAPM uses 2013 box score stats as a prior?
xRAPM uses multiyear data, weighted differently: X*2012, Y*2013, Z*2014. I believe J.E. uses a similar schema for both the box score data and the APM lineup data, with different weights for each, chosen to maximize out-of-sample accuracy within 2014. I believe he uses 2014 box score data for 2014, and may use a lower weight of 2013 box score data as well.
Are you sure? I was under the impression it's calculated along these lines:

2014 xRAPM = 2014 RAPM informed by 2013 xRAPM w aging curve then regressed toward the mean * 0.65 + 2014 JE SPM * 0.35

where the weights (I represented them as 0.35 & 0.65 but they could vary based on findings) are found via cross validation

Yeah, that's how J.E. described the building of the prior for xRAPM in another thread not that long ago. I haven't read that he changed it, even though he made posts about experimenting with different kinds of boxscore metric approaches (only scoring, PER, ORtg, etc.). Would be nice to get clarification on this.

I also found this: http://www.hickory-high.com/is-espns-re ... -for-real/

I'm a bit puzzled here (to say the least, and I commented on this on RealGM already, explaining that at least a part of that is simply wrong or can't be concluded based on the data).

1. IPV is using a different prior and a different regression algorithm, if I interpret the short descriptions on the blog as well as J.E.'s comments correctly. Saying that it is "very, very similar" seems obviously wrong to me, given the rather obvious differences in the chosen prior, not just in the use of prior-season data but also in terms of the boxscore-based approach.
2. Given the fact that James is leading by a good margin in J.E.'s SPM rating, which has an effect on the prior as well, I can't even come close to imagining how someone could claim that James is only on top of RPM because of the previous season's xRAPM. The mere fact that Cousins is so high in IPV while being much closer to the league average in RPM tells me that the chosen boxscore prior has a greatly different influence on the outcome. I have James as #1 in my merged rating while only using regression-based data from this season (my SPM is what puts James on top; his RAPM with no prior is at +3.5 this season in my dataset and with my regression, while Durant would be at +4.5, without further normalization).
3. As I understand it, height is just included as an independent variable in the regression approach for the boxscore metric used as the prior, and for both offense and defense. The author of that article seems to imply that there would be an additional adjustment for height included only on defense. Is my understanding of that procedure wrong?
4. The author implies that the SPM J.E. uses would be basically the same as Daniel's ASPM. If RPM is really just based on xRAPM, and J.E.'s description of the boxscore prior is correct, then this is obviously false. As I understand it, J.E. used prior-season boxscore data in a regression on the season's lineup results in order to find the best coefficients for a prediction. That is really, really different from Daniel's SPM approach, where the long-term RAPM data and boxscore data were used to get the best approximation of the RAPM data. But the author implies that this would be used as the prior for RPM.

I think the article points to an issue with the general presentation of the data, the lack of a specific description, and a lack of understanding among the people trying to grasp what was done in the first place. But I'm also not sure that a better description would really help overall, because in my discussions about this topic I constantly run into people who seem to believe that the "adjustment" somehow happens arbitrarily and not within the constraints of the regression analysis used. In fact, there are people out there trying to criticize the method without even knowing what a regression is, let alone understanding the underlying theory of ridge regression and the advantages it provides for solving ill-posed problems like the one we are dealing with. Well, I find that a bit frustrating, because I somehow imagined knowledge about regression analysis would be a bit more common ...
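For anyone wondering what "ill-posed" means here, a toy sketch may help. Take two teammates who are always on the court together: ordinary least squares cannot separate them because the normal equations are singular, while ridge regression adds a penalty term and returns a unique, shrunken answer. This is a hand-rolled two-player example with made-up numbers, not anyone's actual RPM or RAPM code.

```python
def ridge_2x2(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for a two-column design matrix."""
    a = sum(r[0] * r[0] for r in X) + lam   # top-left of X^T X + lam*I
    b = sum(r[0] * r[1] for r in X)         # off-diagonal of X^T X
    d = sum(r[1] * r[1] for r in X) + lam   # bottom-right of X^T X + lam*I
    g0 = sum(r[0] * t for r, t in zip(X, y))
    g1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    if det == 0:
        raise ZeroDivisionError("singular system: no unique solution")
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)

# Three stints; both players are on the floor in every one (made-up margins).
X = [[1, 1], [1, 1], [1, 1]]
y = [4.0, 6.0, 5.0]

try:
    ridge_2x2(X, y, lam=0.0)   # lam = 0 is plain least squares
    ols_failed = False
except ZeroDivisionError:
    ols_failed = True          # the two players cannot be told apart

beta = ridge_2x2(X, y, lam=1.0)  # the penalty makes the system solvable
print(ols_failed, beta)
```

With the penalty, the credit is split evenly between the two players and pulled toward zero. Nothing arbitrary is happening: the "adjustment" falls directly out of the penalized least-squares solution.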

Jinxed · Post by **Jinxed** » Fri Apr 11, 2014 11:35 pm

Great great work guys.

How often will this be updated? Continuously? It's been out a week almost with no update on the stats.

RoyceWebb · Post by **RoyceWebb** » Fri Apr 11, 2014 11:59 pm

Jinxed wrote:Great great work guys.

How often will this be updated? Continuously? It's been out a week almost with no update on the stats.

The plan is to update the numbers at least once a week throughout the remainder of the season and the postseason.

That was the best, most practical solution to ensure that we could get the numbers onto the site during the regular season (and the postseason).

The ideal solution for next season would be to have all of this automated for nightly updates.

RW
