jbrocato23 wrote:DSMok1 wrote:colts18 wrote:In xRAPM, the prior is box score based stats. So if I understand it correctly, 2014 xRAPM uses 2013 box score stats as a prior?
xRAPM uses multiyear data, weighted differently: X*2012, Y*2013, Z*2014. I believe J.E. uses a similar schema for both the box score data and the APM lineup data, with different weights for each, chosen to maximize out-of-sample accuracy within 2014. I believe he uses 2014 box score data for 2014, and may use 2013 box score data at a lower weight as well.
Are you sure? I was under the impression it's calculated along these lines:
2014 xRAPM = 2014 RAPM informed by [(2013 xRAPM w/ aging curve, then regressed toward the mean) * 0.65 + (2014 JE SPM) * 0.35]
where the weights (I represented them as 0.35 & 0.65 but they could vary based on findings) are found via cross validation
Yeah, that's how J.E. described the building of the prior for xRAPM in another thread not that long ago. I haven't read anywhere that he changed it, even though he has posted about experimenting with different kinds of boxscore-metric approaches (only scoring, PER, ORtg, etc.). Would be nice to get clarification on this.
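For what it's worth, here is a rough Python sketch of how I read the formula quoted above: blend the aged and regressed prior-season rating with the current SPM, then run the ridge regression so that it shrinks toward that blend instead of toward zero. The function names, the aging_delta and regress parameters, and lam are all my own placeholders, and the 0.65/0.35 weights are just the illustrative numbers from the quoted post, not J.E.'s actual values.

```python
import numpy as np

def blended_prior(prev_xrapm, spm, aging_delta=0.0, regress=0.8,
                  w_prev=0.65, w_spm=0.35):
    """Blend last season's rating (aged, then regressed to the mean of 0)
    with this season's boxscore metric. All parameters are placeholders."""
    prev_adj = (np.asarray(prev_xrapm, dtype=float) + aging_delta) * regress
    return w_prev * prev_adj + w_spm * np.asarray(spm, dtype=float)

def ridge_with_prior(X, y, prior, lam):
    """Solve min_b ||y - X b||^2 + lam * ||b - prior||^2.

    X: (stints x players) design matrix, +1 home / -1 away / 0 off court.
    y: point margin per stint (e.g. per 100 possessions).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior = np.asarray(prior, dtype=float)
    A = X.T @ X + lam * np.eye(X.shape[1])
    # Fit only the part of y that the prior does not already explain,
    # then add that correction back onto the prior.
    delta = np.linalg.solve(A, X.T @ (y - X @ prior))
    return prior + delta

# Tiny made-up example: two players, four stints.
prior = blended_prior(prev_xrapm=[2.0, -1.0], spm=[4.0, 0.0])
X = np.array([[1.0, -1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([3.0, 2.0, -1.0, 1.0])
ratings = ridge_with_prior(X, y, prior, lam=5.0)
```

With a very large lam the ratings stay at the prior; with lam = 0 the prior drops out entirely and you get plain APM. In this reading, cross validation would pick lam and the blend weights.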
I also found this:
http://www.hickory-high.com/is-espns-re ... -for-real/
I'm a bit puzzled here, to say the least. I commented on this on RealGM already, explaining that at least part of the article is simply wrong or can't be concluded from the data.
1. IPV uses a different prior and a different regression algorithm, if I interpret the short descriptions on the blog as well as J.E.'s comments correctly. Saying that it is "very, very similar" seems obviously wrong to me, given the rather obvious differences in the chosen prior, not just in the use of prior-season data but also in the boxscore-based approach.
2. Given that James is leading by a good margin in J.E.'s SPM rating, which affects the prior as well, I can't even begin to imagine how someone could claim that James is on top of RPM only because of the previous season's xRAPM. The mere fact that Cousins is so high in IPV while being way closer to league average in RPM tells me that the chosen boxscore prior has a greatly different influence on the outcome. I have James as #1 in my merged rating while only using regression-based data from this season. My SPM is what puts James on top: his RAPM without a prior, in my dataset and with my regression, is at +3.5 this season, while Durant would be at +4.5, without further normalization.
3. As I understand it, height is simply included as an independent variable in the regression for the boxscore metric used as the prior, and that holds for both offense and defense. The author of that article seems to imply that there is an additional height adjustment applied only on defense. Is my understanding of that procedure wrong?
4. The author implies that the SPM J.E. uses is basically the same as Daniel's ASPM. If RPM is really just based on xRAPM, and J.E.'s description of the boxscore prior is correct, then this is obviously false. As I understand it, J.E. used prior-season boxscore data in a regression on the season's lineup results in order to find the best coefficients for a prediction. That is something really, really different from Daniel's SPM approach, where long-term RAPM data and boxscore data were used to get the best approximation of the RAPM data. But the author implies that the latter is used as the prior for RPM.
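To make the difference in point 4 concrete, here is a minimal sketch of the two ways of fitting boxscore coefficients as I understand them. The function names and matrix layouts are my own, and I use plain least squares for brevity; the real procedures presumably add regularization, possession weighting, and so on.

```python
import numpy as np

def fit_spm_on_rapm(box, rapm):
    """ASPM-style, as I understand it: find boxscore weights that best
    approximate existing (long-term) RAPM values, one row per player."""
    coef, *_ = np.linalg.lstsq(box, rapm, rcond=None)
    return coef

def fit_spm_on_lineups(X, box, margin):
    """Lineup-style, as I understand J.E.'s description: find boxscore
    weights that best predict stint margins directly.

    A player's value is box @ coef, so a stint's predicted margin is
    X @ (box @ coef) = (X @ box) @ coef, which is linear in coef.
    """
    coef, *_ = np.linalg.lstsq(X @ box, margin, rcond=None)
    return coef
```

The first targets a rating that has already gone through a regression; the second targets the on-court results themselves. On noisy real data the two will generally not give the same coefficients.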
I think the article points to a general problem with how the data is presented: the lack of a specific description of the method, as well as a lack of understanding among the people trying to grasp what was done in the first place. But I'm also not sure that a better description would really help overall, because in my discussions on this topic I constantly run into people who seem to believe that the "adjustment" happens somewhat arbitrarily and not within the constraints of the regression analysis. In fact, there are people out there trying to criticize the method without even knowing what a regression is, let alone understanding the underlying theory of ridge regression and the advantages it provides for solving ill-posed problems like the one we are dealing with. Well, I find that a bit frustrating, because I somehow imagined that knowledge about regression analysis would be a bit more common ...
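As a toy illustration of the ill-posed part: if two players only ever share the floor, their columns in the design matrix are identical, ordinary least squares has no unique solution, and ridge regression still returns one. The stints and margins below are made up, and lam = 1.0 is an arbitrary choice.

```python
import numpy as np

def ridge(X, y, lam):
    """Standard ridge estimate: (X'X + lam*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Players 0 and 1 always appear together, so columns 0 and 1 are identical.
X = np.array([[ 1.0,  1.0,  0.0],
              [ 1.0,  1.0, -1.0],
              [ 1.0,  1.0,  0.0],
              [-1.0, -1.0,  1.0]])
y = np.array([6.0, 8.0, 4.0, -7.0])

# X'X is rank deficient, so the OLS normal equations have no unique solution.
rank = np.linalg.matrix_rank(X.T @ X)

# The penalty makes the system solvable; the two inseparable players
# end up splitting the shared credit equally.
b = ridge(X, y, lam=1.0)
```

The penalty is part of the objective being minimized, not an arbitrary adjustment applied afterwards.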