permaximum wrote:
One thing first, I pretty much get the same RMSE for all in-sample seasons.
Well, I'm wondering whether you could post such numbers, say from 2001 to 2014, just to get a grasp on what you mean by "pretty much the same". Last time we talked about something like that, you proclaimed an SD of 5.7 to be not significantly different from an SD of 2.4.
I just checked the SD of the RMSE for a metric with (A) and without (B) in-sample information over that 2001 to 2014 interval (using 2000 information to predict the 2001 performances, up to 2013 data for the 2014 results). Both A and B come out at 0.4; in fact, A and B have an R² of 0.79 at the team level and about 0.94 for the overall league-level RMSE (season to season). When I compare the predictive power via RMSE for metric A in y1 over 2001 to 2014 vs. 1987 to 2000, where the latter interval would be out-of-sample, the RMSE differs by just 0.06 (worse for the out-of-sample interval) while the SD is 0.41. Looking at such numbers, I can't say with great certainty whether A contains in-sample information for a specific interval or not.

When I compare the 2nd-year performances, though, B becomes slightly better than A, and substantially better in the 3rd year. If you don't have a good metric without in-sample data and therefore need that in-sample data to become relevant, your test can be enough; but if the metric isn't that bad at predicting y1 without in-sample data (so, better than PER or WP48 for that matter), the boost from the in-sample data wouldn't be noticed "easily". To put a number on it: the best 7 years in terms of RMSE for metric A in the 1987 to 2000 sample show, on average, an RMSE that is 0.45 better than the worst 7 years of the 2001 to 2014 sample (mind you, 2001 to 2014 would be in-sample). Overall, the boost from the in-sample information might be just enough to "win" such a contest for y1, maybe even do very well in y2, but it would get exposed by y3 and y4.
The test was based on adjusted per 100 possession scoring differences.
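In case it is unclear what kind of check I mean, here is a minimal sketch in Python, assuming predicted and actual team ratings per season sit in simple arrays (the container names pred_A, pred_B, actual are just placeholders for however the data is stored):

```python
import numpy as np

def season_rmse(pred, actual):
    """RMSE of predicted vs. actual team ratings for a single season."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return np.sqrt(np.mean((pred - actual) ** 2))

def rmse_profile(pred_by_season, actual_by_season, seasons):
    """Mean and SD of the season-level RMSEs over a range of seasons."""
    rmses = np.array([season_rmse(pred_by_season[s], actual_by_season[s])
                      for s in seasons])
    return rmses.mean(), rmses.std(ddof=1)

# Hypothetical usage, e.g. metric A vs. metric B on the 2001-2014 interval:
# mean_A, sd_A = rmse_profile(pred_A, actual, range(2001, 2015))
# mean_B, sd_B = rmse_profile(pred_B, actual, range(2001, 2015))
```

Using the sample SD over seasons (ddof=1) is just one choice here; over 14 seasons it barely matters.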
From my perspective, going by your comments so far, either you haven't checked that on a bigger sample at all and are thus pulling some stuff out of thin air here, or you not only used in-sample information to derive one set of coefficients but actually have different coefficients for each season depending on the fit, or you have a weird understanding of what "pretty much the same" means.
permaximum wrote:
Even at 15 games into 2014-15 you can differentiate better metric from worse pretty easily via RMSE check.
Pretty bold statement, given that I know for sure that a metric which is better on average may still show a worse RMSE in a particular season than a metric which is worse on average. Though I'm talking about metrics here which are a bit closer in terms of overall predictive power (say, 2.4 vs. 2.7 RMSE).
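To illustrate that with a toy example (purely hypothetical numbers, treating the two metrics' season RMSEs as independent normal draws, which real data of course isn't exactly):

```python
import numpy as np

rng = np.random.default_rng(0)
n_seasons = 100_000  # simulated single seasons

# Hypothetical long-run behaviour: mean RMSE 2.4 vs. 2.7, season-to-season SD of 0.4 each
rmse_better = rng.normal(2.4, 0.4, n_seasons)
rmse_worse = rng.normal(2.7, 0.4, n_seasons)

# Share of single seasons in which the on-average worse metric posts the lower RMSE
print(np.mean(rmse_worse < rmse_better))  # roughly 0.3 under these assumptions
```

Correlation between the two metrics' season RMSEs would reduce that flip rate, but a single season, let alone 15 games, just isn't much evidence.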
permaximum wrote:
Back to the real thing, well, we finally found out where the problem lies with this in-sample, out-of-sample thing. Miscommunication. I always assumed Neil would use Season-1 to predict Season-0. If he'll go back to the point of S-4, that's another story. Besides, I don't think going that far back is a good thing. He will have to use too many average, replacement or actual values for missing players, which is going to skew the test imo.
Ok, but how would 2014 be out-of-sample in such a case, if you "believed" it would be used to "predict" the 2013 results? Given your strong comment about that specific season earlier, you may understand that something seems to be very off with your "explanation". But let us work with the assumption that you indeed just misunderstood the design of the test: can you agree that, from an outside perspective, it looked like you had no clue what in-sample and out-of-sample actually mean?
permaximum wrote:
And for your claim about "logical fallacies"... Prove it or.. you know. Come on pal, do something besides talking. All I see is failure from you... Failure in prediction contests, failure in calculating a simple ridge regression via cross validation, failure in actually showing a real work here.
What I sometimes wonder is what people like you get out of such crap. What do you want to achieve with such comments? Should I feel bad because I made mistakes, or should I feel deeply offended? Life is a myriad of mistakes; handling them in an honest way, with the will to improve by learning from them, seems to me a better approach than feeling bad about yourself or getting overly emotional over comments made by random people on the internet. Or do you have the feeling that writing things like that earns you bonus points with others who may have shown a dislike towards me in the past?
Btw, I suggest using a blend built from this equation:
- build z-score-like values for WS/48 and BPM, using this:
WS-z = (WS/48-0.1)/0.075
BPM-z = BPM/3
then plug that into this:
player_rating = 0.7*WS-z + 1.2*BPM-z + 0.1*mpg - 2.5
Pretty simple solution for a blend, I found. I also tested it against "your metric" and found that it covers about 96.5% of the variance (unweighted; about 99% minute-weighted) when using the numbers you presented in this thread for 2015 at two different stages. The blend can also be given a team-level adjustment and be normalized to a desired SD afterwards.
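For convenience, here is that blend as a small Python function; the input names (ws48, bpm, mpg) are just placeholders for however the columns are stored on your end:

```python
import numpy as np

def blend_rating(ws48, bpm, mpg):
    """Blend of WS/48 and BPM as given above; inputs are arrays of equal length."""
    ws_z = (np.asarray(ws48) - 0.1) / 0.075   # z-score-like scaling of WS/48
    bpm_z = np.asarray(bpm) / 3.0             # z-score-like scaling of BPM
    return 0.7 * ws_z + 1.2 * bpm_z + 0.1 * np.asarray(mpg) - 2.5
```

The team-level adjustment and the rescaling to a desired SD would then be applied on top of that output; a sketch of the rescaling follows below.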
Also, one other thing: the likely better predictive results of "your metric" in comparison to BPM are clearly driven by regression to the mean, where you have some sort of individual "shrinkage" based on the other stuff included. If you bring the BPM SD down to the SD of your metric (controlled via a WS-like approach), the difference in "predictive power" will shrink.
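As a crude stand-in for that kind of shrinkage (a uniform, optionally minute-weighted pull toward the league mean, not BPM's actual construction), something like this would bring the BPM spread down to the SD of another metric:

```python
import numpy as np

def rescale_to_sd(values, target_sd, minutes=None):
    """Shrink (or stretch) ratings around their mean until their SD matches target_sd.

    minutes: optional weights for a minute-weighted mean/SD; None means unweighted.
    """
    values = np.asarray(values, float)
    mean = np.average(values, weights=minutes)
    sd = np.sqrt(np.average((values - mean) ** 2, weights=minutes))
    return mean + (values - mean) * (target_sd / sd)
```

Comparing the next-season RMSE of raw BPM against such an SD-matched BPM should show the gap to "your metric" narrowing, if the shrinkage is indeed doing most of the work.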