Poll: RPM's degree of efficacy in sorting players

schtevie · Post by **schtevie** » Thu May 04, 2017 1:04 am

I don't know if I am the only one who has had difficulty digesting the facts and figures presented by permaximum, but having read and reread the latter portions of the thread, how can we summarize his data, with regard to the topic at hand "RPM's degree of efficacy in sorting players"?

As best I can understand, permaximum's data state that RPM dominates all comers, up until roster turnover (as defined) reaches something over 60% (61.45% is the point of crossover?) at which point MPG begins to achieve superiority. But what is not explicitly stated is that RPM's slips into second place (amongst these tested metrics) for what is less than 4% of the sample (100/2627) - 4%!

And this zone of supposed metric "failure" is confirmed by the correlated metric, BPM, for which the sample is much larger, and for which its inferiority to MPG occurs, again, approximately at "60+" roster turnover what represents an identical 4% of the games in question.

Given this, if one wants to make the argument that RPM isn't fit for purpose, one has to plead that the 4% of games of, by definition, way above-average roster turnover, where no self-respecting coach wishes to find herself, is where players prove their "true" value.

Pull another one.

The value, really the essence, of the plus-minus approach is it has the capacity to measure the contribution of players in organized basketball, where teamwork, chemistry, repetition from practice, and all that come to the fore, and players establish their value in a team context.

Big surprise that such a measure would begin to suffer when the context that undergirds it is eroded.

And a couple other comments about the graphical presentation of the data, one about the X axis and one about the Y.

Regarding the X axis, I must admit that I am confused how Full, half, and quarter relate to the entries to the right which explicitly identify the % of roster turnover. Full is the full sample, what must indicate average roster turnover, no? And what are half and quarter? Please explain. What we want to see is the left most data point showing the smallest observed roster turnover, then moving linearly to the right with increasing roster turnover. Is that what is presented?

As for the Y axis, what is the context, or said another way, what is the relevant range? It is implicitly suggested that the Full sample RPM prediction accuracy of just over 0.67 is unsatisfactory. But is it? Perhaps I am thinking about this the wrong way, but isn't something a bit above .70 the theoretical maximum for prediction?

This is to say that basketball is a game determined effectively by unfair coin flips where randomness mandates a whole lot of "unexpected" outcomes over the course of a season. For example, the GSW, the best team in the league, doesn't win close to all their games, though "expected" to by underlying team strength, and the average team realizes a far larger fraction of "unexpected" losses. Sum these all up, and I believe you have an upper bound not far above what RPM actually predicts.

permaximum · Post by **permaximum** » Thu May 04, 2017 1:25 am

The sample is 38658 games. The complete 3pt era including playoffs except the last season. So, the sample is huge actually.

Half means, half of the data with the most roster turnover. 19443 games( there are ties)

Quarter means, quarter of the data with the most roster turnover. 9741 games(there are ties)

The sample doesn't get problematic and too small until somewhere between 65% and 70% of roster turnover or around 500 games.

Your chart guess is correct.

Considering HCA=0.615 regardless of roster turnover rates, anything below 0.7 should be unsatifactory at high roster turnover rates. No metric even gets close. They don't get close even at average roster turnover rate.

It's simply about the euqation. Retrodiction test at high roster turnover rates is the only tool to isolate player skill from team-related effects.

I'm not sure those losses are really unexpected so I won't comment on the upper bound. It should definetely be more than 0.7.

Nathan · Post by **Nathan** » Thu May 04, 2017 1:44 am

You could estimate the upper bound by looking at how well vegas does predicting the winner of games (e.g. http://www.oddsportal.com/basketball/usa/nba/results/).

As for errors, mathematically you should expect sqrt(p*(1-p)/n) error in the values you calculate. This means, for instance, a +/-5% error if you find 60% prediction success on a sample of 100 games, a +/- 2.2% error if the sample is 500 games, a 0.5% error if the sample is 10000 games.

permaximum · Post by **permaximum** » Thu May 04, 2017 2:01 am

Nathan wrote:You could estimate the upper bound by looking at how well vegas does predicting the winner of games (e.g. http://www.oddsportal.com/basketball/usa/nba/results/).

As for errors, mathematically you should expect sqrt(p*(1-p)/n) error in the values you calculate. This means, for instance, a +/-5% error if you find 60% prediction success on a sample of 100 games, a +/- 2.2% error if the sample is 500 games, a 0.5% error if the sample is 10000 games.

Why Vegas? We know Vegas is not that good at it. It's even worse than the metrics in question.

schtevie · Post by **schtevie** » Thu May 04, 2017 2:04 am

So, if I understand your most recent remarks, the X-axis, as shown in the charts presented does not have a coherent definition/representation. (And I must admit that I don't understand how ties figure into things, given that no NBA game has ever so ended.)

Since you've gone to all this trouble to make the argument, and as you have the underlying data, how about clarifying the relationship between roster turnover and predicted accuracy and present a graph with the X-axis showing the average results for decreasing deciles of roster turnover, i.e. the leftmost indicating 0-10% RT and so on, with sample size in each in parentheses, and each successive data point showing the information for one decile higher?

Regarding the upper-bound for predicted accuracy (on the Y-axis) this ought to be thought through before bold claims are made as to the shortcomings of the best existing predictive metric and as for the potential future improvements.

Old Friend DeanO, way back in the day had a widget on his website (see http://www.rawbw.com/~deano/articles/aa030597.htm) where you could have fun calculating win probabilities for a coin flip model of two teams based on pace and relative offensive efficiencies. Now, no home court advantage factor is included in this simple model, but what you find, by my calculation, is if you calibrate for 2017 averages, that an average offensive team this year (who is Charlotte) playing against the league would be expected to achieve "unexpected" results about 34% of the time (both unexpected wins and unexpected losses). Of course, on one extreme, GSW would achieve a far smaller share of such "surprises" (I think that was 15%) but the overall, league average would include a disproportionate number of near-Charlottes (given the distribution of league strength, what is pretty much the standard in all years).

So, as to what the "true" upper-bound should be, taking into account the weighted average of all teams and HCA and whatever else, it is not obvious to me that it is that far north of 0.70. But this is something that one has to explicitly consider when one plays this game.

sndesai1 · Post by **sndesai1** » Thu May 04, 2017 2:23 am

permaximum wrote:
Nathan wrote:You could estimate the upper bound by looking at how well vegas does predicting the winner of games (e.g. http://www.oddsportal.com/basketball/usa/nba/results/).

As for errors, mathematically you should expect sqrt(p*(1-p)/n) error in the values you calculate. This means, for instance, a +/-5% error if you find 60% prediction success on a sample of 100 games, a +/- 2.2% error if the sample is 500 games, a 0.5% error if the sample is 10000 games.
Why Vegas? We know Vegas is not that good at it. It's even worse than the metrics in question.

just want to point out that while the betting market is definitely worse for season win totals due to low limits and illiquidity, it's unlikely any of these metrics are "better" for individual game prediction

permaximum · Post by **permaximum** » Thu May 04, 2017 2:29 am

Ties are Roster Turnover Rate ties. Some games have the exact percentage of roster turnover.

Pardon my English but I don't get what you mean by coherent definition/representation. I thought charts were pretty simple. It was the change of prediction accuracy of metrics with increasing roster turnover rates.

That widget is too archaic.

Then let's just use HCA to predict the outcome of games if these prediction rates are good enough. Now where's the place for advanced metrics and all the trouble?

Anyways, I thought roster turnover rate approach was universally accepted way for the evaluation metrics until this thread. I'm pretty surprised actually.

schtevie · Post by **schtevie** » Thu May 04, 2017 3:03 am

Permaximum, it is your scheme to establish a relationship between "roster turnover" and "prediction accuracy". You defined the variables, so do it clean. For each metric, there are X number of games in your sample where roster turnover is between 0 and 10%, Y number of games where roster turnover is between 10 and 20%, Z between 20 and 30%, and so on up to 90 to 100%. This is how the X-axis of your graph should be defined (and the sample size of each roster turnover decile ought be shown in parentheses.)

As for the widget being too archaic, I do believe you are choosing to kid yourself. There is no reason to believe that this simple model doesn't provide estimates of "unexpected outcomes" that are very much in the ball park. HCA is but a second-order factor, as every team plays home and away, fundamentally canceling its influence, in terms of "unexpected" outcomes.

But never mind the upper bound, if having a number too close to actual RPM performance is too ungenial to your priors; that can be a discussion for another day.

Just present your data in graphical form, where a clear picture is available, showing the effect of roster turnover on predictive accuracy of game results. This is the basis of your arguments, and it ought be clearly presented.

mystic · Post by **mystic** » Thu May 04, 2017 9:08 am

permaximum wrote: 1. I will generate a "minutes from <250MP players / roster turnover chart" for you. I believe 250 MP cutoff should be more than sufficient and PM based metrics shouldn't have any problem with it because it stays roughly the same for any roster turnover rate so the line should be straight and your conclusion should be wrong. But I don't know yet. Maybe you're right. We'll see.

I calculated average values from all metrics myself instead of taking known average values for those metrics. I was considering going for the replacement level approach but I decided this was the right thing to do. Replacement approach hasn't been properly tested and accepted yet and it's not fair to use average values for some while using replacement values for others.

2. RPM represents everything stands for PI, multi-year (which is actually not fair) and more than them. That's why I used NPI-RAPM. I didn't use previous years' information for box-score metrics either. I bet it would increase their prediction power too.

3. Edit: Instead of making assumptions based on theories I'm spending time and testing them for no gain to prove something that's very obvious... at least for me. And then I share them. I'm not complaining but I wanted to reveal what I think.

4. Do you guys have problem with <250 MP players that have been assigned average values for all metrics and blame it for the poor performance of PM metrics for some unknown reason although it's fair and the same for all metrics or do you have problem with players that played between 250 and let's say 1000 minutes and have their metric values calculated?

5. I think you said 250+ minutes is not enough for PM metrics to reliably capture performances right? So you want the second thing calculated. How many minutes do you guys think these PM metrics need? 1000? 1250? 1500?

1. It doesn't matter whether you believe it shouldn't be a problem, but it is a matter of fact that 250 min is not a sufficient sample size for a PM based metric in order to get a rather reliable value for a single player. It should be rather obvious, when you check the error range. And that is not some wildly controversial stuff I write, but pretty much accepted among those working with PM based metrics. I most certainly wouldn't use a sample of just about 13 games for an average player (give or take 2 games) in order to make any kind of bold prediction, even if those games are from the very season I want to predict a future game from. Using such small sample from the previous season is just not useful at all.
It is not about the cutoff itself, it is about the sample size. For boxscore based metrics such sample is much more reliable and the results are more stable. That cutoff in itself is helping the boxscore metrics more, just like using y-2 data is helping PM based metrics more. Again, that is not some wild guessed stuff, that is pretty much established knowledge (at least I thought it would be).
And I'm talking about a linear trend; I most certainly hope that this is a straight line and we are discussing the slope here, not how the graph looks like. I know that the slope is positive; and I assume you found the same thing. That is pretty much guaranteed by the fact that part of the new minutes are taken by rookies.

2. I don't understand the argument at all. Why would you use deliberately a worse metric, when the test should be about out-of-sample and not that every metric is just limited by the fact that boxscore based metrics are usually just have data from one season included. It really makes no sense.

3. I don't make any assumptions here, but rather write about things I know a thing or two about. There was work done by other and I tested some of that myself. Overall your whole approach is flawed, because you are basically saying that the signal you are looking for can be best found in a very small subsample. Let alone that game-to-game testing has to deal with a bigger variance than year-to-year tests.

4. That wasn't the argument at all. The argument was that 250 min is not a big enough sample to get reliable PM based values. Raising the threshold will not change that, because especially RAPM with a big enough lambda will not move those player values far away from 0 anyway; thus replacing the close to the mean values with mean values will not make a big difference anyway. The argument is that a more reliable method like PI RAPM should be used; and nobody should be surprised that RAPM is not able to predict the outcome of games very well, when the amount of players with little to no information is higher.

5. No, that's not what I asked for, at all. I pointed out an issue with the method, which is not easily removable by using another threshold, because such threshold is not improving the sample size for the players with little to no playing time in y-1.

Well, you said that only 6 games had a 100% RT, and as far as I see it, I can name those 6 games without even checking the data. I assume that there are 3 seasons in which 2 such games happened. 1989, 1990 and 1996; teams involved: Miami Heat vs. Charlotte Hornets, Minnesota Timberwolves vs. Orlando Magic and Toronto Raptors vs. Vancouver Grizzlies. Those expansion teams will account for a big slice of the higher RT games. And that's what Nathan basically pointed out before. That is an issue you did not want to acknowledge.

There is another issue of the interpretation of the results in itself. When a metric says Team A is better than Team B, it will just give you a probability for a win by Team A. The metric didn't fail, because Team B won the game, because the metric in itself didn't say that Team A will win 100% of the time. Everybody using such metric will hopefully assume that variance exists.
I also wouldn't recommend using a all-in-one metric in order to look for player skills. They are bound to fail at a task where they are not suited for.

Last but not least: I do not have a horse in this race, thus your assumption I try to make "excuses" has no basis in reality.

permaximum · Post by **permaximum** » Thu May 04, 2017 12:54 pm

mystic is the same old mystic. I have seen him give researched info about something publicly 3 times.

1. Calculating a simple ridge regression.
2. Claiming weight factor has to be matrix for glmnet.
3. Standarize factor for glmnet should be false.

First two simple and "obvious" things were definetely wrong. The third was true and very helpful. I thanked for it.

But considering your history on these things pal I have a hard time accepting your facts on this matter. I have a feeling your're doing demagogy. You're speaking some truths but you don't care if they affect the purpose of this research at all or not. Especially that giving probability argument as a factor on these results must be a joke.

There are 1502 games with more than 60% of RT where MPG passes almost all other metrics but box-score ones. In that sample, only 25% of minutes came from <250MP players, rookies, injured players, inactives, penalized etc where they have been assigned average values. And in those games with +60% of roster turnover 2234 players took part aside from those that have been assigned average values. Which is practically everyone except 250MP players, rookies, injured players, inactives, penalized etc. in that time period (1984-2016).

That should be enough for all your arguments.

BTW, RPM is "a lot" better than any version of RAPM and it definetely represents PM metrics best. That's why all those versions of RAPMs were not included but the vanilla.

permaximum · Post by **permaximum** » Thu May 04, 2017 1:00 pm

schtevie wrote:Permaximum, it is your scheme to establish a relationship between "roster turnover" and "prediction accuracy". You defined the variables, so do it clean. For each metric, there are X number of games in your sample where roster turnover is between 0 and 10%, Y number of games where roster turnover is between 10 and 20%, Z between 20 and 30%, and so on up to 90 to 100%. This is how the X-axis of your graph should be defined (and the sample size of each roster turnover decile ought be shown in parentheses.)

As for the widget being too archaic, I do believe you are choosing to kid yourself. There is no reason to believe that this simple model doesn't provide estimates of "unexpected outcomes" that are very much in the ball park. HCA is but a second-order factor, as every team plays home and away, fundamentally canceling its influence, in terms of "unexpected" outcomes.

But never mind the upper bound, if having a number too close to actual RPM performance is too ungenial to your priors; that can be a discussion for another day.

Just present your data in graphical form, where a clear picture is available, showing the effect of roster turnover on predictive accuracy of game results. This is the basis of your arguments, and it ought be clearly presented.

What you're suggesting changes nothing. But I will do it for you. You want 10% of RT increments right?

permaximum · Post by **permaximum** » Thu May 04, 2017 1:19 pm

BTW I tested for Y-2 values before and Y-2 was even worse for PM metrics. Logically

mystic · Post by **mystic** » Thu May 04, 2017 1:25 pm

permaximum wrote:The third was true and very helpful. I thanked for it.

And that was info given by someone else before and I just repeated it ... I should have known where this was going ...

Nathan · Post by **Nathan** » Thu May 04, 2017 1:55 pm

permaximum wrote:
Nathan wrote:You could estimate the upper bound by looking at how well vegas does predicting the winner of games (e.g. http://www.oddsportal.com/basketball/usa/nba/results/).

As for errors, mathematically you should expect sqrt(p*(1-p)/n) error in the values you calculate. This means, for instance, a +/-5% error if you find 60% prediction success on a sample of 100 games, a +/- 2.2% error if the sample is 500 games, a 0.5% error if the sample is 10000 games.
Why Vegas? We know Vegas is not that good at it. It's even worse than the metrics in question.

We do? Has everyone else here been cashing in this whole time? Why did nobody tell me?

permaximum · Post by **permaximum** » Thu May 04, 2017 1:59 pm

Nathan wrote:
permaximum wrote:
Nathan wrote:You could estimate the upper bound by looking at how well vegas does predicting the winner of games (e.g. http://www.oddsportal.com/basketball/usa/nba/results/).

As for errors, mathematically you should expect sqrt(p*(1-p)/n) error in the values you calculate. This means, for instance, a +/-5% error if you find 60% prediction success on a sample of 100 games, a +/- 2.2% error if the sample is 500 games, a 0.5% error if the sample is 10000 games.
Why Vegas? We know Vegas is not that good at it. It's even worse than the metrics in question.
We do? Has everyone else here been cashing in this whole time? Why did nobody tell me?

I agree with you again but you missed my point. Improvement is marginal. Not enough to be cashing at least in my country. That's the very first reason I decided to chase analytics in the first place. Betting.

Still you can do better than Vegas whether it's marginal or not. I can but I need more improvements to make +$10k per year. Now, it doesn't worth the time and the risk. If I lived in USA, I would definely be cashing very small amounts of money. Better than nothing.

Actually I suspect, some people here probably gain small amounts of money from betting.

APBRmetrics

Poll: RPM's degree of efficacy in sorting players

RPM efficacy?

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players

Re: Poll: RPM's degree of efficacy in sorting players