Re: Poll: RPM's degree of efficacy in sorting players
Posted: Thu May 04, 2017 11:05 pm
by shadow
The story doesn't cite any hard numbers on how often lopsided action supposedly happens or how often and how far the lines are supposedly moving. The data from TeamRankings doesn't really support the idea that line moves of significant magnitude happen on a consistent basis for NBA spreads: only 9.4% of the games tracked there moved more than 2 points from the opening line. Just from watching betting percentages at SportsInsights on a somewhat regular basis, I'm fairly certain the share of games that receive lopsided action (at least 60% of bets on one team) is much higher than 10%. So more often than not, the books aren't all that concerned with lopsided action; they're willing to bet that the lopsided money (which is almost always on the favorite) won't consistently win at better than the 52.4% break-even rate long term. They're more than happy to let the public hammer away at favorites at a 60+% clip, while making slight adjustments to the line based on bets from the select sharp accounts whose opinions they respect, to arrive at an efficient closing line.
A ridiculously simple analogy: suppose you take -110 bets on flips of a fair coin, and 60% of the money lands on heads every single flip. You can simply leave the odds at -110 on both sides and be a virtual lock to come out ahead over a sample of 1230 flips. Each heads flip costs you a little (you pay off the 60% majority and collect from the 40% minority), while each tails flip nets you more than that, so on net you lose only if heads comes up more than about 61.9% of the time (762 of 1230 flips), which is essentially impossible for a fair coin. You could increase the juice on heads to gamble and try to win more money if tails dominates, but you also run the risk of losing more if heads runs hot. And as the number of 1230-flip iterations (NBA seasons) increases, the chance of coming out ahead in the long run only increases. The point being, it's in the sportsbook's best interest to set the opening number to be as accurate a predictor of the outcome as possible, since this maximizes its chance of coming out ahead long term. The book doesn't have to worry that 60% of the public always bets the favorite: as long as its lines keep the favorite's cover rate inside the 47.6% to 52.4% window (and with lopsided action the tolerance is actually wider than that), it will come out ahead long term.
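For what it's worth, the book's exposure under that 60/40 split can be checked exactly. A minimal Python sketch (the -110 pricing, the 60/40 split, and the 1230-flip season come from the discussion above; the per-flip bettor count is an arbitrary normalization):

```python
from math import comb

FLIPS = 1230          # one NBA season's worth of games
RISK, WIN = 110, 100  # -110 pricing: risk 110 to win 100

def book_profit(heads, flips, frac_heads_bettors=0.6, bettors=10):
    """Book's net units over `flips` flips when `heads` of them land heads.

    Each flip, `bettors` bettors risk RISK each; a fixed fraction backs heads.
    """
    n_heads = frac_heads_bettors * bettors
    n_tails = bettors - n_heads
    # Heads flip: pay heads backers WIN each, keep tails backers' RISK each.
    per_heads = n_tails * RISK - n_heads * WIN   # -160 with a 60/40 split
    per_tails = n_heads * RISK - n_tails * WIN   # +260 with a 60/40 split
    return heads * per_heads + (flips - heads) * per_tails

# Break-even: the book loses only if heads wins often enough that payouts
# to the 60% majority swamp the collections from the 40% minority.
breakeven = next(h for h in range(FLIPS + 1) if book_profit(h, FLIPS) < 0)
print(breakeven, breakeven / FLIPS)   # 762 flips, ~61.9%

# Probability a fair coin lands heads that often, i.e. the book's loss chance:
p_lose = sum(comb(FLIPS, h) for h in range(breakeven, FLIPS + 1)) / 2**FLIPS
print(p_lose)   # astronomically small
```

The design point: with balanced-ish pricing the book's risk comes only from the *imbalance* in the action, which is why a 60/40 split leaves it with a huge cushion.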
The 'throwing a number' out there comment was based on this comment:
Vegas lines do not (and are not intended to) "predict" outcomes of games. They are set in an attempt to anticipate how the public will place its bets.
The NylonCalc article illustrated that the closing lines in the NBA do predict the MOV extremely well in the long run. And because opening lines do not differ substantially from closing lines ~90% of the time, it stands to reason that the opening lines would also have a very high correlation with actual MOV and are reasonably good at predicting outcomes. The idea that the predominant factor in setting a line is not to predict the outcome, but to anticipate how the public will bet simply isn't true.
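The "opening lines inherit the closing lines' predictiveness" argument can be illustrated with synthetic data. Everything below is made up for illustration (the noise levels are arbitrary assumptions, not real betting data); the point is only that a line sitting close to an accurate closing line must itself correlate strongly with margin of victory:

```python
import random
from statistics import mean, stdev

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (stdev(xs) * stdev(ys))

random.seed(1)
# Synthetic games: a true team-strength gap, a closing line that tracks it
# closely, an opening line that usually sits within a couple points of the
# close (per the ~90% figure above), and a noisy actual margin.
true_edge = [random.gauss(0, 6) for _ in range(5000)]
closing   = [e + random.gauss(0, 1.0) for e in true_edge]
opening   = [c + random.gauss(0, 1.2) for c in closing]
margin    = [e + random.gauss(0, 11) for e in true_edge]

r_close = pearson(closing, margin)
r_open  = pearson(opening, margin)
print(round(r_close, 3), round(r_open, 3))  # both correlate with margin
```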
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Fri May 05, 2017 9:42 am
by mystic
permaximum wrote:It's because games with higher roster turnover rates are found more in 2000s and 2010s and almost every metric's prediction success gets better when we get closer to our time.
I just ran an ADF test on that data, and there is no such trend at all (talking about the bolded part). You just picked a sample with a low starting point (1984) and a higher end point (2016); remove those two endpoints and your linear trend is gone. That would not happen if there really were a solid recency trend.
The real reason was given by Nathan: in the higher-RT games there are more lopsided matchups, for which the metrics have a higher success rate at picking the correct winner. That's why you get such odd results suggesting that metrics predict better with higher RT. You need to use the scoring margin in order to check PA. Also, you should check whether each subsample (each increment) is normally distributed, because quite frankly it doesn't look like that's the case at the extreme ends of the spectrum. For non-normally distributed data, the mean is not a good estimate of the overall trend.
Edit: Removed the last comment, because it was completely out of line.
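A quick way to screen a subsample for the non-normality mystic is worried about, without any stats packages, is to compare the mean to the median and compute a crude skewness. The sample values below are hypothetical, purely to show the mechanics:

```python
from statistics import mean, median, pstdev

def skewness(xs):
    """Simple (biased) sample skewness: the third standardized moment."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

# Hypothetical per-bin values with a long right tail:
sample = [0.5, 0.52, 0.55, 0.56, 0.58, 0.6, 0.61, 0.62, 0.9, 0.95]
print(round(mean(sample), 3), round(median(sample), 3), round(skewness(sample), 2))
# A large gap between mean and median, or |skewness| well above ~0.5,
# is a quick red flag that the mean may misrepresent the bin.
```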
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Fri May 05, 2017 11:28 am
by permaximum
mystic wrote:permaximum wrote:It's because games with higher roster turnover rates are found more in 2000s and 2010s and almost every metric's prediction success gets better when we get closer to our time.
I just ran an ADF test on that data, and there is no such trend at all (talking about the bolded part). You just picked a sample with a low starting point (1984) and a higher end point (2016); remove those two endpoints and your linear trend is gone. That would not happen if there really were a solid recency trend.
The real reason was given by Nathan: in the higher-RT games there are more lopsided matchups, for which the metrics have a higher success rate at picking the correct winner. That's why you get such odd results suggesting that metrics predict better with higher RT. You need to use the scoring margin in order to check PA. Also, you should check whether each subsample (each increment) is normally distributed, because quite frankly it doesn't look like that's the case at the extreme ends of the spectrum. For non-normally distributed data, the mean is not a good estimate of the overall trend.
Edit: This comment was deleted.
After 65% RT, all metrics get a lot worse. So your statement is, again, not true.
You shouldn't argue in ifs: if 1984 this, if 2016 that. The previous charts used all the data, so the trendlines were created from all the data.
You call for score margin again. Alright, I'll grant your wish just "this 1 time" to show how "wrong you are" again and prove score margin has no effect at all.
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Fri May 05, 2017 3:19 pm
by Nathan
My thought, at least, was that games with ~60% RT often feature one team with high (~90%) RT and another team with typical (~30%) RT. These are the games that would be easy to predict, because they feature a team in turmoil playing a typical team. Games with very high (~80%) RT necessarily involve two teams that are both in a state of flux. It seems logical that those games might be harder than average to predict.
As for the trend line, it should be straightforward to assess whether or not it is statistically significant. Is it? Is the variance observed in the data points consistent with p*(1-p)/n?
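Nathan's p*(1-p)/n check can be sketched directly: under a constant true accuracy, each bin's observed accuracy should scatter around the pooled rate with binomial variance. The bin sizes and accuracies below are made up for illustration, not taken from the charts in this thread:

```python
# Hypothetical per-bin results: (games in bin, fraction predicted correctly).
bins = [(900, 0.66), (700, 0.67), (500, 0.65), (300, 0.70), (120, 0.62)]

# Pooled accuracy across all bins, weighting by bin size.
p_hat = sum(n * acc for n, acc in bins) / sum(n for n, _ in bins)

# Under a constant true accuracy p, each bin's observed accuracy has
# variance p*(1-p)/n. Compare each deviation to that yardstick (a z-score):
z_scores = [(acc - p_hat) / (p_hat * (1 - p_hat) / n) ** 0.5 for n, acc in bins]

# If the apparent "trend" across bins is just sampling noise, these z's
# should look like standard-normal draws (a chi-square test in spirit):
chi2 = sum(z * z for z in z_scores)
print([round(z, 2) for z in z_scores], round(chi2, 2))
```

If the sum of squared z-scores is far above the number of bins, the spread is bigger than binomial noise allows and a real trend (or other structure) is plausible; here the toy numbers are consistent with noise.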
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Fri May 05, 2017 6:11 pm
by schtevie
permaximum wrote:@schtevie
Beyond 70-80%, the sample becomes too small for any comparison. RPM's and BPM's prediction spread and success are extremely similar, so you can take BPM as a stand-in for RPM comparisons. If you still insist, I can make an RPM chart with an extremely small sample for such a comparison, which wouldn't be a valid approach, but whatever. If I had done this research after this season, one more year of RPM data would have helped, but the findings would be the same no matter what.

Thanks for this. And though I'm not insisting, if it isn't too much trouble, I (and I expect others) would be grateful for the same presentation for RPM. Thanks.
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Fri May 05, 2017 8:07 pm
by permaximum
permaximum wrote:mystic wrote:permaximum wrote:It's because games with higher roster turnover rates are found more in 2000s and 2010s and almost every metric's prediction success gets better when we get closer to our time.
I just ran an ADF test on that data, and there is no such trend at all (talking about the bolded part). You just picked a sample with a low starting point (1984) and a higher end point (2016); remove those two endpoints and your linear trend is gone. That would not happen if there really were a solid recency trend.
The real reason was given by Nathan: in the higher-RT games there are more lopsided matchups, for which the metrics have a higher success rate at picking the correct winner. That's why you get such odd results suggesting that metrics predict better with higher RT. You need to use the scoring margin in order to check PA. Also, you should check whether each subsample (each increment) is normally distributed, because quite frankly it doesn't look like that's the case at the extreme ends of the spectrum. For non-normally distributed data, the mean is not a good estimate of the overall trend.
Edit: This comment was deleted.
After 65% RT, all metrics get a lot worse. So your statement is, again, not true.
You shouldn't argue in ifs: if 1984 this, if 2016 that. The previous charts used all the data, so the trendlines were created from all the data.
You call for score margin again. Alright, I'll grant your wish just "this 1 time" to show how "wrong you are" again and prove score margin has no effect at all.
SCORE MARGIN RETRODICTION
RMSE and MAE, both with and without HCA, are already included, along with other metrics such as PER, WS, and USG, just in case that isn't enough for you.
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Fri May 05, 2017 8:47 pm
by permaximum
schtevie wrote:
Thanks for this. And though I'm not insisting, if it isn't too much trouble, I (and I expect others) would be grateful for the same presentation for RPM. Thanks.
In individual 10% RT bins, sample size is a big problem for any comparison. That's why I used cumulative bins (10+, 20+, 30+, 40+, ...) instead of 10-20, 20-30, ... in the first place. What you requested was still doable for 1984-2016, but with only 3 years of data and 2 years of games to predict, it isn't statistically valid at all. Still, since you asked, here it is. The only thing that's very obvious to me is the incredibly similar prediction accuracy of BPM and RPM, as I pointed out before.

Re: Poll: RPM's degree of efficacy in sorting players
Posted: Fri May 05, 2017 11:07 pm
by permaximum
It looks like mystic didn't want the adj-R^2 of score-margin prediction but the RMSE of score-margin prediction, because in his own words:
"just like for betting you would rather look to beat the spread instead of predicting the correct winner."
I should mention I hadn't checked the RMSE of score-margin prediction before; I was referring to the adjusted R^2 I found about a year ago by using score margin instead of win/loss, which gave the same results. Anyway, the RMSE of score-margin prediction once again confirmed my previous statements that simple box-score metrics are better than other advanced metrics when it comes to player evaluation. But if mystic did mean beating the spread, he was also correct that metrics suffered with more roster turnover, up to the 65% RT point. So here it comes:
And this is bonus:
I guess I granted everyone's wishes in this thread but Crow's. He wanted blend results.
@Crow
Is there a particular blend you want to look at for a given roster turnover rate, or do you want the best possible one at a certain roster turnover rate?
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Fri May 05, 2017 11:35 pm
by sndesai1
thanks for your work
the slope of these lines makes more sense to me (obviously setting aside the super-low-sample 80%/90%/100% RT bins in the last graph)
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Sat May 06, 2017 12:12 am
by permaximum
You're welcome.
BTW, I dropped PER, WS, and Usage from the charts because USG is worse than everything at any RT, PER is worse than the box-score metrics at any RT, and WS is worse than BPM at almost any RT. I don't include RPM because the sample size is too small for RT rates of 60% and higher, but like I said before, you can check BPM to get an idea of RPM's prediction success.
If anyone wants RMSE charts of RPM, RAPM, PER, WS, or Usage for some reason, I'm here.
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Sat May 06, 2017 6:08 pm
by schtevie
To return the conversation to what might be considered an upper bound on predictive accuracy for whatever lagged metric one might choose, here's what I think is another representative estimate.
I went through this past season's results, using contemporaneous SRS ratings (as taken from Basketball-Reference) to determine individual-game favorites (and applying permaximum's suggested HCA of 0.63 where relevant). What this exercise shows is that 36% of game outcomes were unpredicted (comprising, of course, unpredicted wins as well as losses), ranging from a high of 51% of Bulls games down to 18% of Warriors games.
Now perhaps someone might argue that a formula of lagged variables can out-predict contemporaneous season-summary variables, but frankly that would be an uphill struggle. And of course this 64% value seen in 2016-17 isn't set in stone for all seasons (the figure depends crucially on the distribution of team strength within the league), and another metric (such as OFFRTG-DEFRTG) would give different results, but only slightly so.
So, stipulating this, where are we? The historic PA of BPM of 0.645 looks pretty darn good, and RPM, which seems to be an improvement, is better still, seemingly leaving little room for improvement.
But predicting win-loss records is a bit of topic drift. Maybe now we can begin returning to the issue of the value of RPM in sorting players.
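For concreteness, here is a rough sketch of the exercise schtevie describes: pick each game's favorite from SRS plus home-court advantage and count unpredicted outcomes. The teams, SRS values, and results below are toy numbers, not 2016-17 data:

```python
HCA = 0.63  # home-court advantage in SRS points (the value used in the post)

def predicted_winner(home, away, srs):
    """Pick the favorite: the home team unless the road team's SRS edge
    exceeds home-court advantage."""
    return home if srs[home] + HCA >= srs[away] else away

def unpredicted_rate(games, srs):
    """games: list of (home, away, actual_winner) tuples."""
    misses = sum(1 for h, a, w in games if predicted_winner(h, a, srs) != w)
    return misses / len(games)

# Toy illustration (teams and ratings are made up):
srs = {"A": 5.0, "B": 1.2, "C": -0.4, "D": -6.0}
games = [
    ("A", "B", "A"),  # favorite wins
    ("C", "A", "A"),  # road favorite wins
    ("B", "C", "C"),  # upset
    ("D", "C", "D"),  # upset
]
print(unpredicted_rate(games, srs))  # 0.5 on this toy slate
```

The same loop over a real 1230-game schedule, with each team's season SRS, reproduces the kind of 0.36 unpredicted rate quoted above.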
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Sat May 06, 2017 6:33 pm
by permaximum
schtevie wrote:To return the conversation to what might be considered an upper bound on predictive accuracy for whatever lagged metric one might choose, here's what I think is another representative estimate.
I went through this past season's results, using contemporaneous SRS ratings (as taken from Basketball-Reference) to determine individual-game favorites (and applying permaximum's suggested HCA of 0.63 where relevant). What this exercise shows is that 36% of game outcomes were unpredicted (comprising, of course, unpredicted wins as well as losses), ranging from a high of 51% of Bulls games down to 18% of Warriors games.
Now perhaps someone might argue that a formula of lagged variables can out-predict contemporaneous season-summary variables, but frankly that would be an uphill struggle. And of course this 64% value seen in 2016-17 isn't set in stone for all seasons (the figure depends crucially on the distribution of team strength within the league), and another metric (such as OFFRTG-DEFRTG) would give different results, but only slightly so.
So, stipulating this, where are we? The historic PA of BPM of 0.645 looks pretty darn good, and RPM, which seems to be an improvement, is better still, seemingly leaving little room for improvement.
But predicting win-loss records is a bit of topic drift. Maybe now we can begin returning to the issue of the value of RPM in sorting players.
I corrected the HCA to 0.615. You probably missed the post among all these big charts. Can you do it again with the HCA at 0.615, if it's not too much trouble?
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Sat May 06, 2017 6:45 pm
by schtevie
How could I say no after all the information you've posted (though I can't get to it for a little while)? Just to anticipate, however: there is no way this will make any substantive difference to the figure (my bet is that it won't change the answer at the two significant figures presented). A season consists of 1230 games, only a fraction of which are influenced by HCA in this kind of exercise, and the number affected by a further change of 0.015 has to be peanuts (and could go in either direction). Stay tuned...
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Sat May 06, 2017 7:36 pm
by schtevie
Went through the calculation, and there was absolutely no change in the PA. Only the Bulls vs. Cavs game outcomes were affected by shrinking the notional, league-wide HCA from 0.630 to 0.615, and the only effect was to change the mix of unexpected outcomes, not the total. So the figure of 0.36 (or 0.359) stands.
Re: Poll: RPM's degree of efficacy in sorting players
Posted: Sun May 07, 2017 12:10 am
by permaximum
schtevie wrote:Went through the calculation, and there was absolutely no change in the PA. Only the Bulls vs. Cavs game outcomes were affected by shrinking the notional, league-wide HCA from 0.630 to 0.615, and the only effect was to change the mix of unexpected outcomes, not the total. So the figure of 0.36 (or 0.359) stands.
Thank you. Your findings, Crow's wish, and the thread's original topic prompted me to do some research I'm sure everyone is most interested in.
- 2001/02-2013/14 regular and playoff data was used to create a blend of PER, WS, BPM, USG, MPG, Thibodeau, AWS and RAPM. RPM was left out. No further data was used OR predicted in any form in the creation of the blend.
- 2014/15 and 2015/16 both regular and playoff games' score margin was predicted and RMSE of each metric and blend was calculated.
- The blend is completely out-of-sample, but the metrics as lone predictors can't be called 100% out-of-sample, because each metric's coefficient and intercept (which is basically HCA) for predicting margin was calculated using the 2014/15 and 2015/16 data. This step had to be done to make comparisons across different kinds of metrics easier, because the metric and stat values come in different units. Those 2 years had to be chosen because I wanted to include RPM in the comparison and RPM has no other data, and I didn't want to punish the other metrics in RPM's favour. Since we're also predicting this data, I can confirm that fitting on these same years helped all the metrics by around 0.24-0.26 RMSE.
- In short, the blend was at a slight disadvantage.
2627 games were predicted. No roster turnover limit. Simply the whole data that consists of 2014/15 + 2015/16 regular and post-season.
Code: Select all
+-----------+-------------+
| METRIC | RMSE |
+-----------+-------------+
| BLEND | 12.3324633 |
| BPM | 12.49061905 |
| RPM | 12.49556962 |
| AWS | 12.56756586 |
| WS | 12.62856945 |
| Thibodeau | 12.67719531 |
| PER | 12.89176372 |
| MPG | 13.14923064 |
| USG | 13.3729981 |
+-----------+-------------+
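For reference, the RMSE figures in the table are presumably computed along these lines (the predicted and actual margins below are made up, purely to show the formula):

```python
from math import sqrt

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual score margins."""
    assert len(predicted) == len(actual) and predicted
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))

# Hypothetical margins (home minus away) for a handful of games:
pred   = [4.5, -2.0, 7.5, 1.0]
actual = [10, -6, 3, -8]
print(round(rmse(pred, actual), 3))
```

With real NBA margins having a standard deviation in the low teens, table values around 12-13 mean the metrics explain only a modest slice of single-game variance, which is why the differences between metrics are so small.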
Although the sample size is too small, here's a roster turnover chart for those who are curious.
I plan on joining next year's prediction contest, so I can't share the blend details. I won't use this exact blend, of course; it can easily be improved further by fitting on all the data, adding the 2016-17 season, and using multiple years of metrics and other kinds of stats instead of a single year in the blend.
So schtevie, I think the metrics can be improved further even with box-score and plus-minus data only. However, don't be fooled by the blend's first-place finish at higher roster turnover rates, because the sample size there is too small. I roughly tested different blends' RMSE by RT on bigger data and got conflicting results. I'll look into it carefully in the future.
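Since the blend details aren't shared, here is only a generic sketch of how a linear blend of metric-based margin predictions can be fit by least squares. The feature rows and targets are toy numbers, and this is not permaximum's actual method, just the standard normal-equations recipe:

```python
def ols(X, y):
    """Least-squares weights via the normal equations (tiny, dependency-free).
    X: feature rows, each starting with 1.0 for the intercept (i.e. HCA)."""
    k = len(X[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination (assumes non-singular X'X, fine for this toy).
    for i in range(k):
        piv = xtx[i][i]
        xtx[i] = [v / piv for v in xtx[i]]
        xty[i] /= piv
        for j in range(k):
            if j != i:
                f = xtx[j][i]
                xtx[j] = [a - f * b for a, b in zip(xtx[j], xtx[i])]
                xty[j] -= f * xty[i]
    return xty

# Toy training rows: (1, metricA_edge, metricB_edge) -> actual margin.
X = [[1.0, 3.0, 2.5], [1.0, -1.0, 0.5], [1.0, 5.0, 4.0],
     [1.0, -4.0, -3.5], [1.0, 0.0, 1.0]]
y = [4.0, -2.0, 7.0, -6.0, 1.5]
w = ols(X, y)
blend_pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
```

In practice you would fit `w` on the training seasons and then score held-out seasons with `blend_pred`-style predictions, which is the out-of-sample setup described in the post above.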
RPM is a bit worse than BPM at predicting score margin, but RPM's win prediction is 0.673 to BPM's 0.659, so they're really similar. If I had to pick one, I'd take RPM as the marginally better metric, for a couple of reasons.