The debut and popularization of BPM

Home for all your discussion of basketball statistical analysis.
Statman
Posts: 548
Joined: Fri Apr 15, 2011 5:29 pm
Location: Arlington, Texas
Contact:

Re: The popularization of BPM

Post by Statman »

mystic wrote:
Just check out how your metric would do if you didn't make it fit on the team level. Then compare that to PER. I wouldn't be surprised if your metric comes up short.
I don't make it "fit" at the team level in the end like some others do - I come up with the team rating FIRST and then divvy up that rating among all the players - thus adjusting for that specific team's pace, efficiencies, maybe friendly home scorekeeper assist/block padding, etc. I personally think it's a very intuitive way to approach player ratings, and I think it allows my college player ratings to make a ton of sense. I'm not going to apologize for trying to make my work best represent true player value based on box score stats - compiling at the team level seems to be a very simple and obvious thing to do.
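The "team rating first" approach can be sketched roughly as follows. This is a hypothetical illustration of the general idea, not Statman's actual formula: the allocation shares and numbers are made up.

```python
# Hypothetical sketch: compute a team-level rating first, then divvy it
# up among the players by some per-player share. The shares here are
# invented for illustration (e.g. a box score based share of credit).

def team_first_ratings(team_rating, shares):
    """Split a team-level rating among players by per-player shares.

    shares: dict of player name -> fraction of credit (sums to 1.0).
    """
    return {name: team_rating * s for name, s in shares.items()}

shares = {"A": 0.35, "B": 0.40, "C": 0.25}  # illustrative shares
ratings = team_first_ratings(8.0, shares)   # team is +8.0 per 100 poss.
# By construction, the player ratings compile straight back to the
# team rating - the "fit" is built in from the start.
```

Whatever the real share formula is, the point of the construction is that pace, efficiency, and scorekeeper quirks are absorbed at the team level before any player-level credit is handed out.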

But, ok, I'll completely change my approach to my work - in a way that would obviously make it worse - to see if it's even worse than PER.

Overall, you didn't get my point - it doesn't really matter if PER can correlate decently well in prior seasons when it can't correlate well in the current season. PER is still maybe the most "known" box score metric - am I supposed to ignore its obvious flaws because we all know about them? I was just making the point that a box score metric pretty much fails to do its job if, when compiled, it doesn't even correlate to team success in the same season.

I KNEW the other metrics would have a solid correlation; I'm not an idiot. I used them for comparison - to show that they pass (as we knew they would, by how they are created) the very BASIC step of same-season correlation. I wasn't trying to claim my metric is "better" than them in any way, which is why I didn't bother to have mine listed with them. But I felt obligated to explain why I didn't list either of my metrics, just in case someone thought mine fell short of the very simple same-season correlation.
AcrossTheCourt
Posts: 237
Joined: Sat Feb 16, 2013 11:56 am

Re: The popularization of BPM

Post by AcrossTheCourt »

So am I building the metric for 1978 through 2000? I don't need to attach 2001 through 2014?
Statman wrote: it doesn't really matter if PER can correlate decently well or not in prior seasons when it can't correlate well in the current season.
Why does this matter? PER wasn't built to sum to team wins and you can always do that adjustment later. Summing up to wins in the current season has no real utility. A metric is useful when you can do stuff with it, like predict wins in the next season. There's no rule stating that a metric has to explain wins in the current season before you do anything else with it.

Like if you told a team or someone gambling hey you can go back in the 2014 season and explain wins really well by using this metric with a team adjustment ... well, so what? How is that useful? 2014 already happened.
Statman
Posts: 548
Joined: Fri Apr 15, 2011 5:29 pm
Location: Arlington, Texas
Contact:

Re: The popularization of BPM

Post by Statman »

AcrossTheCourt wrote:So am I building the metric for 1978 through 2000? I don't need to attach 2001 through 2014?
Statman wrote: it doesn't really matter if PER can correlate decently well or not in prior seasons when it can't correlate well in the current season.
Why does this matter? PER wasn't built to sum to team wins and you can always do that adjustment later. Summing up to wins in the current season has no real utility. A metric is useful when you can do stuff with it, like predict wins in the next season. There's no rule stating that a metric has to explain wins in the current season before you do anything else with it.

Like if you told a team or someone gambling hey you can go back in the 2014 season and explain wins really well by using this metric with a team adjustment ... well, so what? How is that useful? 2014 already happened.
Well, it matters to me I guess. It doesn't make sense to me that you have a metric that attempts to quantify the value of players, yet when you compile the results it does a poor job of quantifying the value of teams. PER has Dallas as the best team in the league by a pretty wide margin, and the Rockets as well below average and one of the worst teams (MUCH worse than the Lakers). Doesn't that tell you that MAYBE the metric has some flaws in terms of quantifying value?

Now, I know that we ALL know these flaws have always existed - but the general fan often doesn't. The general fan doesn't think "hey, interesting PER there - now let me run some team adjustments to maybe 'improve' the results and see what we have!". PER is maybe the most widely accepted and mentioned box score metric.

All I (and others here) am trying to do is quantify value at the player level as best I can - and be able to adequately and fairly compare values across seasons & eras. I personally feel that, at the very least, a metric quantifying players' value in a given season should, when compiled, give a decent picture of team quality. PER fails that initial test, which gives it little credence to me.

BTW, this is why I shy away from these discussions, and may recuse myself from discussing further. Even though in many cases my interaction has been limited, I like just about everyone I've come across here. I like Hollinger & have his periodicals; DeanO & I have talked and actually have each other as friends on Facebook - his book sits in my reading library (ie bathroom) for quick reference when I'm inspired. I don't want to look like I'm trashing anybody by having my doubts about some of their work. I believe I have improved quite a bit on some of the work the pioneers started - BUT others may disagree and I am totally fine with that.

I may discuss more when Neil posts some results. I don't know. I have a TON of work I want to do, and these discussions seem to get me sidetracked a little mentally.
Statman
Posts: 548
Joined: Fri Apr 15, 2011 5:29 pm
Location: Arlington, Texas
Contact:

Re: The popularization of BPM

Post by Statman »

AcrossTheCourt wrote: Like if you told a team or someone gambling hey you can go back in the 2014 season and explain wins really well by using this metric with a team adjustment ... well, so what? How is that useful? 2014 already happened.
BTW, if you told someone gambling that a metric compiled very poorly to the team level in the SAME SEASON - would they use that metric as a reference for their gambling? How is that metric useful? To me, a useful player metric is one that correlates well to team success in the same season - as well as correlating well in prior &/or future seasons.
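The same-season test being argued over here is easy to state concretely: compile each player metric to the team level (minutes-weighted, times 5 for five players on the floor) and correlate the compiled values with team results across the league. A minimal sketch, with all function names and numbers made up for illustration:

```python
# Sketch of a same-season correlation check for a player metric.
import statistics

def compile_to_team(metric, minutes):
    """Minutes-weighted average of per-player values, times 5
    (five players on the floor at a time)."""
    return 5 * sum(v * m for v, m in zip(metric, minutes)) / sum(minutes)

def pearson(xs, ys):
    """Plain Pearson correlation, no external libraries."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (len(xs) * statistics.pstdev(xs) * statistics.pstdev(ys))

# Usage: compile every team's roster, then correlate against, say,
# team net rating: pearson(compiled_values, team_net_ratings).
```

As the rest of the thread argues, passing this check is trivial for any metric built from team-level data, so a high correlation here says little by itself.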
Mike G
Posts: 6144
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: The popularization of BPM

Post by Mike G »

For a player metric to be credible, it has to accurately describe a given season's team success. It's a necessary, but not sufficient, test of validity. Year to year stability is the rest of the test.

As Dan says, a single example of nonsense -- Dal vs Hou -- proves a flaw in the metric.
If a team's PER is wack one year, and the team is largely unchanged the next year, it'll likely be wack again.

As I recall -- and anyone may check the old APBR_analysis files -- both PER and WS (ORtg and DRtg) had their "issues" pointed out before they ever went to press. They are the same issues they carry to this day, but their creators went forward with them anyway.

If you want to retrodict 2010 by use of BPM, could you just ignore that year's data from the RAPM that was used to build BPM? Would that suddenly make the test more valid? Or would BPM have the same coefficients with or without a given season?

If BPM is based on regular-season RAPM, is it valid to use postseason results as a test?
What actually makes any test not-valid because it's "in sample", and the "sample" is approximately Everything? Are we splitting hairs here?
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: The popularization of BPM

Post by v-zero »

Mike G wrote:If you want to retrodict 2010 by use of BPM, could you just ignore that year's data from the RAPM that was used to build BPM? Would that suddenly make the test more valid? Or would BPM have the same coefficients with or without a given season?
Doing this still leaves the major issue of model/variable selection. BPM is a result of some model/variable selection process, and as such its formulation is a direct result of all the data which was used to create it, and that will include 2010 whether you remove 2010 from the data on which you fit the parameters or not.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am
Contact:

Re: The popularization of BPM

Post by mystic »

Mike G wrote:For a player metric to be credible, it has to accurately describe a given season's team success. It's a necessary, but not sufficient, test of validity. Year to year stability is the rest of the test.
No, such a test is neither necessary nor sufficient. It is also 2/3 of Dave Berri's "quality check". I can show you an easy example of why it is basically useless:

Code:

Player                 Tm     MP     UNO    UND    UNR
Mike Scott            ATL     178   1.94   0.40   2.34
Kyle Korver           ATL     444   1.10   0.55   1.64
Thabo Sefolosha       ATL     265   1.25   0.20   1.45
Mike Muscala          ATL     34    5.27   -3.92  1.35
Kent Bazemore         ATL     82    2.39   -1.38  1.02
Dennis Schröder       ATL     196   0.96   -0.64  0.32
Al Horford            ATL     383   0.48   -0.35  0.14
Shelvin Mack          ATL     159   0.59   -1.53  -0.94
Pero Antic            ATL     199   0.29   -1.46  -1.17
DeMarre Carroll       ATL     308   -0.03  -1.21  -1.24
Paul Millsap          ATL     456   -0.24  -1.10  -1.34
John Jenkins          ATL     21    6.72   -8.60  -1.87
Jeff Teague           ATL     425   -0.46  -1.43  -1.89
Elton Brand           ATL     21    6.42   -8.97  -2.55
These are values for the current season. When you take the minute-weighted average of UNO and UND multiplied by 5, you get the Hawks' ORtg/DRtg above league average. How did I arrive at those numbers? Well, I took the team-level data (in that case ORtg/DRtg) and then divided it among the players using their jersey numbers (via z-scores) and their minutes played. Given that jersey numbers mostly stay constant from season to season, and that each player's minutes have a rather high season-to-season correlation, such a metric will have typical year-to-year consistency. I called it UNR, because that is the Ultimate Nonsense Rating.

What people need to understand is where the "summing up to wins/team-level play/etc." within a season comes from. It is explicitly a result of either starting at the team level and then subsequently dividing that value among the team's players, or making it fit the team-level play afterwards. Creating such a metric is not rocket science; in fact, I created one which even fulfills Berri's third part: it must make sense. The "it makes sense" part is given by me declaring "scoring plus defense" as "making sense" (similarly arbitrarily as Berri). Then I simply took the individual player's scoring rate per 100 team possessions and added a "small" defensive adjustment term to it. What makes that metric sum up nearly perfectly to wins is the fact that the minute-weighted scoring per 100 possessions for individual players, multiplied by 5, will give you the team's ORtg. The "defensive adjustment" is set up so that the team's defensive prowess (DRtg over league average) is distributed among the players accordingly. In that way, this metric will completely match the ORtg and DRtg for each team. And we know that this correlates highly with win% (we can even improve that by simply using an SOS adjustment as well).

What is the point: whether we start with a team-level value or fit the results to the team level at the end doesn't matter, because we can distribute the value among the players on a specific team in any way we want, and it will ALWAYS correlate highly with team win%/team net rating. But that is not a quality of the individual player values. The year-to-year consistency is mostly guaranteed anyway, because players will likely stick with their role from season to season and get similar minutes. What we need to understand here is that we are dealing with a sampling bias. We are not working with a random sample, but with players who are selected by skilled people (GMs, scouts) and then used in a fitting fashion again by highly skilled people (coaches), while the overall role determines the box score entries as well as the skills. Being able to separate the skill part from the role part is what a good metric can do, and at that I'm rather sure PER is better than a lot of metrics which sum up to wins/team-level play in-season (or show a high correlation to that).
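The UNR construction described above can be sketched as follows. This is a guess at the mechanics from the description (jersey-number z-scores, shifted so the minutes-weighted average times 5 recovers the team margin), not mystic's actual code:

```python
# Hypothetical reconstruction of the "Ultimate Nonsense Rating":
# distribute a team margin among players purely by jersey number.
import statistics

def ultimate_nonsense_rating(team_margin, jerseys, minutes):
    """Spread a team's margin (e.g. ORtg above league average) among
    players via jersey-number z-scores, shifted so the minutes-weighted
    average of the player values, times 5, reproduces the team margin."""
    mean, sd = statistics.mean(jerseys), statistics.pstdev(jerseys)
    z = [(j - mean) / sd for j in jerseys]
    total = sum(minutes)
    # shift so the minutes-weighted mean lands exactly on team_margin / 5
    wmean_z = sum(zi * m for zi, m in zip(z, minutes)) / total
    return [zi - wmean_z + team_margin / 5 for zi in z]
```

By construction the rating compiles perfectly to the team margin every season, despite the player values being pure nonsense - which is exactly the point of the example.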
Last edited by mystic on Thu Nov 27, 2014 3:06 pm, edited 3 times in total.
Mike G
Posts: 6144
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: The popularization of BPM

Post by Mike G »

... its formulation is a direct result of all the data which was used to create it, and that will include 2010 whether you remove 2010 from the data on which you fit the parameters or not.
If the 2009-10 pbp data were for whatever reason missing, hidden, or unavailable, then how would this data yet be included?
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am
Contact:

Re: The popularization of BPM

Post by mystic »

Mike G wrote:If the 2009-10 pbp data were for whatever reason missing, hidden, or unavailable, then how would this data yet be included?
v-zero means that testing the 2010 BPM numbers in a retrodiction test on subsequent seasons will still make it an in-sample test, because those subsequent seasons are included in the formulation of the metric. An "out-of-sample" test for that time period could be set up by using only half of the dataset (2001 to 2007 RAPM, for example), and then testing how well the metric is able to predict subsequent seasons (meaning, use 2008 data to predict the 2009, 2010, 2011 seasons, then 2009 data for 2010, 2011, 2012, etc.).
Mike G
Posts: 6144
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: The popularization of BPM

Post by Mike G »

mystic wrote: What people need to understand is where the "summing up to wins/team-level play/etc." within a season comes from. It is explicitly a result of either starting at the team level and then subsequently dividing that value among the team's players, or making it fit the team-level play afterwards.
Your example is of a metric that is not sufficiently credible. You may legitimately believe it was necessary to create the example, but it's not necessary for most of us here.

A 3rd possibility is one you did not posit: That your metric is valid enough that it needs no team adjustment afterward.
A simple example: You rank players by points per (team) game. Do your player ppg sum to team PPG? It is necessary that they do, or you have made an error. It's still not sufficient to explain anything else, like point differential.
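The ppg example is a necessary-but-not-sufficient check, and it can be written down directly. A trivial sketch (the tolerance and numbers are illustrative):

```python
# Necessary-condition check from the ppg example: individual points per
# game must sum to team points per game (up to rounding), or the
# bookkeeping is simply wrong. Passing says nothing about, e.g., point
# differential - it rules out errors, not bad metrics.
def ppg_sums_to_team(player_ppg, team_ppg, tol=0.05):
    return abs(sum(player_ppg) - team_ppg) <= tol
```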
Mike G
Posts: 6144
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: The popularization of BPM

Post by Mike G »

mystic wrote:
Mike G wrote:If the 2009-10 pbp data were for whatever reason missing, hidden, or unavailable, then how would this data yet be included?
v-zero means that testing the 2010 BPM numbers in a retrodiction test on subsequent seasons will still make it an in-sample test, because those subsequent seasons are included in the formulation of the metric. An "out-of-sample" test for that time period could be set up by using only half of the dataset (2001 to 2007 RAPM, for example), and then testing how well the metric is able to predict subsequent seasons (meaning, use 2008 data to predict the 2009, 2010, 2011 seasons, then 2009 data for 2010, 2011, 2012, etc.).
Hmm. So I'm sincerely trying to figure out the notion of out-of-sample validity here.
If for whatever reason - say I had to hand-enter all data - I only had the 2001 season and the 2014 season upon which to create a RPM file, and upon that I made some kind of BPM, would all 12 seasons in between still be "in sample"?
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am
Contact:

Re: The popularization of BPM

Post by mystic »

Mike G wrote:Your example is of a metric that is not of sufficient credibility. You may legitimately believe it was necessary to create the example, but it's not necessary for most of us here.
Well, I have the impression it is necessary, because you still haven't understood that such an "in-sample test" does not help you at all to determine the quality of the metric itself. Again, what is necessary to create a useful metric is understanding how those numbers are generated in the first place, and that we have a sampling bias.

And how do you know "how credible" the metric is? Maybe it can even predict the outcome of future games better than yours? What would you then do?
Mike G wrote: A 3rd possibility is one you did not posit: That your metric is valid enough that it needs no team adjustment afterward.
A simple example: You rank players by points per (team) game. Do your player ppg sum to team PPG? It is necessary that they do, or you have made an error. It's still not sufficient to explain anything else, like point differential.
And? How well does that correlate with wins? So, your "3rd possibility" contains an example which does not do well in "your test", because it does not correlate highly with wins. Thus, it makes no sense to bring that up, because it is not part of the point I made. In fact, my first point was that a comparison between a metric which sums up to team-level numbers per se and one which does not is completely pointless.

Try to create a metric which shows such a high degree of correlation while NOT using team data on the basis of ORtg/DRtg or PPG/opponent PPG (or relevant box score entries on the team level which describe either thing pretty well). If you have that, you will likely have found something which in fact describes player quality and will show pretty good predictive value.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am
Contact:

Re: The popularization of BPM

Post by mystic »

Mike G wrote: If for whatever reason, say I had to hand-enter all data, I only had the 2001 season and the 2014 season upon which to create a RPM file -- and upon that I made some kind of BPM -- that all 12 seasons in between would still be "in sample"?
No, it would not. You have 2001 and 2014 data, and can then test how well such a metric predicts the 2002 to 2013 season results. You cannot use 2013 or prior seasons for an "out-of-sample test" of the 2014 season, because 2014 data is part of the "creation process" and would therefore be considered "in-sample". You could also not use 2000 or prior seasons to predict the results of the 2001 season, because that again would be an in-sample test.

BPM is created using 2001 to 2014 data; therefore the seasons available for an out-of-sample test are 2000 and before, as well as 2015 and after. Example of in-sample: using 2000 BPM numbers to predict the results of the 2001 season in a retrodiction. Out-of-sample: using 1999 BPM numbers to predict 2000 results.
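The rule mystic describes reduces to a one-line test: a retrodiction target season is out-of-sample only if it lies outside the span of seasons used to build the metric. A sketch using the BPM dates from the post:

```python
# mystic's in-sample/out-of-sample rule, stated as code. The training
# window (2001-2014 for BPM) comes from the post; the function name is
# made up for illustration.
def is_out_of_sample(target_season, train_start=2001, train_end=2014):
    """True iff predicting target_season would be an out-of-sample test,
    i.e. the target season was not used in fitting the metric."""
    return not (train_start <= target_season <= train_end)
```

So predicting 2001 from 2000 BPM numbers is in-sample (2001 is inside the training window), while predicting 2000 from 1999 numbers is out-of-sample.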
Statman
Posts: 548
Joined: Fri Apr 15, 2011 5:29 pm
Location: Arlington, Texas
Contact:

Re: The popularization of BPM

Post by Statman »

mystic wrote: Try to create a metric which shows such a high degree of correlation while NOT using team data on the basis of ORtg/DRtg or PPG/opponentsPPG (or relevant boxscore-entries on the team level which are describing either thing pretty well). If you have that, you will likely have found something which in fact is describing a player quality and will show a pretty good predictive value.
So, create a metric that ignores the things we don't explicitly see in the box score - but which maybe manifest themselves outside the box score stats (i.e., actual team results, pace, defense, etc.)?

I don't see why one would want to do that. Team results HAVE to be part of such a metric as far as I'm concerned. We are trying to create a BETTER way to evaluate players based on box score stats - ignoring actual team results seems incredibly counterintuitive.

Would you suggest we create a metric that completely ignores pace also? If not - how is that different than ignoring actual team results?

I believe that, eventually, the BEST box score metric will not be able to ignore team data. The box score metric that proves to correlate much better than all others - in season, out of season, whenever - will incorporate ALL available box score data, and that obviously includes game results. That's my prediction; I just don't see any way around it.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am
Contact:

Re: The popularization of BPM

Post by mystic »

Statman wrote: So, create a metric that ignores the things we don't explicitly see in the box score - but maybe manifest themselves outside the box score stats (ie, actual team results, pace, defense, etc)
Pace does not show a high correlation to team results; pace adjustment is normalization, from my perspective. If you use inputs which show a high correlation to the team performance level (either separately or in conjunction), such a metric's "in-season" correlation to wins/team performance will naturally be higher than that of a metric which does not use them.
Statman wrote: I don't see why one would want to do that.
Because we want to know how good the players are. That team-level performance correlates well with wins, is something we already know.
Statman wrote:We are trying to create a BETTER way to evaluate players based on box score stats - ignoring actual team results seems incredibly counter intuitive.
The issue is that you can attribute the "outside of the individual box score" stuff in any arbitrary way you want, and with the way you tested it, you can't say how well the metric actually describes the individual players. I just showed that with the "metric based on jersey numbers". Starting with the team performance and then distributing that value among the team's players can be done in every perceivable way.
Statman wrote: Would you suggest we create a metric that completely ignores pace also? If not - how is that different than ignoring actual team results?
Pace does not correlate well with the team performance.