The debut and popularization of BPM

Statman · Post by **Statman** » Mon Dec 15, 2014 2:42 pm

AcrossTheCourt wrote: If you include SportVU stats like the rim protection ones, distance while defending, etc. blocks are a lot less valuable, drop out completely, or actually end up with a negative correlation. Blocks are not a good measure of defense, especially now with all the information we have.

So, what if we DON'T have ALL the information you mentioned? With box-score metrics, we don't have that info. If we want to do historical player ratings, we don't have that info. Blocks correlate with defense when we have limited data - that's just how it is. It is rare for a good/great defensive player by RAPM, etc. (especially a big man) not to have a better than average block rate. Now, maybe, if we had significantly more data (à la SportVU), the importance of the block might almost disappear in favor of all the other info - but that's not what we have here.

I understand why rebound rate (& block rate? - man, big block/rebound bigs with lowish assists get destroyed) was tied to assist rate - it helps the correlation. But I'm guessing there's positional bias here - there are a lot more forwards & guards than true big men. The correlation holds (tying assists to the more big-man stats) in a general sense when tested on all players, since most guys tested aren't big men & don't have LOW assist rates (otherwise they don't play - wings/guards who never pass don't usually last long). BUT the much smaller % of players who have great rebound & block rates & low assists now have their value dropped to almost nothing, because BPM sees general correlations that are dominated by a different type of player, one who doesn't fall prey to such low assist rates due to position.

Was the magnitude of any of the rebound, block & assist rates taken into account when tying them together? I mean, as it is, you have a guy that gets 16 rpg & 4 bpg with 0.5 apg having MUCH lower value in BPM than a guy with 5 rpg & 1 bpg with 4 apg, all other things equal. It's crazy to me that a guy who is very much elite in MULTIPLE important facets of the game statistically gets knocked down to mediocrity (literally, guys like Moses Malone, Mourning, Mutombo, etc aren't really much better than an average run-of-the-mill NBAer in BPM) JUST because one stat is well below league average - a stat that generally is dominated by a COMPLETELY different type of player.

Same on the opposite end, guys like Nash & Stockton now look like nice players in BPM, but not elite players by any means. Their high magnitude of assist rate is getting destroyed when tied to their mediocre rebound (& bk) rates.

I know, the correlation works. BUT these two types, big rebound/bk w/ low assist AND huge assist w/ mediocre st/bk, tend to be spread out over the league (& don't tend to play together in lineups). Combine that with the many guys who are ok at all three stats & are probably generally overrated, because being ok at everything is better in BPM than being ELITE at anything if you are mediocre at something else (particularly assists), and the two factors offset each other & the correlation still stands. But GREATLY undervaluing a certain group of players that EVERYONE knows are great, while very slightly overvaluing a large portion of the more average players because the correlation holds (it balances out), is what we have here - in my eyes.

A comparison, probably a bad one, could be made to baseball. We all know that batting average by itself is pretty meh. OBP & slugging are better. Suppose we ignored OBP & slugging, and didn't even have hit-by-pitch, doubles, or triples data. It is probably VERY possible to create a metric at the player level that rates the player by using a combination of BA*HRrate & BA*BBrate. When compiled at the team level, it probably would correlate well every year, much better than BA or HR or BB alone. BUT now a guy hits .400 with 0 home runs - he's worthless by this metric. A guy hits .220 with 160 walks & 50 home runs - yeah, he's not that good either. There has to be a better way to do it - no matter if the correlation looks better than BA alone.
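A rough sketch of that toy baseball metric, assuming the two product terms are used as-is with no fitted weights (the thread gives none). The 650 plate appearances and the 40 walks for the contact hitter are made-up inputs for illustration only.

```python
def toy_terms(ba, hr_rate, bb_rate):
    """Return the two product terms of the hypothetical metric: (BA*HR rate, BA*BB rate)."""
    return ba * hr_rate, ba * bb_rate


# Rates are per plate appearance; 650 PA is an assumed full season.
contact_hitter = toy_terms(ba=0.400, hr_rate=0 / 650, bb_rate=40 / 650)
slugger = toy_terms(ba=0.220, hr_rate=50 / 650, bb_rate=160 / 650)

print("contact hitter (power term, walk term):", contact_hitter)
print("slugger        (power term, walk term):", slugger)
```

The power term for the .400 hitter is exactly zero no matter how high the average goes, and every term for the slugger is scaled down by the .220 average. How the two players end up ranked depends on weights the sketch does not try to guess.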

I guess what I'm saying is that I don't believe this couldn't be improved - probably by adjusting for magnitudes somehow. Don't make getting twice the league-average rebounds the same as getting half the league-average assists. Or 1/5th the blocks the same as 5 times the assists (for guys like Nash, Stockton). Figure out how to properly adjust the disparities created by the huge differences in magnitudes (elite at one stat, below average in another) AND still keep your good correlation (yes, I fully believe it could be done).
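The magnitude point can be shown with a small sketch, assuming the interaction is a plain geometric mean of two rates, each written as a multiple of league average. That form is an assumption based on the (Ast%*Reb%)^(1/2) term mentioned later in the thread; the function name and inputs are hypothetical, not BPM's actual code or coefficients.

```python
import math


def product_term(ast_ratio, reb_ratio):
    """Geometric-mean interaction, each stat given as a multiple of league average."""
    return math.sqrt(ast_ratio * reb_ratio)


average_player = product_term(1.0, 1.0)  # league average at both
specialist = product_term(0.5, 2.0)      # half the assists, twice the rebounds
extreme = product_term(0.2, 5.0)         # one fifth of one stat, five times the other

print(average_player, specialist, extreme)
```

All three come out the same, so a pure product term cannot tell an elite specialist from a player who is average at both. Once one stat drops faster than the other rises (say 0.1 and 2.0), the term falls below the average player's value.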

Finally, pardon my ignorance, but was every possible statistical combination tested to find the best correlation from one year to the next, like Reb, Bk, & Ast? 87000000000 or so possibilities there. How do we know that, maybe, 3pt rate & rebound rate don't need to be tied? Or personal foul rate & steals?
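To give a feel for how fast the search space grows, here is a small counting sketch. The number of box-score stats (12) and the number of candidate exponents per stat (5) are assumptions picked for illustration; neither comes from the thread or from BPM.

```python
from math import comb

n_stats = 12      # assumed number of box-score rate stats
n_exponents = 5   # assumed number of candidate powers per stat in a term

pairs = comb(n_stats, 2)       # two-stat interaction terms
triples = comb(n_stats, 3)     # three-stat interaction terms
# Each stat in a triple can take any of the candidate exponents.
triples_with_exponents = triples * n_exponents ** 3

print(pairs, triples, triples_with_exponents)
```

This counts single terms only. Choosing which of those terms to include together in one regression multiplies the possibilities again, which is where numbers in the billions come from.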

Mike G · Post by **Mike G** » Mon Dec 15, 2014 3:59 pm

Statman, you should download some RAPM files and fiddle with other boxscore stats to find weights and exponents that yield minimal difference. You might come across some combinations that work better. Make your own boxscore plus-minus.

Somewhat along the same lines as the arguments you raise, there are successful teams on which there are multiple assist men who also score and/or rebound. The '90s Bulls were like this. The current Spurs, also.
Then there are excellent teams with "designated" passers and rebounders and shotblockers. The '90s Jazz might be such, or the current Clippers.

If the Raw BPM indicates that a team's players are vastly under- or over-rated because of these differences -- their BPM do not add up to team MOV -- and if the team correction just moves everyone on the team up or down equally; then we still get the weird outliers. In some team contexts, specialization might be a strength.

I don't really know how to comprehensively introduce every combination of stats into a regression. Maybe one term should be ((Ast%)^2*Reb%)^(1/3), for example; and (Ast%*Reb%)^(1/2) should be dropped?
The possibilities are limitless.
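A minimal sketch of the two candidate terms above, written as one weighted geometric mean so the exponents can be varied. The function name and the sample rates are hypothetical and are only meant to show how the choice of exponents shifts credit between player types.

```python
def weighted_geo_mean(ast_pct, reb_pct, ast_weight, reb_weight):
    """(Ast%^a * Reb%^b)^(1/(a+b)); a=b=1 gives (Ast%*Reb%)^(1/2), a=2, b=1 gives ((Ast%)^2*Reb%)^(1/3)."""
    total = ast_weight + reb_weight
    return (ast_pct ** ast_weight * reb_pct ** reb_weight) ** (1.0 / total)


# Made-up rates in percent.
players = {"low-assist big": (2.0, 22.0), "pass-first guard": (30.0, 5.0)}

for name, (ast, reb) in players.items():
    equal = weighted_geo_mean(ast, reb, 1, 1)
    ast_heavy = weighted_geo_mean(ast, reb, 2, 1)
    print(f"{name}: equal={equal:.2f} ast_heavy={ast_heavy:.2f}")
```

Weighting assists more heavily pushes the low-assist big man's term down further and the guard's term up, so which exponents fit best is an empirical question that has to be tested against RAPM or team results.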

Statman · Post by **Statman** » Mon Dec 15, 2014 5:30 pm

Mike G wrote:Statman, you should download some RAPM files and fiddle with other boxscore stats to find weights and exponents that yield minimal difference. You might come across some combinations that work better. Make your own boxscore plus-minus.

I don't really know how to comprehensively introduce every combination of stats into a regression. Maybe one term should be ((Ast%)^2*Reb%)^(1/3), for example; and (Ast%*Reb%)^(1/2) should be dropped?
The possibilities are limitless.

First, my metric is in essence my best work at a box-score plus-minus. There are aspects of my approach that allow the results (in a good sense imo) to tend toward RAPM & farther away from PER, WS, etc. In almost every single case, ESPECIALLY for those that are great defenders but not much in terms of producing (Bruce Bowen, Battier, Tony Allen, etc), my metric will have a generally understood great defensive player ranked higher than any other box-score metric out there besides BPM - where it then varies by the "type" of player. My metric will value low-assist great defenders - Mutombo, Mourning, Howard, etc - often MUCH more than BPM (& closer to fan/gm perception from my understanding), while BPM will value good-assist guys who are huge stl/bk/&or reb outliers much more (Kirilenko, Bo Outlaw), often much higher than any fan/gm perception would be (Bo Outlaw, a top 50 NBA player from the last 35 years?).

My metric USED to value the Kirilenkos & Bo Outlaws MUCH more than PER/WS/WP/etc ever would - now BPM comes along and blows my WAR outta the water in terms of valuing the Kirilenkos & Bo Outlaws. BPM does value the Bowens, Battiers, & Allens of the world better than the other box-score metrics (which is an obviously good thing imo), but not necessarily better than mine - depending pretty much on assist rate.

2nd part - yes, completely agree - the possibilities are limitless. This is partly why combining stats as factors in such a way bothers me - you can't test EVERY possibility. It leads to dangerous assumptions (like low-assist big men, no matter how elite they are everywhere else, being almost worthless) imo, because the overall general correlations maybe look a little better on the few stat combinations that were tested. We don't have a clue whether these are the best possible correlations - since we can't possibly test EVERY stat combo - & also not every box-score metric has been tested for comparison. This seems a wee bit similar to the problems Dave Berri ran into - his method of "testing" told him that his player metrics were far & above better than everyone else's.

Anyway, don't get me wrong - I'm sounding a little extreme in my BPM concerns. I do think it is in general on the right path to evaluating player value compared to the "known" box-score metrics - it tends to find the guys that have been underrated by production metrics. However, there are a couple of steps in the approach that I think very well may be killing certain types of "great" players (elite in multiple facets of the game), greatly undervaluing them just because of mediocre assist rates. It just seems like BPM is on the right path - but it's not done, it's a work in progress. I think it'll still be improved - I can't imagine these results being the be-all and end-all of the very best correlations that could possibly be had at the player level.

Also, were 80's & 90s correlations done? Compared to my WAR/48, it seems like a lot of faster pace era guys ranked higher than expected. Were correlations only done "in sample" in the 2000s? Is there a pace issue here if not? I don't know at all, just wondering. I apologize if this was answered many pages ago & I completely missed it.

Statman · Post by **Statman** » Mon Dec 15, 2014 5:51 pm

Mike G wrote:Statman, you should download some RAPM files and fiddle with other boxscore stats to find weights and exponents that yield minimal difference. You might come across some combinations that work better.

Also Mike, I'm not a RAPM guy at all. I do think it helps "find" certain players - but I really can't be on board with a methodology that tries to come up with current values of players in varying ways (rookies are approached differently, player height involved, etc), all the while using prior seasons because larger amounts of data are really needed for less "noise". I would in no way want to create a box-score metric that completely mimics RAPM - let it do what it does, I do what I do. I do want a metric that pretty much rates the "undervalued" RAPM guys better than other box-score metrics, all the while coming up with results that still pass the eye test - and with historical results that compile well (no bias for era). The fact that my WAR/48 correlated pretty closely to JE's initial RPM this season helps me feel pretty comfortable about my box-score approach.

permaximum · Post by **permaximum** » Mon Dec 15, 2014 6:01 pm

@Limitless possibilities thing

That's what I thought when I first read the calculation of BPM. Ast*reb may increase R^2 but that doesn't mean it helps out-of-sample prediction accuracy. It's very very very obvious that BPM undervalues blocks and assists. Perhaps I should come up with a unique metric using 14-year RAPM data. Still, I don't want to spend time calculating playoff SRS for a metric that only has the potential to become a bit better than BPM.
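The in-sample versus out-of-sample distinction can be checked with a simple holdout. The sketch below fits a one-predictor least-squares line on one set of observations and scores it on another. The numbers are made up and stand in for (metric value, target) pairs; this is not the actual BPM regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    x_mu = sum(xs) / len(xs)
    y_mu = sum(ys) / len(ys)
    cov = sum((x - x_mu) * (y - y_mu) for x, y in zip(xs, ys))
    var = sum((x - x_mu) ** 2 for x in xs)
    slope = cov / var
    return slope, y_mu - slope * x_mu


def r_squared(ys, preds):
    """1 - SS_res / SS_tot, measured against the mean of the ys being scored."""
    y_mu = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_mu) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up fitting sample and held-out sample.
train_x, train_y = [1, 2, 3, 4], [2, 4, 6, 8]
test_x, test_y = [5, 6], [10, 13]

slope, intercept = fit_line(train_x, train_y)
in_sample = r_squared(train_y, [slope * x + intercept for x in train_x])
out_of_sample = r_squared(test_y, [slope * x + intercept for x in test_x])

print(f"in-sample R^2: {in_sample:.3f}, out-of-sample R^2: {out_of_sample:.3f}")
```

A term that raises the first number without raising the second is fitting noise in the sample it was tuned on.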

Anyways, what happened to the retrodiction test? Let's check WAR, eWins etc. and see what are the actual results.

Statman · Post by **Statman** » Mon Dec 15, 2014 6:43 pm

permaximum wrote:@Limitless possibilities thing

That's what I thought when I first read the calculation of BPM. Ast*reb may increase R^2 but that doesn't mean it helps out-of-sample prediction accuracy. It's very very very obvious that BPM undervalues blocks and assists. Perhaps I should come up with a unique metric using 14-year RAPM data. Still, I don't want to spend time calculating playoff SRS for a metric that only has the potential to become a bit better than BPM.

Anyways, what happened to the retrodiction test? Let's check WAR, eWins etc. and see what are the actual results.

Haven't heard from Neil, so don't know. I sent him my HnI ratings, assuming they'd have the best predictive power, BUT I realized maybe my WAR/48 (ties HnI to team wins, also with every player assumed at at least "replacement level") might have as good or better correlation. I asked if I could send him the WAR/48 results also - he never responded, so I never sent the WAR/48. Didn't want to hassle him about it.

I'm also very curious how the 80s correlations look with all the metrics compared to now, since in the 80s the game was played much differently. I always wondered how well many other metrics adjust to a different game.

I could do testing, but I'd have a different approach that probably would draw the ire of some. I'd probably do something like a weighted blend of the current season with the 2 previous & 2 next seasons. We all know all the metrics outside of PER correlate well in season, so I'd combine current with previous and next: weight current by current minutes*4, y-1 & y+1 by their minutes*2, and y-2 & y+2 by their minutes*1. Ignoring minutes played, current would then have 40% of the weight (or more if the player was a rookie in any of those years), y-1 & y+1 40% total, y-2 & y+2 20% total. This way I would never create a dummy value for anybody (rookies, guys that missed a season due to injury, etc) - EVERY metric would be tested the same. I could run correlations to team W% for every season for every metric from 1980 to now.
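The weighting scheme described above can be sketched as follows. The 4/2/1 multipliers are from the post; the function name, the data layout, and the sample ratings and minutes are assumptions for illustration.

```python
# Multipliers from the post: current season *4, y-1 and y+1 *2, y-2 and y+2 *1.
BASE_WEIGHT = {0: 4, -1: 2, 1: 2, -2: 1, 2: 1}


def blended_rating(seasons):
    """seasons maps a year offset (-2..2) to (rating, minutes).

    Seasons a player did not play are simply left out, so no dummy value is needed.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for offset, (rating, minutes) in seasons.items():
        weight = BASE_WEIGHT[offset] * minutes
        weighted_sum += weight * rating
        total_weight += weight
    return weighted_sum / total_weight


# Made-up veteran with equal minutes in all five seasons.
veteran = {-2: (1.0, 2000), -1: (2.0, 2000), 0: (3.0, 2000), 1: (2.0, 2000), 2: (1.0, 2000)}
# Made-up rookie: no y-1 or y-2, so the current season carries 4/7 of the weight.
rookie = {0: (3.0, 2000), 1: (2.0, 2000), 2: (1.0, 2000)}

print(blended_rating(veteran))
print(blended_rating(rookie))
```

With equal minutes the veteran's current season gets 4 of 10 weight units, matching the 40/40/20 split, while the rookie's current season gets 4 of 7.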

But, just like the arguments about the metrics, there would be arguments about my results - especially if my metric came out on top. So it's lose-lose for me testing: if my metric doesn't come out on top, it'd prove to some that my metric isn't as good as BPM (or others); if my metric comes out on top, my testing is completely flawed.

Screw it, I may do it anyway - have all the y-2, y-1, y+1, y+2 in a MASSIVE spreadsheet (for every year from '80 to '14) with all the metrics I can get together easily (bball-refer ones with my HnI & WAR/48), so other analysts could more easily test with their own methodology if they hate my approach, since the data would be there. If I do this, I'll try to set up the spreadsheet so it'd be easy for others to grab the different ratings at the y-2, y-1, y+1, y+2 level.

willguo · Post by **willguo** » Mon Dec 15, 2014 7:42 pm

Statman wrote: First, my metric is in essence my best work at a box-score plus-minus. There are aspects of my approach that allow the results (in a good sense imo) to tend toward RAPM & farther away from PER, WS, etc. In almost every single case, ESPECIALLY for those that are great defenders but not much in terms of producing (Bruce Bowen, Battier, Tony Allen, etc), my metric will have a generally understood great defensive player ranked higher than any other box-score metric out there besides BPM - where it then varies by the "type" of player. My metric will value low-assist great defenders - Mutombo, Mourning, Howard, etc - often MUCH more than BPM (& closer to fan/gm perception from my understanding), while BPM will value good-assist guys who are huge stl/bk/&or reb outliers much more (Kirilenko, Bo Outlaw), often much higher than any fan/gm perception would be (Bo Outlaw, a top 50 NBA player from the last 35 years?).

In baseball, catchers who can't hit almost invariably have great defensive reputations. Is something similar happening here - if this guy is playing a lot of minutes for a good defensive team, most likely he is good on defense, especially if he's not good on offense?

Statman · Post by **Statman** » Mon Dec 15, 2014 7:56 pm

willguo wrote:
Statman wrote: First, my metric is in essence my best work at a box-score plus-minus. There are aspects of my approach that allow the results (in a good sense imo) to tend toward RAPM & farther away from PER, WS, etc. In almost every single case, ESPECIALLY for those that are great defenders but not much in terms of producing (Bruce Bowen, Battier, Tony Allen, etc), my metric will have a generally understood great defensive player ranked higher than any other box-score metric out there besides BPM - where it then varies by the "type" of player. My metric will value low-assist great defenders - Mutombo, Mourning, Howard, etc - often MUCH more than BPM (& closer to fan/gm perception from my understanding), while BPM will value good-assist guys who are huge stl/bk/&or reb outliers much more (Kirilenko, Bo Outlaw), often much higher than any fan/gm perception would be (Bo Outlaw, a top 50 NBA player from the last 35 years?).
In baseball, catchers who can't hit almost invariably have great defensive reputations. Is something similar happening here - if this guy is playing a lot of minutes for a good defensive team, most likely he is good on defense, especially if he's not good on offense?

Yep, very much so. My metric makes assumptions that if a guy has miserable production but plays a ton - especially on a good team - his rating goes up significantly (probably a great defender). A guy with great per minute production but plays little - especially on a bad team - his rating goes down significantly (probably a bad defender). All players on their teams are adjusted accordingly, so player ratings still compile correctly at the team level. BPM has minutes played as part of the regression I believe - but I'm not sure the value is tied inversely to production. THAT would make sense to me in terms of solid correlations - if it hasn't been tested, it should be.
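The exact formula behind this adjustment is not given in the thread, so the following is only a hypothetical sketch of the idea: bump players whose minutes outrun their box-score production, dock players whose production outruns their minutes, then recenter so the minutes-weighted team total is unchanged. The z-score form, the `strength` parameter, and the sample team are all assumptions, and the sketch ignores the team-quality part of the adjustment.

```python
from statistics import mean, pstdev


def adjust_for_minutes(players, strength=1.0):
    """players: list of (name, per_minute_rating, minutes). Returns {name: adjusted rating}.

    Assumes ratings and minutes both vary across the team (non-zero spread).
    """
    ratings = [r for _, r, _ in players]
    minutes = [m for _, _, m in players]
    r_mu, r_sd = mean(ratings), pstdev(ratings)
    m_mu, m_sd = mean(minutes), pstdev(minutes)
    # Raw bump: positive when a player plays more than his production alone suggests.
    bumps = [
        strength * ((m - m_mu) / m_sd - (r - r_mu) / r_sd)
        for r, m in zip(ratings, minutes)
    ]
    # Recenter so the minutes-weighted sum of ratings stays the same for the team.
    shift = sum(b * m for b, m in zip(bumps, minutes)) / sum(minutes)
    return {name: r + b - shift for (name, r, _), b in zip(players, bumps)}


# Made-up team: a low-production starter, a star, and two reserves.
team = [("stopper", -2.0, 2800), ("star", 5.0, 2900),
        ("bench scorer", 3.0, 600), ("reserve", -1.0, 900)]
adjusted = adjust_for_minutes(team)
for name, value in adjusted.items():
    print(f"{name}: {value:+.2f}")
```

In this made-up team the heavy-minutes, low-production "stopper" moves up and the lightly used "bench scorer" moves down, while the team total stays put.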

This adjustment helps the correlation of my HnI & WAR to RAPMish metrics for sure, without the crazy outliers (like Nick Collison being a top 10 player, or a rookie who looks incredible being well below average) that drive the general fan crazy.

willguo · Post by **willguo** » Mon Dec 15, 2014 8:26 pm

Makes sense, and I do something similar myself, but not a big fan of that as a scientific approach. It's going to be a chicken-and-egg problem - Coach is great because he plays the right players, so if Coach doesn't play him he's probably bad at defense. But if Coach doesn't play 3 of these guys in one year, then these guys probably aren't that bad at defense - Coach doesn't know what he's doing.

It's just so hard to apportion value to each player within a team, and counterpart data isn't quite the panacea either - I have a worse closeout % because I'm so athletic I am the one chosen to double the big man. I have a worse blow by % because I'm always guarding the #1 option. I have a worse stay in front % because this guy can't finish with his left, so I let him drive that way rather than go right. Etc etc etc. Defense is hard.

Statman · Post by **Statman** » Mon Dec 15, 2014 8:52 pm

willguo wrote:Makes sense, and I do something similar myself, but not a big fan of that as a scientific approach. It's going to be a chicken-and-egg problem - Coach is great because he plays the right players, so if Coach doesn't play him he's probably bad at defense. But if Coach doesn't play 3 of these guys in one year, then these guys probably aren't that bad at defense - Coach doesn't know what he's doing.

It's just so hard to apportion value to each player within a team, and counterpart data isn't quite the panacea either - I have a worse closeout % because I'm so athletic I am the one chosen to double the big man. I have a worse blow by % because I'm always guarding the #1 option. I have a worse stay in front % because this guy can't finish with his left, so I let him drive that way rather than go right. Etc etc etc. Defense is hard.

Well, since I do tie playing time & production together - it's really only the outliers (bad production huge minutes or good production low minutes) that seem to have their HnI (& WAR/48) change that much. Most of the guys in the NBA see playing time that correlates at least somewhat to production, so their final rating adjustments are small compared to the outliers.

The adjustment to the outliers pretty much always matches defensive capability perceptions. The only thing that seems to mess up this approach just a little is the rookie who is horrible (Morrison, Rivers, LaVine) but gets a ton of playing time because of being a high draft pick. Good thing is, when this happens it's almost always on a BAD team, so the adjustment doesn't trend upward nearly as much as if they played on a good team.

willguo · Post by **willguo** » Tue Dec 16, 2014 6:12 am

Statman wrote: Well, since I do tie playing time & production together - it's really only the outliers (bad production huge minutes or good production low minutes) that seem to have their HnI (& WAR/48) change that much. Most of the guys in the NBA see playing time that correlates at least somewhat to production, so their final rating adjustments are small compared to the outliers.

Oops, I missed a whole thought - I sort of have a fudge factor for MPG and defense that's dependent on team defense and the coach. I.e., if Popovich plays a bad offensive player for a lot of minutes, I assume he's good at D, and am fairly confident. If Kurt Rambis does it, I have no idea how good that guy is on D from this factor, and am forced to basically only consider other factors. This leads to the chicken-and-egg issue that I was talking about.

bchaikin · Post by **bchaikin** » Tue Dec 16, 2014 6:28 am

Javale McGee and Serge Ibaka had extremely high block rates, and don't rate that well in DRAPM.

are you saying serge ibaka is not a very good defender because DRAPM says so, or are you saying he is a very good defender but DRAPM is just not picking it up, and thus is not a good measure of a player's defense? here's why i ask...

as an RAPM it supposedly uses stats from previous seasons in its calculation, correct? ibaka had 215+ blocks in each of the past 3 seasons. in the history of the nba only 17 players have had 3 or more seasons with 215+ blocks:

kareem abdul-jabbar
manute bol
shawn bradley
marcus camby
mark eaton
patrick ewing
serge ibaka
george t. johnson
alonzo mourning
dikembe mutombo
shaquille o'neal
hakeem olajuwon
theo ratliff
david robinson
tree rollins
elmore smith
ben wallace

and ibaka is only 1 of 3 players to do this 3 times prior to the age of 25 (age at the start of the season, shawn bradley and tree rollins being the other two)...

if ibaka is not a very good defender, just out of curiosity which of the above players does RAPM also say are not very good defensive players?...

from the ages of 20-24 ibaka has a shot blocking rate of 5.6% BS (5.6 blocks per 100 opposing team FGAs), same as greg ostertag, close to marcus camby (5.5% BS), alonzo mourning (5.2% BS), theo ratliff (5.8% BS), and david robinson (5.9% BS). does DRAPM also find any of these players to be not very good defenders?...

somewhere between 3/5 and 3/4 of all blocks are rebounded by the defense, and thus those are defensive stops (the block + def reb). so players who block a lot of shots are forcing a high number of defensive stops, more so than players that block few shots...
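The two quantities used above can be written out as a small sketch. The block-rate definition (blocks per 100 opposing FGAs) and the 3/5 to 3/4 rebound range are from the post; the function names are hypothetical, and the 1000 opposing FGAs is a made-up input chosen to reproduce a 5.6% rate.

```python
def block_pct(blocks, opp_fga):
    """Blocks per 100 opposing field goal attempts."""
    return 100.0 * blocks / opp_fga


def stops_from_blocks(blocks, def_reb_share):
    """Blocks that end the possession: the block followed by a defensive rebound."""
    return blocks * def_reb_share


# Made-up example: 56 blocks against 1000 opposing attempts.
print(block_pct(56, 1000))

# A 215-block season under the low and high rebound shares quoted in the post.
low = stops_from_blocks(215, 3 / 5)
high = stops_from_blocks(215, 3 / 4)
print(low, high)
```

On those shares a 215-block season is worth roughly 129 to 161 defensive stops from blocks alone.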

for a player who blocks a lot of shots to not be a good overall defender, he would have to be responsible for very few forced misses other than blocks that are followed by def rebs. can you show this to be the case?...

AcrossTheCourt wrote: If you include SportVU stats like the rim protection ones, distance while defending, etc. blocks are a lot less valuable, drop out completely, or actually end up with a negative correlation. Blocks are not a good measure of defense, especially now with all the information we have.

a block takes a shot that has a say 40%-60% chance of going in to a 0% chance of going in. anytime you stop a shot from going in that's good defense. how does that "...drop out completely..." or end up "...with a negative correlation..."? how can a blocked shot negatively correlate with defense when by itself it is excellent defense?...

AcrossTheCourt · Post by **AcrossTheCourt** » Tue Dec 16, 2014 6:45 am

Because missed shots are more likely to be rebounded by the defense than blocks, and forcing a miss is already being picked up by the SportVU stat -- so you'd be double-counting it with blocks. I could run through these numbers more specifically when we have more data here to find the relative value difference of blocking a shot inside versus just "defending" a miss to see what's really going on. At least with a blocked shot you know it's a miss, but then there are all those guys who leave their feet too much for a block and end up playing bad defense anyway. And it's not entirely negative/dropping out since I have a couple interaction terms with blocks, but it's a lot less valuable with shot defense data than people think.
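Both sides of this exchange can be put in one small sketch: the chance a shot ends with the defense holding the ball is the chance it misses times the chance the defense rebounds it. The 0.65 rebound share after a block sits inside the 3/5 to 3/4 range quoted earlier in the thread; the 0.75 share after an ordinary miss and the 50% make rate are assumptions for illustration, not measured values.

```python
def stop_probability(miss_prob, def_reb_prob):
    """Chance the shot misses AND the defense secures the rebound."""
    return miss_prob * def_reb_prob


blocked = stop_probability(miss_prob=1.0, def_reb_prob=0.65)      # a block is a certain miss
forced_miss = stop_probability(miss_prob=1.0, def_reb_prob=0.75)  # unblocked shot that missed
contested = stop_probability(miss_prob=0.5, def_reb_prob=0.75)    # contested shot, outcome unknown

print(blocked, forced_miss, contested)
```

Under these assumed inputs a block is worth more than merely contesting a shot, but slightly less than a miss that is already counted in the shot-defense data, which is the sense in which adding blocks on top can double-count or even come out negative.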

For the longest time, the only thing we had was blocked shot (and steal) data, so it was really difficult to parse the good defenders. I think the thing is that blocks correlated with good rim protectors, but not perfectly so because some guys blocked too many jump shots or left their feet too often. So we've always been attached to the mythical idea of the blocked shot as the ultimate form of defense. But now that we have better information we don't have to rely on it so much.

Statman · Post by **Statman** » Tue Dec 16, 2014 4:54 pm

willguo wrote: Oops, I missed a whole thought - I sort of have a fudge factor for MPG and defense that's dependent on team defense and the coach. I.e., if Popovich plays a bad offensive player for a lot of minutes, I assume he's good at D, and am fairly confident. If Kurt Rambis does it, I have no idea how good that guy is on D from this factor, and am forced to basically only consider other factors. This leads to the chicken-and-egg issue that I was talking about.

True, but Pop's player will have his rating come up significantly more because of the quality of the teammates he's playing in front of & the quality of the team in general. Rambis' player won't see nearly the bump in rating, since his team & teammates won't be as good.

The biggest positive swings in rating when factoring in playing time relative to player production relative to team production (without the player btw) tend to be players from good or great teams - with the occasional rookie whose production is horrible but plays a lot on a bad team & sees his rating go from well below replacement level to around replacement level.

The biggest negative swings almost always are big per minute production guys who somehow still don't get on the court for bad teams.

DSMok1 · Post by **DSMok1** » Tue Dec 16, 2014 5:09 pm

Statman wrote:Yep, very much so. My metric makes assumptions that if a guy has miserable production but plays a ton - especially on a good team - his rating goes up significantly (probably a great defender). A guy with great per minute production but plays little - especially on a bad team - his rating goes down significantly (probably a bad defender). All players on their teams are adjusted accordingly, so player ratings still compile correctly at the team level. BPM has minutes played as part of the regression I believe - but I'm not sure the value is tied inversely to production. THAT would make sense to me in terms of solid correlations - if it hasn't been tested, it should be.

That's a clever approach. Sort of a Bayesian prior methodology, using minutes and "box score production" to inform a defensive prior estimate. Good idea.
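A minimal sketch of what a Bayesian-prior reading of this could look like, assuming normal distributions for both the prior and the observed estimate. The function name and all numbers are illustrative assumptions; nothing here is taken from Statman's metric or from BPM.

```python
def posterior_mean(prior_mean, prior_var, observed, obs_var):
    """Normal-normal update: precision-weighted average of the prior and the observation."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    total = prior_precision + obs_precision
    return (prior_precision * prior_mean + obs_precision * observed) / total


# Made-up example: heavy minutes on a good team suggest a defensive value of +1.5
# (the prior), while the noisy box-score defensive estimate says -0.5.
estimate = posterior_mean(prior_mean=1.5, prior_var=1.0, observed=-0.5, obs_var=4.0)
print(estimate)
```

The noisier the box-score estimate is relative to the prior, the more the result leans on the minutes-based prior; with equal variances it lands halfway between the two.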

APBRmetrics
