On RPM and other models..."Pieces, Pieces Everywhere"

JimiHendrix · Post by **JimiHendrix** » Mon Apr 14, 2014 3:16 pm

http://hardwoodparoxysm.com/2014/04/14/ ... verywhere/

I think this is a piece that quite a lot of you will either enjoy (or detest), so I thought I'd pass the link along. I agree with a fair amount of what Ian says, but this chunk of the article in particular sticks out to me:

My issue with RPM, and really all of the various plus/minus models, is that they are increasingly complex methods for stripping away the context of a player’s production, trying to measure it in a vacuum. It’s an admirable pursuit to some degree and these intricately designed techniques have become, in many ways, the basketball analytics arms race. The problem is that I’m just not that interested in the result. The context and the noise, which these models work so hard to control for, are exactly the things I’m interested in. I don’t just want to know which player is better. I want to know why and in what ways. I want to know what that implies about both the player and team, his teammates and opponents, and basketball as a whole. As constructed and presented, I typically find precious little of that information in plus-minus statistics.

This problem is not unique to RPM, or even to the entire family of plus/minus models. Win Shares, Wins Produced, PER, also chase the same goal–generalizing the “why” to highlight the “what.” But the “why” is what I find most interesting, the “why” is the reason I watch and write about basketball.

I'm curious to hear everyone's thoughts, in particular those who have created their own models.

J.E. · Post by **J.E.** » Mon Apr 14, 2014 4:38 pm

It would obviously be a lot 'cooler' if we had perfect BoxScore (or whatever) stats that could perfectly describe/predict player value.

Though I probably wouldn't have brought that point up with a metric that, as far as I know, was the first to include statistics derived from the PlayByPlay, like 'Blocks rebounded by the defense', 'live/dead ball turnovers' and others, on top of the BoxScore statistics.

Crow · Post by **Crow** » Mon Apr 14, 2014 4:59 pm

The "why" gets attempted answers when RAPM is broken down into the Four Factors (for offense and defense) or JE's three (eFG% and FT/FGA combined). That level of detail has noise (but I believe no more than the overall RAPM estimates), but it can be compared with raw data. Looking at both seems potentially more useful than looking at just raw box score data, IMO.
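For readers less familiar with the factor breakdown mentioned here, a minimal sketch of Dean Oliver's Four Factors computed from team box score totals. The field names are hypothetical, FT/FGA is taken as FTM/FGA, and folding eFG% and FT/FGA into points per shot attempt (2·eFG% + FT/FGA) is just one natural way to form a combined shooting factor, not necessarily how J.E. combines them.

```python
from dataclasses import dataclass


@dataclass
class TeamBox:
    """Team box score totals for one side over some span (hypothetical field names)."""
    fgm: float      # field goals made (2s and 3s)
    fga: float      # field goals attempted
    fg3m: float     # three-pointers made
    ftm: float      # free throws made
    fta: float      # free throws attempted
    tov: float      # turnovers
    orb: float      # offensive rebounds
    opp_drb: float  # opponent defensive rebounds


def four_factors(b: TeamBox) -> dict:
    """Four Factors using common textbook formulas."""
    return {
        "eFG%": (b.fgm + 0.5 * b.fg3m) / b.fga,
        "TOV%": b.tov / (b.fga + 0.44 * b.fta + b.tov),
        "ORB%": b.orb / (b.orb + b.opp_drb),
        "FT/FGA": b.ftm / b.fga,
    }


def points_per_shot(b: TeamBox) -> float:
    """eFG% and FT/FGA folded together: 2*eFG% + FT/FGA equals total points per FGA."""
    f = four_factors(b)
    return 2 * f["eFG%"] + f["FT/FGA"]
```

The identity holds because total points are 2·FGM + 3PM + FTM, so dividing by FGA gives exactly 2·eFG% + FTM/FGA.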

J.E. · Post by **J.E.** » Mon Apr 14, 2014 5:23 pm

Crow wrote: The "why" gets attempted answers when RAPM is broken down into the Four Factors (for offense and defense) or JE's three (eFG% and FT/FGA combined). That level of detail has noise (but I believe no more than the overall RAPM estimates), but it can be compared with raw data. Looking at both seems potentially more useful than looking at just raw box score data, IMO.

What would give even more answers is BoxScore/PBP-prior-informed 4(3)-factor RAPM, although that will only help to some degree. I guess it could give us estimates of how much a player with such-and-such assist numbers helps his teammates shoot, etc.
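The "prior-informed" idea can be sketched as a ridge regression that shrinks each player's coefficient toward a prior (for example a box score/PBP estimate) instead of toward zero. This is a generic sketch under that assumption, not J.E.'s actual implementation: `X` is a hypothetical stint-by-player design matrix, `y` a per-stint target (it could be a single factor such as an eFG% differential), `w` per-stint weights such as possessions, and `lam` a penalty strength you would have to tune.

```python
import numpy as np


def ridge_with_prior(X, y, w, prior, lam):
    """Weighted ridge regression that shrinks coefficients toward `prior`.

    Minimizes sum_i w_i * (y_i - X_i @ beta)**2 + lam * ||beta - prior||**2.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    prior = np.asarray(prior, dtype=float)
    Xw = X * w[:, None]                        # W X (each row scaled by its weight)
    A = X.T @ Xw + lam * np.eye(X.shape[1])    # X^T W X + lam I
    b = Xw.T @ (y - X @ prior)                 # X^T W (y - X prior)
    return prior + np.linalg.solve(A, b)
```

With a large `lam` the estimates stay near the prior; with a small one they approach plain weighted least squares, so the prior matters most for players with few possessions.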

The big leap will come from SportVU data though. That would include things like
- successful box-outs
- steal gamble rate
- gamble recovery time
- shot contest rate
- hockey assists
- different types of assists (how many dribbles were taken after the pass / how many seconds was the ball held)
- turnovers per touch
- quality of screens
and many more

Before that happens, somebody needs to develop very good algorithms to extract that data from SportVU, though. Then somebody will need to make the data public. Then we'd probably need 2-3 years of data before coefficients from SPM regressions make sense. Long way to go.

Mike G · Post by **Mike G** » Mon Apr 14, 2014 6:29 pm

... The problem is that I’m just not that interested in the result. The context and the noise, which these models work so hard to control for, are exactly the things I’m interested in...

The writer is over-extending a bit. It's like saying, "I don't care who won, I want to know how they won."

I like to watch 2-4 min. of highlights without knowing who won; but it's hard to do. You kind of have to know who won, to fully appreciate how they won.

So, there's not that much information in the final score; there's more info in what transpired to produce the win (or the loss).

Zach Randolph is better than Marc Gasol at rebounding and creating a shot; Gasol is better at hitting the shot, passing, and defense. Who is the better player? If you grasp the importance of the elements, you can venture a guess at the total.
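The point about grasping "the importance of the elements" can be made concrete with a toy weighted sum: the same component ratings can rank two players either way depending on the weights. The skill names, ratings, and weights below are all hypothetical illustrations, not measurements of Randolph or Gasol.

```python
def total_value(skills: dict, weights: dict) -> float:
    """Combine per-skill ratings into one number as a weighted sum."""
    return sum(weights[k] * skills[k] for k in weights)


# Hypothetical per-skill ratings (points per 100 possessions vs. average).
player_a = {"rebounding": 1.5, "shot_creation": 1.0, "shooting": -0.5, "passing": -0.5, "defense": -0.5}
player_b = {"rebounding": 0.0, "shot_creation": -0.5, "shooting": 1.0, "passing": 1.0, "defense": 1.0}

# Two hypothetical views of how much each element matters.
equal = {k: 1.0 for k in player_a}
glass_and_creation = {"rebounding": 2.0, "shot_creation": 2.0, "shooting": 0.5, "passing": 0.5, "defense": 0.5}

for name, w in [("equal", equal), ("glass_and_creation", glass_and_creation)]:
    print(name, total_value(player_a, w), total_value(player_b, w))
```

Under equal weights player B comes out ahead (2.5 vs. 1.0); weighting rebounding and shot creation heavily flips the order (4.25 vs. 0.5). The single number is only as good as the weights behind it.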

knarsu3 · Post by **knarsu3** » Mon Apr 14, 2014 9:29 pm

J.E. wrote:
Crow wrote: The "why" gets attempted answers when RAPM is broken down into the Four Factors (for offense and defense) or JE's three (eFG% and FT/FGA combined). That level of detail has noise (but I believe no more than the overall RAPM estimates), but it can be compared with raw data. Looking at both seems potentially more useful than looking at just raw box score data, IMO.
What would give even more answers is BoxScore/PBP-prior-informed 4(3)-factor RAPM, although that will only help to some degree. I guess it could give us estimates of how much a player with such-and-such assist numbers helps his teammates shoot, etc.

The big leap will come from SportVU data though. That would include things like
- successful box-outs
- steal gamble rate
- gamble recovery time
- shot contest rate
- hockey assists
- different types of assists (how many dribbles were taken after the pass / how many seconds was the ball held)
- turnovers per touch
- quality of screens
and many more

Before that happens, somebody needs to develop very good algorithms to extract that data from SportVU, though. Then somebody will need to make the data public. Then we'd probably need 2-3 years of data before coefficients from SPM regressions make sense. Long way to go.

The Vantage dataset does include all of that and more (specifically on defense, where we can measure things like help defense, not to mention that a contest is actually a contest, with the hand up), but obviously, as we discussed, there's an issue with the amount of data we have right now. The positive is that the data doesn't really need to be extracted (it's a combination of human eyes and their proprietary technology), or, well, not by us at least.

As for making the data public, well unfortunately the kickstarter didn't work out.

Statman · Post by **Statman** » Mon Apr 14, 2014 9:53 pm

JimiHendrix wrote:http://hardwoodparoxysm.com/2014/04/14/ ... verywhere/

I think this is a piece that quite a lot of you will either enjoy (or detest), so I thought I'd pass the link along. I agree with a fair amount of what Ian says, but this chunk of the article in particular sticks out to me:
My issue with RPM, and really all of the various plus/minus models, is that they are increasingly complex methods for stripping away the context of a player’s production, trying to measure it in a vacuum. It’s an admirable pursuit to some degree and these intricately designed techniques have become, in many ways, the basketball analytics arms race. The problem is that I’m just not that interested in the result. The context and the noise, which these models work so hard to control for, are exactly the things I’m interested in. I don’t just want to know which player is better. I want to know why and in what ways. I want to know what that implies about both the player and team, his teammates and opponents, and basketball as a whole. As constructed and presented, I typically find precious little of that information in plus-minus statistics.

This problem is not unique to RPM, or even to the entire family of plus/minus models. Win Shares, Wins Produced, PER, also chase the same goal–generalizing the “why” to highlight the “what.” But the “why” is what I find most interesting, the “why” is the reason I watch and write about basketball.
I'm curious to hear everyone's thoughts, in particular those who have created their own models.

Before people maybe jump on Ian about his single-number rating complaint: our own DeanO told me the same thing a while back about my college & NBA ratings/rankings. He mentioned specifically that he didn't like/trust single-number ratings for players.

Now, I actually can split my ratings up into all the skill-set minutiae; I just don't usually post it because it seems most people don't care. I kind of doubt this can be done with a metric like RPM, but maybe? We know it tries to separate O & D, right? Or am I confusing it with xRAPM, or are they actually the same? I'll get this straight soon, I'm sure.

Like I mentioned to DeanO, ESPN almost always presents Total QBR as a single-number metric, but that doesn't mean that's ALL it is. I wouldn't be devising my NBA draft model and career-curve projections if my metrics were "just" a single number.

But, that single number allows the metric to be palatable to the general public, and allows for a ranking and debate stemming from it. Debate is good for ESPN and all media.

mystic · Post by **mystic** » Tue Apr 15, 2014 11:38 am

Statman wrote: Before people maybe jump on Ian about his single-number rating complaint: our own DeanO told me the same thing a while back about my college & NBA ratings/rankings. He mentioned specifically that he didn't like/trust single-number ratings for players.

That actually makes little sense, because it sounds like an argument from personal incredulity to me, and such an argument doesn't lead anywhere. The author of that article is essentially criticizing the metric because he doesn't really understand how it works, and also because he isn't really sure how to interpret the number. Sure, a "holy grail" metric will not tell you "how" a player acts on the court, but in the case of RPM it tells you what impact his actions on the court have. It is a quantification of a specific question: how much does a player help his team outscore an average opponent, compared with an average player, per 100 possessions? At no point does that mean this would be the only number needed to fully describe or understand an individual player's actions. That is not what the number is meant to do, and criticizing a metric for something that is neither its intention nor within its range of validity makes very little sense.
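The question above is what a ridge-regressed plus-minus answers. A minimal sketch, assuming hypothetical stint data of the form (home players, away players, home margin per 100 possessions, possessions) and a hypothetical penalty `lam`; real implementations such as RPM add more (separate offense/defense ratings, priors, a home-court term, and so on), none of which is shown here.

```python
import numpy as np


def rapm(stints, players, lam):
    """Ridge-regressed plus-minus from stint data.

    Each stint is (home_players, away_players, margin_per_100, possessions).
    Returns player -> estimated points per 100 possessions vs. average.
    """
    idx = {p: i for i, p in enumerate(players)}
    X = np.zeros((len(stints), len(players)))
    y = np.zeros(len(stints))
    w = np.zeros(len(stints))
    for r, (home, away, margin, poss) in enumerate(stints):
        for p in home:
            X[r, idx[p]] = 1.0     # on court for the home side
        for p in away:
            X[r, idx[p]] = -1.0    # on court for the away side
        y[r] = margin
        w[r] = poss                # longer stints count more
    Xw = X * w[:, None]
    beta = np.linalg.solve(X.T @ Xw + lam * np.eye(len(players)), Xw.T @ y)
    return dict(zip(players, beta))


# Toy one-on-one stints (hypothetical numbers) where A > B > C by 10 points each.
toy = [(["A"], ["B"], 10.0, 100), (["B"], ["C"], 10.0, 100), (["A"], ["C"], 20.0, 100)]
print(rapm(toy, ["A", "B", "C"], lam=1.0))
```

The toy estimates come out close to +10, 0 and -10, pulled slightly toward zero by the penalty; the penalty is also what makes the system solvable, since the player columns here are collinear.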

The context in which those numbers are created is important, and without understanding that context it is unlikely you can create a meaningful metric anyway. Removing the player from the context of role and playing time also removes part of what goes into the number: a player has that impact in the role he is used in and over the minutes he played. That is very important.

And yes, having more quantified on-court actions to describe a player more objectively is better. But even if those numbers were available, quantifying a player's overall impact on the game result would still be very important information, both to put things into perspective and to challenge common beliefs, giving us a chance to learn and understand the game better. Therefore I didn't find that article useful, because from my perspective it misses the entire point of that "new ESPN stat" (which isn't exactly new to anyone who has followed the developments made by J.E. over roughly the past 3 years).

wilq · Post by **wilq** » Sun Apr 20, 2014 8:19 pm

JimiHendrix wrote: The problem is that I’m just not that interested in the result. The context and the noise, which these models work so hard to control for, are exactly the things I’m interested in. I don’t just want to know which player is better. I want to know why and in what ways.[...]

This problem is not unique to RPM, or even to the entire family of plus/minus models. Win Shares, Wins Produced, PER, also chase the same goal–generalizing the “why” to highlight the “what.” But the “why” is what I find most interesting, the “why” is the reason I watch and write about basketball.

My response would be: just because you are not interested in something doesn't mean it isn't interesting to anybody.
Also, why does the author act as if we have to choose between "what" and "why"? Those aren't contradictory goals, are they? So I would argue we should follow both paths, and there's nothing wrong with some people preferring one over the other.

Statman · Post by **Statman** » Mon Apr 21, 2014 2:55 am

wilq wrote:
JimiHendrix wrote: The problem is that I’m just not that interested in the result. The context and the noise, which these models work so hard to control for, are exactly the things I’m interested in. I don’t just want to know which player is better. I want to know why and in what ways.[...]

This problem is not unique to RPM, or even to the entire family of plus/minus models. Win Shares, Wins Produced, PER, also chase the same goal–generalizing the “why” to highlight the “what.” But the “why” is what I find most interesting, the “why” is the reason I watch and write about basketball.
My response would be: just because you are not interested in something doesn't mean it isn't interesting to anybody.
Also, why does the author act as if we have to choose between "what" and "why"? Those aren't contradictory goals, are they? So I would argue we should follow both paths, and there's nothing wrong with some people preferring one over the other.

Agreed.

For the general fan - I would think the simpler, the better. Also, the results better not stray too much from general perception (laugh test), or the general fan will probably declare BS and be done with it.

I'm happy ESPN is venturing into more complex analytics, but I am worried the results RPM produces may turn off many general fans (say, numerous backup players ranking ahead of Anthony Davis) and thus scare ESPN off basketball analytics. In general, though, I want ESPN to explore ranking players, because I feel it gives guys like me (and many on this board) a chance to land a media gig and maybe get a chance to write. Fans LOVE rankings, and we (the basketball nerds) can come up with some pretty solid stuff, whether it comes from more conventional box score stats like what I, Mike G, Win Shares, etc. do, or from the harder-to-find stuff like RPM (xRAPM) and future metrics involving the "new" stats now being recorded.

It does bother me that everyone tells me I should be doing fantasy ratings. I almost certainly will next season (mainly to get eyes to my work), but I kind of wish fans and media in general would appreciate more the attempts by many people trying to quantify "real" performance.

DSMok1 · Post by **DSMok1** » Mon Apr 21, 2014 3:53 pm

Give me a player's "Holy Grail" precise impact on the court, and I can figure out what he's doing to make that impact (that's the whole concept of ASPM). Way harder to go the other way.
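Going from an impact estimate back to what a player does is, in its simplest form, a statistical plus-minus regression: fit impact on box score rates. A generic weighted least-squares sketch, assuming a player-by-stat matrix of per-100 rates, an impact target such as long-run RAPM, and minutes as weights; the actual ASPM variables and functional form differ and are not reproduced here.

```python
import numpy as np


def fit_spm(box_rates, impact, minutes):
    """Minutes-weighted least squares of impact on box score rates.

    box_rates: (n_players, n_stats) per-100-possession rates
    impact:    (n_players,) impact estimates, e.g. long-run RAPM
    minutes:   (n_players,) weights
    Returns coefficients with the intercept first.
    """
    box_rates = np.asarray(box_rates, dtype=float)
    n = box_rates.shape[0]
    X = np.column_stack([np.ones(n), box_rates])
    sw = np.sqrt(np.asarray(minutes, dtype=float))  # sqrt weights turn WLS into OLS
    target = np.asarray(impact, dtype=float) * sw
    coef, *_ = np.linalg.lstsq(X * sw[:, None], target, rcond=None)
    return coef


def predict_spm(coef, box_rates):
    """Apply fitted coefficients to (possibly new) box score rates."""
    box_rates = np.asarray(box_rates, dtype=float)
    return coef[0] + box_rates @ coef[1:]
```

The fitted coefficients are the "what he's doing" side: each one says how much a unit of that box score rate is associated with impact, which is why the harder direction (box score to impact) needs a good impact target first.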

repole · Post by **repole** » Mon Apr 21, 2014 5:44 pm

Part of the issue from my perspective is that those who don't fully understand what something like RPM calculates are still more than happy to throw it around as some sort of end-all, be-all rating. A few years ago, when PER entered the mainstream, seeing people argue that player A was better than player B because his PER was a bit higher was mind-numbing. The same has started to happen with RPM and will undoubtedly continue.

That's ok though, because ultimately those statements are usually followed by someone coming along and explaining that PER is heavily tied to usage and that any sort of adjusted plus minus is heavily tied to player roles and utilization. The end result is a fan base that's more educated, but the process of getting there sure can be painful.

Statman · Post by **Statman** » Tue Apr 22, 2014 12:20 am

DSMok1 wrote:Give me a player's "Holy Grail" precise impact on the court, and I can figure out what he's doing to make that impact (that's the whole concept of ASPM). Way harder to go the other way.

True.

sideshowbob · Post by **sideshowbob** » Tue Apr 22, 2014 6:51 pm

knarsu3 wrote: The Vantage dataset does include all of that and more (specifically on defense, where we can measure things like help defense, not to mention that a contest is actually a contest, with the hand up), but obviously, as we discussed, there's an issue with the amount of data we have right now. The positive is that the data doesn't really need to be extracted (it's a combination of human eyes and their proprietary technology), or, well, not by us at least.

As for making the data public, well unfortunately the kickstarter didn't work out.

Ahh, that's unfortunate. I was going to say that you guys have data that seems to go quite a bit further than SportVU, so my thought was always that there'd be a bigger breakthrough with that set.

Crow · Post by **Crow** » Tue May 13, 2014 7:43 pm

Would it be possible to further split RAPM/RPM in the following way? It can already be split 8 ways for players, into the 4 offensive and 4 defensive factors. There are also several ways to calculate a player's direct SPM, and to take the difference between the overall RAPM/RPM impact estimate and that direct SPM to find the estimated indirect impact, either overall or, with a bit more work, down to the factor level. But could that indirect impact be split by factor and by position?
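The subtraction described above (total impact minus direct SPM impact gives estimated indirect impact) can be done element by element across the eight offense/defense factor terms. A minimal sketch; the element keys and the numbers are hypothetical.

```python
SIDES = ("off", "def")
FACTORS = ("eFG", "TOV", "ORB", "FT")
ELEMENTS = [f"{s}_{f}" for s in SIDES for f in FACTORS]  # the 8 factor-level terms


def indirect_impact(total: dict, direct: dict) -> dict:
    """Indirect impact per element = total (RAPM/RPM) estimate minus direct (SPM) estimate."""
    return {k: total[k] - direct[k] for k in total}


# Hypothetical factor-level estimates for one player (points per 100 possessions).
total = {k: 0.0 for k in ELEMENTS}
direct = {k: 0.0 for k in ELEMENTS}
total["off_eFG"], direct["off_eFG"] = 2.0, 1.5
total["def_ORB"], direct["def_ORB"] = -1.0, 0.5

ind = indirect_impact(total, direct)
print(ind["off_eFG"], ind["def_ORB"], sum(ind.values()))
```

Here the player's off_eFG indirect impact is +0.5 and his def_ORB indirect impact is -1.5, for -1.0 overall; the 8 direct plus 8 indirect terms are the 16 elements mentioned below.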

The way I'd think of doing this would be to best-guess assign a position to every player for all bits of game data, and then run the RAPM/RPM analysis at factor level on the changes to the scoreboard that came from each of the 5 positions on the court. The estimate for the player's own position could be compared to the SPM estimate, and the total indirect impact estimate could be compared to the estimates for the other 4 positions, down to the factor level. If folks object to RPM being a black box and aren't satisfied taking it down to 8 factor-level data elements (or 16 if you break it into direct SPM and indirect RPM at factor level), this would take it down to 5 positions × 8 factor elements = 40 impact estimates for the totality of a player's direct and indirect impacts.
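The 40-estimate layout could be organized as one ridge regression per (position, factor element) target. A sketch under stated assumptions: each target is an additive per-stint point contribution already attributed to a position (the position assignment itself, the hard part, is not shown), `X` and `w` are a hypothetical stint design matrix and possession weights, and `lam` is a hypothetical penalty. Because ridge regression is linear in the target, the five position-level estimates for an element add back up to the estimate from the unsplit element, so the split redistributes the total rather than creating new impact.

```python
import numpy as np

POSITIONS = ["PG", "SG", "SF", "PF", "C"]
ELEMENTS = [f"{s}_{f}" for s in ("off", "def") for f in ("eFG", "TOV", "ORB", "FT")]


def ridge(X, y, w, lam):
    """Weighted ridge regression toward zero."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw + lam * np.eye(X.shape[1]), Xw.T @ y)


def position_factor_rapm(X, targets, w, lam):
    """targets has shape (5, 8, n_stints): per-stint contribution from position p to element e.

    Returns an array of shape (n_players, 5, 8), i.e. 40 estimates per player.
    """
    out = np.zeros((X.shape[1], len(POSITIONS), len(ELEMENTS)))
    for p in range(len(POSITIONS)):
        for e in range(len(ELEMENTS)):
            out[:, p, e] = ridge(X, targets[p, e], w, lam)
    return out


def split_own_vs_others(est, own_pos):
    """Per player: the 8 elements at his own position vs. the sum over the other four."""
    n = est.shape[0]
    own = est[np.arange(n), own_pos, :]   # (n_players, 8), compare against SPM
    others = est.sum(axis=1) - own        # (n_players, 8), the indirect part
    return own, others
```

The own-position slice is what would be compared against the SPM estimate, and the other-positions sum against the total indirect estimate.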

I do wonder a bit about the direct/indirect division of impact credit between 2 players, but the thought is that you've already found the direct impacts and the total indirect impacts, and all you are trying to do is subdivide that indirect impact data set by the position where the scoreboard change occurred (I was thinking about it at team level, not distinct individuals, but I guess it could be done at individual level with more work).

Hypothetically, with this one could see the impact of a PG on the center position's eFG%, the PF's offensive rebound rate, the wing's FT/FGA rate, the SG's turnover rate, etc. Does anyone have comments on how to get this done, or interest in trying it? I know in advance that some will be skeptical of dividing the data this much. But my point is that there might be value in trying it and seeing what it says. More leads/clues, after careful consideration, may yield more in the end than fewer leads/clues without this approach. I am under the impression that since the sample size is still the same, the average size of the errors has not increased.

APBRmetrics
