Estimating Offensive Production

kennethcwilbur · Post by **kennethcwilbur** » Tue Jul 05, 2011 9:14 pm

Greetings from a first-time poster. My collaborators and I just finished a paper proposing a new methodology to estimate individual basketball players' offensive production.

To the best of my knowledge, two core ideas are new. Each player's offensive production estimate is based on his team's points per possession when he is on the court (not new), changes in teammates' usage rates when he substitutes out of the game (new), and a theoretical model designed to represent basketball (new; most models are designed to represent baseball or other games with simpler structures).

An example conveys the intuition. Imagine that players LeBron, Chris and Dwyane play on the same team in a 2-on-2 league, so one player is always on the bench. Suppose that LeBron uses 75% of all possessions when he plays with Chris and 50% of all possessions when he plays with Dwyane. If the team distributes shots well, this pattern of usage rates will be used to infer that Dwyane is better than Chris. The example easily extends to the 5-on-5 case. Basically, the more your teammates defer to you, and the more your team scores while you're on the court, the better your offensive production estimate will be.
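A rough sketch of the arithmetic in this toy example, in Python. It only reads each teammate's usage share as the complement of LeBron's and ranks the teammates by it; it is not the estimator from the paper, and the helper name `teammate_usage` is made up for illustration.

```python
# Toy 2-on-2 league from the example: LeBron's usage share next to each teammate.
# Assumption (stated in the example): the team distributes shots well, so a
# teammate who takes a larger share of possessions next to LeBron is rated higher.
lebron_usage_with = {"Chris": 0.75, "Dwyane": 0.50}


def teammate_usage(lebron_share):
    """In 2-on-2, the teammate uses whatever share LeBron does not."""
    return 1.0 - lebron_share


teammate_share = {name: teammate_usage(u) for name, u in lebron_usage_with.items()}

# Rank teammates by how much LeBron defers to them (higher share first).
ranking = sorted(teammate_share, key=teammate_share.get, reverse=True)

print(teammate_share)
print(ranking)
```

The full model also uses team points per possession and other controls, so this is only the usage half of the intuition.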

The results show good face validity. Carmelo, LeBron and Wade are in a statistical dead heat at the top of the league. Most traded players' estimates are indistinguishable across teams. The estimates are rather precise and enable meaningful comparisons between players based on a single season of data.

The paper is here:
http://papers.ssrn.com/sol3/papers.cfm? ... id=1861192
Readers of this board will probably be most interested in sections 4 and 5. Please be aware that we used 2009-2010 regular season data, since it took us 18 months to finish this first version of the analysis. Hopefully we'll do another version using newer data, but it might not be soon.

I would be glad to receive any comments or feedback by email (kennethcwilbur at gmail).

Cheers,
Ken

Kenneth C. Wilbur
Assistant Professor, Duke University
http://kennethcwilbur.com

Crow · Post by **Crow** » Wed Jul 06, 2011 5:16 am

Thanks for posting this paper. There is a lot to follow with the model and a lot to consider regarding the various findings.

You suggested comment via e-mail as one option or maybe the preferred one. If it is ok, I'll post these brief comments here with the thought that maybe someone else can say more perhaps building from them.

After my initial read, my first observation is that of the top 25 on offensive impact on table 2, 19 are in the top 25 for usage (among players over 600 minutes) and most of those who missed the top 25 on usage (4) are in the next 10. http://bkref.com/tiny/QYedm

This model comes to the finding that a high percentage of the highest usage players have the most estimated offensive impact. That follows from their usage and / or how the model values usage and, I guess, the impact of that usage on teammates.

Thinking about that, I then reviewed some findings of Jeremias Engelmann. He has a file for point per shot impact using the Adjusted +/- methodology, but in this case the focus was on the impact of a player's shooting alone instead of his total offensive impact from all "Factors" (shooting, turnovers, offensive rebounding and getting to the line) and all other non-boxscore actions (and on the flip side, it also shows the impact of his shot defense alone).

http://stats-for-the-nba.appspot.com/pps

It is multi-season so it is different than just 2009-10 and therefore it is not an equal comparison to your 2009-10 data, and it is probably tougher to get in the top 25 of a multi-season dataset with more total players competing for that honor, but only 7 of the top 25 on impact in your table 2 are also top 25 here. That is pretty low. And only 5 players in the top 25 on this Adjusted Point Per Shot Factor estimate are also top 25 in usage in 2009-10. The performance findings are pretty different for the very top players on usage.

Perhaps some further discussion and study of these models and findings together could provide additional value for model refinement and improve interpretation of the findings of each model and the relationship(s) between the findings.

EvanZ · Post by **EvanZ** » Wed Jul 06, 2011 2:24 pm

Along the same line of thought as Crow (I think), if you simply look at (TS%)*(USG%), the following are the top 25 for 2009-2010:

RK	Player	TS*USG
1	LeBron James	20.25
2	Dwyane Wade	19.61
3	Kevin Durant	19.41
4	Carmelo Anthony	18.32
5	Kobe Bryant	17.61
6	Chris Bosh	16.99
7	Amare Stoudemire	16.78
8	Dirk Nowitzki	16.64
9	Corey Maggette	16.41
10	Danny Granger	16.18
11	Monta Ellis	15.21
12	Brandon Roy	15.15
13	Manu Ginobili	15.07
14	Dwight Howard	15.05
15	Carlos Boozer	14.87
16	Jamal Crawford	14.78
17	Al Harrington	14.62
18	Chauncey Billups	14.60
19	Paul Pierce	14.58
20	Tim Duncan	14.55
21	Derrick Rose	14.47
22	Chris Kaman	14.27
23	Stephen Jackson	14.25
24	Joe Johnson	14.15
25	Aaron Brooks	14.10

Quite similar to Table 2 of the paper, and a whole lot easier to calculate I suppose. If I have any more thoughts after reading the paper fully, I'll definitely come back and post some more.
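For anyone who wants to reproduce a TS*USG column like the one above, here is a minimal Python sketch. It assumes the usual Basketball-Reference definitions of TS% and USG% (the post does not say which definitions were used), and the box-score totals in the example are invented placeholders, not real player data.

```python
def ts_pct(pts, fga, fta):
    """True shooting as a fraction: PTS / (2 * (FGA + 0.44 * FTA))."""
    return pts / (2.0 * (fga + 0.44 * fta))


def usg_pct(fga, fta, tov, mp, tm_fga, tm_fta, tm_tov, tm_mp):
    """Usage as a percentage of team possessions used while the player is on court."""
    player_poss = fga + 0.44 * fta + tov
    team_poss = tm_fga + 0.44 * tm_fta + tm_tov
    return 100.0 * (player_poss * (tm_mp / 5.0)) / (mp * team_poss)


def ts_times_usg(player, team):
    """TS (fraction) times USG (percent), matching the scale of the table above."""
    ts = ts_pct(player["pts"], player["fga"], player["fta"])
    usg = usg_pct(player["fga"], player["fta"], player["tov"], player["mp"],
                  team["fga"], team["fta"], team["tov"], team["mp"])
    return ts * usg


# Invented season totals, for illustration only.
player = {"pts": 2000, "fga": 1500, "fta": 600, "tov": 250, "mp": 2900}
team = {"fga": 6700, "fta": 2100, "tov": 1150, "mp": 19780}
print(round(ts_times_usg(player, team), 2))
```

Sorting a player table by `ts_times_usg` in descending order would give a ranking of the same kind as the list above.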

Jeff Fogle · Post by **Jeff Fogle** » Thu Jul 07, 2011 4:15 pm

Ken, in your view, how does this improve on what's already out there in terms of production metrics?

bbstats · Post by **bbstats** » Sun Jul 10, 2011 7:29 pm

Melo at the top? Count me skeptical...

Interesting looking read, though! I'll hopefully skim through it if I can make it past all the equations

EDIT: I'm having a hard time making it through the equations and economics language, but it seems that the paper talks mostly about shooting efficiency, which, if I remember correctly, only explains about 40% of a team's overall offensive efficiency?

Perhaps the title "offensive production" should have been narrowed down to "effective field-goal percentage" ?

kennethcwilbur · Post by **kennethcwilbur** » Mon Jul 11, 2011 5:58 pm

Thank you for all of your thoughtful contributions. I realize this is a challenging paper--calculus, game theory and statistics--so I really appreciate your replies. Let me try to boil down our arguments:

1. Defense
Optimal defensive strategy has to leave each offensive player with the same expected points per possession used.

For example, if LeBron scores 1.8 points per possession (PPP) and Bosh scores 1.2 PPP, the defense should guard LeBron more and Bosh less. It should make this adjustment until the point that LeBron's PPP just equals Bosh's PPP.

If you're skeptical about this, then suppose the opposite. LeBron is scoring 1.8 PPP and Bosh is scoring 1.2 PPP. Erik Spoelstra's best move (in the short run, anyway) would be to give LeBron a 100% usage rate and Bosh a 0% usage rate.
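The "suppose the opposite" step can be checked with a few lines of Python. This is only a sketch under a strong simplifying assumption: each player's PPP is held fixed no matter how many possessions he uses, which is exactly the situation the argument says a sensible defense would not allow to persist. The function names are hypothetical.

```python
def team_ppp(usage, ppp):
    """Expected team points per possession: usage-weighted average of player PPP."""
    return sum(usage[p] * ppp[p] for p in usage)


def best_fixed_ppp_usage(ppp):
    """If PPP did not respond to usage, give every possession to the top scorer."""
    top = max(ppp, key=ppp.get)
    return {p: (1.0 if p == top else 0.0) for p in ppp}


ppp = {"LeBron": 1.8, "Bosh": 1.2}   # numbers from the example above
even_split = {"LeBron": 0.5, "Bosh": 0.5}
all_in = best_fixed_ppp_usage(ppp)

print(team_ppp(even_split, ppp))
print(all_in, team_ppp(all_in, ppp))
```

As long as the two PPP values differ, shifting usage toward the better scorer raises the team total, so the offense keeps shifting and the defense keeps adjusting until the two values meet.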

2. Offense
Given (1), each player's individual PPP depends critically on who he plays with. Hence the paper's title, "there's no 'I' in 'team.'"

However, if the offense distributes shots optimally in anticipation of the defense's strategy, then usage rates should correspond to players' offensive abilities.

If true, it would be wrong to look at individual PPP as a measure of individual capacity for offensive production. Doing so would undervalue great players who play with lesser teammates and it would overvalue average players who play with better teammates.

However, individual usage would accurately reflect offensive talents. Better players would shoot more often. Not more successfully, but more often.

3. Testing
The prediction that each offensive player within a 5-man team has the same efficiency is testable. Of course, the data are subject to the randomness inherent in shooting outcomes.

We formally test this prediction and the data do not falsify it. Scientifically speaking, it is impossible to prove that anything is true, but we tried and failed to find evidence that the prediction is false.
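To give a feel for what "testing equal efficiency" can look like, here is a deliberately simple Python sketch: a two-sample z-statistic comparing two players' mean points per possession used. It is not the formal test in the paper, and the possession outcomes are made-up numbers.

```python
import math
from statistics import mean, variance


def equal_ppp_z(points_a, points_b):
    """z-statistic for the difference in mean points per possession used."""
    se = math.sqrt(variance(points_a) / len(points_a)
                   + variance(points_b) / len(points_b))
    return (mean(points_a) - mean(points_b)) / se


# Made-up outcomes: points scored on each possession a player used.
player_a = [0, 2, 0, 3, 2, 0, 2, 0, 1, 2] * 20
player_b = [2, 0, 0, 2, 0, 3, 0, 2, 2, 0] * 20

z = equal_ppp_z(player_a, player_b)
reject_equal_ppp = abs(z) > 1.96  # two-sided test at the 5% level
print(round(z, 2), reject_equal_ppp)
```

Failing to reject here means only that the gap is within what shooting randomness alone could produce, which is the sense of "the data do not falsify it" above.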

4. Caveat
If you believe that teams or players distribute shots poorly, you should be very skeptical of our results.

This is an important point with limited evidence on either side. Justin Rao's papers contain several results consistent with optimal distribution of shot attempts. Our investigation generally supports the idea.

The few counterarguments I have seen are woefully dependent on hindsight bias. Perhaps there is other disconfirming evidence that I am not aware of? I would be glad to know about it.

A few specific replies
@Crow: You are dead on that usage rates correlate highly with offensive production rankings, as suggested above. The model also controls for team PPP while the player is on the floor, team-game factors and the quality of the defense, but these factors appear to be secondary.

I agree that it is useful to compare results against other systems, and I was not aware of this ranking system. I was surprised to see Bonner ranked so highly; I wonder about the standard errors?

I should caution that our paper purports to measure offensive production only, whereas the link you included is ordered by offense+defense. The reliability of comparing a multi-season database to a single-season database depends on the degree of player variation over time, something I haven't investigated.

@EvanZ: Nice observation. Usage*(oncourt Team PPP) might get you even closer but I haven't tried it.

In general, any sophisticated model should have first-order effects that are reproducible using simple transformations of the data. If not, you have to be suspicious of where the model's results are coming from.

The advantages of estimating the model are that it controls for more information and provides precise standard errors, but the additional costs are substantial.

@Jeff Fogle: To my knowledge, no other system provides such precise standard errors using so little data. You can make meaningful statements among pairs of players, e.g. "We can reject at the 95% confidence level the null hypothesis that Kobe produced more than LeBron." It could probably be adapted to predict trade outcomes and calculate optimal contract offers.

I also want to note what is not new. This is not the first system to use players' usage rates to determine their ratings. Points per game and PER are conceptually similar in that regard. In a sense, you could interpret our theoretical model as rationalizing the traditional reliance on simple measures like scoring average as a measure of player quality.

@bbstats: Shooting efficiency is operationalized as PPP, not eFG (see section 3.1). The estimates are therefore likely to reflect offensive rebounding and turnovers in addition to shooting, but they are likely to undervalue passing.

Carmelo is a polarizing case. Many statistical systems rank him somewhat low because of his pedestrian individual efficiency. Yet the guy was a high-usage player on Team USA and finished 6th in MVP voting in 2010. Are the basketball experts crazy? My view, and this is just a personal opinion, is that if a system tells you that all the experts are wrong, it also needs to tell you in an objectively verifiable way why the experts are wrong. Otherwise you're just comparing opinions against opinions.

Thanks again for the discussion!
-Ken

EvanZ · Post by **EvanZ** » Mon Jul 11, 2011 8:32 pm

It's interesting that when Melo got to New York, he ran the exact same percentage of iso plays (37%), but his efficiency on those plays went way up from 0.85 to 0.97. Is that because he had Amare to share the defensive pressure? His overall efficiency went up from 0.95 to 1.02 PPP. Similarly, his TS% went up to 57.5%. It will be interesting to see if that trend continues next season. In 2007-08, when he and Iverson played a full year together, each had their career high TS% and it was almost exactly the same for both (56.7).

The guy is still only 26. It really doesn't take much more than one really great season to turn around a player's reputation.

schtevie · Post by **schtevie** » Mon Jul 11, 2011 10:41 pm

I am a bit confused. What is being estimated and what are the usage rates? Are we looking at a scoring efficiency (all points scored by fg and ft) conditional on the fact of a non-turnover? If so, we simply cannot compare these measures to any APM/RAPM results.

And then I am trying to figure out what the implications of the model's assumptions are with regard to passing. The act of equating scoring opportunities on the margin itself is a hugely important skill that should (and does) get its reward. Does it in this scheme? It doesn't look like it.

If what we are looking at is some contingent measure of scoring efficiency, it wouldn't surprise me to see Carmelo Anthony toward or at the top. No one doubts he is an elite scorer. But so what? What is alarming from an offensive basketball standpoint is Steve Nash barely cracking the top 50.

What is also really curious, and I don't know how to think about it, is the incredible precision of the results. Apparently, whatever is being measured, there is no doubt. And also in the curious, or make that interesting, department, is the overall distribution of the top 51 offensive players shown. They are all essentially 5 points per possession and above. That is a whopping difference to what RAPM suggests (though potentially measuring something different) where Jeremias' single-season 2010 estimates (conditional on a 2009 prior) shows but four players at or above 4.9. It would be interesting to have a smack down between this approach and RAPM for explaining offensive variation.

As a final note, a question. Could it please be explained why "If you believe that teams or players distribute shots poorly, you should be very skeptical of our results"? I don't know exactly how one would define poorly, but I am one who is of the belief (and on the record - if that still exists) that NBA basketball only approximates an optimal distribution of shot attempts. And I don't think the counterarguments are woefully dependent on hindsight bias.

I am not sure what evidence is particularly relevant to your empirical results, but historically three point shots have been underutilized. And closely related, to this day, mid-range jumpers are overutilized - especially those that are contested and more especially those taken, still not rarely, early in the shot clock.

In the paper it is noted that Goldman and Rao (2010) "..derived a test of dynamic efficiency--whether shots are optimally allocated across seconds of the shot clock--and found that only one player out of 674 deviated from this necessary condition for optimality." If that is an accurate representation of reality, I will need to check my lyin' eyes.

Crow · Post by **Crow** » Mon Jul 11, 2011 11:47 pm

I haven't spent much more time with the paper and the issues on the table yet but I am glad the discussion is rolling. Using different approaches and looking at contrasts in methods and results might yield some new insights.

I did compile a brief bit of data though related to usage more from a championship team perspective instead of just focused on individuals. I checked the usage of the highest usage player on the last 9 champions in the regular season and playoffs and also checked for other teammates who played 20+ minutes with usage real near or above 20%.

In the regular season the eventual champ was led by a guy with more than 29% usage 3 times and less than that 6 times. In the playoffs the champ was led by a guy with more than 29% usage 6 times and less than that 3 times. The average usage of the highest usage guy only went from a bit over 29% to a bit over 30% though, so it is a mix of some change and continuation. The range for the top guy on the title winner stayed from 26-33% in both the regular season and the playoffs.

The number of other teammates who played 20+ minutes per game with over 20% usage went from 2.4 in the regular season to 1.9 in the playoffs.

There is some variation but one could say there is also a fairly standard team model being employed, at least from this simple perspective. There were never more than 4 teammates total over 20% usage and 20+ minutes per game in the regular season and playoffs. The only times there weren't at least 3 over 20% were the two recent Lakers playoffs that resulted in titles (but not the regular seasons). In the 2009 playoffs Gasol managed to be the one teammate beyond Kobe to also get over 20% usage; in 2010 none did and he was under 19%. In the 2011 playoffs, it was a more distributed top usage with Kobe and 3 teammates at or over 20% usage but the 3rd worst playoff defensive efficiency overcame whatever was being done on offense.

I think this data gives some additional and perhaps useful context to the usage debate which is a part of the discussion here. Ultimately I am more interested in what title winners do than how various individuals rank amongst each other.

P.S. Ken, though the internet list at the link for Adjusted +/- PPS is ordered by offense + defense, I resorted it by just Offensive PPS for my earlier comparison to your analysis.

Bobbofitos · Post by **Bobbofitos** » Tue Jul 12, 2011 4:51 am

kennethcwilbur wrote:
1. Defense
Optimal defensive strategy has to leave each offensive player with the same expected points per possession used.

Is this true? Isn't optimal defensive strategy to force the player with the least expected PPP to use each possession? (Even better is not allowing a shot in the first place!)

If you're skeptical about this, then suppose the opposite. LeBron is scoring 1.8 PPP and Bosh is scoring 1.2 PPP. Erik Spoelstra's best move (in the short run, anyway) would be to give LeBron a 100% usage rate and Bosh a 0% usage rate.

How much of usage is coach derived and how much is player mentality? How much control do outside forces have on shot selection/using possessions? Can Spoelstra force LBJ to shoot 100% of the time? Can LBJ force himself to use every single possession?

Joel Przybilla is godly when he uses possessions. Why doesn't he use more possessions? Why didn't the Blazers run every play for Joel Przybilla?

Crow · Post by **Crow** » Tue Jul 12, 2011 5:39 am

"Its central prediction is that all five shooting players within a five-man offensive team will produce the same number of expected points per shot."

How much movement is there on expected points per shot on average in 5 man lineup data in the second halves of individual games compared to the first halves? How close to optimized do they get? From individual data for the entire season / all lineups it doesn't look like they get very close but of course it might not show up cleanly there.

How much does this optimization progress game to game across regular season series between teams? How much does it progress for the offense across the whole regular season despite the ups and downs of different opponents? How much does the optimization progress in a playoff series?

The article does many things and table 1 presents the finding that the central hypothesis is not rejected, but I don't recall seeing any place where it was easy to see direct answers to these questions associated with the central hypothesis. I want to know if there is ever greater optimization on average within individual games, season and playoff series and not just that the model that assumes that is not rejected. If it is happening that would seem like additional information worth presenting in detail.

P.S. I also found tables 5 and 7 interesting.

huevonkiller · Post by **huevonkiller** » Tue Jul 12, 2011 9:48 am

Crow wrote:
I think this data gives some additional and perhaps useful context to the usage debate which is a part of the discussion here. Ultimately I am more interested in what title winners do than how various individuals rank amongst each other.

P.S. Ken, though the internet list at the link for Adjusted +/- PPS is ordered by offense + defense, I resorted it by just Offensive PPS for my earlier comparison to your analysis.

I think it is very interesting to discuss what title winners do vs the losers. But ultimately the answer lies in: 1) sample size, 2) injuries, 3) defense (lower standard deviation), 4) trade or some drastic roster change. The 5th option is that the right team wins.

If I understood you correctly, you're implying the Lakers "play" the right way? That's misleading and this is coming from a hybrid LeBron-Kobe fan. So I've watched nearly all Laker playoff games the past 6 years. I watched the 2004 Finals as well (that was depressing!).

2011 Mavs = Injuries. Best rated team when healthy according to basketball-reference, even before the Finals. Also employed this strategy of constantly doubling LeBron and letting the others beat them.
2010 Lakers = Injuries - Sample Size. The Celtics were hurt and could have easily been champs with a different sample size. Lakers were very hurt during the season, Kobe drained his knee multiple times.
2009 Lakers= Sample Size. The Cavs lost an overtime game and another game at home by 1 point
2008 Celtics = Right team. One of the highest SRS teams of the 3 point era, after Jordan's Bulls.
2007 Spurs = "Right team". #1 SRS team although kind of cheated their way through Phoenix. And Dallas messed up this season. They played the Jazz and Cavs in the last two rounds, not really impressive.
2006 Heat = Injuries. Shaq always takes games off during the regular season. The Mavs were not a dominant team SRS-wise.
2005-Spurs = Right team - Sample size. #1 in SRS, although a close affair.
2004 Pistons = This one is a toughie, but they did go 20-5 after the Rasheed Wallace trade.

Crow · Post by **Crow** » Tue Jul 12, 2011 4:35 pm

I noted that the Lakers won their recent titles with an offense with fewer support players over 20% usage around Kobe than any other title winner in 9 years. After some debate I ended up not characterizing it as the right or wrong way. It worked for them.

I noted that when their offensive usage became more distributed at the top in 2011 they also happened to lose. It appears that it was because of poor defense. I did wonder a bit if the poor defense was at least in part because the top support players were focusing too much of their attention and effort on offense this time but I didn't share that speculation until now because it is just a speculation and while I raise it as a possibility to consider I am not necessarily endorsing it. It is a pretty simplistic explanation and maybe doesn't adequately account for the opponent who knocked them out or other things like maybe a weaker bench. It also goes against the grain of how the other title winners in the period did it and against the way I'd prefer things work.

Tape review might help but there is the temptation of confirmation bias. So I left that all out, 'til now. It can get fuzzy, pretty quick. Others can make their own judgments about the right way to play / split usage, the Lakers and the right way for the Lakers to play if they want to. I initially settled for just noting that they were the exception to the general top usage pattern but they did win 2 titles- when they also showed enough defensive strength. In 2009 playoffs 4th best on offensive efficiency, 2nd best on defensive efficiency. In 2010 4th best on offensive efficiency, 7th best on defensive efficiency. In 2011 3rd best on offensive efficiency, 14th best on defensive efficiency. They had enough offense with 2 different ways to split the usage but without a good enough defense they went out pretty fast and easily.

I agree with you that injuries or later trades can make it harder to see "the best team" in regular season stats. I further agree that defense matters a lot, especially in the playoffs. I would also concur that some Finals could have gone either way and the still fairly small sample size of a playoff series (and a few key plays or calls) is important to keep in mind when reviewing results and drawing conclusions about best teams.

I think reviewing what recent title winners did in many ways can help and probably a lot, but there is some wiggle room to vary from the successful patterns with some rosters and opponent sets and there is and will be more than one standard way to win in fairly short series. Even that may not completely settle for all who the very best team was in some abstract, ultimate sense. As with players, one probably can't nail down and settle everything about team quality. But with more time and smart effort you can get better.

huevonkiller · Post by **huevonkiller** » Sat Jul 16, 2011 1:54 pm

I think that's a pretty fair and objective assessment, Crow.

If I had to speculate why the Lakers lost this time, they were trending downward against New Orleans (Pau's numbers certainly). Age and all those deep post-season runs caught up to them. Plus having to face a solid team like Dallas.

Crow · Post by **Crow** » Sat Jul 16, 2011 5:26 pm

Thanks huevonkiller. I agree age and all those deep post-season runs are worth including in the assessment / explanation.

APBRmetrics
