Will Success Come From Better Data?

schtevie

Will Success Come From Better Data?

Post by schtevie »

Appropriate apologies for starting a new string on Daryl's recently cited 800 words in the HBR. (http://blogs.hbr.org/cs/2011/08/success ... _data.html) I do so because what struck me as interesting was not the part of the argument others have focused on (that unindividuated analysts in perfectly elastic supply bring no particular competitive value - which is actually true, by definition). My quarrel is with the premise itself, that better data augurs success.

Why should we believe this to be true? First, in an absolute sense, and second, in an economic sense. Is it reasonable to believe that the potential gains will be worth the costs associated with gathering and utilizing such information?

To begin, let's ponder the contours of an apparent I-45 consensus. Both the Mavs and the Rockets, two clubs at the forefront of the analytic revolution in basketball, agree that acquiring new data is critical for future competitive advantage. Past this point, opinions appear to diverge as to where the green pastures are likely to be found. In his article, Daryl makes reference to potential gains from systematically tracking on-court activity (defensive challenges being the unconventional, potential stat cited). By contrast, Mark Cuban has moved on - way on - apparently believing little to no advantage can be gained by acquiring such data. Listen to him speak at the Sloan Conference and he very clearly states that he is hoping that the NBA socializes on-court information-gathering. And this from the guy who owns Synergy and has a comparative advantage in data acquisition! For him, success will come from other types of data. Psychological? Biometric? Those focused on high schoolers and foreigners? Who knows?

The point I am raising is whether it is reasonable to believe that there is a serious upside from investing much more money in any of this. Consider the following question to calibrate expectations: do you believe that future competitive gains from analytics will be bigger or smaller than those that have already been realized? I like to think that reasonable (wo)men would agree that "smaller" is, clearly, the correct answer. There are two basic reasons why this is so.

First, the primary competitive edge realized to date has been the advantage early adopters have taken of "traditional" franchises. And this has been large. Evidence? Thank you, Kevin McHale, for either not having availed yourself of or having dismissed the information provided by defensive APM! And more generally, every year Mark Cuban appears at MIT and has a chuckle, expressing his glee when playing teams that clearly have no idea how to maximize performance using (publicly available) line-up data. "Getting excited" is the term I recall him having used last year, or the year before. And it's not only Mark; other franchise reps can be seen nodding along on stage (and I presume off). The point is that this low-hanging analytical fruit will inevitably be picked by the laggards, at which point all the infra-marginal expenditures on analytics lose all return. This is a zero-sum game and league, after all.

And then the second reason to believe that the future will not be as glorious as the past is the simple, generic observation that one should expect diminishing marginal returns on future investment. If there is no holy grail in analyzing box score stats, do we think there will be one found from testing players' psychological fitness, or anything else? At the end of the day, the realities of the box score and the rules of the game impose strict limits on potential improvement. That isn't going to change; it cannot change. Perhaps I have a failure of imagination on this account.

So far, so bad. But maybe past gains from using analytics have been really big, such that something smaller than that in future might still be kinda important? Indeed.

The question then is: what success has analytics brought to the early adopters? To provide a tentative estimate of an upper bound, I would argue that one should look to the Mavs - the one franchise that has been at it the longest (right?), that has integrated the approach throughout its operations, and that hasn't stinted in spending in support of the effort.

To begin, nothing but the highest praise for Mark Cuban. Having joined the Celtics and the Lakers as the only franchises to win 50 games for ten years (and now more) consecutively is a remarkable achievement. Looking at the last ten years, the Mavs have averaged 56.7 wins. Using the Pythagorean formula, this is equivalent to besting their opponents on average by 5.73 points per game. Pretty darn good. The question now is how much of that success is due to analytics versus other factors.
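For anyone who wants to check the wins-to-points conversion, here is a minimal sketch. The post does not say which Pythagorean exponent or scoring level was used, so the exponent of 14 and the league average of 99.6 points per game below are assumptions, as is the function name. With those inputs, 56.7 wins comes out close to the 5.73 points per game quoted above.

```python
def margin_from_wins(wins, games=82, exponent=14.0, league_ppg=99.6):
    """Invert the Pythagorean formula to get an average point margin.

    Assumptions (not stated in the post): exponent=14, league_ppg=99.6,
    and a margin split evenly between offense and defense.
    """
    win_pct = wins / games
    # Pythagorean: win_pct = PF**x / (PF**x + PA**x)  =>  PF/PA = ratio
    ratio = (win_pct / (1.0 - win_pct)) ** (1.0 / exponent)
    # With PF = league_ppg + m/2 and PA = league_ppg - m/2, solve for m.
    return 2.0 * league_ppg * (ratio - 1.0) / (ratio + 1.0)


mavs_margin = margin_from_wins(56.7)
print(round(mavs_margin, 2))
```

A different exponent (16.5 is another common choice for basketball) would shrink the implied margin somewhat, so the result should be read as approximate.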

Well, factor number one that needs to be taken into account is inheriting Dirk Nowitzki. This is rather simply done. Taking the 10 year RAPM (http://stats-for-the-nba.appspot.com/ranking), assuming 91 possessions per game on average (about right), factoring in DN's actual minutes played per game, we can estimate that Dirk was responsible for 3.93 points of the 5.73. Then, using Eli W's APM numbers (http://www.countthebasket.com/blog/2008 ... lus-minus/) to debit the contribution of an average PF, playing similar minutes, what you get is that the Mavs, net of the contributions of Dirk (above the average PF), were basically 2 points (2.01) better than their opponents, on average, over the last decade.
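The unit conversion behind that estimate can be sketched as follows. RAPM is expressed per 100 possessions, so a per-game contribution scales with possessions per game and with the share of the game played. The function name and signature are illustrative, and since the RAPM and minutes inputs are not given in the post, none are filled in here. The last lines only restate the post's own figures and show what they imply: roughly 3.72 points for Dirk above an average PF, and a debit of about 0.21 for the average PF.

```python
def per_game_points(rating_per_100, minutes_per_game,
                    poss_per_game=91.0, game_minutes=48.0):
    """Convert a per-100-possession rating into points per game.

    poss_per_game=91 is the post's assumption; the rest is illustrative.
    """
    share_of_game = minutes_per_game / game_minutes
    return rating_per_100 * (poss_per_game / 100.0) * share_of_game


# Figures quoted in the post (points per game).
team_margin = 5.73          # Mavs' Pythagorean margin
dirk_total = 3.93           # Dirk's estimated contribution
margin_net_of_dirk = 2.01   # remainder after crediting Dirk above an average PF

# Implied by those three numbers: Dirk's value above an average PF,
# and the debit taken for an average PF in the same minutes.
dirk_above_avg_pf = team_margin - margin_net_of_dirk
avg_pf_debit = dirk_total - dirk_above_avg_pf
print(round(dirk_above_avg_pf, 2), round(avg_pf_debit, 2))
```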

So, a 2 point superiority is what remains to be explained owing to contributions from analytics and all other factors. And what are the latter?

Well, clearly, an important negative factor (and correspondingly a boost to the implied contribution of analytics) is the drag of continued success on replenishing the talent pool through the draft. I don't really have a strong sense of what this amounts to. 1 point per year, perhaps? Estimates, anyone?

Against this is the fact that Mark Cuban hasn't been afraid to spend, spend, spend (Maverick payroll having never left the top five over the past decade, I believe.) Now, perhaps, informed by superior analysis, he has spent his player salary dollars more wisely than others. But this aside, clearly, part of the 2 point competitive edge that needs to be explained is done so by the total volume of salary expenditure. And, again, if someone has a good estimate of the competitive returns to salary expenditures, please divulge.

Pending such context, let's stipulate that the drag on success from drafting low is offset perfectly by "excess" expenditures, leaving us a decadal return of analytics of 2 points per game. Formidable still, no?

Well, yes, as a historical matter, but the issue again is whether it can be sustained. My hunch is that over the past ten years (this year's Finals included) Mark Cuban got excited a lot, watching opposing teams put in inappropriate lineups. And my guess is that this factor alone is a big chunk of the 2 points that needs to be explained. Probably, on average, over the past ten years there were but a handful of teams that weren't susceptible to such errors, and one doesn't tend to get excited about only an expected fractional point in a line-up mismatch.

So, going forward, my sense is that the upper bound for the returns on investing in data (and analysts) is really quite small. Remaining "traditionalists" will, before too long, get in the game, if only so that Mark Cuban will stop laughing at them. And when that happens, a lot of money will then be spent chasing, on average, what is a fractional point per game, and one that will continue to decrease as time passes.

Success?
xkonk

Re: Will Success Come From Better Data?

Post by xkonk »

I've seen estimates that a win is worth about $1.5 million, so a better question might be: do you think better data could lead to a team winning one extra game per year? Because it seems likely to me that such a small increase is plausible, even with diminishing returns from publicly known work, but it seems very unlikely that a team would have to pay that much to get new data. Don't they usually hire interns, maybe unpaid, to log/code game film? So the cost of acquiring (and presumably analyzing) new data is extremely low, but the payoff is comparatively lucrative.

Also, the Dirk argument isn't solid. Dirk may have already been on the team before the Mavs started using advanced stats, but the team presumably used the stats in evaluating whether or not to keep Dirk, how many minutes to give him, how to use him, etc. So the value he gave the Mavs since they started using advanced stats is intertwined with their use of stats. That applies to every other player the Mavs have employed and not employed over the same time period.
Crow

Re: Will Success Come From Better Data?

Post by Crow »

Better playoff data and better analysis of playoff data (and team matchups in general) might be worthy of specific mention and extra efforts. Beyond lineup selection to other strategic decisions.


What % of the potential gain from analytic knowledge are teams actually capturing on the court? I'd guess it is fairly low, less than half or well less. It is just a guess but partly based upon seeing a good amount of after the fact apparent lineup inefficiencies even on the best teams. Cuban's recent focus on psychological assistance (Dr. Don Kalkstein, sports psychology coach, full-time) and bringing a stats coach to the bench (Roland) both seem to be aimed at getting more knowledge actually applied on the court.
schtevie

Re: Will Success Come From Better Data?

Post by schtevie »

xkonk wrote:I've seen estimates that a win is worth about $1.5 million, so a better question might be: do you think better data could lead to a team winning one extra game per year? Because it seems likely to me that such a small increase is plausible, even with diminishing returns from publicly known work, but it seems very unlikely that a team would have to pay that much to get new data. Don't they usually hire interns, maybe unpaid, to log/code game film? So the cost of acquiring (and presumably analyzing) new data is extremely low, but the payoff is comparatively lucrative.
This view used to be mine. And I held it for....hmmm....close to two decades. The opportunities that are available now were much greater a decade ago (again, in terms of a potential greater informational advantage relative to then more ignorant opponents). Not to fixate on the Mavs, but it is worth fixating on the Mavs. If this franchise didn't fully exploit such opportunities (documenting a much richer set of on-court outcomes) and seems to have no interest in doing so now, one is obliged to rethink what is likely to be optimal.

I believe you, about the size of the prize. However, there is money spent gathering data, money to be spent on analyzing data, and (stipulating that clear cut opportunities arise from such efforts) there is then the wish and prayer that such information will be effectively turned into on court advantage. For this, first the coaching staff must be persuaded of its value and thereafter the players. And it is an illusion to believe that such interests are near-perfectly aligned.
xkonk wrote:Also, the Dirk argument isn't solid. Dirk may have already been on the team before the Mavs started using advanced stats, but the team presumably used the stats in evaluating whether or not to keep Dirk, how many minutes to give him, how to use him, etc. So the value he gave the Mavs since they started using advanced stats is intertwined with their use of stats. That applies to every other player the Mavs have employed and not employed over the same time period.
Actually, I think the Dirk argument is pretty darn solid. As a general proposition, talent drives outcomes in the NBA. And most of this talent is innate/cultivated by conventional coaching. And Dirk is a first ballot HOF talent. But surely, some of Dirk's Dirkness came from Wayne Winston pulling on the puppet strings, so to speak. (Though, regarding my little RAPM exercise, some of that might already be appearing in the coaching component, also in the RAPM ratings. But never mind that.) How much do you think that might be?

Dirk has played approximately one sixth of the Mavs' minutes over the last decade. Presumably, the Mavs' special analytical sauce is spread evenly across all player minutes/ratings. If one third of a point per game of Dirk's apparent value is actually due to better use through analytics (which would be approximately one tenth of his value) then we return to two points per game overall.
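One way to read that arithmetic, as a sketch: if analytics adds the same amount per minute to every player, then the team-wide effect is Dirk's analytics-driven points divided by his share of team minutes. The variable names are illustrative; the fractions are the ones in the paragraph above.

```python
from fractions import Fraction

dirk_minutes_share = Fraction(1, 6)      # Dirk's share of team player-minutes
dirk_analytics_points = Fraction(1, 3)   # points per game of Dirk's value credited to analytics

# Assumption from the post: the analytical "sauce" is spread evenly across
# minutes, so the team-wide effect scales by the inverse of Dirk's share.
team_analytics_points = dirk_analytics_points / dirk_minutes_share
print(team_analytics_points)
```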

Some coincidental confirmation regarding the value of star talent. So, if you take Jeremias' 10 year, top ten RAPM players, and compile the winning records of the non-Mav teams such stars played on (NB culling out years when the star players didn't play more than 60 games, as unrepresentative, and necessarily culling out a few years of Steve Nash's contributions when in Dallas) what we see is that non-Mavs teams, fortunate enough to employ such transcendent players, win 51.7 games, on average, over the last ten years.

What does this imply in Pythagorean terms? Well, such teams are expected to have a 3.78 points per game victory margin, which is a difference of....drum roll, please.....1.95 points per game less than the decadal advantage of the Mavs.

This is some confirmation, anyway, that 2 points is in the ballpark of what is to be accounted for, by analytics and all other factors (with willingness to spend, prominently included in such a list).
schtevie

Re: Will Success Come From Better Data?

Post by schtevie »

Not that there appears to be much interest in these pessimistic meditations, but here are a few additional thoughts...

First, apologies to the Spurs for neglecting to include them in the 50 wins or more over ten (plus) consecutive seasons club.

Second, in trying to contextualize the accomplishments of the Cuban Mavs, I inadvertently diminished them by a bit of double counting in the control group. The exercise was to establish a baseline of how many wins would be expected if a franchise was privileged to have under contract a top 10 player (as defined by Jeremias' top 10 players over the last ten years, per RAPM. These are, in descending order: LeBron James, Kevin Garnett, Dwyane Wade, Manu Ginobili, Kobe Bryant, Chris Paul, Tim Duncan, Steve Nash, Dirk Nowitzki, and Baron Davis.) Included in the calculation were only seasons where the player played a minimum of 60 games.

Previously, I came up with an average for the non-Maverick franchises of 51.7, but it so happens that that involved double counting of non-Mav franchise seasons where there were two such stars per team (in particular, LBJ and Wade last year and Ginobili's career overlap with Duncan). Eliminating this double counting, there are 60 such player seasons.

The modified results then are that non-Mav franchises, with healthy(ish) top 10 players over the last ten years, have won an average of 50.2 games per year (compared to the Mavs' 56.7). And in Pythagorean terms (using a 2011 scoring average) this is equivalent to a team being 3.24 points per game better than one's opponents on average (assumed to be equally superior on offensive and defensive performance). This compares to the Pythagorean Mavs being 5.73 points per game better - a difference of 2.49 points per game (not the 1.95 estimate, reported previously).

So, reprising the argument, in terms of estimating the contribution of analytics to the Cuban Mavs, they have won 6.5 more games (equivalent to a 2.5 extra point margin on the score board) compared to other lucky-ducky franchises, those "gifted" with a top ten talent.

Accounting for this "6.5/2.5" superiority is the Mavs' analytical superiority, luck (I think on net good - at least in the sense that Dirk has enjoyed above-average health, compared to the other top 10 stars), and deep pockets - which have surely been necessary to help overcome the accumulated burden of being kept out of the lottery.

And finally, some context about the spending. Taking Patricia Bender's team salary data for the same player/franchise seasons over the last ten years, we may note the following:

10 Year Average of Median Team Payroll: $61.294 million
Average non-Mav, lucky-ducky Payroll: $64.574 million
Average Mav Payroll: $85.570 million

Two observations. First, if you luck out and get a great player, it hasn't cost much extra, on average, to join the 50 win club. To jump up to historically elite status, win-wise, is another story. Mark Cuban has paid, on average, $21 million per year for the extra 6.5 wins, or $3.23 million per win (which doesn't include luxury tax payments). This is a bit more than xkonk's suggested value of $1.5 million per win.
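The cost-per-win figure follows directly from the payroll averages listed above. A short sketch of the arithmetic, with variable names of my own choosing:

```python
# Payroll averages quoted above, in millions of dollars.
median_payroll = 61.294
lucky_ducky_payroll = 64.574
mavs_payroll = 85.570

extra_wins = 6.5  # Mavs' 56.7 wins vs. 50.2 for the comparison group

extra_spend = mavs_payroll - lucky_ducky_payroll
cost_per_extra_win = extra_spend / extra_wins
star_premium = lucky_ducky_payroll - median_payroll

print(round(extra_spend, 3), round(cost_per_extra_win, 2), round(star_premium, 2))
```

The third number is the average payroll premium of the star-led comparison teams over the median team, which is the "hasn't cost much extra" part of the first observation.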

Finally, bringing it back to where I began, speculating about the prospective value of investing more in data for analysis (and/or analysts themselves), if Mark Cuban (ex ante) spent his money wisely over the last decade, in terms of buying wins as inexpensively as possible, reflecting what we shall assume is the Mavs' "best practice" in analytics, it may be that there has never been a positive economic return to analytics.

If these dollar values are an appropriate rough guide, value via data collection/analysis would only appear to obtain by making players under contract better, and, again, I am not terribly optimistic about this potential, given expected decreasing returns to such activity and age-old, institutional constraints in adopting the potential opportunities that data collection/analysis uncovers.
Crow

Re: Will Success Come From Better Data?

Post by Crow »

The impact of additional spending should probably be net of the additional spending on Dirk above the average PF to fully isolate his impact.

The average cost/worth of all wins may be about $1.5 million. The cost of moving from the win total yielded by Dirk and his pay and an otherwise average team to the actual win total for Dallas may have been $2.5-$3.2 million per win (lower than $3.2 million to the extent that Dirk's salary is backed out of the extra spending factor). But that could be worth it several ways: wanting to do very well in the regular season and well or very well in the playoffs, and wanting the extra revenue that these accomplishments return, which I don't think are fully considered by the simple average worth of wins calculation.

"...a lot of money will then be spent chasing, on average, what is a fractional point per game, and one that will continue to decrease as time passes."

A fraction of a point per game can separate teams from making the conference finals or not or winning it all over the rival or not. The marginal total value of that last point could be very high. Up to at least $10 million? $20 million, $50 million or more in some cases where it gets you the title? Titles not only yield real and psychic season level benefits, they can raise franchise values a lot. Even repeat deep contention can (as the Cavs valuations with and without LeBron James show).

I have wondered if the talk of the limits on the frontier of what analytics can achieve (by GM and owner) is in part an attempt to get the competition to relax somewhat and not drive up the ante as much, and so reduce the risk of the early adopters being matched or passed. Yes, the point gain may be fairly small and going down with the greater diffusion of analytic effort (... and analytic knowledge... and then understanding of that knowledge by the coach & players... and then effective implementation of that knowledge & understanding), but as long as people value one step up in achievement over the one just below it a lot, and the final step up a heck of a lot more than the next to last one, the marginal return of the last point could justify lots of spending... on analytics, players, coaches and the optimal implementation from all those edge producing talents.

If an NBA title isn't worth at least $50 million additional (across a competitive window of say 3-5 years) to an owner (over and above just having an average team) he probably should do something else with his money. If it isn't worth that much to him, I doubt he'll get a title. If it is only worth $20 million or less, I really doubt he'll get one.

Is an NBA title worth an even greater effort, say another $25-50 million additional, including luxury tax, to an owner over a competitive window beyond the cost of having a good or very good team that has some realistic chance to win a title (but not as good as with the greater effort, pushing to the max for that last point or fractions of a point)? Answers vary here. Owners prefer to win with a lower overall level of expenditure and it happens (largely because of the importance of the drafting and retention of superstars and the ability to find productive role players to put around them) but levels of effort on all fronts matter and there are teams who fell short of their goal which could have been better with higher spending on something.

What is the marginal cost of an extra win above average from analytics, coaching, scouting, etc. compared to from players? I'd think this marginal cost is lower for these management contributions from the best teams than for players. I am not sure how big the management impacts are or could be, but as noted earlier they can contribute to player level APM (in a model where management is not present as a distinct contributor).
Last edited by Crow on Tue Oct 04, 2011 7:35 pm, edited 3 times in total.
schtevie

Re: Will Success Come From Better Data?

Post by schtevie »

Hoisted from another string...
J.E. wrote:I agree that more teams should play their top unit more often and I think it would make sense for almost everybody to play your best unit at least 5-6 minutes a game: the last minutes in the 4th quarter.....Sometimes though it seems a team's main goal doesn't seem to be "winning"....
This comment speaks to the potential for analyzed data to be brought productively to bear on a team's competitiveness (my interpretation, let it be very clear, and not Jeremias'). I have written before on the apparent mystery, where highly productive starting units are seemingly inexplicably yanked from the floor short of their potential contributions.

Consider (again) the Celtics late starting lineup, of Perkins, Rondo, Garnett, Allen, and Pierce. This was surely the NBA's best, over its three plus year run, and, unsurprisingly, the one allocated most minutes.

However, taking 2010 (the last full year of the regime), this lineup, with an Overall Rating of 13.53, still accounted for only 37.5% of potential possessions. (I have the line-up having been available for 65 games and assume that these were typical of the season, overall possession-wise). Can this be construed as a reasonable line-up allocation? To believe so strains credulity.

In 2010, the starting line-up is all that separated the Cs from strict mediocrity. All other line-ups (starting players included) had an Overall Rating of 0.04. What serious argument is there whereby you remove one or more of these starters, sacrificing 13.5 points per 100 possessions, and expect to benefit on net? Match-up advantages? Huh? Eliminating match-up disadvantages? Um, we're talking about the starting line-up, here. If opposing teams had potential match-up advantages, they would be expected to be seen in the existing starting line-up data. Fatigue? We're talking about greater platooning, not butting up against any player's individual minute limit. Foul considerations? Again, there is a whole lot of slack for dealing with such considerations (real or imagined) if you are starting at a base of 37.5% of possessions. Am I missing a possible explanation? (Maybe the trade-off wasn't so severe in the first half of the year when the line-up logged most of its possessions. But this would only lessen, not eliminate, the counterfactual gains.)

If not, the most straightforward interpretation of the data is that the Celtics left a lot of points on the bench, when they pulled one or more of their starters off the floor.
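To put a rough number on "points on the bench", here is a sketch that blends the two Overall Ratings quoted above by possession share. It holds both ratings fixed as the share changes, which is a strong simplifying assumption (fatigue, opponents, and game situation would all shift), and the 50% share is purely illustrative, not a figure from the data.

```python
def blended_rating(starter_share, starter_rating=13.53, other_rating=0.04):
    """Possession-weighted team rating, holding both lineup ratings fixed.

    The default ratings are the two 2010 Overall Ratings quoted in the post.
    """
    return starter_share * starter_rating + (1.0 - starter_share) * other_rating


actual = blended_rating(0.375)        # share reported in the post
hypothetical = blended_rating(0.50)   # illustrative share, not from the post
print(round(actual, 2), round(hypothetical, 2))
```

Under that naive linear assumption, every extra percentage point of possessions for the starting unit is worth about 0.13 rating points to the team.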

This said, I come not to bury Doc, but to praise him.....conditionally. He played his starters a lot more than most and therefore could be commended on relative grounds. The problem is that we are talking about butting up against NBA norms here. He was, perhaps, in a way, almost, maybe, pushing the limits of coaching decorum (well, not quite) but not the limits of competitive advantage. On the floor, cheerleaders gotta cheer, and coaches gotta coach. That's why you hold the pompoms and wear the fancy suits. And bench coaching, inevitably, implies making substitutions.

And the moral of the story, regarding the potential utility of greater amounts of data, should be clear. Here we are talking about the most basic, unmanipulated data, data that are free, that all coaches have always had the ability to understand, showing a clear result (or at least a very interesting and potentially hugely valuable result that should at least merit aggressive experimentation) and.....nothing. No action. Ever. By anybody. In NBA history. (Can someone correct me on this point?)

So it goes. Just as it went with the sloooooooow adoption of the 3 point shot. There the "data set" was also, essentially, free and conventional. All you needed to do was be able to multiply by 1.5 and realize that contested mid-range to long 2s were really, really terrible shots, and a really big competitive advantage was freely available for the first, aggressive actor. But that never came to be.
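The "multiply by 1.5" rule can be written out as a small sketch. The shooting percentages below are illustrative placeholders, not figures from this thread, and the calculation ignores fouls and offensive rebounds.

```python
def points_per_shot(make_pct, shot_value):
    """Expected points from one attempt, ignoring fouls and rebounds."""
    return make_pct * shot_value


def breakeven_two_pct(three_pct):
    """Two-point percentage needed to match a given three-point percentage."""
    return 1.5 * three_pct


three_pct = 0.35   # illustrative, not from the post
two_pct = 0.40     # illustrative long-two percentage, not from the post

print(points_per_shot(three_pct, 3), points_per_shot(two_pct, 2))
print(breakeven_two_pct(three_pct))
```

Any long two made at a lower rate than the break-even percentage is a worse shot in expectation than the three.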

I digress.

But now we are potentially talking about something else. Acquiring data that won't be nearly as intuitive, likely won't show similarly strong results, and believing that a credulous coaching staff (and players) will adopt whatever recommendations result.

We'll see. I really like J.E.'s formulation: Sometimes though it seems a team's main goal doesn't seem to be "winning".
Mike G

Re: Will Success Come From Better Data?

Post by Mike G »

schtevie wrote: Consider (again) the Celtics late starting lineup, of Perkins, Rondo, Garnett, Allen, and Pierce. This was surely the NBA's best, over its three plus year run, and, unsurprisingly, the one allocated most minutes.

However, taking 2010 (the last full year of the regime), this lineup, with an Overall Rating of 13.53, still accounted for only 37.5% of potential possessions....
Sorry if I'm dense, but you may assume others also aren't clear on this: What is Overall Rating? Link?
schtevie wrote: All other line-ups (starting players included) had an Overall Rating of 0.04. What serious argument is there whereby you remove one or more of these starters, sacrificing 13.5 points per 100 possessions, and expect to benefit on net?
Well, players have to rest. Would it be better to pull more than one or 2 starters, playing 3 to 5 subs? Such a lineup might get murdered beyond what the team could recoup.
schtevie wrote: Here we are talking about the most basic, unmanipulated data, data that are free, ... and.....nothing. No action. Ever. By anybody. In NBA history. (Can someone correct me on this point?)
Wow. I had assumed you were talking about play-by-play data and plus-minus stuff. And that's only been around a few years.
I'm obviously missing something; or your message is missing something.
Care to clarify?
Crow

Re: Will Success Come From Better Data?

Post by Crow »

Overall rating at basketball value is raw +/- per 48 minutes.

http://basketballvalue.com/teamunits.ph ... =2009-2010

Adjusted +/- has only been public for 8 years.
Overall rating or raw +/- per 48 minutes does not require specialized calculation as Adjusted +/- does but it has only been publicly available for 9 years, to my knowledge, at 82games. Whether any teams calculated raw +/- per 48 minutes for lineups before 2002-3 I don't know. I'd guess it was not common and probably rare if it occurred at all. So I'd tend to say raw +/- per 48 minutes for lineups or Overall rating has not been around forever (though it could have been).

But back to the meat of the topic.

I looked further at the 09-10 Celtics. 4 of the 5 best on APM and RAPM are pretty clear cut- Rondo, Allen, Pierce and Garnett. Center is a bit more difficult. R Wallace is better on 2010 single season RAPM and 2 year traditional APM at basketball value. Perkins is better on 1 year traditional APM and his lineup with all of the other best players (at +13 overall rating per 48 minutes) is far better than the same circumstance for the other 4 with Wallace (-6). One could wrestle with which to call the best center. Ordinarily I'd say Wallace based on 2 out of 3 on APM ratings used but given the huge gap in performance with the other 4 top guys and the large size of that lineup, I'll declare Perkins the best center for this study.

These 5 when all on the court together were +13.3 per 48 minutes, in 1154 minutes.

If you group all other lineups used over 25 minutes they played 1246 minutes at an average of +2.4 per 48 minutes.

Breaking it down, the lineups within this subset which had 4 of the top 5 on APM were much weaker, at an average of -1.3 per 48 minutes in about 720 minutes. Perkins seems like he might not have always been the best center but Wallace with the other 4 top guys was bad. The lineup with Perkins and Davis and 3 other top guys did terribly (-38 per 48 minutes in 47 minutes). Davis-Perkins may have been a bad pair. It was one of the Celtics' worst pairs in 08-09. I don't have access to the 09-10 data without more manual compilation.

The Celtics used 9 of, I think, 10 combinations of 4 of the five top guys, each over 25 minutes. (Small yay.) Only 19 lineups got 25+ minutes the whole season. 6 of these 9 were positive, 3 negative. Only 2 were great (over +10 per 48 minutes), with the other one besides the starting lineup being the other 4 top APM guys with M. Daniels instead of Pierce.

When the Celtics rolled with 3 of the 5 top guys (with Perkins' status questionable) they did great. +11.9 per 48 minutes in 239 minutes.

The only lineup used over 25 minutes with 2 of the top 5 was +2.1 in 40 minutes.

Surprisingly, the lineups used over 25 minutes with 1 or 0 of the top 5 were on average +14.0 in 161 minutes.

All lineups used less than 25 minutes averaged -1.9 per 48 minutes in 1508 minutes.

For the Celtics the performance of lineups with 5, 4, 3, 2 to 1 or zero of the top 5 guys did not decline in an expected / consistent fashion. A couple lineups (principally the bad lineups for Perkins and Wallace) threw the averages off in the group with 4 of the top guys. The strong performance of lineups with 3 or less of the top 5 and over 25 minutes of use wasn't a huge deal but it was not trivial either. They account for 21% of the total team +/- edge for the season.

4 of the 5 best on the court was not a simple and reliable formula for performance anything near what the 5 did together for the Celtics. This is probably an unusual case but goes to show that analysis may be a worthwhile endeavor. Emphasizing 5 vs maximizing the minutes with 4 are different paths with potentially significantly different results as was the case here. Take those 2 worst performing lineups with 4 top guys and Wallace or Davis out and the remaining 7 lineups with 4 top guys return to a better +3.8 per 48 minutes in 490 minutes but that is still somewhat disappointing compared to what the 5 together did.

In their 19 lineups used over 25 minutes, the Celtics averaged 4.0 of their top 5 guys per lineup. Those lineups used 87% of these players' minutes. So there wasn't much left for the dink lineups under 25 minutes, and the choice was mainly about how to use the top guys in the lineups used over 25 minutes: in combinations of 5 vs 4 vs 3 vs 2 or 1 or zero top guys on the floor.

This is just the start of the appropriate lineup analysis. You'd want to at least look at what the Adjusted +/- lineup data says compared to the raw +/- data and you'd want to look at both sets of data month to month as well to see if management and the Coach were getting better, staying the same or getting worse in average overall performance and see how the subgroup performance minutes per game and performance per 48 minutes changed month to month.

From where things stand in the analysis so far, the Celtics were right to emphasize playing the starting 5 a lot and probably should have done so even more. The raw +/- data suggests it would have been better to have more minutes with all 5 of the top guys out there together, even at the cost of less time with 4 out there, because with just 4 the return was not as good as might have been expected, and the return with 3 or fewer (in lineups used over 25 minutes) was quite encouraging. Of course it is not that simple: increasing minutes could change the raw +/- results if the opponent quality and the situation in the additional minutes shift. The dink lineups used under 25 minutes were a fairly significant source of drag, pulling team edge down by about 16%. They probably should have been reduced, or perhaps the guessing or analysis about which ones to use could have been better (or not, that is a tough assignment).

Moving to Adjusted +/- performance, the group with just 4 of the top 5 does worse: on average -4.6 per 48 minutes for all lineups and 0.7 when the 2 worst are removed. The group with 3 of the top 5 on the court isn't as strong on Adjusted +/-, just +6.2 instead of +11.9, but it was still very good. The group with 1 or 0 of the top 5 on the court isn't as strong either, just +4.7 instead of +14.0, but it was still good and much better than might be expected from simple reasoning. So by both raw and Adjusted +/- they probably should have emphasized all 5 together even more. I believe I made that point about the Celtics back in 2009 and / or 2010, but now I have a more detailed analysis to support that view. Others have made this point recently and in the past as well.
Last edited by Crow on Fri Sep 16, 2011 1:29 am, edited 5 times in total.
schtevie
Posts: 377
Joined: Thu Apr 14, 2011 11:24 pm

Re: Will Success Come From Better Data?

Post by schtevie »

Mike,

As Crow says, the Overall Rating is basketballvalue nomenclature. However, it is a per (100) possession measure, not per 48 minute.

Regarding resting being the explanation of lack of platooning, be careful. What exactly is the argument? Every one of the starters in the Celtics example played a lot more minutes than the starting line-up did. The issue is whether the starters' rest overlaps, or not.

Then, it is possible that the resulting line-ups featuring no starters whatsoever (which would necessarily be but a few minutes per game) would be "murdered", but don't forget the quid pro quo: the opponents' sub line-ups playing against the extended minutes of the full starting line-up, would also - on the same terms - get "murdered". The zero sum nature of these things must always be at the forefront of one's thinking.

Finally, my historical query was a bit imprecise. I know that formal lineup data (with +/-) doesn't go back deep into history. My question was more impressionistic, whether there was ever a team rumored to have given its starting five major running time. There would have to have been a clear superiority of the starting five over the rest and coaching innovation. Ring a bell of any team and time?
Crow
Posts: 10565
Joined: Thu Apr 14, 2011 11:10 pm

Re: Will Success Come From Better Data?

Post by Crow »

"As Crow says, the Overall Rating is basketballvalue nomenclature. However, it is a per (100) possession measure, not per 48 minute."

Good point. My numbers above are therefore a bit off in accuracy (reduce the overall ratings reported by about 8%) but the big picture story is the same.
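For reference, a minimal sketch of the conversion between the two scales. The pace value (possessions per 48 minutes) is an assumption picked to match the roughly 8% adjustment mentioned above, not a figure taken from basketballvalue, and the function names are illustrative.

```python
# Convert a net rating between per-100-possession and per-48-minute scales.
# ASSUMED_PACE is a hypothetical possessions-per-48-minutes figure chosen
# so that the per-48 number comes out about 8% below the per-100 number.
ASSUMED_PACE = 92.0


def per100_to_per48(rating_per_100, pace=ASSUMED_PACE):
    """Scale a per-100-possession rating to a per-48-minute rating."""
    return rating_per_100 * pace / 100.0


def per48_to_per100(rating_per_48, pace=ASSUMED_PACE):
    """Scale a per-48-minute rating to a per-100-possession rating."""
    return rating_per_48 * 100.0 / pace


if __name__ == "__main__":
    # Example: one of the group ratings quoted earlier in the thread.
    print(round(per100_to_per48(11.9), 1))
```

Because the adjustment is a single multiplicative factor, it shrinks every rating by the same proportion and leaves the ordering of lineups unchanged, which is why the big picture story survives.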
Crow
Posts: 10565
Joined: Thu Apr 14, 2011 11:10 pm

Re: Will Success Come From Better Data?

Post by Crow »

I don't think I will attempt a detailed multi-season analysis for the Celtics right now, but I will note that the Celtics starting lineup was a much stronger performer in 2007-8, at +19.5 per 100 possessions on raw team +/-, compared to +11.1 in 2008-9, +13.3 in 2009-10 and +13.5 in 2010-11. The traditional Adjusted +/- was also strongest in 2007-8. The minutes used were a bit higher in 09-10 but that wasn't enough to match the reduction in effectiveness. Did the league learn to adjust better to that lineup after its blaze to glory in its first season? Maybe, or maybe something changed in how that lineup operated in certain ways, or aging took its toll, or effort changed, or statistical noise is involved, or playoff match-ups weren't as favorable, or some combination. One could try to consult the video from the first season to the later ones, compare and try to find / refine answers.


The Lakers' starting lineup was best in 2008-9, in its first season and first title, and then fell back 4-5 points in the next years. Still good enough overall to win in 2009-10 but not in 2010-11. A check of the detail (aka analysis) could reveal significant insights just as the look at the 2007-8 Celtics did.

The Spurs changed their top lineup several times for the recent titles. The 2004-5 title team's top lineup was the same as first used in 2003-4 but it performed much better in . It slipped in 2005-6 though, so then it was changed. The 2006-7 most used lineup with Oberto instead of Nesterovic did about the same as the most used 2005-6 lineup but something else changed.

I'd like to track multi-season effectiveness of other contender most used lineups over several seasons of rise and fall, time and data permitting.

The Cavs' most used lineup went from moderately negative on raw +/- in 2005-6 to barely above average in 2006-7 to a nice +8.4 per 100 possessions in 2007-8 but on very low usage (just 215 minutes). 2008-9 had a great most used lineup (+19.5) but only used under 500 minutes. Thru those years the trend was in the right direction but in 2009-10 the most used lineup went way backwards to terrible at -5.9 in thankfully modest minutes. The 2nd and 3rd best lineups were very good (one with Jamison, one without) but they probably should have gotten more minutes than the most used lineup. If Brown and the Cavs had gotten the right top lineup at more highly elevated minutes that might have made a difference in those playoffs and the following "Decision". What will happen with the Lakers, overall and with regard to top lineups, with Mike Brown in charge and Brown operating without the input of the Cavs' analytic support but with his Coaching consultant from overseas and whatever analytic support the Lakers utilize (or not) next season?


The very top lineup made the most difference to Boston's 2008 success. For other years the most used lineups got less minutes and the top 3-5 lineups may be more important than the very top lineup by itself. That was the case with Dallas this season. Miami had the big advantage on official biggest regular season lineup this season but it is a closer race between Dallas and Miami for the lineups that should be the most used in the future. Chicago and OKC most used lineups were significantly behind Miami's most used. They have other candidates that would be competitive with Dallas' and Miami's best but will they increase use of them a lot and / or decrease the current most used lineup? The race for the next title may be affected somewhat by which team really emphasizes their best 1, 3 or 5 lineups.
Mike G
Posts: 6154
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Will Success Come From Better Data?

Post by Mike G »

schtevie wrote: Regarding resting being the explanation of lack of platooning... Every one of the starters in the Celtics example played a lot more minutes than the starting line-up did. The issue is whether the starters' rest overlaps, or not.

Then, it is possible that the resulting line-ups featuring no starters whatsoever (which would necessarily be but a few minutes per game) would be "murdered", but don't forget the quid pro quo: the opponents' sub line-ups playing against the extended minutes of the full starting line-up, would also - on the same terms - get "murdered"....
Well, I'm seeing this word platooning and thinking you may have assigned to it a definition unlike others I can find. You mean it should be an advantage for a team to substitute en masse, resting their entire starting 5, while 5 subs hold the fort?

What does Doc do, then, when Perkins has picked up his 2nd foul just 5 minutes in? Go until he fouls out, or yank the whole unit?

This is a simplistic example, but it's quite plausible that every coach, all through the ages, has had the ideal of using his starters as much as possible, until it becomes clear that virtually every night there will arise a problem of this nature.
A defensive substitution must be made. A player twists his ankle. A player just isn't sharp.

Flexibility may be a coach's best friend. A team ideally has some depth, and generally the 6th man is considerably better than the 10th man. How does he get more minutes?

The norm for substitution is of course for the starters to take turns catching their breath, while other starters are leading as many as 4 subs. Five subs is considered tight-rope walking; it's done so little that sample sizes may not tell us much.

If APM, RAPM, multi-year variants, and etc have recently reached a great level of agreement with one another, that would be news to me. Last time I checked, there were so many wild disagreements -- both with one another and with what I think I can see -- that it still seems very much a work in progress.

So, far from being hard knowledge available since the beginning of time, I'd say it may represent knowledge to someone in the future, and that's being optimistic.
schtevie wrote: ...whether there was ever a team rumored to have given its starting five major running time. There would have to have been a clear superiority of the starting five over the rest and coaching innovation. Ring a bell of any team and time?
In the '88 playoffs, the Celts used their starters 37-45 mpg each -- No one off the bench got more than 11 -- for 17 games.
And there were quite a few lopsided games in the early rounds (NY, Atl). Likely the big 5 were in all but a few mpg vs Det.
But as I recall, this was necessity and not innovation. The wearing down of their aging core was what kept them from repeating what they'd done when they'd had more of a bench.
Mike G
Posts: 6154
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Will Success Come From Better Data?

Post by Mike G »

Speaking of playoffs, I see this year the Celts' starters (w Jermaine) led all playoff units in playing 38% of their minutes (167/437). In fact, Jermaine played 197 minutes total, so 85% of his minutes were alongside all 4 of the other starters.
http://basketballvalue.com/teamunits.ph ... 20playoffs

In the regular season, this unit had 127 minutes, at +7 AdjPM. This +7 was continued in the playoffs.

Season best unit -- sub Davis for Jermaine -- was 2nd most used in playoffs, at 41 minutes (9%). They did not replicate their season +16 APM, falling to +8 in playoffs.

No other unit went even 20 minutes, and they all (54% of total) sucked big-time. Except for Green in the 5th slot, but for only 8 min (2%).
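The percentages quoted for the two main units are just minute shares, and a minimal sketch reproduces them. The minute totals are the ones quoted above; the variable and function names are illustrative.

```python
# Playoff minutes quoted in the post. Variable names are illustrative.
TEAM_MINUTES = 437      # total Celtics playoff minutes
STARTING_UNIT = 167     # starters with Jermaine
JERMAINE_TOTAL = 197    # all of Jermaine's playoff minutes
DAVIS_UNIT = 41         # same unit with Davis in for Jermaine


def share(part, whole):
    """Return part/whole as a percentage rounded to a whole number."""
    return round(100.0 * part / whole)


if __name__ == "__main__":
    print("starting unit share of team minutes:", share(STARTING_UNIT, TEAM_MINUTES))
    print("share of Jermaine's minutes with the other 4 starters:",
          share(STARTING_UNIT, JERMAINE_TOTAL))
    print("Davis unit share of team minutes:", share(DAVIS_UNIT, TEAM_MINUTES))
```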
schtevie
Posts: 377
Joined: Thu Apr 14, 2011 11:24 pm

Re: Will Success Come From Better Data?

Post by schtevie »

Mike G wrote:Well, I'm seeing this word platooning and thinking you may have assigned to it a definition unlike others I can find. You mean it should be an advantage for a team to substitute en masse, resting their entire starting 5, while 5 subs hold the fort?
I think I'm using the term platooning more or less correctly. What I am saying is that there is suggestive evidence that starters as a unit are more productive than when broken into pieces and parts. Hence they should play more as a unit. As there are only so many minutes in a game, and if you assume fixed total playing time for each individual starter, this implies more minutes for line-ups with no starters.
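To make the arithmetic of that constraint concrete, a minimal sketch follows. It assumes a 48-minute game and the same fixed minutes for each of the five starters; the 36-minute figure in the example is hypothetical and does not come from the Celtics data.

```python
GAME_MINUTES = 48.0


def avg_starters_outside_unit(starter_minutes, unit_minutes):
    """Average number of starters on the floor when the full unit is not.

    starter_minutes: minutes each of the 5 starters plays (held fixed).
    unit_minutes: minutes all 5 starters share the floor.
    """
    if not 0 <= unit_minutes <= starter_minutes <= GAME_MINUTES:
        raise ValueError("need 0 <= unit_minutes <= starter_minutes <= 48")
    # Starter-minutes still to be played once the unit's time is counted.
    remaining_starter_minutes = 5 * (starter_minutes - unit_minutes)
    # Game minutes not covered by the full unit.
    remaining_game_minutes = GAME_MINUTES - unit_minutes
    if remaining_game_minutes == 0:
        return 0.0
    return remaining_starter_minutes / remaining_game_minutes


if __name__ == "__main__":
    # Hypothetical: each starter plays 36 minutes, unit time varies.
    for unit in (20, 30, 36):
        print(unit, round(avg_starters_outside_unit(36, unit), 2))
```

As unit minutes rise toward each starter's total, the remaining minutes are covered by fewer and fewer starters, and at the limit (all 36 minutes together in this example) the other 12 minutes have none.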
Mike G wrote:What does Doc do, then, when Perkins has picked up his 2nd foul just 5 minutes in? Go until he fouls out, or yank the whole unit?
According to the argument, assuming that fouls aren't positively correlated (and that fouls and fouler effort aren't negatively correlated - on account of fear of getting yanked) there would be no reason to pull your best lineup. Because, well, you would be playing inefficiently. By definition. If it so happens that fouls are positively correlated AND the expected resulting reduced minutes of the starting lineup would be worse than the alternative, you would then yank the player in question and do what's best for the team conditional upon that information.
Mike G wrote:This is a simplistic example, but it's quite plausible that every coach, all through the ages, has had the ideal of using his starters as much as possible, until it becomes clear that virtually every night there will arise a problem of this nature.
You list exceptions that might merit the observed level of tinkering. This however begs the question. I think the observed "rule" of line-up tinkering has little to do with these. My guess is that it is overwhelmingly the dead hand of tradition.

Just throwing an idea out there, why don't hockey coaches mess around with line-ups to the same degree (or do they)? To my understanding, in that sport, the coach knows who his "starters" are. They are the best players that play best together (and so on for the second line, the third line, and so on). He then maximizes their playing time, with the constraint being physical exhaustion.
Mike G wrote:A defensive substitution must be made. A player twists his ankle. A player just isn't sharp.
OK. I'm not talking about injuries here. But your first suggestion puts us right down the rabbit hole.

What does "a defensive substitution must be made" mean? Let's start from the start, with starters playing starters. An opposing coach then subs in a player that allegedly creates a "mismatch" to his team's offensive advantage, an advantage that curiously wasn't perceived when the game began. The question then is: what is a coach to do in response? My point is that the answer is likely to be to do nothing at all. Stay seated. Resemble a potted plant. If the coach has done his job, the best players who play best together (on both ends of the court) have been identified and are already on the court. An "offensive mismatch" for one team, by definition, has as its counterpart a "defensive mismatch" for the other team, and the baseline for all this is overall ability.
Mike G wrote:Flexibility may be a coach's best friend. A team ideally has some depth, and generally the 6th man is considerably better than the 10th man. How does he get more minutes?
I disagree. Flexibility is the coach's worst enemy. It creates the insatiable urge to tinker.
Mike G wrote:The norm for substitution is of course for the starters to take turns catching their breath, while other starters are leading as many as 4 subs. Five subs is considered tight-rope walking; it's done so little that sample sizes may not tell us much.
I recognize the norm, but I am suggesting that it likely isn't quite optimal. There are, what, six timeouts in a game? Then three quarter breaks. And multiple stoppages. Basketball isn't a non-stop, endurance sport. There is a lot of room to maneuver here. Rest isn't the issue.
Mike G wrote:If APM, RAPM, multi-year variants, and etc have recently reached a great level of agreement with one another, that would be news to me. Last time I checked, there were so many wild disagreements -- both with one another and with what I think I can see -- that it still seems very much a work in progress.
The point I was trying to make is that this is an example where there is compelling, apparently actionable evidence, not requiring any fancy, pin-headed, mathematical interlocution. The unvarnished C's late starting lineup advantage was 19 points the first year, 11 the next, then 13.5 and the same again for the pre-trade partial year. Championship stuff.

More generally, it shouldn't be (and typically isn't) terribly controversial what the best lineup(s) is (are) on any given team (plural in the sense that, for some teams, a few might be correctly perceived as of approximately equal strength). And these lineups are underplayed as units.