Will Success Come From Better Data?
Posted: Fri Sep 02, 2011 8:26 pm
Apologies for starting a new thread on Daryl's recently cited 800 words in the HBR (http://blogs.hbr.org/cs/2011/08/success ... _data.html). I do so because what struck me as interesting was not the part of the argument others have focused on (that unindividuated analysts in perfectly elastic supply bring no particular competitive value, which is true by definition). My quarrel is with the premise itself: that better data augurs success.
Why should we believe this to be true? First in an absolute sense, and second in an economic sense: is it reasonable to believe that the potential gains will be worth the costs of gathering and using such information?
To begin, let's ponder the contours of an apparent I-45 consensus. Both the Mavs and the Rockets, two clubs at the forefront of the analytic revolution in basketball, agree that acquiring new data is critical for future competitive advantage. Past this point, opinions appear to diverge as to where the green pastures are likely to be found. In his article, Daryl points to potential gains from systematically tracking on-court activity (defensive challenges being the unconventional stat he cites as an example). By contrast, Mark Cuban has moved on - way on - apparently believing that little to no advantage can be gained by acquiring such data. Listen to him speak at the Sloan Conference and he states very clearly that he hopes the NBA will socialize on-court information-gathering. And this from the guy who owns Synergy and has a comparative advantage in data acquisition! For him, success will come from other types of data. Psychological? Biometric? Data on high schoolers and foreign players? Who knows?
The point I am raising is whether it is reasonable to believe there is serious upside in investing much more money in any of this. To calibrate expectations, consider the following question: will future competitive gains from analytics be bigger or smaller than those already realized? I like to think that reasonable (wo)men would agree that "smaller" is clearly the correct answer, for two basic reasons.
First, the primary competitive edge realized to date has been the advantage early adopters have taken of "traditional" franchises. And this edge has been large. Evidence? Thank you, Kevin McHale, for either not availing yourself of, or dismissing, the information provided by defensive APM! More generally, every year Mark Cuban appears at MIT and has a chuckle, expressing his glee at playing teams that clearly have no idea how to maximize performance using (public) lineup data. "Getting excited" is the term I recall him using last year, or the year before. And it's not only Mark: other franchise reps can be seen nodding along on stage (and, I presume, off it). The point is that this low-hanging analytical fruit will inevitably be picked by the laggards, at which point all the infra-marginal spending on analytics loses its return. This is a zero-sum game and league, after all.
The second reason to believe that the future will not be as glorious as the past is the simple, generic observation that one should expect diminishing marginal returns on further investment. If there is no holy grail in analyzing box score stats, do we think one will be found by testing players' psychological fitness, or anything else? At the end of the day, the realities of the box score and the rules of the game impose strict limits on potential improvement. That isn't going to change; it cannot change. Perhaps I am suffering from a failure of imagination on this account.
So far, so bad. But maybe past gains from using analytics have been really big, such that something smaller than that in future might still be kinda important? Indeed.
The question, then, is: what success has analytics brought the early adopters? To provide a tentative upper-bound estimate, I would argue that one should look to the Mavs: the one franchise that has been at it the longest (right?), that has integrated the approach throughout its operations, and that hasn't stinted on spending in support of the effort.
To begin, nothing but the highest praise for Mark Cuban. Joining the Celtics and the Lakers as the only franchises to win at least 50 games in ten (and now more) consecutive seasons is a remarkable achievement. Over the last ten years, the Mavs have averaged 56.7 wins. Using the Pythagorean formula, this is equivalent to outscoring their opponents by 5.73 points per game on average. Pretty darn good. The question now is how much of that success is due to analytics versus other factors.
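The post does not say which version of the Pythagorean formula produced 5.73. As a minimal sketch, assume the common basketball form win% = PF^k / (PF^k + PA^k) with k = 14 (an assumption; published exponents vary, roughly 14 to 16.5), and assume points scored and allowed are centred on 98 per game (a hypothetical figure, not from the post). Inverting that formula turns a win total into a per-game point differential. Under these assumptions the result lands close to, but not exactly on, 5.73, so the post presumably used somewhat different parameters.

```python
def margin_from_wins(wins: float, games: int = 82, exponent: float = 14.0,
                     avg_points: float = 98.0) -> float:
    """Invert the Pythagorean expectation to a per-game point differential.

    Assumes win% = PF**k / (PF**k + PA**k) and that PF and PA are centred
    on avg_points (PF + PA = 2 * avg_points). Both `exponent` and
    `avg_points` are illustrative assumptions, not values from the post.
    """
    p = wins / games
    if not 0.0 < p < 1.0:
        raise ValueError("win fraction must be strictly between 0 and 1")
    # Ratio PA/PF implied by the win fraction: (PA/PF)**k = 1/p - 1
    ratio = (1.0 / p - 1.0) ** (1.0 / exponent)
    points_for = 2.0 * avg_points / (1.0 + ratio)
    points_against = 2.0 * avg_points - points_for
    return points_for - points_against


def wins_from_margin(margin: float, games: int = 82, exponent: float = 14.0,
                     avg_points: float = 98.0) -> float:
    """Forward Pythagorean formula: expected wins from a point differential."""
    pf = avg_points + margin / 2.0
    pa = avg_points - margin / 2.0
    return games * pf**exponent / (pf**exponent + pa**exponent)


if __name__ == "__main__":
    m = margin_from_wins(56.7)
    print(f"56.7 wins -> {m:.2f} points per game")
    print(f"round trip -> {wins_from_margin(m):.1f} wins")
```

The round trip through wins_from_margin is a quick sanity check that the inversion agrees with the forward formula; changing `exponent` or `avg_points` moves the 56.7-win margin by a few hundredths of a point.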
Well, factor number one that needs to be taken into account is inheriting Dirk Nowitzki. This is fairly simple to do. Taking the 10-year RAPM (http://stats-for-the-nba.appspot.com/ranking), assuming 91 possessions per game on average (about right), and factoring in Dirk's actual minutes per game, we can estimate that Dirk was responsible for 3.93 of the 5.73 points. Then, using Eli W's APM numbers (http://www.countthebasket.com/blog/2008 ... lus-minus/) to debit the contribution of an average PF playing similar minutes, we find that the Mavs, net of Dirk's contribution above an average PF, were about 2 points (2.01) better than their opponents, on average, over the last decade.
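The post gives the ingredients of this calculation but not Dirk's RAPM value or minutes, so here is a minimal sketch of the bookkeeping only. It assumes a per-100-possession rating converts to points per game by scaling by pace/100 and by the player's share of the 48-minute game. The rating and minutes in the usage lines are hypothetical placeholders. The only check against the post is the final subtraction: 5.73, 3.93, and 2.01 together imply an average-PF contribution of about 0.21 points per game, a figure derived here from the post's numbers rather than stated in it.

```python
def per_game_impact(rating_per_100: float, minutes_per_game: float,
                    possessions_per_game: float = 91.0) -> float:
    """Convert a per-100-possession plus-minus rating into points per game.

    Assumes the player's on-court share of possessions equals his share of
    the 48-minute game (minutes_per_game / 48).
    """
    return rating_per_100 * (possessions_per_game / 100.0) * (minutes_per_game / 48.0)


def residual_margin(team_margin: float, star_impact: float,
                    replacement_impact: float) -> float:
    """Team margin left over after removing the star's edge over a replacement."""
    return team_margin - (star_impact - replacement_impact)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; the post does not report them.
    star = per_game_impact(rating_per_100=6.0, minutes_per_game=36.0)
    print(f"hypothetical star impact: {star:.2f} points per game")

    # The post's own figures: 5.73 total, 3.93 from Dirk, 2.01 left over.
    # 0.21 is the average-PF contribution those three numbers imply.
    print(f"residual: {residual_margin(5.73, 3.93, 0.21):.2f}")
```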
So a 2-point superiority remains to be explained by contributions from analytics and all other factors. What are the latter?
Well, clearly, an important negative factor (and correspondingly a boost to the implied contribution of analytics) is the drag that continued success puts on replenishing the talent pool through the draft. I don't really have a strong sense of what this amounts to; 1 point per year, perhaps? Estimates, anyone?
Against this is the fact that Mark Cuban hasn't been afraid to spend, spend, spend (the Mavericks' payroll has never left the top five over the past decade, I believe). Now, perhaps, informed by superior analysis, he has spent his player-salary dollars more wisely than others. But that aside, part of the 2-point competitive edge is clearly explained by the sheer volume of salary spending. And, again, if someone has a good estimate of the competitive returns to salary expenditure, please share.
Pending such context, let's stipulate that the drag on success from drafting low is perfectly offset by "excess" spending, leaving a decade-long return to analytics of 2 points per game. Formidable still, no?
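The stipulation above amounts to a simple ledger: the analytics residual equals the net margin, plus the draft drag (added back), minus the payroll effect. A minimal sketch follows; both adjustments are unknowns in the post, so the 1.0 values below only mirror the tentative "1 point" guess and the stipulation that the two cancel. To put 2 points in win terms, the sketch also runs the margin through a Pythagorean formula, assuming an exponent of 14 and points centred on 98 per game (both assumptions, not figures from the post).

```python
def analytics_residual(net_margin: float, draft_drag: float,
                       payroll_effect: float) -> float:
    """Margin left for analytics after adjusting for draft position and payroll.

    draft_drag: points per game lost by always drafting late (added back).
    payroll_effect: points per game bought by above-average spending (removed).
    Both are unknowns in the post; values passed here are assumptions.
    """
    return net_margin + draft_drag - payroll_effect


def wins_from_margin(margin: float, games: int = 82, exponent: float = 14.0,
                     avg_points: float = 98.0) -> float:
    """Pythagorean expected wins; exponent and avg_points are assumptions."""
    pf = avg_points + margin / 2.0
    pa = avg_points - margin / 2.0
    return games * pf**exponent / (pf**exponent + pa**exponent)


if __name__ == "__main__":
    # The post's stipulation: the draft drag exactly offsets the payroll effect.
    edge = analytics_residual(2.01, draft_drag=1.0, payroll_effect=1.0)
    print(f"analytics edge: {edge:.2f} points per game")
    print(f"~{wins_from_margin(edge):.1f} wins vs. "
          f"{wins_from_margin(0.0):.0f} for an average team")
```

Under these assumptions a 2-point edge is worth roughly six wins over an 82-game season; a different exponent or scoring level shifts that figure somewhat.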
Well, yes, as a historical matter, but the issue again is whether it can be sustained. My hunch is that over the past ten years (this year's Finals included) Mark Cuban got excited a lot, watching opposing teams put out inappropriate lineups. And my guess is that this factor alone accounts for a big chunk of the 2 points that need to be explained. On average over the past ten years, there were probably only a handful of teams that weren't susceptible to such errors, and one doesn't tend to get excited about a lineup mismatch worth only a fraction of a point in expectation.
So, going forward, my sense is that the upper bound on returns from investing in data (and analysts) is really quite small. The remaining "traditionalists" will, before too long, get in the game, if only so that Mark Cuban will stop laughing at them. And when that happens, a lot of money will be spent chasing, on average, a fraction of a point per game, an edge that will keep shrinking as time passes.
Success?