p-hacking
Re: p-hacking
Finding p less than .05 one year, then dismissing it and quitting after checking just one previous year without giving the value, is not rigorous enough. It is almost as bad as stopping at one data point.
Re: p-hacking
Data mining like that seems like a pretty good way to produce hypotheses. Discoveries can happen 'by accident' too.
Re: p-hacking
The article is right in that doing what he described is poor statistical practice. It basically invalidates the p value, so if that's your basis for deciding what's "real" or important, you're in bad shape. But Nate and Crow are right that it could be the start of a more rigorous analysis that could lead to something interesting. Perhaps the more important thing to take away is that if you start by p-hacking, you don't necessarily have a great chance at finding the pattern again in out-of-sample testing.
Re: p-hacking
I'd make an even stronger statement: the researcher will almost always find much less statistical significance in out-of-sample testing, and indeed there's good evidence that the follow-up research will often fail to find statistical significance.
xkonk wrote: Perhaps the more important thing to take away is that if you start by p-hacking, you don't necessarily have a great chance at finding the pattern again in out-of-sample testing.
Theoretical example: if you hunt for a coin that will give you five heads in a row (about a 3% probability), you will eventually find one, without too much work or trouble. But I guarantee that when you take that magic coin and flip it five more times, your chance of getting five heads will be very small (with fair coins, it's the same approximately 3% as your original significance level). Your research result will not be reproducible.
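To put rough numbers on that, here is a quick simulation sketch (Python, with every coin fair by assumption): the hunt finds a 'magic' coin quickly, but the magic does not replicate.

import random

random.seed(42)

def five_heads():
    """Flip a fair coin five times; True if all five come up heads."""
    return all(random.random() < 0.5 for _ in range(5))

# Hunt for a "magic" coin: try coins until one gives five heads in a row (p ~ 1/32).
coins_tried = 1
while not five_heads():
    coins_tried += 1
print(f"Found a 'magic' coin after trying {coins_tried} coins")

# Replication: give that magic coin five more flips, many times over.
trials = 100_000
successes = sum(five_heads() for _ in range(trials))
print(f"Chance the magic coin repeats five heads: {successes / trials:.3f}")  # ~0.031, same as any fair coin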
The evidence: 60% of recent psychology research results were not reproducible, according to this famous study from a couple of years ago. This is not because psychologists are especially incompetent or prone to take shortcuts in their research -- in fact it's a sign of their ability to self-criticize that they're the only field that I know of that made a major study like this. The same problem of p-hacking or data fishing occurs in all social sciences (and can occur in the natural sciences too). It's been an open dirty secret for decades, but the authors of this study were the first ones to systematically measure how badly this causes research results to be unreliable.
Re: p-hacking
Sure. Assuming the coin is fair is begging the question though....
mtamada wrote: Theoretical example: if you hunt for a coin that will give you five heads in a row (about a 3% probability), you will eventually find one, without too much work or trouble. But I guarantee that when you take that magic coin and flip it five more times, your chance of getting five heads will be very small (with fair coins, it's the same approximately 3% as your original significance level). Your research result will not be reproducible.
...
The point that p-hacking leads to misleading results is valid. More generally, it's also true that statistics is abused and misrepresented in many other ways. P-hacking is the bête noire du jour, but nobody seems to care about the conflation of 'margin of error' with 'confidence interval', and then there's the whole 'explanatory stats' and trivia thing.
Hmm... do you have a p-value for that?
mtamada wrote: I'd make an even stronger statement: the researcher will almost always find much less statistical significance in out-of-sample testing, and indeed there's good evidence that the follow-up research will often fail to find statistical significance.
The thing is, people should be leery of statistical results in general. It's not like p-hacking has to be an individual act: If we have a bunch of researchers independently checking similar hypotheses, then we expect 1 in 20 of them to get a p value of 0.05 or less by accident, right?
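A quick sketch of that in Python (numpy/scipy; every 'study' below is pure noise by construction, so the null is true for all of them):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_researchers = 2000   # many independent teams testing similar (null) hypotheses
n_obs = 82             # hypothetical sample size per study, e.g. one season of games

false_positives = 0
for _ in range(n_researchers):
    x = rng.normal(size=n_obs)      # e.g. some play-type frequency (noise)
    y = rng.normal(size=n_obs)      # e.g. team performance (independent noise)
    r, p = stats.pearsonr(x, y)     # the null (true correlation = 0) holds by construction
    false_positives += p < 0.05

print(f"Share of researchers with p < 0.05: {false_positives / n_researchers:.3f}")  # ~0.05, about 1 in 20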
Re: p-hacking
This is certainly true, but I tried to be careful to say 'pattern' instead of 'statistical significance'. Number one, depending on how much data diving/p-hacking you did, it could be just as likely that you find the opposite result in a new sample, let alone one that is statistically significant in the same direction. Number two, we could have an entire separate discussion about whether something as applied as the NBA (or anyone, for that matter) should care about statistical significance as opposed to practical significance.
mtamada wrote: I'd make an even stronger statement: the researcher will almost always find much less statistical significance in out-of-sample testing, and indeed there's good evidence that the follow-up research will often fail to find statistical significance.
xkonk wrote: Perhaps the more important thing to take away is that if you start by p-hacking, you don't necessarily have a great chance at finding the pattern again in out-of-sample testing.
Under some circumstances I could envision this being true, but certainly not if the researchers were using different data sets or if the effect was clearly true/significant. Here's an example that people might find interesting: http://andrewgelman.com/2015/01/27/crow ... d-players/
Nate wrote: If we have a bunch of researchers independently checking similar hypotheses, then we expect 1 in 20 of them to get a p value of 0.05 or less by accident, right?
Re: p-hacking
Do you know what the p-value means?
xkonk wrote: ... Under some circumstances I could envision this being true, but certainly not if the researchers were using different data sets or if the effect was clearly true/significant. ...
Nate wrote: If we have a bunch of researchers independently checking similar hypotheses, then we expect 1 in 20 of them to get a p value of 0.05 or less by accident, right?
Re: p-hacking
This is why, when considering a lot of variables and possible interaction variables, applying AIC or cross-validation is far more valuable for discerning actual value.
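As a minimal illustration (Python with scikit-learn; the predictors here are random noise by construction, purely for the sake of example), cross-validation flags an over-fit relationship that in-sample fit makes look real:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

n_teams, n_predictors = 30, 11                  # e.g. 30 teams, 11 play-type shares (illustrative)
X = rng.normal(size=(n_teams, n_predictors))    # candidate variables: pure noise
y = rng.normal(size=n_teams)                    # team rating: unrelated noise

model = LinearRegression().fit(X, y)
in_sample_r2 = model.score(X, y)                                 # flattering: it fits the noise
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5).mean()   # honest: near zero or negative

print(f"In-sample R^2:       {in_sample_r2:.2f}")
print(f"Cross-validated R^2: {cv_r2:.2f}")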
Re: p-hacking
Yeah, I'm pretty familiar. It's P(data at least as extreme as what was observed | the null hypothesis is true). In the example of the original article in the thread, the null for each of his correlation tests would be that the correlation is 0. But if there's an actual effect, then one would hope that more than 1 in 20 researchers would find p < .05. Even if the null were true, the particulars of any data set and how the researchers decide to test a hypothesis could affect whether the p value reflects what it's supposed to.
Nate wrote: Do you know what the p-value means?
xkonk wrote: ... Under some circumstances I could envision this being true, but certainly not if the researchers were using different data sets or if the effect was clearly true/significant. ...
Nate wrote: If we have a bunch of researchers independently checking similar hypotheses, then we expect 1 in 20 of them to get a p value of 0.05 or less by accident, right?
Did I pass?
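To make that definition concrete, here is a small sketch (Python; the team data are simulated, purely for illustration) that computes a correlation p-value by brute force as P(a correlation at least this extreme | the true correlation is 0) and compares it with the analytic value:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# One observed sample of 30 "teams": a play-type share vs. a team rating (both simulated).
x = rng.normal(size=30)
y = rng.normal(size=30)
r_obs, p_analytic = stats.pearsonr(x, y)

# The p-value, spelled out: how often does data generated under the null (true r = 0)
# produce a correlation at least as extreme as the one observed?
null_rs = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(20_000)])
p_by_simulation = np.mean(np.abs(null_rs) >= abs(r_obs))

print(f"observed r = {r_obs:.3f}")
print(f"analytic p = {p_analytic:.3f}, simulated p = {p_by_simulation:.3f}")  # the two should agree closely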
Re: p-hacking
Sure, you pass.
xkonk wrote: ... Yeah, I'm pretty familiar. It's P(data at least as extreme as what was observed | the null hypothesis is true). In the example of the original article in the thread, the null for each of his correlation tests would be that the correlation is 0. But if there's an actual effect, then one would hope that more than 1 in 20 researchers would find p < .05. Even if the null were true, the particulars of any data set and how the researchers decide to test a hypothesis could affect whether the p value reflects what it's supposed to.
Did I pass?
So how would having independent data sets reduce the chance of an accidental p<0.05 result on an individual trial?
Re: p-hacking
Hi everyone!
I'm super flattered by all of you taking an interest in my piece - thanks! Your posts and thoughts have given me quite a bit to think about. I'm still learning and while I don't pretend to have done a perfect job (either in the analysis or the explanation), I'd like to explain a little further about my thought processes that I didn't get into in the original piece (I feel like I'm among kindred analytical spirits here as opposed to a general basketball audience).
To Nate's first point, yes, discoveries can certainly happen by accident - but, in this context, I think the best (or at least a better) practice would be to re-evaluate out of sample to verify. (I kind of shot myself in the foot in this regard because I used all the available Synergy data in the first pass.)
To Crow's (and xkonk's) point about this being the starting point for a more rigorous analysis, I totally agree in the general case - out-of-sample testing is where I would start! However, in this case, the purpose of this analysis was really to say, "This is an example of a sort of statistical analysis that, in its undeveloped and flawed form, a general layperson might believe -- let's try to guard against that a little." I suppose I could've included in the piece potential next steps to make this into a meaningful analysis.
Nate, re: your second post, are you saying there's applicability of "conflation of 'margin of error' with 'confidence interval', and then there's the whole 'explanatory stats' and trivia thing" to this particular analysis? I'm curious to hear your thoughts! Also, you're pretty right that p-hacking is a very common (approaching cliched) topic among stats-inclined folks, but I don't think it's quite as well-known in the general public, maybe even less so among basketball fans. I think/hope that introducing/re-introducing this idea to a more general audience, even in this limited scope and fairly simplified form, is worthwhile.
To xkonk's last point, this is exactly what I was thinking - my prior belief is that no single play type correlates with team quality. To Nate's response, by "independent data sets" in this context, do you mean just separate partitions of the larger data set? Otherwise, I don't think you can have truly independent data sets that describe these offensive play type distributions, though I might be missing something!
Again, thanks to all for your interest in my piece! I'm still in the nascent stages of doing sports analytics work, so I'm certainly open to any additional suggestions, comments, criticisms, etc.
Thanks,
Ryan
Re: p-hacking
Good response post. Thanks for dropping by. Will watch / listen for more.
Re: p-hacking
I don't think it would on an individual test per se, but if your results differed across sets, or you used an independent set for out-of-sample testing and noticed a big drop in accuracy/fit, you would realize that your results are probably not so significant.
Nate wrote: So how would having independent data sets reduce the chance of an accidental p<0.05 result on an individual trial?
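Here is a sketch of that kind of check (Python; the two 'seasons' are independent simulated noise, an assumption for illustration): data-mine one sample for the best-looking play type, then see whether it holds up in the other.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_teams, n_play_types = 30, 11          # illustrative sizes

def season():
    """One simulated season: play-type shares and team ratings, all pure noise."""
    return rng.normal(size=(n_teams, n_play_types)), rng.normal(size=n_teams)

X_a, y_a = season()    # "discovery" sample
X_b, y_b = season()    # independent out-of-sample check

# Data-mine season A: keep whichever play type correlates best with team rating.
best = max(range(n_play_types), key=lambda j: abs(np.corrcoef(X_a[:, j], y_a)[0, 1]))

r_in, p_in = stats.pearsonr(X_a[:, best], y_a)
r_out, p_out = stats.pearsonr(X_b[:, best], y_b)

print(f"in-sample:     r = {r_in:+.2f}, p = {p_in:.3f}")    # the cherry-picked fit looks good
print(f"out-of-sample: r = {r_out:+.2f}, p = {p_out:.3f}")  # it usually collapses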
Re: p-hacking
Thanks for coming by and reading the thread. I don't think those issues are particularly apropos to a discussion about p-hacking in any technical way, but they're other issues with how statistics are presented to the public.
ryanchen wrote: ...
Nate, re: your second post, are you saying there's applicability of "conflation of 'margin of error' with 'confidence interval', and then there's the whole 'explanatory stats' and trivia thing" to this particular analysis? ...
...