xkonk wrote: If you think it's a gamma, can't you pull some random samples and compare them to actual NBA ratings to see what looks close? Then you'll have the distribution and will know the asymptote.

DSMok1 wrote: Trying to figure out how to do this. Right now I'm investigating what kind of curves fit well the right tail of the actual NBA talent pool, with the assumption that this right tail is also that of the general populace.

Here's a first pass idea: pick two BPM values that you feel good about. Maybe the max in a typical year, and the value that around 100 guys reach. You know the numbers better than I do, but let's say 10 and 0 for the sake of argument.
Then I would crank up R or whatever you want to use to generate random values. You mentioned a gamma distribution, so let's start with a gamma with shape=1 and rate=1. We want to draw a big sample so that we feel good about the extreme tail and can match it up with that max player. For 'reality', maybe we'd pick a sample size that reflects the actual number of potential NBA players. I also have no idea what that is, but maybe you could google some census numbers for the number of 18-35 year old males in the world. Let's say it's 3 million, which is too small but won't keep R running longer than I want it to right now.
If I run test = rgamma(3e6, shape=1, rate=1) and then data.frame(count=hist(test)$counts,value=hist(test)$mids), I'll generate a bunch of potential BPM scores and then a data frame (mostly to line up the numbers nicely) with how many people fall into each bin and the midpoint value of that bin. For my run, I get 2 in the top bin at 13.75, 2 in the 13.25 bin, and more going lower.
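Here is roughly how that step could look as a script. This is just a sketch of the same two calls: the seed and the names n_pool, h, and bins are my own additions, and the histogram is computed once with plot = FALSE so R doesn't draw it twice.

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

n_pool <- 3e6  # stand-in pool size, known to be too small
test <- rgamma(n_pool, shape = 1, rate = 1)

# Bin the sample once, without drawing the plot
h <- hist(test, plot = FALSE)
bins <- data.frame(count = h$counts, value = h$mids)

# Look at the top of the right tail and the single best draw
tail(bins, 5)
max(test)
```

The exact counts in the top bins will differ from run to run, since only a handful of draws land that far out.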
The first thing I need to do is move the max back to 10. A gamma can only be positive while BPM obviously goes negative, so the axis needs shifting anyway. In my case, I'll subtract about 4 because the max was 13.9. The 20th bin that R made for me has 88 people, which I'll call close enough to 100 for now. The bin value there is 9.75, which gets adjusted down to about 5.75. That's too high; it should be 0 to match observed BPM values. Not surprising, since subtracting 4 from a gamma distribution predicts no one under -4, and we've observed worse in the NBA. So now I'd try a different shape and rate and go through the process again.
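To make the "try a different shape and rate and go again" loop less tedious, something like the following might help. It's only a sketch: tail_check is a made-up helper, the 10 and 100 are the placeholder targets from above, and the grid of shape and rate values is arbitrary. It uses the exact 100th-best draw instead of a histogram bin, which is a small departure from what I did by hand.

```r
# Hypothetical helper: simulate a gamma talent pool, shift it so the best
# simulated player sits at target_max, and report the shifted value of the
# player ranked `rank` from the top.
tail_check <- function(shape, rate, n = 3e6, target_max = 10, rank = 100) {
  x <- sort(rgamma(n, shape = shape, rate = rate), decreasing = TRUE)
  shift <- x[1] - target_max           # amount subtracted from every draw
  c(shape = shape, rate = rate, shift = shift,
    value_at_rank = x[rank] - shift,   # want this near 0
    lower_bound = -shift)              # worst BPM this curve allows
}

set.seed(1)  # assumption: fixed seed for repeatability
grid <- expand.grid(shape = c(1, 2, 4), rate = c(0.5, 1))  # arbitrary grid
results <- t(mapply(tail_check, grid$shape, grid$rate))
results
```

You'd be looking for rows where value_at_rank is near 0 and lower_bound is at least as low as the worst BPM you've actually seen. Keep in mind the max of a sample is noisy, so it's worth rerunning with a few seeds before trusting any one row.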
Does that seem like the way to do it? You might need to pick more than two values to compare, since we're talking about a curve instead of a line, and you might want to play with the bin defaults and whatnot to make comparisons easier or more directly in line with your BPM data. Or, at least if you're using R specifically, you can look into functions that fit different distributions to BPM data directly. Here's one potentially relevant Stack Overflow question: http://stackoverflow.com/questions/1419 ... cific-data . Presumably if you had an equation, you could figure out the asymptote.
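As one example of the direct-fit route, fitdistr from the MASS package (a recommended package bundled with most R installs) can fit a gamma by maximum likelihood. The sketch below runs on made-up data, not real BPM values, and it assumes the shift is already known, which it wouldn't be in practice. It also ignores that NBA players are only the right tail of the pool, so treat it as a starting point only.

```r
library(MASS)

set.seed(1)  # assumption: fixed seed for repeatability

# Placeholder data, NOT real BPM values: a gamma sample shifted down by 6,
# standing in for a vector of observed ratings.
assumed_shift <- 6
bpm <- rgamma(5000, shape = 3, rate = 0.6) - assumed_shift

# A gamma needs positive inputs, so undo the (assumed known) shift first
fit <- fitdistr(bpm + assumed_shift, "gamma")
fit$estimate  # named vector with the fitted shape and rate
```

With real data you'd have to estimate the shift too, for example by trying several values and comparing the fits.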