
APBR Retrodiction Challenge (Ilardi, 2009)

Posted: Fri Apr 15, 2011 1:08 am
by Crow
recovered page 1 of 5

Author Message
Ilardi



Joined: 15 May 2008
Posts: 263
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 9:46 am Post subject: APBR Retrodiction Challenge
As several have noted in this forum in recent months, retrodiction can be quite useful as a means of evaluating the utility of player metrics. However, to my knowledge, there has never been a systematic comparison of the relative retrodictive performance of the various widely used "omnibus" metrics - e.g., Win Shares, Wins Produced, APM, SPM, PER, etc.

Accordingly, I thought it might be informative (and fun) to see if a few interested parties might want to collaborate on such an undertaking under a shared set of guidelines for generating retrodictive estimates for each measure. Although it would ultimately be ideal to conduct the investigation across several seasons' worth of data, I'd like to suggest a more modest "proof of concept" investigation to begin with: retrodiction of team performance (net efficiency) during the 2008-2009 season.

Following are some basic guidelines (and I'm open to friendly amendments on any of them):

1) Only data on each metric from prior seasons (i.e., up through 2007-2008) can be used;

2) Each player's actual minutes from 2008-2009 will be used;

3) Systematic age-adjustments to each metric are permitted in projecting each player's 08-09 values;

4) An average rookie metric value will be used for all rookies (a similar procedure will be employed for all players who logged minimal minutes prior to the 08-09 season);

5) Team-by-team estimates of net efficiency (pts scored/100 poss - pts allowed/100 poss) will be generated on the basis of aggregated teamwise projected metric values. For metrics that generate estimated wins or raw point differentials, appropriate conversions will be used to derive each team's net efficiency value. (Secondary analyses can also look at wins as an outcome variable of interest, to see if choice of DV makes any difference.)

6) Finally, each metric will be evaluated on the basis of the mean observed difference (absolute value) between each team's actual and projected efficiency during the 2008-2009 season.
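The scoring rule in guideline 6 is just a mean absolute error over the teams. A minimal sketch of it (the team abbreviations and efficiency values below are invented for illustration, not actual 08-09 figures):

```python
# Guideline 6: evaluate a metric by the mean absolute difference between
# each team's actual and projected net efficiency (pts/100 poss margin).
# The numbers below are made up purely for illustration.

def mean_abs_error(actual, projected):
    """Mean absolute retrodiction error across teams (lower is better)."""
    return sum(abs(actual[t] - projected[t]) for t in actual) / len(actual)

actual = {"BOS": 7.4, "CLE": 8.9, "LAL": 7.1}
projected = {"BOS": 9.0, "CLE": 5.0, "LAL": 6.5}
error = mean_abs_error(actual, projected)  # about 2.03 for these made-up values
```

Each metric's entry would produce one `projected` dictionary for all 30 teams, and the metric with the lowest mean absolute error wins.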


If there is sufficient interest, I will be happy to supply relatively low-noise APM estimates for the Retrodiction Challenge. (Since the ultimate focus of the exercise is net efficiency, I'll need to re-run my APM model to generate direct estimates of Total APM rather than deriving them indirectly on the basis of Offensive and Defensive APM, as the former method generates lower-noise Total APM estimates.)

Last edited by Ilardi on Fri Aug 28, 2009 10:03 am; edited 1 time in total
Neil Paine



Joined: 13 Oct 2005
Posts: 774
Location: Atlanta, GA

PostPosted: Fri Aug 28, 2009 9:53 am
http://www.basketball-reference.com/blog/?p=2264
Ilardi



Joined: 15 May 2008
Posts: 263
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 10:03 am
davis21wylie2121 wrote:
http://www.basketball-reference.com/blog/?p=2264


So, we have 1 potential entry (SPM), although it wasn't clear to me (on a cursory read) what default value was used for rookies.
battaile



Joined: 27 Jul 2009
Posts: 38


PostPosted: Fri Aug 28, 2009 10:09 am
I'm up for this.

"4) An average rookie metric value will be used for all rookies (a similar procedure will be employed for all players who logged minimal minutes prior to the 08-09 season); "

I'd like to be able to estimate rookies' performance based on draft position (and possibly other factors such as ht/wt/position); would this be allowed?
Ilardi



Joined: 15 May 2008
Posts: 263
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 10:32 am
battaile wrote:
I'm up for this.

"4) An average rookie metric value will be used for all rookies (a similar procedure will be employed for all players who logged minimal minutes prior to the 08-09 season); "

I'd like to be able to estimate rookies' performance based on draft position (and possibly other factors such as ht/wt/position); would this be allowed?


I think this would only make sense if we could find an agreed-upon, systematic method of doing so for each metric . . .
battaile



Joined: 27 Jul 2009
Posts: 38


PostPosted: Fri Aug 28, 2009 10:35 am
Ilardi wrote:
battaile wrote:I'm up for this.

"4) An average rookie metric value will be used for all rookies (a similar procedure will be employed for all players who logged minimal minutes prior to the 08-09 season); "

I'd like to be able to estimate rookies' performance based on draft position (and possibly other factors such as ht/wt/position); would this be allowed?
I think this would only make sense if we could find an agreed-upon, systematic method of doing so for each metric . . .
Ah, I was thinking it'd be part of what differentiated the challenge entries. What will differentiate them, just the aging adjustment and how you weight prior seasons?
Ilardi



Joined: 15 May 2008
Posts: 263
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 10:37 am
But then we might inadvertently conflate metric retrodictive performance with an individual modeler's skill in projecting rookie performance . . .
battaile



Joined: 27 Jul 2009
Posts: 38


PostPosted: Fri Aug 28, 2009 10:48 am
Ilardi wrote:But then we might inadvertently conflate metric retrodictive performance with an individual modeler's skill in projecting rookie performance . . .
Ah, I gotcha. One question though, is that same risk not present with retrodictive performance and aging adjustment?
Ryan J. Parker



Joined: 23 Mar 2007
Posts: 708
Location: Raleigh, NC

PostPosted: Fri Aug 28, 2009 11:02 am
We could always have two measures: one with and one without rookies. As Steve mentioned, although we want to project rookies as well as possible, that's a separate modeling challenge that could be issued.

The intent (hopefully?) of this challenge is to use prior-season data to best predict the future efficiency of a team. Projecting rookies with data from other leagues (college, NBDL, Euro, etc.) requires additional models that we don't really care about at this point.

I'm not sure how well this will fall in line with Steve's outline, but I'm most interested in retrodictions where you're given specific information for each "shift" of players.

The intent of this type of retrodiction is to understand who predicts best with a specific set of information, and to try to understand which information is valuable to know. For example, I'll tell you which lineup starts on offense, which players are on the court, and how many possessions they each had. We could then extend this to provide other relevant information, like lead/deficit, quarter, time left, etc. When you make all of these predictions, we aggregate them together and determine the predicted offensive and defensive efficiency for each team.

I'll be posting some results from this type of analysis soon, so hopefully that will give a better idea of how exactly that works.
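The shift-level aggregation described above might look roughly like the following sketch; the data structures, team code, and predicted points-per-possession values here are all assumptions for illustration, not Ryan's actual format:

```python
# Shift-level retrodiction: for each lineup "shift" you are told who is on
# the court and how many possessions were played; you predict points per
# possession, and the predictions aggregate into team-level efficiency.
from collections import defaultdict

# (offensive team, offensive lineup, possessions, predicted pts/poss)
shifts = [
    ("BOS", ("p1", "p2", "p3", "p4", "p5"), 12, 1.08),
    ("BOS", ("p1", "p2", "p3", "p4", "p6"), 8, 1.02),
]

poss = defaultdict(float)
pts = defaultdict(float)
for team, lineup, n_poss, pred_ppp in shifts:
    poss[team] += n_poss
    pts[team] += n_poss * pred_ppp

# Projected offensive efficiency, in points per 100 possessions.
off_eff = {team: 100 * pts[team] / poss[team] for team in poss}
```

The same accumulation run over defensive shifts would yield projected defensive efficiency, and the difference of the two gives the net-efficiency outcome variable of the Challenge.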
_________________
I am a basketball geek.
Ilardi



Joined: 15 May 2008
Posts: 263
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 11:02 am
battaile wrote:
Ilardi wrote:
But then we might inadvertently conflate metric retrodictive performance with an individual modeler's skill in projecting rookie performance . . .


Ah, I gotcha. One question though, is that same risk not present with retrodictive performance and aging adjustment?


Potentially . . . that's why I'm hoping we can find some agreed-upon method of age adjustment to apply to all metrics. If not, then maybe we could do a multi-part analysis:

1) Retrodiction with no adjustments for age and using a simple average rookie value for all rookies;

2) Retrodiction with age adjustment (derived in custom-tailored fashion for each metric)

3) Retrodiction with age adjustment and rookie adjustment (both derived in custom-tailored fashion)
battaile



Joined: 27 Jul 2009
Posts: 38


PostPosted: Fri Aug 28, 2009 11:13 am
Ilardi wrote:
battaile wrote:
Ilardi wrote:But then we might inadvertently conflate metric retrodictive performance with an individual modeler's skill in projecting rookie performance . . .
Ah, I gotcha. One question though, is that same risk not present with retrodictive performance and aging adjustment?
Potentially . . . that's why I'm hoping we can find some agreed-upon method of age adjustment to apply to all metrics. If not, then maybe we could do a multi-part analysis:

1) Retrodiction with no adjustments for age and using a simple average rookie value for all rookies;

2) Retrodiction with age adjustment (derived in custom-tailored fashion for each metric)

3) Retrodiction with age adjustment and rookie adjustment (both derived in custom-tailored fashion)
Ah, OK, I interpreted this
"Systematic age-adjustments to each metric are permitted in projecting each player's 08-09 values;"
incorrectly, as meaning each entrant would come up with their own system (though you'd have to show that it was systematic and provide backing, not just throw out numbers), when this would actually be something standardized across all entries. OK, it all makes sense now. :)

I'd vote for number one, as aging adjustments are something you can do a lot with in their own right, so for a baseline on APM retrodiction I'd rather see them left out. Then, once the baseline is established, start trying to improve on it with specific aging formulas. (Competition number two?)
Ilardi



Joined: 15 May 2008
Posts: 263
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 11:18 am
battaile wrote:

I'd vote for number one as I think aging adjustments are something that you can do a lot with in their own right, so for a baseline on APM-retrodiction I'd rather see them left out. Then once the baseline is established start trying to improve on it with specific aging formulas. (competition number two?)


I think we see it similarly, although I'd suggest conducting and publishing the two analyses (with and without age adjustment) in tandem, as I'm pretty sure some such adjustment algorithms already exist for most metrics in widespread use (APM, SPM, WS, PER, etc.), and as Neil Paine recently found with SPM, their inclusion can improve retrodictive accuracy in non-trivial fashion.

[Edit] I also think it might make the most sense to simply plug in the average (minutes-weighted) 08-09 rookie value for each rookie in the league for each metric.
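The minutes-weighted rookie average suggested here is a simple weighted mean. A sketch, with invented rookie names, minutes, and metric values:

```python
# Minutes-weighted league-average 08-09 rookie value for a given metric:
# each rookie's value is weighted by his minutes played. Names and numbers
# below are invented for illustration.
rookies = [
    ("Rookie A", 2200, 1.5),   # (name, minutes, metric value)
    ("Rookie B", 1400, -2.0),
    ("Rookie C", 400, -4.5),
]
total_minutes = sum(minutes for _, minutes, _ in rookies)
rookie_default = sum(minutes * value for _, minutes, value in rookies) / total_minutes
# rookie_default would then be plugged in for every rookie in the retrodiction.
```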
Neil Paine



Joined: 13 Oct 2005
Posts: 774
Location: Atlanta, GA

PostPosted: Fri Aug 28, 2009 12:04 pm
You know, I'm not sure where I got the -1.17 value for rookies, either. It's been a while.

Incidentally, I think the best way for the challenge might be this:
*Use per-minute rates
*No age adjustment
*The weighting is as follows: 3 parts Y-1, 2 parts Y-2, 1 part Y-3
*For all seasons where no data exists for a player, use the league average

Clearly this is not going to produce the best results any of our metrics can do, but then again, that's not exactly the point here, is it? The point is to test the predictive value of past results. This method is going to be easily reproducible (no possibility of cheating), simple to execute, and not reliant on the ability to project playing time or the ability to fit a model for existing players or rookies.

It will be all about the metrics themselves and how predictive they are, with no external factors muddying the waters.
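The 3-2-1 weighting proposed above, with league average filling in missing seasons, can be sketched as follows; the per-minute rates are invented, and 0.0 stands in for the league average:

```python
# Neil's proposed projection: weight the prior three seasons' per-minute
# rates 3 parts Y-1, 2 parts Y-2, 1 part Y-3, substituting the league
# average wherever a player has no data for a season. Rates are invented;
# 0.0 stands in for the league average here.
LEAGUE_AVG = 0.0
WEIGHTS = [("Y-1", 3), ("Y-2", 2), ("Y-3", 1)]

def project_rate(rates):
    """rates: dict mapping season label -> per-minute rate for that season."""
    weighted = sum(w * rates.get(season, LEAGUE_AVG) for season, w in WEIGHTS)
    return weighted / sum(w for _, w in WEIGHTS)

# A second-year player: no Y-2 or Y-3 data, so league average fills in.
rate = project_rate({"Y-1": 2.4})
```

Per guideline 2 of the Challenge, the projected rate would then be multiplied by the player's actual 2008-09 minutes before aggregating to the team level.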
Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 978
Location: Seattle

PostPosted: Fri Aug 28, 2009 12:33 pm
I think Neil's proposed rules are pretty good, with the caveat that I think we might just zero out years where the player was not in the league for the sake of players who are well below or above average during their smaller sample.

If we do it this way, it's pretty much a matter of providing three years' worth of numbers, right?

battaile: Are you disabling BBCode in all of your posts? Your quotes are consistently messed up ...
Mike G



Joined: 14 Jan 2005
Posts: 3570
Location: Hendersonville, NC

PostPosted: Fri Aug 28, 2009 1:23 pm
If I'm creating an age-effect prediction algorithm, I'm going to do it by averaging trends over the last few years. In essence, a multi-year retrodiction would be precisely the exercise that would result in said algorithm.

So, all our retrodictions with their age-effects are 'suspect', in that they in effect create the best fit with our models. Hence, it's essential that we each submit a set of retros without age-effect factors, as well as with.

Then, an actual prediction has more validity.

I guess I'm just agreeing with what Battaile wrote.

battaile



Joined: 27 Jul 2009
Posts: 38


PostPosted: Fri Aug 28, 2009 1:33 pm
Kevin Pelton wrote:
I think Neil's proposed rules are pretty good, with the caveat that I think we might just zero out years where the player was not in the league for the sake of players who are well below or above average during their smaller sample.

If we do it this way, it's pretty much a matter of providing three years' worth of numbers, right?

battaile: Are you disabling BBCode in all of your posts? Your quotes are consistently messed up ...

Ah yeah, that was it, I couldn't figure out what was going on there. Not sure how I got that as the default setting.
DLew



Joined: 13 Nov 2006
Posts: 222


PostPosted: Fri Aug 28, 2009 2:35 pm
On the rookies issue, I think that, at least for the first go-round, actual rookie performance should be used. This eliminates the issue of picking a value for rookie performance that is equivalent across the various metrics. The only other option that would be quick, easy, and fair is to use league average. Even though we know that is not a great estimate for rookies, if you want to keep things strictly to previous-season data, it is really the only value that is easily verifiable as equivalent across metrics.
Ilardi



Joined: 15 May 2008
Posts: 262
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 4:01 pm
davis21wylie2121 wrote:
Incidentally, I think the best way for the challenge might be this:
*Use per-minute rates
*No age adjustment
*The weighting is as follows: 3 parts Y-1, 2 parts Y-2, 1 part Y-3
*For all seasons where no data exists for a player, use the league average


Hmmm . . .

I'm afraid I can't sign on to your proposed weighting scheme, Neil, as the generation of low-noise APM estimates requires something rather different. Specifically, I've found that using a large multiyear dataset (5-6 years) substantially improves estimation, especially with prior season weights declining exponentially (not linearly). Obviously, many other metrics don't require such extensive multi-season datasets to derive useful single-season estimates, but I don't regard this as a reason to penalize APM (unfairly, in my view) in the Retrodiction Challenge. In essence, I'm in favor of simply using the value of each metric in 2007-2008 as it is standardly derived - whether that involves utilization of multiple prior seasons or none at all.

Also, I agree that there should be a Part I of the Challenge that has no age adjustment, but I would also like to see a Part II that utilizes such an adjustment (various metrics may be more or less sensitive to age effects, after all).

On another note, I'm not sure what you mean by "use per-minute rates"? Are you referring to the outcome measure (team net efficiency)? If so, isn't it more traditional to render it on a per-100-possession basis? No big deal either way, I suppose . . .

Finally, I tend to like DLew's recommendation of using actual observed 08-09 values for rookies, although I wouldn't object to simply using the average rookie value for all rookies (a gross oversimplification which could reasonably be expected to affect all metrics about the same).
Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 976
Location: Seattle

PostPosted: Fri Aug 28, 2009 4:41 pm
Ilardi wrote:
Specifically, I've found that using a large multiyear dataset (5-6 years) substantially improves estimation, especially with prior season weights declining exponentially (not linearly).

We're all familiar with the results in terms of reducing the error term on APM values using multiple years. Is that what you're referring to, or have you found when retrodicting in the past that multi-year data improves the quality of those estimates?

Presumably, averaging a player's APM over multiple seasons calculated independently (that is, the 2007-08 APM and the 2008-09 APM are each calculated separately and then averaged, and so on) would be another way of mitigating the noisiness inherent in single-season APM calculations. You've chosen to go a different direction by combining seasons, but to me an independent-season method would be preferable to the extent that it minimizes the problems created by player aging and other changes in the player's underlying "true worth" (if such a thing exists).

It would be interesting, if no such study has been performed before, to compare the two methods in this retrodiction. I think that would go a long way toward making me comfortable with your method.
battaile



Joined: 27 Jul 2009
Posts: 38


PostPosted: Fri Aug 28, 2009 4:42 pm
Ilardi wrote:


... proposed weighting scheme...


If we all used the same weighting scheme, what would be the difference in entries?
Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 976
Location: Seattle

PostPosted: Fri Aug 28, 2009 4:43 pm
The difference would be the degree to which the ratings capture a player's true value, in theory. Obviously, noise is a part of it as well.
battaile



Joined: 27 Jul 2009
Posts: 38


PostPosted: Fri Aug 28, 2009 4:44 pm
Kevin Pelton wrote:
The difference would be the degree to which the ratings capture a player's true value, in theory. Obviously, noise is a part of it as well.


Still not following this: if we're all using the same data, with the same weights, for predicting '09, it seems like we're just coming up with a method that could be run by one person.
Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 976
Location: Seattle

PostPosted: Fri Aug 28, 2009 4:51 pm
Let us say that one of us has created the "holy grail" metric that distills all player value to a single number. This being the case, using this metric to predict team performance the following season should be a superior method as compared to using other player metrics, and that should be reflected in the quality of the retrodiction of past seasons.
Ilardi



Joined: 15 May 2008
Posts: 262
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 5:17 pm
Kevin Pelton wrote:
Ilardi wrote:
Specifically, I've found that using a large multiyear dataset (5-6 years) substantially improves estimation, especially with prior season weights declining exponentially (not linearly).

We're all familiar with the results in terms of reducing the error term on APM values using multiple years. Is that what you're referring to, or have you found when retrodicting in the past that multi-year data improves the quality of those estimates?

Presumably, averaging a player's APM over multiple seasons calculated independently (that is, the 2007-08 APM and the 2008-09 APM are each calculated separately and then averaged, and so on) would be another way of mitigating the noisiness inherent in single-season APM calculations. You've chosen to go a different direction by combining seasons, but to me an independent-season method would be preferable to the extent that it minimizes the problems created by player aging and other changes in the player's underlying "true worth" (if such a thing exists).

It would be interesting, if no such study has been performed before, to compare the two methods in this retrodiction. I think that would go a long way toward making me comfortable with your method.


Yes, using multi-season datasets improves the quality of APM estimates by making them both more reliable and more accurate. Given that the multi-season approach dramatically reduces estimation error (i.e., helps us home in on the actual underlying parameter values), how could it be otherwise?

Remember, when using a single season to generate APM estimates, you're going to be dealing with some isolated player pairs who spend the great majority of their on-court time together . . . and I know of no way to effectively disentangle such players' respective APM effects except by including several additional seasons' worth of lineups.

Of course, it is true, as you suggest, that simply averaging a bunch of highly noisy single-year APM estimates will improve things somewhat, but not as much as utilizing all relevant lineups in a single multi-season analysis.

Nevertheless, if you'd like to see both types of APM estimation tested in the Retrodiction Challenge, I have no objections.
Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 976
Location: Seattle

PostPosted: Fri Aug 28, 2009 5:24 pm
Ilardi wrote:
Nevertheless, if you'd like to see both types of APM estimation tested in the Retrodiction Challenge, I have no objections.

Cool. I know we chatted via e-mail a long time ago about you possibly creating a database of independent season-by-season ratings. Is that something you ever did?
Ilardi



Joined: 15 May 2008
Posts: 262
Location: Lawrence, KS

PostPosted: Fri Aug 28, 2009 6:06 pm
Kevin Pelton wrote:
Ilardi wrote:
Nevertheless, if you'd like to see both types of APM estimation tested in the Retrodiction Challenge, I have no objections.

Cool. I know we chatted via e-mail a long time ago about you possibly creating a database of independent season-by-season ratings. Is that something you ever did?


Sort of . . . the seasonal ratings aren't completely independent, but they're each weighted about 70% toward the target season of interest (with very minimal weight accorded to more temporally distant seasons).

Independent single-season APM estimates are just so damned noisy that they're of very limited value (in my opinion). However, since the typical year-to-year age-related increase/decrease in APM is only around 0.3, I see very little downside in using multiseason databases to derive APM estimates.
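The weighting shape described above (roughly 70% of the weight on the target season, declining sharply for more distant seasons) can be illustrated with a geometric decay. The decay factor below is invented to roughly reproduce the ~70% figure, not Ilardi's actual parameter:

```python
# Exponentially declining season weights: the target season (k = 0) gets
# the bulk of the weight and each earlier season gets geometrically less.
# The decay factor 0.3 is an assumption chosen so the target season ends
# up near the ~70% weight mentioned above.
def season_weights(n_seasons, decay=0.3):
    raw = [decay ** k for k in range(n_seasons)]  # k = 0 is the target season
    total = sum(raw)
    return [w / total for w in raw]

weights = season_weights(6)
# weights[0] comes out near 0.70; the weights sum to 1 by construction.
```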
Crow



Joined: 20 Jan 2009
Posts: 796


PostPosted: Fri Aug 28, 2009 6:38 pm
Not sure exactly what and how much to say about this but I'll share something:


Division     Stat +/- error rank    Sagarin SOS rank
Atlantic     1                      5
Central      2                      1
Southeast    4                      4
Southwest    3                      2
Pacific      6                      6
Northwest    5                      3

For statistical +/-: 1 = least avg. error, 6 = most.
For avg. strength of schedule: 1 is toughest, 6 is weakest.

For 5 of the 6 divisions, an easier strength of schedule tracks quite tightly with higher average statistical +/- error. The Atlantic division is way different: something is making its average statistical error the lowest in the comparison. I'd think that would be worth considering further. The Atlantic division plays 24% fewer games against top-10 opponents (the teams it is most difficult to get wins from) than league average, the biggest variance in the league, and it won those games 40% less often than league average, again the biggest variance. And this is not a one-season fluke of scheduling: the Atlantic division has faced the top 10 very rarely for 4 straight years, and with the exception of finishing second to the Southeast by one such game last season, it has played the top 10 the least. Perhaps not facing the top 10 as much causes less distortion to the statistical estimate than average. If you are trying to predict team wins, that is worth being aware of and perhaps addressing in some way.


How should retrodictions adjust for expected strength of schedule, and how much of the error in SOS-unadjusted retrodiction results is because of it? I leave that for discussion.

Last edited by Crow on Sun Aug 30, 2009 3:05 pm; edited 6 times in total
Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 976
Location: Seattle

PostPosted: Fri Aug 28, 2009 6:49 pm
As with rookies, strength of schedule -- to the extent it matters at all -- should affect everyone equally, no?
Crow



Joined: 20 Jan 2009
Posts: 796


PostPosted: Fri Aug 28, 2009 6:57 pm
All retrodiction participants? On the surface it might look that way, or close enough if no one adjusts for SOS, though the degree to which each metric captures defense could have some metric-specific effect, to the extent that schedules vary in offensive and defensive strength and some metrics capture those differences more than others.

Equal for all players and teams? No.
With all the recent talk about adjusting blocks and assists for scorekeepers, I'd think there would be some incentive to adjust other metrics for opponent quality in simple ways. But if the proponents of these other metrics prefer to leave that entirely to adjusted +/-, so be it.

Last edited by Crow on Sun Aug 30, 2009 1:56 pm; edited 4 times in total
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

PostPosted: Fri Aug 28, 2009 7:06 pm
Crow wrote:
No.


He means "everyone" as in the participants, not the players.
_________________
Regards,
Justin Kubatko
Basketball-Reference.com

Crow



Joined: 20 Jan 2009
Posts: 818


PostPosted: Fri Aug 28, 2009 7:18 pm
Of the 63 players whose actual statistical +/- exceeded the projected value by 3 or more points, the group was pretty evenly divided by position, with forwards slightly underrepresented and centers a bit overrepresented, though that could be related to position assignment.

On average, the guys in this group were projected at around -1 but turned out to be around +2. It takes a pretty good eye or system to catch this extra value.

Last edited by Crow on Fri Aug 28, 2009 7:30 pm; edited 1 time in total

Crow



Joined: 20 Jan 2009
Posts: 818


PostPosted: Fri Aug 28, 2009 7:51 pm
There appears to be a standard assumption for the contest of adjusting every player in exactly the same way. Part of the reason for raising the detail about the statistical +/- overachievers was to see whether there was any pattern that could improve the performance of statistical +/- for the contest, and to raise the possibility that different positions, roles, or team quality levels might be better projected in ways that had some degree of uniqueness, using any metric in the contest.

If I put together an entry it would likely be a meta-metric. If, for example, adjusted appeared to capture the value of centers better in such a test I might weight that higher. If it didn't work as well with PGs, perhaps lower.


And guess what? Briefly looking at the statistical overachievers, I see that the average age of the centers is a modestly noteworthy 3 years older than that of the forwards and 2 years older than that of the guards. And among underachievers, only 1 of the 24 was a center. For the contest, if you are going to age-adjust, that is at least food for thought for possibly age-adjusting differently by position.

Regression might find the best average weights for using 3 years of data, but regression or trial and error might find even better weights for subsets of players based on the data pattern. Or, if you don't want to treat subsets differently, base the weights in part on how far a particular year's performance was from the player's career average, or from the average career curve, or from the average career curve of similar players, in addition to whatever weighting is the best general fit for year-to-year data for everyone.

I think it would be worth comparing the weights given in this new version of statistical +/- for P/40, TSA40 and TSA40^2 to how other metrics score attempts and points, especially PER and Wins Produced. I'd think this relationship is important for predicting the win impact of shooting and scoring for different types of players.


But if you are not interested in that kind of stuff (me getting to it, or providing others the chance to add their perspective and find other things), ignore it. To me it is on point. And even where I might miss, it's worth a shot.

Last edited by Crow on Sun Aug 30, 2009 1:46 pm; edited 6 times in total
jim



Joined: 01 Aug 2009
Posts: 13


PostPosted: Fri Aug 28, 2009 9:08 pm
Here's an idea: We could retrodict a half-season by using the other half (maybe use something similar to an odd-even split of games, though an exact odd-even split probably isn't possible). While decreasing our sample data will probably increase error, this allows us to sidestep issues such as how to deal with rookies and aging.
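The split described above might look like the following sketch; since a strict odd/even assignment can't hold for both teams in every game simultaneously, this simply alternates each team's games by index (the game labels are invented):

```python
# Sketch of a within-season retrodiction split: project one half of a
# team's schedule from the other half, sidestepping rookie and aging
# adjustments entirely since both halves come from the same season.
def split_halves(games):
    """games: a team's games in chronological order -> (half A, half B)."""
    return games[0::2], games[1::2]

train, holdout = split_halves(["g1", "g2", "g3", "g4", "g5", "g6"])
# train = ["g1", "g3", "g5"], holdout = ["g2", "g4", "g6"]
```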

Mike G



Joined: 14 Jan 2005
Posts: 3605
Location: Hendersonville, NC

PostPosted: Sun Aug 30, 2009 4:10 pm
Here's my first-pass entry at retrodicting (which, according to my spell-check, is not a word). xW (expected wins) derives from eWins by the formula:
xW = eW*2 - 41 (for 82 G)
Code:
tm expW pyth err
Atl 36.1 45.3 9.1
Bos 65.4 60.0 5.4
Cha 24.9 37.4 12.5
Chi 41.8 40.3 1.5
Cle 45.4 63.2 17.8
Dal 53.5 46.2 7.3
Den 46.6 49.7 3.1
Det 51.7 39.6 12.1
GSW 29.5 32.1 2.6
Hou 53.2 51.8 1.4
Ind 42.8 38.2 4.6
LAC 32.5 19.8 12.7
LAL 65.0 59.4 5.6
Mem 25.0 26.6 1.6
Mia 32.8 41.7 8.9
Mil 24.5 38.1 13.6
Min 30.8 28.4 2.4
NJN 35.5 34.5 1.0
NOH 50.8 45.3 5.5
NYK 19.8 34.5 14.7
Okl 19.1 25.5 6.5
Orl 48.5 58.1 9.7
Phl 46.9 41.2 5.6
Phx 53.4 45.7 7.6
Por 41.4 55.1 13.7
Sac 27.5 20.7 6.8
SAS 50.5 51.3 .8
Tor 47.1 33.6 13.5
Uta 55.0 47.8 7.2
Was 29.1 22.5 6.6

40.9 41.1 7.37

eW per minute from 2008 were multiplied by 2009 minutes.
Rookies and others without 2008 figures were just retrodicted with the eW rates they actually had for 2009. I figure their rates were at least as unpredictable as their minutes.
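The arithmetic Mike G describes might be sketched like this. The player names, rates, and minutes are invented for illustration (a real roster would have 12-15 entries summing to a full season of minutes):

```python
# Each player's 2008 eW-per-minute rate times his 2009 minutes, summed
# to a team eW total, then converted to expected wins via xW = eW*2 - 41.
roster = [
    # (player, eW per minute in 2008, minutes played in 2009)
    ("Player A", 0.0040, 3000),
    ("Player B", 0.0030, 2500),
    ("Player C", 0.0025, 2000),
]

team_eW = sum(rate * minutes for _, rate, minutes in roster)
xW = team_eW * 2 - 41  # expected wins over an 82-game season
```

The error column in the table above would then be the absolute difference between each team's xW and its Pythagorean wins.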
_________________
`
36% of all statistics are wrong
Back to top
View user's profile Send private message Send e-mail
Ilardi



Joined: 15 May 2008
Posts: 265
Location: Lawrence, KS

PostPosted: Mon Aug 31, 2009 10:02 am Post subject: Reply with quote
Mike G wrote:
Here's my first pass entry at retrodicting (which according to my spell-check is not a word). xW (expected wins) derive from eWins by the formula -
xW = eW*2 - 41 (for 82 G)
Code:
tm expW pyth err
Atl 36.1 45.3 9.1
Bos 65.4 60.0 5.4
Cha 24.9 37.4 12.5
Chi 41.8 40.3 1.5
Cle 45.4 63.2 17.8
Dal 53.5 46.2 7.3
Den 46.6 49.7 3.1
Det 51.7 39.6 12.1
GSW 29.5 32.1 2.6
Hou 53.2 51.8 1.4
Ind 42.8 38.2 4.6
LAC 32.5 19.8 12.7
LAL 65.0 59.4 5.6
Mem 25.0 26.6 1.6
Mia 32.8 41.7 8.9
Mil 24.5 38.1 13.6
Min 30.8 28.4 2.4
NJN 35.5 34.5 1.0
NOH 50.8 45.3 5.5
NYK 19.8 34.5 14.7
Okl 19.1 25.5 6.5
Orl 48.5 58.1 9.7
Phl 46.9 41.2 5.6
Phx 53.4 45.7 7.6
Por 41.4 55.1 13.7
Sac 27.5 20.7 6.8
SAS 50.5 51.3 .8
Tor 47.1 33.6 13.5
Uta 55.0 47.8 7.2
Was 29.1 22.5 6.6

40.9 41.1 7.37

eW per minute from 2008 were multiplied by 2009 minutes.
Rookies and others without 2008 figures were just retrodicted with the eW rates they actually had for 2009. I figure their rates were at least as unpredictable as their minutes.


Thanks, Mike. A few thoughts:

1) At least as I conceived it, the primary outcome metric for the Challenge would be team efficiency (i.e., net point margin per 100 possessions) - with Wins as a secondary metric of interest. Would you be willing to convert your estimates accordingly?

2) In order to make Challenge results transparent, accurate, and fully replicable, I'm also hoping we'll be able to compile a collective online database (preferably in Google Spreadsheet) containing all the "raw data" underlying each team's estimate for each metric - i.e., a listing of each player, his 2007-2008 metric value, his 2009 minutes played, etc. Would you be able to begin that process for us with your eWins data?

3) Could you also clarify which players (beyond rookies) had no 2007-2008 data in your model? (e.g., was it all players below a certain minutes-played threshold?)

4) Could you also provide estimates in which a default average value is entered for each rookie and each otherwise un-estimated player from 07-08? (We'll want to look at retrodiction under both scenarios - as well as under an age-adjustment scenario).

Thanks.
Back to top
View user's profile Send private message
DLew



Joined: 13 Nov 2006
Posts: 224


PostPosted: Mon Aug 31, 2009 11:08 am Post subject: Reply with quote
Well, wins relate to team net efficiency per 100 possessions by the equation:
Wins = (2.4 * NetEffPer100) + 41 (the coefficient becomes 2.7 if you use per-game efficiency differential)
So, by algebra: NetEffPer100 = (Wins - 41) / 2.4
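DLew's conversion can be written as a pair of helper functions, a direct transcription of the equation above (41 is half of an 82-game schedule, so a parameter is used to keep the shortened-season case explicit):

```python
def wins_to_net_eff(wins, games=82):
    """Invert the relation Wins = 2.4 * NetEff + games/2, where
    NetEff is net points per 100 possessions."""
    return (wins - games / 2) / 2.4

def net_eff_to_wins(net_eff, games=82):
    """Forward direction of the same relation."""
    return 2.4 * net_eff + games / 2
```

For example, a 50-win team implies (50 - 41) / 2.4 = +3.75 points per 100 possessions, and an exactly average team (+0.0) maps back to 41 wins.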
Back to top
View user's profile Send private message
jkubatko



Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH

PostPosted: Mon Aug 31, 2009 11:32 am Post subject: Reply with quote
In my opinion, looking at a single season is not going to yield much useful information, as there is a non-trivial chance that what is actually the "best" metric won't come out on top. Likewise, there is a non-trivial chance that the "worst" metric will come out on top.
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Neil Paine



Joined: 13 Oct 2005
Posts: 774
Location: Atlanta, GA

PostPosted: Mon Aug 31, 2009 11:39 am Post subject: Reply with quote
http://www.basketball-reference.com/blog/?p=3277

Re: APBR Retrodiction Challenge

Posted: Fri Apr 15, 2011 7:09 am
by Crow
Page 5

Author Message
Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 976
Location: Seattle
Posted: Mon Aug 31, 2009 2:30 pm Post subject:

Regarding a goal: I would put it this way. We'd like to have an objective way to measure various rating systems. Testing for how well they measure performance at the team level is ineffective because team adjustments and the nature of APM mean they simply divide credit after establishing the team's performance level. What we're left with, then, is seeing how well metrics predict performance. This is difficult to do in the future, because seasons only come once a year. Therefore, the best way to do it is by looking backwards and retrodicting. The key here is finding a level playing field for multiple different systems. Do people generally agree with that?
Back to top

HoopStudies



Joined: 30 Dec 2004
Posts: 705
Location: Near Philadelphia, PA
Posted: Mon Aug 31, 2009 3:03 pm Post subject:

jkubatko wrote:
Ilardi wrote:
With, say, 5 seasons' worth of data, we'd have 150 separate observed predictions of team efficiency/wins . . . probably enough of a sample to get at least some sense of which metrics are yielding better or worse prediction, yes?
I suppose, but here's something that has been bugging me: What is the goal of this exercise? Without a well-defined goal, it makes it difficult to decide what the "rules" should be. For example, if the primary goal is to figure out which metric produces the best weather forecasts (see Dean, I did read your book), then the inclusion of future seasons should not be permissible.
As I think about it, I'm not sure it matters much. Regardless of the goal -- better predictor (I don't think this can settle it), identification of the cases where most methods differ, understanding of whether minutes prediction matters more than quality -- my guess is that future years are going to matter most for rookies and matter little for other guys. This then gets back to how to handle rookies, limited-minute players, and aging curves. Maybe there should be a couple different versions of this game:

A. Uses a constant value of player value in all years retrodicted (presumably the player's minute-weighted average), incorporating future years.
B. Uses the prior year of player value along with the method's actual player value for rookies. This may need some modification for guys who play 10 minutes and get hurt with a very large or very small player value.
C. Like A, but uses some formulaic aging/experience curve. This could allow derivation of better aging/experience curves.
D. Like A, but where the constant value is more like an average of all previous played years (no future). Would also need to deal with rookies and limited-minute players, as in B.

Just some thoughts.
_________________
Dean Oliver
Author, Basketball on Paper
The postings are my own & don't necess represent positions, strategies or opinions of employers.
Back to top

Ilardi



Joined: 15 May 2008
Posts: 262
Location: Lawrence, KS
Posted: Mon Aug 31, 2009 3:24 pm Post subject:

Kevin Pelton wrote:
Regarding a goal: I would put it this way. We'd like to have an objective way to measure various rating systems. Testing for how well they measure performance at the team level is ineffective because team adjustments and the nature of APM mean they simply divide credit after establishing the team's performance level. What we're left with, then, is seeing how well metrics predict performance. This is difficult to do in the future, because seasons only come once a year. Therefore, the best way to do it is by looking backwards and retrodicting. The key here is finding a level playing field for multiple different systems. Do people generally agree with that?
Absolutely. (And I particularly like the emphasis on finding a level playing field for each metric.)
Back to top

Ilardi



Joined: 15 May 2008
Posts: 262
Location: Lawrence, KS
Posted: Mon Aug 31, 2009 3:44 pm Post subject:

HoopStudies wrote:
jkubatko wrote:
Ilardi wrote:
With, say, 5 seasons' worth of data, we'd have 150 separate observed predictions of team efficiency/wins . . . probably enough of a sample to get at least some sense of which metrics are yielding better or worse prediction, yes?
I suppose, but here's something that has been bugging me: What is the goal of this exercise? Without a well-defined goal, it makes it difficult to decide what the "rules" should be. For example, if the primary goal is to figure out which metric produces the best weather forecasts (see Dean, I did read your book), then the inclusion of future seasons should not be permissible.
As I think about it, I'm not sure it matters much. Regardless of the goal -- better predictor (I don't think this can settle it), identification of the cases where most methods differ, understanding of whether minutes prediction matters more than quality -- my guess is that future years are going to matter most for rookies and matter little for other guys. This then gets back to how to handle rookies, limited-minute players, and aging curves. Maybe there should be a couple different versions of this game:

A. Uses a constant value of player value in all years retrodicted (presumably the player's minute-weighted average), incorporating future years.
B. Uses the prior year of player value along with the method's actual player value for rookies. This may need some modification for guys who play 10 minutes and get hurt with a very large or very small player value.
C. Like A, but uses some formulaic aging/experience curve. This could allow derivation of better aging/experience curves.
D. Like A, but where the constant value is more like an average of all previous played years (no future). Would also need to deal with rookies and limited-minute players, as in B.

Just some thoughts.
I like this framework, Dean: it's quite similar to the one I've proposed - albeit in dribs and drabs - over the past couple of days. However, I would also suggest a separate analysis limited to teams with high between-season roster turnover, which should (as others have argued) provide a particularly compelling test of retrodictive efficacy.
Back to top

Mike G



Joined: 14 Jan 2005
Posts: 3528
Location: Hendersonville, NC
Posted: Mon Aug 31, 2009 3:54 pm Post subject:

Ilardi wrote:
3) Could you also clarify which players (beyond rookies) had no 2007-2008 data in your model? (e.g., was it all players below a certain minutes-played threshold?)
Players over 100 minutes:
Code:
no 2008           tm   min  e484  eWins
Battie,Tony       Orl  1198   .51   1.3
Skinner,Brian     LAC   845   .46    .8
Morrison,Adam     Cha   667   .06    .1
Miles,Darius      Mem   302   .78    .5
May,Sean          Cha   301   .34    .2
Mensah-Bonsu,Pop  Tor   263   .80    .4
Brown,Dee         Was   232   .09    .0
Livingston,Shaun  Okl   189   .57    .2

Average e484 is 1.00. While none of these players were significantly better (in significant minutes) than an average backup, it could be enough 'cheating' (thru laziness) to make a difference. I can easily enough use older data -- though it's not up to current standards.

All rookies had a total e484 of .79. Does it make sense to have an 'entry' giving all rookies some such value?
_________________
`
36% of all statistics are wrong

Last edited by Mike G on Mon Aug 31, 2009 4:00 pm; edited 1 time in total
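The default-value "entry" Mike G asks about could be implemented as a simple fallback lookup. The 0.79 figure comes from his rookie average above; the dictionary structure and function name are just illustrative:

```python
ROOKIE_DEFAULT_E484 = 0.79  # Mike G's observed average rookie e484

def prior_rating(ratings_2008, player):
    """Return a player's 2008 rating, falling back to the rookie
    default when no usable prior-season figure exists (rookies and
    minimal-minute players alike)."""
    return ratings_2008.get(player, ROOKIE_DEFAULT_E484)
```

This would replace the "use their actual 2009 rates" shortcut, keeping the retrodiction strictly out-of-sample as Ilardi's rule 4 intends.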
Back to top

schtevie



Joined: 18 Apr 2005
Posts: 404
Posted: Mon Aug 31, 2009 3:55 pm Post subject:

Hey, enough with the navel gazing. Let's get on with it!

Beyond transparency, is there any criterion that ultimately matters? If someone were to go to the extra effort, say, to find aging curves specific to positional play and used these as a refinement and they proved valuable, would this be viewed as somehow illegitimate because others didn't? Let each "competitor" proceed in a manner that he or she sees fit, in a manner expected to pass critical scrutiny. After all, the point of this isn't competition; it is to advance understanding.

My extra $0.10 would be to request that folks do as Justin did with his previous retrodiction efforts and provide estimates for multiple years. It is not unexpected that certain years might favor certain approaches. For example, my presumption is that a year in which true defensive standouts switch teams would favor APM. Maybe not.

And finally, as SPM estimates are derived from APM, shouldn't these results await Steve's new and improved efforts?

So, grab a Speedo, or grab an Arena suit, your choice, and get in the water!
Back to top

Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 976
Location: Seattle
Posted: Mon Aug 31, 2009 4:26 pm Post subject:

schtevie wrote:
If someone were to go to the extra effort, say, to find aging curves specific to positional play and used these as a refinement and they proved valuable, would this be viewed as somehow illegitimate because others didn't?
Yes. Did you even skim my previous post?
Back to top

Ilardi



Joined: 15 May 2008
Posts: 262
Location: Lawrence, KS
Posted: Mon Aug 31, 2009 4:53 pm Post subject:

Kevin Pelton wrote:
schtevie wrote:
If someone were to go to the extra effort, say, to find aging curves specific to positional play and used these as a refinement and they proved valuable, would this be viewed as somehow illegitimate because others didn't?
Yes. Did you even skim my previous post?
Kevin: perhaps I'm being too generous, but I think Schtevie was probably assuming that such refinements - if shown to be valuable on a given metric by some enterprising APBRer - could always be extended to other metrics under the aegis of a separate analysis (thereby preserving the level-playing-field ideal).
Back to top

Kevin Pelton
Site Admin


Joined: 30 Dec 2004
Posts: 976
Location: Seattle
Posted: Mon Aug 31, 2009 5:07 pm Post subject:

I don't think it's realistic to expect that a player development model could be applied evenly to all the different metrics being used. That being the case, opening things up would distract from the core goal of this effort, to make distinctions amongst various player rating systems. If we're comparing on both the system and its ability to model aging and other effects, it's impossible to tell -- in the context of this specific exercise -- whether its success or failure is due to one or the other, which would work against the desire to advance understanding.
Back to top

Ilardi



Joined: 15 May 2008
Posts: 262
Location: Lawrence, KS
Posted: Mon Aug 31, 2009 5:15 pm Post subject:

Kevin Pelton wrote:
I don't think it's realistic to expect that a player development model could be applied evenly to all the different metrics being used. That being the case, opening things up would distract from the core goal of this effort, to make distinctions amongst various player rating systems. If we're comparing on both the system and its ability to model aging and other effects, it's impossible to tell -- in the context of this specific exercise -- whether its success or failure is due to one or the other, which would work against the desire to advance understanding.
Agreed, but I would certainly be interested in seeing if someone - outside of the strict confines of The Challenge (or "The Contest", with apologies to Seinfeld) - could demonstrate markedly enhanced prediction with a refinement that proved extendable (in principle) to other metrics.
Back to top

Mike G



Joined: 14 Jan 2005
Posts: 3528
Location: Hendersonville, NC
Posted: Mon Aug 31, 2009 7:14 pm Post subject:

Right now, I'm thinking the easiest way to apply an aging adjustment is to take the average age of a player-minute on a team, rather than adjust (for age) every member of the team. There may be a positional aging variation, but I'd guess not much.
_________________
`
36% of all statistics are wrong
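Mike G's team-level adjustment amounts to a minute-weighted average age. A sketch, with an invented two-player roster for the example:

```python
def team_weighted_age(roster):
    """Average age of a player-minute: each player's age weighted by
    his minutes played, so heavy-minute players dominate the figure."""
    total_minutes = sum(minutes for _, minutes in roster)
    return sum(age * minutes for age, minutes in roster) / total_minutes

# e.g. a 25-year-old playing 2000 minutes and a 31-year-old playing 1000:
avg_age = team_weighted_age([(25, 2000), (31, 1000)])
```

A single team-level age correction would then be applied to the aggregated projection as a function of `avg_age`, instead of adjusting every player individually.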
Back to top

DJE09



Joined: 05 May 2009
Posts: 148
Posted: Mon Aug 31, 2009 11:16 pm Post subject:

Why don't you predict the playoffs based on the regular season? There are no rookies, and every player in the playoffs must have been playing for that team - i.e., there are no cases where there is no data for a player. Sure, the amounts of data will be small in some cases - but at least it will be real world. Justin has a good track record of predicting the series, but maybe the point of the exercise would be to predict the game results - i.e., return games with scores. If the objective is transparency, then I would suggest both a prediction - so you have to use your model to predict team playoff lineups / minutes - and also a retrodiction, and see which gives a better match with reality.
Back to top

schtevie



Joined: 18 Apr 2005
Posts: 404
Posted: Tue Sep 01, 2009 11:14 am Post subject:

I think it is agreed that the point of the proposed retrodiction-palooza is to compare player rating systems, which aspire to represent player value. And in such a light I cannot fathom what the point is of trying to restrict the proper estimation of player values.

An estimate of a player's value applied to a following year may be inaccurate for at least three general reasons. First, the player will be older, hence might improve or deteriorate in some predictable fashion. Second, a player might be more or less injured than in the base-year estimate. And third, a player will necessarily be utilized somewhat differently on the court. Oh yeah, and fourth, the rating system may be at variance with reality.

There is no rational reason, beyond some misguided conception of even-handedness, to oblige any scheme to accept baseline values generated by another. We don't expect that replacement-level players are estimated to be of equal value by Win Shares, Wins Produced, APM, or whatever, so why oblige a common estimate? This would only serve to debase all results for the sake of "competition". Similarly for aging curves.

Now, it may come to pass that there isn't much variance in these matters. (Though I note that for 2007-08, there was a distinct difference in the relationship between team retrodiction residuals and minute-weighted average team age between Dave Berri's data and Steve's 2006-07 APM data.) But one shouldn't presume. Let's have the results generated in an internally consistent manner, then see what's up. Apples-to-apples comparisons will be easy. Let's not stymie progress.

P.S. One additional suggestion, in line with previous commentary. As a separate "competition", perhaps a list of 2008-09 teams which had a threshold number of minutes played by players who had switched teams from the year before could be agreed upon in advance. Accuracy in predicting such results is arguably of greater value than better predicting the status quo.