xkonk wrote: Some of the patterns in the data look extremely similar - the bounces from -2 to 1, the dip at 10, etc. I can see the values aren't exactly the same, but is this maybe more similar than we should expect? I know that's a vague question, but I would have guessed that you'd see something a little different just from cutting the sample in half.
With that sample size, I'd expect the split-half correlation to be pretty high.
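For readers who want to try the split-half check themselves, here is a minimal sketch. It is not the calculation behind the graphs: the data is synthetic, the per-margin mean stands in for the regression coefficients, and the names `per_margin_means` and `split_half_correlation` as well as the assumed linear effect are made up for illustration.

```python
import numpy as np

def per_margin_means(margin, points, levels):
    """Mean points per possession at each score margin (stand-in for the coefficients)."""
    return np.array([points[margin == m].mean() for m in levels])

def split_half_correlation(margin, points, rng):
    """Randomly split the possessions in two and correlate the per-margin estimates."""
    idx = rng.permutation(len(margin))
    mid = len(idx) // 2
    levels = np.unique(margin)
    a = per_margin_means(margin[idx[:mid]], points[idx[:mid]], levels)
    b = per_margin_means(margin[idx[mid:]], points[idx[mid:]], levels)
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
n = 1_000_000                                    # synthetic possessions (assumption)
margin = rng.integers(-10, 11, size=n)           # score margin before each possession
effect = -0.0035 * margin                        # assumed linear effect, illustrative only
points = 1.05 + effect + rng.normal(0, 1.2, size=n)

r = split_half_correlation(margin, points, rng)
print(f"split-half correlation: {r:.2f}")
```

With fewer possessions the per-margin estimates get noisier and the correlation between halves drops, which is the intuition behind the reply above.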
The 'effect of being up X'
Re: The 'effect of being up X'
Jeremias, a couple of preliminary questions:
(1) You say you entered binary variables for the range +/- 57, but the graphs only show +/- 25. Why the omission of the extremes?
(2) If I am understanding things correctly, the regression coefficients (shown in the graph) are the possession-weighted average of (away) +/- line-up performance at each starting point differential. Might there be any relevant econometric issues arising from lumping line-up times of different duration?
Re: The 'effect of being up X'
schtevie wrote: (2) If I am understanding things correctly, the regression coefficients (shown in the graph) are the possession-weighted average of (away) +/- line-up performance at each starting point differential. Might there be any relevant econometric issues arising from lumping line-up times of different duration?
I agree with schtevie: shouldn't it ideally be possession by possession?
Correct me if I'm wrong, but wouldn't the weighted average not take into account whether a lineup combination was up 2 for a stretch of possessions, vs. up 2, then up 4, then up 6, then up 4 and back to 2 again? Couldn't these result in different values?
Re: The 'effect of being up X'
bbstats wrote: Have you considered using this as another RAPM variable (if it's linear)? That would go a long way in helping adjust player values for garbage time.
J.E. wrote: Yes. This is already a variable in my current RAPM calculation. I still need to do some more research to find out whether it's a good idea to give less weight to garbage time possessions
Kind of confused here... are you or aren't you?

If "current point margin" is already a variable, then I'd say you are weighing garbage time.
Re: The 'effect of being up X'
schtevie wrote: Why the omission of the extremes?
Sample size is small for the extremes and thus coefficients can be a) all over the place and b) very large, messing with the graph scale, thus making it harder to spot values in the graph.
Quote: (2) If I am understanding things correctly, the regression coefficients (shown in the graph) are the possession-weighted average of (away) +/- line-up performance at each starting point differential.
The coefficients are not possession-weighted, also not an average.
What I'm doing is I'm adding binary variables for "point differential_before_the_possession_takes_place" to the regression. These binary variables work in the same way as the RAPM player binary variables. If you will, 'Player_UP_X' becomes a 6th man in the regression (just like I do things with coaches).
Quote: shouldn't it ideally be possession by possession?
Everything is possession by possession.
Quote: If "current point margin" is already a variable, then I'd say you are weighing garbage time.
In regression analysis you can give different weights to different observations. See Wikipedia.
I'm not doing that at the moment.
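To make the setup described in this post concrete, here is a rough sketch of a possession-level design matrix with player indicators plus one 'up X' indicator, fitted with ridge regression and optional observation weights. This is not J.E.'s code: `build_design`, `ridge`, the penalty value, and the toy data are assumptions for illustration, and a real RAPM setup would carry both the offensive and defensive players for each possession.

```python
import numpy as np

def build_design(lineups, margins, n_players, margin_levels):
    """One row per possession: player indicators plus one 'up X' indicator."""
    level_col = {m: n_players + j for j, m in enumerate(margin_levels)}
    X = np.zeros((len(lineups), n_players + len(margin_levels)))
    for row, (players, margin) in enumerate(zip(lineups, margins)):
        X[row, list(players)] = 1.0        # players on the floor for this possession
        X[row, level_col[margin]] = 1.0    # margin before the possession takes place
    return X

def ridge(X, y, lam, weights=None):
    """Closed-form ridge; `weights` optionally gives each possession its own weight."""
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw + lam * np.eye(X.shape[1]), Xw.T @ y)

# Hypothetical toy data: 4 possessions, 6 players, margins in {-2, 0, 2}
lineups = [(0, 1, 2), (0, 1, 2), (3, 4, 5), (3, 4, 5)]
margins = [0, 2, 0, -2]
points = np.array([2.0, 0.0, 3.0, 0.0])

X = build_design(lineups, margins, n_players=6, margin_levels=[-2, 0, 2])
beta = ridge(X, points, lam=1.0)                                  # unweighted
beta_down = ridge(X, points, lam=1.0, weights=[1, 1, 1, 0.5])     # last possession downweighted
```

The last three entries of `beta` are the 'up X' coefficients. Having the margin as a variable and giving possessions different weights are separate choices: the first adds columns, the second changes `weights`.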
Re: The 'effect of being up X'
If it's at all helpful or interesting, we've (ofc) tried out various ways of operationalizing a garbage time cutoff of some sort in our RAPM/similar models, and always find that any benefit (to out of sample predictive power) is slim to none. This is probably due to the shrinking sample sizes from doing so, I assume.
Re: The 'effect of being up X'
talkingpractice wrote: If it's at all helpful or interesting, we've (ofc) tried out various ways of operationalizing a garbage time cutoff of some sort in our RAPM/similar models, and always find that any benefit (to out of sample predictive power) is slim to none. This is probably due to the shrinking sample sizes from doing so, I assume.
Yeah that sure helps. Saves me some time. Thanks for the info
Re: The 'effect of being up X'
talkingpractice wrote: If it's at all helpful or interesting, we've (ofc) tried out various ways of operationalizing a garbage time cutoff of some sort in our RAPM/similar models, and always find that any benefit (to out of sample predictive power) is slim to none. This is probably due to the shrinking sample sizes from doing so, I assume.
J.E. wrote: Yeah that sure helps. Saves me some time. Thanks for the info
Yes, that is interesting information. J.E., I'd love to see the "effect of being up X" for the 1st half and the 4th quarter, overlaid. Would put to rest "garbage time" as a variable (since you're accounting for who's playing, which is the primary driver of such notions).
Re: The 'effect of being up X'
AcrossTheCourt wrote: By the way, do you use a variable or adjustment for having no rest (i.e. a game the day before for the home/away team)?
J.E. wrote: I don't, but might in the future. Although I suspect the "days rest" effects are mostly going to cancel each other out
When I looked at the issue, the "default" was having one day of rest and all other states were tested. The only significant one I found was having no rest, so it's not being canceled out.
Re: The 'effect of being up X'
schtevie wrote: (2) If I am understanding things correctly, the regression coefficients (shown in the graph) are the possession-weighted average of (away) +/- line-up performance at each starting point differential.
J.E. wrote: The coefficients are not possession-weighted, also not an average. What I'm doing is I'm adding binary variables for "point differential_before_the_possession_takes_place" to the regression. These binary variables work in the same way as the RAPM player binary variables. If you will, 'Player_UP_X' becomes a 6th man in the regression (just like I do things with coaches)
Quote: shouldn't it ideally be possession by possession?
J.E. wrote: Everything is possession by possession
Please humor me here. An example: So, let's suppose we are at the beginning of a game, with the following point differentials realized at the beginning of the first four possessions, whereafter there is a lineup change: 0, +2, 0, -2. For the regression then, it is the case that the 0 dummy has two realizations, the +2 has one, and the -2 also has one. (Conversely, it is not the case that this four possession run at the start of the game is represented by a single 0 entry.) Correct? So, this implies that the graph is actually plotting the marginal "effort" as a function of score differential (as opposed to the average).
Re: The 'effect of being up X'
AcrossTheCourt wrote: By the way, do you use a variable or adjustment for having no rest (i.e. a game the day before for the home/away team)?
J.E. wrote: I don't, but might in the future. Although I suspect the "days rest" effects are mostly going to cancel each other out
AcrossTheCourt wrote: When I looked at the issue, the "default" was having one day of rest and all other states were tested. The only significant one I found was having no rest, so it's not being canceled out.
I'm talking from a player metrics point of view. Assuming that everyone plays a more or less similar schedule (in terms of rest days) then adding in the effect of 'X days rest' to the model has no (or little) effect on the final player values, no?
Quote: So, let's suppose we are at the beginning of a game, with the following point differentials realized at the beginning of the first four possessions, whereafter there is a lineup change: 0, +2, 0, -2
That's not a possible sequence. You can't be up 2 after your opponent had the first possession of the game
Re: The 'effect of being up X'
Quote: So, let's suppose we are at the beginning of a game, with the following point differentials realized at the beginning of the first four possessions, whereafter there is a lineup change: 0, +2, 0, -2
J.E. wrote: That's not a possible sequence. You can't be up 2 after your opponent had the first possession of the game
Aren't possessions defined as successive turns with the ball for each team, i.e. two "half possessions"? As such, this is a possible sequence, no? The game starts 0,0 (home team, away team, say) so this is a lead of 0 at the start of the first possession. Then during the first possession the home team scores 2 and the away team doesn't score so the score is (2,0) or a lead of +2 (defined as home - away). And so on.
Are you defining a possession as what I referred to as a "half possession"? And if so, is this the same definition as displayed in the graphs, implying that with a 10 point lead, say, a player is 3.5 (per 100 half possessions) worse on offense and an additional 3.5 worse on defense?
Re: The 'effect of being up X'
Quote: So, let's suppose we are at the beginning of a game, with the following point differentials realized at the beginning of the first four possessions, whereafter there is a lineup change: 0, +2, 0, -2
J.E. wrote: That's not a possible sequence. You can't be up 2 after your opponent had the first possession of the game
schtevie wrote: Aren't possessions defined as successive turns with the ball for each team
To me a game has ~188 full possessions
Quote: Are you defining a possession as what I referred to as a "half possession"? And if so, is this the same definition as displayed in the graphs, implying that with a 10 point lead, say, a player is 3.5 (per 100 half possessions) worse on offense and an additional 3.5 worse on defense?
No. It says offensive efficiency for a 5-man-unit (away), up 10, is 3.5 points per 100 poss. worse than if it were tied. How much the home offense gains by being down 10... you'll have to look that up in a separate graph for the home team (haven't posted it yet; it's slightly different but tells the same overall story).
If you want to break it down to the player level and you assume(!) the effect comes in equal parts from the defense being more lazy and the offense trying harder, you'd get an effect of 3.5/10 = 0.35 per 100 poss. for each offensive player
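The arithmetic behind the 3.5/10 figure, spelled out as a small sketch. The 50/50 split between offense and defense is the assumption stated in the post, not something estimated from data, and the variable names are illustrative.

```python
team_effect = 3.5        # points per 100 possessions, away offense up 10 (from the graph)
offense_share = 0.5      # assumed: half of the effect comes from the offense, half from the defense
players_per_side = 5

# Half of the effect belongs to the offense, spread evenly over its five players
per_offensive_player = team_effect * offense_share / players_per_side
print(round(per_offensive_player, 2))  # 0.35
```

A different split between offense and defense changes `offense_share` and therefore the per-player number.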
Re: The 'effect of being up X'
Jeremias, thanks for this clarification, which is a good segue to what I hope is a general clarification/contextualization of all the recent modifications to past procedure, embodied in http://stats-for-the-nba.appspot.com/ratings/2014.html. Perhaps I am missing something, but as I understand things, your 2014 results differ from previous years' estimates in the following way:
(1) Now pure RAPM, not xRAPM (i.e. no priors, neither box-score nor previous year estimates)
(2) Aging Curve effects are incorporated.
(3) Effort Curve (Line?) effects (the topic here) are incorporated.
(4) Coaches have been added to the regression as well.
(5) RMSEs (a suggestion of standard errors) are provided.
Is that it? Assuming so, the question I have, and I am supposing that I am not alone, is what would be the independent effect of each of (or sub-group of) these factors when incorporated in the xRAPM framework?
These are my conjectures:
(1) A 2014 xRAPM would not look dissimilar in range and general characteristics to the years preceding it. As such, one could be absolutely sure that a 6'6" SF/SG wouldn't be topping the list, owing to his being the highest ranked defender in the NBA.
(2) Aging Curve effects would be very small. What the curve shows are very small annual changes (approx. 0.2 points per year up then down) for the vast majority of players (ages 22 to 33) who play the vast majority of possessions. Players playing disproportionately with much older or younger teammates would be affected most, but such results would be exceptional and still not that large.
(3) The effect of incorporating the "Effort Curve" too should also be quite small, no? (At least, proportionately, in terms of its effect on the ratings of the best and worst players.) Suppose you are a LeBron James - a very good player, playing for a very good team. As a result of his and team's efforts, he finds himself playing a lot with his team in the lead. Let's say his average possession sees him playing with a 10 point lead. So, this should lead to a 0.35 (upward?) adjustment to his rating? If this is correct, such an adjustment is not nothing (and there is also the issue of how one should interpret this factor, but that is another discussion) but it isn't much.
(4) And this leaves the Coaching factor. In another string, I expressed concerns about these estimates, and I won't repeat those arguments here. But I do have an econometric question. To what extent are the coaching estimates misleading for their picking up above/below-average player development not owing to coaching "intervention"? I am supposing that the supposed greatness of Scott Brooks is in significant measure an "arbitrary" subtraction of value from Kevin Durant (what for example didn't occur with LBJ for the timing of the Cavs coaching changes).
(5) Finally, the question of "error of estimation". It sure would be interesting to see what the RMSEs were for xRAPM estimates, with the sequential inclusion of the above modifications.
Re: The 'effect of being up X'
schtevie wrote: (1) Now pure RAPM, not xRAPM (i.e. no priors, neither box-score nor previous year estimates)
No priors, but more years than before, which is essentially almost the same as using priors
Quote: (2) Aging Curve effects are incorporated.
Yes
Quote: (3) Effort Curve (Line?) effects (the topic here) are incorporated.
Yes
Quote: (4) Coaches have been added to the regression as well.
I derive the coach ratings from one large regression over all data. When I compute the ratings for this season the coaches aren't a variable anymore, their rating just gets added in
Quote: (5) RMSEs (a suggestion of standard errors) are provided.
Yes
Quote: Is that it?
Yes
Quote: Assuming so, the question I have, and I am supposing that I am not alone, is what would be the independent effect of each of (or sub-group of) these factors when incorporated in the xRAPM framework?
When I have the time and motivation I'll post some of the variants, e.g. how the ratings change when I delete one of the adjustments. There are other things I'd rather do, though, so it might take a while
Quote: (2) Aging Curve effects would be very small. What the curve shows are very small annual changes (approx. 0.2 points per year up then down) for the vast majority of players (ages 22 to 33) who play the vast majority of possessions. Players playing disproportionately with much older or younger teammates would be affected most, but such results would be exceptional and still not that large.
Remember that this APM variant spans over 3 seasons. You are correct that 'playing disproportionately with much older or younger teammates would be affected most' and yes, the effect is definitely not gigantic
Quote: (3) The effect of incorporating the "Effort Curve" too should also be quite small, no? (At least, proportionately, in terms of its effect on the ratings of the best and worst players.) Suppose you are a LeBron James - a very good player, playing for a very good team. As a result of his and team's efforts, he finds himself playing a lot with his team in the lead. Let's say his average possession sees him playing with a 10 point lead. So, this should lead to a 0.35 (upward?) adjustment to his rating? If this is correct, such an adjustment is not nothing (and there is also the issue of how one should interpret this factor, but that is another discussion) but it isn't much.
As I said I only posted the results for influence on away offense. Home offense is a similar picture, so I guess the adjustment for certain players, per 200 possessions (assuming a game has ~190), could be more like 0.7 when assuming a 10-point lead on average (which might be a little high, even for LeBron). I'm not entirely sure though, need to run the numbers
Quote: (4) And this leaves the Coaching factor. In another string, I expressed concerns about these estimates, and I won't repeat those arguments here. But I do have an econometric question. To what extent are the coaching estimates misleading for their picking up above/below-average player development not owing to coaching "intervention"? I am supposing that the supposed greatness of Scott Brooks is in significant measure an "arbitrary" subtraction of value from Kevin Durant (what for example didn't occur with LBJ for the timing of the Cavs coaching changes).
I think you can just divide the coach rating by 5 to get the impact of the coach's rating on the player rating. Obviously, if a player has played for more than one coach, you'd have to build a weighted mean for the coach rating
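A small sketch of the "divide by 5, weighted mean across coaches" rule of thumb. The post does not say what the weights are; weighting by possessions played under each coach is an assumption here, and `coach_adjustment` and the numbers are hypothetical.

```python
def coach_adjustment(stints):
    """Possession-weighted mean coach rating, divided by the 5 players on the floor.

    `stints` is a list of (coach_rating, possessions_played_under_that_coach).
    """
    total_poss = sum(poss for _, poss in stints)
    if total_poss == 0:
        raise ValueError("player has no possessions")
    weighted_mean = sum(rating * poss for rating, poss in stints) / total_poss
    return weighted_mean / 5

# Hypothetical player: 3000 possessions under a +2.0 coach, 1000 under a -1.0 coach
adj = coach_adjustment([(2.0, 3000), (-1.0, 1000)])
print(adj)  # 0.25
```

Here the weighted mean coach rating is (2.0 * 3000 - 1.0 * 1000) / 4000 = 1.25, and one fifth of that is attributed to the player.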