The 'effect of being up X'

DSMok1 · Post by **DSMok1** » Mon Feb 24, 2014 2:56 pm

xkonk wrote:Some of the patterns in the data look extremely similar - the bounces from -2 to 1, the dip at 10, etc. I can see the values aren't exactly the same, but is this maybe more similar than we should expect? I know that's a vague question, but I would have guessed that you'd see something a little different just from cutting the sample in half.

With that sample size, I'd expect the split half correlation to be pretty high.
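A rough simulation can illustrate why the split-half correlation should be high when each half still has many possessions per score state. This is only a sketch under stated assumptions: the linear slope of -0.35 points per 100 possessions per point of lead is loosely based on the "3.5 at up 10" figure quoted later in the thread, and the per-possession standard deviation (115 points per 100 possessions) and the sample sizes are made-up values, not taken from J.E.'s data.

```python
import numpy as np

def split_half_correlation(n_per_margin, rng, slope=-0.35, sd_per_100=115.0):
    """Correlate two independently estimated 'effect of being up X' curves.

    n_per_margin: total possessions observed at each margin (assumed equal).
    slope, sd_per_100: assumed values, for illustration only.
    """
    margins = np.arange(-25, 26)
    true_curve = slope * margins
    # Each half sees n/2 possessions, so its estimate has this standard error.
    se = sd_per_100 / np.sqrt(n_per_margin / 2)
    half_a = true_curve + rng.normal(0.0, se, size=margins.size)
    half_b = true_curve + rng.normal(0.0, se, size=margins.size)
    return np.corrcoef(half_a, half_b)[0, 1]

rng = np.random.default_rng(0)
large_sample = split_half_correlation(40_000, rng)  # many possessions per margin
small_sample = split_half_correlation(400, rng)     # few possessions per margin
print(large_sample, small_sample)
```

Under these assumptions the two halves track each other closely when the sample is large and much less so when it is small, which is the intuition behind expecting similar-looking patterns in both halves.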

schtevie · Post by **schtevie** » Mon Feb 24, 2014 4:37 pm

Jeremias, a couple of preliminary questions:

(1) You say you entered binary variables for the range +/- 57, but the graphs only show +/- 25. Why the omission of the extremes?

(2) If I am understanding things correctly, the regression coefficients (shown in the graph) are the possession-weighted average of (away) +/- line-up performance at each starting point differential. Might there be any relevant econometric issues arising from lumping line-up times of different duration?

nbo2 · Post by **nbo2** » Mon Feb 24, 2014 4:48 pm

schtevie wrote:
(2) If I am understanding things correctly, the regression coefficients (shown in the graph) are the possession-weighted average of (away) +/- line-up performance at each starting point differential. Might there be any relevant econometric issues arising from lumping line-up times of different duration?

I agree with schtevie: shouldn't it ideally be possession by possession?

Correct me if I'm wrong, but wouldn't the weighted average not take into account whether a lineup combination was up 2 for a stretch of possessions, vs. up 2, then up 4, then up 6, then up 4 and back to 2 again? Couldn't these result in different values?

bbstats · Post by **bbstats** » Mon Feb 24, 2014 5:11 pm

J.E. wrote:
bbstats wrote:Have you considered using this as another RAPM variable (if it's linear)? That would go a long way in helping adjust player values for garbage time.
Yes. This is already a variable in my current RAPM calculation. I still need to do some more research to find out whether it's a good idea to give less weight to garbage time possessions.

Kind of confused here... are you or aren't you?

If "current point margin" is already a variable, then I'd say you are weighting garbage time.

J.E. · Post by **J.E.** » Mon Feb 24, 2014 5:25 pm

schtevie wrote:Why the omission of the extremes?

Sample size is small for the extremes, so the coefficients can be a) all over the place and b) very large, which distorts the graph scale and makes it harder to read values off the graph.

(2) If I am understanding things correctly, the regression coefficients (shown in the graph) are the possession-weighted average of (away) +/- line-up performance at each starting point differential.

The coefficients are not possession-weighted, and they are not an average either.

What I'm doing is I'm adding binary variables for "point differential_before_the_possession_takes_place" to the regression. These binary variables work in the same way as the RAPM player binary variables. If you will, 'Player_UP_X' becomes a 6th man in the regression (just like I do things with coaches)
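A minimal sketch of what such a design matrix might look like, assuming one row per possession, +1/-1 indicators for the offensive and defensive players, and one extra 0/1 column per pre-possession point differential. This is not J.E.'s actual code: the player ids, the three score states, the toy possessions, the +1/-1 encoding, and the ridge penalty are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy setup: 10 players (ids 0-9) and three score states.
N_PLAYERS = 10
MARGINS = [-2, 0, 2]  # offense's point differential before the possession

def design_row(offense, defense, margin):
    """Build one regression row for a single possession."""
    row = np.zeros(N_PLAYERS + len(MARGINS))
    row[list(offense)] = 1.0    # the five offensive players
    row[list(defense)] = -1.0   # the five defenders
    row[N_PLAYERS + MARGINS.index(margin)] = 1.0  # the 'Player_UP_X' dummy
    return row

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X'X + lam*I) b = X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

team_a, team_b = (0, 1, 2, 3, 4), (5, 6, 7, 8, 9)
# (offense, defense, margin before possession, points scored): made-up data
possessions = [
    (team_a, team_b, 0, 2),
    (team_b, team_a, -2, 0),
    (team_a, team_b, 2, 0),
    (team_b, team_a, -2, 2),
]
X = np.array([design_row(o, d, m) for o, d, m, _ in possessions])
y = np.array([pts for *_, pts in possessions], dtype=float)
coef = ridge_fit(X, y, lam=1.0)
up_x_effects = dict(zip(MARGINS, coef[N_PLAYERS:]))  # one coefficient per score state
```

Each possession switches on exactly one score-state column, so the dummy behaves like an extra "player" who is on the court whenever the offense starts the possession at that margin.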

shouldn't it ideally be possession by possession?

Everything is possession by possession

If "current point margin" is already a variable, then I'd say you are weighting garbage time.

In regression analysis you can give different weights to different observations (see Wikipedia).
I'm not doing that at the moment.
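To make the distinction concrete: adding a score-margin variable changes the columns of the regression, whereas weighting changes how much each row (possession) counts. A minimal sketch of observation-weighted ridge regression follows. The function name and the example weights are hypothetical, and, as stated above, this is not something the current model does.

```python
import numpy as np

def weighted_ridge_fit(X, y, weights, lam):
    """Minimise sum_i w_i * (y_i - x_i . b)^2 + lam * ||b||^2."""
    w = np.asarray(weights, dtype=float)
    XtW = X.T * w  # scales observation i (column i of X.T) by w_i
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)

# Toy data: the third possession is treated as "garbage time" and given weight 0.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 10.0])
b_downweighted = weighted_ridge_fit(X, y, weights=[1.0, 1.0, 0.0], lam=1.0)
b_dropped = weighted_ridge_fit(X[:2], y[:2], weights=[1.0, 1.0], lam=1.0)
```

A weight of 0 is equivalent to dropping the possession entirely; weights between 0 and 1 would only reduce its influence.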

talkingpractice · Post by **talkingpractice** » Mon Feb 24, 2014 6:15 pm

If it's at all helpful or interesting, we've (ofc) tried out various ways of operationalizing a garbage time cutoff of some sort in our RAPM/similar models, and always find that any benefit (to out of sample predictive power) is slim to none. This is probably due to the shrinking sample sizes from doing so, I assume.

J.E. · Post by **J.E.** » Mon Feb 24, 2014 6:28 pm

talkingpractice wrote:If it's at all helpful or interesting, we've (ofc) tried out various ways of operationalizing a garbage time cutoff of some sort in our RAPM/similar models, and always find that any benefit (to out of sample predictive power) is slim to none. This is probably due to the shrinking sample sizes from doing so, I assume.

Yeah that sure helps. Saves me some time. Thanks for the info

DSMok1 · Post by **DSMok1** » Mon Feb 24, 2014 6:35 pm

J.E. wrote:
talkingpractice wrote:If it's at all helpful or interesting, we've (ofc) tried out various ways of operationalizing a garbage time cutoff of some sort in our RAPM/similar models, and always find that any benefit (to out of sample predictive power) is slim to none. This is probably due to the shrinking sample sizes from doing so, I assume.
Yeah that sure helps. Saves me some time. Thanks for the info

Yes, that is interesting information. J.E.--I'd love to see the "effect of being up X" for the 1st half and the 4th quarter, overlaid. Would put to rest "garbage time" as a variable (since you're accounting for who's playing, which is the primary driver of such notions).

AcrossTheCourt · Post by **AcrossTheCourt** » Mon Feb 24, 2014 7:42 pm

J.E. wrote:
AcrossTheCourt wrote:By the way, do you use a variable or adjustment for having no rest (i.e. a game the day before for the home/away team)?
I don't, but might in the future. Although I suspect the "days rest" effects are mostly going to cancel each other out

When I looked at the issue, the "default" was having one day of rest and all other states were tested. The only significant one I found was having no rest, so it's not being canceled out.

schtevie · Post by **schtevie** » Mon Feb 24, 2014 7:53 pm

J.E. wrote:
schtevie wrote:(2) If I am understanding things correctly, the regression coefficients (shown in the graph) are the possession-weighted average of (away) +/- line-up performance at each starting point differential.
The coefficients are not possession-weighted, also not an average.

What I'm doing is I'm adding binary variables for "point differential_before_the_possession_takes_place" to the regression. These binary variables work in the same way as the RAPM player binary variables. If you will, 'Player_UP_X' becomes a 6th man in the regression (just like I do things with coaches)
shouldn't it ideally be possession by possession?
Everything is possession by possession

Please humor me here. An example: So, let's suppose we are at the beginning of a game, with the following point differentials realized at the beginning of the first four possessions, whereafter there is a lineup change: 0, +2, 0, -2. For the regression, then, the 0 dummy has two realizations, the +2 has one, and the -2 also has one. (Conversely, it is not the case that this four-possession run at the start of the game is represented by a single 0 entry.) Correct? So, this implies that the graph is actually plotting the marginal "effort" as a function of score differential (as opposed to the average).
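Taking the sequence in this example as given, the bookkeeping being asked about can be sketched as follows: each possession contributes its own row, and the dummy for its starting differential is switched on in that row. The variable names are illustrative only.

```python
from collections import Counter

# Point differential at the start of each of the four possessions in the example.
margins_at_start = [0, 2, 0, -2]

# One regression row per possession; each row activates exactly one dummy.
dummy_columns = sorted(set(margins_at_start))          # [-2, 0, 2]
rows = [[1 if m == col else 0 for col in dummy_columns]
        for m in margins_at_start]

realizations = Counter(margins_at_start)
print(realizations[0], realizations[2], realizations[-2])
```

The stint is represented by four rows, not by a single entry at the differential where the stint began.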

J.E. · Post by **J.E.** » Mon Feb 24, 2014 8:39 pm

AcrossTheCourt wrote:
J.E. wrote:
AcrossTheCourt wrote:By the way, do you use a variable or adjustment for having no rest (i.e. a game the day before for the home/away team)?
I don't, but might in the future. Although I suspect the "days rest" effects are mostly going to cancel each other out
When I looked at the issue, the "default" was having one day of rest and all other states were tested. The only significant one I found was having no rest, so it's not being canceled out.

I'm talking from a player metrics point of view. Assuming that everyone plays a more or less similar schedule (in terms of rest days) then adding in the effect of 'X days rest' to the model has no (or little) effect on the final player values, no?

So, let's suppose we are at the beginning of a game, with the following point differentials realized at the beginning of the first four possessions, whereafter there is a lineup change: 0, +2, 0, -2

That's not a possible sequence. You can't be up 2 after your opponent had the first possession of the game

schtevie · Post by **schtevie** » Mon Feb 24, 2014 9:39 pm

J.E. wrote:
So, let's suppose we are at the beginning of a game, with the following point differentials realized at the beginning of the first four possessions, whereafter there is a lineup change: 0, +2, 0, -2
That's not a possible sequence. You can't be up 2 after your opponent had the first possession of the game

Aren't possessions defined as successive turns with the ball for each team, i.e. two "half possessions"? As such, this is a possible sequence, no? The game starts 0,0 (home team, away team, say) so this is a lead of 0 at the start of the first possession. Then during the first possession the home team scores 2 and the away team doesn't score so the score is (2,0) or a lead of +2 (defined as home - away). And so on.

Are you defining a possession as what I referred to as a "half possession"? And if so, is this the same definition as displayed in the graphs, implying that with a 10 point lead, say, a player is 3.5 (per 100 half possessions) worse on offense and an additional 3.5 worse on defense?

J.E. · Post by **J.E.** » Mon Feb 24, 2014 10:16 pm

schtevie wrote:
J.E. wrote:
So, let's suppose we are at the beginning of a game, with the following point differentials realized at the beginning of the first four possessions, whereafter there is a lineup change: 0, +2, 0, -2
That's not a possible sequence. You can't be up 2 after your opponent had the first possession of the game
Aren't possessions defined as successive turns with the ball for each team

To me a game has ~188 full possessions

Are you defining a possession as what I referred to as a "half possession"? And if so, is this the same definition as displayed in the graphs, implying that with a 10 point lead, say, a player is 3.5 (per 100 half possessions) worse on offense and an additional 3.5 worse on defense?

No. It says offensive efficiency for a 5-man-unit (away), up 10, is 3.5 points per 100 poss. worse than if it were tied. How much the home offense gains by being down 10, you'll have to look up in a separate graph for the home team (haven't posted it yet; it's slightly different but tells the same overall story).

If you want to break it down to the player level and you assume(!) the effect comes in equal parts from the defense being more lazy and the offense trying harder, you'd get an effect of 3.5/10 = 0.35 per 100 poss. for each offensive player
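The arithmetic behind that per-player figure, written out as a small sketch. The 3.5 and the equal offense/defense split come from the post; the function and parameter names are hypothetical.

```python
def per_player_effect(unit_effect_per_100, offense_share=0.5, players_per_side=5):
    """Split a 5-man-unit effect between offense and defense, then per player."""
    offense_part = unit_effect_per_100 * offense_share
    return offense_part / players_per_side

effect = per_player_effect(3.5)  # half of 3.5 is 1.75, spread over 5 offensive players
print(round(effect, 2))
```

With an equal split this is the same as dividing 3.5 by all 10 players on the court; a different assumed split would change the per-player number.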

schtevie · Post by **schtevie** » Wed Feb 26, 2014 5:25 pm

Jeremias, thanks for this clarification, which is a good segue for what I hope is a general clarification/contextualization of all the recent modifications to past procedure, embodied in http://stats-for-the-nba.appspot.com/ratings/2014.html. Perhaps I am missing something, but as I understand things, your 2014 results differ from previous years' estimates in the following way:

(1) Now pure RAPM, not xRAPM (i.e. no priors, neither box-score nor previous year estimates)
(2) Aging Curve effects are incorporated.
(3) Effort Curve (Line?) effects (the topic here) are incorporated.
(4) Coaches have been added to the regression as well.
(5) RMSEs (a suggestion of standard errors) are provided.

Is that it? Assuming so, the question I have, and I am supposing that I am not alone, is what would be the independent effect of each of (or sub-group of) these factors when incorporated in the xRAPM framework?

These are my conjectures:

(1) A 2014 xRAPM would not look dissimilar in range and general characteristics to the years preceding it. As such, one could be absolutely sure that a 6'6" SF/SG wouldn't be topping the list, owing to his being the highest ranked defender in the NBA.

(2) Aging Curve effects would be very small. What the curve shows are very small annual changes (approx. 0.2 points per year up then down) for the vast majority of players (ages 22 to 33) who play the vast majority of possessions. Players playing disproportionately with much older or younger teammates would be affected most, but such results would be exceptional and still not that large.

(3) The effect of incorporating the "Effort Curve" should also be quite small, no? (At least, proportionately, in terms of its effect on the ratings of the best and worst players.) Suppose you are a LeBron James - a very good player, playing for a very good team. As a result of his and his team's efforts, he finds himself playing a lot with his team in the lead. Let's say his average possession sees him playing with a 10 point lead. So, this should lead to a 0.35 (upward?) adjustment to his rating? If this is correct, such an adjustment is not nothing (and there is also the issue of how one should interpret this factor, but that is another discussion) but it isn't much.

(4) And this leaves the Coaching factor. In another string, I expressed concerns about these estimates, and I won't repeat those arguments here. But I do have an econometric question. To what extent are the coaching estimates misleading because they pick up above/below-average player development that does not owe to coaching "intervention"? I am supposing that the supposed greatness of Scott Brooks is in significant measure an "arbitrary" subtraction of value from Kevin Durant (which, for example, didn't occur with LBJ because of the timing of the Cavs coaching changes).

(5) Finally, the question of "error of estimation". It sure would be interesting to see what the RMSEs were for xRAPM estimates, with the sequential inclusion of the above modifications.

J.E. · Post by **J.E.** » Fri Feb 28, 2014 2:49 pm

schtevie wrote:(1) Now pure RAPM, not xRAPM (i.e. no priors, neither box-score nor previous year estimates)

No priors, but more years than before, which is almost the same as using priors.

(2) Aging Curve effects are incorporated.

Yes

(3) Effort Curve (Line?) effects (the topic here) are incorporated.

Yes

(4) Coaches have been added to the regression as well.

I derive the coach ratings from one large regression over all data. When I compute the ratings for this season the coaches aren't a variable anymore, their rating just gets added in

(5) RMSEs (a suggestion of standard errors) are provided.

Yes

Is that it?

Yes

Assuming so, the question I have, and I am supposing that I am not alone, is what would be the independent effect of each of (or sub-group of) these factors when incorporated in the xRAPM framework?

When I have the time and motivation I'll post some of the variants, e.g. how the ratings change when I delete one of the adjustments. There are other things I'd rather do, though, so it might take a while.

(2) Aging Curve effects would be very small. What the curve shows are very small annual changes (approx. 0.2 points per year up then down) for the vast majority of players (ages 22 to 33) who play the vast majority of possessions. Players playing disproportionately with much older or younger teammates would be affected most, but such results would be exceptional and still not that large.

Remember that this APM variant spans 3 seasons. You are correct that 'playing disproportionately with much older or younger teammates would be affected most' and yes, the effect is definitely not gigantic.

(3) The effect of incorporating the "Effort Curve" should also be quite small, no? (At least, proportionately, in terms of its effect on the ratings of the best and worst players.) Suppose you are a LeBron James - a very good player, playing for a very good team. As a result of his and his team's efforts, he finds himself playing a lot with his team in the lead. Let's say his average possession sees him playing with a 10 point lead. So, this should lead to a 0.35 (upward?) adjustment to his rating? If this is correct, such an adjustment is not nothing (and there is also the issue of how one should interpret this factor, but that is another discussion) but it isn't much.

As I said I only posted the results for influence on away offense. Home offense is a similar picture, so I guess the adjustment for certain players, per 200 possessions (assuming a game has ~190), could be more like 0.7 when assuming a 10-point lead on average (which might be a little high, even for LeBron). I'm not entirely sure though, need to run the numbers

(4) And this leaves the Coaching factor. In another string, I expressed concerns about these estimates, and I won't repeat those arguments here. But I do have an econometric question. To what extent are the coaching estimates misleading because they pick up above/below-average player development that does not owe to coaching "intervention"? I am supposing that the supposed greatness of Scott Brooks is in significant measure an "arbitrary" subtraction of value from Kevin Durant (which, for example, didn't occur with LBJ because of the timing of the Cavs coaching changes).

I think you can just divide the coach rating by 5 to get the impact of the coach's rating on the player rating. Obviously, if a player has played for more than one coach, you'd have to build a weighted mean of the coach ratings.
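A minimal sketch of that calculation. Weighting each coach by the possessions the player logged under him is an assumption (the post does not say what the weights should be), and the ratings and possession counts below are made up.

```python
def coach_adjustment(stints, players_on_court=5):
    """stints: list of (coach_rating, possessions_played_under_that_coach).

    Returns the possession-weighted mean coach rating divided by 5,
    i.e. the rough per-player share of the coach effect.
    """
    total_possessions = sum(poss for _, poss in stints)
    weighted_mean = sum(rating * poss for rating, poss in stints) / total_possessions
    return weighted_mean / players_on_court

# Hypothetical player: 3000 possessions under a +2.0 coach, 1000 under a -1.0 coach.
adjustment = coach_adjustment([(2.0, 3000), (-1.0, 1000)])
print(adjustment)
```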
