Need some analysis advice for potential assists

EvanZ · Post by **EvanZ** » Mon Jul 25, 2011 3:03 pm

Hey, everyone. I'm looking for some stats advice, and thought this might be of general interest to others.

Let me first explain the data set I am generating. I'm going through Synergy video, specifically for the following Golden State Warriors players: Ellis, Curry, Lee, Wright, Williams, Radmanovic. I'm only looking at spot-up attempts right now, as they appear to be by far the easiest plays on which to assign assists. Almost all spot-up attempts appear to be assisted (if the player makes the shot) or "potentially assisted" (if the player misses). Each of the 6 players I'm tracking had at least 100 spot-up attempts. For each attempt, I record whether the shot was made or not (1 or 0), the type of shot (2 or 3 pt), and who the passer was. The passer can be anyone on the team, although it is typically one of those 6 as well.

Typical results look like this:

Row	GameID	ShotID	Q	Shooter	Make	Type	Passer
1	PORGSW041311	1	1	30	0	3	10
2	PORGSW041311	2	1	55	1	3	30
3	PORGSW041311	3	1	55	0	3	30
4	PORGSW041311	4	3	1	0	3	55
5	PORGSW041311	5	3	30	1	3	1
6	PORGSW041311	6	3	1	1	3	55
7	PORGSW041311	7	3	1	0	3	30

I'm keeping track of the GameID and quarter as a reference, but for now, those won't be factors in the analysis.
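A minimal sketch of how rows in this layout could be loaded into records for later analysis. The `Shot` class, its field names, and `parse_shots` are hypothetical names chosen for illustration; the player columns are treated as opaque numeric identifiers.

```python
from dataclasses import dataclass

# The seven sample rows from the post: row index, GameID, ShotID, Q,
# Shooter, Make, Type, Passer (whitespace-separated).
RAW = """
1 PORGSW041311 1 1 30 0 3 10
2 PORGSW041311 2 1 55 1 3 30
3 PORGSW041311 3 1 55 0 3 30
4 PORGSW041311 4 3 1 0 3 55
5 PORGSW041311 5 3 30 1 3 1
6 PORGSW041311 6 3 1 1 3 55
7 PORGSW041311 7 3 1 0 3 30
"""


@dataclass(frozen=True)
class Shot:
    game_id: str
    shot_id: int
    quarter: int
    shooter: int    # player identifier as recorded in the data
    made: bool      # True if Make == 1
    shot_type: int  # 2 or 3
    passer: int     # player identifier as recorded in the data


def parse_shots(text):
    """Turn whitespace-separated rows into Shot records, dropping the row index."""
    shots = []
    for line in text.strip().splitlines():
        _row, game_id, shot_id, q, shooter, make, shot_type, passer = line.split()
        shots.append(Shot(game_id, int(shot_id), int(q), int(shooter),
                          make == "1", int(shot_type), int(passer)))
    return shots


shots = parse_shots(RAW)
print(len(shots), "attempts,", sum(s.made for s in shots), "made")
```

Keeping GameID and quarter on each record costs nothing and leaves the door open to using them as factors later.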

So, my first thought is that I have a dichotomous dependent variable (Make) and three categorical predictors (Shooter, Passer, and Type). I know that ANOVA is usually run with a metric dependent variable, but would it also make sense to use it with one that is dichotomous? Would a logistic regression be useful here? Also, would it be interesting to model interactions between passers/shooters/types?

I think it might be really useful to bring in Bayesian tools to this problem, but I'm still in the learning phase. The sample sizes in all but a handful of cases are fairly small, which is why I mention this.

Well, I thought I would put this problem out there, and see if you guys have thoughts. Obviously, in the meantime, I can do plenty of descriptive work with the data. But I think it would be more powerful with some hypothesis testing. Going forward, I'd like to look at other types of plays, especially pick and roll, post plays, and cuts to the basket.

DSMok1 · Post by **DSMok1** » Mon Jul 25, 2011 5:02 pm

I don't know much about ANOVA, but it looks like this setup would work well with a Logistic Regression in R.

schtevie · Post by **schtevie** » Mon Jul 25, 2011 5:09 pm

I am curious as to what questions you are hoping to answer by taking this approach.

EvanZ · Post by **EvanZ** » Mon Jul 25, 2011 5:39 pm

schtevie wrote:I am curious as to what questions you are hoping to answer by taking this approach.

Well, the most important question here is whether players shoot the ball better (worse) when they receive it from certain passers. Likewise, do certain players improve the shooting efficiency of their teammates by their passing? Is it a specific effect due to the interaction of certain pairs of players, or do some players have a real effect (positive or negative), in general, on their teammates? Are certain players better at 2-pt passing or 3-pt passing? And so on...

I'll give a (perhaps, surprising) example to illustrate the point (this is with 41 games analyzed so far):

Dorell Wright shoots 44.9% (22/49) on 3pters after receiving a pass from Monta Ellis, but only 32.8% (19/58) from David Lee, and 18.6% (8/43) from Stephen Curry.

Does Dorell Wright shoot the ball better when Monta passes it to him?
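One way to put a number on that question is an exact test on the 2x2 table of makes and misses for two passers. The sketch below is a hand-rolled two-sided Fisher exact test applied to the Ellis and Curry counts quoted above. The function name is hypothetical, and the choice of Fisher's test is an assumption for illustration, not something proposed in the thread.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so ties with the observed table are counted.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Wright's 3-pt attempts: (makes, misses) off Ellis and off Curry passes.
p_value = fisher_exact_two_sided(22, 49 - 22, 8, 43 - 8)
print(f"Ellis vs Curry, two-sided p = {p_value:.4f}")
```

Note that this compares the most extreme pair out of many possible passer-shooter pairs, so a small p-value here overstates the evidence unless some correction for multiple comparisons is made.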

schtevie · Post by **schtevie** » Mon Jul 25, 2011 7:34 pm

Evan, I get that. But without controlling for the opposing team, its particular line-up, the own team line-up (passer excluded), the particular offensive set (or failure thereof), the time remaining on the shot clock, what can one really expect, never mind the small sample sizes?

mtamada · Post by **mtamada** » Mon Jul 25, 2011 7:40 pm

DSMok1 wrote:I don't know much about ANOVA, but it looks like this setup would work well with a Logistic Regression in R.

Yes, although since he's looking at six dependent variables, there can be a gain from using a model which explicitly takes that into account, i.e. a simultaneous equations version of logistic regression. Also, most of the explanatory variables will be the same across the equations, e.g. Stephen Curry will likely appear as an explanatory variable in all of the equations (except his own, obviously, as well as those of any players he doesn't pass to). So, six equations with some variables appearing across equations: a classic simultaneous equations situation.

I haven't actually seen a simultaneous equations version of logistic regression but I'm sure it's been done.
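For reference, a single one of those six equations can be sketched in a few lines. The example below fits a logistic regression of make/miss on passer for Dorell Wright's 3-pt attempts, using the counts EvanZ posted and plain gradient ascent in place of a statistics package. The reference level, learning rate, and step count are assumptions chosen for illustration, and this is not the simultaneous-equations version.

```python
import math

# Wright's 3-pt spot-up attempts by passer: (makes, attempts).
COUNTS = {"Ellis": (22, 49), "Lee": (19, 58), "Curry": (8, 43)}
REFERENCE = "Ellis"  # baseline level for the dummy coding (arbitrary choice)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(counts, reference, lr=0.01, steps=5000):
    """Maximise the binomial log-likelihood by gradient ascent.

    Model: logit(p) = intercept + coefficient for the passer (0 for the
    reference passer).
    """
    beta = {"intercept": 0.0}
    beta.update({p: 0.0 for p in counts if p != reference})
    for _ in range(steps):
        grad = {k: 0.0 for k in beta}
        for passer, (makes, attempts) in counts.items():
            z = beta["intercept"] + beta.get(passer, 0.0)
            resid = makes - attempts * sigmoid(z)  # observed minus expected makes
            grad["intercept"] += resid
            if passer != reference:
                grad[passer] += resid
        for k in beta:
            beta[k] += lr * grad[k]
    return beta


beta = fit_logistic(COUNTS, REFERENCE)
for name, coef in beta.items():
    print(f"{name:>9}: {coef:+.3f}")
```

The Lee and Curry coefficients are log-odds differences relative to Ellis. With one parameter per passer the model is saturated, so the fitted probabilities simply reproduce the observed percentages; the regression only becomes informative once shooters are pooled or further predictors are added.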

There are also log-linear models. These are not to be confused with what econometricians call a certain functional form, but instead are a sort of super version of crosstab tables, allowing for more than two dimensions. Here we'd have a 6 x 6 x 2 "table" (or "cube"); six passers, six shooters, two possible outcomes. A variety of hypotheses can be tested, e.g. testing for interaction terms (maybe Curry shoots well and Ellis's pass recipients shoot poorly, but Curry shoots extra well when Ellis passes to him unlike the other recipients). However I don't know if log-linear models can be set up to have an explicit dependent variable, or to take advantage of that dependency. Also I've only read about these models, haven't actually ever used one. Shelby Haberman was one of the statisticians who developed loglinear models ... actually peeking at one of the first google hits, it looks like one of his data sets was about cancer survival measured as a binary outcome, so there's a binary dependent variable right there.
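The simplest slice of that cube is a two-way table: one shooter, passer by outcome. For a two-way table, the deviance of the independence log-linear model against the saturated model is the likelihood-ratio (G) statistic, which the sketch below computes for Wright's 3-pt attempts using the counts from EvanZ's example. Fitting the full 6 x 6 x 2 table would need iterative fitting or a statistics package and is not attempted here; the function name is hypothetical.

```python
import math

# Passer -> (makes, misses) for Wright's 3-pt spot-up attempts.
TABLE = {"Ellis": (22, 27), "Lee": (19, 39), "Curry": (8, 35)}


def g_test_independence(table):
    """Likelihood-ratio statistic for independence of rows and columns."""
    rows = list(table.values())
    n_cols = len(rows[0])
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(r[j] for r in rows) for j in range(n_cols)]
    total = sum(row_tot)
    g = 0.0
    for i, r in enumerate(rows):
        for j, obs in enumerate(r):
            expected = row_tot[i] * col_tot[j] / total
            if obs > 0:  # an empty cell contributes zero
                g += 2.0 * obs * math.log(obs / expected)
    df = (len(rows) - 1) * (n_cols - 1)
    return g, df


g, df = g_test_independence(TABLE)
# With df == 2 the chi-square tail probability has the closed form exp(-g / 2).
p_value = math.exp(-g / 2)
print(f"G = {g:.2f} on {df} df, p = {p_value:.3f}")
```

This asks whether Wright's make rate differs across the three passers at all, which is a fairer question than comparing only the best and worst passer.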

EvanZ · Post by **EvanZ** » Mon Jul 25, 2011 8:00 pm

schtevie wrote:Evan, I get that. But without controlling for the opposing team, its particular line-up, the own team line-up (passer excluded), the particular offensive set (or failure thereof), the time remaining on the shot clock, what can one really expect, never mind the small sample sizes?

Well, these are all real concerns of course. I think by focusing on individual play types, in this case spot-up attempts, it certainly helps. Out of the 500 or so plays I have analyzed so far, surprisingly, only a handful (definitely less than 10) have involved beating the shot clock. It's pretty noticeable when that happens, and I usually disregard those plays, if it appears to be a desperation shot/pass situation.

Even given all the issues, I think there is something to be learned here, especially if one can connect the results to some scouting. For example, it seems to me that the reason Dorell shoots so much better off of Monta's passes, is because they often come as the result of Monta driving and kicking it out, whereas Curry passes to Dorell more on the perimeter. It's almost more like he's just swinging the ball around, and Dorell decides to shoot it, rather than an intentional pass trying to set up Dorell, if that makes any sense. I know a player can't really "intend" the receiver to shoot or pass up a shot, but I can imagine Curry thinking to himself that Dorell shouldn't have taken the shot.

I also notice that Dorell tends to shoot a lot better when receiving the ball in front of his body rather than to either side. The former tends to occur more frequently when Monta drives, whereas predictably, when Curry passes from the side, Dorell has to turn to receive the ball and then shoot. In future iterations, I might just chart where the player receives the ball before he shoots. I'm sure there is a strong correlation there.

EvanZ · Post by **EvanZ** » Mon Jul 25, 2011 8:04 pm

mtamada wrote: These are not to be confused with what econometricians call a certain functional form, but instead are a sort of super version of crosstab tables, allowing for more than two dimensions. Here we'd have a 6 x 6 x 2 "table" (or "cube"); six passers, six shooters, two possible outcomes. A variety of hypotheses can be tested, e.g. testing for interaction terms (maybe Curry shoots well and Ellis's pass recipients shoot poorly, but Curry shoots extra well when Ellis passes to him unlike the other recipients). However I don't know if log-linear models can be set up to have an explicit dependent variable, or to take advantage of that dependency. Also I've only read about these models, haven't actually ever used one. Shelby Haberman was one of the statisticians who developed loglinear models ... actually peeking at one of the first google hits, it looks like one of his data sets was about cancer survival measured as a binary outcome, so there's a binary dependent variable right there.

Some good ideas here. Thanks.

schtevie · Post by **schtevie** » Mon Jul 25, 2011 8:20 pm

EvanZ wrote:
schtevie wrote:Evan, I get that. But without controlling for the opposing team, its particular line-up, the own team line-up (passer excluded), the particular offensive set (or failure thereof), the time remaining on the shot clock, what can one really expect, never mind the small sample sizes?
Well, these are all real concerns of course. I think by focusing on individual play types, in this case spot-up attempts, it certainly helps. Out of the 500 or so plays I have analyzed so far, surprisingly, only a handful (definitely less than 10) have involved beating the shot clock. It's pretty noticeable when that happens, and I usually disregard those plays, if it appears to be a desperation shot/pass situation.

Even given all the issues, I think there is something to be learned here, especially if one can connect the results to some scouting. For example, it seems to me that the reason Dorell shoots so much better off of Monta's passes, is because they often come as the result of Monta driving and kicking it out, whereas Curry passes to Dorell more on the perimeter. It's almost more like he's just swinging the ball around, and Dorell decides to shoot it, rather than an intentional pass trying to set up Dorell, if that makes any sense. I know a player can't really "intend" the receiver to shoot or pass up a shot, but I can imagine Curry thinking to himself that Dorell shouldn't have taken the shot.

I also notice that Dorell tends to shoot a lot better when receiving the ball in front of his body rather than to either side. The former tends to occur more frequently when Monta drives, whereas predictably, when Curry passes from the side, Dorell has to turn to receive the ball and then shoot. In future iterations, I might just chart where the player receives the ball before he shoots. I'm sure there is a strong correlation there.

There is much more to a shot clock effect (in theory, and in application too, I believe) than desperation heaves. Working backward, in expectation, the value of each potential shot attempt is worse than the one prior. And though I share your intuition about the importance of where the ball is received, defensive pressure still should matter a lot.

Whatever. More is better. Hope something cool turns up.

EvanZ · Post by **EvanZ** » Mon Jul 25, 2011 8:30 pm

schtevie wrote:
There is much more to a shot clock effect (in theory, and in application too, I believe) than desperation heaves. Working backward, in expectation, the value of each potential shot attempt is worse than the one prior. And though I share your intuition about the importance of where the ball is received, defensive pressure still should matter a lot.

Whatever. More is better. Hope something cool turns up.

Yeah, I actually set out to record the shot clock time, but then realized that not every play on Synergy showed it, so I decided not to do it. Maybe I'll revisit that in the future, though.

Bobbofitos · Post by **Bobbofitos** » Tue Jul 26, 2011 6:15 am

EvanZ wrote:
I'll give a (perhaps, surprising) example to illustrate the point (this is with 41 games analyzed so far):

Dorell Wright shoots 44.9% (22/49) on 3pters after receiving a pass from Monta Ellis, but only 32.8% (19/58) from David Lee, and 18.6% (8/43) from Stephen Curry.

Does Dorell Wright shoot the ball better when Monta passes it to him?

How certain are you that those trends will hold up next year? Although it's not impossible that a Monta Ellis pass is better than a David Lee pass, the sample for each passer to shooter is pretty low. Is this just micro analyzing small samples and drawing conclusions?

That said, this is a case where the video does help inform you, since you probably can "see" good passes vs bad passes...

EvanZ · Post by **EvanZ** » Tue Jul 26, 2011 11:03 am

Bobbofitos wrote: How certain are you that those trends will hold up next year? Although it's not impossible that a Monta Ellis pass is better than a David Lee pass, the sample for each passer to shooter is pretty low. Is this just micro analyzing small samples and drawing conclusions?

That said, this is a case where the video does help inform you, since you probably can "see" good passes vs bad passes...

The primary motivation here is to determine whether this observation is random. That's what I'm asking for help to decide. Also, I should note (again) that I've analyzed roughly half the season, so the sample size will roughly double the current total by the time I'm through.
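With samples this small, one hedged way to frame the "is it random?" question is the Bayesian route mentioned in the first post: put a Beta prior on each passer-specific make rate and ask how often one rate exceeds the other under the posterior. The sketch below assumes a flat Beta(1, 1) prior and uses Monte Carlo draws from the standard library. The function names, the prior, the number of draws, and the seed are all assumptions for illustration.

```python
import random


def posterior_params(makes, attempts, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters after observing makes out of attempts."""
    return prior_a + makes, prior_b + (attempts - makes)


def prob_first_better(first, second, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_first > rate_second) under the posteriors.

    first and second are (makes, attempts) pairs.
    """
    rng = random.Random(seed)
    a1, b1 = posterior_params(*first)
    a2, b2 = posterior_params(*second)
    wins = sum(rng.betavariate(a1, b1) > rng.betavariate(a2, b2)
               for _ in range(draws))
    return wins / draws


# Wright's 3-pt attempts off Ellis passes vs off Curry passes.
p_ellis_better = prob_first_better((22, 49), (8, 43))
print(f"P(Ellis-fed rate > Curry-fed rate) is roughly {p_ellis_better:.3f}")
```

A prior centered on Wright's overall 3-pt rate, rather than a flat one, would pull both estimates toward each other and give a more conservative answer; setting `prior_a` and `prior_b` accordingly is the natural next step.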

EvanZ · Post by **EvanZ** » Tue Jul 26, 2011 2:47 pm

I think I've changed my mind. Before I said the dependent variable was dichotomous (0 or 1), but I think it makes more sense to treat it as a metric variable (i.e. with real numerical values of 0, 2, or 3). The regression thus involves a metric dependent variable and two categorical predictors (shooter and passer).

The reasoning here is that, say, a particular passer-shooter combination is above average in 2-pt efficiency. Well, that's great, but maybe it is actually worse than another pair that would be more efficient overall, even though the 3-pt efficiency is below average. Does that make sense? In general, long 2-pters are less efficient than 3-pters. The actual value of the shot does matter, so I don't want to lose that in the regression. This will also enable me to reduce the number of predictors from 3 to 2, which may actually improve the robustness of the model.

The difference between the two models in terms of output is that the logistic model would predict the 2-pt or 3-pt FG%, whereas the normal (metric) model will give points per potentially assisted pass (bet you haven't heard that phrase before).

Thoughts?
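To make the metric version concrete: with points (0, 2, or 3) as the dependent variable and a single categorical predictor, a least-squares fit reproduces the group means, so points per potentially assisted pass is just the average of Make x Type within each passer. The sketch below computes that for the seven sample rows from the first post. The function name is hypothetical, and the output is illustrative only given how few rows there are.

```python
from collections import defaultdict

# (passer, make, shot type) taken from the seven sample rows in the first post.
SAMPLE = [
    (10, 0, 3),
    (30, 1, 3),
    (30, 0, 3),
    (55, 0, 3),
    (1, 1, 3),
    (55, 1, 3),
    (30, 0, 3),
]


def points_per_potential_assist(rows):
    """Average points scored per potentially assisted pass, by passer."""
    points = defaultdict(int)
    passes = defaultdict(int)
    for passer, make, shot_type in rows:
        points[passer] += make * shot_type  # 0 on a miss, 2 or 3 on a make
        passes[passer] += 1
    return {passer: points[passer] / passes[passer] for passer in passes}


ppp = points_per_potential_assist(SAMPLE)
for passer, value in sorted(ppp.items()):
    print(f"passer {passer:>2}: {value:.2f} points per potential assist")
```

Adding shooter as a second predictor is where an actual regression becomes necessary, since the passer and shooter effects can no longer be read off simple averages.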
