Let me first explain the data set I am generating. I'm going through Synergy video for the following Golden State Warriors players: Ellis, Curry, Lee, Wright, Williams, Radmanovic. I'm only looking at spot-up attempts right now, as they appear to be by far the easiest to assign assists to. Almost all spot-up attempts appear to be either assisted (if the player makes the shot) or "potentially assisted" (if the player misses). Each of the 6 players I'm tracking had at least 100 spot-up attempts. For each attempt, I record whether the shot was made (1) or missed (0), the type of shot (2 or 3 points), and who the passer was. The passer can be anyone on the team, although it is typically one of those 6 as well.
Typical results look like this:
Row GameID       ShotID Q Shooter Make Type Passer
1   PORGSW041311      1 1      30    0    3     10
2   PORGSW041311      2 1      55    1    3     30
3   PORGSW041311      3 1      55    0    3     30
4   PORGSW041311      4 3       1    0    3     55
5   PORGSW041311      5 3      30    1    3      1
6   PORGSW041311      6 3       1    1    3     55
7   PORGSW041311      7 3       1    0    3     30
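For the descriptive side, here is a minimal Python sketch of how rows like these could be tallied into makes and attempts per shooter and per passer-shooter pair. It uses only the seven sample rows above (with the leading row index dropped); the function names `parse_shots` and `tally` are made up for illustration and are not part of any library.

```python
from collections import defaultdict

COLUMNS = ["GameID", "ShotID", "Q", "Shooter", "Make", "Type", "Passer"]

# The seven sample rows from the post, without the leading row index.
SAMPLE = """\
PORGSW041311 1 1 30 0 3 10
PORGSW041311 2 1 55 1 3 30
PORGSW041311 3 1 55 0 3 30
PORGSW041311 4 3 1 0 3 55
PORGSW041311 5 3 30 1 3 1
PORGSW041311 6 3 1 1 3 55
PORGSW041311 7 3 1 0 3 30
"""


def parse_shots(text):
    """Turn whitespace-separated rows into a list of dicts keyed by column name."""
    shots = []
    for line in text.strip().splitlines():
        shot = dict(zip(COLUMNS, line.split()))
        for col in COLUMNS[1:]:  # every column except GameID is an integer code
            shot[col] = int(shot[col])
        shots.append(shot)
    return shots


def tally(shots, *keys):
    """Return {group: [makes, attempts]} for groups defined by the given columns."""
    counts = defaultdict(lambda: [0, 0])
    for shot in shots:
        group = tuple(shot[k] for k in keys)
        counts[group][0] += shot["Make"]
        counts[group][1] += 1
    return dict(counts)


shots = parse_shots(SAMPLE)
by_shooter = tally(shots, "Shooter")
by_pair = tally(shots, "Passer", "Shooter")

for group, (makes, attempts) in sorted(by_pair.items()):
    print(group, f"{makes}/{attempts}")
```

Even in this tiny sample, most passer-shooter pairs have only one or two attempts, which is the small-sample issue in a nutshell.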
So, my first thought is that I have a dichotomous dependent variable (Make) and three categorical predictors (Shooter, Passer, and Type). I know that ANOVA is usually run with a metric dependent variable, but would it also make sense to use it with one that is dichotomous? Would a logistic regression be useful here? Also, would it be interesting to model interactions between passers, shooters, and shot types?
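To make the logistic regression idea concrete, here is a rough sketch in plain Python: Make is modeled as a function of dummy-coded Shooter, fitted by simple gradient ascent on the log-likelihood. The assumptions are mine: only Shooter is used as a predictor, shooter 1 is arbitrarily taken as the baseline, and the data are just the seven sample rows, so the numbers mean nothing. Passer, Type, and interaction terms would enter as additional 0/1 columns in the same way. In practice a statistics package would do the fitting (for example `glm(..., family = binomial)` in R) and would also give standard errors.

```python
import math

# (Shooter, Make) pairs taken from the seven sample rows in the post.
SHOTS = [(30, 0), (55, 1), (55, 0), (1, 0), (30, 1), (1, 1), (1, 0)]

BASELINE = 1       # ASSUMPTION: shooter absorbed into the intercept (arbitrary choice)
OTHERS = [30, 55]  # each remaining shooter gets its own 0/1 dummy column


def features(shooter):
    """Intercept plus one dummy variable per non-baseline shooter."""
    return [1.0] + [1.0 if shooter == s else 0.0 for s in OTHERS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, shooter):
    """Modeled probability of a make for the given shooter."""
    return sigmoid(sum(b * x for b, x in zip(beta, features(shooter))))


def fit_logistic(data, lr=0.1, steps=5000):
    """Maximize the Bernoulli log-likelihood by plain gradient ascent."""
    beta = [0.0] * (1 + len(OTHERS))
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for shooter, make in data:
            residual = make - predict(beta, shooter)
            for j, x in enumerate(features(shooter)):
                grad[j] += residual * x
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


beta = fit_logistic(SHOTS)
fitted = {s: predict(beta, s) for s in [BASELINE] + OTHERS}
for shooter, p in fitted.items():
    print(shooter, round(p, 3))
```

With a single categorical predictor the fitted probabilities simply reproduce each shooter's raw make rate; the model only starts to earn its keep once Passer and Type are added and the effects have to be separated from one another.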
I think it might be really useful to bring Bayesian tools to bear on this problem, but I'm still in the learning phase. I mention this because the sample sizes in all but a handful of cases are fairly small.
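One simple Bayesian tool that fits the small-sample situation is a Beta prior on each make probability, which pulls noisy raw percentages toward a prior guess. The sketch below is only an illustration: the prior (worth 20 pseudo-shots, centered on 40%) is a placeholder I picked for the example, not an estimate from the data, and `posterior_mean` is a hypothetical helper name.

```python
def posterior_mean(makes, attempts, prior_a, prior_b):
    """Posterior mean of the make probability under a Beta(prior_a, prior_b) prior.

    With a Beta prior and binomial data, the posterior is
    Beta(prior_a + makes, prior_b + misses), whose mean is computed here.
    """
    return (prior_a + makes) / (prior_a + prior_b + attempts)


# ASSUMPTION: placeholder prior, equivalent to 8 makes in 20 imaginary shots.
PRIOR_A, PRIOR_B = 8.0, 12.0

# Passer 30 to shooter 55 in the sample rows: 1 make in 2 attempts.
makes, attempts = 1, 2
raw = makes / attempts
shrunk = posterior_mean(makes, attempts, PRIOR_A, PRIOR_B)

print(f"raw={raw:.3f} shrunk={shrunk:.3f}")
```

With only two attempts the estimate stays close to the prior (about 0.41 rather than the raw 0.50); as attempts accumulate, the data dominate and the prior matters less.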
Well, I thought I would put this problem out there, and see if you guys have thoughts. Obviously, in the meantime, I can do plenty of descriptive work with the data. But I think it would be more powerful with some hypothesis testing. Going forward, I'd like to look at other types of plays, especially pick and roll, post plays, and cuts to the basket.