Play-By-Play Substitutions

nileriver
Posts: 63
Joined: Thu Jul 18, 2013 3:24 pm
Location: Vancouver, WA

Play-By-Play Substitutions

Post by nileriver » Sun Feb 09, 2014 5:29 pm

I have made a scraper to pull information from SI.com. Now I am working on figuring out who is on the floor at a given time. With some SQL I think I will be able to get the first quarter figured out. What I am wondering is how everyone else handles the beginning of the other quarters (especially the second half). None of the play-by-play sources I have looked at show substitutions or starting lineups for quarters other than the first. I saw on a thread a way to hit NBA.com to figure out who is on the floor at a given time. Do people use that for the beginning of each quarter and put that into their play-by-play dataset? Any help would be greatly appreciated.

kohanz
Posts: 34
Joined: Fri Jan 04, 2013 6:58 pm

Re: Play-By-Play Substitutions

Post by kohanz » Sun Feb 09, 2014 8:04 pm

I use the SI feed as well. Substitutions are tricky. End-of-quarter subs are pretty easy to figure out. I basically walk through the game, tracking who is "on the court" at any given time. If I see a player "on court" who wasn't subbed on explicitly, it means that he was subbed on at the beginning of the quarter. Similarly, if I see a player subbed in who I thought was "off the court", it means they were subbed off at the end of the last quarter in which they were seen.

Of course, this isn't 100% foolproof: a player could play a full quarter without registering on the play-by-play, but that is extremely unlikely.

Keep in mind that there are some outright errors in the SI PBP. It's rare, but it does happen. In those cases you'll have the wrong player's name registered in the substitution event. There's basically no way around this unless you use secondary data sources to verify, which is more complicated.

Another lesson I learned is that you can't assume that a technical foul means that the player is on the court - technicals can be awarded to players on the bench.
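The walk-through described above could be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the `Event` record, its field names, and the event kinds are all hypothetical stand-ins for whatever a scraper produces from the SI feed. It infers who must have started a period from the events in that period, and skips technical fouls because they can go to bench players.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Event:
    """Hypothetical play-by-play row; field names are assumptions."""
    period: int
    kind: str                          # e.g. "shot", "rebound", "sub", "technical"
    player: Optional[str] = None       # acting player; for "sub", the player coming in
    player_out: Optional[str] = None   # for "sub" only: the player going out


def infer_period_starters(events: Iterable[Event], period: int) -> Set[str]:
    """Players who must have been on court when `period` began."""
    starters: Set[str] = set()
    entered: Set[str] = set()  # players explicitly subbed in during this period
    for ev in events:
        if ev.period != period:
            continue
        if ev.kind == "technical":
            continue  # a technical does not prove the player is on the court
        if ev.kind == "sub":
            # Someone subbed out without ever being subbed in started the period.
            if ev.player_out is not None and ev.player_out not in entered:
                starters.add(ev.player_out)
            if ev.player is not None:
                entered.add(ev.player)
        elif ev.player is not None and ev.player not in entered:
            # Seen in an on-court event without an explicit entry.
            starters.add(ev.player)
    return starters


events = [
    Event(2, "shot", "A"),
    Event(2, "sub", "B", "C"),   # B in for C
    Event(2, "rebound", "B"),
    Event(2, "technical", "D"),  # D may be on the bench
    Event(2, "sub", "C", "A"),   # C back in for A
    Event(3, "shot", "Z"),
]
print(sorted(infer_period_starters(events, 2)))  # ['A', 'C']
```

A player who plays the whole period without appearing in any event is still missed by this approach, which is the limitation noted above.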

nileriver
Posts: 63
Joined: Thu Jul 18, 2013 3:24 pm
Location: Vancouver, WA

Re: Play-By-Play Substitutions

Post by nileriver » Sun Feb 09, 2014 8:35 pm

Thanks for the response. I understand what you are saying by looking at player events. However, how would I determine which player he subbed in for? I was hoping to do some lineup analysis as well as per minute numbers for players.

Here is the post I was referencing:

"Most of the time it's easy to figure out who's on the court. For the times you don't know, or if you have conflicts, you can figure out pretty well who was playing at a given moment in the game using the stats.nba.com API. Just tweak the StartRange and EndRange to the appropriate time (it's based on seconds elapsed in the game x 10). Whoever shows up in the player stats, they're probably on the court.

http://stats.nba.com/stats/boxscore?Gam ... angeType=2"

-kpascual

I am going to try to do that to get who started each quarter and then use the substitution data from SI to store who is on the floor for every play.
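As a rough sketch of the time conversion in the quoted post (seconds elapsed in the game x 10), the snippet below computes `StartRange` and `EndRange` values covering one whole period. It assumes 12-minute quarters and 5-minute overtimes, and that the range counts from the opening tip; the helper names are made up for illustration. Only the three parameters named in the quote are built here. The rest of the query string is elided in the original link and is not reconstructed.

```python
REG_PERIOD_SECONDS = 12 * 60   # assumed NBA quarter length
OT_PERIOD_SECONDS = 5 * 60     # assumed overtime length
REG_PERIODS = 4


def period_start_seconds(period: int) -> int:
    """Seconds of game time elapsed when `period` (1-based) begins."""
    if period <= REG_PERIODS:
        return (period - 1) * REG_PERIOD_SECONDS
    overtime_index = period - REG_PERIODS - 1  # 0 for the first overtime
    return REG_PERIODS * REG_PERIOD_SECONDS + overtime_index * OT_PERIOD_SECONDS


def period_range(period: int) -> tuple:
    """(StartRange, EndRange) in tenths of a second for one whole period."""
    start = period_start_seconds(period)
    length = REG_PERIOD_SECONDS if period <= REG_PERIODS else OT_PERIOD_SECONDS
    return start * 10, (start + length) * 10


def range_params(period: int) -> dict:
    """Only the range-related parameters mentioned in the quoted post."""
    start, end = period_range(period)
    return {"StartRange": start, "EndRange": end, "RangeType": 2}


print(period_range(1))  # (0, 7200)
print(period_range(5))  # (28800, 31800)
```

Whether the endpoint treats the boundaries as inclusive or exclusive is not stated in the thread, so the exact start and end values may need adjusting by trial.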

AcrossTheCourt
Posts: 237
Joined: Sat Feb 16, 2013 11:56 am

Re: Play-By-Play Substitutions

Post by AcrossTheCourt » Tue Feb 11, 2014 5:20 am

"Of course, this isn't 100% foolproof, if a player manages to play a full quarter without registering on the play-by-play, but that is extremely unlikely."

It happens during overtime a handful of times, at least from what I've seen. I think there was one quarter where I've seen it happen, but I'm not sure. I've seen some missed substitutions on my stats.NBA pbp data. It doesn't happen often, but look out for it. (The 90's data is probably a lot worse in quality than the current stuff though.)

colts18
Posts: 304
Joined: Fri Aug 31, 2012 1:52 am

Re: Play-By-Play Substitutions

Post by colts18 » Tue Feb 11, 2014 4:24 pm

AcrossTheCourt wrote:"Of course, this isn't 100% foolproof, if a player manages to play a full quarter without registering on the play-by-play, but that is extremely unlikely."

It happens during overtime a handful of times, at least from what I've seen. I think there was one quarter where I've seen it happen, but I'm not sure. I've seen some missed substitutions on my stats.NBA pbp data. It doesn't happen often, but look out for it. (The 90's data is probably a lot worse in quality than the current stuff though.)
Do you have any update on the '99 and '00 RAPM data? I really loved the '97 and '98 RAPMs.

kohanz
Posts: 34
Joined: Fri Jan 04, 2013 6:58 pm

Re: Play-By-Play Substitutions

Post by kohanz » Tue Feb 11, 2014 7:15 pm

AcrossTheCourt wrote:It happens during overtime a handful of times, at least from what I've seen. I think there was one quarter where I've seen it happen, but I'm not sure. I've seen some missed substitutions on my stats.NBA pbp data. It doesn't happen often, but look out for it. (The 90's data is probably a lot worse in quality than the current stuff though.)
Excellent point: with the 5-minute overtime periods it is much more likely to occur. That being said, for my application, having 100% accurate minute totals and substitution tracking is not that important, so it's a problem I can live with for now.

kohanz
Posts: 34
Joined: Fri Jan 04, 2013 6:58 pm

Re: Play-By-Play Substitutions

Post by kohanz » Tue Dec 01, 2015 8:57 pm

nileriver wrote:Thanks for the response. I understand what you are saying by looking at player events. However, how would I determine which player he subbed in for? I was hoping to do some lineup analysis as well as per minute numbers for players.

Here is the post I was referencing:

"Most of the time it's easy to figure out who's on the court. For the times you don't know, or if you have conflicts, you can figure out pretty well who was playing at a given moment in the game using the stats.nba.com API. Just tweak the StartRange and EndRange to the appropriate time (it's based on seconds elapsed in the game x 10). Whoever shows up in the player stats, they're probably on the court.

http://stats.nba.com/stats/boxscore?Gam ... angeType=2"

-kpascual

I am going to try to do that to get who started each quarter and then use the substitution data from SI to store who is on the floor for every play.
Did you ever get this working? I'm giving it a shot, but I can't find a robust way to use this API yet for detecting end-of-quarter substitutions. Even though it supports the StartRange and EndRange parameters at 0.1-second precision, the underlying data doesn't appear to be queryable with that precision (just play around with the times and you'll see). This means that for some valid time ranges, there may be no (zero) players listed as on the court. It also means that if you change the time ranges slightly (by seconds), you'll likely end up with the same query result. I can seemingly reliably detect who was subbed on at the start of the new quarter (if you query a time range for the last second of the previous quarter, new players will show up with "0:00" under the "Minutes" column). However, I am unable to detect who was sent to the bench in their place. Anyone have more experience with this API?

kohanz
Posts: 34
Joined: Fri Jan 04, 2013 6:58 pm

Re: Play-By-Play Substitutions

Post by kohanz » Fri Dec 04, 2015 12:04 am

For future reference, I figured this out. Querying that API for short time ranges (e.g. less than a minute) is not reliable. So in the end, I just query the boxscore for each entire quarter, and based on that I know which players were on the floor during a given quarter. Given that information and the play-by-play, I can reliably reconstruct the between-quarter substitutions. The advantage of this method over using only the play-by-play is that it is robust to the scenario where a player plays an entire period (e.g. a 5-minute overtime) without registering on the play-by-play. It's a very unlikely scenario, but one that could not be detected by just looking at the PBP.
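One way the per-quarter approach could look in code is sketched below. It is an illustration under assumptions, not the poster's implementation: it takes, for one team, the set of names in a quarter's boxscore and that quarter's substitutions as chronological `(player_in, player_out)` pairs, both of which are hypothetical input shapes. A player is treated as a quarter starter unless the first substitution involving him brings him in.

```python
def quarter_starters(boxscore_players, subs):
    """Players in the quarter's boxscore who were on the floor at its start.

    boxscore_players: set of names that appear in the quarter boxscore.
    subs: chronological list of (player_in, player_out) for that quarter.
    """
    first_action = {}
    for player_in, player_out in subs:
        # Record only the first substitution each player is involved in.
        first_action.setdefault(player_in, "in")
        first_action.setdefault(player_out, "out")
    return {p for p in boxscore_players if first_action.get(p) != "in"}


def quarter_end_lineup(starters, subs):
    """Apply the quarter's substitutions to its starting lineup."""
    lineup = set(starters)
    for player_in, player_out in subs:
        lineup.discard(player_out)
        lineup.add(player_in)
    return lineup


def between_quarter_changes(prev_end, next_start):
    """Who came on and who went off during the break between quarters."""
    return {"in": next_start - prev_end, "out": prev_end - next_start}


q1_box = {"A", "B", "C", "D", "E", "F"}
q1_subs = [("F", "E")]               # F in for E
q2_box = {"A", "B", "C", "E", "G"}   # G never appears in the play-by-play
q2_subs = []

q1_end = quarter_end_lineup(quarter_starters(q1_box, q1_subs), q1_subs)
q2_start = quarter_starters(q2_box, q2_subs)
changes = between_quarter_changes(q1_end, q2_start)
print(sorted(changes["in"]), sorted(changes["out"]))  # ['E', 'G'] ['D', 'F']
```

Note that this yields the sets of players who came in and went out across the break, not one-to-one pairs. Deciding who replaced whom when several players change at once would need an extra rule.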

yobogoya
Posts: 3
Joined: Tue Dec 08, 2015 10:30 pm

Re: Play-By-Play Substitutions

Post by yobogoya » Wed Dec 09, 2015 11:45 pm

The play-by-play data sold on nbastuffer is pretty granular. It shows the lineup states at every action in the game (shots, rebounds, subs, timeouts, end/start of quarters, etc). It's not difficult to determine who subs in for whom and how many unique lineups exist in each game.
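With lineup states attached to every action, both questions reduce to set operations. The sketch below assumes each row has already been reduced to the five names on the floor for one team; the actual column layout of the nbastuffer files is not described in this thread, so the input shape and function names are hypothetical.

```python
from collections import Counter


def unique_lineups(rows):
    """Count how many rows each distinct five-man unit appears in."""
    # frozenset makes the lineup independent of the order names are listed in.
    return Counter(frozenset(row) for row in rows)


def lineup_changes(rows):
    """List (players_in, players_out) wherever consecutive rows differ."""
    changes = []
    for prev, curr in zip(rows, rows[1:]):
        prev_set, curr_set = set(prev), set(curr)
        if prev_set != curr_set:
            changes.append((sorted(curr_set - prev_set), sorted(prev_set - curr_set)))
    return changes


rows = [
    ("A", "B", "C", "D", "E"),
    ("E", "D", "C", "B", "A"),   # same unit, different listing order
    ("A", "B", "C", "D", "F"),   # F has replaced E
]
print(len(unique_lineups(rows)))  # 2
print(lineup_changes(rows))       # [(['F'], ['E'])]
```

The counts here are rows per lineup, not minutes; weighting by time on court would need the game clock from each row.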

My problem right now is figuring out player matchups. Starter matchups are easy; just look at which position the player starts the game at. But it's difficult to determine bench matchups. I don't have and can't find any data that shows at what position the sub enters the game. If the sub is listed as a center and he subs in for a center, then the problem is trivial, but what happens when a coach wants to use a smaller lineup and subs the center for a forward/guard, and the power forward on the court shifts to the center spot? He now has a new matchup, and it's necessary to have data that captures this in order to build a comprehensive player-matchup model.

Anyone know where to find this data? I feel like it's too obvious to not exist anywhere, but I'm having no luck finding it. And I'm not sure if writing generalized rules to account for subs/position changes would be accurate or if the errors would add too much noise (though I haven't yet thought out the rules, they could be fairly simple).

ethanluo
Posts: 9
Joined: Sat May 16, 2015 8:14 am

Re: Play-By-Play Substitutions

Post by ethanluo » Fri Mar 11, 2016 2:43 am

I am working on a project for scraping player events; the repo is https://github.com/ethanluoyc/statsnba-playbyplay. The functions are more or less already there if you look into the source code. I also addressed the problem of working out which players are on the floor by querying the API. I just have not yet had the time to finish the docs and refactor it for general use.

I think it could be a good starting point for making something that everyone can use.
