Play-By-Play Substitutions
I have made a scraper to pull information from SI.com. Now I am working on figuring out who is on the floor at a given time. With some SQL I think I will be able to get the first quarter figured out. What I am wondering is how everyone else handles the beginning of every other quarter (especially the second half). None of the play-by-play sources I have looked at show substitutions or starting lineups for quarters other than the first. I saw a thread describing a way to hit NBA.com to figure out who is on the floor at a given time. Do people use that for the beginning of each quarter and put that into their play-by-play dataset? Any help would be much appreciated.
Re: Play-By-Play Substitutions
I use the SI feed as well. Substitutions are tricky, but end-of-quarter subs are pretty easy to figure out. I basically walk through the game, tracking who is "on the court" at any given time. If I see a player "on court" who wasn't subbed on explicitly, it means that he was subbed on at the beginning of the quarter. Similarly, if I see a player subbed in who I thought was still "on the court", it means he was subbed off at the end of the last quarter in which he was seen.
Of course, this isn't 100% foolproof: a player could play a full quarter without registering on the play-by-play, but that is extremely unlikely.
Keep in mind that there are some outright errors in the SI PBP. It's rare, but it does happen, and then you'll have the wrong player's name registered in the substitution event. There's basically no way around this unless you use secondary data sources to verify, which is more complicated.
Another lesson I learned is that you can't assume that a technical foul means that the player is on the court: technicals can be assessed to players on the bench.
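For anyone who wants to see the walk-through idea in code, here is a minimal sketch of inferring one team's starters for a single period from the play-by-play alone. The Event type and its field names are made up for illustration (they are not the SI feed's schema), and the events are assumed to be for one team, in chronological order.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical PBP row; field names are assumptions, not the SI schema."""
    period: int
    kind: str                         # "sub", "technical", or any other event type
    player: str                       # player credited (for "sub": the player coming in)
    player_out: Optional[str] = None  # for "sub": the player leaving


def infer_period_starters(events: Iterable[Event], period: int) -> set[str]:
    """Infer who began `period` on the floor, using PBP events only."""
    starters: set[str] = set()
    entered: set[str] = set()  # players explicitly subbed in during this period
    for ev in events:
        if ev.period != period:
            continue
        if ev.kind == "technical":
            continue  # technicals can go to bench players, so they prove nothing
        if ev.kind == "sub":
            # A player leaving who never entered via a sub must have started.
            if ev.player_out is not None and ev.player_out not in entered:
                starters.add(ev.player_out)
            entered.add(ev.player)
        elif ev.player not in entered:
            # Registered an on-court event without being subbed in first.
            starters.add(ev.player)
    return starters
```

A starter who never shows up in the period's events is missed by this approach, which is exactly the weakness described above.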
Re: Play-By-Play Substitutions
Thanks for the response. I understand what you are saying about looking at player events. However, how would I determine which player he subbed in for? I was hoping to do some lineup analysis as well as per-minute numbers for players.
Here is the post I was referencing:
"Most of the time it's easy to figure out who's on the court. For the times you don't know, or if you have conflicts, you can figure out pretty well who was playing at a given moment in the game using the stats.nba.com API. Just tweak the StartRange and EndRange to the appropriate time (it's based on seconds elapsed in the game x 10). Whoever shows up in the player stats, they're probably on the court.
http://stats.nba.com/stats/boxscore?Gam ... angeType=2"
-kpascual
I am going to try to do that to get who started each quarter and then use the substitution data from SI to store who is on the floor for every play.
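Since the quoted post says StartRange and EndRange are seconds elapsed in the game times 10, here is a rough sketch of computing the pair for a whole period. It assumes 12-minute quarters and 5-minute overtimes; the function name is made up, and whether the API treats the end value as inclusive is something I have not verified.

```python
from __future__ import annotations

QUARTER_SECONDS = 12 * 60   # regulation quarter
OVERTIME_SECONDS = 5 * 60   # overtime period


def period_range_tenths(period: int) -> tuple[int, int]:
    """Return (start, end) of `period` in tenths of a second of game time.

    Periods 1-4 are regulation quarters; 5 and up are overtimes.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    if period <= 4:
        start = (period - 1) * QUARTER_SECONDS
        end = period * QUARTER_SECONDS
    else:
        start = 4 * QUARTER_SECONDS + (period - 5) * OVERTIME_SECONDS
        end = start + OVERTIME_SECONDS
    # The range parameters are described as seconds elapsed x 10.
    return start * 10, end * 10
```

For example, period_range_tenths(2) gives (7200, 14400) and the first overtime, period 5, gives (28800, 31800).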
AcrossTheCourt (Posts: 237, Joined: Sat Feb 16, 2013 11:56 am)
Re: Play-By-Play Substitutions
"Of course, this isn't 100% foolproof, if a player manages to play a full quarter without registering on the play-by-play, but that is extremely unlikely."
It happens during overtime a handful of times, at least from what I've seen. I think I've seen it happen in one regular quarter too, but I'm not sure. I've also seen some missed substitutions in my stats.NBA pbp data. It doesn't happen often, but look out for it. (The '90s data is probably a lot worse in quality than the current stuff though.)
Re: Play-By-Play Substitutions
Do you have any update on the '99 and '00 RAPM data? I really loved the '97 and '98 RAPMs.
AcrossTheCourt wrote:
"Of course, this isn't 100% foolproof, if a player manages to play a full quarter without registering on the play-by-play, but that is extremely unlikely."
It happens during overtime a handful of times, at least from what I've seen. I think there was one quarter where I've seen it happen, but I'm not sure. I've seen some missed substitutions on my stats.NBA pbp data. It doesn't happen often, but look out for it. (The '90s data is probably a lot worse in quality than the current stuff though.)
Re: Play-By-Play Substitutions
Excellent point. With the 5-minute overtime periods it is much more likely to occur. That being said, for my application, having 100% accurate minute totals and substitution tracking is not that important, so it's a problem I can live with for now.
AcrossTheCourt wrote:
It happens during overtime a handful of times, at least from what I've seen. I think there was one quarter where I've seen it happen, but I'm not sure. I've seen some missed substitutions on my stats.NBA pbp data. It doesn't happen often, but look out for it. (The '90s data is probably a lot worse in quality than the current stuff though.)
Re: Play-By-Play Substitutions
Did you ever get this working? I'm giving it a shot, but I haven't found a robust way to use this API for detecting end-of-quarter substitutions. Even though it accepts the StartRange and EndRange parameters at 0.1-second precision, the underlying data doesn't appear to be queryable at that precision (just play around with the times and you'll see). This means that for some valid time ranges, zero players are listed as on the court. It also means that if you shift the time range slightly (by a few seconds), you'll likely get the same query result. I can, it seems, reliably detect who was subbed on at the start of the new quarter (if you query a time range for the last second of the previous quarter, new players will show up with "0:00" under the "Minutes" column). However, I am unable to detect who was sent to the bench in their place. Does anyone have more experience with this API?
nileriver wrote:
Thanks for the response. I understand what you are saying by looking at player events. However, how would I determine which player he subbed in for? I was hoping to do some lineup analysis as well as per minute numbers for players.
Here is the post I was referencing:
"Most of the time it's easy to figure out who's on the court. For the times you don't know, or if you have conflicts, you can figure out pretty well who was playing at a given moment in the game using the stats.nba.com API. Just tweak the StartRange and EndRange to the appropriate time (it's based on seconds elapsed in the game x 10). Whoever shows up in the player stats, they're probably on the court.
http://stats.nba.com/stats/boxscore?Gam ... angeType=2"
-kpascual
I am going to try to do that to get who started each quarter and then use the substitution data from SI to store who is on the floor for every play.
Re: Play-By-Play Substitutions
For future reference, I figured this out. Querying that API for short time ranges (e.g. less than a minute) is not reliable. So in the end, I just query the boxscore for each entire quarter, and based on that I know which players were on the floor at some point during a given quarter. Given that information and the play-by-play, I can reliably reconstruct the between-quarter substitutions. The advantage of this method over using only the play-by-play is that it is robust to the scenario where a player plays an entire period (e.g. a 5-minute overtime) without registering on the play-by-play. It's a very unlikely scenario, but one that could not be detected by just looking at the PBP.
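To make the combination concrete, here is a rough sketch of how it could look; it is not my actual code. It assumes you already have, for one team, the set of players the per-quarter boxscore returned and the period's substitutions from the PBP as chronological (player_in, player_out) pairs. Both function names are made up, and no API call is shown.

```python
from __future__ import annotations


def period_starters(boxscore_players: set[str],
                    subs: list[tuple[str, str]]) -> set[str]:
    """Players who began the period on the floor.

    Everyone in the period's boxscore played in it. A player started the
    period unless his first substitution event in it is coming IN.
    """
    seen: set[str] = set()
    entered_first: set[str] = set()
    for player_in, player_out in subs:
        if player_in not in seen:
            entered_first.add(player_in)  # first appearance is entering: not a starter
        seen.add(player_in)
        seen.add(player_out)
    return boxscore_players - entered_first


def between_period_subs(prev_end: set[str],
                        next_start: set[str]) -> tuple[set[str], set[str]]:
    """Return (came_on, went_off) at the break between two periods."""
    return next_start - prev_end, prev_end - next_start


# Usage: "D" never appears in the PBP subs but is still found via the boxscore.
starters = period_starters({"A", "B", "C", "D", "E", "F"},
                           [("F", "E"), ("E", "A")])
came_on, went_off = between_period_subs({"A", "B", "C", "D", "G"}, starters)
```

The set difference only tells you who came on and who went off at the break as groups. When several players change at once, it does not pair each incoming player with a specific outgoing one.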
Re: Play-By-Play Substitutions
The play-by-play data sold on nbastuffer is pretty granular. It shows the lineup state at every action in the game (shots, rebounds, subs, timeouts, end/start of quarters, etc.). It's not difficult to determine who subs in for whom and how many unique lineups exist in each game.
My problem right now is figuring out player matchups. Starter matchups are easy; just look at which position the player starts the game at. But it's difficult to determine bench matchups. I don't have and can't find any data that shows at what position the sub enters the game. If the sub is listed as a center and he subs in for a center, then the problem is trivial, but what happens when a coach wants to use a smaller lineup and subs the center for a forward/guard, and the power forward on the court shifts to the center spot? He now has a new matchup, and it's necessary to have data that captures this in order to build a comprehensive player-matchup model.
Anyone know where to find this data? I feel like it's too obvious to not exist anywhere, but I'm having no luck finding it. And I'm not sure whether writing generalized rules to account for subs/position changes would be accurate or whether the errors would add too much noise (I haven't yet thought out the rules, though they could be fairly simple).
Re: Play-By-Play Substitutions
I am working on a project for scraping the player events; the repo is https://github.com/ethanluoyc/statsnba-playbyplay. The functions are more or less already there if you look into the source code. I also addressed the problem of working out which players are on the floor by querying the API. I just have not yet had the time to finish the docs and refactor it for general usage.
I think it will be a good starting point for us to make something that can be used by everyone?