Older play by play data

J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Older play by play data

Post by J.E. »

I've uploaded some of my older PBP data. It's in the same format, I think, as basketballvalue.com's files, except that it's one file per game instead of one big file.
These were downloaded from ESPN.com

For those who were interested: please download http://stats-for-the-nba.appspot.com/PBP/2002.rar
If everything looks good, I'll upload the rest of them.
brewers7
Posts: 3
Joined: Mon May 09, 2011 8:02 am

Re: Older play by play data

Post by brewers7 »

Looks good to me.

Thanks.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Older play by play data

Post by mystic »

Do you derive the starting lineups for each game via those play-by-play files or do you have additional files for that?
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Older play by play data

Post by J.E. »

mystic wrote: Do you derive the starting lineups for each game via those play-by-play files or do you have additional files for that?
I use the PBP. It might be less work to crawl bbref (or a similar site) for the box scores, though.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Older play by play data

Post by mystic »

So that means you are using those PBP files to generate the matchup files? In that case, would it be possible to upload the matchup files too?
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Older play by play data

Post by J.E. »

http://stats-for-the-nba.appspot.com/PBP/m2002.rar

Those contain nothing but gameIDs, playerIDs, points scored and possessions. There are no player names or anything else that bbv puts into those files. The column layout is the same as bbv's, though, so you'll see lots of 0's in the unused columns.

Let me know if that works for you, and I'll upload the playerID file for 2005 and earlier and the rest of the matchup files.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Older play by play data

Post by mystic »

Ok.

Just to clarify, the columns are: GameID, Time, 3x 0, 5x home PlayerID, 5x away PlayerID, 14x 0, home team score, away team score, 2x 0, home team possession, away team possession.

I assume you are using negative PlayerIDs for players before 2005/06. Is that correct?

Overall, that works perfectly for me, and I would appreciate it if you could upload the other years and the file with the PlayerIDs. Thanks in advance.
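
A minimal sketch of reading one matchup row under the column layout listed above, assuming tab-separated fields and a possible trailing tab. `MatchupRow` and `parse_matchup_row` are illustrative names, not part of bbv's or J.E.'s tooling, and the column positions are only as reliable as that listing.

```python
from typing import List, NamedTuple


class MatchupRow(NamedTuple):
    game_id: str
    time: str
    home_players: List[int]
    away_players: List[int]
    home_points: int
    away_points: int
    home_possessions: int
    away_possessions: int


def parse_matchup_row(line: str) -> MatchupRow:
    """Split one matchup row into named fields (assumed 35 tab-separated columns)."""
    fields = line.strip().split("\t")  # strip() also drops a trailing tab
    if len(fields) != 35:
        raise ValueError(f"expected 35 columns, got {len(fields)}")
    return MatchupRow(
        game_id=fields[0],
        time=fields[1],
        # columns 2-4 are unused (always 0 in these files)
        home_players=[int(x) for x in fields[5:10]],
        away_players=[int(x) for x in fields[10:15]],
        # columns 15-28 are unused (always 0 in these files)
        home_points=int(fields[29]),
        away_points=int(fields[30]),
        # columns 31-32 are unused
        home_possessions=int(fields[33]),
        away_possessions=int(fields[34]),
    )
```

Raising on an unexpected column count is a deliberate choice here: a silently shifted column would put possessions where points should be.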
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Older play by play data

Post by J.E. »

mystic wrote: Just for the clarification: GameID, Time, 3x 0, 5x PlayerID home, 5x PlayerID away, 14x 0, home team score, away team score, 2x 0, home team possession, away team possession
I think that's correct. To be 100% sure, you can check it against bbv's matchup files.

http://stats-for-the-nba.appspot.com/PBP/m2003.rar
http://stats-for-the-nba.appspot.com/PBP/m2004.rar
http://stats-for-the-nba.appspot.com/PBP/m2005.rar
http://stats-for-the-nba.appspot.com/PBP/m2006.rar
http://stats-for-the-nba.appspot.com/PBP/players05.txt

The player file, even though it is named '05', contains players from all those years, but only those who didn't already have a bbv playerID.

Once again, '06 contains only the playoffs; bbv has the regular season.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Older play by play data

Post by mystic »

Is it possible that the 2002 matchup files have the wrong score? I don't see a single made 3-pointer for any team in the files, but the PBP shows multiple made 3-pointers. For example:

20020204MINSAS	00:39:44	0	0	0	-204	26	30	321	-214	22	285	288	-222	271	0	0	0	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	1	0	

20020204MINSAS	62	00:39:44	[SAS] Tony Parker made Three Point Jumper. Assisted by Tim Duncan
As you can see, the matchup file records only 2 points, but the PBP shows a made 3-pointer for Parker.
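
A rough way to spot this kind of mismatch, assuming PBP lines are tab-separated as GameID, event number, time, description, and that made field goals are worded as in the example above. `field_goal_points` and `score_mismatch` are hypothetical helpers; free throws are left out because their wording isn't shown in this thread.

```python
import re
from typing import Optional

# Matches "made Two Point ..." / "made Three Point ...", but not "missed ...".
_MADE_FG = re.compile(r"\bmade (Two|Three) Point\b")


def field_goal_points(description: str) -> Optional[int]:
    """Return 2 or 3 for a made field goal, None for anything else."""
    match = _MADE_FG.search(description)
    if match is None:
        return None
    return 3 if match.group(1) == "Three" else 2


def score_mismatch(pbp_line: str, matchup_points: int) -> bool:
    """True if the PBP line is a made field goal worth a different number
    of points than the matchup row credits.

    Only meaningful when the matchup row covers a single scoring event.
    """
    description = pbp_line.rstrip("\r\n").split("\t")[3]
    expected = field_goal_points(description)
    return expected is not None and expected != matchup_points
```

Applied to the Parker line above with the 2 points from the matchup row, `score_mismatch` returns `True`.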


The PBP data for the games played in November 2001 does not contain any made 3-pointers, even though 3-point shots were made in those games.


Thanks for uploading the data. That is really, really helpful.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Older play by play data

Post by J.E. »

http://stats-for-the-nba.appspot.com/PBP/all.rar

It should be fixed now. Everything's in one file now, too.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Older play by play data

Post by mystic »

There is still something wrong with the matchup file. In some cases the matchup file doesn't contain any information for a whole quarter.

Three examples (first pbp then matchup):

20020212WASLAL	102	00:36:13	[LAL] Devean George Defensive Rebound
20020212WASLAL	103	00:36:01	[LAL] Kobe Bryant missed Two Point Jumper.

20020212WASLAL	00:36:13	0	0	0	515	32	-179	186	190	-85	322	-110	-19	270	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	
20020212WASLAL	00:23:49	0	0	0	515	186	-3	300	-38	-294	-134	270	-110	-103	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	


20020212SASSAC	99	00:36:05	[SAS] David Robinson Defensive Rebound
20020212SASSAC	100	00:36:00	[SAS] Tony Parker made Two Point Jumper. 

20020212SASSAC	00:36:05	0	0	0	68	9	-249	67	44	-204	26	30	-248	-214	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	
20020212SASSAC	00:23:38	0	0	0	67	44	-249	68	9	-204	26	30	-248	-214	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	


20020213CHICHA	240	00:24:02	[CHA] Baron Davis Bad Pass
20020213CHICHA	241	00:24:00	[CHI] Eddie Robinson missed Three Point Jumper. 

20020213CHICHA	00:24:02	0	0	0	-157	66	-145	-26	87	61	235	-83	145	-236	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	
20020213CHICHA	00:11:42	0	0	0	-166	-6	-82	-227	51	-250	-178	292	118	235	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	
It seems to be related to the end of the quarter somehow.
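
A sketch of how such holes could be listed, assuming the clock column counts down (as in the rows above) and that rows of one game are adjacent in the file. The 600-second threshold is an arbitrary assumption, and `find_gaps` is an illustrative name.

```python
from typing import Iterable, List, Tuple


def to_seconds(clock: str) -> int:
    """Convert an HH:MM:SS clock string to seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_gaps(
    rows: Iterable[Tuple[str, str]], min_gap: int = 600
) -> List[Tuple[str, str, str, int]]:
    """Return (game_id, clock_before, clock_after, gap_seconds) for every pair
    of consecutive rows in the same game that are at least min_gap apart.

    rows: (game_id, clock) pairs in file order, clock counting down.
    """
    gaps = []
    prev_game, prev_clock = None, None
    for game_id, clock in rows:
        if game_id == prev_game:
            gap = to_seconds(prev_clock) - to_seconds(clock)
            if gap >= min_gap:
                gaps.append((game_id, prev_clock, clock, gap))
        prev_game, prev_clock = game_id, clock
    return gaps
```

A long stint without substitutions would also be flagged, so the output is a list of candidates to check against the PBP rather than a list of confirmed missing quarters.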
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Older play by play data

Post by J.E. »

mystic wrote: There is still something wrong with the matchup file. In some cases the matchup file doesn't contain any information for a whole quarter.
That means the parser wasn't able to figure out who started the quarter. It happens sometimes because the PBP isn't perfect: a player can show up in an event before the PBP records him entering the game. You'll get things like
"19:02 Mike James makes two pointer"
"19:00 Gilbert Arenas gets replaced by Mike James"

or similar.
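
To illustrate why such a sequence breaks starter detection, here is a simplified sketch. It is not J.E.'s actual parser; the event tuples and `infer_starters` are assumptions made for the example. It handles one team in one quarter and gives up when it doesn't end up with exactly five starters.

```python
from typing import List, Optional, Tuple

# Assumed event shapes, in chronological order:
#   ("act", player)                 player is credited with any action
#   ("sub", player_in, player_out)  player_in replaces player_out
Event = Tuple[str, ...]


def infer_starters(events: List[Event]) -> Optional[List[str]]:
    """Infer who started the quarter, or return None if the PBP is inconsistent."""
    starters: List[str] = []
    subbed_in = set()
    for event in events:
        if event[0] == "sub":
            _, player_in, player_out = event
            # Someone leaving without ever having entered must have started.
            if player_out not in subbed_in and player_out not in starters:
                starters.append(player_out)
            subbed_in.add(player_in)
        else:
            _, player = event
            # Someone acting without ever having entered must have started.
            if player not in subbed_in and player not in starters:
                starters.append(player)
    return starters if len(starters) == 5 else None
```

With the Mike James sequence above, James is counted as a starter because he acts before his substitution, and Arenas is counted as well, so the team ends up with six "starters" and the function returns `None`. A starter with no recorded event in the quarter fails the same way, with only four found.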
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Older play by play data

Post by mystic »

Yeah, it looks like the PBP missed substitutions. How many of those skipped quarters occurred? Do you have any information about that?

Wayne Winston once presented a "Player of the Decade", and his APM goes back to the 1999/2000 season. I don't know, but has anyone ever tried to contact him about the raw data?
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Older play by play data

Post by J.E. »

I'm currently crawling and parsing bbr's PBP data. I'm doing this because

a) it doesn't seem to be missing any games at all
b) it goes back to '00/'01
c) it has shot distance
d) player names can't be confused anymore, because there's always a link to the bbr player page next to the name

The format is
Time TAB Team TAB Action TAB Points scored (if any)
The Action field contains links to the bbr pages of all players involved. That makes it look ugly, but it's better for parsing.
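
A minimal sketch of reading one line in this format. It assumes the player links appear in the Action field as HTML `href="..."` attributes, which this post doesn't spell out, so adjust the pattern to whatever the files actually contain. `Event` and `parse_event` are illustrative names.

```python
import re
from typing import List, NamedTuple

# Assumption: links are embedded as href="..." attributes inside the Action text.
_HREF = re.compile(r'href="([^"]+)"')


class Event(NamedTuple):
    time: str
    team: str
    action: str
    points: int
    player_links: List[str]


def parse_event(line: str) -> Event:
    """Split a 'Time TAB Team TAB Action TAB Points' line into fields."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields, got {len(parts)}")
    time, team, action = parts[0], parts[1], parts[2]
    # The points column is optional; treat a missing or empty value as 0.
    points_field = parts[3].strip() if len(parts) > 3 else ""
    points = int(points_field) if points_field else 0
    return Event(time, team, action, points, _HREF.findall(action))
```

Keeping the links as the player key, rather than the displayed names, is what avoids the name confusion mentioned in d).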

If you see anything wrong with them, let me know.

More years and the corresponding matchup files will follow at some point.

http://stats-for-the-nba.appspot.com/PBP/2001.rar
http://stats-for-the-nba.appspot.com/PBP/2002.rar