NBA Datasets 2013-14 Season

Home for all your discussion of basketball statistical analysis.
Madey Jay
Posts: 8
Joined: Sat May 25, 2013 4:36 pm

NBA Datasets 2013-14 Season

Post by Madey Jay »

For the hobbyists out there...

This season I'll be posting structured play-by-play data from every regular-season and postseason NBA game at:
http://www.cs.umd.edu/hcil/eventflow/NBA/nbaData.shtml

The datasets are scraped from ESPN.com. I started generating them to test the software that my research group is working on, but I figured that other people might be able to make use of them as well. Most notably, I'll be posting an XML breakdown of each possession, including the events that occurred and the players on the court. More details can be found on the site.

Feel free to use the datasets as you please, and keep me posted if you do anything "cool" with them (drop me a line at madeyjay@umd.edu). Certainly my advisor would be thrilled to know that the weekend I spent building the parser was a worthwhile venture.

Cheers,
Megan
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: NBA Datasets 2013-14 Season

Post by v-zero »

Awesome, awesome gift to the community! Can I ask how you went about determining who was on the court during each possession? On that note, what sort of error rate do you think these have on that?
Madey Jay
Posts: 8
Joined: Sat May 25, 2013 4:36 pm

Re: NBA Datasets 2013-14 Season

Post by Madey Jay »

The line-ups were no peach. It's essentially done in two phases: the first phase constructs an initial 5-player line-up based on the player associated with each event (if there is one). The second phase uses the substitution events to swap players in and out of that line-up. The whole process resets at the start of each period.
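For anyone who wants to try something similar, here is a minimal Python sketch of the two-phase idea described above. It is not the actual parser: the `Event` record, its field names, and the rule that a player counts as a starter if they show up in an event before being subbed in during that period are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical event record. `player` is the player tied to the event (or None).
# For a substitution, `player` enters the game and `player_out` leaves it.
Event = namedtuple("Event", "period kind player player_out", defaults=(None, None))


def initial_lineup(period_events):
    """Phase 1: infer who was on the court at the start of a period.

    Assumption: a player started the period if they appear in an event
    (or are subbed out) before they have been subbed in during that period.
    """
    starters, subbed_in = [], set()
    for ev in period_events:
        if ev.kind == "sub":
            out = ev.player_out
            if out not in subbed_in and out not in starters:
                starters.append(out)
            subbed_in.add(ev.player)
        elif ev.player and ev.player not in subbed_in and ev.player not in starters:
            starters.append(ev.player)
    return starters


def lineups_by_event(events):
    """Phase 2: replay substitutions; the line-up resets every period."""
    periods = {}
    for ev in events:
        periods.setdefault(ev.period, []).append(ev)
    for period in sorted(periods):
        on_court = set(initial_lineup(periods[period]))
        for ev in periods[period]:
            if ev.kind == "sub":
                on_court.discard(ev.player_out)
                on_court.add(ev.player)
            # Line-up on the court once this event has been applied.
            yield ev, frozenset(on_court)
```

Note that phase 1 can come up short of five players if someone plays a whole period without being tied to any event, so a real parser would need a fallback for that case.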

Actually, I was surprised at how accurate the line-up construction is - or at least, appears to be. The only indication I get that something has gone wrong is when a player subs in, but the player who subs out isn't in the line-up. I've now run the line-up constructor on about 100 games, though, and only three of them have had this problem. In most cases, it's due to an anomalous series of substitutions early in the period (when the initial line-up is being constructed), involving a player with a weird name (thanks for nothing, Luc Richard Mbah a Moute). So not terribly often.
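The consistency check mentioned here (a substitution whose outgoing player isn't in the current line-up) can be sketched roughly as follows. The function names and the input shapes, a collection of starter names plus a list of `(player_in, player_out)` tuples, are hypothetical and chosen only for illustration. Three flagged games out of about 100 would work out to roughly 3%.

```python
def find_bad_substitutions(starters, substitutions):
    """Return the substitutions whose outgoing player is not on the court.

    `starters` is an iterable of player names and `substitutions` is a list
    of (player_in, player_out) tuples in game order (hypothetical shapes).
    """
    on_court = set(starters)
    bad = []
    for player_in, player_out in substitutions:
        if player_out not in on_court:
            # The only visible symptom: someone leaves who was never tracked.
            bad.append((player_in, player_out))
        on_court.discard(player_out)
        on_court.add(player_in)
    return bad


def flagged_share(games):
    """Fraction of games with at least one bad substitution.

    `games` is a list of (starters, substitutions) pairs.
    """
    if not games:
        return 0.0
    flagged = sum(1 for starters, subs in games
                  if find_bad_substitutions(starters, subs))
    return flagged / len(games)
```

A check like this only catches errors that surface as an impossible substitution, so the flagged share is a lower bound on the true error rate rather than a measurement of it.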
v-zero
Posts: 520
Joined: Sat Oct 27, 2012 12:30 pm

Re: NBA Datasets 2013-14 Season

Post by v-zero »

Yeah, that's the hard bit in parsing play-by-play. I have come across a whole host of data-gimping in the thirteen years of data I have pushed through my code, such that it now has a special function which *just* sorts out weird lineup situations, using a combination of multiple passes and cross-referencing. Amusingly enough, I also spent a whole weekend writing my parser.
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City

Re: NBA Datasets 2013-14 Season

Post by EvanZ »

Having written parsers for all the major sites, I would recommend using NBC's play-by-play. It's the cleanest data source I found.
Jacob Frankel
Posts: 35
Joined: Mon Apr 08, 2013 6:45 am

Re: NBA Datasets 2013-14 Season

Post by Jacob Frankel »

I'm getting an error when I click on the XML link. Anybody else?

Thanks for putting this together and making it public!
Madey Jay
Posts: 8
Joined: Sat May 25, 2013 4:36 pm

Re: NBA Datasets 2013-14 Season

Post by Madey Jay »

Whoops. Corrected.
bbstats
Posts: 227
Joined: Thu Apr 21, 2011 8:25 pm
Location: Boone, NC

Re: NBA Datasets 2013-14 Season

Post by bbstats »

Sounds fun! Am I missing something, though? I don't see any data.
Madey Jay
Posts: 8
Joined: Sat May 25, 2013 4:36 pm

Re: NBA Datasets 2013-14 Season

Post by Madey Jay »

Just fixed it. There was a rogue comma in this morning's upload.