
Big Data

Posted: Sun May 18, 2014 4:23 pm
by rlee
NBA Conference Finals, Kings, Celtics, Big Data: http://basketballintelligence.wordpress ... -big-data/

Re: Big Data

Posted: Mon May 19, 2014 8:23 pm
by nileriver
Interesting article. However, it is starting to bother me that people just throw the term "big data" around. Even though SportVU contains a lot of information, it should not be considered "big data." It seems that because the data sets most people in the basketball world are used to are so small, they feel they need to use the buzzword. From my understanding, data obtained through the SportVU system can be handled by traditional database systems. If it ever gets beyond that point, then we could start using the term and would have to look to other tools, such as Hadoop, to manage it. Just one of my pet peeves, but the article is still worth a read :D .

Re: Big Data

Posted: Mon May 19, 2014 10:15 pm
by mtamada
But doesn't SportVU track player locations 25 times per second? And there are several data items to record per player: their location on the court in two dimensions; which direction they're facing; and some sort of vertical measure, either the player's height or height off the ground or a measure of whether they're crouching or standing with arms vertical -- that's probably a minimum of two vertical measures. Oh, and their velocity in at least two dimensions. And also who the player is. Let's conservatively call it 8 pieces of data.
Times 10 players on the court.
Times 25 observations per second.
Times 2,880 seconds in a regulation NBA game.
Times 1,230 regular season games.
Plus overtime and playoffs.

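That's about 7.1 billion data points for regulation regular-season play alone, so roughly 8 billion once overtime and playoffs are added. And that's not even accounting for the ball's location.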
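
As a rough check, the multiplication above can be sketched in Python. The constant and function names are hypothetical, and the inputs are the guesses from this post, not a SportVU specification. Overtime, playoffs, and the ball are left out.

```python
# Back-of-the-envelope estimate. Every figure is an assumption taken
# from the post above, not a documented property of the tracking data.
FIELDS_PER_PLAYER = 8          # guessed number of values per player per sample
PLAYERS_ON_COURT = 10
SAMPLES_PER_SECOND = 25
SECONDS_PER_GAME = 48 * 60     # 2,880 seconds of regulation play
REGULAR_SEASON_GAMES = 1230


def regulation_data_points(
    fields: int = FIELDS_PER_PLAYER,
    players: int = PLAYERS_ON_COURT,
    samples_per_second: int = SAMPLES_PER_SECOND,
    seconds_per_game: int = SECONDS_PER_GAME,
    games: int = REGULAR_SEASON_GAMES,
) -> int:
    """Multiply the per-sample guess out to a full regular season."""
    return fields * players * samples_per_second * seconds_per_game * games


if __name__ == "__main__":
    print(f"{regulation_data_points():,} data points")
```

Changing `fields` shows how sensitive the total is to the guess about what is recorded per player.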


Nonetheless, you're correct that it's not really the volume of data that's distinctive about the video data. It's the complexity: accounting for 10 bodies plus the ball, and their locations and directional vectors, simultaneously. Most of us don't learn statistics that can deal with data of that complexity. One has to apply spatial statistics and, I suspect, techniques from geography and from whatever computer science or machine learning fields delve into these sorts of issues. That's different stuff from what basketball analytics was looking at prior to the availability of video data. Different enough to merit the moniker "big data" IMO, because I suspect that the successful analysis will use analytic techniques that are different from the ones used heretofore.

Re: Big Data

Posted: Mon May 19, 2014 10:33 pm
by nileriver
This quote is from Wikipedia, so it isn't the most reliable source, but it is an accurate description of what big data is:
The term "Big Data" was coined by Haseeb Budhani, presently founder and CEO of BubblewrApp, in 2008 while at Infineta Systems.[citation needed] "Big Data" caught on quickly as a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
When systems like SQL Server, MySQL, and traditional Oracle cannot scale to handle the needs of the data, then it becomes "big data." The term came about because new tools were needed to handle the sheer size of the data, as well as data that is unstructured.

Also, you are talking about 8 billion data points. I have been told that some telecoms such as Sprint record 100 billion records every day. It is all a matter of perspective, and I really think those around the league have a skewed perspective due to how simple and small the data used to be.
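
To put the two figures side by side, here is a minimal sketch. It assumes, loosely, that one "data point" is comparable to one telecom "record", and it takes the 100 billion per day figure at face value as secondhand. The names are hypothetical.

```python
# Both inputs are the rough figures quoted in this thread, not measurements.
SEASON_DATA_POINTS = 8_000_000_000          # estimated season total
TELECOM_RECORDS_PER_DAY = 100_000_000_000   # secondhand figure


def hours_to_match(season_total: int, records_per_day: int) -> float:
    """Hours the larger source needs to produce a full season's volume."""
    return season_total / records_per_day * 24


if __name__ == "__main__":
    hours = hours_to_match(SEASON_DATA_POINTS, TELECOM_RECORDS_PER_DAY)
    print(f"about {hours:.2f} hours")
```

Under those assumptions, a full season's worth of tracking data matches just under two hours of telecom records.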

Re: Big Data

Posted: Mon May 19, 2014 11:17 pm
by J.E.
mtamada wrote: But doesn't SportVU track player locations 25 times per second? And there are several data items to record per player: their location on the court in two dimensions; which direction they're facing; and some sort of vertical measure, either the player's height or height off the ground or a measure of whether they're crouching or standing with arms vertical -- that's probably a minimum of two vertical measures. Oh, and their velocity in at least two dimensions. And also who the player is.
The raw SportVU data contains nothing but name and XY coordinates. It doesn't include 'crouching' or 'standing', and it has no data on arms or the direction they're facing.