Re: SQL, Databases, and Basketball Stats
Posted: Fri Nov 02, 2012 1:55 pm
by mystic
A mod just separated this from the original thread. A reasonable decision, because the talk about SQL etc. had little to do with the Rockets job offer, wouldn't you agree?
Re: SQL, Databases, and Basketball Stats
Posted: Fri Nov 02, 2012 2:07 pm
by bbstats
Ah I didn't realize I created a fuss. Didn't pay attention 'til it had been relocated. Carry on!
Re: SQL, Databases, and Basketball Stats
Posted: Sat Nov 03, 2012 7:37 pm
by JohnHasADHD
I'm a bit late - but if you want to do data analysis of any significance, I would think SQL would be vital - and it's not just the concept of creating the databases.
When you know your database (or create it), you can get the data you want, when you want it, how you want it.
I have been working slowly this offseason on getting shot information and boxscore information for every player and every game. So far I believe I've successfully downloaded the stats for the last two NBA seasons (though I might have to tweak the shot information to indicate at what point in the quarter the shot was taken).
As a lark, and a Sixers fan, I wanted to take a look at 4th quarter shots last season...Louis Williams really took a lot of them, so I was just curious who might be taking them this year and wanted to see how it had happened last year. If you don't know SQL - how would you do that quickly and easily?
In this day and age, with the volume (and granularity) of data, you can't do advanced statistical analysis without knowing SQL.
Of course you need to know other stuff too. I can normalize the crap out of any data and query any database to give me exactly what I want, but the analysis portion of it still comes slowly to me. Advanced statistics is where I need to advance, but you definitely need to know SQL.
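To make the 4th quarter question above concrete, here is a minimal sketch of what such a query might look like, run through Python's built-in sqlite3 module. The shots table, its column names, and the rows are all made up for illustration; this is not John's actual schema and the numbers are not real NBA data.

```python
import sqlite3

# In-memory database with a hypothetical "shots" table (toy rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE shots (
        game_id INTEGER,
        team    TEXT,
        player  TEXT,
        quarter INTEGER,
        made    INTEGER  -- 1 = made, 0 = missed
    )
    """
)
conn.executemany(
    "INSERT INTO shots VALUES (?, ?, ?, ?, ?)",
    [
        (1, "PHI", "Player A", 4, 1),
        (1, "PHI", "Player A", 4, 0),
        (1, "PHI", "Player B", 4, 1),
        (1, "PHI", "Player B", 1, 1),
        (2, "PHI", "Player A", 4, 1),
        (2, "PHI", "Player C", 3, 0),
    ],
)

# Who took the 4th quarter shots for one team, most attempts first?
fourth_quarter = conn.execute(
    """
    SELECT player,
           COUNT(*)  AS attempts,
           SUM(made) AS makes
    FROM shots
    WHERE team = ? AND quarter = 4
    GROUP BY player
    ORDER BY attempts DESC, player
    """,
    ("PHI",),
).fetchall()

for player, attempts, makes in fourth_quarter:
    print(player, attempts, makes)
```

Swapping the WHERE clause (a different quarter, a season column, a time-remaining column) is all it would take to ask a different question of the same table, which is the "data you want, how you want it" part.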
Re: SQL, Databases, and Basketball Stats
Posted: Sat Nov 03, 2012 7:56 pm
by mystic
JohnHasADHD wrote:
As a lark, and a Sixers fan, I wanted to take a look at 4th quarter shots last season...Louis Williams really took a lot of them, so I was just curious who might be taking them this year and wanted to see how it had happened last year. If you don't know SQL - how would you do that quickly and easily?
Click me!
Took me about 10 sec to get that information. The good thing today is that bbref actually knows how to handle a database; thus, to get information like you wanted, you really don't need to know SQL.
Re: SQL, Databases, and Basketball Stats
Posted: Sat Nov 03, 2012 8:10 pm
by JohnHasADHD
Interesting little toy - not exactly what I looked at when I was looking at my data, but interesting nonetheless.
Re: SQL, Databases, and Basketball Stats
Posted: Sun Nov 04, 2012 6:17 pm
by v-zero
John, I don't accept your argument. Yes, databasing is very useful for data mining, and I agree that it is optimal for that purpose, but the fact of the matter is that you can mine data without proper databasing. And what if you have almost no interest in data mining? Believe what you will, but I think you'll find that you struggle with analysis because you are blinded by data. Data mining is very prone to over-fitting issues, and beyond that is often little more than pattern detection. Proper analysis requires a lot more than a nice SQL database with lots of numbers. Analytics should be all about building a suitable frame into which to slot those numbers and bring forth some genuine insight.
Anyway, I am waffling.
Re: SQL, Databases, and Basketball Stats
Posted: Sun Nov 04, 2012 8:42 pm
by JohnHasADHD
Well, I don't struggle with analysis because I am blinded by data. I struggle with analysis because, while teaching myself databases and Ruby (but not Perl) comes easy to me, teaching myself statistical analysis comes harder.
The reason one might need SQL for a job like the one advertised (as the original post asked) is that if you don't know SQL you might not work as easily with the raw data that the job provides you. It's possible that the job listed required not only analysis of the data but also organization of the data so that the whole statistical department can analyze it. Without a SQL background, organizing the raw data can be rather difficult. (I know you can do a lot with Excel and VBScript, but I've never found it easier than with a database. The ONLY time I've built a complex Excel workbook instead of a database is in relation to my day job, because I don't know how to program in the native language for Microsoft Dynamics, and building a production worksheet based on past and projected sales and current raw and finished goods inventory just worked better in Excel. At the same time, there's nothing really advanced calculation-wise in the 15 worksheets involved, just a lot of cascading calculations.)
Re: SQL, Databases, and Basketball Stats
Posted: Mon Nov 05, 2012 12:53 am
by v-zero
Yeah, I agree with all that. I just didn't like what seemed to be a "you can't do analysis without SQL" suggestion.
Re: SQL, Databases, and Basketball Stats
Posted: Mon Nov 05, 2012 4:47 am
by mikez
SQL's nice because it's widely known/used/accepted, but it's not the only environment in which you can operate a database. We found it easiest to find a good person to develop (and then, later, operate/manage/continue to develop) our initial database in SQL vs other alternatives, and I imagine Houston did the same when Daryl left here to go there.
But you could, for example, use 4D, some NoSQL system, or some other environment to build a big database - it's just harder to find coders in other environments, especially when you consider that your database may need to interface smoothly with external vendors. SQL (in whatever flavor) is simply a more widely used/understood option, though as Evan notes this may be changing.
All this, of course, is assuming you actually need to store large amounts of data in (and run many varied queries from) some sort of non-flat-file format. Many analyses we do work perfectly fine with easy-to-acquire data stored in flat delimited files or even directly in Excel, etc. So for the purposes of the majority of non-professional analysts, SQL may not ever be needed, especially given the excellent resources (e.g. basketball-reference, basketballvalue, etc.) now available online that were not available when we first started doing this stuff.
In addition, plenty of people analyze data provided to them by their database people without actually knowing any SQL. This is true both in and out of the NBA - in many industries I imagine it describes the vast majority of analysts. It was certainly true in the consulting firm where I used to work; we just had a database guy who would get us files that we then could analyze in SAS or SPSS or Excel or whatever else we were using on a particular project.
There's no question, though, that in nearly any analytical field, employment is more likely if you're familiar with the database resources used by your potential employer. For the majority of NBA teams doing this stuff, for various reasons, it's probably some flavor of SQL.
-MZ
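As a rough illustration of the flat-file point above, here is a minimal sketch of an analysis that never touches a database, using only Python's standard csv module. The column names and rows are invented for the example (in practice the text would come from a delimited file on disk), and points per 36 minutes is just one convenient stat to compute.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a small delimited file; toy rows, not real box scores.
flat_file = io.StringIO(
    "player,game_id,points,minutes\n"
    "Player A,1,20,34\n"
    "Player B,1,8,22\n"
    "Player A,2,14,30\n"
    "Player B,2,12,26\n"
)

# Accumulate season totals per player straight from the rows.
totals = defaultdict(lambda: {"points": 0, "minutes": 0})
for row in csv.DictReader(flat_file):
    totals[row["player"]]["points"] += int(row["points"])
    totals[row["player"]]["minutes"] += int(row["minutes"])

# Points per 36 minutes, rounded to one decimal place.
points_per_36 = {
    player: round(36 * t["points"] / t["minutes"], 1)
    for player, t in totals.items()
}

for player, value in sorted(points_per_36.items()):
    print(player, value)
```

For a narrow, well-defined question like this one, a loop over a flat file is arguably less setup than a database; the trade-off shows up once the questions start to vary or the data spans several related tables.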
Re: SQL, Databases, and Basketball Stats
Posted: Tue Nov 06, 2012 11:58 pm
by kpascual
I believe SQL is absolutely valuable, though not essential, for doing data analysis. It really depends on the use case: if you're working on data sets where the number of records is in the thousands or hundreds of thousands, Excel is just fine. Same if your analysis has a relatively narrow or well-defined scope.
You start to see the value of SQL when your data doesn't fit into 2-dimensional squares, or more accurately, tends to be more about relationships across objects (hence the term "relational database"), or when you don't necessarily know what you're trying to solve.
I'm personally not a huge fan of the NoSQL movement as it stands now. I dabbled with Mongo when determining the backend of my vorped website, but just thought a relational DB was overall a better option. At my prior job, I found that even when using the newfangled Hadoop map/reduce technologies, I kept wanting a SQL interface instead of the pseudo-scripting language they created. In the end, data analysis is about getting data and (hopefully) answers as painlessly as possible, and I find SQL to be the least painful interface for doing so. But YMMV.
For background, I've been doing data analysis for various Silicon Valley companies over the past few years, and it's not a stretch to say I speak more SQL than English day-to-day. In fact my basketball site is really just an exercise in data warehousing (fancy term for a big database). I'd be happy to share any knowledge or do a write-up if anyone's interested in expanding into this realm.
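The "relationships across objects" idea above might be sketched like this, again with Python's built-in sqlite3 module. The three tables (players, games, box_scores), their columns, and every row are hypothetical and chosen only to show a join; this is not the vorped schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE players (
        player_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE games (
        game_id   INTEGER PRIMARY KEY,
        game_date TEXT NOT NULL,
        home_team TEXT NOT NULL
    );
    CREATE TABLE box_scores (
        game_id   INTEGER NOT NULL REFERENCES games(game_id),
        player_id INTEGER NOT NULL REFERENCES players(player_id),
        team      TEXT NOT NULL,
        points    INTEGER NOT NULL,
        PRIMARY KEY (game_id, player_id)
    );
    """
)

# Toy rows only; names, dates, and numbers are made up.
conn.executemany("INSERT INTO players VALUES (?, ?)",
                 [(1, "Player A"), (2, "Player B")])
conn.executemany("INSERT INTO games VALUES (?, ?, ?)",
                 [(10, "2012-11-01", "PHI"), (11, "2012-11-03", "NYK")])
conn.executemany("INSERT INTO box_scores VALUES (?, ?, ?, ?)",
                 [(10, 1, "PHI", 20), (10, 2, "PHI", 8),
                  (11, 1, "PHI", 14), (11, 2, "PHI", 12)])

# Points scored in home games, per player: the answer needs all three
# tables, which is awkward to express in a single flat sheet.
home_points = conn.execute(
    """
    SELECT p.name, SUM(b.points) AS points
    FROM box_scores AS b
    JOIN players AS p ON p.player_id = b.player_id
    JOIN games   AS g ON g.game_id   = b.game_id
    WHERE g.home_team = b.team
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()

print(home_points)
```

Each fact lives in one place (a player's name, a game's home team), and the join reassembles whatever view a given question needs.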
Re: SQL, Databases, and Basketball Stats
Posted: Wed Nov 07, 2012 12:33 am
by Crow
When will team and player pages update at vorped.com?
I hadn't noticed this before:
http://vorped.com/bball/index.php/referee
Have you (or anyone else) done anything with RAPM or APM factors in relation to other stats in SQL?
I alluded to it before, but if one thinks the stats are interconnected then I would think it would be helpful to work in a relational database over a non-relational database. Or am I being naive / overly simplistic in that view?
Re: SQL, Databases, and Basketball Stats
Posted: Thu Nov 08, 2012 5:54 am
by kpascual
ESPN decided to eff with me and not provide play-by-play data for certain games. Since my shot charts are actually tied to play-by-play events, things kind of went to hell, thus I've spent the last 2 weeks integrating NBA.com play by play so it won't happen until NBA.com messes with me.
TL;DR Should be fixed by the weekend.
I haven't personally done any RAPM stuff, since it seems many here are doing good work in that area.
I think a relational database can help in so many ways, with the most prominent being in organizing and ensuring the accuracy of your data.
I sense an aversion to using databases, and I just don't get it... it might be an aversion to doing command-line-y things or to writing "real" code, or to learning how exactly to install a database on your computer (which to be fair can be a huge PITA).
But if you have the slightest curiosity about SQL/databases, I say dive in headfirst... it'll be time well-spent. It's a worthwhile skill to have if you want to do data analysis for a living.
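On the accuracy point above, one way a relational database can help is by refusing rows that break declared rules. The following is a minimal sketch under assumed names (a players table and a shots table invented for the example), using SQLite through Python's sqlite3 module; other database engines enforce constraints in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE shots (
        shot_id   INTEGER PRIMARY KEY,
        player_id INTEGER NOT NULL REFERENCES players(player_id),
        quarter   INTEGER NOT NULL CHECK (quarter >= 1),
        made      INTEGER NOT NULL CHECK (made IN (0, 1))
    )
    """
)
conn.execute("INSERT INTO players VALUES (1, 'Player A')")

# Hypothetical scraped rows: (player_id, quarter, made).
candidate_rows = [
    (1, 4, 1),   # valid
    (99, 4, 1),  # unknown player: violates the foreign key
    (1, 0, 1),   # quarter 0: violates the CHECK on quarter
    (1, 2, 5),   # made = 5: violates the CHECK on made
]

rejected = []
for row in candidate_rows:
    try:
        conn.execute(
            "INSERT INTO shots (player_id, quarter, made) VALUES (?, ?, ?)",
            row,
        )
    except sqlite3.IntegrityError as exc:
        # Keep the bad row and the database's reason for later review.
        rejected.append((row, str(exc)))

accepted = conn.execute("SELECT COUNT(*) FROM shots").fetchone()[0]
print(accepted, "accepted;", len(rejected), "rejected")
```

A spreadsheet would happily store all four rows; here the bad ones are caught at load time, before they can skew any analysis.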
Re: SQL, Databases, and Basketball Stats
Posted: Thu Nov 08, 2012 11:45 am
by DSMok1
kpascual, if you could write/direct folks to a good tutorial for getting started in SQL for sports stats, I'm sure you'd get a lot (okay, maybe an overstatement) of interested readers.
Re: SQL, Databases, and Basketball Stats
Posted: Thu Nov 08, 2012 5:25 pm
by Crow
Thanks Ken for the reply.
I have been able to do most of what I wanted in Excel & Access to date. I was intimidated by the command line syntax; but, as I said earlier, after looking through some SQL books I have overcome that initial intimidation. I will probably take a course on it / read more in the future. I will still need a database and research questions that require it to get more experience but those can be assembled. I tend to be more interested in season stats than play by play details but I can certainly see that the play by play level analysis would be important to be able to do.
Re: SQL, Databases, and Basketball Stats
Posted: Thu Nov 08, 2012 8:48 pm
by EvanZ
sqlfiddle is a nice playground for experimenting/learning SQL without even having to install your own server (which is not a big deal, but anyway).
Like I mentioned earlier, that Coursera database course can really get a person quickly up to speed with the basics of SQL/relational theory.
Related to the play-by-play issues, does anyone know if Aaron B. is done with basketball-value? If so, I need to write my own parser as well.