SQL, Databases, and Basketball Stats

Home for all your discussion of basketball statistical analysis.
Crow
Posts: 10536
Joined: Thu Apr 14, 2011 11:10 pm

Re: SQL, Databases, and Basketball Stats

Post by Crow »

On your last question, I wondered the same but figure give it a week to 10 days and we'll know one way or the other.

Couldn't get sqlfiddle to load right now. May check it later.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: SQL, Databases, and Basketball Stats

Post by J.E. »

EvanZ wrote:Related to the play-by-play issues, does anyone know if Aaron B. is done with basketball-value?
Word on the street is that he recently was employed by a new NBA team (76ers). They might not be as cool with him putting information online as much as the Grizzlies were
kpascual
Posts: 50
Joined: Thu Mar 01, 2012 7:02 pm

Re: SQL, Databases, and Basketball Stats

Post by kpascual »

Here's a pretty basic SQL tutorial that hopefully will get you started as painlessly as possible. Ignore me if you already know SQL or aren't interested.

Installation

You can use any open-source database you wish (MySQL, Postgres), but here is the quickest way to get started, IMO.

1. Download the Firefox browser (http://www.mozilla.org/en-US/firefox/new/)
2. Install the SQLite Manager Add-On for Firefox (https://addons.mozilla.org/en-us/firefo ... e-manager/)

This will give you an environment to practice your SQL skills. SQLite, as its name implies, is a very lightweight type of SQL database.

Creating a database

1. In your Firefox browser, go to Tools -> SQL Lite Manager.
2. Create a database by either clicking the New database icon (the piece of paper icon) or going to Database -> New Database from the menubar. Name your database, and choose a location to house it... it doesn't really matter where you put it.

This creates a file that "represents" your database. Take note: a database is really just a special kind of file, another way to organize data.

Creating tables

Databases are organized into tables, similar to Excel's worksheets-within-a-workbook organization. However, database tables must have well-defined data and columns: you can't just change the data type in a database table 2nd column like you can in an Excel spreadsheet.

Example: suppose we want a listing of NBA players' names and heights. In database-speak, we would create a table called "player", with columns for player names and heights. Here's how:

Code: Select all

CREATE TABLE player (player_id INTEGER PRIMARY KEY, name VARCHAR(50), height INTEGER);
Within SQLite Manager, click on Execute SQL and enter this into the box labeled Enter SQL. Then click Run SQL. This creates an empty table named player, with columns player_id, name, and height.

The uppercase words after the column names describe the data type of that column. For example, VARCHAR stands for 'variable character' and tells us that the 'name' column contains characters (letters and numbers and other symbols), also known as a 'string' in programming-speak. This differs from the height column, which has a data type of 'integer', or INT for short.

The words PRIMARY KEY indicates the unique identifier within the table. What does this mean? Suppose we have a player named 'Jason Williams' in our list. Are we talking about Duke's Jason Williams (before he went by "Jay"), or White Chocolate Jason Williams? Player names aren't unique, so we need to create another column that uniquely identifies each player. The player_id column accomplishes this, and the PRIMARY KEY clause indicates to the database that this column must be unique. In our table player, we would have a row for Duke's Jason Williams, with an id = 1, and another row for White Chocolate Jason Williams, with an id = 2, for example.

Adding data to a table ("INSERT")

We have a table, but now we need to put data into it. To do this, we introduce SQL's INSERT statement:

Code: Select all

INSERT INTO player (player_id, name, height) VALUES (1, 'LeBron James', 80);
Put that into that same text box called Enter SQL, and Run SQL. This will add a new row (also known as a 'record') into the player table.

To prove this happened, go to the Browse & Search tab, and click Tables -> player in the lefthand directory box thingy. You should see a record for LeBron James.

Or we can do this using SQL, with the SELECT statement. Back in the Execute SQL tab, execute this command:

Code: Select all

SELECT * FROM player
You should see the exact same thing. The asterisk (*) is shorthand for "give me all the columns in the table". Let's add another player just for kicks:

Code: Select all

INSERT INTO player (player_id, name, height) VALUES (2, 'Dwyane Wade', 76);
Note that for character strings, you need to enclose them in double or single quotes. Double quotes tend to be more reliable, since sometimes you'll have a string that has an apostrophe in it (ex. Amar'e Stoudemire), and the SQL interpreter will think the apostrophe is actually the end of your string.

Joins

Now let's suppose we have a list of all boxscores, and we wanted to add this info into our database. First we would create a table called boxscore, with fields for points, rebounds, and assists for each player:

Code: Select all

CREATE TABLE boxscore (id INTEGER PRIMARY KEY AUTOINCREMENT, player_id INTEGER, points INTEGER, assists INTEGER, rebounds INTEGER);
Shameless plug, but I just so happen to have a list of Lebron's box scores http://vorped.com/bball/index.php/playe ... Bron-James, so let's add a few in our new table:

Code: Select all

INSERT INTO boxscore (player_id, points, assists, rebounds) VALUES (1,28,8,10),(1,21,3,7),(1,27,12,7);
In this code, I've added 3 rows in one statement: all you have to do is comma-separate your rows, and make sure each row is enclosed in parentheses. Also note that between the table name and the VALUES clause is a parenthesis-enclosed list of column names. This lets you insert a row of values where you only have data for a subset of your columns.

We don't have player names in our boxscore table, but let's say we wanted that. To do this, we would JOIN data from the boxscore table with data from the player table.

Code: Select all

SELECT player.name, boxscore.points, boxscore.rebounds, boxscore.assists
FROM player
  INNER JOIN boxscore ON boxscore.player_id = player.player_id
Since we have player_id in both tables, we can link them up and take columns from both tables. Here we took the player name from the player table, and points/rebounds/assists from our boxscore table.

You'll notice that boxscore has the player_id field found in the player table. When a primary key from one table is used in another table, that field is called a foreign key. In JOINs, you typically join foreign keys from one table to their corresponding primary key in another table. This functionality defines relationships between different types of data, in this case, between player attributes and player scores. JOINs are why databases are typically called "relational" databases.

Filtering data ("WHERE")

Now what if we wanted only boxscores where Lebron had 10 or more assists? You would use the WHERE clause to filter a result set. We take the same query, and add the following WHERE clause:

Code: Select all

SELECT player.name, boxscore.points, boxscore.rebounds, boxscore.assists
FROM player
  INNER JOIN boxscore ON boxscore.player_id = player.player_id
WHERE boxscore.assists >= 10
It reads pretty logically, doesn't it?

There's a lot more about databases/SQL I can go into, but the very basic technical/syntactical aspects of SQL are pretty much here. Please let me know if there are any questions or points of confusion.
Crow
Posts: 10536
Joined: Thu Apr 14, 2011 11:10 pm

Re: SQL, Databases, and Basketball Stats

Post by Crow »

That was nice of you to provide. Thanks. I will bookmark and come back to it when ready.
Post Reply