{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Creating tables in SQL\n", "---------------------\n", "\n", "Before we actually get into basic SQL queries (**asking questions _of_ data in tables**), we'll look at some of the basics about how to **create** tables.\n", "\n", "**NOTE: Make sure to have a copy of the database file, \"dataset_1.db\", from the last lecture downloaded and in this directory for the below to work!**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "u'Connected: @dataset_1.db'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%load_ext sql\n", "%sql sqlite:///dataset_1.db" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Activity 2-1:\n", "------------\n", "\n", "Schemas & table creation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that the database we just loaded has one table, `precipitation_full`, having the following schema:\n", "\n", "> * `state_code`\n", "> * `station_id`\n", "> * `year`\n", "> * `month`\n", "> * `day`\n", "> * `hour`\n", "> * `precipitation`\n", "> * `flag_1`\n", "> * `flag_2`\n", "\n", "Each tuple in this table describes one hour of rainfall (`precipitation`- in hundredths of an inch) at one station (`station_id`) in one state (`state_code`). Note that tuples with `hour=25` record the total rainfall for that day, and that we can ignore the values of attributes `flag_1` and `flag_2` for now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, however, let's see how to view the **schema** of existing tables on your own; there are several ways, including but not limited to:\n", "* DESCRIBE tablename\n", "* SHOW CREATE TABLE tablename\n", "* SHOW COLUMNS tablename\n", "\n", "Unfortunately, support for these varies widely between DBMSs, and is also limited by our IPython interface (for example sqlite, which we are using, does not support the above; it does have a `.schema tablename` command, however this doesn't work in IPython notebooks...)\n", "\n", "One that does work for us here though is:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///dataset_1.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
cidnametypenotnulldflt_valuepk
0state_codeINT0None0
1station_idINT0None0
2yearINT0None0
3monthINT0None0
4dayINT0None0
5hourINT0None0
6precipitationINT0None0
7flag_1VARCHAR(1)0None0
8flag_2VARCHAR(1)0None0
" ], "text/plain": [ "[(0, u'state_code', u'INT', 0, None, 0),\n", " (1, u'station_id', u'INT', 0, None, 0),\n", " (2, u'year', u'INT', 0, None, 0),\n", " (3, u'month', u'INT', 0, None, 0),\n", " (4, u'day', u'INT', 0, None, 0),\n", " (5, u'hour', u'INT', 0, None, 0),\n", " (6, u'precipitation', u'INT', 0, None, 0),\n", " (7, u'flag_1', u'VARCHAR(1)', 0, None, 0),\n", " (8, u'flag_2', u'VARCHAR(1)', 0, None, 0)]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql PRAGMA table_info(precipitation_full);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A bit verbose, but gets the job done!\n", "\n", "And, we can get the exact statement used to create the table as follows (**a great way to find guidance here!!**):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///dataset_1.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
sql
CREATE TABLE precipitation_full(state_code INT, station_id INT, year INT, month INT, day INT, hour INT, precipitation INT, flag_1 VARCHAR(1), flag_2 VARCHAR(1))
" ], "text/plain": [ "[(u'CREATE TABLE precipitation_full(state_code INT, station_id INT, year INT, month INT, day INT, hour INT, precipitation INT, flag_1 VARCHAR(1), flag_2 VARCHAR(1))',)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT sql FROM sqlite_master WHERE name = 'precipitation_full';" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without going into full detail (yet), the above table contains one record for each hour at each station, and contains the amount of precipitation that was measured during that hour.\n", "\n", "Suppose that Database Systems course has been repurposed as a rain measurement corps to assist with the department that collected this data! Based on what we've covered so far, the above example, and the internet, create a table for storing the staff assignments. Table requirements:\n", "* Everyone in the class will be holding a cup in the rain for a specific several-hour shift at a specific station; this assignment will remain the same every day\n", "* Each person will have one off-day per week\n", "* Each person's cup might be of a different size, measured as a float value\n", "* The Dept. of Interior data servers can't handle the full dataset we would generate, and require a random subsample- so some people will be randomly chosen to stand in the rain without a cup. These assignments need to be recorded somehow in the table too.\n", "* Some people in the class have very long names.\n", "\n", "Type your create table statement here:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///dataset_1.db\n", "(sqlite3.OperationalError) table rain_corps_assignments already exists [SQL: u'CREATE TABLE rain_corps_assignments(\\n student_id INT PRIMARY KEY,\\n name VARCHAR(1000),\\n station_id INT,\\n state_code INT,\\n start_hour INT,\\n end_hour INT,\\n holding_cup BOOLEAN,\\n cup_size FLOAT,\\n off_day INT\\n)'] (Background on this error at: http://sqlalche.me/e/e3q8)\n" ] } ], "source": [ "%%sql\n", "CREATE TABLE rain_corps_assignments(\n", " student_id INT PRIMARY KEY,\n", " name VARCHAR(1000),\n", " station_id INT,\n", " state_code INT,\n", " start_hour INT,\n", " end_hour INT,\n", " holding_cup BOOLEAN,\n", " cup_size FLOAT,\n", " off_day INT\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 1 }