{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "When we want to work with SQL easily in an Ipython notebook, we'll need the ipython-sql [extension](https://github.com/catherinedevlin/ipython-sql) ans you can install it via `pip` as follows:\n", "\n", "`pip install ipython-sql`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we'll load the ipython-sql [extension](https://github.com/catherinedevlin/ipython-sql) as follows:\n", "\n", "**Note: DO NOT PANIC** \n", "* Don't worry if you get (a) a big red-highlighted warning or (b) a note that the extension has already been loaded! As long as your SQL commands work, it's loaded properly!\n", "\n", "* If you don't understand all the SQL queries... very soon, you will!" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sql extension is already loaded. To reload it, use:\n", " %reload_ext sql\n" ] } ], "source": [ "%load_ext sql" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll load an SQLite database stored as a file as follows:\n", "\n", "**NOTE: We load a file below (here, \"dataset_1.db\", which must be in the same directory as the notebook. You'll use this file throughout the next few lectures and PS1. Make sure to download the from the webpage!!**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "u'Connected: @dataset_1.db'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql sqlite:///dataset_1.db" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets look at one table from the US National Oceanic and Atmospheric Administration (NOAA) Rainfall dataset- `precipitation_full`- having the following schema:\n", "\n", "> * `state_code`\n", "> * `station_id`\n", "> * `year`\n", "> * `month`\n", "> * `day`\n", "> * `hour`\n", "> * `precipitation`\n", "> * `flag_1`\n", "> * `flag_2`\n", "\n", "Each tuple in this table describes one hour of rainfall (`precipitation`- in hundredths of an inch) at one station (`station_id`) in one state (`state_code`). Note that tuples with `hour=25` record the total rainfall for that day, and that we can ignore the values of attributes `flag_1` and `flag_2` for now.\n", "\n", "Let's take a look at some sample rows of the table via SQL queries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that you can also load an empty in-memory database using:\n", "\n", "``%sql sqlite://``\n", "\n", "Now let's try out some queries! (Don't worry, if you're new to SQL, we'll go over all the syntax in more depth next lecture)\n", "\n", "**`%sql` is used for single line SQL commands:**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///dataset_1.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
state_codestation_idyearmonthdayhourprecipitationflag_1flag_2
18072013911None[
1807201391250I
1807201393024None]
18072013930250I
1630320139110g
" ], "text/plain": [ "[(1, 807, 2013, 9, 1, 1, None, u'[', u''),\n", " (1, 807, 2013, 9, 1, 25, 0, u'I', u''),\n", " (1, 807, 2013, 9, 30, 24, None, u']', u''),\n", " (1, 807, 2013, 9, 30, 25, 0, u'I', u''),\n", " (1, 6303, 2013, 9, 1, 1, 0, u'g', u'')]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT * FROM precipitation_full LIMIT 5;" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///dataset_1.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
COUNT(*)
74251
" ], "text/plain": [ "[(74251,)]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT COUNT(*) FROM precipitation_full;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And **`%%sql` is used for multi-line SQL commands:**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///dataset_1.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
SUM(p.precipitation)
31982
" ], "text/plain": [ "[(31982,)]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT SUM(p.precipitation) \n", "FROM precipitation_full p, states s \n", "WHERE p.state_code = s.code AND s.abbrev = 'CA';" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also work with the output of the queries we issue:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///dataset_1.db\n", "Done.\n" ] } ], "source": [ "result = %sql SELECT * FROM states;" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[u'code', u'name', u'abbrev']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.keys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note that the first row is the header row with column names!*" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, u'Alabama', u'AL')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result[1]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "u'Alabama'" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result[1].name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now feel free to have fun with the dataset- we'll see more of it later though!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 1 }