{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Lecture 5: SQL\n", "======================" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Quantifiers: existential and universal\n", "------------------------------------\n", "\n", "A massive amount of user interviewing has suggested that something called \"shmear\" (or *schmear*) is of critical importance to market success. You decide to look for competitors that have some shmear on the menu:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
made_by
Bobs Bagels
BAGEL CORP
" ], "text/plain": [ "[(u'Bobs Bagels',), (u'BAGEL CORP',)]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT DISTINCT made_by FROM bagel WHERE name LIKE '%shmear%';" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A condition of this form (*\"there exists some r s.t. C(r)\"*) is known as an **existential** quantifier. As is apparent above, these are fairly easy to write in SQL. A **universal** quantifier on the other hand (of the form *\"C(r) for all r\"*) is a bit harder but still simple enough.\n", "\n", "So, for example, to find competitors with products that *all* have shmear in them:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
made_by
Bobs Bagels
" ], "text/plain": [ "[(u'Bobs Bagels',)]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT DISTINCT made_by\n", "FROM bagel\n", "WHERE made_by NOT IN (\n", " SELECT made_by\n", " FROM bagel\n", " WHERE name NOT LIKE '%shmear%');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NULL values in SQL\n", "-----------------\n", "\n", "Let's look at an odd pair of queries:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bagel_namefranchisedatequantitypurchaser_age
Plain with shmearBobs Bagels11228
Egg with shmearBobs Bagels2647
Plain with shmearBAGEL CORP21224
Plain with shmearBAGEL CORP3117
Plain with shmearBobs Bagels424None
" ], "text/plain": [ "[(u'Plain with shmear', u'Bobs Bagels', 1, 12, 28),\n", " (u'Egg with shmear', u'Bobs Bagels', 2, 6, 47),\n", " (u'Plain with shmear', u'BAGEL CORP', 2, 12, 24),\n", " (u'Plain with shmear', u'BAGEL CORP', 3, 1, 17),\n", " (u'Plain with shmear', u'Bobs Bagels', 4, 24, None)]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT * FROM purchase WHERE bagel_name LIKE '%shmear%';" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bagel_namefranchisedatequantitypurchaser_age
Plain with shmearBobs Bagels11228
Egg with shmearBobs Bagels2647
Plain with shmearBAGEL CORP21224
Plain with shmearBAGEL CORP3117
" ], "text/plain": [ "[(u'Plain with shmear', u'Bobs Bagels', 1, 12, 28),\n", " (u'Egg with shmear', u'Bobs Bagels', 2, 6, 47),\n", " (u'Plain with shmear', u'BAGEL CORP', 2, 12, 24),\n", " (u'Plain with shmear', u'BAGEL CORP', 3, 1, 17)]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql SELECT * FROM purchase \n", "WHERE bagel_name LIKE '%shmear%' \n", " AND (purchaser_age >= 5 OR purchaser_age < 5);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that `NULL` values are treated specially. In SQL, there are actually three effective boolean values- `TRUE`, `FALSE`, and `UNKNOWN`. Any comparison operation between a `NULL` value and a constant will return `UNKNOWN`- and in SQL, relations are only passed through when the condition over them returns `TRUE`. We can of course handle them specially though:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bagel_namefranchisedatequantitypurchaser_age
Plain with shmearBobs Bagels11228
Egg with shmearBobs Bagels2647
Plain with shmearBAGEL CORP21224
Plain with shmearBAGEL CORP3117
Plain with shmearBobs Bagels424None
" ], "text/plain": [ "[(u'Plain with shmear', u'Bobs Bagels', 1, 12, 28),\n", " (u'Egg with shmear', u'Bobs Bagels', 2, 6, 47),\n", " (u'Plain with shmear', u'BAGEL CORP', 2, 12, 24),\n", " (u'Plain with shmear', u'BAGEL CORP', 3, 1, 17),\n", " (u'Plain with shmear', u'Bobs Bagels', 4, 24, None)]" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql SELECT * FROM purchase\n", "WHERE bagel_name LIKE '%shmear%'\n", " AND (purchaser_age >= 5 OR purchaser_age < 5 \n", " OR purchaser_age IS NULL);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happens when there are nulls in a join?" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
name
Plain with shmear
Egg with shmear
eBagel Expansion Pack
" ], "text/plain": [ "[(u'Plain with shmear',), (u'Egg with shmear',), (u'eBagel Expansion Pack',)]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql \n", "SELECT DISTINCT b.name \n", "FROM bagel b, purchase p \n", "WHERE b.name = p.bagel_name AND b.made_by = p.franchise;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're missing bagels which were never purchased!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inner and Outer Joins\n", "--------------------\n", "\n", "A join query using a `WHERE` clause like the one just showed is actually an `INNER JOIN`, and can also be written as follows:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
name
Plain with shmear
Egg with shmear
eBagel Expansion Pack
" ], "text/plain": [ "[(u'Plain with shmear',), (u'Egg with shmear',), (u'eBagel Expansion Pack',)]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql \n", "SELECT DISTINCT b.name \n", "FROM bagel b\n", " INNER JOIN purchase p ON b.name = p.bagel_name AND b.made_by = p.franchise;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An `INNER JOIN` on tables `A` and `B` with join condition `C(A,B)` returns only relations `(a,b)` such that `C(a,b) = TRUE`. If, as in our example above, there is no `b` such that `C(a,b)` is true, then `a` is simply not returned in thr output multiset.\n", "\n", "We can use an `OUTER JOIN` instead, however, which comes in three varieties: `LEFT`, `RIGHT`, and `FULL`. \n", "\n", "In our current situation, what we needed was a `LEFT OUTER JOIN`. A left outer join will also return `(a, NULL)` for left relations `a` such that there is no `b` for which `C(a,b) = TRUE`:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
name
Plain with shmear
Egg with shmear
eBagel Drinkable Bagel
eBagel Expansion Pack
Organic Flax-seed bagel chips
" ], "text/plain": [ "[(u'Plain with shmear',),\n", " (u'Egg with shmear',),\n", " (u'eBagel Drinkable Bagel',),\n", " (u'eBagel Expansion Pack',),\n", " (u'Organic Flax-seed bagel chips',)]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql \n", "SELECT DISTINCT b.name \n", "FROM bagel b\n", " LEFT OUTER JOIN purchase p ON b.name = p.bagel_name AND b.made_by = p.franchise;" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 1 }