{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%load_ext sql\n", "%sql sqlite://" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Utility functions...\n", "from IPython.core.display import display_html, HTML\n", "def to_html_table(res, style=None):\n", " html = ''\n", " html += ''.join(res.keys) + ''\n", " html += ''.join([''.join([str(cell) for cell in row]) for row in list(res)])\n", " return html + ''\n", "def display_side_by_side(l, r):\n", " s = \"display: inline-block;\"\n", " html = to_html_table(l, style=s) + ' ' + to_html_table(r, style=s)\n", " display_html(HTML(data=html))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "MVDs\n", "===\n", "\n", "This notebook is meant to go into a little more depth about them.\n", "\n", "**Note that there are some nice viualizations in the lecture slides that we won't reproduce here, so please look those over as well! Also check out Activity-11-3...**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Formal definition\n", "\n", "Given a relation $R$ having attributes $A$, and two sets of attributes $X,Y\\subseteq A$, the **_multi-value dependency (MVD)_** $X\\twoheadrightarrow Y$ holds on $R$ if, for **any** tuples $t_1,t_2\\in R$ s.t. $t_1[X] = t_2[X]$, there exists a tuple $t_3\\in R$ s.t.:\n", "\n", "* $t_3[X] = t_1[X] = t_2[X]$\n", "* $t_3[Y] = t_1[Y]$\n", "* $t_3[A\\setminus Y] = t_2[A\\setminus Y]$\n", "\n", "where $A\\setminus B$ = all elements of $A$ not in $B$.\n", "\n", "So let's consider a toy example at this point:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%sql\n", "DROP TABLE IF EXISTS R; CREATE TABLE R (A INT, B INT, C INT);\n", "INSERT INTO R VALUES (1, 1, 0);\n", "INSERT INTO R VALUES (1, 0, 1);\n", "SELECT * FROM R;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's consider the first two rows as $t_1$ and $t_2$ respectively; what is the $t_3$ that the definition tells us we must add if we want the MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ to hold? Let's add it in:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%sql INSERT INTO R VALUES (1,1,1); SELECT * FROM R;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However what about if we consider the first two rows as $t_2$ and $t_1$ respectively? What row would the definition tell us we'd have to add in?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%sql INSERT INTO R VALUES (1,0,0); SELECT * FROM R;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, if we checked every other pair like this, we'd see that we are done- the MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ holds!\n", "\n", "## A second definition\n", "\n", "We see that this suggests another (equivalent) definition, which we'll phrase slightly less formally:\n", "\n", "**For an MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ to hold, for any pair of tuples that agree on attributes $A$, we also must find the corresponding _\"swapped\"_ pair: a pair of tuples that also have the same value of $A$ but have their $B$ and $(A\\cup B)^C$ attributes swapped.**\n", "\n", "(*where $(A\\cup B)^C$ just means the attributes not in $A$ or $B$*)\n", "\n", "This definition should make sense, and might even feel more intuitive, given the toy example above!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Addressing one point of confusion:\n", "\n", "Remember, an MVD holds over a **relation** (not a single tuple, not a single pair of tuples $t_1,t_2$...). Again, look at the definition- an MVD, say $\\{X\\}\\twoheadrightarrow\\{Y\\}$, holds over a relation $R$ if **_for any possible_** pair of tuples $t_1,t_2$ in $R$ such that $t_1[A]=t_2[A]$, the condition in the definition holds...\n", "\n", "\n", "### Another clarification of terminology:\n", "So how can we ever _test_ if an MVD holds over a relation $R$, based just on one _instance_ of $R$? Couldn't someone always add in more tuples and violate it? Aren't we being a little sloppy when we say an MVD 'holds' on an _instance_ of a relation just based on checking the instance..?\n", "\n", "This is correct. Recall that we have the same situation with FDs- we need to have _external information_ (such as a set of functional dependencies) to _prove_ that an MVD holds in general over a relation. We **can** prove that it is violated or not violated by an instance of $R$, and that it thus could or could not potentially hold in general.\n", "\n", "So when we say an MVD (or an FD) _holds_ on an instance, based only on checking the instance, we implictly mean the above- that based on the instance we see, it _could_ hold in general." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### A third definition...\n", "\n", "At this point (especially if you're still reading) you may suspect a connection to cross-products and joins. Consider decomposing our toy example in its **original state (before we added in the two rows to make the MVD hold)**. Let's decompose it into two tables, split by the attribute sets of the MVD (recall it was $\\{A\\}\\twoheadrightarrow\\{B\\}$):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%sql\n", "DROP TABLE IF EXISTS R; CREATE TABLE R (A INT, B INT, C INT);\n", "INSERT INTO R VALUES (1, 1, 0);\n", "INSERT INTO R VALUES (1, 0, 1);\n", "SELECT * FROM R;" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%sql DROP TABLE IF EXISTS R1; CREATE TABLE R1 AS SELECT A,B FROM R GROUP BY A,B;\n", "%sql DROP TABLE IF EXISTS R2; CREATE TABLE R2 AS SELECT A,C FROM R GROUP BY A,C;\n", "r1 = %sql SELECT * FROM R1;\n", "r2 = %sql SELECT * FROM R2;\n", "display_side_by_side(r1,r2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's take the join to recompose:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT r1.A AS A, r1.B AS B, r2.C AS C\n", "FROM R1 r1, R2 r2\n", "WHERE r1.A = r2.A;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Woah! The MVD that we decomposed along holds for the join of the decomposition! We see that this is another definition for an MVD, in terms of the _'local joins'_: an MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ holds if whenever we take a block of rows with the same value for the $A$ attributes, decompose it into $R_1(A,B)$ and $R_2(A,(A\\cup B)^C)$, and then take the join of these tables, we end up with the same rows we had before!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.15" } }, "nbformat": 4, "nbformat_minor": 1 }