{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%load_ext sql\n",
"%sql sqlite://"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Utility functions...\n",
"from IPython.core.display import display_html, HTML\n",
"def to_html_table(res, style=None):\n",
" html = ''\n",
" html += ''.join(res.keys) + ''\n",
" html += ''.join([''.join([str(cell) for cell in row]) for row in list(res)])\n",
" return html + ''\n",
"def display_side_by_side(l, r):\n",
" s = \"display: inline-block;\"\n",
" html = to_html_table(l, style=s) + ' ' + to_html_table(r, style=s)\n",
" display_html(HTML(data=html))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"MVDs\n",
"===\n",
"\n",
"This notebook is meant to go into a little more depth about them.\n",
"\n",
"**Note that there are some nice viualizations in the lecture slides that we won't reproduce here, so please look those over as well! Also check out Activity-11-3...**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Formal definition\n",
"\n",
"Given a relation $R$ having attributes $A$, and two sets of attributes $X,Y\\subseteq A$, the **_multi-value dependency (MVD)_** $X\\twoheadrightarrow Y$ holds on $R$ if, for **any** tuples $t_1,t_2\\in R$ s.t. $t_1[X] = t_2[X]$, there exists a tuple $t_3\\in R$ s.t.:\n",
"\n",
"* $t_3[X] = t_1[X] = t_2[X]$\n",
"* $t_3[Y] = t_1[Y]$\n",
"* $t_3[A\\setminus Y] = t_2[A\\setminus Y]$\n",
"\n",
"where $A\\setminus B$ = all elements of $A$ not in $B$.\n",
"\n",
"So let's consider a toy example at this point:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS R; CREATE TABLE R (A INT, B INT, C INT);\n",
"INSERT INTO R VALUES (1, 1, 0);\n",
"INSERT INTO R VALUES (1, 0, 1);\n",
"SELECT * FROM R;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's consider the first two rows as $t_1$ and $t_2$ respectively; what is the $t_3$ that the definition tells us we must add if we want the MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ to hold? Let's add it in:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%sql INSERT INTO R VALUES (1,1,1); SELECT * FROM R;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However what about if we consider the first two rows as $t_2$ and $t_1$ respectively? What row would the definition tell us we'd have to add in?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%sql INSERT INTO R VALUES (1,0,0); SELECT * FROM R;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, if we checked every other pair like this, we'd see that we are done- the MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ holds!\n",
"\n",
"## A second definition\n",
"\n",
"We see that this suggests another (equivalent) definition, which we'll phrase slightly less formally:\n",
"\n",
"**For an MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ to hold, for any pair of tuples that agree on attributes $A$, we also must find the corresponding _\"swapped\"_ pair: a pair of tuples that also have the same value of $A$ but have their $B$ and $(A\\cup B)^C$ attributes swapped.**\n",
"\n",
"(*where $(A\\cup B)^C$ just means the attributes not in $A$ or $B$*)\n",
"\n",
"This definition should make sense, and might even feel more intuitive, given the toy example above!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Addressing one point of confusion:\n",
"\n",
"Remember, an MVD holds over a **relation** (not a single tuple, not a single pair of tuples $t_1,t_2$...). Again, look at the definition- an MVD, say $\\{X\\}\\twoheadrightarrow\\{Y\\}$, holds over a relation $R$ if **_for any possible_** pair of tuples $t_1,t_2$ in $R$ such that $t_1[A]=t_2[A]$, the condition in the definition holds...\n",
"\n",
"\n",
"### Another clarification of terminology:\n",
"So how can we ever _test_ if an MVD holds over a relation $R$, based just on one _instance_ of $R$? Couldn't someone always add in more tuples and violate it? Aren't we being a little sloppy when we say an MVD 'holds' on an _instance_ of a relation just based on checking the instance..?\n",
"\n",
"This is correct. Recall that we have the same situation with FDs- we need to have _external information_ (such as a set of functional dependencies) to _prove_ that an MVD holds in general over a relation. We **can** prove that it is violated or not violated by an instance of $R$, and that it thus could or could not potentially hold in general.\n",
"\n",
"So when we say an MVD (or an FD) _holds_ on an instance, based only on checking the instance, we implictly mean the above- that based on the instance we see, it _could_ hold in general."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### A third definition...\n",
"\n",
"At this point (especially if you're still reading) you may suspect a connection to cross-products and joins. Consider decomposing our toy example in its **original state (before we added in the two rows to make the MVD hold)**. Let's decompose it into two tables, split by the attribute sets of the MVD (recall it was $\\{A\\}\\twoheadrightarrow\\{B\\}$):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS R; CREATE TABLE R (A INT, B INT, C INT);\n",
"INSERT INTO R VALUES (1, 1, 0);\n",
"INSERT INTO R VALUES (1, 0, 1);\n",
"SELECT * FROM R;"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%sql DROP TABLE IF EXISTS R1; CREATE TABLE R1 AS SELECT A,B FROM R GROUP BY A,B;\n",
"%sql DROP TABLE IF EXISTS R2; CREATE TABLE R2 AS SELECT A,C FROM R GROUP BY A,C;\n",
"r1 = %sql SELECT * FROM R1;\n",
"r2 = %sql SELECT * FROM R2;\n",
"display_side_by_side(r1,r2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's take the join to recompose:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT r1.A AS A, r1.B AS B, r2.C AS C\n",
"FROM R1 r1, R2 r2\n",
"WHERE r1.A = r2.A;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Woah! The MVD that we decomposed along holds for the join of the decomposition! We see that this is another definition for an MVD, in terms of the _'local joins'_: an MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ holds if whenever we take a block of rows with the same value for the $A$ attributes, decompose it into $R_1(A,B)$ and $R_2(A,(A\\cup B)^C)$, and then take the join of these tables, we end up with the same rows we had before!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.15"
}
},
"nbformat": 4,
"nbformat_minor": 1
}