{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Class 6: Advanced `pandas`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far, `pandas`' `Series` and `DataFrame` may have seemed like little more than tables with complicated indexing methods. In this lesson, we will learn more about what makes `pandas` so powerful and how we can use it to write efficient and readable code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "````{note}\n", "Some of the features described below only work with pandas >= 1.0.0. Make sure you have a recent pandas installation when running this notebook. To check the version of pandas (or any other package), import it and print its `__version__` attribute:\n", "```python\n", ">>> import pandas as pd\n", ">>> print(pd.__version__)\n", "1.2.0\n", "```\n", "````" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Missing Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last question in the previous class pointed us to [working with missing data](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html). But how and why do missing data occur?\n", "\n", "One option is pandas' index alignment, the property that makes sure that each value will have the same index throughout the entire computation process." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 NaN\n", "1 5.0\n", "2 9.0\n", "3 NaN\n", "dtype: float64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "\n", "A = pd.Series([2, 4, 6], index=[0, 1, 2])\n", "B = pd.Series([1, 3, 5], index=[1, 2, 3])\n", "A + B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The NaNs we have are what we call missing data, and this is how they are represented in pandas. We'll discuss that in more detail in a few moments.\n", "\n", "The same thing occurs with DataFrames:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " A B\n", "0 6 11\n", "1 9 5" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = pd.DataFrame(np.random.randint(0, 20, (2, 2)),\n", " columns=list('AB'))\n", "A" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " B A C\n", "0 6 8 1\n", "1 2 8 1\n", "2 8 3 7" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = pd.DataFrame(np.random.randint(0, 10, (3, 3)),\n", " columns=list('BAC'))\n", "B" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " A B C\n", "0 14.0 17.0 NaN\n", "1 17.0 7.0 NaN\n", "2 NaN NaN NaN\n", "\n", "Returned dtypes:\n", "A float64\n", "B float64\n", "C float64\n", "dtype: object\n" ] } ], "source": [ "new = A + B\n", "print(new)\n", "print(f\"\\nReturned dtypes:\\n{new.dtypes}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "Note how `new.dtypes` itself returns a `Series` of dtypes, with it's own `object` dtype.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataframe's shape is the shape of the larger dataframe, and the \"extra\" row (index 2) was filled with NaNs. Since we have NaNs, the data type of the column is implicitly converted to a floating point type. To have integer dataframes with NaNs, we have to explicitly say we want them available. More on that later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way to introduce missing data is through reindexing. If we \"resample\" our data we can achieve the following:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two three\n", "a 1.222374 0.941036 0.380220\n", "c -0.950380 1.359308 1.245010\n", "e -0.038110 0.323388 0.881577\n", "f 0.706950 -0.555918 -0.000683\n", "h 1.451893 -1.816021 0.791840" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],\n", " columns=['one', 'two', 'three'])\n", "df" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two three\n", "a 1.222374 0.941036 0.380220\n", "b NaN NaN NaN\n", "c -0.950380 1.359308 1.245010\n", "d NaN NaN NaN\n", "e -0.038110 0.323388 0.881577\n", "f 0.706950 -0.555918 -0.000683\n", "g NaN NaN NaN\n", "h 1.451893 -1.816021 0.791840" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])\n", "df2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But what is `NaN`? Is it the same as `None`? To better answer the former, let's first have a closer look at the latter." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `None` object" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`None` is the standard null value in Python, and is used extensively in normal usage of the language. For example, functions that don't have a `return` statement, implicitly return `None`. While `None` can be used as a missing data type, it's probably not the best choice." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, None, 3, 4], dtype=object)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vals1 = np.array([1, None, 3, 4])\n", "vals1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `dtype` is `object`, because the best common type of `int`s and a `None` is a Python `object`. This slows down computation time on these arrays:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dtype = object\n", "43.1 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", "\n", "dtype = int\n", "341 µs ± 1.47 µs per loop (mean ± std. dev. 
of 7 runs, 1,000 loops each)\n", "\n" ] } ], "source": [ "for dtype in ['object', 'int']:\n", " print(\"dtype =\", dtype)\n", " %timeit np.arange(1E6, dtype=dtype).sum()\n", " print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you recall from a couple of lessons ago, the performance of `object` arrays is very similar to that of standard lists (generally speaking, the two data structures are effectively identical)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another thing we can't do is aggregation:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unsupported operand type(s) for +: 'int' and 'NoneType'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[18], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mvals1\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msum\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/numpy/core/_methods.py:49\u001b[0m, in \u001b[0;36m_sum\u001b[0;34m(a, axis, dtype, out, keepdims, initial, where)\u001b[0m\n\u001b[1;32m 47\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_sum\u001b[39m(a, axis\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m, dtype\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m, out\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mNone\u001b[39;00m, keepdims\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m,\n\u001b[1;32m 48\u001b[0m initial\u001b[38;5;241m=\u001b[39m_NoValue, where\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m):\n\u001b[0;32m---> 49\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m 
\u001b[43mumr_sum\u001b[49m\u001b[43m(\u001b[49m\u001b[43ma\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43maxis\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mout\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkeepdims\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43minitial\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mwhere\u001b[49m\u001b[43m)\u001b[49m\n", "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for +: 'int' and 'NoneType'" ] } ], "source": [ "vals1.sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `NaN` value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`NaN` is a special floating-point value recognized by all programming languages that conform to the IEEE 754 floating-point standard (which means most of them). As we mentioned before, it forces the entire array to have a floating point type:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vals2 = np.array([1, np.nan, 3, 4])\n", "vals2.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating floating point arrays is very fast, so performance isn't hindered. 
NaN is sometimes described as a \"data virus\", since it infects any object it touches:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 + np.nan" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nan" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "0 * np.nan" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(nan, nan, nan)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vals2.sum(), vals2.min(), vals2.max()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan == np.nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy has `nan`-aware counterparts to many of its aggregation functions, which ignore NaNs instead of propagating them. 
They usually share the name of their regular sibling, with a \"nan\" prefix:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8.0\n", "2.6666666666666665\n" ] } ], "source": [ "print(np.nansum(vals2))\n", "print(np.nanmean(vals2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "pandas objects, by contrast, skip NaNs in their calculations by default, as we'll soon see.\n", "\n", "pandas treats `NaN` and `None` interchangeably as missing values; in a numeric `Series`, `None` is converted to `NaN`:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 NaN\n", "2 2.0\n", "3 NaN\n", "dtype: float64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ser = pd.Series([1, np.nan, 2, None])\n", "ser" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `NaT` value\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When dealing with datetime values or indices, the missing value is represented as `NaT`, or not-a-time:\n", "
" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two three timestamp\n", "a 1.222374 0.941036 0.380220 2018-01-01\n", "c -0.950380 1.359308 1.245010 2018-01-01\n", "e -0.038110 0.323388 0.881577 2018-01-01\n", "f 0.706950 -0.555918 -0.000683 2018-01-01\n", "h 1.451893 -1.816021 0.791840 2018-01-01" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['timestamp'] = pd.Timestamp('20180101')\n", "df" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two three timestamp\n", "a 1.222374 0.941036 0.380220 2018-01-01\n", "b NaN NaN NaN NaT\n", "c -0.950380 1.359308 1.245010 2018-01-01\n", "d NaN NaN NaN NaT\n", "e -0.038110 0.323388 0.881577 2018-01-01\n", "f 0.706950 -0.555918 -0.000683 2018-01-01\n", "g NaN NaN NaN NaT\n", "h 1.451893 -1.816021 0.791840 2018-01-01" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])\n", "df2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Operations and calculations with missing data" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two\n", "0 0.307007 0.608862\n", "1 0.989388 NaN\n", "2 0.068104 0.519292\n", "3 0.881803 0.572062\n", "4 0.176668 0.570929" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = pd.DataFrame(np.random.random((5, 2)), columns=['one', 'two'])\n", "a.iloc[1, 1] = np.nan\n", "a" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two three\n", "0 0.178986 0.382830 0.402055\n", "1 0.194800 0.386910 0.853929\n", "2 0.803260 0.155697 NaN\n", "3 0.808805 0.532103 0.392492\n", "4 0.111170 0.975744 0.667562\n", "5 0.983706 0.310677 0.742454" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = pd.DataFrame(np.random.random((6, 3)), columns=['one', 'two', 'three'])\n", "b.iloc[2, 2] = np.nan\n", "b" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN NaN\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 0.287838 NaN 1.546673\n", "5 NaN NaN NaN" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a + b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we see, missing values propagate naturally through these arithmetic operations. Statistics also works:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "count 5.000000 0.0 4.000000\n", "mean 0.903998 NaN 1.079379\n", "std 0.559622 NaN 0.360647\n", "min 0.287838 NaN 0.674988\n", "25% 0.485993 NaN 0.912516\n", "50% 0.871363 NaN 1.047928\n", "75% 1.184189 NaN 1.214791\n", "max 1.690608 NaN 1.546673" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(a + b).describe()\n", "# Summation - NaNs are zero.\n", "# If everything is NaN - the result is NaN as well.\n", "# pandas' cumsum and cumprod ignore NaNs but preserve them in the resulting arrays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also receive a boolean mask of the NaNs in a dataframe:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "0 False True False\n", "1 False True True\n", "2 False True False\n", "3 False True False\n", "4 False True False\n", "5 True True True" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = (a + b).isnull() # also isna(), and the opposite .notnull()\n", "mask" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Filling missing values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest option is to use the `fillna` method:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN NaN\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 NaN NaN 1.546673\n", "5 NaN NaN NaN" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed = a + b\n", "summed.iloc[4, 0] = np.nan\n", "summed" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 0.0 0.991692\n", "1 1.184189 0.0 0.000000\n", "2 0.871363 0.0 0.674988\n", "3 1.690608 0.0 1.104164\n", "4 0.000000 0.0 1.546673\n", "5 0.000000 0.0 0.000000" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed.fillna(0)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 missing 0.991692\n", "1 1.184189 missing missing\n", "2 0.871363 missing 0.674988\n", "3 1.690608 missing 1.104164\n", "4 missing missing 1.546673\n", "5 missing missing missing" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed.fillna('missing') # changed dtype to \"object\"" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_1151799/1540442133.py:1: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.\n", " summed.fillna(method='pad') # The NaN column remained the same, but values were propagated forward\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
onethreetwo
00.485993NaN0.991692
11.184189NaN0.991692
20.871363NaN0.674988
31.690608NaN1.104164
41.690608NaN1.546673
51.690608NaN1.546673
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN 0.991692\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 1.690608 NaN 1.546673\n", "5 1.690608 NaN 1.546673" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed.fillna(method='pad') # The NaN column remained the same, but values were propagated forward\n", "# We can also use the \"backfill\" method to fill in values to the back" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_1151799/1451676853.py:1: FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.\n", " summed.fillna(method='pad', limit=1) # No more than one padded NaN in a row\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
onethreetwo
00.485993NaN0.991692
11.184189NaN0.991692
20.871363NaN0.674988
31.690608NaN1.104164
41.690608NaN1.546673
5NaNNaN1.546673
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN 0.991692\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 1.690608 NaN 1.546673\n", "5 NaN NaN 1.546673" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed.fillna(method='pad', limit=1) # No more than one padded NaN in a row" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN 1.079379\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 1.058038 NaN 1.546673\n", "5 1.058038 NaN 1.079379" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed.fillna(summed.mean()) # each column received its respective mean. The NaN column is untouched." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dropping missing values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We've already seen in the short exercise the `dropna` method, that allows us to drop missing values:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN NaN\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 NaN NaN 1.546673\n", "5 NaN NaN NaN" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN 1.079379\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 1.058038 NaN 1.546673\n", "5 1.058038 NaN 1.079379" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filled = summed.fillna(summed.mean())\n", "filled" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " one two\n", "0 0.485993 0.991692\n", "1 1.184189 1.079379\n", "2 0.871363 0.674988\n", "3 1.690608 1.104164\n", "4 1.058038 1.546673\n", "5 1.058038 1.079379" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filled.dropna(axis=1) # each column containing NaN is dropped" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [one, three, two]\n", "Index: []" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filled.dropna(axis=0) # each row containing a NaN is dropped" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Interpolation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last way to to fill in missing values is through [interpolation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html).\n", "\n", "The default interpolation methods perform linear interpolation on the data, based on its ordinal index:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN NaN\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 NaN NaN 1.546673\n", "5 NaN NaN NaN" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " one three two\n", "0 0.485993 NaN 0.991692\n", "1 1.184189 NaN 0.833340\n", "2 0.871363 NaN 0.674988\n", "3 1.690608 NaN 1.104164\n", "4 1.690608 NaN 1.546673\n", "5 1.690608 NaN 1.546673" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "summed.interpolate() # inner NaNs are interpolated linearly, trailing NaNs repeat the last valid value, and the all-NaN column stays NaN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also interpolate with the actual index values in mind:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "2018-01-01 1.0\n", "2018-01-04 NaN\n", "2018-01-05 5.0\n", "2018-01-07 NaN\n", "2018-01-08 8.0" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a datetime index with gaps (some dates are missing)\n", "timeindex = pd.Series(['1/1/2018', '1/4/2018', '1/5/2018', '1/7/2018', '1/8/2018'])\n", "timeindex = pd.to_datetime(timeindex)\n", "data_to_interp = [1, np.nan, 5, np.nan, 8]\n", "df_to_interp = pd.DataFrame(data_to_interp, index=timeindex)\n", "df_to_interp" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "2018-01-01 1.0\n", "2018-01-04 3.0\n", "2018-01-05 5.0\n", "2018-01-07 6.5\n", "2018-01-08 8.0" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_to_interp.interpolate() # the index values aren't taken into account" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "2018-01-01 1.0\n", "2018-01-04 4.0\n", "2018-01-05 5.0\n", "2018-01-07 7.0\n", "2018-01-08 8.0" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_to_interp.interpolate(method='index') # the filled values are now weighted by the time gaps between dates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas offers many other interpolation methods, most of which rely on SciPy, for example polynomial interpolation:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B\n", "0 1.0 0.25\n", "1 2.1 NaN\n", "2 NaN NaN\n", "3 4.7 4.00\n", "4 5.6 12.20\n", "5 6.8 14.40" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_inter_2 = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],\n", " 'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})\n", "df_inter_2" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B\n", "0 1.000000 0.250000\n", "1 2.100000 -2.703846\n", "2 3.451351 -1.453846\n", "3 4.700000 4.000000\n", "4 5.600000 12.200000\n", "5 6.800000 14.400000" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_inter_2.interpolate(method='polynomial', order=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Missing Values in Non-Float Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As of pandas v1.0.0, missing values can be stored in non-float columns. The feature is still somewhat experimental, so by default pandas still converts, for example, integers to floats when a missing value is present, but the support is there if you know where to look. By default:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 NaN\n", "3 4.0\n", "dtype: float64" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nanint = pd.Series([1, 2, np.nan, 4])\n", "nanint # the result has a dtype of float64 even though all numbers are integers."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can try to force pandas' hand here, but it won't work:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "scrolled": true }, "outputs": [ { "ename": "ValueError", "evalue": "cannot convert float NaN to integer", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[51], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m nanint \u001b[38;5;241m=\u001b[39m \u001b[43mpd\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mSeries\u001b[49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mnp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mnan\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m4\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mint32\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/series.py:584\u001b[0m, in \u001b[0;36mSeries.__init__\u001b[0;34m(self, data, index, dtype, name, copy, fastpath)\u001b[0m\n\u001b[1;32m 582\u001b[0m data \u001b[38;5;241m=\u001b[39m data\u001b[38;5;241m.\u001b[39mcopy()\n\u001b[1;32m 583\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m--> 584\u001b[0m data \u001b[38;5;241m=\u001b[39m \u001b[43msanitize_array\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mindex\u001b[49m\u001b[43m,\u001b[49m\u001b[43m 
\u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 586\u001b[0m manager \u001b[38;5;241m=\u001b[39m _get_option(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mmode.data_manager\u001b[39m\u001b[38;5;124m\"\u001b[39m, silent\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 587\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m manager \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mblock\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/construction.py:651\u001b[0m, in \u001b[0;36msanitize_array\u001b[0;34m(data, index, dtype, copy, allow_2d)\u001b[0m\n\u001b[1;32m 648\u001b[0m subarr \u001b[38;5;241m=\u001b[39m np\u001b[38;5;241m.\u001b[39marray([], dtype\u001b[38;5;241m=\u001b[39mnp\u001b[38;5;241m.\u001b[39mfloat64)\n\u001b[1;32m 650\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m dtype \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 651\u001b[0m subarr \u001b[38;5;241m=\u001b[39m \u001b[43m_try_cast\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcopy\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 653\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 654\u001b[0m subarr \u001b[38;5;241m=\u001b[39m maybe_convert_platform(data)\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/construction.py:818\u001b[0m, in \u001b[0;36m_try_cast\u001b[0;34m(arr, dtype, copy)\u001b[0m\n\u001b[1;32m 813\u001b[0m \u001b[38;5;66;03m# GH#15832: Check if we are requesting a numeric dtype and\u001b[39;00m\n\u001b[1;32m 814\u001b[0m \u001b[38;5;66;03m# that we can 
convert the data to the requested dtype.\u001b[39;00m\n\u001b[1;32m 815\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m dtype\u001b[38;5;241m.\u001b[39mkind \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124miu\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[1;32m 816\u001b[0m \u001b[38;5;66;03m# this will raise if we have e.g. floats\u001b[39;00m\n\u001b[0;32m--> 818\u001b[0m subarr \u001b[38;5;241m=\u001b[39m \u001b[43mmaybe_cast_to_integer_array\u001b[49m\u001b[43m(\u001b[49m\u001b[43marr\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 819\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m copy:\n\u001b[1;32m 820\u001b[0m subarr \u001b[38;5;241m=\u001b[39m np\u001b[38;5;241m.\u001b[39masarray(arr, dtype\u001b[38;5;241m=\u001b[39mdtype)\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1657\u001b[0m, in \u001b[0;36mmaybe_cast_to_integer_array\u001b[0;34m(arr, dtype)\u001b[0m\n\u001b[1;32m 1650\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m np_version_gt2:\n\u001b[1;32m 1651\u001b[0m warnings\u001b[38;5;241m.\u001b[39mfilterwarnings(\n\u001b[1;32m 1652\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mignore\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 1653\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mNumPy will stop allowing conversion of \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 1654\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mout-of-bound Python int\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 1655\u001b[0m \u001b[38;5;167;01mDeprecationWarning\u001b[39;00m,\n\u001b[1;32m 1656\u001b[0m )\n\u001b[0;32m-> 1657\u001b[0m casted \u001b[38;5;241m=\u001b[39m 
\u001b[43mnp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43masarray\u001b[49m\u001b[43m(\u001b[49m\u001b[43marr\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdtype\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mdtype\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1658\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 1659\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m warnings\u001b[38;5;241m.\u001b[39mcatch_warnings():\n", "\u001b[0;31mValueError\u001b[0m: cannot convert float NaN to integer" ] } ], "source": [ "nanint = pd.Series([1, 2, np.nan, 4], dtype=\"int32\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To our rescue comes the new `pd.Int32Dtype`:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 <NA>\n", "3 4\n", "dtype: Int32" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nanint = pd.Series([1, 2, np.nan, 4], dtype=\"Int32\")\n", "nanint" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It worked! We have a series with integers and a missing value! Notice the changes we had to make:\n", "1. The `NaN` is `<NA>` now. It's actually a new missing-value marker called `pd.NA`.\n", "2. The data type had to be specified explicitly, meaning that the conversion will work only if we know in advance that we'll have NA values.\n", "3. The data type is `Int32`, with a capital `I`. It's actually a class underneath, whereas standard data types are lowercase.\n", "\n", "Caveats aside, this is definitely useful for scientists who sometimes have integer values and do not want to convert them to float just to support NAs." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "scrolled": false, "tags": [ "remove-input", "remove-output" ] }, "outputs": [ { "data": { "application/papermill.record/image/png": "", "application/papermill.record/text/plain": "
" }, "metadata": { "scrapbook": { "mime_prefix": "application/papermill.record/", "name": "fig1" } }, "output_type": "display_data" }, { "data": { "application/papermill.record/image/png": "", "application/papermill.record/text/plain": "
" }, "metadata": { "scrapbook": { "mime_prefix": "application/papermill.record/", "name": "fig2" } }, "output_type": "display_data" }, { "data": { "application/papermill.record/image/png": "", "application/papermill.record/text/plain": "
" }, "metadata": { "scrapbook": { "mime_prefix": "application/papermill.record/", "name": "fig3" } }, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAiIAAAGzCAYAAAASZnxRAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAACBHklEQVR4nO2deXyU1b3/PzOTPSEJ2SCQAAmLiAiCFlkUsFL3pW6tv6sVrW21YtVa+yrYFvVeLd6ut7UtLrcVrW1t3drqbVXccEFEBQShIlsgyBIgkJA9mXl+f+DEJGT5nuc5a+b7fr3yUpKTmZMz5znnc77bCXme54FhGIZhGMYAYdMdYBiGYRgmcWEhwjAMwzCMMViIMAzDMAxjDBYiDMMwDMMYg4UIwzAMwzDGYCHCMAzDMIwxWIgwDMMwDGMMFiIMwzAMwxiDhQjDMAzDMMZgIcIwljNixAhcffXV2t5v9uzZmD17trb36y9UVFQgFAphyZIlUl9X9+fPMLphIcIwn7J+/XpceeWVGDp0KFJTUzFkyBBceeWV2LBhg+muSWfDhg248847UVFRofR9Zs+ejfHjx/v63eXLl+POO+/EoUOH5HbKQhLpb2WYrrAQYRgATz/9NCZPnoyXX34Z11xzDX7729/i2muvxSuvvILJkyfj73//u+kuSmXDhg246667uhUiL774Il588UX9nerC8uXLcddddyXE5tzb37px40Y89NBD+jvFMJpIMt0BhjHNli1b8JWvfAXl5eV4/fXXUVhY2P6zm2++GaeeeiquvPJKrF27FmVlZQZ72jP19fXIzMyU8lopKSlSXsdWGhoakJGRYbobZFJTU013gWGUwhYRJuH5yU9+goaGBjz44IOdRAgAFBQU4IEHHkBdXR1+8pOftH//6quvxogRI456rTvvvBOhUKjT9x5++GF8/vOfR1FREVJTUzFu3DgsXrz4qN/1PA933303SkpKkJGRgdNOOw3r168/qt2SJUsQCoWwbNky3HDDDSgqKkJJSQkAYPv27bjhhhtwzDHHID09Hfn5+bjssss6WT6WLFmCyy67DABw2mmnIRQKIRQK4bXXXgPQfYxIU1MT7rzzTowZMwZpaWkoLi7GxRdfjC1btvQ4rj0RCoVw44034m9/+xvGjx+P1NRUHHfccXj++ec7jeN3v/tdAEBZWVl7Hzv+HY899hhOPPFEpKenIy8vD5dffjkqKys7vVfcNfT+++9j5syZyMjIwO233w7gSOzFeeedhxdffBEnnHAC0tLSMG7cODz99NNH9Xnr1q247LLLkJeXh4yMDEydOhX/93//1+ffunbtWlx99dUoLy9HWloaBg8ejK9+9as4cOAA+W/tLkaE0p/XXnsNoVAIf/3rX3HPPfegpKQEaWlpOP3007F58+Y++84wumCLCJPwPPvssxgxYgROPfXUbn8+c+ZMjBgxAs8++yx++9vfCr/+4sWLcdxxx+GCCy5AUlISnn32Wdxwww2IxWKYN29ee7uFCxfi7rvvxjnnnINzzjkHq1atwhlnnIGWlpZuX/eGG25AYWEhFi5ciPr6egDAu+++i+XLl+Pyyy9HSUkJKioqsHjxYsyePRsbNmxARkYGZs6ciZtuugm/+tWvcPvtt+PYY48FgPb/diUajeK8887Dyy+/jMsvvxw333wzDh8+jKVLl+LDDz/EyJEjhcfkzTffxNNPP40bbrgBAwYMwK9+9Stccskl2LFjB/Lz83HxxRfj448/xp///Gf84he/QEFBAQC0C8V77rkHP/zhD/GlL30JX/va17Bv3z7cd999mDlzJlavXo3c3Nz29zpw4ADOPvtsXH755bjyyisxaNCg9p9t2rQJX/7yl3H99ddj7ty5ePjhh3HZZZfh+eefxxe+8AUAwN69ezF9+nQ0NDTgpp
tuQn5+Ph555BFccMEFePLJJ3HRRRf1+HcuXboUW7duxTXXXIPBgwdj/fr1ePDBB7F+/XqsWLECoVCoz7+1K6L9uffeexEOh3HbbbehpqYGP/7xj3HFFVfgnXfeEf7cGEYJHsMkMIcOHfIAeBdeeGGv7S644AIPgFdbW+t5nufNnTvXGz58+FHt7rjjDq/rY9XQ0HBUuzPPPNMrLy9v/3dVVZWXkpLinXvuuV4sFmv//u233+4B8ObOndv+vYcfftgD4J1yyileW1tbn+/19ttvewC8Rx99tP17TzzxhAfAe/XVV49qP2vWLG/WrFnt//7973/vAfB+/vOfH9W2Y1+7Y9asWd5xxx3X6XsAvJSUFG/z5s3t3/vggw88AN59993X/r2f/OQnHgBv27ZtnX6/oqLCi0Qi3j333NPp++vWrfOSkpI6fX/WrFkeAO/+++8/qm/Dhw/3AHhPPfVU+/dqamq84uJib9KkSe3fu+WWWzwA3htvvNH+vcOHD3tlZWXeiBEjvGg06nme523bts0D4D388MPt7br7PP785z97ALzXX3+9z7813s+Onz+1P6+++qoHwDv22GO95ubm9ra//OUvPQDeunXrjnovhjEBu2aYhObw4cMAgAEDBvTaLv7zeHsR0tPT2/+/pqYG+/fvx6xZs7B161bU1NQAAF566SW0tLTgW9/6VifXzi233NLj6379619HJBLp8b1aW1tx4MABjBo1Crm5uVi1apVw3wHgqaeeQkFBAb71rW8d9bOubigqc+bM6WRJmTBhArKzs7F169Y+f/fpp59GLBbDl770Jezfv7/9a/DgwRg9ejReffXVTu1TU1NxzTXXdPtaQ4YM6WRByM7OxlVXXYXVq1djz549AIB//vOfmDJlCk455ZT2dllZWfjGN76BioqKXrOqOn4eTU1N2L9/P6ZOnQoAvj8P0f5cc801neJ+4pY/ylgzjA5YiDAJDVVgHD58GKFQqN1sLsJbb72FOXPmIDMzE7m5uSgsLGyPU4gLke3btwMARo8e3el3CwsLMXDgwG5ft7vA2cbGRixcuBClpaVITU1FQUEBCgsLcejQofb3EmXLli045phjkJQkz5M7bNiwo743cOBAHDx4sM/f3bRpEzzPw+jRo1FYWNjp69///jeqqqo6tR86dGiPAbijRo06SkyNGTMGANpjNLZv345jjjnmqN+Nu7Lin113VFdX4+abb8agQYOQnp6OwsLC9s/N7+ch2p+uYx2fT5SxZhgdcIwIk9Dk5ORgyJAhWLt2ba/t1q5di5KSkvYNrSdLQDQa7fTvLVu24PTTT8fYsWPx85//HKWlpUhJScE///lP/OIXv0AsFvPd946n7Tjf+ta38PDDD+OWW27BtGnTkJOTg1AohMsvvzzQe8mmqyUnjud5ff5uLBZDKBTCv/71r25fJysrq9O/uxsnXXzpS1/C8uXL8d3vfhcnnHACsrKyEIvFcNZZZ2n7PIKMNcPogIUIk/Ccf/75eOCBB/Dmm292MnfHeeONN1BRUYFbb721/XsDBw7stuZD19Pos88+i+bmZvzjH//odDLt6j4YPnw4gCOn/fLy8vbv79u3T+jk+uSTT2Lu3Ln42c9+1v69pqamo/oq4lIZOXIk3nnnHbS2tiI5OZn8e0HpqY8jR46E53koKytrt174ZfPmzfA8r9N7ffzxxwDQnhU1fPhwbNy48ajf/eijj9p/3h0HDx7Eyy+/jLvuugsLFy5s//6mTZuOaivyefjtD8PYCrtmmITntttuQ0ZGBq677rpOaZXAEdP69ddfj+zsbNx4443t3x85ciRqamo6WVJ2796NZ555ptPvx0+jHU+fNTU1ePjhhzu1mzNnDpKTk3Hfffd1avs///M/Qn9LJBI56qR73333HWWpidccoRQLu+SSS7B//378+te/PupnKk/VPfXx4osvRiQSwV133XXU+3ued9Rn2Bu7du3q9JnV1tbi0UcfxQknnIDBgwcDAM
455xysXLkSb7/9dnu7+vp6PPjggxgxYgTGjRvX7Wt399kD3X+mIp+H3/4wjK2wRYRJeEaNGoVHH30U/+///T8cf/zxuPbaa1FWVoaKigr87ne/w8GDB/H44493ism4/PLL8b3vfQ8XXXQRbrrpJjQ0NGDx4sUYM2ZMpyDEM844AykpKTj//PNx3XXXoa6uDg899BCKioqwe/fu9naFhYW47bbbsGjRIpx33nk455xzsHr1avzrX/8Siks577zz8Ic//AE5OTkYN24c3n77bbz00kvIz8/v1O6EE05AJBLBf//3f6OmpgapqanttU66ctVVV+HRRx/FrbfeipUrV+LUU09FfX09XnrpJdxwww248MILRYabzIknnggA+P73v4/LL78cycnJOP/88zFy5EjcfffdWLBgASoqKvDFL34RAwYMwLZt2/DMM8/gG9/4Bm677TbSe4wZMwbXXnst3n33XQwaNAi///3vsXfv3k5Ccf78+fjzn/+Ms88+GzfddBPy8vLwyCOPYNu2bXjqqacQDnd/nsvOzsbMmTPx4x//GK2trRg6dChefPFFbNu2jfy3dlekzm9/GMZazCTrMIx9rFu3zvuP//gPb/DgwV44HPYAeGlpad769eu7bf/iiy9648eP91JSUrxjjjnGe+yxx7pN3/3HP/7hTZgwwUtLS/NGjBjh/fd//3d7SmzHdM1oNOrdddddXnFxsZeenu7Nnj3b+/DDD49K34yn77777rtH9engwYPeNddc4xUUFHhZWVnemWee6X300UdHvYbned5DDz3klZeXe5FIpFMqb9f0Xc87kob6/e9/3ysrK/OSk5O9wYMHe5deeqm3ZcuWXse0p/TdefPmHdW2uz7+13/9lzd06ND2z6PjeD311FPeKaec4mVmZnqZmZne2LFjvXnz5nkbN27s9f07vt+5557rvfDCC96ECRO81NRUb+zYsd4TTzxxVNstW7Z4l156qZebm+ulpaV5U6ZM8Z577rlObbpL3925c6d30UUXebm5uV5OTo532WWXebt27fIAeHfccQfpb+1uXCj9iafvdv17uusnw5gk5HkcscQw3fHoo4/i6quvxpVXXolHH33UdHcYyYwYMQLjx4/Hc889Z7orDJPQsGuGYXrgqquuwu7duzF//nyUlJTgRz/6kekuMQzD9DvYIsIwTELCFhGGsQOOamIYhmEYxhhsEWEYhmEYxhhsEWEYhmEYxhgsRBiGYRiGMYbVWTOxWAy7du3CgAEDfN/yyTAMwzCMXjzPw+HDhzFkyJA+i+xZLUR27dqF0tJS091gGIZhGMYHlZWVKCkp6bWN1UIkfkV7ZWUlsrOzDfeGYRiGYRgKtbW1KC0tbd/He8NqIRJ3x2RnZ7MQYRiGYRjHoIRVcLAqwzAMwzDGYCHCMAzDMIwxWIgwDMMwDGMMFiIMwzAMwxiDhQjDMAzDMMZgIcIwDMMwjDFYiDAMwzAMYwwWIgzDMAzDGMPqgmY6aWyJ4j+f+xArtlYjJRLGRZOG4qunlCMlibWabFraYvjdm1vwzOpdaI3GML08Hz847zikp0RMd61XGlui+NE/N6DiQANG5Gfg9nPGWd9n4Mh4/+HtCmyvbsDwvAx8ZdoIntdMO67Oa6b/EPI8zzPdiZ6ora1FTk4OampqlFZW/fqj72Lphqpuf3bdzDIsOGecsvcOwr7aZsz40Uto6fC90QWZePKGGcjJSDbWr95Y9M8NeOD1bd3+7AvjivDQVZ/T3CMac3+/Ess+3nfU92eOzMGjXz/FQI9o3Pyn1fj72l1Hfd/med3YEsWNS17Dy1ub2r+XkRzG0m/PxtC8dHMd64dc8/BKvLrx6Hl9StkAPHbdTAM9orFx12Gc+avXO33vsaum4JRxhYZ61Dfvbz2ISx5c3ul7f7r6ZEwfW2CoR2oR2b+VCpHFixdj8eLFqKioAAAcd9xxWLhwIc4++2zS7+sQIr2JkDg2LtoT7nwBtU1tPf58eH46ln
338xp71De9iZA4NoqRMd//J1qiPT8mSWFg84/O1dgjGiPm/1+vP7dxXvf1PKZEQvj4nnM09ohO140mMzmMF79tr3jqaw0JAdh2r3vzuoL7bAUi+7dS+2xJSQnuvfdevP/++3jvvffw+c9/HhdeeCHWr1+v8m3JNLZE+xQhAPDA69vQ0hbT0CMafS0gALD9QCNm/eQVTT3qm5a2WJ8iBACWbqhCY0tUQ49oTL7r+V5FCAC0xYDxd76gqUc0+lr4APvmNeVQ0BL1MOb7/9TUIzoj5v/fUafd+tYYZvz4FSv7e+4vX+tzDfEAjLas75R5TWmjExf7rBulQuT888/HOeecg9GjR2PMmDG45557kJWVhRUrVnTbvrm5GbW1tZ2+VDLz3hfJbS9a/HrfjTSwr7a5zwUkzvYDjahpaFXcIxoX//Y1cttZP35JXUcEqK5rQXUjTRTVNbVhX22z4h7R+NYf3yK3nXbvUoU9oUM9FABHxMgn1Y2Ke0Snr03ENvFU19SG9bvrSW1bLRrrjbsOk9tOsORgcOp/0QXGuB/+S2FP7EZbxFo0GsXjjz+O+vp6TJs2rds2ixYtQk5OTvtXaWmpsv40tkSxr4F+Glz/Sb0Vp8cLfv2GUPsrHhBrr4KWthg+3EVfzKrq2qywilx6P31DB4BzfvGyop7QaWmL4dl1h8jtD9S1oY4obFXyg2fWCbX/vCXWPupGY5N4uuGP7wu1t2Wsu8aE9EZtUxuq61r6bqiQuqY2VNL0HgCgoTVmzWFGN8qFyLp165CVlYXU1FRcf/31eOaZZzBuXPd+6QULFqCmpqb9q7KyUlm/7vjbh8K/s3jZxwp6IsZuwYn64V7zi98Dr20R/p0f/v0DBT0RY+v+BqH2+xrNx30/+Lr4WN/42EoFPRHjmdWfCLVvNj/UwhvNrB/bsaG/tXm/UHsbxtqPZffiX78mvyMCiAo+wI7DjAmUC5FjjjkGa9aswTvvvINvfvObmDt3LjZs2NBt29TUVGRnZ3f6UsWz68QWPgC4/xXxRV4mNlgJ/PDQm1uFf+fZ1bsV9IRONOZv9TVtNXv4rb7jcLryxuaDCnpCJxrz4GfUTFtybvyT2EbTBv/zSiZ9hDx1i2kX71W/796d3xsVh8z2WVTwAXYcZkygXIikpKRg1KhROPHEE7Fo0SJMnDgRv/zlL1W/bZ80top/4MRwAWXc9ay/IF/Ti8hhHxtGs2Ev2Ksf0eIVuvLQG5sl90SMg/Xin7Vpeet3rL/1x3cl90SMt7ceEP6d19bvVdATOn4PM1f97/K+Gylkwy618YIq8CP4AHcPnEHQXtUoFouhuTkx/WBBeWH9Hl+/N/d3b0vuiRguavz/es6f6PvDcnGLhEzMRzGJ858+x3r55mrJPRGjuU18Zt/xnFgsjGz8HmbW7aqT3BMxWn1ObFObepDDnw1uad0oFSILFizA66+/joqKCqxbtw4LFizAa6+9hiuuuELl2/ZJkMlp0vRe2+hvcq8XiDaXTRBTtMmx3l3T1Hejbqg6bM5dEGSsTZ7CPjnoL47JhtgFUXbWmA2g9HuYcVHgAsCdz5oRfnMffsf37z63xqxb2gRKhUhVVRWuuuoqHHPMMTj99NPx7rvv4oUXXsAXvvAFlW/bJ//p81QAAL9701yciF9Tnw8vlDSWdVO1kYrJsW71OdgmF+w3N/kf64X/WCuxJ2L4ndcmMR2f4pe6JjvS+UUIIpJf2WDGFfbvAK6kpsTzzKgVIr/73e9QUVGB5uZmVFVV4aWXXjIuQgDg1Y3+fNIA8OR76jJ5VGHyVpGfvviR7981OdYO7o24f5l/4fbCOn8n5UTlpj+vMt0FX/h1cQDmAm2DHBxrGs0Ixr6KIDKdScibrw75dHEAwK5DZtJhgywCJk/p2w8I5Dd2wdRYB8XUgv3RHv8uuDpD0cFBLQumXErv7/CfaWTS5RgK8LvLN4lngcggyMHRlBxgGSJGQgqRaNT/QmBqDQlidgfMLX5tAU
4GpsY66Ob2ZgB3VBAaWvxv6qEgO1QAgloWTMUANASYIw+/JZ7OLosgG+Rf39surR8iBDk4mkDGQcRV159fElKIBMHUgu2nUFVHTC1+kZD/h9LUWAcxBQPA4mWbJPVEjFiABdDUre9BLAuAuRgAL8BYP7/OTDBiUIH93nYzWUqhmP8TiQnjZNBDIwDc9Kf3JPTEHRJTiAQ4aUcMbY6bq/y7OABzi1+QdcDUWAcxBQPAvwO4SExhypQcxLIAgHzvkmySA6ycFQfMpMLe7TNNOo6peItwgLGOevpdpUFiteK8W2G2yKBuEk6IRGMeghj6TLkLGlqCmSe3VwcTMn4J4AUztjkGNYu2tBkKew8w1k1tZmJbglgWACBsSKwGiUX0U39EBsu3iBdg64jnmel30Gmp21UaJFYrTrMF95rpJOGESFCzWWvMTLxFAA8HgGCxGn6Jxjy0uLg5esGERMTA7hhUYANmghGDut9MjXWQeW2KwwFTd02MNQAECH0CANz/ut5qx0FiteIkG3KVmiLhhEjQWAvATLxFLKB9IGbgNLPcx10LR72Ggc0xEsQWDCAS0v9YyRjrJ97fIaEnYgRJJwWAcKA8EH8EHetUQz7HaDSgwDYwr1vaYoGvINiyT68rLCxhrfU8Q6Y+QyScENmyL7iL4oUP9ddcCAc8OppYRJ58f2fg1zARqZ8U8OQXDmq+8sETEmqurN9VI6EndGRYFhsNWCeDzuuQoSjsWEDR1hYgaNQvS3xc4tiVtiD+YR8EPMcAALwESwBOOCHSKsF/v6dWf32LoJtjxMAn/e/dwTe2oFkVfgjqCw96yvfDKgnjdLBBb/nx3/u4lbkrJlylQed1k6EYkaCGmPpWT7ur1G9J+o6kJun1c0jwzBiLRTRFwgmRFBlmUQPrSNDsAhNpbPXNwZ/I+mb9gZ9Bi3vVtcS0L9h1EsZa88ERT68KbjED9LtKg87rhlb98wMAUpOCL/e6XaV7a/3d+dQRnQaoaMyTcqWGqVhEUyScEAlWW9AMLW2xwPcPHGqKal/8ZETZ63ant7TFpCwkuhfssAR1rHus9x2WY4HR7SqVMa/NBAYH/4B1xxHFDLiDgiAjViuOycJ3ukk4IdJsKrUyADL8pIC5Es1B0G1WlTXWuhdsGadd3WPdGpVTl8KEqzQoJgKDDwa4mj7Oht3+L3NLBGTExcUxVfvJBAknRFy8i0iGnxQws/gFJaw5ZVDWWOtesGWcdnXHUAYNwG7HwWda9/xoaYuhUUJsiozU1P7MBokB36ZqP5kg4YRIpqla1gGQ4ScF3DzNpAcpYekDWWPNC3bfBA3Adhnd80OWpc9F0VejsfpudX2ztNfSHbNlkoQTIg0Sgvp0I8tP6uLmmKLZXSDNJ+3ggt2kOTguxUQqly1onh+yLH2mUo+DUN8S0xb42SYxDs/UFRcmSKiVoKUthkMSrjvXqbBlEtIcqBs00wcwUC7d0Yc/6IVmAHCwoU1rQPNhBw8FstC9ocuy9Om2UMpCV+CnTPGgO2bLJG7OKp/IMk/qVNgy0bmIRGMeDgVN9QGwp1aeqZNChqTayjo3mmjMw0EJY+1BX0BzNOahXkZ6kgEOSQj61L2hy7L06bZQyhDYAPDiej23NMsIGo+jOz7OJAklRGSZJwHgkeWSfK4UJM1HnYuIrDS2Os2iT9YY6dxoZKYM6gpoltlnnbS0xdAgIehT94YuC50WSlkCGwBqGvUcaGQeQFy1Pvkhcf5SAHsPyzFPAsA7W4PdZCmCrFO6zkVEZhqbznx6WWOkc6OROdb/lnBzKAWZfdbpKpVlVW3VHYkoaX/UWVJfplhNibgn/FwVq35IKCGSJtGBt+uQvtoFsiZkQ6s+IfLvPfLS2HSZVQF5C61O0SdzrD1Pz0Yjo/x/HJ2u0hc3yLGq6r5LRNZhJqqx/oFMsapL+Mmo1RJn1yF5B2fbSSghIlNh6jzRyNrUdAarehKDHnWZVQEgXZKPt1rigtQXMsda18lRRv
n/juiymtU2yvlcdZ/QZa19VXUt2gKapQpsDcJPVq2WOIea2pyMRfRDQgmRloDXYHdE54lmd42cjVhnRkSyxNRMnYt2oySr0aFGfRkoMsdalyVHRpn0jugq8y6r9smuGr2nXVmfa8wDVmzR45Z2TWBLq9XSAa2xiAZJKCEi0yKga3NsaYuhXtJ1rjpPMzJFny7rUzTm4ZMaOfef6FywZY61zhgAmciM/+oNWWN9qFHvaVfm5/rG5ippr9UbMgW2jjVEZjJEHJ2xiCZJKCHi2sQG5KpsnZujTNGny/okO5ND14Itc6xluab6RLKXUFe/ZY61ztNunSSXEgB8sOOQtNfqDdcs2LJqtXREZyyiSRJKiLg2sQF5wXFxXDzNpCYlSXut3nhK0rX0cXQt2FLHOlnPWFODJ6l2R10ZBjLHWtdpt6UthhoJhRzj7JdYxrw3ZIo+HWtIjBjonZkSRm4abb5qz64yhJ5VxxJcdM3ICo6Ls7ZSXgBYb8gUfSmaah1XVjdIfT1dC7bMIkq6xpoqHEIhkMqh61qwZY71bk1xIo8sr5D6erLje3pCbpyZ+nlNFdfF2alACDjU1Le1Q3d2lSkSSoi46JqRfTFYY6uemgsyRV+LppTBJsnpzboWbJnjo2usD9TRRBp1CHUt2DLHR5foW7lNruVFl4VSpujbXycn9qs3qOJaxHrnYv0TPySUa0bmpW+6Fj7qw0idrs2SAl/7Qq5rRs80bW6jzY8k4v6ha8GWuaHpuBgxGvOwt45m6aPOVl0Ltsyx1iX6ZFtedAkomeOzq7ZJeaA+NTOppS1KPsgmimsmYYRINOaholreA9ms6Z4M6sMYIn6Suia2zA1NlxChWnGoy7CLC/b26kblC/YKgdgI6hDqmtcuWkSolcKpT5kuASVzfKIx9YH61MykxrYY+SCbKK6ZhBEisjMidChsgP4whiwyYUdjHrZLFH26LpCjWnGo3dG1YNc3y4sj0pFZ9ZbAxXopREOHrgXbRYsI9X2SiaY+XQcD2eOjOlCfmrmVnhQmW0t1WVVNkzBCRHZGhA6FDQhYRCxyF6zYekDqtrC7Rk8KWyrx6EgN29Fx4pVt6QPUL9gf7DxEbpufmUpqp8s1s11iQLOuDZ06D8PEp1ZXv2UKbEB9oL5IjAj1M9FlNTNNwggR2RkRgJ5UWOpEjBB3Rx2LiMiJl8L2A+rdBQCQTkxdpVpodJx4Vdxiq3rBrjpME5YhAKlEk4iOSqUtbTFUN7jncqTPQ6L1VYOFUoXAVh2oX00MwG6JRsmfiS6rmWkSRohQMyIGpNCHREcqLHUiUjM0dCx+IideCjEAyyWLm+6gZ7nQ2uk4zci29AFAk+oy78RhHpydgjBRYOuoVCo7DVaXy5E6D6nzX0fmnYjATiUaw1ok3gPTlWjMwx5iAHYoFGKLSBcSRohQP9CRhZnkYETlCzYEzKrERU3H4teoIPPiyVWV0l+zK83EjYxqnNFxmhGx9I0qTCe1S5WY8dQd1FicgRkpOHZwDvl1VVcqlZ0Gq2P9AOjzkDyvFW7ocUQEdl5mCqmdyk1dRDgVZKaSP5Md1VxZtV9B/eBbY8BoSxZsgN7vqEWnGeqGPjSbtoAAwM6D6h9Iar+py7CO0wzV0pefkYyiAbR5rVqsUovdtUSjuPTEEvLrqq5UKjsNVsf6Ach37za0qBdQIgJ7YAZtHVF5MBARThOH5SKNWPzsQENrQtzAmzBCJIUYEZ6SFEKBJQs2ID9gS8dphjrWRTlpGDzA/Gmm/T2I/ab2RE8RJVpvhuWnk0WoarFKT5MOYfqoAvLrqq5USk2DpXp3dblmZG/AlQcblMdsiQhs6qau0i0tIpxOHVWE0rwMcvtEuIE3YYQINRAxPTnJmgU7GvOwgxiwRc3k0FGwSmSsywuzSG31iD5iQTPiSfaTGvUp3iJjTRWhqsUq1TWTHAkjEg5hRF4aqb1qsUrd0KlxLbsVXJLWHVSxSs0aa9OQMSgisKkCQ6UQoQ
qncAiYOjIfl0y2x9JnAwkjRKiBWJ7nWbNgr9h6gFxZMpmYOrZDQ8EqkbG2SfRtqqKdatKJJ7CYpz7IVmSsbQmQE904MlOTSe1Vx+RQN0fqoaCyWr1lAaCL1cwUemr/W1v2+e0OCRGBTT2kqDzMUJ+Z4XnpiITtsvTZQMIIkWZi5cXmaMwKhQ0Ay7fQNzGqnzSqoWAVNdaiuS1mleijvsOwPJrrDlAfZCsyr21JGRTthy0CSmRzpKDDsgAAW/fVkdplp9GFyE4F5RA64tphhjqnMz4Ve5FwCGX5xFhETWneJun/f+GnpBEtBmlJEWuEyE6BiOlTRtIVtur6JyJCxJaxFql9cub4YnKciOogWxfnNfVOn3g7WwSUihR51ZaFlrYY9hymxSqlpySRNwTV8U9CAtuCw4wfsTw4m+ZyZCHSnyBf4+lZYeoDgH2HaSa5pDDwg/OPI7+u+gqD9MBgWzZHkdon18woJ2dWqT6li5wcbRnruiaaPz0e1GpLv6kCO4Nalx7AJ4qFqkjtk9K8DAwnntJVpx67JrD9iGVb9hkbSBghkplG8zNnpiWTHzLVDyP19UvzMpCeEkFRFu1vVN1v1/y7AN1sWzQgBSlJYWsyq0ROjjaMdTTmkU/o8aBWGzYagC5EUpLC5Nuw6UX0/LGygu76uXRyKYpzaPOaKhT84prA9mMRsWWfsYGEESJTyvLI7aj5/arrAFAf9uJPTXwjiRkoqvstsojY8jBSx3rUp2PsWr/TkiJW9Fmk8NOQ3CObog0CChDLPhlZlElqq9rF0dBMz+aYProA+cTiYNR2flEhsKsb5JZC6Igfi4gt+4wN9P+/8FPmTi/r068firezZOET2dABexZsocBgSx5G0QXYxX5T+1LXpC6oT6Tw08nl+QDo1irVmVUilr4sYuCnaqEai9GexVGFmYiEQzhQTxNG1HZ+ERHY1PVmU1WdsiwlPxYRW9ZrG1C6Si5atAif+9znMGDAABQVFeGLX/wiNm7cqPIteyQlKYxvzCzrtc03ZpYhJSlMXhxU1wEQ2dABe07pdU20k0daUsSah1F0AXax3+Ew7XHful9dWmnlQXq2xdzpR55XG4IRAUF3gSVClbqGxIWTLWuIyFiXDqQVB1OZTu/HImLLWNuA0qdg2bJlmDdvHlasWIGlS5eitbUVZ5xxBurr61W+bY8sOGccrpt5tGUkBOC6mWVYcM44AHQ1rroOgMipALDjlC5SjwMWuWZE++FivzOIt4O1xTxlaaXUuTckJw0pSXbFiOwhHjxsiccBxNcQ0faqEDmEiRQHU5VO78ciYsN6bQv0xHEfPP/8853+vWTJEhQVFeH999/HzJkzVb51jyw4Zxy+c8ZY/OHtCmyvbsDwvAx8ZdqI9kUPAAoH0NKq4nUAZoymp86K4KJrRqQeR2ZaMjyi9USluwBwU/QBwO5DtKyLtKQIpozIx9INtNTtNzZXKZnX1Dk9vEOtFqrAqCNWxvVDNOZh+wH6WNsiVEXXEFtiRESsqtNHFSAE2h1QqtLp99U1k9px1kz3aJVaNTVH0kbz8roPHG1ubkZtbW2nLxWkJIVx7anl+M8Lx+PaU8s7iRAAKBEoWKWyDoCLrhmRehxTyvKscBcA4guwDYtIS1sMu2tprpmCrBTMnT6C/NqqUrxFaszEsWGOiFQ5LshKcdKyAIi7KFUgalWNhENG0+mjMQ87D9GESEdRbcN6bQvahEgsFsMtt9yCGTNmYPz48d22WbRoEXJyctq/SktLdXWvE9MFioOprDDoollVpB7H3OllVrgLAPEF2IZFRKRGREleBlKSwsZTvP0IERvmiEiV45K8DGssC6Jrgg3zWtSqCsBoOr1IJljHy+5sWK9tQZsQmTdvHj788EM8/vjjPbZZsGABampq2r8qK9WWx+6JqeX5VlQYdNGsKlqPY8qIfPJrq7Q+iS7ANiwiIjUiZowsBACMKhpAaq+q33586SJzRFXVYJHCYzNGFlphWQDE1w
Qb5rWI6IuXZTApoEQywS6d/NnhmvrZtBDFu8toESI33ngjnnvuObz66qsoKek5sCg1NRXZ2dmdvkwQCYesqDDoollVtB6HiLvAJuuTDaKPWiMiKRzC1JH5Qv1R1W8/2QU2uJRiRJdPcuTIWNtgWQDE14SCrFRSe2o7P4hcbRHPrDIZs0XNBIvXaolDrWuyZmeNlssRTaJUiHiehxtvvBHPPPMMXnnlFZSV9Z4+axM2VBh00awqutGlJIVRnE37HZXWp4IsWh/i7WwQffmZNDfLCaU5iHx6JazpfouU///s/827lPYTgxHLC47U47AlmFl0TaB6LlTGT1KvtuiYWWUyZov6GY4uymp/DgE7XI62oPQpmDdvHh577DH86U9/woABA7Bnzx7s2bMHjY1q71eQgemTIwA0E90cNplV/Wx0xbnmrU+iC5kNtWaofR7aYXxNi1W/N9iadilRxyPz083FhmBmQN2asIuYreUH6lgPzvnMKmNyXlNd6HkZncW0LW5pG1AqRBYvXoyamhrMnj0bxcXF7V9/+ctfVL6tFEyfHKMxD6t30rKGqj/tgw0+Rz8Lgg2nR4E7EQHQN9TtB9Rlcoj2GTAvVkXjnuKYPhi4aJ0E1I3bBwrdBX7mqMl5LepCjyPiclR9OaJplLtmuvu6+uqrVb6tFEwvJCu2HgD1Oc9IPbIp2uBz9LMg2HB63LiHFlsQN9FPLMkltVdZzZHqLujYzvSG7nfRNn0wUBX0uaeG9hn6RVXF4JaoOneBnzlqcl77FUEpSWGU5dNqVqm+HNE0/b9km09Mnxz9RI7b4HP0syCYFn3RmIePiXUL4mmlIsW+VFVz9DNupjd0v8+V6TkiOm7Uooi7apqUWih319BO0vFxGzrQfA0lP3PU5Lz2a+UDgOIcWnl61ZcjmoaFSA+YPjmKmOLikeM2+Bz9LAimRZ9I3YKSTxfqqeX5fV6iGEdVNUc/42Z6Q99/mDYWXRdt03NEdNxEiiI+snybrz71hWg1WECshpIqd4GfOWpyXvu18gHmn0dbYCHSA6ZPjtR0wZGFGe2R4zb4HP08WKZFn4j1KV4HIBIOYUwR7TSj6g4U0UwfwOyGHo152LSPNu+6pvmaniOi4yayoa/cVu2rT30hUg12QmkOgCMCm5jYpAzXYkSCvLdpgW0LLER6wLRSpfr/B3UwAdvgc/TzYJkWfVRRlhTuXAfgmME5pN9TVXPBT2yNyToRyzfvF7Y8xTE9R0SF0NTyfISJG3qDojtyRK5bOHVUEYAjAvuEYblK+kPFj8A2Oa+DuGZsqNtiAyxEesC0UvUrhEz7HP0sIqZFH3UhmVSa26kOAO2aLZF2OjDXZ78VKAHzc0RUCEXCIUwqoRVkzFNkxaFet9Cx4B3QOd3bDFSTjPizuIsYMyNCENeMDXVbbICFSA+4ZgqOY3rB9rOImBZ9VIZ0WaCpYs6mQDOTffZbgRIwP0d8paUTU7xVWXGo1y2U5qV3Edhm8ZMNdqCemDFYKT9j0PTc7A+wEOkB06ZgP5YFwPxD4WcRcdU8aV70iWOyEJvfCpSA+TniYmAwtc/F2TR3ri78jJvJjMEgh9Ygbp3+BAuRHjBfOdOPedL8gi2aLngEsy4OP4XBAPOiz0+/qYXYKqsbpZ8cqQv2mKKsbr5rdo64FhjctS8y2unCz7iZzBgMcmh10aqqAhYiPWBywQb8WRYAsz5HP+mCgPmH0e9YmxZ9fvpNLcSm4uTo8oJNL2n+2YNl2r1rQ6FAP/gRUCYzBv0dvnr+HvV3+xMsRHrA5IIN+J+gJk19ftIFAfPWJ79jbTrQzE+/RQqxyT45Bll0Tc6RaMzDKuKtvgc6xS2YvxjRRXbV0D7DjgLKVMag38NXb9+j/m5/goVID4gs2G9srpL+/n4nqMmTo0g9jni6IGDe+uTqYlDXRAtG7NjvqeX5oP4VO6tpwaVUgoyzyTni57oFwN3TrsnDTDTmYc2OQ75+10TGoN/DVxzTVlVbYC
c2ALyHH364vU1jY6N3ww03eAMHDvQyMjK8iy66yNu9e7e5Tmvkq1/9qjd8+HAvJSXFKyws9E4//fR2EeJ5iT02vdFViCTyOH35y1/2iouLvZSUFG/o0KHel7/8ZW/z5s3tP0/ksenIs88+640fP95LTU31xo4d6z344IOdfu7yWh3yPM8zY4thGIZhGCbRSagYEYZhGIZh7IKFCMMwDMMwxmAhwjAMwzCMMViIMAzDMAxjDBYiDMMwDMMYg4UIwzAMwzDGYCHCMAzDMIwxWIgwDMMwDGMMFiIMwzAMwxiDhQjDMAzDMMZgIcIwDMMwjDH+P9jHwU1JkL0cAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from myst_nb import glue\n", "\n", "n_cycles = 10\n", "n_samples = 10000\n", "amplitude = 3\n", "phase = np.pi / 4\n", "end = 2 * np.pi * n_cycles\n", "x = np.linspace(0, end, num=n_samples)\n", "y = amplitude * np.sin(x + phase)\n", "\n", "chosen_idx = np.random.choice(n_samples, size=100, replace=False)\n", "data = pd.DataFrame(np.nan, index=x, columns=['raw'])\n", "data.iloc[chosen_idx, 0] = y[chosen_idx]\n", "\n", "# plotting\n", "fig1, ax1 = plt.subplots()\n", "ax1.set_title('Raw Data')\n", "data.raw.plot(marker='o', ax=ax1)\n", "data['lin_inter'] = data.raw.interpolate(method='index')\n", "fig2, ax2 = plt.subplots()\n", "ax2.set_title('Linear Interpolation')\n", "data.lin_inter.plot(marker='o', ax=ax2)\n", "data['quad_inter'] = data.raw.interpolate(method='quadratic')\n", "fig3, ax3 = plt.subplots()\n", "ax3.set_title('Quadratic Interpolation')\n", "data.quad_inter.plot(marker='o', ax=ax3)\n", "\n", "glue(\"fig1\", fig1, display=False)\n", "glue(\"fig2\", fig2, display=False)\n", "glue(\"fig3\", fig3, display=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`````{admonition} Exercise: Missing Data\n", "* Create a vector of 10000 measurements from a 10-cycle sinus wave. 
Remember that a single period of sine starts at 0 and ends at 2$\\pi$, so 10 periods span the range from 0 to 20$\\pi$.\n", "````{dropdown} Solution\n", "```python\n", "n_cycles = 10\n", "n_samples = 10000\n", "amplitude = 3\n", "phase = np.pi / 4\n", "end = 2 * np.pi * n_cycles\n", "x = np.linspace(0, end, num=n_samples)\n", "y = amplitude * np.sin(x + phase)\n", "```\n", "````\n", "* Using `np.random.choice` with `replace=False`, sample 100 points from the wave and place them in a Series or a single-column DataFrame.\n", "````{dropdown} Solution\n", "```python\n", "chosen_idx = np.random.choice(n_samples, size=100, replace=False)\n", "data = pd.DataFrame(np.nan, index=x, columns=['raw'])\n", "data.iloc[chosen_idx, 0] = y[chosen_idx]\n", "```\n", "````\n", "* Plot the chosen points.\n", "````{dropdown} Solution\n", "```python\n", "fig1, ax1 = plt.subplots()\n", "ax1.set_title('Raw data pre-interpolation')\n", "data.raw.plot(marker='o', ax=ax1)\n", "```\n", "```{glue:figure} fig1\n", " :figwidth: 500px\n", "```\n", "````\n", "* Interpolate the points using linear interpolation and plot them on a different graph.\n", "````{dropdown} Solution\n", "```python\n", "data['lin_inter'] = data.raw.interpolate(method='index')\n", "fig2, ax2 = plt.subplots()\n", "ax2.set_title('Linear interpolation')\n", "data.lin_inter.plot(marker='o', ax=ax2)\n", "```\n", "```{glue:figure} fig2\n", " :figwidth: 500px\n", "```\n", "````\n", "* Interpolate the points using quadratic interpolation and plot them on a different graph.\n", "````{dropdown} Solution\n", "```python\n", "data['quad_inter'] = data.raw.interpolate(method='quadratic')\n", "fig3, ax3 = plt.subplots()\n", "ax3.set_title('Quadratic interpolation')\n", "data.quad_inter.plot(marker='o', ax=ax3)\n", "```\n", "```{glue:figure} fig3\n", " :figwidth: 500px\n", "```\n", "````\n", "`````" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Categorical Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far, we've used examples with quantitative data. Let's now have a look at [categorical data](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html), i.e. data that can only take one of a fixed set of values, called categories. For example, if we have a column which marks the weekday, then it can obviously only be one of seven options. The same goes for boolean data, colors, and similar cases. These data columns should be marked as \"categorical\" to reduce memory consumption and improve performance. It also tells readers of the code more about the nature of that data column." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The easiest way to create a categorical variable is to declare it as such, or to convert an existing column to a categorical data type:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 a\n", "1 b\n", "2 c\n", "3 a\n", "dtype: category\n", "Categories (3, object): ['a', 'b', 'c']" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series([\"a\", \"b\", \"c\", \"a\"], dtype=\"category\")\n", "s" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DataFrame:\n", " A B\n", "0 a a\n", "1 b b\n", "2 c c\n", "3 a a\n", "\n", "Data types:\n", "A object\n", "B category\n", "dtype: object\n" ] } ], "source": [ "df = pd.DataFrame({\"A\": [\"a\", \"b\", \"c\", \"a\"]})\n", "df[\"B\"] = df[\"A\"].astype(\"category\")\n", "print(f\"DataFrame:\\n{df}\")\n", "print(f\"\\nData types:\\n{df.dtypes}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also impose an order on our categories, or restrict our data to a specific set of categories, using the special `CategoricalDtype` (which we won't show here).\n", "\n", "As we said, memory usage is reduced when working with categorical data:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b\n", "0 0.066624 a\n", "1 0.854047 a\n", "2 0.544826 a\n", "3 0.464085 a\n", "4 0.405314 a\n", "... ... ..\n", "9995 0.823350 a\n", "9996 0.744166 a\n", "9997 0.059290 a\n", "9998 0.256323 a\n", "9999 0.298485 a\n", "\n", "[10000 rows x 2 columns]" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_obj = pd.DataFrame({'a': np.random.random(10_000), 'b': ['a'] * 10_000})\n", "df_obj" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b\n", "0 0.066624 a\n", "1 0.854047 a\n", "2 0.544826 a\n", "3 0.464085 a\n", "4 0.405314 a\n", "... ... ..\n", "9995 0.823350 a\n", "9996 0.744166 a\n", "9997 0.059290 a\n", "9998 0.256323 a\n", "9999 0.298485 a\n", "\n", "[10000 rows x 2 columns]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_cat = pd.DataFrame({'a': df_obj['a'], 'b': df_obj['b'].astype('category')})\n", "df_cat" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index 128\n", "a 80000\n", "b 80000\n", "dtype: int64" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_obj.memory_usage()" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Index 128\n", "a 80000\n", "b 10116\n", "dtype: int64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_cat.memory_usage()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A factor of 8 in memory reduction." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hierarchical Indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last time we mentioned that while a DataFrame is inherently a 2D object, it can contain multi-dimensional data. The way a DataFrame (and a Series) does that is with [hierarchical indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html), or sometimes Multi-Indexing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simple Example: Temperature in a Grid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, our data is the temperature sampled across a 2-dimensional grid. First, we need to generate the required set of indices, $(x, y)$, which point to a specific location inside the square. 
These coordinates can then be assigned the designated temperature values. A list of such coordinates can be a simple `Series`:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(r0, c0) 1.20\n", "(r0, c1) 0.80\n", "(r0, c2) 3.10\n", "(r1, c0) 0.10\n", "(r1, c1) 0.05\n", "(r1, c2) 1.00\n", "(r2, c0) 1.40\n", "(r2, c1) 2.10\n", "(r2, c2) 2.90\n", "Name: temperature, dtype: float64" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "values = np.array([1.2, 0.8, 3.1, 0.1, 0.05, 1, 1.4, 2.1, 2.9])\n", "coords = [('r0', 'c0'), ('r0', 'c1'), ('r0', 'c2'), \n", " ('r1', 'c0'), ('r1', 'c1'), ('r1', 'c2'), \n", " ('r2', 'c0'), ('r2', 'c1'), ('r2', 'c2')] # r is row, c is column\n", "points = pd.Series(values, index=coords, name='temperature')\n", "points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important we understand that this is a series because _the data is one-dimensional_. The actual data is contained in that rightmost column, a one-dimensional array. We do have two coordinates for each point, but the data itself, the temperature, is one-dimensional.\n", "\n", "Currently, the index is a simple tuple of coordinates. It's a single column, containing tuples. Pandas can help us to index this data in a more intuitive manner, using a MultiIndex object." 
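, "\n", "\n",
"To see why a flat index of tuples is awkward, consider this minimal sketch on a hypothetical 2x2 grid (`pts`, `labels` and the values are made up for illustration). Selecting one row of the grid means inspecting every tuple by hand:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# hypothetical 2x2 grid, indexed by plain (row, column) tuples\n",
"labels = [('r0', 'c0'), ('r0', 'c1'), ('r1', 'c0'), ('r1', 'c1')]\n",
"pts = pd.Series([1.2, 0.8, 0.1, 0.05], index=labels)\n",
"\n",
"# pandas sees a single index level whose labels happen to be tuples\n",
"print(pts.index.nlevels)\n",
"\n",
"# so picking row 'r0' requires a hand-written boolean mask\n",
"in_r0 = [label[0] == 'r0' for label in pts.index]\n",
"r0 = pts[in_r0]\n",
"print(r0)\n",
"```"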
] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex([('r0', 'c0'),\n", " ('r0', 'c1'),\n", " ('r0', 'c2'),\n", " ('r1', 'c0'),\n", " ('r1', 'c1'),\n", " ('r1', 'c2'),\n", " ('r2', 'c0'),\n", " ('r2', 'c1'),\n", " ('r2', 'c2')],\n", " )" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mindex = pd.MultiIndex.from_tuples(coords)\n", "mindex" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We received something which looks quite similar to the list of tuples we had before, but it's a [`MultiIndex`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.MultiIndex.html) instance. Let's see how it helps us by `reindex`ing our data with it:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "r0 c0 1.20\n", " c1 0.80\n", " c2 3.10\n", "r1 c0 0.10\n", " c1 0.05\n", " c2 1.00\n", "r2 c0 1.40\n", " c1 2.10\n", " c2 2.90\n", "Name: temperature, dtype: float64" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "points = points.reindex(mindex)\n", "points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks good. Each index level is represented by a column, with the data being the last one. The \"missing\" values indicate that the value in that cell is the same as the value above it.\n", "\n", "You might have assumed that accessing the data now is much more intuitive. 
Let's look at the values of all the points in the first row, `r0`:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "c0 1.2\n", "c1 0.8\n", "c2 3.1\n", "Name: temperature, dtype: float64" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "points.loc['r0', :] # .loc is label-based indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or the values of points in the second column:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "r0 0.80\n", "r1 0.05\n", "r2 2.10\n", "Name: temperature, dtype: float64" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "points.loc[:, 'c1']" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "r0 c0 1.20\n", " c1 0.80\n", " c2 3.10\n", "r1 c0 0.10\n", " c1 0.05\n", " c2 1.00\n", "r2 c0 1.40\n", " c1 2.10\n", " c2 2.90\n", "Name: temperature, dtype: float64" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "points.loc[:, :] # all values - each level of the index has its own colon (:)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `.iloc` disregards the MultiIndex, treating our data as a simple one-dimensional vector (as it actually is):" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.4" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "points.iloc[6]\n", "# points.iloc[0, 1] # raises an error: a Series has only one positional axis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides making the syntax cleaner, these slicing operations are as efficient as their single-dimension counterparts."
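, "\n", "\n",
"The three access patterns above can be summarized in one minimal sketch. The `grid` series below is a hypothetical, smaller stand-in for `points`, so that the snippet runs on its own:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# hypothetical 2x2 grid with a two-level MultiIndex\n",
"coords = [('r0', 'c0'), ('r0', 'c1'), ('r1', 'c0'), ('r1', 'c1')]\n",
"grid = pd.Series([1.2, 0.8, 0.1, 0.05],\n",
"                 index=pd.MultiIndex.from_tuples(coords),\n",
"                 name='temperature')\n",
"\n",
"# a complete key (one label per level) returns a single value\n",
"single = grid.loc[('r1', 'c0')]\n",
"# a partial key selects everything under that label and drops the level\n",
"row = grid.loc['r0']\n",
"# positional access ignores the levels altogether\n",
"third = grid.iloc[2]\n",
"```\n",
"\n",
"Here `single` and `third` refer to the same cell, once by label and once by position."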
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It should be clear that a MultiIndex can have more than two levels. Modelling a 3D cube (with the temperatures inside it) is as easy as:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "r0 c0 z0 1.20\n", " z1 0.80\n", " c1 z0 3.10\n", " z1 0.10\n", "r1 c0 z0 0.05\n", " z1 1.00\n", " c1 z0 1.40\n", " z1 2.10\n", "r2 c0 z0 2.90\n", " z1 0.30\n", " c1 z0 2.40\n", " z1 1.90\n", "Name: temp_cube, dtype: float64" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "values3d = np.array([1.2, 0.8, \n", " 3.1, 0.1, \n", " 0.05, 1, \n", " 1.4, 2.1, \n", " 2.9, 0.3,\n", " 2.4, 1.9])\n", "# 3D coordinates with a shape of (r, c, z) = (3, 2, 2)\n", "coords3d = [('r0', 'c0', 'z0'), ('r0', 'c0', 'z1'), \n", " ('r0', 'c1', 'z0'), ('r0', 'c1', 'z1'),\n", " ('r1', 'c0', 'z0'), ('r1', 'c0', 'z1'),\n", " ('r1', 'c1', 'z0'), ('r1', 'c1', 'z1'), \n", " ('r2', 'c0', 'z0'), ('r2', 'c0', 'z1'),\n", " ('r2', 'c1', 'z0'), ('r2', 'c1', 'z1')] # we'll soon see an easier way to create this index\n", "cube = pd.Series(values3d, index=pd.MultiIndex.from_tuples(coords3d), name='temp_cube')\n", "cube" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can even name the individual levels, which helps with some slicing operations we'll see below:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "x y z \n", "r0 c0 z0 1.20\n", " z1 0.80\n", " c1 z0 3.10\n", " z1 0.10\n", "r1 c0 z0 0.05\n", " z1 1.00\n", " c1 z0 1.40\n", " z1 2.10\n", "r2 c0 z0 2.90\n", " z1 0.30\n", " c1 z0 2.40\n", " z1 1.90\n", "Name: temp_cube, dtype: float64" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cube.index.names = ['x', 'y', 'z']\n", "cube" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, you have to remember that this is 
one-dimensional data, with a three-dimensional index. In statistical terms, we might call the index levels fixed, independent categorical variables, while the values are the dependent variable. Pandas actually has a [`CategoricalIndex`](https://pandas.pydata.org/docs/reference/api/pandas.CategoricalIndex.html) object, which you'll meet in one of your future homework assignments (but don't be afraid to follow the link and check it out on your own if you just can't wait)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More on extra dimensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous square example, it's very tempting to ditch the MultiIndex altogether and just work with a dataframe, or even a simple NumPy array. This is because the two index levels represent rows and columns. A quick way to turn one representation into the other is the pair of [`stack()`/`unstack()`](https://pandas.pydata.org/docs/user_guide/reshaping.html) methods:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "rows columns\n", "r0 c0 1.20\n", " c1 0.80\n", " c2 3.10\n", "r1 c0 0.10\n", " c1 0.05\n", " c2 1.00\n", "r2 c0 1.40\n", " c1 2.10\n", " c2 2.90\n", "Name: temperature, dtype: float64" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "points.index.names = ['rows', 'columns']\n", "points" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "columns c0 c1 c2\n", "rows \n", "r0 1.2 0.80 3.1\n", "r1 0.1 0.05 1.0\n", "r2 1.4 2.10 2.9" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pts_df = points.unstack()\n", "pts_df" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "rows columns\n", "r0 c0 1.20\n", " c1 0.80\n", " c2 3.10\n", "r1 c0 0.10\n", " c1 0.05\n", " c2 1.00\n", "r2 c0 1.40\n", " c1 2.10\n", " c2 2.90\n", "dtype: float64" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pts_df.stack() # back to a series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to turn the indices into \"real\" columns, we can use the `reset_index()` method:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " rows columns temperature\n", "0 r0 c0 1.20\n", "1 r0 c1 0.80\n", "2 r0 c2 3.10\n", "3 r1 c0 0.10\n", "4 r1 c1 0.05\n", "5 r1 c2 1.00\n", "6 r2 c0 1.40\n", "7 r2 c1 2.10\n", "8 r2 c2 2.90" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pts_df_reset = points.reset_index()\n", "pts_df_reset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So why bother with these (you haven't seen nothing yet) complicated multi-indices?\n", "\n", "As you might have guessed, adding data points, i.e. increasing the dimensionality of the data, is very easy and intuitive. Data remains aligned through addition and deletion of data. Moreover, treating these categorical variables as an index can help the mental modeling of the problem, especially when you wish to perform statistical modeling with your analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Constructing a MultiIndex" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Creating a hierarchical index can be done in several ways:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex([('a', 1),\n", " ('a', 2),\n", " ('b', 1),\n", " ('b', 2)],\n", " )" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex([('a', 1),\n", " ('a', 2),\n", " ('b', 1),\n", " ('b', 2)],\n", " )" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex([('a', 1),\n", " ('a', 2),\n", " ('b', 1),\n", " ('b', 2)],\n", " )" ] }, 
"execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.MultiIndex.from_product([['a', 'b'], [1, 2]]) # Cartesian product" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most common way to construct a MultiIndex, though, is to build it from existing columns of the dataframe. We'll see how it's done below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another important note is that with dataframes, the row index and the column index are symmetric. In effect, this means that the columns can also be labelled by a MultiIndex:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "subject Bob Guido Sue \n", "type HR Temp HR Temp HR Temp\n", "year visit \n", "2013 1 21.0 38.1 43.0 36.9 39.0 36.7\n", " 2 18.0 39.1 28.0 38.5 30.0 36.5\n", "2014 1 27.0 36.1 39.0 39.1 28.0 35.2\n", " 2 46.0 37.9 27.0 37.2 50.0 36.7" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],\n", " names=['year', 'visit'])\n", "columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],\n", " names=['subject', 'type'])\n", "\n", "# mock some data\n", "data = np.round(np.random.randn(4, 6), 1)\n", "data[:, ::2] *= 10\n", "data += 37\n", "\n", "# create the DataFrame\n", "health_data = pd.DataFrame(data, index=index, columns=columns)\n", "health_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This sometimes might seem too much, and so usually people prefer to keep the column index as a simple list of names, moving any nestedness to the row index. This is due to the fact that usually columns represent the measured dependent variable." ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " HR Temp\n", "year visit subject \n", "2013 1 Bob 32.0 36.7\n", " Guido 50.0 35.9\n", " Sue 60.0 37.5\n", " 2 Bob 32.0 36.8\n", " Guido 49.0 36.0\n", " Sue 18.0 38.3\n", "2014 1 Bob 11.0 38.0\n", " Guido 24.0 36.4\n", " Sue 26.0 36.4\n", " 2 Bob 32.0 37.9\n", " Guido 54.0 37.6\n", " Sue 48.0 36.7" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "index = pd.MultiIndex.from_product([[2013, 2014], [1, 2], ['Bob', 'Guido', 'Sue']],\n", " names=['year', 'visit', 'subject'])\n", "columns = ['HR', 'Temp']\n", "\n", "# mock some data\n", "data = np.round(np.random.randn(12, 2), 1)\n", "data[:, ::2] *= 10\n", "data += 37\n", "\n", "# create the DataFrame\n", "health_data_row = pd.DataFrame(data, index=index, columns=columns)\n", "health_data_row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Creating a MultiIndex from a data column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While all of the above methods work, and could be useful sometimes, the most common method of creating an index is from an existing data column. " ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " location day temp humidity\n", "0 AL SUN 12.3 31\n", "1 AL SUN 14.1 45\n", "2 NY TUE 21.3 41\n", "3 NY WED 20.9 41\n", "4 NY SAT 18.8 49\n", "5 VA SAT 16.5 52" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "location = ['AL', 'AL', 'NY', 'NY', 'NY', 'VA']\n", "day = ['SUN', 'SUN', 'TUE', 'WED', 'SAT', 'SAT']\n", "temp = [12.3, 14.1, 21.3, 20.9, 18.8, 16.5]\n", "humidity = [31, 45, 41, 41, 49, 52]\n", "states = pd.DataFrame(dict(location=location, day=day, \n", " temp=temp, humidity=humidity))\n", "states" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " location temp humidity\n", "day \n", "SUN AL 12.3 31\n", "SUN AL 14.1 45\n", "TUE NY 21.3 41\n", "WED NY 20.9 41\n", "SAT NY 18.8 49\n", "SAT VA 16.5 52" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states.set_index(['day'])" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " temp humidity\n", "day location \n", "SUN AL 12.3 31\n", " AL 14.1 45\n", "TUE NY 21.3 41\n", "WED NY 20.9 41\n", "SAT NY 18.8 49\n", " VA 16.5 52" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states.set_index(['day', 'location'])" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " temp humidity\n", " day location \n", "0 SUN AL 12.3 31\n", "1 SUN AL 14.1 45\n", "2 TUE NY 21.3 41\n", "3 WED NY 20.9 41\n", "4 SAT NY 18.8 49\n", "5 SAT VA 16.5 52" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states.set_index(['day', 'location'], append=True)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " location temp humidity\n", " day \n", "i SUN AL 12.3 31\n", "ii SUN AL 14.1 45\n", "iii TUE NY 21.3 41\n", "iv WED NY 20.9 41\n", "v SAT NY 18.8 49\n", "vi SAT VA 16.5 52" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states.set_index([['i', 'ii', 'iii', 'iv', 'v', 'vi'], 'day'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexing and Slicing a MultiIndex" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use these dataframes as an example:" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "subject Bob Guido Sue \n", "type HR Temp HR Temp HR Temp\n", "year visit \n", "2013 1 21.0 38.1 43.0 36.9 39.0 36.7\n", " 2 18.0 39.1 28.0 38.5 30.0 36.5\n", "2014 1 27.0 36.1 39.0 39.1 28.0 35.2\n", " 2 46.0 37.9 27.0 37.2 50.0 36.7" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "health_data" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " HR Temp\n", "year visit subject \n", "2013 1 Bob 32.0 36.7\n", " Guido 50.0 35.9\n", " Sue 60.0 37.5\n", " 2 Bob 32.0 36.8\n", " Guido 49.0 36.0\n", " Sue 18.0 38.3\n", "2014 1 Bob 11.0 38.0\n", " Guido 24.0 36.4\n", " Sue 26.0 36.4\n", " 2 Bob 32.0 37.9\n", " Guido 54.0 37.6\n", " Sue 48.0 36.7" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "health_data_row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If all we wish to do is to examine a column, indexing is very easy. Don't forget the dataframe as dictionary analogy:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "type HR Temp\n", "year visit \n", "2013 1 43.0 36.9\n", " 2 28.0 38.5\n", "2014 1 39.0 39.1\n", " 2 27.0 37.2" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "health_data['Guido'] # works for the column MultiIndex as expected" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "year visit subject\n", "2013 1 Bob 32.0\n", " Guido 50.0\n", " Sue 60.0\n", " 2 Bob 32.0\n", " Guido 49.0\n", " Sue 18.0\n", "2014 1 Bob 11.0\n", " Guido 24.0\n", " Sue 26.0\n", " 2 Bob 32.0\n", " Guido 54.0\n", " Sue 48.0\n", "Name: HR, dtype: float64" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "health_data_row['HR'] # that's a Series!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Accessing a single row is also pretty straightforward:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "HR 50.0\n", "Temp 35.9\n", "Name: (2013, 1, Guido), dtype: float64" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "health_data_row.loc[2013, 1, 'Guido'] # index triplet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can even slice easily using the first level of the `MultiIndex` (`year` in our case):" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " HR Temp\n", "year visit subject \n", "2013 1 Bob 32.0 36.7\n", " Guido 50.0 35.9\n", " Sue 60.0 37.5\n", " 2 Bob 32.0 36.8\n", " Guido 49.0 36.0\n", " Sue 18.0 38.3\n", "2014 1 Bob 11.0 38.0\n", " Guido 24.0 36.4\n", " Sue 26.0 36.4\n", " 2 Bob 32.0 37.9\n", " Guido 54.0 37.6\n", " Sue 48.0 36.7" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "health_data_row.loc[2013:2017] # 2017 isn't in the index, but pandas label slices on a sorted index accept out-of-range bounds\n", "# health_data_row.loc[1] # KeyError: 1 is not a label of the first level (year)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slicing is a bit more difficult when we want to take into account all available indices. This is due to the possible conflicts between the different indices and the columns.\n", "\n", "Suppose we want to look at all the years and all the visits, but only for Bob. We would like to write something like this:" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "scrolled": true }, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (514763098.py, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m Cell \u001b[0;32mIn[89], line 1\u001b[0;36m\u001b[0m\n\u001b[0;31m health_data_row.loc[(:, :, 'Bob'), :] # doesn't work\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "health_data_row.loc[(:, :, 'Bob'), :] # doesn't work" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A bare colon is only valid Python syntax directly inside square brackets, not inside a tuple, hence the `SyntaxError`. This pickle can be solved in two ways:\n", "\n", "The first option is the [`slice`](https://www.programiz.com/python-programming/methods/built-in/slice) object:\n" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "year visit subject\n", "2013 1 Bob 32.0\n", " 2 Bob 32.0\n", "2014 1 Bob 11.0\n", " 2 Bob 32.0\n", "Name: HR, dtype: float64" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result"
} ], "source": [ "bobs_data = (slice(None), slice(None), 'Bob') # all years, all visits, of Bob\n", "health_data_row.loc[bobs_data, 'HR']\n", "# arr[slice(None), 1] is the same as arr[:, 1]" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "year visit subject\n", "2013 1 Bob 32.0\n", " Guido 50.0\n", " 2 Bob 32.0\n", " Guido 49.0\n", "2014 1 Bob 11.0\n", " Guido 24.0\n", " 2 Bob 32.0\n", " Guido 54.0\n", "Name: HR, dtype: float64" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row_idx = (slice(None), slice(None), slice('Bob', 'Guido')) # all years, all visits, Bob + Guido\n", "health_data_row.loc[row_idx, 'HR']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another option is the [`IndexSlice`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.IndexSlice.html) object:" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
HRTemp
yearvisitsubject
20131Bob32.036.7
2Bob32.036.8
20141Bob11.038.0
2Bob32.037.9
\n", "
" ], "text/plain": [ " HR Temp\n", "year visit subject \n", "2013 1 Bob 32.0 36.7\n", " 2 Bob 32.0 36.8\n", "2014 1 Bob 11.0 38.0\n", " 2 Bob 32.0 37.9" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "idx = pd.IndexSlice\n", "health_data_row.loc[idx[:, :, 'Bob'], :] # very close to the naive implementation" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "year visit subject\n", "2013 1 Bob 36.7\n", " Guido 35.9\n", "2014 1 Bob 38.0\n", " Guido 36.4\n", "Name: Temp, dtype: float64" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx2 = pd.IndexSlice\n", "health_data_row.loc[idx2[2013:2015, 1, 'Bob':'Guido'], 'Temp']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, there's one more way to index into a `MultiIndex` which is very straight-forward and explicit; the [cross-section](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.xs.html)." ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
HRTemp
subject
Bob32.036.7
Guido50.035.9
Sue60.037.5
\n", "
" ], "text/plain": [ " HR Temp\n", "subject \n", "Bob 32.0 36.7\n", "Guido 50.0 35.9\n", "Sue 60.0 37.5" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "health_data_row.xs(key=(2013, 1), level=('year', 'visit'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Small caveat: unsorted indices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Having an unsorted index in your `MultiIndex` might make the interpreter pop a few exceptions at you:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "char int\n", "a 1 0.306670\n", " 2 0.989591\n", "c 1 0.793785\n", " 2 0.844271\n", "b 1 0.847586\n", " 2 0.780704\n", "dtype: float64" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# char index in unsorted\n", "index = pd.MultiIndex.from_product([['a', 'c', 'b'], [1, 2]])\n", "data = pd.Series(np.random.rand(6), index=index)\n", "data.index.names = ['char', 'int']\n", "data" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "ename": "UnsortedIndexError", "evalue": "'Key length (1) was greater than MultiIndex lexsort depth (0)'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mUnsortedIndexError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[96], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mdata\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43ma\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m:\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mb\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m]\u001b[49m\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/series.py:1146\u001b[0m, in 
\u001b[0;36mSeries.__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1142\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_get_values_tuple(key)\n\u001b[1;32m 1144\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(key, \u001b[38;5;28mslice\u001b[39m):\n\u001b[1;32m 1145\u001b[0m \u001b[38;5;66;03m# Do slice check before somewhat-costly is_bool_indexer\u001b[39;00m\n\u001b[0;32m-> 1146\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_getitem_slice\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1148\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m com\u001b[38;5;241m.\u001b[39mis_bool_indexer(key):\n\u001b[1;32m 1149\u001b[0m key \u001b[38;5;241m=\u001b[39m check_bool_indexer(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mindex, key)\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/generic.py:4349\u001b[0m, in \u001b[0;36mNDFrame._getitem_slice\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 4344\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 4345\u001b[0m \u001b[38;5;124;03m__getitem__ for the case where the key is a slice object.\u001b[39;00m\n\u001b[1;32m 4346\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 4347\u001b[0m \u001b[38;5;66;03m# _convert_slice_indexer to determine if this slice is positional\u001b[39;00m\n\u001b[1;32m 4348\u001b[0m \u001b[38;5;66;03m# or label based, and if the latter, convert to positional\u001b[39;00m\n\u001b[0;32m-> 4349\u001b[0m slobj \u001b[38;5;241m=\u001b[39m 
\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mindex\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_convert_slice_indexer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mkey\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mkind\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mgetitem\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4350\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(slobj, np\u001b[38;5;241m.\u001b[39mndarray):\n\u001b[1;32m 4351\u001b[0m \u001b[38;5;66;03m# reachable with DatetimeIndex\u001b[39;00m\n\u001b[1;32m 4352\u001b[0m indexer \u001b[38;5;241m=\u001b[39m lib\u001b[38;5;241m.\u001b[39mmaybe_indices_to_slice(\n\u001b[1;32m 4353\u001b[0m slobj\u001b[38;5;241m.\u001b[39mastype(np\u001b[38;5;241m.\u001b[39mintp, copy\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mFalse\u001b[39;00m), \u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m)\n\u001b[1;32m 4354\u001b[0m )\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/indexes/base.py:4281\u001b[0m, in \u001b[0;36mIndex._convert_slice_indexer\u001b[0;34m(self, key, kind)\u001b[0m\n\u001b[1;32m 4279\u001b[0m indexer \u001b[38;5;241m=\u001b[39m key\n\u001b[1;32m 4280\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m-> 4281\u001b[0m indexer \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mslice_indexer\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstart\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstop\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstep\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 4283\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m indexer\n", "File 
\u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/indexes/base.py:6662\u001b[0m, in \u001b[0;36mIndex.slice_indexer\u001b[0;34m(self, start, end, step)\u001b[0m\n\u001b[1;32m 6618\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mslice_indexer\u001b[39m(\n\u001b[1;32m 6619\u001b[0m \u001b[38;5;28mself\u001b[39m,\n\u001b[1;32m 6620\u001b[0m start: Hashable \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 6621\u001b[0m end: Hashable \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 6622\u001b[0m step: \u001b[38;5;28mint\u001b[39m \u001b[38;5;241m|\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m,\n\u001b[1;32m 6623\u001b[0m ) \u001b[38;5;241m-\u001b[39m\u001b[38;5;241m>\u001b[39m \u001b[38;5;28mslice\u001b[39m:\n\u001b[1;32m 6624\u001b[0m \u001b[38;5;250m \u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 6625\u001b[0m \u001b[38;5;124;03m Compute the slice indexer for input labels and step.\u001b[39;00m\n\u001b[1;32m 6626\u001b[0m \n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 6660\u001b[0m \u001b[38;5;124;03m slice(1, 3, None)\u001b[39;00m\n\u001b[1;32m 6661\u001b[0m \u001b[38;5;124;03m \"\"\"\u001b[39;00m\n\u001b[0;32m-> 6662\u001b[0m start_slice, end_slice \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mslice_locs\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstart\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mend\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstep\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mstep\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 6664\u001b[0m \u001b[38;5;66;03m# return a slice\u001b[39;00m\n\u001b[1;32m 6665\u001b[0m 
\u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m is_scalar(start_slice):\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2904\u001b[0m, in \u001b[0;36mMultiIndex.slice_locs\u001b[0;34m(self, start, end, step)\u001b[0m\n\u001b[1;32m 2852\u001b[0m \u001b[38;5;250m\u001b[39m\u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 2853\u001b[0m \u001b[38;5;124;03mFor an ordered MultiIndex, compute the slice locations for input\u001b[39;00m\n\u001b[1;32m 2854\u001b[0m \u001b[38;5;124;03mlabels.\u001b[39;00m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 2900\u001b[0m \u001b[38;5;124;03m sequence of such.\u001b[39;00m\n\u001b[1;32m 2901\u001b[0m \u001b[38;5;124;03m\"\"\"\u001b[39;00m\n\u001b[1;32m 2902\u001b[0m \u001b[38;5;66;03m# This function adds nothing to its parent implementation (the magic\u001b[39;00m\n\u001b[1;32m 2903\u001b[0m \u001b[38;5;66;03m# happens in get_slice_bound method), but it adds meaningful doc.\u001b[39;00m\n\u001b[0;32m-> 2904\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43msuper\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mslice_locs\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstart\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mend\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mstep\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/indexes/base.py:6879\u001b[0m, in \u001b[0;36mIndex.slice_locs\u001b[0;34m(self, start, end, step)\u001b[0m\n\u001b[1;32m 6877\u001b[0m start_slice \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 6878\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m start \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m-> 
6879\u001b[0m start_slice \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget_slice_bound\u001b[49m\u001b[43m(\u001b[49m\u001b[43mstart\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mleft\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m\n\u001b[1;32m 6880\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m start_slice \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[1;32m 6881\u001b[0m start_slice \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2848\u001b[0m, in \u001b[0;36mMultiIndex.get_slice_bound\u001b[0;34m(self, label, side)\u001b[0m\n\u001b[1;32m 2846\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(label, \u001b[38;5;28mtuple\u001b[39m):\n\u001b[1;32m 2847\u001b[0m label \u001b[38;5;241m=\u001b[39m (label,)\n\u001b[0;32m-> 2848\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_partial_tup_index\u001b[49m\u001b[43m(\u001b[49m\u001b[43mlabel\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mside\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mside\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[0;32m~/Projects/courses/python_for_neuroscientists/textbook-public/venv/lib/python3.10/site-packages/pandas/core/indexes/multi.py:2908\u001b[0m, in \u001b[0;36mMultiIndex._partial_tup_index\u001b[0;34m(self, tup, side)\u001b[0m\n\u001b[1;32m 2906\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21m_partial_tup_index\u001b[39m(\u001b[38;5;28mself\u001b[39m, tup: \u001b[38;5;28mtuple\u001b[39m, side: Literal[\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mleft\u001b[39m\u001b[38;5;124m\"\u001b[39m, 
\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mright\u001b[39m\u001b[38;5;124m\"\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mleft\u001b[39m\u001b[38;5;124m\"\u001b[39m):\n\u001b[1;32m 2907\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mlen\u001b[39m(tup) \u001b[38;5;241m>\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_lexsort_depth:\n\u001b[0;32m-> 2908\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m UnsortedIndexError(\n\u001b[1;32m 2909\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mKey length (\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mlen\u001b[39m(tup)\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m) was greater than MultiIndex lexsort depth \u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 2910\u001b[0m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m(\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_lexsort_depth\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m)\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 2911\u001b[0m )\n\u001b[1;32m 2913\u001b[0m n \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mlen\u001b[39m(tup)\n\u001b[1;32m 2914\u001b[0m start, end \u001b[38;5;241m=\u001b[39m \u001b[38;5;241m0\u001b[39m, \u001b[38;5;28mlen\u001b[39m(\u001b[38;5;28mself\u001b[39m)\n", "\u001b[0;31mUnsortedIndexError\u001b[0m: 'Key length (1) was greater than MultiIndex lexsort depth (0)'" ] } ], "source": [ "data['a':'b']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`lexsort` means \"lexicographically sorted\", i.e. sorted in dictionary order, by number or by letter. The error above reports a lexsort depth of 0, meaning that not even the first index level is sorted. 
Sorting an index is done with the [`sort_index()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_index.html) method:" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "char int\n", "a 1 0.306670\n", " 2 0.989591\n", "b 1 0.847586\n", " 2 0.780704\n", "c 1 0.793785\n", " 2 0.844271\n", "dtype: float64\n", "char int\n", "a 1 0.306670\n", " 2 0.989591\n", "b 1 0.847586\n", " 2 0.780704\n", "dtype: float64\n" ] } ], "source": [ "data.sort_index(inplace=True)\n", "print(data)\n", "print(data['a':'b']) # now it works" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Aggregation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data aggregation using a `MultiIndex` is super simple:" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
locationdaytemphumidity
0ALSUN12.331
1ALSUN14.145
2NYTUE21.341
3NYWED20.941
4NYSAT18.849
5VASAT16.552
\n", "
" ], "text/plain": [ " location day temp humidity\n", "0 AL SUN 12.3 31\n", "1 AL SUN 14.1 45\n", "2 NY TUE 21.3 41\n", "3 NY WED 20.9 41\n", "4 NY SAT 18.8 49\n", "5 VA SAT 16.5 52" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
temphumidity
locationday
ALSUN12.331
SUN14.145
NYTUE21.341
WED20.941
SAT18.849
VASAT16.552
\n", "
" ], "text/plain": [ " temp humidity\n", "location day \n", "AL SUN 12.3 31\n", " SUN 14.1 45\n", "NY TUE 21.3 41\n", " WED 20.9 41\n", " SAT 18.8 49\n", "VA SAT 16.5 52" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "states.set_index(['location', 'day'], inplace=True)\n", "states" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
temphumidity
location
AL13.20000038.000000
NY20.33333343.666667
VA16.50000052.000000
\n", "
" ], "text/plain": [ " temp humidity\n", "location \n", "AL 13.200000 38.000000\n", "NY 20.333333 43.666667\n", "VA 16.500000 52.000000" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# mean all days under each location\n", "states.groupby('location').mean()\n" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
temphumidity
day
SAT17.6550.5
SUN13.2038.0
TUE21.3041.0
WED20.9041.0
\n", "
" ], "text/plain": [ " temp humidity\n", "day \n", "SAT 17.65 50.5\n", "SUN 13.20 38.0\n", "TUE 21.30 41.0\n", "WED 20.90 41.0" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# median all locations under each day\n", "states.groupby('day').median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`````{admonition} Exercise: Replacing Values\n", "````{hint}\n", "When we wish to replace values in a Series or a DataFrame, two main options come to mind:\n", "\n", "1. A boolean mask (e.g. `df[mask] = \"new value\"`).\n", "2. The [`replace()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html) method.\n", "\n", "In the following exercise try and explore the second method, which provides powerful custom replacement options.\n", "\n", "````\n", "* Create a (10, 2) dataframe with increasing integer values 0-9 in both columns.\n", "````{dropdown} Solution\n", "```python\n", "data = np.tile(np.arange(10), (2, 1)).T\n", "df = pd.DataFrame(data)\n", "```\n", "````\n", "* Use the `.replace()` method to replace the value 3 in the first column with 99.\n", "````{dropdown} Solution\n", "```python\n", "df.replace({0: 3}, {0: 99})\n", "```\n", "````\n", "* Use it to replace 3 in column 0, and 1 in column 2, with 99.\n", "````{dropdown} Solution\n", "```python\n", "df.replace({0: 3, 1: 1}, 99)\n", "```\n", "````\n", "* Use its `method` keyword to replace values in the range [3, 6) of the first column with 6.\n", "````{dropdown} Solution\n", "```python\n", "df[0].replace(np.arange(3, 6), method='bfill')\n", "```\n", "````\n", "`````" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`````{admonition} MultiIndex Construction and Indexing\n", "* Construct a `MultiIndex` with three levels composed from the product of the following lists:\n", " - `['a', b', 'c', 'd']`\n", " - `['i', 'ii', 'iii']`\n", " - `['x', 'y', 'z']`\n", "````{dropdown} Solution\n", "```python\n", "letters = ['a', 'b', 
'c', 'd']\n", "roman = ['i', 'ii', 'iii']\n", "coordinates = ['x', 'y', 'z']\n", "index = pd.MultiIndex.from_product((letters, roman, coordinates))\n", "```\n", "````\n", "* Instantiate a dataframe with the created index and populate it with random values in two columns.\n", "````{dropdown} Solution\n", "```python\n", "size = len(letters) * len(roman) * len(coordinates)\n", "data = np.random.randint(20, size=(size, 2))\n", "df = pd.DataFrame(data, columns=['today', 'tomorrow'], index=index)\n", "```\n", "````\n", "* Use at least two different methods to extract only the values with an index of `('a', 'ii', 'z')`.\n", "````{dropdown} Solution\n", "Option \\#1:\n", "```python\n", "df.loc['a', 'ii', 'z']\n", "```\n", "Option \\#2:\n", "```python\n", "df.xs(key=('a', 'ii', 'z'))\n", "```\n", "Option \\#3:\n", "```python\n", "idx = pd.IndexSlice\n", "df.loc[idx['a', 'ii', 'z'], :]\n", "```\n", "````\n", "* Use at least two different ways to slice the values whose third-level index is `'x'`.\n", "````{dropdown} Solution\n", "Option \\#1:\n", "```python\n", "idx = pd.IndexSlice\n", "df.loc[idx[:, :, 'x'], :]\n", "```\n", "Option \\#2:\n", "```python\n", "df.xs(key='x', level=2)\n", "```\n", "Option \\#3:\n", "```python\n", "df.loc[(slice(None), slice(None), 'x'), :]\n", "```\n", "````\n", "`````" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## _n_-Dimensional Containers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While technically a dataframe is a two-dimensional container, in the next lesson we'll see why it can perform quite efficiently as a pseudo n-dimensional container.\n", "\n", "If you wish to have _true_ n-dimensional DataFrame-like data structures, you should use the `xarray` package and its `xr.DataArray` and `xr.Dataset` objects, which we'll discuss in upcoming lessons." 
] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 2 }