{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Murcko histogram split" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "Murcko histograms are a new coarse-grained molecular descriptor that generalizes the concept of Murcko scaffolds. In the DreaMS project, we use them to create non-leaking train-validation splits for MS/MS datasets such as MoNA and NIST20. This tutorial demonstrates how to construct a Murcko histogram split for a new dataset, using the [MassSpecGym dataset](https://huggingface.co/datasets/roman-bushuiev/MassSpecGym) as an example.\n", "\n", "Let's first look at the Murcko histogram of the luciferin molecule." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# Load the necessary libraries\n", "from rdkit import Chem\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from tqdm import tqdm\n", "from dreams.algorithms.murcko_hist import murcko_hist, are_sub_hists\n", "from dreams.utils.data import MSData, evaluate_split\n", "from dreams.utils.plots import init_plotting\n", "from dreams.definitions import *\n", "tqdm.pandas()\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original molecule:\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO2de1hU1f7/3zPcr6KAQN4Ab6DiFcqE8mjYyUS0FCwT0VIsy9HMr9jpKHanOiqadqTyZ2hpoqkhat71KKA4oQgoqIAiAsP9DsNc1u+PjeMwDMhlZvbMdr0eHp9xrT17vwf2vPda6/NZa/EIIaBQKBRKV+GzLYBCoVAMG2qjFAqF0i2ojVIoFEq3oDZKoVAo3YLaKIVCoXQLaqMUCoXSLYzZFkDROjJZRUXF/sbGLKm03NS0r43NJGvriTyeEdu6KBSOwKN5o9ymoSHjzh1/iaTIxMTZ2NixqSlPJqsaPPi4re0rbEujUDgCbY1ynPz8lVJpibv77z17zgFAiKy29qKV1Xi2dVEo3IG2RrkNuX7dztTUddiwVLaVUCichYaYuA2PEJlE8lAmq2BbCYXCWaiNchw7u5lSadmtWz5lZTtlsmq25VAoHIR26jmOTFaVl/deRcV+QqR8vlXPnsHOzuHm5kPZ1kWhcAdqo9yBEGlDQ2pt7aXa2oSamgseHpfNzNyYKqm0uLz89/Ly3+rqko2MbIYOTbKwGM6uWgqFM1AbNWyk0tK6uqTa2oTa2oT6eqFc3qiocnPb26vXGyrHi0Qb8vNX9eo1z81tt26V6iNSKVJTwedjzJgW5WIx0tPh7Iw+fVhSRjEoqI0aHhJJQW1tAtPqrK+/BsgVVWZm7tbWvtbWflZWvhYWwwBeq3eTlBQzC4uRnp5CXWrWT8rLYW8PAHv34g2lJ86dOxgyBKtX45tv2JJGMSRo3qgBoNJbl0qLFVV8vqWl5Rhraz9ra18rqwnGxvbtn0osvkuIxNi4l5YlGxgrVuCVV2Bnx7YOimFCbVRPKS0tTUpKSkhIeOstoVSaoNxbNzFxsbb2tbb2tbLytbQcw+O1+UeUyxuzsnzt7GbY2EwyNnZobMwqKFgLwN5+vi4+g4EwcSIuXsQnn2DbNralUAwTaqN6REFBQUJCwqVLlxISEq5duyaXywGMGDHK07OxA711NUilpSYmfQoLvywoiGBKjI0d+vXb3KvXPC1+DENj5Ei4u2P7doSG4tln2VajW+rr60tLS/v27cvn09zHrkNtlE3EYrFQKExMTExISEhMTCwpKVFUWVlZ+fj4+Pn5jR07cehQHyOjHl04v6lp30GD4uTyWrE4WyIpNDZ2NjcfyudbaO4TcISvv8bBg1i8GH//DeOn4zshFovDwsL27NkjlUp79ep14sQJb29vtkUZKk/HLaNPVFdXJycnM03OhISEhoYGRZWzs7O3t7efn5+vr6+Pj4+ZmZlGrsjnW1tYjLKwGKVcKBbfrayMc3JaqZFLGDpOTli3Dh99hK1bsWIF22q0TFlZ2datW7du3VpaWsqUlJeX+/n57dq1Kzg4mF1tBgq1UV1QW1u7cuVKqVSalJSUmZmpKOfz+V5eXn5+fhMmTPD19XVzc9ONHrm89tatZ2WySkvLcTY2E3VzUT1HIMAvv2DdOgQFsS1FaxQVFW3fvj0qKqqqqgrAsGHDpkyZsmDBgsDAwAcPHsyZM+fAgQPR0dE9e/ZkW6mhQSjaZ+LEx1ZlaWnp6+sbHh4eFxdXVlbGlqSCgs+FQmRkDJfLm9jSwDplZQQgy5Y1//fiRcLjkZAQcvs2Acjq1YQQIpWyKFBjpKWlhYWFKfo3vr6+cXFxygfExMRYW1sD6N+//7lz51iSaahQG9UFRkZGAN59992rV69KJBK25RBCiFwuTk8fKhSiqGgD21pYQ8VGCSGhoYTHI7/+2myjcjkZNYoEBZGEBPZUdo+LFy8GBATweDwAfD4/ICDg6tWrao/Mycnx9fUFwOPxBAJBY2OjjqUaLtRGtY5MJmNu4pqaGra1tKCq6oRQiGvXbJqaHrKthR1a26hIRHr2JK6uzTaakkL4fAIQgDz/PNmzh4jF7MntDHK5PC4
ujrFFAGZmZiEhIbdv327/XRKJJDIy0sTEBMCIESOuX7+uG7WGDrVRrcMM5Pfs2ZNtIWq4e/c1oRA5OW+xLYQdWtsoIWTbtmbfZDr12dkkPJzY2zcXOjmR8HCSl8eK3g7R1NQUExMzfHjzmgm2trYCgaCgoKDjZ7hy5crgwYMBmJubR0ZGymQy7anlBtRGtU5GRgYADw8PtoWoQSzOS0mxEgpRXX2WbS0soNZGZTLy3HOPbZShoYHExJCRI5vN1NSUBAWRU6d0rPcJ1NbWRkVF9e/fnzFQZ2fniIiIysrKto4vKytLT09XW1VfXy8QCJjzvPTSSw8ePNCaai5AbVTrnDt3DsCLL77IthD1MLGm9PRhT1WsiYnt1dWR8HDy55+qtampJDycHDum5o1CIQkJIcbGzX46ZgyJjiZ1dVoX3D4lJSURERH29s1TgQcNGhQVFdXO4GZhYWFERESPHj1Gjx4tl8vbOuz48eMuLi4AevTo8euvv2pHOxegNqp1fv/9dwBBQUFsC1HPUxhrKi4mgwaRsDDS5WhfQQGJiCCOjs1mamdHBAJy755GVXaMe/fuCQQCS0tLxkDHjRsXExPTTjc8PT09NDSUGf3k8XhTp06tqqpq5/wikWj69OnMyYOCgsrLy7XwIQweaqNaZ8uWLQDef/99toW0SVXVyUexpny2tWid2lri7d0cMmpo6NapGhtJbCyZMKHZTPl8EhBATp0ibTfvNMmNGzdCQkIUhhgQEHCq3VEGoVAYEhLCJI0wIfvk5OQOXoumQ7UPtVGt8+9//xvAp59+yraQ9ngUa5rLthDtIpORmTMJQNzciEiksdMmJpI33ySmps1+Om1a2fbt22trazV2gZYo5zCZmJiEhIS0NcSpfLxyyD4rK6uzF83OzqbpUG1BbVTrhIWFAdi+fTvbQtrjKYk1LVtGAGJvTzIzNX/yoiISGUn69SMTJmxlQuRhYWE3b97U1PmZHKYJEyYwhmhlZSUQCO7fv9/W8TKZLC4ubty4cV0O2atA06Hagtqo1pkxYwaAgwcPsi3kCRQUfMHtWNOGDc1B9rOtnhTr1pHz5zVzFbGY7N27X2F2TPf5r7/+aieS80SYHKZhw4Yx53RwcIiIiCgtLW3r+M6G7DvF5cuXaTqUCtRGtc7zzz8PIEHv58EoxZr+w7YWzRMXR4yMmmcoqfD99wQgNjakbV/qCikpKWFhYYrgz6BBgyIjIzsboqmpqYmKiurXrx9zkgEDBkRFRbUzXKA2ZN/QzTHgVtTV1SnSofz9/fPzuT+k3j7URrWOu7s7gDt37rAt5MlwNdaUnEwsLQlAvvlGtSo+nhgbEx6PxMRo5dIVFRVRUVGurq6M6djY2ISFhaWlpT3xjcXFxREREb16Ne9T4OXlFRMT09TUZkeBCdlbWVkph+yl2lwR4NixY87OzgDs7Oye8nQoaqNahwlxtp9Woj/cvft6SorNwYOr2BaiMXJyiJMTAciiRapVQiGxsiIA+fxz7Wpghin9/f2ZuBCzOEhsbKzaBRZyc3OVc5iYZUTaGRNIS0tThOzVLjuiPWg6FAO1Ue1SV1fHjCKxLaSj1Nff9/Z2BXC29QiiAVJaSoYOJQCZOlU1SzQ3lzg7E4C8/bbu9GRlZSm3Gfv06RMREVFSUsLUpqamhoSEGBsbK3KY2h8Lar3siFAo1MnnaEFMTAzziQYMGHBeU2PMBgW1Ue2Sk5PD3F5sC+kEX3zxBYBhw4a104U0CMRiMmlS81wjlWVhqqqIlxcByKRJLKw2UlFRsWHDhoEDBzJmamlpOWvWrEmTJjGGaGpqumDBgnZC/GqXHWF31Cg7O5uJqjHpUGJDWcFFQ1Ab1S6XL18G4OPjw7aQTiAWi4cOHQrgP/8x4FiTXE7mzSMA6dOHqMwIb2oiL71EADJ8OKmoYEkfITKZ7NSpU0FBQUZGRubm5jwez9raWiAQ5LW98InKsiMODg7h4eGFhYW6lN0WyulQXl5eqampbCv
SHdRGtcuff/4JICAggG0hnePkyZNMPMRwg7CrVxOA2NoSla+zXE7mzycAeeYZ0nbOpU6Jj49nOvjtjC12NmTPFk9nOhS1Ue3y008/AXhbl8NvGuL1118H8Oabb7ItpCv89BMBiIkJOXlSterf/25Ob7p2jQ1l6jh79iyAiRMnqq1VyWEaMWJE+yF71qmurmamnDw96VDURrULM8748ccfsy2k0+Tl5TFxgzNnzrCtpXMcO3bs2Wen2dhU79ypWrVjBwGIkRHRVSi7Q+zduxdAcHCwSnlnQ/Z6xcGDBx0cHJh0qN9++41tOdqF2qh2YbKUN23axLaQrvDll18aXKwpJSWFyTCLjPyvStW5c83T3rdtY0Vam2zevBnABx98oFxYX19vZ2fHBG0CAwP1f/pGa0QikWIuP7fToaiNapc33ngDgIE+jcVisYeHB4DvvvuObS0dIj8/v2/fvgDeeOMNlVbbjRu3XVxqAbJmDVvq2uSTTz4B8Nlnn6mUr169OjQ0NCMjgxVVGkEul0dHRzPdGicnp61bt7KtSCtQG9UukydPBnD69Gm2hXSRU6dOGUqsqaqqatSoUcwgo8r6Q/n5+f369fP09F60qEQP+8SLFy8GEB0dzbYQbXHr1i3mTwNg9uzZbMvRPHxQtIlIJALQu3dvtoV0EX9//1mzZtXU1KxatYptLe0hkUhmz56dmprq6el56NAhxU7CAGpqagICAh48eODgYPH999aPphHpEYZ+kzwRDw+PpKSkF154AcDBgweZz8sp2PZxjuPo6AigqKiIbSFdJy8vjxlt1OdY05IlSwA4Ozvfa7kGvVQqDQwMBDBw4MDi4mK25LXP+PHjYQiL13QTmUzGeI4e5ml1E9oa1SIymaysrIzP5yuyVQyRfv36/etf/wKwbNkyiUTCthw1fPbZZ9HR0ZaWlocPHx4wYIBylUAgiIuLc3BwOH78OPNI00OY1pmTkxPbQrRLQUEBAFNTU8VcWO7Ato9zmcLCQgC9e/dmW0h3UcSavv32W7a1qLJ3714ej2dkZHT48GGVqq+++gqAhYVFYmIiK9o6CGMr1dXVbAvRJAcOHDh9+rTy2ivp6ekAPD09WVSlJaiNapHU1FQAI0aMYFtI58jLy2u98hATa7K2ttarWNOFCxeYYdAtW7aoVO3bt4/P5/P5/D/++IMVbR2ktraW8Xq2hWgSuVxubGzM4/GUU+Xan2Vg0Biz1Qp+GjDEzlpRUdGLL744dOjQ/fv329jYKMr9/f3/+c9/njx50s3NrV+/fi4uLj179nzmmWeYF8qvXVxceDqJ49y6dWvmzJlisXjVqlXLli1Trrp48WJoaKhcLt+0aRMzHUtvMcSb5ImUlZVJpdJevXoplu8DRz8pA7VRLVJcXAyDisA2NDS89tpr9+7dc3R0ZLaQVFBSUpKVlcXn8yUSSU5ODrNylVrMzc1VjFXlRe/evZm14LpDSUnJ9OnTKyoqZs+e/c033yhXZWdnz5o1q7GxccmSJStWrOjmhbSNwd0kHUGtY3LykzJQG9UihvX4lcvlc+fOvXz5spub25EjRxRzEAE0NjbOnDnz3r1748aN++uvv6qrqwsKCioqKgoLC1u/YP4tLCy8efNmW9diGq1tuW3fvn179OjRjtSGhobAwMDs7GwfH5+YmBg+/3GktLS0dOrUqSUlJdOmTdu2bZtGfjNaxbBukg6i1jGpjVK6gmHdNx9++OHhw4ft7e2PHz+u/K0mhLzzzjuJiYmurq7x8fEODg4ODg7MzihqqaurE4lEIpGopKRE8aK4uLiwsLDkERUVFRUVFe2IsbGxcXFxcXR0dHR0VLxwdnZ2cnKyt7dfs2bN5cuX3d3d4+Pjle2esdc7d+6MGzdu3759Kg1q/cSwbpIO0k5rlGMPDAZqo1rEgL4hGzdu3LJli6mp6f79+5nFRhWsWrVqz549tra2cXFxzN477WNlZeXu7t6OzwKoqKhoqz1bUVHx4MGDmpqampq
a27dvq327tbV1jx49/vrrL+XfrVwunzdvXlJSkqur69GjRw0lq+bpaY1yeJYBtVEtYiiP3yNHjqxevZrH4+3YsWPSpEnKVT/++OPGjRtNTEz++OMPLy8vTV2R6c63c0BlZWXrZizzIi8v7+HDh88884xiA2EGHo/n5eV15syZI0eO6P/vXEFJSQk4Zy7tdOoN6E/TCdhOFeAy3t7eAK5cucK2kPZITk5mGm7ftNo28+jRo0zaSoyWts3sEhKJZOTIkQA+V7cRnZ4sBd9x5syZA2DPnj1sC9EkalcJcHNzA3D37l22VGkPOotJi+j/4zc3N3f69Ol1dXWLFi1avXq1clVKSsqcOXOkUumnn346f/58thS2xtjYeNu2bTwe76uvvsrNzVWp7ciwg16h/zdJF1DbfzegMa7OQm1UizD3jd7OQSwvL586dapIJJo6dep///tf5ar8/PwZM2bU1tYuXLhw7dq1bClsCz8/v7lz5zY0NHz00Udsa+kunBwxbP1sqKurq6urs7CwUE5G5g5sN4c5S2VlJQAbGxu2hahHLBYzw6BeXl6VlZXKVVVVVcww6D/+8Q+93eKxqKiISYqKj49nW0u3YJaIF4lEbAvRJK3779nZ2QBcXV1ZVKU9aGtUW+hzF4YQ8s4775w7d65Pnz7Hjh1TTtKUSCSzZs1KS0sbNmzYoUOHTE1NWdTZDk5OTuvWrQOwfPnyxsZGtuV0EalUWl5ebmRkZNCL17Sm9c2vz1+H7kNtVFvocyLLv/8tvXfPiMkZYpaLZyCELF68+PTp0y4uLsePH2c2sdBbBALBqFGjsrOz//Of/7CtpYuUlJTI5XIHBweDSHHtIGr77/r8deg+3bPRPXvw1lvIz1ctT0rCW2/h0qUWhfX12L4dc+bghRfwz39CIEBCQreurt/o7eM3OhpffWWSnLzzzz//HjFihHLV+vXrY2JirK2tjx49qpJOpIcYGxtv3bq1rViTQaC3N0l3eNpmgqK7NpqSgj17UFmpWn7vHvbsgfK06/R0eHrivfeQkQEnJxgZYdcu+Plh0SJIpd3SoK/o5+P3+HF88AEAbNnCmzhxoHLV779f/Pzzz42NjWNjY8eMGcOOvk6iiDWtXLmSbS1dQT9vkm7STu49xz6pAp106quqMG0aysoQF4f0dBw4gGPHkJeH4GDs2IHPPtOFBp2jh4/f9HS8+SakUqxbhyVLWlSdPo3Q0BcmTvx069atU6dOZUlgV9iwYUOPHj0OHz589OhRtrV0Gj28SboPbY1qh+3bkZeHyEhMn/640NYWu3Zh8GBs2IDycl3I0C0HDhwAcO/ePbaFNPPwIV59FVVVeOMNrF/fourmTQQFoakJ48evXaLir3qPk5NTREQEDDPWxMk22tO2Lgl0ZKOHD4PPx4IFquVmZli4EPX1OHlSFzJ0C7P0BrM3N9taUF2NadPw4AFefBG//ALl5UALC/Hqq6isRFAQvvySPYndYNmyZUys6bvvvmNbS+fg5EzQp21dEmjGRpOTcfZsi5+MjBYHpKfDxQXW1mreO3Zs8wHcIi0tjWkZHT16dMiQId988037CxppFYkEs2cjNRWenjh8GEqbZqK2FtOm4f59+Plh1y7wDTNxQxFr+vrrrw0r1sTh3PunZ10SoJvp9x99RIA2f5iJ2BIJAYiPj/ozJCcTgCxd2i0Z+sS9e/fGjx9vZ2e3b9++WbNmKeLdNjY277///q1bt3Qv6d13CUCcnUnLTTOJVEoCAwlABg4k+rppZieYN28egBkzZrAtpBO8+uqrMPwZBCqoXSWAmWWgt5uzdhNNND8OHsSNGy1+lFcjNzaGqamaaD5DQwMAGMiaZu2Tl5f30ksvDRkyxMTEpKioKDg4+MCBA7m5uXFxcf7+/rW1tdu2bfP09PTz89u/f79UV/kJn32G7dthYYHDh9Fy00wsX464ODg44Phx6OuE1U7w3Xff2draHjkSd/36Kba1dBROttFad+oVswx69er
Fni5t0i0TZlqjaWmq5Xv2PG6NEkKGDCHm5kQuV3OGHTsIQLZt65YMtsnLywsMDDQ2NjY3Nz9w4IDaYzIzMwUCgWIRzD59+kRERJSUlGhV2N69hMcjRkbk0CHVqshIAhALC8Kl3dH37fvvlSsj09IGymQNbGvpEP369QNwT6WbYOAMGzYMQJqSLTBbKzs5ObGoSqvoxEbDwghALlxQcwamV3nzJiGEGOAGsw8ePAgNDbWysjIxMRk1alRpaWn7x1dVVUVHR3t6ejJmamZmFhIScv36dW1ou3CBmJkRgLTaNJPExhI+n/D5RL83zew0crkkI2OUUIiCgs/Y1tIhzM3NAdTX17MtRJO07r9fv34dgJeXF4uqtIpObPT6dWJkRLy9SU1Ni8OOHCE8HnnlFUIIkcnImDHE31/N2fQSkUi0ePHiXr16WVpa2trafvXVVx1/r0wmO3XqVFBQkGIK4Lhx42JiYpR3o+0mMhkZPpwA5KOPVKsuXiTm5gQgGzdq6mp6RE3NRaGQl5Ji0diYw7aWJ8BEHW1tbdkWokkkEgmfzzcyMpJKpYrCEydOAPD392dRmFbRiY2SR31IDw8SHU0uXSLHj5Nly4ipKenbtznwcf06sbEhADExIe+/r88hD5FItGjRIhcXF2tra3Nzc29v7y4HjrKysgQCga2traKn/8UXX4hEqnvEd428PBIeTmSyFoV37xJHRwKQJUs0chF9JCdnnlCIu3f1PdaUlZUFYNCgQWwL0SRq+++7d+8GMHfuXLZUaRtd2SghJDa2uYHE/FhYkHnzSH7+4wNKSohAQIyNCUCsrUlEBNGzzo5IJFq4cGGfPn0sLS0tLS3t7OwWL17c/SZkTU1NdHQ0M73dw2OuqSkJCiKXLmlEcgtKSsjgwQQg06YRpbYC12hqKrp2zU4oRGXlEba1tMf//vc/AL6+vmwL0SRM/33kyJHKhczaMR9++CFbqrRN92xUKiWNjWpiRzIZaWxUbQgxFBaSlBRy8yZpaCMIkJlJgoKarbZvXxIdrQ/f+OLi4oULF/bt25fP51tbW5uZmXl4eJw8eVKDl5DL5adOnXr//Swjo+ZP/9xzZPdu0tiomfPX15MJEwhAxo0jtbWaOafeIhJtEgqh57EmZp7ba6+9xrYQTaK2/85srPD111+zpUrbdC/hycgIZmYt5sQw8PkwM1OfzO3sjDFj4OkJc3P15xw6FLGxOHMGY8ciPx9LluDZZ3HuXLd0doOSkpKwsLBnn312586d1dXVFhYWhJDJkycnJSVNmTJFgxfi8Xj+/v5btw7Jy0NEBBwccOUKQkLQvz/WrEFeXnfPP38+EhPh6or4eG4kmLWHo+MyC4tRYnG2SPQt21rahM4E5Qz6Om1l8mQIhYiNhasrUlIweTKmTEFami4llJaWLl68+Nlnn/3pp59KSkosLS1rampcXFy+/fbbY8eOaW8tzmeewfr1yM9HTAxGj0ZxMb75Bu7umD4dp093/bTBwXByQlwcDG2zoq7A4xn1778V4BUVRYrFOU9+Axtw0lzUPhs4+cBQRl9tFACPh6Ag3LyJyEj06IHTpzF2LJYsQVGRtq9cWlr6wQcfPPfccz///PODBw+sra2lUqlMJvP19T1z5szSpUu1LQCAmRnmz8e1axAKERICPh/x8ZgyBWPG4McfUV+vevy1a1izBlFRquUJCVizBjk5CApCdjY0t0eyvmNt7WdvP08ub8jP/5BtLerhpI0+na1RA9mLqbSUhIcTU1MCECsrEh6umjulIUpKSpYuXeru7s78cmxtbZmEeRcXl2XLlsnUjvbqhMJCEhlJ+vRpHjbt0YMIBCRHKaVn9+7mqri4Fm+MiiIAOXdOt3L1Az2PNc2aNQvA/v37VcpXrlx5/vx5ViR1nwULFgDYsWOHciEzy+D+/ftsqdI2BmKjDDdvkunTFdGnnNhYDfpaaWnp0qVLXV1dGQNlYvE8Ho/H440cOTIxMVFTF+oOYjGJjSW+vs2
/Az6f+PuTuDgilzfbqIUFGTCgRQTpabZRQohIFKW3sSY/Pz8AF1pOS4mLiwPA5/NXr16tt/sJtgOzWK3KKgGcnGWgjEHZKENiInn+ebmd3cCePYcPH3706NFuno8xUGYvQwBGRkZMLB6AnZ1dcHBwXV2dRoRrkCtXyLx5zZOUAPLLL802Gh5OALJ69eMjn3Iblculj+Y1rWdbiypDhgwBkJmZqVwokUgiIyNNTEwAjBgxQksz3LTHuHHjACQnJytKODnLQAUDtFFCiEyWGhurWDxp2rRpGRkZXThNWVnZ0qVLmbtZpRcPYOjQob/99pvGtWuQoiLy+efE05NUVzfb6LFj5LXXiLExUXz7nnIbJYTU1Fx6NK8pm20thBAiFosTExM3bNjA7Lqam5vb+pgrV64MHjwYgLm5eWRkJIujSZ2F2SFRuf/OyVkGKhimjRJCCBGLxVFRUczmwHw+PyQkpLCwsIPvraqqWrlypbKBWlpaWllZ8Xg8AGZmZi+88EJBQYFW9WsWhY1mZxNzc+Lr25zOS22UEJKbGyIU4u7dQLYEVFVVnTp1KiIiwt/f38LCgrnleDyesbFxv379zp492/ot9fX1AoGAOfKll1568OCB7mV3AaYbp9x/5+QsAxUM2EYZSktLw8PDmQe7lZVVeHh4dbtLnFRXV3/44YfKBqrciwfQv3//9evXy9WuR6XHKGyUEBIRQQDy00+EUBslhLAUa8rKytq5c+eiRYuGDRvGU8qt5vP5w4cPX7Jkybfffuvj48OUrFq1qlHdRIvjx4+7uLgA6NGjx6+//qoz8V1Dbf+dk7MMVDB4G2XIysoKCgpibtNnnnkmOjpa2mruU3V19fLlywcObLEdpnIv3sjIyNvbO81A1kZRQdlGGxrIwIHE3p6UlVEbbUYk2pKbu0AiEWnvEhKJRCgURkVFBQUFqST3WFpa+vr6CgSC2NhY5WXAlEdChw8ffu3aNXXKRYGBgcx5goKCmG1p9JPKyg8w0SoAAA4oSURBVMpPPvnk448/Vi7ctm0bgHfffZctVTqAIzbKkJSUNGHCBOaGGzZsmCJcWF1dvWLFChUDVe7FA3B0dAwNDTXE2CiDso0SQk6cIABZtozaaDtooMMhkZTeu3c8PDzcz8/PvOXEPBcXl1mzZm3atOnKlSvtL7yQnJzMdI/MzMzaGgmNiYmxtrZmekvnDOrPyWw4uG7dOraFaBFO2SghRC6Xx8bGKhI/J02aFBoa6uHhoXx/q/TiAXh4eBw5oo+phR1HxUYJIa+9RoyMmtd6NajvnXaRSqvy8lakpw/5+2+zlBTzjIwR+fn/6pSlNjU9LC+PzcsT3Lw5TijkX7pko1jw0N3dPSQkJDo6Oj09vVPjQsxIKPNQnzx5cl5eXutjsrOzfX19mUFVgUCgdhBAD3nvvfcAbN26lW0hWoRrNsrARJ+s1W2iZ2pqamxsrPivlZXVK6+8os8dpQ7S2kbv3ydWVs1TFqiNPkKemfmiUIjbt18uLPy6sPDL7OzgzEy/J7xHLq6pSSgq+u7u3RnXr/cWCqH4SUmxzMr6x6ZNXx49erSioqKb4v766y/FSOju3btbH2CI6VCvv/461M0y4BLctFGGtWvXMo9uxjEtLCysWq7J4erqupEraxe3tlHyaJVXaqMK6ur+FgqRk/PmE4+USquqqk49fBhx+7Z/SoqFsnWmpjrfuRNQWBhZU3NRLtdwk7C4uHjGjBmKkdCysrLWx1y+fNkg0qFkMtmhQ4eYxfA///xztuVoES7b6IYNGwAsXLhw9uzZPXv2VO7FGxkZ+fr65uTo+wLpHUetjUokxMuL2uhjysr2CIUoKlL/7FTprStbZ1qae25uSElJdH19ukYGVdtHeSRUbTpUXV2dIh3K398/X3ndXj1ALBbv2LFDMZjG9P/efvvt9rNoDBcu2+jmzZsBCAQCQsiPP/6o3A6dOXNm61C+QVNTQ7KzSev5VuXlJDtb39a/Zo3q6vNCITI
yRojFqmmY9fVpKr31zEzfvDxBeXmsRPKELba0QU5ODjNbtJ2R0GPHjjk7OwOws7PTk3SompqaqKgoZhI9gAEDBmzatOn7779nOoKurq4X1O7JZuBw2UZ/+OEHAO+99x4hZO/evczflXnIx6ks4EF5WpBlZU0UCpGSYp2bG1pdfU6paSnLyBienT1bJNpUW3tFLtfMVi7doYPpUNOnT1cMArA4yl9cXBwREaHYQtnLy0t5e7GbN2+OHTuW6QiGh4cbbkqMWrhso0wLdPHixYSQw4cPW1paBgcHMyl4Bw8eZFsdhR1ksoaioo0ZGV5MqzM9fWhNjV7vMX3lyhUmHaqdkdCYmBimuTdgwADdrw6Vm5srEAgsLS0ZA/X19Y2Li2udqCCRSCIiIpisBm9vb5XFBAwaLtvozp07ASxYsIAQkpOTw/R6goODAezbt49tdRSWqa+/cf/+UqGQd+2aHSvd9o6jnA7V1sTQ7OxsJmmaGQTQTXMvNTU1JCSEGfrk8XgBAQEJCU94JiUmJjIZ3BYWFlFRUQY3XVAtXLZRZj/CefPmKRfOnTsXgJ6vOULRGQ8ffiIUorT0F7aFPJmOpEMpmnteXl6pqanaE3Px4sWAgADG2U1MTEJCQjq+PFBVVVVYWBjTdH355ZcfPnyoPZ26gcs2yoyHzpkzR7lw/vz5AH75xQC+NhQdUFb2m1AIkWgT20I6REcmhmo1HUomk8XFxY0fP14RaRAIBGonCzyRAwcO2NvbA3B0dDx06JAGReoeLtsosybCrFmzlAvfeecdAD///DNbqigsIpEUt4zRy+7cCRAKUVV1gjVNneeJE0Orq6sVzT1NpUOJxeKYmBhPT0/mtI6OjhEREWrTWjtOUVHRtGnTmBOGhITUaGdLCx3AZRs9fPgwgBkzZigXLlmyBMD27dvZUkVhkcLCr4VCfmamX17esrw8QUbGcKEQt29P0UEqqGbJycl54sTQgwcPMqnvdnZ23RnFYnKYmIVEmaSlqKgoTa1lLpfLo6OjmfCUq6vr//73P42cVsdw2UaPHj0K4NVXX1Uu/OCDDwB8//33bKmisEhT08Oiog23b7+cnj4kNbVPZuYLxcU/yOUGmXzTkYmhIpEoICCgy+lQIpEoIiKiZ8+ezBlGjhwZExMjkWg+FSw9PX306NFMon7Whg1EC5fQKly20RMnTjBj2MqFK1asALBpk2GMhVEo7fPEdfLlcvnWrVuZtaKDg4M7eNqcnByBQKBYYbqtHCYN0tTUFBER8YmPD+HxiI8PMah0KC7b6NmzZwFMmjRJufD//u//AHz77bdsqaJQNEtH1sm/devWxIkTb9++/cSzXb9+XZHDxOfzAwICdLmfo+zCBTJgQPMGwNu3EwNJh9Ljfeq7DXMrSKXSJxZSKIaLhYXF5s2bmXXyz5w5M2LEiN9++03lGA8Pj/PnzzPt1ra4dOnS9OnTx4wZs3v3bmZXnvT09CNHjjz//PPalN8C/osv4sYNhIWhrg7vvoupU1FQoLOrdxnu26hEIlEuZMaSVAopFEPnlVdeuX79emBgYFVV1bx584KDg5ktPZ6IXC4/cuTI+PHjX3jhhfj4eCsrK4FAkJ2dvWvXLkVcXqfY2iI6Gvv3w94eJ05g9Gj8+ScLMjoDl22UcUyVhieTnExboxTu0bt37z///JNJh9q/f//o0aPPnz/fzvFNTU27du0aPnx4YGDglStXmBym+/fvb968WRGXZ43Zs5GejldfRUkJZs7E/PmorWVZUttw2UbV9t/VeiuFwhnmz5+fmprq6+ubl5c3efLk5cuXi8VilWNqamo2b97s7u4eGhqamZnp5uYWFRV1//799evXK9YWYR9nZ8THIzoalpbYvRsjR+LSJbY1qYfLNqq2/07HRimcx93d/fz585GRkcbGxlu2bPH29k5KSmKqiouL169fP2DAgBUrVjx8+HDUqFExMTG3b99evny5Ii6vR/B4CAvD1asYMwa5uZg0CWvWQA9
H5NiOcWmR7KysIa6ur/v7Kxf+ER3tP3bsDxERLImiUHSHYmIoj8ebOHHi8OHDmbYFdJLDpEmamkhEBDEyIgDx8SFZWWwLagGXbZRkZxOAuLu3KPz+ewKQDz5gSROFolNqamp8fHyUW07Tpk1LSkpiW1eXSEgg7u4EIBYWJCpKf9KhuNypB7N1nUr/nSnUw34BhaIFrK2tk5OT16xZM2LECE9Pz5MnT8bHxyvWFjEwJkzA33/jrbfQ0IAVKzBz5uNvt1SK2FgsWoRp0zBzJlavRnJyi/devowlS3D1quo5CwqwZAn++KM7up4CG1VxTKZTQ8dGKU8TX3/9dVpa2s2bN6dMmcK2lu5hZ4dff8X+/ejVC336NH/H8/IwdizmzMHJk5DLUVmJbdvw3HNYvPjx1//2bfz4I+7eVT1heTl+/BGPxo67hvGTDzFc1Dqm2iYqhUIxIGbPxvjxYCb7i8UICEBmJn7+GW+/DWYn4MpKvPsufv4Z9vaIjNS2nKegNarimIy30k49hWLQ9O0LZr/0X39FWho+/hjvvINHu6nDzg67d2P4cGzahKIibWvhtI2qdUzaGqVQuMTBgwDwaH3Vx5iYYMkSNDUhPl7bEjjdqW8nxERtlELhBjduwMoKffqoqWJSFG7ceFySna0aesrJ6b4ETtuo2tYo7dRTKFyislK9hwKwtwcA5bUF1q7F2rUal8BpGzUyAo8HmQyEPB40oa1RCoVLmJujqkp9VWMjACjPzvr0U6jkKuTkYN68bkrgtI0CMDaGRAKpFI8mb9C8UQqFU/Tvj2vXIBbDzEy1KjsbAAYMeFwyeDBU1v2zsem+BE6HmKCuC0/zRikULjFxIgjBuXNqqo4caT5Ay3DdRlt34WmnnkLhEmFhMDHBmjWqK+lduYLduzFuHPz8tC3h6bDR1q1R2qmnULiBhwciI5Gaiueewy+/4No1JCbi00/h7w9ra/y//6cDCVwfG23dhaetUQqFY6xciT59sHYtFi5sLjEywssvY+NGeHjo4Po8QogOLsMaffvi4UPk5z9OiUhNxejRGDkSqamsKqNQKJomNxf5+TA1xdChsLNrUdXUhJoa2NjA1LRFuUyGykpYWMDSssuXfTpaozTERKE8Dbi5wc1NfZWpaXMaqQpGRurLO8PTMTZKO/UUCkVrcN1G22qN0hAThULREFy3UdoapVAoWobrIaa//wYAT8/H48cyGe7fh6kpWN9ClkKhcAKu2yhDfT2EQohEsLKClxf69WNbEIVC4Q5cj9SLxfjXv/DDD82LFDD84x/44Qd4erIni0KhcAdOj43K5Xj9dWzciMBAXLqEggKkp+PLL3H1Kvz8cOcO2/ooFAoX4HSn/pdfsHAhwsIQHd2i/ORJvPIKXnoJp06xpIxCoXAHTtuory+uXkVBARwcVKumTsWJE7h7F+7ubCijUCjcgbud+qYmCIUYNEiNhwJ4+WUQ0s1dVSkUCgVcttHSUjQ1tVixVRkmWF9YqEtFFAqFk3DXRuVy4FGyfWtkssf/UigUSjfgro3a24PPx4MH6mvv3weA3r11qYhCoXAS7tqohQU8PJCd3SJjVEFaGgCMHatjURQKhXtw10YBvPkmamuxY4dq+f37OHgQI0Zg1Cg2ZFEoFE7BaRtdvhxubvjoI+zc+XgYNDUV06ahoQGbNrEqjkKhcARO540CyMnBa6/hxg307Al3d1RWIjsbtrbYvh1vvsm2OAqFwgW4bqMAZDIcP44LFyASwdYWI0fi9dfVJ5NSKBRK53kKbJRCoVC0CafHRikUCkX7UBulUCiUbkFtlEKhULoFtVEKhULpFtRGKRQKpVtQG6VQKJRu8f8B0+CYTc5htJQAAAGIelRYdHJka2l0UEtMIHJka2l0IDIwMjIuMDkuNQAAeJx7v2/tPQYg4GWAACYgFgJiESBuYORg0ADSzExsEJqFgyEDRDMysimYgBQzsrBDZJhhKgQYFEASbAwJIIqJzQEq7AA1CMpncwA
bxMwIE0CYjJBBU8IOoZm5GRg1mBiZFJgZWRiZWRhYWDWYWNkY2NgZWDkUODgTOLkSuLgzmLh5Enh4FXj4Mpj4+BP4BTKYBAQT2JkZBDmALK4EEWagUazMLKxs7KycHIICXGzcPHz8Alzij4DijNCwYBCK27HwwJpgoQMgjmjPogMesZ77QWyRtjMHBL8V7gWxZ54uOjCpMGsPiJ2VL3Ug5YQkWM2H+9v2G/5gtwexb9zhOWA99QWYbZSRdUAufy+YLeLYZqt39YwdiL2kOsx+7fEksF5nLWGHkmoeMHvmqUQH5fQtYLZ2+hwHkZluYHbY9sMOn0svgtn3ZixwePfOHGxm2sQihwcKp8Dslb3SDv08RWD2l6Q2+2miR8BsMQD5/1nSu2/juQAAAeF6VFh0TU9MIHJka2l0IDIwMjIuMDkuNQAAeJx9VMuO2zAMvPsr9AMRSIkP8dDDJtkuimJtoEn7D733/1FSaVZaQKgTEQ41HpHDcbYU14/r999/0sdVrtuWEvzna2bpVwWA7T3FTTq/vn3b0+X+cn5mLsfP/X5L2FKB1D+fsS/34/2ZwXSkE2fgqizpVHJpUlkTZOjXeLakS0eqiadPkLUJN1gga3BKroZVNJDABloXSArOmtUc+EAiVbAFktMe1RFW4H66F6FIC6QEJ2YqTTsTV8SKC6CmW28YwNQS5oYKvGJsjypFiqArlknNlkBzoGeFuW+Tb8oK5yw3T7eqrVA00xo3XrWN6JReoxlxRzI1XSNjPjVz8eKwCwBWbTVIrI4kP9OJOqcq63I8ru+RJCOKyx6cUoX8oAWSndPNgU0ktFZsZCtvoPQy1QC9dcyshqWtgNo7dztwCUYXgGEpZnNrYAayZhyMFW1podf9+sn6j5fhfOzX8TKUWMPxkajD1uiGdZm/vn3B4eBI8rBp8SXDi+hLh+PQVxu+il0b7iFfOLuEIiBObqAIWKapUwSs03SxB5qmSD3D07QoAso0FoqAOulPEbBNQqPXPHijYE9+NFDg3yM4Sz4LHL+f/1R+v/0FocLuxCYz9rAAAAEHelRYdFNNSUxFUyByZGtpdCAyMDIyLjA5LjUAAHicHY+rTgRBEEV/Bbmb9Hbq/WCyCckYFAiCIqjxGARmP57bY0+dW3Xr/b5f3q9f+8vrN+8f++WQn0OP4wA8Dv2V6/2Nnx6Xm09yzRg3mVKhMbaFskFoZoUXSExt1lyIvCllaTqzNeqEbEqtJ4yQ4MHTsttBsJeoC6Q4aRHMpJIHTVdm5bFhQ7i7FpgRUdjYeJJ1NXKwGpJMHHGB0uZOUFYDYllKot/Y0JwrAkpyWY/NZlV6n7+kZ8KJyRyQUQPvGq81Lii7iFHbuiSNGyvldubRsDRLbLEqr8XwmjEqLRZEyTau4+/zWac8/gEoTE9xeEFbaAAAAABJRU5ErkJggg==", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Murcko scaffold:\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO2de1RUVf//PzMMtwFULooiIAKieckUzIxSU8s0bK2+BZY64OUrv7IczVRQq0EzxUIZ9ZtFT2oj6uMzdh21LNRUKlOG4Y5cBBRB7ndmgLnt3x8HjzzojHM9Z2bYr8VyuXCffd4znnnPPvtzOQyEEGAwGAzGUJh0C8BgMBjrBtsoBoPBGAW2UQwGgzEKbKMYDAZjFNhGMRiM4chkMplMRrcKmsE2isFgDKGzs/PVV18dNGjQoEGD5s6d29raSrci2sA2isFg9ObAgQMeHh5nzpxRqVQqlerSpUtDhw7du3cv3broAdsoBoPRg6KiooULF65bt06hUDg4OCQkJCQnJzs7OyuVyo0bN77wwgs5OTl0a6QchMFgMDrQ3NzM5XJZLBYAuLu7b9mypauri/gnuVy+a9euYcOGAQCTyeRwOHV1dfSqpRIGwlVMAACAkKKt7Ux7+6WenlK1upPFGubg4O/qOnPw4AVMJptudZj+3LwJHA4AAIsFP/wAPj6PGFNTA4sWAQBs2ABLllAqz8ZQKpVHjhz58MMPGxoaWCzWypUrd+7cOXTo0H7DWltbExMTk5OT5XL5kCFD4uPj169f7+joSItmSqHbxy0CqTQrPz9ELIaHfyQStlQqoVsgpj8ZGQig9ycq6tFjbt/uHbBvH7XibIuLFy9OmjSJsIs5c+bk5ORoH19cXBwREUGMDwkJOXPmDDU6aQTvjYJCUV1a+lJ3dwkAODk94e39vq/v58OHb3F3f8PObhAA09l5PN0aMdoQCuHcObpF2CK3bt2KioqaO3duXl5ecHCwUCi8ePHik08+qf0owjrT0tImTJhQUlKyaNGiF198saCggBrN9EC3j9PPnTtriIVnTU0iQqq+/6RWd0ulGXQJw2iBXI0OHowAUGAgksn6j8GrUYPp7Ozk8XhOTk4A4OLiwuPxyG1Q3ZHL5Xw+f/DgwQBgb2/P5XJbW1vNoZZ2sI2i3NwAsRhyc33pFoLRA9JGd+5ETCYCQNu29R+DbdQA1Gq1QCAYPnw4ADAYDA6HU1NTY8yEjY2NXC7Xzs4OADw9Pfl8vlKpNJVaCwHf1INCUQMADg7+dAvBGMKkSb2xps8/h8JCutVYORkZGeHh4TExMbW1tdOmTfvrr7+OHTtGWKrBeHp67t+/PyMjY+bMmU1NTevXrw8LC7t69aqpNFsC2EaBxfIEgK6uApVq4JZhWDV79sCgQSCXw9tvA048MYzq6uro6Ojp06dfu3Zt5MiRAoHg+vXrM2bMMNX8U6ZMuXLlikgkGj16dHZ29qxZsxYtWlRRUWGq+ekF2yi4uEwHAJWq7datV+XySrrlYPTG2xu2bgUASE8HgYBuNdZGV1fXnj17xo0bl5qa6uTkFBcXd/PmzejoaAaDocvhetWALlq0qKCgIDEx0c3N7ezZsxMmTIiPj+/o6DBUu8VA964C/UilGZmZjkSUKTPTvrQ0orlZqFZ3060Low1yb/TnnxFCqLsbhYQgAOTlhRobe8fgvdHHIhKJAgICCCuIiIgoLy/X/diqqioOh+Pr69vR0aHveaurqzkcDuHUPj4+KSkpKpXq8YdZKthGEUKore3XnJzhfdNFs7O97t59v7u7hG5pmEfTz0YRQr//3vubVat6f4NtVAsSiWTmzJmEgRJ33LofK5VKExIS2Gw2ALDZ7IsXLxqm4caNG+S+AbEVa9g8tINttBeVqqu5+XRp6aLMTPs+ufdOTU2pdEvDPIKHbRQhFBWFABCDgf78EyFsoxowMnTebwFbUVFhjJh+iQGRkZGVlZXGTEgLA9RGpVKJSvVQniFCCCGForG+/svCwjDyNl8qFVMsD/NYHmmj9+71ppFOmYJUKmyj/TEykTMzM/P5558nF7BXr141lTCTpKnSyMCyUZksv7qaR9R9trT8oHWsuqp
qC+Gk5eVvUqQPozOPtFGEUFJS7+8PH+5vowUFKDAQcbkoPR2p1bSoppO0tLTx43vr8ebNm5efn6/7sdTkft65c4dDJK8B+Pn5CQQCk5/CTAwEG1V1dFyprOTm5vqSd+s5OT6NjY/5T1KrFTk5I8ViyMsLpEYoRnc02ahCgSZORABo+HCUn/9fNrp794My/KAgtHkzun59QPhpcXHxK6+8QthTSEjI2bNndT+W+kqkS5cukfWmRNs9s57OJNisjarVyo6O9MpKbk7OCNI9c3P9Kyu5bW1parVCl0mKisLFYpBIXM2tFqMvmmwUIZSejhgMBICWL/8vG1UqUXo64nKRj88DP/XzQ7GxSCRCCp2uCCujpaUlLi7OwcEBAIYMGZKYmNjT06P74f0WsAUFBeaT2heVSiUQCIgOUkTbvfr6empObRi2ZqN93NO7j3sGVFZyOzrSEdJr7aHKzvYUiyEvb7S55GIMRYuNIoRiYhBAb5How3ujKhUSixGPh4KDH/iplxficJBIhORyal6BeSGcyOAGoERvZsJAx44de+7cOfNJ1URzczP5HeDu7q7vdwCV2IiNqlRdra2iigpOVtZg0j3z8gLvu6c2Wlp+Uige8V1XW7uPmOf27VUP/yuGXrTbaF0dcnd/YJFaQkz5+YjHQ2PHPhjs4dHrp91Wmzrc97549uzZ2dnZuh9raeZlCYb+WKzbRlUq2X33dCPdMz9/fHU1r6tLpxsQpbJNInHKzLQvKXm5ri65peWn9vYLDQ2HS0peJPuNdneXmvuFYPRFu40ihL74QicbJSH8NDT0wVFsNoqIQAIB0j+7nDYqKysNjtI8fCttOR3s6dpe0BGrtFGlsqW5WVhRwZFIXB5yz2K9purs/Cc72+uRDZvFYsjOHtrWlmamV4ExhsfaqEqFnn7akISnsjLE56Pw8N4NVgDk7Nzrp21tJtFuFvrmDLHZbH1zhiw/sGPJbfesyUYViqbGRkFpaURmpsN9p2MWFYXX1CR2d98yeFq1WtHa+svt2/9bUDA5O9tDInHKzh5WXDyrpiZRoWh8/PEYOqipQYmJKDERlWq+VcjP7x1z44Yhp7h9u7+fOjmhiAiUkoIsKuChVquFQqG/vz+ZwX7nzh3dDzdmAUs9ltl2zwpstKHhXn39/xUXv5CZaXc/JZ5VUvJiff1XcnkttVpUajX9/2cYiqmsRCkpKCICsVi9fmpnh8LDEZ+PjGvFaQIyMjKeffZZwgTDwsL0qqe03qT3voUAU6dONWEhgGFYro1WVlby+fx58+Y5Ojr+8YenWAyZmXZFReF1dXzK3RMhhBSKupKSl+/d20H9qTF9SUhAly/Tc+qGBiQQoIgIZG/f66cvvPBreHg4n8+vrq6mWEx1dXVsbCyTyTSguwexgPXz8zNsAWshmLYs1RgszkZLSkp2794dFhZG9ulydnY+fZrb1HRCqaRza6q9/aJYzMzMZHV0/EmjjAHOTz8hBgOx2TTfVjc2osOH0cKFaMyYBcRVymQyn3vuueTkZAr8qKenh8/nu7m5AYCDgwOXy21vb9f9cJtpCIIQkkqliYmJrq6uhFHExcUZ0G7KeCzFRsvKyvh8fnh4ONnBj81mR0RECAQCvS4Rs0KUh+bm+ikUTXRrGYhUVSFPTwSA9u+nW8p9pFKpSCTicDjEJ5lg/PjxPB6vqKjIHGcUiUSBgYHkEqysrEz3Y22sPR0J0bKPeF1Ez2k1tdVpNNtofn4+j8cjUxmIWovIyEiBQNDZ2UmvtodRqxU3b84Qi6Gs7HW6tQw4VCo0Zw4CQAsWWGIFp0wmI/x00KBB/fxULDZNa5vCwsL58+cTMz/xxBPnz5/XSx7RLJlctVnO6sRUXL9+/ZlnniHen6effvratWuUnZoeGyXcMyQkhLzgPDw8OByOSCSy2EIFgp6eO1lZ7mIxNDR8TbeWgcX27QgAeXujWho2xvWgq6tLJBLFxsYSCZgEgYGBXC43PT3dsFVSU1MTGZ7
28PAwoLXd6NGjyQWsXr2ZrQsi9dWEz+PTEepsVKVSpaenx8XFBQUFkZeXl5cX4Z5y6ynBa24WEq1IZbJcurUMFK5fR/b2iMlEadaTxatUKtPT07lcbt9Hwo0aNUovP5XL5SkpKV5eXkSyZGxsbENDg+4a+vZmfuqpp/TqzWy9EBkIjo6OZAZCt5kr0sxuo+TFNGLECPJi8vf3Jy4mK92aqahYIRZDQcFETU1LMSakpQUFBCAAtHWrtmEiETp40BLv98mPgI+PD/kR8PPzi42NFYlECs09UdLS0iZMmECMnzdvXl5enu4ntcz8SiopLS2NjIwk3r3g4GChUGi+c5nLRslLx9vbm7x0AgICjLm1sRxUqs78/HFiMVRWrqVbi+3z5psIAE2bpq1pCBl9+kF7F1laUalUYrGYx+MFBweTHwpPT8+Hb8hKSkpICxgzZoxeFmDJ1T7Uc+HChUmTJhHv5MSJE3/88UdznMXENtrU1LRp06bIyMi+G+0TJ07k8XgWWF5mDDJZrkTiJBYzWlp+oluLLZOSggDQ4MFIy4aeSoXmzkUA6OWXLXE1+jBqtfrGjRtxcXH9/HTlypVCoXDjxo3EDamrq6u+N6T9FrB69Wa2VRQKRUpKioeHBxmdu3nzpmlPYUobVSqVZFkuGaYsLCw04SksCqIFVFaWe0+P9aUuWwUFBYjNRgDoxAltwz75pDf6RHtNkQEQ4dbQ0FDiU0Nk7djZ2a1evVqvziD9ejOfOXPGfJqtkZKSEvILZtSoUabd4jCljVZW9j7kffny5TYcDeyDurR0kVgMRUUzcZGoyenuRk89hQDQypXahpHRp99/p0qZebh58yaPxwMAFouVlZWl+4EP92Y2d0TFevnuu+8IjzLtg/NMb6MMBsOEc1o4CkU90V3/3r2ddGuxNd57DwGg4GCkJcGxtRWNHo0AUHw8hcrMhkwmAwAnJycdxxvZm3lgQqz3TWujTMAYAYs1dPTokwyGXU1NQmfn33TLsR1++QW++AIcHUEoBDc3jcPeeQcqKiAsDLZvp1CcZXD58uWpU6fGxMTU19fPnj1bIpEcO3aMsFQMxZjSRp2dnck/Bw5ubrO9vTcipKyoeFOpbKZbji1QXQ0xMYAQJCbClCkah33zDfz73+DqCidOgIMDhfro5u7du9HR0URXUKK13aVLlyZPnky3LuvAHDZlShu1t7cHABaLZcI5rQIfn50uLjPk8ruVlf+Pbi1Wj1oNMTHQ2Agvvwzr1mkcVloKGzYAAHz1FfSphrNxpFLptm3bQkJCUlNTXVxcPv3005KSkujoaLKPD+axEAZFmJWpwDf1JoDBYI0efdzOblBLy3eNjYfplmPd7NoFFy+CtzccPQqazKGnBxYvho4OWL4cli6lVh+tyOXyr7/+uqenJzIysrCwcOvWrUS3UAxJd3d3dnZ2cXExlScdcCtHM+HoGDhq1L/KyxffvbvO1fVZJ6cn6FZkldy4ATt2AJMJqanQp4SyP5s3Q1YWBAfDgQMUirMA3N3dv/76ax8fn+nTp9OtxUIpKyubMmXK+PHjCwoKKDspXo2aDHf3KE/PGLVaWl4epVZ30S3H+mhrgzffBIUCNm+GF1/UOOzXX+HgQbC3h+PHtUWfbJXXXnsNe6ilgW3UlPj7f+HkNLarK7+6egvdWqwPXcLudXWwYkVv9AmbCcZCwDZqSphMl9GjTzIYDvX1B1pbRXTLsSZ0Cbur1bBsGdTVwfz58P771OrDYDSDbdTEsNlTR47cBYDu3FmlUNyjW451oGPYPTERLlyAYcPg2281Rp8wGOrBNmp6vL03DBr0slLZWFLyvlqtpluOpdPd3bNkifyxYfeMDEhIAAYDjhzRFn3CYKgH26g5YAQEfNvauuCFF87t2bOHbjGWTlzcZpnsudmzbx08qHFMWxssXgwKBWzaBPf7b2AwlgK2UbNgb++tUq1vaur6+OOPr127Rrccy+XXX389ePBgaWl2YmJTn4fC9Wf
NGqiogNBQ+OQTCsVhMLqBbdRcvPTSSx988IFSqVy6dGlbWxvdciyRurq6FStWIIQSExO1JPGkpkpPngQ3Nzh1amAVfWKsBWyjZuTTTz995plnKioqVq9eTbcWi0OtVi9btqyurm7+/Pnva467FxcXv/9+wMyZhw4dgj49jjEYCwLbqBmxt7c/fvz4oEGDTp8+ffToUbrlWBaJiYkXLlwYNmzYt99+q6kkvKen56233mpqagwIuL5sGcUCMRhdwTZqXoKCgv71r38BwNq1a4uKiuiWYylkZGQkJCQwmczjx48P1xx3j4+Pz8rKCgoKOqgl/ITB0A22UbMTFRXF4XCkUmlUVFR3dzfdcuinra1t8eLFCoVi48aNL2qu+jx//vz+/fvt7e1PnDjR99FeGIylgW2UCg4dOjR27Ni8vLytW7fSrYV+1qxZU1FRERoa+onmuDsZfdq1axcuIcdYONhGqcDV1fXEiRMODg58Pv/MmTN0y6GTI0eOnDx50tXV9eTJkw4a4u5qtZrD4dTW1r700ksbiPImDMaCwTZKEaGhoTt37kQIrVq16t69AVokeuvWrfXr1wPAl19+GaK56vOzzz5LS0sjok9MJr5EMZYOvkapY+PGja+88kpDQ8PSpUtVKhXdcqimp6cnKiqqo6MjJiZmmea4u1gs5vF4DAbj8OHDI0aMoFIhBmMY2Eapg8FgHDlyZPjw4ZcvX05KSqJbDtWQYfcDmpstd3Z2Ll26VC6Xf/DBBxEREVTKw2AMBtsopZBpkh9++OGAKhLVMez+9ttvl5SUhIaGfvrpp1TKw2CMAdso1cyfP3/Dhg1KpXLZsmXt7e10y6ECMuy+e/duLWH3b7/99sSJEy4uLkQ4jkqFGIwxYBulAcJNysvLB0KRKBl2J74/NA27desWl8uF+8lhFArEYIwF2ygNkPe2QqFQIBDQLce89A27ayr6VCgUy5Yt6+joiIqKio6OplghBmMk2EbpgYy0vPvuuxQ/DJZKyLA7EVvTNCw+Pv769euBgYFE4SwGY11gG6UNIu9HKpUuWbJELpfTLcf0kGF3ItNL07DffvstOTmZxWLhok+MlYJtlE6ILHSJRGKTRaJk2H3nzp2axtTX1y9fvhwhRDQVpFIeBmMqsI3SCVkkum/fvrNnz9Itx5QQYXfyBT5yDNFytLa2dvbs2Rs3bqRYIQZjKrCN0kxYWNiOHTsQQitXrqypqaFbjmnQMeyelJSUlpY2dOjQkydP4qJPjPWCr12zc+rUKaVSqWXApk2bZs2a1djY6OPjw7AJxowZ09HRsXjxYg6Ho+lVK5XKf//73wwGQyAQ4KJPjFWDbdS8HD169K233lq4cCHdQiwOFov1999/nz59esGCBXRrwWCMAtuoGbl169a6desAQHsuZFJS0pUrV7y8vO7du4dsgtLSUjc3t//85z/Hjh3T8sKdnZ1ff/11E7/pGAzlYBs1Fz09PYsXLyZSyrU0NMrMzPzoo48YDMaRI0ds5t42ODiYyIpds2aNDWfFYjAE2EbNxZYtWyQSCfkspkfS2dlJJI1u2LDBxhoaLV++fOnSpVKplEgdpVsOBmNGsI2ahfPnz/P5fPLJoJqGvfPOOyUlJVOnTt21axeV8qjhq6++CgkJyczM3LZtG91aMBgzgm3U9NTX1xMNjbSnlB87duz48eMuLi5aHqdh1ZBJo3v37rWxrFgMpi/YRk0MmVL+0ksvffDBB5qGlZWVrV27FgC++OILG25oFBYWtn37doTQqlWrbCYrFoPpB7ZRE/P5558TKeVaniOkUCiWLl3a3t4eFRUVExNDsUKK2bx584svvkgUfarVarrlYDCmB9uoKRGLxR9//PFjw+5bt24dOA2NmExmamrq8OHDf//993379tEtB4MxPdhGTQbZ0Eh72J1wExaLpT36ZEt4e3sfPXqUwWAQ3x90y8FgTAy2UZOhS9idvLfduXPnjBkzqJRHLy+//PK6devI3Qy65WAwpgT
bqGkQCASPDbuTkZaB2dAoMTFxypQpZGwNg7EZsI2agLKyMqKhkfawO5H3M3To0BMnTtjZ2VEo0CJwdHQUCoVubm5EphfdcjAYk4Ft1Fh0DLsTWegMBuPw4cM+Pj5UKrQcgoOD+Xw+3N8AoVsOBmMasI0ay5YtWx4bdiejT+vXr1+0aBGV8iyNlStXLlmyhKyCpVsOBmMCsI0axW+//aZL2J3o0DFp0iSbLPrUl5SUlDFjxhA9WejWgsGYAGyjhkM+R0h72F0oFKamprq4uAiFQicnJyoVWiZEkai9vf3nn39+7tw5uuVgMMaCbdRAEEIrV6587HOEysrKVq9eDQAHDx4cN24chQItmmnTpiUkJJDvId1yMBijwDZqIMnJSCqd4ePjc/LkSU1hd4VCsWzZsvb29sjIyBUrVlCs0MKJj4+fN28euaKnWw4GYzjYRg1BLIYtW5hXrmw7cqRES9Hntm3b/vnnH39//5SUFCrlWQVMJvP48ePe3t7Ec+rploPBGA62Ub3p7IRly0Auh/XrYf58F03D/vjjj71797JYrFOnTrm7u1Op0Fogi0Tj4+NxkSjGesE2qjdr1kBxMUyaBFqi7vX18NFHHgEBIdu3bx9QRZ/6smDBgrVr1/r6uimVu9TqTrrlYDCGgG1UP4RCSE0FFxcQCkFT1B0hWL4c/vpr8tixmfHx8dQKtD4+++wzkWiKk5OosvI9urVgMIaAbVQPyspg9WoAgIMHQUvUfd8++PVX8PKCb75ha2o5iiFxdHQcM+ZLOzu3piZBc/MJuuVgMHqDP+S6olDAsmXQ3g6RkaAl6i6RwNatwGDA4cMwUGs+9cbRcYyv7z4AuHPn7e5uXCSKsTKwjerKtm3wzz/g7w9aou5SKSxZAnI5cLnw6qsUirN+vLz+18PjLbW6s6JiKUK4SBRjTWAb1Ynff4e9e4HFglOnQEvU/d13obgYJk6E3bspFGcr+Pt/6eg4WiYT37vHo1sLBqMH2EYfT0MDLF8OajXs2AFaou5CIQgEvdEnZ2cK9dkKdnaDR48+xWDY19Z+1t6eRrccC+XHH3/EyWGWBrbRx4AQrFwJNTUwaxZs3qxxWHl5b/Rp/3544gnK1NkaLi5PjxjxMYD69m2OQoGLRPvT0tISGxs7Y8aMqKioO3fu0C3HEgkKCsrKyvrhhx+oPCm20cewbx+cPQteXnDyJGhqtaxU9kaf3ngDVq2iVp/NMWLEVje3uQpF3e3bKwBwkeh/4eDgEBsb6+joePr06QkTJuzatau7u5tuUZaFk5PTU089RfVDy5HpaGhoAAA2m23COeklMxM5OCAGA/38s7ZhcXEIAPn5oaYmqpTZNHJ5VXa2l1gMdXXJdGuhFJlMRhiB9mGVlZUcDofBYACAr6+vQCBQq9XUKLQB2Gw2ADQ0NJhwTlOuRru6usg/bQAdw+6XL0NSUm/0ycODQn22i739yIAAAQCjqipeJsuiW47F4efnd+zYsUuXLk2ePLmqqiomJuaFF17IycmhW5d1YA6bwjf1GtEl7N7QAEuWgEoFCQnw7LMUirN1Bg9eOGzYuwj1lJdHqVQddMuxRGbPni2RSAQCwbBhw65cuTJ16tTo6Oi6ujq6dQ1ETG+jCKFVq1ZVVFSYfGYq0SXsjhCsWgU1NTBzJuCaT5Pj65vEZj/V03Orqmo93VqooKioaM+ePQCgVCqzs7N1OYTJZEZHRxcXF8fFxdnb26empo4bN27Pnj09PT1mFmutfP/998gcXRlNuEGgVConT55Mzjx+/Hgej1dYWGjCU1BDWRkaNAgBoG++0TZs3z4EgNzd0Z07VCkbYHR1FUgkbLEYmppO0K3FXOTn5/N4vNDQUOJTQ+x42tnZrV69uq6uTvd5iouLIyIiiElCQkLOnDljPs3WSElJyYQJE4j3Z9SoUUql0oSTm9JGEUJNTU2bNm2Kiorq+2CiiRMn8ni8nJwc057LTCgUaMYMBIDeeEPbsNxc5OSEGAz0009
UKRuQNDSkiMWQlTW4u7ucbi0mQ61W37hxIy4uLjg4mPyYeHp6rly58rvvvvvwww8dHR0BwNXVlcfjdXd36z5zWloaaRbz5s3Lz88336uwFhQKRUpKisf9wMUTTzxRXFxs2lOY2EZJlEpleno6l8v19vYmL5SAgAAul5uenm7JgcVDhxAACghALS0ax3R2onHjEABau5ZCZQOV8vI3xWK4eXOaWi2nW4tRqFQqsVjM4/H6uSeHwxGJRHL5g1dXUlISGRlJDBgzZoxQKNT9LHK5nM/nDx48GADs7e25XG6LlkvZ1rlw4cLEiRPJ9dyPP/5ojrOYy0ZJSD/t2yXe39+f8FOVSmVuAfqiVKIdO9Bff2kbs2IFAkATJyKZjCpZAxilsiU3N0AshqqqrXRrMQTyI+DTp1eNn59fbGysSCRSKBSaDkxLSyMtYO7cuXl5ebqftLGxkcvlEo+38fDw4PP5pr2NtXz6fhUFBwfr9VWkL2a3URKVSpWenh4XFxcUFEReTF5eXg9/FVs4QiECQE5OKDeXbikDhs7O65mZ9mIxs60tjW4tukK65/Dhw8kLftSoUXrdkMnl8pSUFC8vLwBgsVixsbF6JTxmZWXNnDmTvJk9f/68oa/Gmujs7OTxeMTGiIuLi74bIwZAnY32hdhWDwkJIS8vDw8Pwk97enpokaQj5eVo8GAEgL7+mm4pA4x797aLxZCT4y2X19KtRRtdXV0ikSg2Nnbo0KHk5R0YGGjMdlZTU5MxS0uRSDR69GhCSURERHm57ewy90OlUgkEAmIjkclkcjic2loqrhZ6bJSE8NPx48eTF9yQIUMiIyMFAkFnZye92h6GjD69/jrdUgYiquLiOWIxlJYuQMji9tZlMplIJOJwOH2Dq0SyilgsNskpCgsL58+fb9jSUiaTJSYmurm5AYCDgwOXy21vbzeJKsvh+vXrzzzzDPH+PP3009euXaPs1DTbKElZWRmfzw8PDycvQTabHRERIRAILO3N67EAAApdSURBVOf/e8sWXPRJJ3J5VXa2p1gMdXX76dbSi1QqJdzT1dW1n3sWFRWZ44wikSgwMJBcWpaVlel+bHV1dWxsLPFEBh8fn5SUFAsMThjA3bt3yerYkSNHUl8dayk2SlJSUrJ79+6wsDDiTQGAsLC1//M/6MQJ1NZGp7DiYmRnh1gs9OefdMoY4LS0/CgWg0TiJJNl0yhDoWhsaDickvKOg4MDcZUymcznnnsuOTn5jvmziHt6evh8ft+lZZs+n40bN248e7/kLiws7C/t4VTLRiqVJiYmEt9hbDY7Li6uo6ODehkWZ6Mkt2/f3rdvX3h4+KxZWQAIANnZofBwxOcjSrY7HsHPP6PkgdUrwxK5c2fNrVuvKRQ03BEoFA2NjYLS0ojMTHuxGM6eDbCzswsPD+fz+dXV1RSLMWZpqVarhUKhn58fADAYjMjISArc3+SIRKKAgAByYV5RUUGXEsu1UZKGBiQQoIgIZG+P+vnpvXt0i8NQjlqtMUPITPT0VNbV8YuKnheLmWIxiMWQmelQWrqgoeGbpiZTNgoygIyMjL5Lyz/1uVciItpOTk5kRLurq8t8Uk1IZmbm888/T7zqqVOnXr16lV49DGSOClPz0NwMZ8/C6dPw++8glwMAMJkwYwYsWgRvvAF90qh05fJlIPqIh4TAa689esyNG/DHHwAAa9aAm5vB2jHmRi2T5XZ1ZSuVLUwm28HB19l5soODrzEzyuV3Wlt/amk53dn5N9H5lMl0cnObN2TIoiFDXmOxhj52BmpACH333XcbN26srKxkMBhvvPFGUlKSv7+/joffvXt327ZtqampAODn57dz587o6Ghz6jWKxsbGTz755IsvvlCpVJ6enh999NF7771np6kTMGXQ6+KG0dKChELE4SAXl971KQAaPx7xeEivKi+iTygAcnBAmkr/P/usd8zduybRjjE9zc2n8vKCiHVinx/GzZtPNzd/p+9s3d1ldXX8oqJ
wsZhBTCWROJeWRjQ2CpRKWrfntSKVSsmlJZvN1ndpeenSpSeffJLwBKLtnvmkGgZRnUUkQhDVWa2trXSL6sUqbZREJkMiEeJwkJtbfz8tKHj84aSNAqCZM9Ejg3vYRi2c2tp9pHVmZ3sWFoYWFk7NzvYgflNff0jHeWSy/OpqXmFhKDmbRMIm3FOloiFqYRhER2fCDf38/AQCge7HEkmXw4YNg/tJl3r1RjEraWlpT9x/OM+8efMKdPl4U4h12yhJV1evnxK58cRPYCDiclF6usajCBtlMBCDgQDQIy85bKOWTHd3eWamA5GW39x8GqEHMRaZLKe6+sOensf8txHumZ8/to8Xe1RUcFpbRWq1eUtfzMcff/xB9lqbPXt2drYeWQ3Nzc1xcXFEBoK7u3tiYiK9FTFFRUULFy4kXsvYsWPPnTtHoxhN2IiNkiiVKD0dcbnI2/uBnwYE9Pppv/UmYaOOjui11xAA8vREDxfaYRu1ZGpqdhPe19ysV8W0SioVV1fz8vKC+7in1333tJq6ZC0YubTsa14hISFnz541n1RNWJqha8HWbJSE9NMRIx74qZ8f4nJRWhoi2kGQNnrzJmKxEABavbr/PNhGLZmKihWECfb0VD52sFqt7OhIr6zk5uT4kO6Zm+t3+3Zsa6uI+gQACmhpaSGdaMiQIfo6UVpaGllhSOWtNNHajqimJb4D6uvrqTm1YdisjZKoVOjKFcTlIl/fB3569ChCfWwUIbR2be8Nfr+MEWyjlszt27GEG3Z2/q19ZF1dMvGYPOInLy+oqmpzZ+d1C6wrNTnFxcWvvPKKYUvLh9vumTuwc/HiRQsPdj2M7dsoiVqNrl1DmzahMWMQ8d3W10abm9HQob3t7/p2m8I2asnU1iYRtlhUFK5Uavt419cfEoshLy+wspLb0ZE+ENyzH8Z0dO7bds/T09NMbfdKS0vJ1nb+/v56xcfoZQDZ6MP0tVGE0Fdf9Trmnj0PxmAbtWQUijqJxIVw0pycEdXVvO7uW48cqVQ2d3VZ3/NsTIuRS0uJREImvU+ZMuXKlSumEma9hQAE2EYf2KhKhaZPRwCIzUZkLzFsoxZOa+u5rCy3vumiRUUzGxuPqVS4pfajMXJp2a8E08i2e2q1WiAQEC1ZGQwGh8OpqakxZkJawDb6wEYRQhkZyM4OAaBFi3p/g23U8unpuVNVFZ+b69c3/T4726O5+T90S7NcJBIJ2dFZ36WlVCpNSEhgs9lEqv/FixcN03Djxo0ZM2YQGqZNm/b334/Z4LZYsI3+l40ihN55p9c3f/kFIWyj1oSqvf3i7dursrKG3DdTZmsrfkCmNozp6FxVVcXhcHx9fQ1oDUwcS3Rx8/Hxob61nWnBNtrfRltbe3Okxo1Dcjm2UetDpZLdvbuJcNL8/JC+OfmYh+nb0dnZ2TkuLk6vDr/6Pi/PyNNZJthG+9soQujbb3utMzkZ26i1Ulq6iHDSri7LKhy0TPouD83X+dhWH2fC1L2JycAhOhpmzwYA2LEDmptpFoMxjEGD5hB/kcur6FViFYwcOfLYsWPXr1+fMWNGdXV1TEzM9OnTr127Zqr5s7KyZs2a9eqrr1ZUVBBbsWfOnCEt1drBNvoIGAw4eBBYLGhpga+/plsNxiAUilriL0ymE71KrIhp06b99ddfROg8IyMjPDw8Ojq6trbWmDmbmprWrVs3bdq0q1evEokBGRkZZHTLNsA2+mgmToR16wAAr0YtGpWqHUD98O/V6u7W1p8BgMFgOTtPolyXFcNgMKKjo2/dukU8ozg1NTU4ODghIaG7u1vfqRQKxf79+4OCgg4cOMBkMrlcbllZ2bp16+hvD2py6N5VoBNNe6ME7e1o5MgH9aN4b9QCqax8Ly8vqLr6I6k0S6XqQgip1d3t7X/cvDmD2BgtL19Gt0Yrpm9ZUVBQkFCoR/8XY4qmrA5r6n5vcuLjYc8ecHQETV+0QiEsXtz797t3wdeoZuoYk6POzR1J3rw
DgJ3dkL7rUzb7qTFjLrJYHjTJsxEuXbr0/vvv5+bmAsCcOXOSk5PJmvdHUlJSsmHDhnPnzgFASEjI3r17IyIiKNJKEwP6pn7kSAgNhalTNQ6IioLoaAgNhdBQuP8ISIzlwBw7Nt3HZ7uTU28XIpWqlfBQe3sfH5+EsWP/xh5qPHPmzMnMzCRaLl26dGnKlCnR0dENDQ0Pj2xtbY2Pj580adK5c+eIhlJ5eXk276EAMKBXoxibQals7O4uUas7GAwHBwd/R0f9n8yFeRwtLS0JCQmHDh1SKpXu7u4rVqz45JNPiFomhUKRlJTE5/Pr6+uZTObSpUuTkpKIbqcDAWyjGAxGD4qLizds2PDLL78AgKOj48cff+zs7Lx161YiBjV79mw+n0/23h8gYBvFYDB6c+DAgY0bNyoUCvI3LBYrMTHxgw8+oFEVXWAbxWAwhtDZ2blkyRJiWTpr1qzvv/9+yJAhdIuiB2yjGAzGcGQyGQAQO6QDFmyjGAwGYxQDOuEJg8FgjAfbKAaDwRgFtlEMBoMxCmyjGAwGYxT/H4QJ05uJ8slUAAABC3pUWHRyZGtpdFBLTCByZGtpdCAyMDIyLjA5LjUAAHice79v7T0GIOBlgAAmIOYDYgEgbmBkY0gAiTGxM2gAaWZmNgjNIsCgABKHSbM5QIUhNJzP5pAB1saIlwFRyw6hmbkZGBkYmTSYmJiBggxMrAqsbAls7AnsHBlMHJwJnFwZTFzcCdw8GUw8vAksQHezAlnsCSLMQM2sQI3MLKxsrLw87GwcnFzcPOzim4DijFC/MfBJL+k/wMDgsB/s49/OB86FzAazl/RyH7hg+3spkrg9iA1VD2b3ment37/h/RIQO+KPrx1M7/3IC/YwMxMCrB0YGA6A2S8t2x1g4lC2PZIaeyS99khmgtliAE2sQPZFJXOHAAABTnpUWHRNT0wgcmRraXQgMjAyMi4wOS41AAB4nIVT226DMAx95yv8A0S249we21JN01SQ1m7/sPf+v2YLQYK2ZQRbSTg5do7NAPa8T29fT9gfnoYBADtvKQU+PSION7AJnK8vrzNcHqfztnNZPubHHUiAop7RccSeHstt2yG4wChOck7Zw4guBeXWIw5xnWxIhhlG7zAylQwjOSYfOPyC9MbJjjIGRuPcAD+QAveGs0MZDml2sowGRBf/j500NjpGotC/TlZKcpFTkr5CRYHsCseUyRjDX6FJawTimEOm0qUkapE9ILfBe7F9e50eo2jBd4E6+lzn6dBSa5Odl3mqTWaDayfZ8LVdSE1qT5BaqJW3r7HWV9RSLaKo5VoqUSu1IKJGre5ijqjRV8wRN0KKOfKNYmKOpJGGLLF9HQwhNUuK65HcatQqYuvtl9X58A27vsARkagG1QAAAJl6VFh0U01JTEVTIHJka2l0IDIwMjIuMDkuNQAAeJxVjjsOwzAMQ6/SMQEcQaL8URB08t6lR9DepWsOX6eJjXbTEymKLu6Ot09V749anzq/HC63fYoEJENgKilsnZYTQStysbAINRLKKCV2kQkskr4qRJOGrSnZOLW0NoNkzEqcgV9npGh2RJ9hgy+8DoZ/PBub3ua/6VF03j8L1zFkzCXeHwAAAABJRU5ErkJggg==", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Murcko histogram: {'0_1': 1, '1_0': 1, '1_1': 1}\n" ] } ], "source": [ "hist = murcko_hist(Chem.MolFromSmiles('O=C(O)[C@@H]1/N=C(\\SC1)c2sc3cc(O)ccc3n2'), show_mol_scaffold=True)\n", "print('Murcko histogram:', hist)" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "The resulting Murcko histogram is a dictionary describing the rings of the molecule's Murcko scaffold. Each key consists of two numbers separated by an underscore: the first is the number of neighboring rings, and the second is the number of neighboring scaffold linkers. Each value is the number of rings of that type, so the values sum to the total number of rings in the scaffold (three in the example above).\n", "\n", "Let's now create a training-validation split of the [MassSpecGym](https://huggingface.co/datasets/roman-bushuiev/MassSpecGym) dataset based on the Murcko histograms." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the MassSpecGym dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "msdata = MSData.from_mgf('../data/MassSpecGym.mgf', prec_mz_col='PRECURSOR_MZ', mol_col='SMILES', adduct_col='ADDUCT', in_mem=False, mode='a')\n", "df = msdata.to_pandas()\n", "df_us = df.drop_duplicates(subset=[SMILES]).copy()  # Keep one row per unique SMILES" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['COLLISION_ENERGY',\n", " 'FOLD',\n", " 'FORMULA',\n", " 'IDENTIFIER',\n", " 'INCHIKEY',\n", " 'INSTRUMENT_TYPE',\n", " 'PARENT_MASS',\n", " 'PRECURSOR_FORMULA',\n", " 'SIMULATION_CHALLENGE',\n", " 'adduct',\n", " 'precursor_mz',\n", " 'smiles',\n", " 'spectrum']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "msdata.columns()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 
Compute Murcko histograms" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 31602/31602 [01:46<00:00, 296.20it/s] \n" ] } ], "source": [ "# Compute Murcko histograms\n", "df_us['MurckoHist'] = df_us[SMILES].progress_apply(\n", " lambda x: murcko_hist(Chem.MolFromSmiles(x))\n", ")\n", "\n", "# Convert dictionaries to strings for easier handling\n", "df_us['MurckoHistStr'] = df_us['MurckoHist'].astype(str)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Num. unique SMILES: 31602 Num. unique Murcko histograms: 515\n", "Top 20 most common Murcko histograms:\n" ] }, { "data": { "text/plain": [ "MurckoHistStr\n", "{'0_1': 1, '1_0': 1, '1_1': 1} 3964\n", "{'0_1': 2} 3457\n", "{'0_0': 1} 3054\n", "{} 3027\n", "{'0_1': 2, '0_2': 1} 2308\n", "{'1_0': 2} 1753\n", "{'0_1': 1, '0_2': 1, '1_0': 1, '1_1': 1} 1202\n", "{'1_0': 2, '2_0': 2} 1199\n", "{'1_0': 2, '2_0': 1} 1187\n", "{'0_1': 2, '1_1': 2} 845\n", "{'0_1': 2, '1_0': 1, '1_2': 1} 672\n", "{'0_1': 2, '0_2': 2} 641\n", "{'0_1': 1, '1_0': 1, '1_1': 1, '2_0': 1} 626\n", "{'1_0': 2, '1_1': 2} 563\n", "{'0_1': 3} 450\n", "{'0_1': 2, '0_2': 1, '1_1': 2} 332\n", "{'1_0': 2, '2_0': 3} 325\n", "{'0_1': 1, '1_0': 1, '1_1': 1, '2_0': 2} 294\n", "{'0_1': 1, '1_0': 2, '2_1': 1} 276\n", "{'0_1': 2, '1_0': 1, '1_1': 1} 251\n", "Name: count, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Num. unique SMILES:', df[SMILES].nunique(), 'Num. unique Murcko histograms:', df_us['MurckoHistStr'].nunique())\n", "print('Top 20 most common Murcko histograms:')\n", "df_us['MurckoHistStr'].value_counts()[:20]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr><th></th><th>MurckoHistStr</th><th>count</th><th>smiles_list</th><th>MurckoHist</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>0</th><td>{'0_1': 1, '1_0': 1, '1_1': 1}</td><td>3964</td><td>[C1=CC=C(C=C1)C2=C(C(=O)NC3=CC=CC=C32)O, CN1C(...</td><td>{'0_1': 1, '1_0': 1, '1_1': 1}</td></tr>\n", "<tr><th>1</th><td>{'0_1': 2}</td><td>3457</td><td>[CC(=O)N[C@@H](CC1=CC=CC=C1)C2=CC(=CC(=O)O2)OC...</td><td>{'0_1': 2}</td></tr>\n", "<tr><th>2</th><td>{'0_0': 1}</td><td>3054</td><td>[C[C@@H]1C[C@H]2[C@H](O2)/C=C\\C(=O)CC(=O)O1, C...</td><td>{'0_0': 1}</td></tr>\n", "<tr><th>3</th><td>{}</td><td>3027</td><td>[CCCC[C@@H](C)[C@H]([C@H](C[C@@H](C)C[C@@H](CC...</td><td>{}</td></tr>\n", "<tr><th>4</th><td>{'0_1': 2, '0_2': 1}</td><td>2308</td><td>[CC(C)C1C(=O)NC(C(=O)NC(C(=O)NC(C(=O)N1)CC2=CC...</td><td>{'0_1': 2, '0_2': 1}</td></tr>\n", "<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n", "<tr><th>510</th><td>{'0_1': 4, '0_2': 1, '0_3': 2, '1_0': 1, '1_1'...</td><td>1</td><td>[COC1=C(C=C(C=C1)C(C2=C(N(C(=S)N(C2=O)C3=CC=CC...</td><td>{'0_1': 4, '0_2': 1, '0_3': 2, '1_0': 1, '1_1'...</td></tr>\n", "<tr><th>511</th><td>{'0_1': 1, '1_0': 1, '2_1': 1, '3_0': 3, '4_0'...</td><td>1</td><td>[COC(=O)C12C3CC4C15C(CCN4)(C6=CC=CC=C6N5)OC2OC...</td><td>{'0_1': 1, '1_0': 1, '2_1': 1, '3_0': 3, '4_0'...</td></tr>\n", "<tr><th>512</th><td>{'0_1': 4, '0_2': 2, '0_3': 1}</td><td>1</td><td>[C[C@H](C1=CC=CC=C1)NC(=O)[C@@H](CC(=O)N2CCC(C...</td><td>{'0_1': 4, '0_2': 2, '0_3': 1}</td></tr>\n", "<tr><th>513</th><td>{'0_1': 4, '1_0': 1, '1_2': 1, '2_1': 1}</td><td>1</td><td>[CC1=C2C(C(=O)C3(C(CC4C(C3C(C(C2(C)C)(CC1OC(=O...</td><td>{'0_1': 4, '1_0': 1, '1_2': 1, '2_1': 1}</td></tr>\n", "<tr><th>514</th><td>{'0_2': 1, '1_0': 1, '1_1': 2, '2_0': 2, '3_0'...</td><td>1</td><td>[CN1CCC2=C3[C@@H]1CC4=C(C(=C(C=C4C3=C(C(=C2OC)...</td><td>{'0_2': 1, '1_0': 1, '1_1': 2, '2_0': 2, '3_0'...</td></tr>\n", "</tbody>\n", "</table>\n", "<p>515 rows × 4 columns</p>\n", "</div>
" ], "text/plain": [ " MurckoHistStr count \\\n", "0 {'0_1': 1, '1_0': 1, '1_1': 1} 3964 \n", "1 {'0_1': 2} 3457 \n", "2 {'0_0': 1} 3054 \n", "3 {} 3027 \n", "4 {'0_1': 2, '0_2': 1} 2308 \n", ".. ... ... \n", "510 {'0_1': 4, '0_2': 1, '0_3': 2, '1_0': 1, '1_1'... 1 \n", "511 {'0_1': 1, '1_0': 1, '2_1': 1, '3_0': 3, '4_0'... 1 \n", "512 {'0_1': 4, '0_2': 2, '0_3': 1} 1 \n", "513 {'0_1': 4, '1_0': 1, '1_2': 1, '2_1': 1} 1 \n", "514 {'0_2': 1, '1_0': 1, '1_1': 2, '2_0': 2, '3_0'... 1 \n", "\n", " smiles_list \\\n", "0 [C1=CC=C(C=C1)C2=C(C(=O)NC3=CC=CC=C32)O, CN1C(... \n", "1 [CC(=O)N[C@@H](CC1=CC=CC=C1)C2=CC(=CC(=O)O2)OC... \n", "2 [C[C@@H]1C[C@H]2[C@H](O2)/C=C\\C(=O)CC(=O)O1, C... \n", "3 [CCCC[C@@H](C)[C@H]([C@H](C[C@@H](C)C[C@@H](CC... \n", "4 [CC(C)C1C(=O)NC(C(=O)NC(C(=O)NC(C(=O)N1)CC2=CC... \n", ".. ... \n", "510 [COC1=C(C=C(C=C1)C(C2=C(N(C(=S)N(C2=O)C3=CC=CC... \n", "511 [COC(=O)C12C3CC4C15C(CCN4)(C6=CC=CC=C6N5)OC2OC... \n", "512 [C[C@H](C1=CC=CC=C1)NC(=O)[C@@H](CC(=O)N2CCC(C... \n", "513 [CC1=C2C(C(=O)C3(C(CC4C(C3C(C(C2(C)C)(CC1OC(=O... \n", "514 [CN1CCC2=C3[C@@H]1CC4=C(C(=C(C=C4C3=C(C(=C2OC)... \n", "\n", " MurckoHist \n", "0 {'0_1': 1, '1_0': 1, '1_1': 1} \n", "1 {'0_1': 2} \n", "2 {'0_0': 1} \n", "3 {} \n", "4 {'0_1': 2, '0_2': 1} \n", ".. ... \n", "510 {'0_1': 4, '0_2': 1, '0_3': 2, '1_0': 1, '1_1'... \n", "511 {'0_1': 1, '1_0': 1, '2_1': 1, '3_0': 3, '4_0'... \n", "512 {'0_1': 4, '0_2': 2, '0_3': 1} \n", "513 {'0_1': 4, '1_0': 1, '1_2': 1, '2_1': 1} \n", "514 {'0_2': 1, '1_0': 1, '1_1': 2, '2_0': 2, '3_0'... 
\n", "\n", "[515 rows x 4 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Group by MurckoHistStr and aggregate\n", "df_gb = df_us.groupby('MurckoHistStr').agg(\n", "    count=(SMILES, 'count'),\n", "    smiles_list=(SMILES, list)\n", ").reset_index()\n", "\n", "# Convert MurckoHistStr back to MurckoHist dictionaries\n", "df_gb['MurckoHist'] = df_gb['MurckoHistStr'].apply(eval)\n", "\n", "# Sort by 'count' in descending order and reset index\n", "df_gb = df_gb.sort_values('count', ascending=False).reset_index(drop=True)\n", "\n", "df_gb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Split the dataset into training and validation sets based on Murcko histograms" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Distribution of spectra:\n" ] }, { "data": { "text/plain": [ "fold\n", "train 0.804482\n", "val 0.195518\n", "Name: proportion, dtype: float64" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Distribution of SMILES:\n" ] }, { "data": { "text/plain": [ "fold\n", "train 0.805487\n", "val 0.194513\n", "Name: proportion, dtype: float64" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "median_i = len(df_gb) // 2\n", "cum_val_mols = 0\n", "# Target fraction of molecules in the validation set. Whole histogram groups are\n", "# assigned at once, so the final fraction can overshoot this value.\n", "val_mols_frac = 0.15\n", "val_idx, train_idx = [], []\n", "\n", "# Iterate from median to start, assigning molecules to train or val sets\n", "for i in range(median_i, -1, -1):\n", "    current_hist = df_gb.iloc[i]['MurckoHist']\n", "    is_val_subhist = any(\n", "        are_sub_hists(current_hist, df_gb.iloc[j]['MurckoHist'], k=3, d=4)\n", "        for j in val_idx\n", "    )\n", "\n", "    if is_val_subhist:\n", "        train_idx.append(i)\n", "    else:\n", "        if cum_val_mols / len(df_us) <= val_mols_frac:\n", "            cum_val_mols += df_gb.iloc[i]['count']\n", "            val_idx.append(i)\n", "        else:\n", "            train_idx.append(i)\n", "\n", "# Add remaining indices to train set\n", "train_idx.extend(range(median_i + 1, len(df_gb)))\n", "assert len(train_idx) + len(val_idx) == len(df_gb)\n", "\n", "# Map SMILES to their assigned fold\n", "smiles_to_fold = {}\n", "for i, row in df_gb.iterrows():\n", "    fold = 'val' if i in val_idx else 'train'\n", "    for smiles in row['smiles_list']:\n", "        smiles_to_fold[smiles] = fold\n", "df[FOLD] = df[SMILES].map(smiles_to_fold)\n", "\n", "# Display fold distributions\n", "print('Distribution of spectra:')\n", "display(df[FOLD].value_counts(normalize=True))\n", "print('Distribution of SMILES:')\n", "display(df.drop_duplicates(subset=[SMILES])[FOLD].value_counts(normalize=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Evaluate data leakage" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 25455/25455 [00:03<00:00, 6929.22it/s]\n", "100%|██████████| 6147/6147 [00:00<00:00, 7772.50it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "INFO: Pandarallel will run on 4 workers.\n", "INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2e2dc804d78543d092b067d3866d5713", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=1537), Label(value='0 / 1537'))), …" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAASwAAAEiCAYAAABDd+8FAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAAA3fklEQVR4nO3de1xUdfrA8c8gIDcVRQV1rU1bLSvXC2nkJcEkFa+kaZm3NC+pmK2WuqSlspZiZpp521VxMREvgealEjXvm5m7amKL/UrzAipeYIbbMOf3B8sEMsCccYaZgef9evHSObfv8z0zPJxz5pzvo1EURUEIIZyAi70DEEIIc0nCEkI4DUlYQginIQlLCOE0JGEJIZyGJCwhhNOQhCWEcBqSsIQQTsPV3gFUBL1ez927d6levTouLpKjhXA0BoOBnJwcatWqhatr6WmpSiSsu3fv8ssvv9g7DCFEOf74xz/i5+dX6vwqkbCqV68OFOwMT09Ps9fLz8/np59+olmzZlSrVs1W4VUY6Y9jq8r9ycrK4pdffjH+rpamSiSswtNAT09PvLy8zF4vPz8fAC8vr0rzAQLpj6OS/lDuJRu5oCOEcBqSsIQQTkMSlhDCaUjCEkI4DUlYQginIQlLCOE07HJbw7p164iPj8fFxYWHH36YuXPnotfrefvtt0lNTcXDw4Po6GiaNGkCwKZNm1i/fj16vZ7evXsTERFhj7CFEHZW4Qnr5MmTbNmyhc2bN+Pt7c2iRYtYtGgRt2/fJjg4mGHDhnHkyBGmTJlCQkIC58+fZ82aNWzbtg0PDw9GjRrF119/Tbdu3So6dCGEnVX4KWHt2rWZNWsW3t7eALRo0YJff/2VQ4cOER4eDkCHDh3IyMggJSWFpKQkunXrRs2aNXF3dyc8PJydO3dWdNgOJSZOT0yc3t5hCFHhKvwIq2nTpjRt2hSAzMxMli9fTq9evTh//jw+Pj7G5fz9/bl+/TqpqanGU8PC6ampqRa1nZ+fb7z71tzli/7rKDIzCwodqY3LUftjKemPY1PTH3P7bLdHc1JTUxk/fjytWrWiX79+rF69usQyLi4umKpCptFoLGrzp59+smi9M2fOWLSeLbi5uaHVPgTAuXOXyMvLU70NR+qPNUh/HJs1+2OXhJWcnMzYsWN56aWXmDBhAnq9ntzcXHQ6nfFZv7S0NPz9/QkICODGjRvGddPS0ggICLCo3WbNmql+lvDMmTM89dRTDvVs1+HvDAA88cQTqtZz1P5YSvrj2NT0R6fTmXVAUeEJKy0tjZEjRxIZGUlYWFhBEK6udO7cmfj4eIYPH86xY8dwdXWlSZMmhISEMHnyZF5//XW8vb3Zvn07L774okVtV6tWzaIPgqXr2YpGU3DUaWlMjtafByX9cWzm9Mfc/lZ4wlq1ahU6nY5Vq1axatUqAB555BFmzZrFzJkziY+Px93dncWLF6PRaHj88ccZNWoUQ4YMQa/XExISQu/evSs6bCGEA6jwhBUZGUlkZKTJeWvWrDE5fdCgQQwaNMiWYQkhnIDc6S6EcBqSsIQQTkMSlhDCaVSJIZIru5g4PVoteHvDsEHylorKSz7dlYBWC5lae0chhO3JKaEQwmlIwhJCOA1JWEIIpyEJSwjhNCRhCSGchiQsIYTTkNsanEhMnB5PT3tHIYT9SMJyIlotUHI8QyGqDDklrKRk3HdRGak+wsrJyaF69eoYDAaSkpLw9fUlMDDQFrGJ/7HkVFArd76LSkjVEVZiYiIdO3YEYOHChbz33nu8+eabpY5jJaxDq4Usnb2jEML+VCWs1atX89lnn5Gbm0tcXBzLly8nPj6emJgYW8UnhBBGqk4JU1NTCQwM5MiRI3h5edGyZUsURUEr5x9CiAqgKmE1btyYTZs2kZSURJcuXcjNzWXt2rX86U9/Ut1wbm4uY8aMYfjw4QQFBRUbAtlgMPDTTz+xbt06goKCCA8PJzc31zhQ/dixY+nZs6fqNoUQzk1Vwpo
7dy5RUVF4e3szZcoUTp06xc6dO4mOjlbV6Llz55g1axYpKSkMHz4cDw8PEhISjPMXL17Mk08+SVBQEFlZWVy7do0jR47g4iJfagpRlalKWC1atCA2Ntb42s/Pjx07dqhudOPGjURERJi8WJ+cnMyOHTtITEwE4OzZs3h4ePDaa69x69YtXnjhBd544w1JXkJUQaoSVn5+PmvXrmXbtm3cunWLbdu2MXfuXP72t79Rp04ds7cTFRUFmK6S8/HHHzN+/Hhj2frMzEyeeeYZZs2aRX5+PmPGjKF27doMGTJETejG+J2xVL2iKCgKxX68vSBmU8HtDkXnF8ZaWDG7aOyO0h9rkf44NruXqv/oo484c+YM7777LhEREdSuXRsvLy9mz57N0qVL1WzKpEuXLnHmzJli2woODiY4ONj4esSIEWzatMmihOWMpeoLS9Nr0KDLAg0Y/9VmgbdnwevMTAMaNJw7dwmgzHL2UgrdsUl/SqcqYe3YsYMdO3ZQq1YtNBoNXl5eREVF0aVLF6sEs3fvXnr06IGbm5tx2oEDB/Dx8THenKooCq6ulj1R5Kyl6g9/Z8DLq+CpnNL+NRgKxnQvLF9vqpy9o/THWqQ/js3upepdXFxKHLplZ2fj4eGhZjOl+v777wkPDy82LS0tjRUrVhjv9YqNjbW48rOzlqrXaBQ0Gsz6KYyzrHL29u6PtUl/HJvdStX36dOHN954gwkTJmAwGDh9+jRLly4lLCxMzWZKdfnyZRo0aFBs2oABA/j555/p168fer2e0NBQBgwYYJX2qpKYOD0o0KqFW/kLC+GgVCWswm/2oqKi0Ov1TJ06ld69ezN+/HiLGt+wYUOx119++WWJZVxcXJg+fTrTp0+3qA1RQKv9/UK8EM5KVcJydXVl3LhxjBs3zlbxCCFEqcxKWNOmTUOj0ZS5zIIFC6wSkBBClMashPXwww/bOg4hhCiXWQlr4sSJto5DmCBDIgtRnKprWM8991ypp4YHDhywRjyiCBkSWYjiVCWshQsXFnt9+/ZtPv/8c7p162bVoIRlvL3kqExUbqoSVrt27UpMa9++PS+99JJFj8oI65OjMlGZPfCQB3q9nrt371ojFiGEKJOqI6z7b2/Iz8/nxIkTxR5OFo5DThFFZaMqYd1/e4NGoyE4OJjQ0FCrBiWsR04RRWWiKmFNnDiRU6dO8eijj1KzZk2OHz+Om5sb7u7utopPCCGMVF3DiomJISIighs3bgBw8+ZNpkyZQnx8vE2CE0KIolQlrL///e98/vnnNG3aFIBevXoRGxvLZ599ZpPghBCiKFUJS6vVUq9evWLT6tWrR1ZWllWDEkIIU1QlrE6dOjFjxgwuXbpEdnY2ly5dIjIy0lgNWgghbElVwpo9ezb5+fn07NmTVq1aERYWhqIoREZG2io+IYQwUvUtoa+vL5988gm5ubncvXsXPz8/KbclhKgwqrPN7t27mThxImPGjOHmzZvMnTuXnJwcW8QmhBDFqP6WcPny5XTr1o3Lly9TvXp1fvrpJ95//31bxSeEEEaqElZsbCyrV69m4MCBaDQaatWqxbJly0hKSlLdcG5uLiNGjGD//v0A7N+/n/bt29O3b1/69u3L0KFDjcsuXbqU7t27061bNzZu3Ki6LWcTE6cnPlFv7zCEcDiqrmHl5uZSs2ZNAOMzhW5ubqrrBJ47d45Zs2aRkpLC8OHDATh9+jTjxo1j5MiRxZb95ptvOHz4MImJieTk5DBo0CBatWpFixYtVLXpTORxGiFMU5VpgoODmTp1KlOnTgUK7nSPjo6mc+fOqhrduHGjsQJPoR9++IFq1aqRmJiIr68vM2bMoFmzZiQlJdG7d2/c3d1xd3enZ8+e7Ny506KE5Syl6k2VprfKDyXL1zuzqlza3RnYvVT9jBkzmD9/Pv379ycnJ4fg4GB69uzJzJkz1WyGqKgogGIJy9fXlwEDBtC5c2e++uorxo0bx65du0hNTSUkJMS4nL+/P8e
PH1fVXiFnKFVfVmn6sv7NzDRQw9ulzGUUg1Lh/akI0h/HZrdS9V5eXsydO5e5c+eSnp6Or6+v1W5r+OSTT4z/Dw0NZenSpfz4448ma+mVV8GnNM5Sqt6c0vSmStWXt0zhEVZVLIXuDKpyf6xaqn7JkiXlLjN58mRzNmVSRkYGGzduZOzYscWmu7q6EhAQQFpamnFaamoqAQEBFrXjLKXq1ZSmV/NTeF2sKpZCdyZVsT/m9tesw6Pr16+X+/MgvL292bhxo/Ebw8OHD5OVlcXjjz9O165djRfcMzMz2bNnD126dHmg9oQQzsmsI6z58+cXe60oCunp6dSsWRM3N7cHDsLFxYVly5Yxd+5coqOj8fLyYtmyZbi5udG1a1fOnz9PeHg4er2el156icDAwAduUwjhfFRdw9LpdMybN4+dO3eSl5eHq6sr3bp1Y86cOfj4+KhufMOGDcb/P/XUU2zevNnkchMnTpTaiEIIdTeOzp8/n/T0dLZv387333/P1q1b0el0/O1vf7NVfEIIYaTqCGv//v3s2bPHeDTVrFkzFixYIHUJhRAVQtURlkajKfGgc05OjozpLoSoEKoSVu/evRk/fjyHDh3i4sWLfPvtt0yYMIFevXrZKj4hhDBSdUo4ZcoUPv74YyIjI0lPTycgIIA+ffqUuH9KCCFsQVXCcnNzY9q0aUybNs1W8QghRKlUJayUlBRWrlzJ9evXMRgMxebFxsZaNTBhWzFxBcPXDBukbqQNIexJ9Snh448/Tr9+/WRoZCen1do7AiHUU5Wwrl69yvbt21WPfyWEENag6jApLCyML774wkahCCFE2VQdKoWEhDBhwgTmz59PjRo1is07cOCANeMSdhITp0erBW9vub4lHI+qT+Ts2bMZN24cgYGBlWr4C/E7rRYy5fqWcFCqElZ2djYTJ060eAA9IYR4EKquYQ0fPpyPP/6Y9PR08vPzMRgMxh8hhLA1VUdYcXFxpKamsmrVKuM0RVHQaDScP3/e6sFVNTFxejw97R2FEI5LVcKqCjUB7UnKewlRNlUJq1GjRraKQwghyiW3qwshnIbdEtb9perPnTvH4MGD6du3L/379+fIkSPGZYOCgowl7Pv27cvJkyftFbYQwo5UnRJ+9913PP300yWm79u3j65du5q9HVOl6idPnsy8efN45pln+Omnnxg6dCgHDx7k+vXr1K1bl4SEBDWhCiEqoXITlk6n49atWwCMGTOGxMTEYvMzMzOZOnUqP/zwg9mN3l+qPjc3l7Fjx/LMM88A0LRpU/R6PXfv3uX06dMYDAZeffVVMjIyGDRoEK+88orZbRXl6KXqbVai3kSpekXRmOxb0RgcvWR6VS7t7gzsUqo+NzeXAQMGcPfuXYAS47e7ubnRv39/sxordH+pend3dwYOHGic/+mnn9K0aVP8/f3Jy8ujc+fOvPXWW9y+fZthw4bRsGFDi2oTOnKpektL1FtSqj45ORmt9iEAzp27RF5eXrEYMjMVNGiKzXNkUtrdsVVoqXpfX19OnDgBwODBg9m0aZPVGr+foih89NFH7N69m5iYGIBiiax+/foMGjSIffv2WZSwHL1UvSUl6i0pVf/YY49x+LuCI6wnnniiRAwKBc8S3j/P0VTl0u7OwG6l6gtt2rQJnU7HwYMHSU1NZdCgQaSkpPDUU0+p2YxJubm5/OUvf+HGjRvExcXh5+cHQGJiIo899hjNmjUDCpKapcPbOHqpeluVqDdVql6jUYz/Ly0GZ/mlqYql3Z1JhZeqL3Tu3DlCQ0PZsGEDS5Ys4ebNmwwbNswqQ87MmDEDRVGIiYkxJiuAixcv8umnn2IwGLh37x5bt24lNDT0gdsTQjgfVQlrzpw5zJ49m40bN+Lq6krjxo1Zs2YNy5cvf6Agzp07x86dO0lJSWHgwIHG2xd+/vlnxo8fj6enJ71792bgwIEMHDiQoKCgB2pPCOGcVJ1b/fzzz8bbFwpHbGjTpg3p6ekWNV60VP2
FCxdKXe6DDz6waPtCiMpF1RHWo48+ys6dO4tNS0pK4tFHH7VqUEIIYYqqI6zIyEhef/11YmNj0el0jBo1ivPnz7NixQpbxVdpOcLInt5eUj1HOBdVn9InnniCr776ioMHD3Lt2jXq1q3LokWL8PX1tVF4lZejjOwp1XOEM1H9LOGdO3cICwtjyJAh3L17l4MHD6IoMiaKEML2VB1h/eMf/2DlypWcOHGCOXPmcObMGTQaDcnJybzzzju2ilHYWOGpoQweKBydqiOszZs3s3HjRnQ6HTt37mTJkiVs2LBBSn9VAlotZOnMWzYmTm+89iVERVJ1hJWenk7Tpk355ptvqF+/Pk2bNiUvLw+9Xj68VYlc9xL2oiphNW/enI8++ogTJ07w/PPPc+/ePT766CNatmxpq/iEEMJI1Snhhx9+yPXr12nRogVTpkzh559/5vr168bRF4QQwpZUHWE1bNiQBQsWGF+3atVK7sESpXKEe81E5SKfIjurzN/QOcq9ZqLykITlABy9vJfcDS8chXwCRbnkW0HhKFQnrNTUVC5fvlzi7nZTxSmEEMKaVCWslStXsmTJEmrXrl1s1E+NRsOBAwesHZsQQhSjKmHFxsaycuVKOnXqZKt4hBCiVKruw8rLy5PRPoUQdqMqYQ0ZMoTo6Ghu3bqFwWAo9iMql8LbLeIT5bEr4ThUV825efMm69evN05TFAWNRsP58+dVNZybm8uYMWMYPnw4wcHB3Lhxg7fffpvU1FQ8PDyIjo6mSZMmxnbXr1+PXq+nd+/eREREqGpLWMbRb7cQVY+qhBUXF2eVRk2Vqn/vvfcIDg5m2LBhHDlyhClTppCQkMD58+dZs2YN27Ztw8PDg1GjRvH111+XKOgqhKj8VJ0SNmrUiBo1anDq1Cl27drFiRMn8PDwoFGjRqoaLSxVX/jQdF5eHocOHSI8PByADh06kJGRQUpKCklJSXTr1o2aNWvi7u5OeHh4iXHlhfksrekohCNQ9ek9e/Yso0ePpnHjxjRo0IArV67wt7/9jTVr1tCqVSuzt3N/qfo7d+7g7u6Oj4+PcRl/f3+uX79Oamqq8dSwcHpqaqqasI3y8/PJz89XtXzRf61JURQUhQr98fJSOPSvxpz7r0H1ukX3Q+E9eOXtl6J9tMU+tOX7Yw9VuT/m9llVwoqKiuLtt982HgkBbN26laioKOLj49VsqpjSLtq7uLiYHH65sMSYWuaUwjblzJkzFq1XGjc3N7Tah8jMVKjh7YIuCzTwQP9mZhrM2pY2Cwz5OtXbUgwK585dAkCrfQiAc+cukZeXV24fNWjKXPZBWfv9sTfpT+lUJayUlBT69etXbFq/fv0eeHgZPz8/cnNz0el0eHl5AZCWloa/vz8BAQHcuHHDuGxaWhoBAQEWtdOsWTPj9s2Rn5/PmTNneOqpp6xeOvzwdwYUwMsLq/xrMJS/jKeXgjYrC08vTxQ0qrYFBUVICmMv+rq8Pnp7l7+sJWz5/thDVe6PTqcz64BC9fAyx44do0OHDsZpx44d4w9/+IOazZQMwtWVzp07Ex8fz/Dhwzl27Biurq40adKEkJAQJk+ezOuvv463tzfbt2/nxRdftKidatWqWfRBsHS9smg0ChoNFftT2DYaNBqNqnUL90Nh7EVfm9NHW/4C2uL9saeq2B9z+6sqYb311ltMmDCB4OBgGjZsyJUrVzh48CAff/yxms2YNGvWLGbOnEl8fDzu7u4sXrwYjUbD448/zqhRoxgyZAh6vZ6QkBB69+79wO0JdSrzMDjCeahKWM899xybNm1iz549pKen06xZMyZPnswjjzxiUeNFS9XXr1/feBH+foMGDWLQoEEWtSGsp7T7smT4GVFRzPqE/fbbb/zhD3/g8uXLeHt7lzglu3z5Mo0bN7ZJgMLxyfAzoqKYlbD69OnDqVOn6NatW4lv6Cy9010IIdQyK2GdOnUKgOTkZJsGI4QQZVF1p3uPHj1MTu/YsaNVghFVlxR
nFeYo9wjrypUrvPPOOyiKwqVLlxgyZEix+VqtFm9vb5sFKKoGuQ4mzFFuwmrUqBGvvfYad+7c4cyZMwwYMKDYfHd3dxkeWQhRIcy6hhUSEgLAk08+SbNmzUrMryzPPgnrsOb9WnLLhChK1adAo9Hw5ptvcvPmTeMzfnq9nkuXLnHs2DGbBFjZVIWbL605jpacKoqiVF10nzFjBq6urjRr1gwPDw9CQ0O5ffu2cUwrUT6tFrJ09o5CCOekKmGlpKQwf/58hg4dSnZ2NsOHD2f58uV8+eWXtopPiDK5ubnZOwRRgVSdEvr5+WEwGGjcuDEXL14E4NFHH+Xq1as2CU6IssTGK+h0f+RBBoKQa2TORdURVvv27Zk8eTJarZbHH3+cxYsX89lnn1G/fn1bxSdEqTK1ChmZD1YARauV62TORFXCevfdd3nyyScBeP/99zl79ixJSUnMnTvXJsEJIURRqo6DPT09mThxIgC1atXi73//u02CEkIIU8xKWEOHDi13WOKYmBirBFRZVYXbGYSwNbNHa4CCsZmPHj3Kq6++SsOGDblx4wb//Oc/i41AKkyTGn9CPDizEtbAgQOBgio3MTExNGzY0Diva9euvPLKK0RGRtomQlEpybdzwhKqPi3p6el4eHiUmK6Vr1lEGQqHV4bfE5R8ZIQlVCWsvn37MmLECIYOHYq/vz/Xrl1j3bp1DB482CrB7Nq1i5UrVxpfp6en4+LiwurVqxk8eHCxUU3j4uJMJk/hmCRBCWtQlbBmzJjBhg0b+OKLL7h58yb16tVj+PDhVhtvvWfPnvTs2RMoOGobOHAgs2bN4vTp04SHhzvVaaec8ghhfap+m6pVq8aIESMYMWKEjcL53bJly3j22Wd55plnSEhI4PLly4SHh+Pu7s7UqVMJDAy0eQwPQo4ohLA+sxJWjx492L17N88991yptzccOHDAakGlpqayfft29uzZA4CXlxf9+/fnxRdf5IcffuCNN94gMTGRevXqqdpuRZaqv7+cuz1K05f4+d/XlAoPHou3F8RsKrhVw9y+3r9MWa9L24/F5lH6PEvfJ3uSUvXlMythFd7JvmDBAovLxKsRFxdHnz598PX1BQrusC/UunVr/vznP3P8+HHV9QkrqlR9YZl2H28Naza44OUBuiyDxeXlrVmqHiBLl2W1svfenqZL2WvQlNhWaeXu739dWNL+/v14714+NXxceDbwFwB0/1snOTnZuI4l79P97dqblKovnVkJq02bNhgMhgobWXTPnj0sXLgQKPgLuHLlSoYNG1aszLyrq/prQxVZqv7wdwa8vECrA4MCXp6WlZd3lFL15f0LxUvZm9rW/cuU9drUflQo2JeFyxz6Lh+dVsdjjz1mcaXk0tq1BylVb6VS9S1atCj1yMraZb7u3LlDWloaLVq0AAoGDTxw4AA+Pj68+uqrJCcn85///IcPPvhA9bYrslS9XUrR27BUvdpS9qaW8faC2C0Knp4YP09F1yn6urT96ONdZBv8vg1Lf8FLa9eepFR96cxKWPv27TNrY9Zw6dIlAgICiiXIBQsWEBkZSVxcHACLFi0yni4K52KNO/7lqYGqy6yE1ahRI+P/L168yI0bN4oNkXzx4kWrfXPYsmVLdu7cWWzaQw89JM8qCiHU3dYQHR3NunXrjGW9FEUhIyODoKCgCrnVQQhRtalKWFu3bmXz5s1kZmYSFxfHokWLWLZsGZcvX7ZVfKISK3xkR0axEOZSlbAMBgMtWrTgzp07nD17FoAxY8YYy4AJoVbR61Gmnjksj4+3hth4BTR6eaqgClA14uhDDz3Ev/71L3x9fcnKyuL69etkZGSQnZ1tq/hEFWPJkMWZWkWeLKgiVP1JmjhxIuPHjychIYFhw4YRHh5OtWrVCA0NtVV8QghhpCphPffccxw6dAgPDw9Gjx5N27ZtuXfvHp07d7ZVfEIIYaTqlHDEiBHs2bMHna6gEmjr1q3LfL5QCCGsSVXCCgsLIzE
xkU6dOhEREcE333zjMM9fCSEqP1UJa+DAgaxbt46vv/6adu3asX79ejp37lzs4WQhhLAVVQmrUN26dWnevDnNmzfHzc2NlJQUa8clhBAlqLrofvLkSfbs2cOePXuoWbMmvXv3JjY2ttjQxaJqkptARUVQlbDefPNNevXqxYoVK4wVoIUoJA8lC1tTlbC+/fZbXFwsOosUQogHpiphSbISFUFOL0Vp5OEr4ZDk9FKYIodMQginYZWEdf36dWtsRgiLFZ5GFo72IConqySswuKnQtiTJSM9lEeSoGOxyjWsL7/80hqbEcLhyLA1jsUqCatBgwbW2AwRERFcuHABDw8PAPr3709YWBhvv/02qampeHh4EB0dTZMmTazSnhDCuahKWN999x2LFy/m+vXrGAyGYvOsUfn53//+N1988QW1a9c2TpswYQLBwcEMGzaMI0eOMGXKFBISEh64LSGE81GVsKZPn06vXr0ICgqy+j1ZV69eRafT8c4773Dt2jWeeeYZJk+ezKFDh/jwww8B6NChAxkZGaSkpPDoo4+qbqOiS9XbvTS9DUvVO8QPCpr7+qP2/SqvVH1FlrKXUvXlU5WwMjMziYiIsEmRx5s3bxIUFMR7772Hj48P06ZN44MPPsDd3R0fHx/jcv7+/ly/ft2ihFXRpeoLy7U/aEl4Ry1Vb824LNlWVhZ4e0JWVhYu/1vG2xPWbHBBMSg8G/hLmcMflVeq3l6l7KVUfelUJaxXX32V5cuXM3z48GJJBB78LviWLVvyySefGF+PGTOGt956y+SylrZlj1L1D1pe3llK1dtjWwYUIAtPT0+8vDTG6dqC8SXNKj9fXqn6iixlL6XqrVSqvlCjRo149913Wb58uXGatUrVnzx5koyMDIKDg43bdXV1JTc3F51OZ0w0aWlp+Pv7W9SGLUvV31/tpaqVqne0/hS+b2W9X56eGEfLLW1Ze5Syl1L1pVOVsBYuXMicOXNo166d1a9hZWdnExUVRWBgIN7e3qxfv55u3bqRkpJCfHw8w4cP59ixY7i6ujrkt4Ty9bfjMlU6rLRHf9SWGRMVS9W7Uq1aNfr06YObm5vVA+nYsSODBw9m0KBB6PV6nn76ad544w3u3LnDzJkziY+Px93dncWLF8sY8kIVNX9M5A+PY1OVsCZNmsRf//pXhg4dSq1atYolDmsM4jd69GhGjx5dbFr9+vVZs2bNA29bCOH8VCWs2bNnA5CYmFhsujWuYQkhRHlUJazk5GRbxSGEEOVSlbCOHTtW6rygoKAHDkYIWykczUGrhbp17R2NsJSqhPXXv/612Ot79+6h0+lo06aNJCzh8LRayNQWJC/hnFQlrKSkpBLTYmJiuHTpktUCEkKI0jzwzVRDhw5lx44d1ohFCIdQePoYnyjjYDkaVUdY94/QoNfr2bVrFzVr1rRqUEJUJFNFL2RMecekKmG1aNGixE2b3t7eUqpeOD1JUM5BVcLat29fsdcuLi74+fnh7u5u1aCEeFDWLBVWuC2QR3bszay9f/XqVYASR1eKonDz5k0AGjZsaOXQnI/U03Ms1jxqkkd2HINZCSskJASNRmMczKxQ0QQmd7oXkFMLIWzHrIR17ty5EtO0Wi0ffPABu3fvLnXcKiGqMjmNtD6z9uT9Y9X861//YsaMGfj7+/PFF1/w8MMP2yQ4IZyZnEZan6rUn52dzcKFC9m6dSsRERGMHDlShnoRQlQYsxPWqVOnmD59OrVr12bbtm0OOYieEKJyMythffjhh8TExNC/f39Gjx5NtWrVuHz5crFlrDEelhBClMWshLV27VoAtmzZwtatW01+WyjfEgpRktzDZV1m7UEZB0sIy8nFd+txqJS/bt064uPjcXFx4eGHH2bu3LmcPn2a6dOnExAQAEDNmjXZsGGDnSMVQtiDwySskydPsmXLFjZv3oy3tzeLFi1i0aJF+Pn5MW7cOEaOHGnvEEUVV9rpnZzyVRyH2cO1a9dm1qxZeHt7AwU
PWm/evJlLly5RrVo1EhMT8fX1ZcaMGTRr1syiNqxVqj42XiFTq+DjrWHIwILbOhyyNL2J0u5QvLS7U/+YKFVv65/C07uin4nMTMXk56ToZ8LUfHM/b87K7qXqbalp06Y0bdoUgMzMTJYvX87LL7/M8ePHGTBgAJ07d+arr75i3Lhx7Nq1Cw8PD9VtWKNUvZubG6lpD5GRqVDDR8O5cwWDF5pTmt4RSsJD5S5VXxHxeHvCmg0u3LuXT0D9auiyDCgGpVg5+8Iy94WficJ1FIPCs4G/lFn2XkrVl85hElah1NRUxo8fT6tWrXj55Zd55ZVXjPNCQ0NZunQpP/74I23atFG9bWuVqj/8XUGRdG/v30uYm1Oa3t4l4atKqfqKiEer+9/2FPD638Pu95ezv/8zodWZXq68z5uzsnupeltLTk5m7NixvPTSS0yYMIGMjAw2btzI2LFjiy3n6mpZ2NYqVV+0DH3hdIcsTa+itLtT/jhQfwo/J0WZ+kyYWs5an1NHZc1S9datN/8A0tLSGDlyJG+//TYTJkwACgYH3LhxI/v37wfg8OHDZGVl8fjjj9szVCGEnTjMEdaqVavQ6XSsWrWKVatWAfDII4+wbNky5s6dS3R0NF5eXixbtgw3Nzc7R1tAxr8SomI5TMKKjIwkMjLS5LzNmzdXcDTmk/GvhKXkdgj1ZE8JYSdyB7x6krCEqEByGeHBSMISooLJZQTLOcy3hEIIUR45whLCyuSUz3YkYQlhZXLKZzuSsFSQv5yiNNa+mC63PJgme8OEwg/LkAEa3NzciI1X8PTSy19OUSYp3Gp7krBMuP/DkqlV+P3JNSGcQ2U8Sqs8PbERSx+0FsIS1nzsrDIepclvYym8vQoG6tMg1YBExTl68o9c+L+C88qi18WysgrmV6ajJUtU7d6XI1OroMFg7zBEFZKRacDT8/dLEIXXxbQ6qcADkrCEsKvCJKTVQl0/0LiUfa20Mp7mqSEJSwg702ohUwtenpXnK+iYOD0o0KqFdYeCkoQlRBVRkaeTWi0lCi5bgyQsIZxMWRfjjaeXdUuuVxlOJyVhCeGEil6Mv396prYgqZni7BfunS9iIcQDceYjLacZXuabb76hV69ehIaGMmvWrDLrugkh1IuJ0xuPvhyVUySsGzdu8N5777F69Wr27t2LTqcjNjbW3mEJ4RAKT/PiEy1LNoXrarXlH32VltQqKtk5xSnhkSNHaNOmDQ0aNADgpZdeYsGCBYwYMcK+gQnhIEp78Lq0USSKTi9t3cIL+N7ev1/vuj+hFdtGBXCKhJWWloa/v7/xtb+/P9evXzd7fYOh4G51rVZLfn5+ucv71VaoXh2qV1dwIRcD4FFdg0d1qF4dPKqDp4dCrZrFp6n91xrbULctBRdNLjVrgqeHxoHismxbZb0/ztg331qY9f6ojSsnB9xdwa928WVNTQfIyCi4edXDXcGQr+DhrjFO86utlFimcBtFp/vVVlBQUBSFzMxMXFzKPpnLzs4u9rtaGqdIWKY6Ud4OKConJweAS5cumbV804fM3rTz8bd3AKJMDvD+FFaML/p7cP80U8uYmq7XQ0pKitlt5+Tk4OPjU+p8p0hYAQEBJCcnG1+npaUREBBg9vq1atXij3/8I9WrV1eV6IQQFcNgMJCTk0OtWrXKXM4pElbHjh2Jjo7mypUrNGzYkC1bthAcHGz2+q6urvj5+dkwQiHEgyrryKqQRrHF/fM2sG/fPj7++GNyc3P585//zLx583B3d7d3WEKICuQ0CUsIIeSCjhDCaUjCEkI4DUlYQginIQlLCOE0JGEJIZyGJCwhhNOQhCWEcBqSsCh/rK3bt28zceJEevfuTVhYGOvWrbNPoGYyd+yw3NxcBgwYwD//+c8KjlCd8vqj1+uJjo6mX79+vPDCCw7fHyi/Tzk5OUydOpVevXoRFhbG559/bqdIzZebm8uIESPYv39/iXlarZaIiAh69uxJWFgYp06dsqwRpYpLS0t
TOnTooFy9elUxGAzKX/7yF2Xt2rXFlpkxY4ayZMkSRVEUJSMjQwkNDVVOnjxph2jLZ05/Cs2bN09p166dsmHDhooNUgVz+rNy5Upl9OjRSl5ennLnzh2lY8eOysWLF+0TsBnM6dP69euVSZMmKQaDQbl7964SFBSk/Prrr/YJ2Axnz55VwsPDlZYtWypJSUkl5kdFRSlRUVGKoijKhQsXlOeee07JyclR3U6VP8IqOtaWRqPhpZdeYufOncWWCQkJ4eWXXwYKnnd6+OGHuXLlij3CLZc5/YGCv/Cpqamqnsm0B3P6s3PnTsaOHYurqyu1atUiNja22HBEjsacPhkMBnQ6HXq9nuzsbDQaDa6ujvvo78aNG4mIiKBly5Ym5yclJTFgwAAAmjVrRuPGjTl27Jjqdqp8wjJnrK3nn3+eevXqAXD06FFOnz7Ns88+W6Fxmsuc/vz22298+umnzJs3r6LDU82c/vz666/8+OOPDB06lL59+3Ls2DG8vb0rOlSzmdOnV155hYyMDDp16kTXrl0ZMGAADRs2rOhQzRYVFcVzzz1X6vzU1NRiI6yoHdOuUJVPWGrG2vrqq6946623WLx4MXVN1VFyAOX1Jy8vj2nTpjF79mxq1qxZkaFZxJz3R6/X89///pe1a9eyevVq1qxZw3fffVdRIapmTp8++eQTmjVrxuHDhzl48CAHDhxg7969FRWi1SkmHlm2ZKinKp+wAgICuHHjhvF1aWNtrV69mrlz57Jq1So6dOhQkSGqUl5/zp49y9WrV5k9ezZ9+/YlKSmJVatWOewXCea8P/Xq1aNnz564urpSv359nn32Wf79739XdKhmM6dP+/fvJzw8HFdXV+rUqUNYWBgnTpyo6FCtxlSfLTltr/IJq2PHjpw8eZIrV66gKIrJsbbi4uLYsmUL8fHxpZ6jO4ry+tO6dWsOHjxIQkICCQkJhISEMGbMGIcdH9+c9+f5558nMTHROBzviRMnePLJJ+0UcfnM6VOLFi3Ys2cPUPCN4aFDhxz+s1eWrl27snnzZqBgBNKff/6ZNm3aqN6ODC+D6bG2tm7dSlpaGpMmTaJ9+/Z4enpSu3Zt4zqjRo2iT58+doy6dGX1Z/LkycWWnT59Ok8++SSvvvqqnaItX3n90el0zJ8/n++//578/HwGDBjA66+/bu+wy1Ren9LT03n//fe5cOECrq6udOvWrcR754iGDh3Ka6+9RnBwMEuWLKF+/fq8/PLLZGZmMmvWLC5cuIBGo2HGjBkWnalIwhJCOI0qf0oohHAekrCEEE5DEpYQwmlIwhJCOA1JWEIIpyEJS5jl9u3baLVae4dhdb/99pvqdSzdF5V1H5piyX41hyQsE3777TeaN29Oz549S8w7evQozZs3Z+nSpVZpa9asWbRu3ZrWrVvzxBNP8OSTTxpfr1ixwuJtfvrpp1aJr1D37t25efOm1bY3evRotm/frmqdkydP8sILLwCwdOlSpk6dqrrdxMREXnvtNaDgXqh33nlH9TYs3ReWrlc0ZmssZ2uW7ldzOO7j3w7g1q1bXLhwgebNmxun7dy5Ey8vL6u1MWfOHObMmQMU3MTp7+/PlClTHnib1nbnzh2rbm/NmjWq1wkMDHzg5+n69OljvOH37t27Jp/rK4+l+8LS9YrGbI3lbM3S/WoOOcIqQ7du3di9e7fxdW5uLkeOHCl2h+6tW7eIiIigS5cutGzZkqFDh5KamkpWVhbdunVj1apVAPzf//0frVu3VvWM2y+//MLo0aPp2LEjrVq1Yvz48WRmZgIFdxR/8sknhIWF0bZtW0aPHm38hZg+fTqLFy82LrdixQq6d+9Oq1ateP/99/n6668JCQnh6aefLvYM4cGDB+nTpw9t27Zl8ODBnDlzBoDBgwcD0LdvX06cOEFGRgaRkZEEBQXRuXNnoqOjTQ4SqCgKH374Ic8++ywdOnRg0qRJ3L592xhXfHw8UDB8z7p16wgJCaFNmzasWLGCzz/
/nA4dOtChQwd27doFwIkTJ+jcuXOJdkp7Dwr3xVtvvUWnTp0YOXIk27Zt4+WXX+bChQvMnj2b06dP88ILL7B06VLGjRtn3GZeXh7t27fn559/LtaWpfvi/vXujys/P58PP/yQ0NBQWrVqRY8ePTh+/DiAMWYoOLKcPn06I0eOpHXr1vTv39/4Ppm7nKIoLF68mPbt2xMSEsLKlSsJCQkpETNAQkICzz//PO3atePll1/m7Nmzxnnr1q2ja9euBAUFMXPmTLRabYn9am2SsMoQFhZmfJ4L4NtvvyUwMBAPDw/jtIULF1KzZk2++uorjh49ChS8kZ6ensybN4/PPvuMy5cvM2PGDIYNG8af//xns9t/9913adWqFd9++y379u3j119/JSEhwTh/165drFmzxji21aZNm0xuZ+fOncTGxrJ9+3bi4uLYvHkzO3bsYNGiRURHR5OVlUVycjIRERFMnTqVEydOMHDgQGMSLNxuQkIC7du359133+XGjRvs3buXLVu2cPz4cVauXFmi3ePHj5OUlMTevXtJSkoiNzeXjRs3moxx//79JCYm8umnn7JkyRL+85//sH//fiZOnMj8+fPL3E+lvQeFfvjhBxISEoqdxjdv3pz333+fVq1asXfvXsLCwjhy5IjxD8LRo0dp0KABTZo0KdaWpfvi/vXujyshIYHjx4+zefNmvv/+e0JCQli4cKHJ/n755ZdMmDCB48eP89hjjxn/OJm73JYtW9izZw/btm1jy5YtJkcIBcjKyiIyMpIVK1Zw4sQJOnXqZIxpx44dxMbG8o9//IN9+/aRnZ3NggULSuxXa5OEVYa2bdui1WpJTk4GChJE7969iy3zl7/8henTp6MoCteuXcPX19d4naJ9+/b069ePIUOGkJOTw8SJE1W1/+GHHzJmzBiys7NJTU3F19e32BPv/fv3p0GDBtSuXZvOnTvz66+/mtxOnz598PPz45FHHqFevXoMGDAAb29vOnbsSF5eHjdv3mT37t106dKFzp074+rqyosvvkjjxo05ePBgsW1lZ2fz9ddfM23aNGrWrEn9+vWZNGlSsURayNvbm7S0NOMzcp999hkTJkwwGeOgQYPw8fGhXbt2GAwGhgwZgru7Ox07diQtLY38/PxS91NZ7wFAUFAQderUwcfHp9RtNGnShEcffdT4y7tr1y6T1zAt3RemFI0rNDSU1atXU6NGDa5evYq3t3ep17vatWtHYGAg1atXp3v37qW+76Utt2PHDkaOHEmjRo2oU6dOqZ9LV1dX3N3diYuL4+zZs4wdO5b169cD8MUXXzBy5EgefvhhvLy8mDx5Ml988YXJYWSsSRJWGVxcXOjevTu7d+9Gp9Pxww8/lHhg89q1a4wcOZIuXbowb948bty4UexNGzhwIKmpqXTv3h03NzdV7f/3v/8lPDyc0NBQFi9eTEZGRrFt16lTx/h/V1fXUj8svr6+xv9Xq1aNGjVqGPsHBeMz3b59u8QAcQ0bNjSeWhW6d+8eer2+2LINGjQwORhby5YtmTNnDnv27OGFF16gf//+/Oc//zEZY61atYzxAcYYNRoNYHo8pULlvQeFgy+Wp2fPnuzdu5fc3FySkpLKTVhq9oUpRePKzc1l9uzZBAUFMWXKFM6ePVtqn4s+hO/m5qZ6ufsH0yttYEA3NzfWrl3L5cuXGTZsGF26dDF+UXLt2jUWLlxIYGAggYGBvPjiixgMBm7dumVW3y0lCascYWFh7N69m6SkJDp16lQi6UybNo3+/ftz7Ngx1q9fz1NPPWWcpygKUVFRhIWFsWbNGq5evWp2u7m5ubz55ptMmzaNw4cPs3r1aho3bmxRHwp/6cvi7+9fIr7ffvutWFIE8PPzw83NrdiyV65cwc/Pr8Q2r127xp/+9Cc2bdrEsWPHaN++PZGRkRbHWJqy3gM1Ck8Lv/32W5o0acIf/vCHMpdXsy/Ks3jxYjw9PTly5AhbtmwhPDxc9TbMFRAQwLVr14y
v7/+jVCgzM5Pc3FzjKeHbb7/NzJkzuX37NnXr1mXOnDmcPHmSkydPcvToURITEy3quxqSsMrRqlUr9Ho9y5cvL3E6CAV/ZQu/NTx16hSJiYno9XoAYmNjuXXrFh988AF9+/bl3XffNbvd3NxccnJy8PLyQlEUkpKSOHTokHHb1tajRw8OHjzIt99+i16vZ+vWrfzyyy/GYW/d3NzIzMykWrVqhIWFsXDhQu7du0daWhrLli0zeTTy73//mzfeeIOrV69So0YNvLy8jEdS1lTWe1AWd3f3YvdFNWzYkMcee4wlS5aUeXRlyb4oul5pffDw8KBatWpcu3aNVatW2ey97tOnD+vXr+fq1avcvXuXzz77zORyOp2OUaNGcfz4cdzd3alTpw4eHh5Ur16d3r17849//IOrV6+Sl5fH4sWLmTRpElByv1qTJCwz9OzZE51OR2BgYIl57733HkuWLKFt27ZERUUxcOBALl68yG+//cZHH33E+++/j7u7O2+++SY//fST8Zux8vj4+PDXv/6VyZMn0759e9auXcuLL77IxYsXrd09oOAazuLFi1m4cCFPP/00GzduZPXq1cbTlvDwcIYMGcI333xDZGQkderUoXv37vTp04fAwEDefPPNEtvs3r07vXv3ZuDAgbRt25bvv/+euXPnWj320t6D8jz99NNkZ2fToUMH4+lSWFgYKSkp9OjRo9T1LNkX9693v0mTJvHjjz/Stm1bhg0bRmhoKPfu3SM9Pd28naBC//796dKlC3369KF///489thjJi9X1K9fn/nz5zN79mxat27NvHnz+Pjjj/Hy8mLAgAH06NGDV199lWeeeYYff/yRpUuXotFoTO5Xa5HxsIQoYt++fcTExBgvLldGycnJ1K1b11iX4ODBg3z66afGEUEdmRxhCUHB6c+FCxf4+9//btPrR45g//79zJw5k6ysLDIzM9mwYYPDVoG6nyQsIYD09HQGDx6Mr68vvXr1snc4NjVixAhq1KhBly5d6Nq1K40aNWL8+PH2DsssckoohHAacoQlhHAakrCEEE5DEpYQwmlIwhJCOA1JWEIIpyEJSwjhNCRhCSGchiQsIYTTkIQlhHAa/w/jKWWNvvZUtQAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "eval_res = evaluate_split(df, n_workers=4)\n", "init_plotting(figsize=(3, 3))\n", "sns.histplot(eval_res['val'], bins=100)\n", "plt.xlabel('Max Tanimoto similarity to training set')\n", "plt.ylabel('Num. validation set molecules')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Store the dataset with a new `fold` column" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Converting to matchms before saving to mgf: 100%|██████████| 231104/231104 [00:24<00:00, 9487.65it/s] \n" ] } ], "source": [ "msdata.remove_column('FOLD') # Remove original MassSpecGym FOLD column\n", "msdata.add_column(name=FOLD, data=df[FOLD].tolist())\n", "msdata.to_mgf('../data/MassSpecGym_MurckoHist_split.mgf')" ] } ], "metadata": { "kernelspec": { "display_name": "dreams", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 2 }