{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 05 - PCCA and TPT analysis\n",
    "\n",
    "<a rel=\"license\" href=\"http://creativecommons.org/licenses/by/4.0/\"><img alt=\"Creative Commons Licence\" style=\"border-width:0\" src=\"https://i.creativecommons.org/l/by/4.0/88x31.png\" title='This work is licensed under a Creative Commons Attribution 4.0 International License.' align=\"right\"/></a>\n",
    "\n",
    "In this notebook, we will cover how to use PCCA++ to extract a coarse representation of the MSM.\n",
    "We will further investigate how to use transition path theory (TPT) to follow the pathways of the processes.\n",
    "When we want to analyze pathways, models with fewer states are more often desirable, since these are easier to understand.\n",
    "PCCA++ allows us to assign the microstates directly to metastable macrostates and TPT uses this group assignment to compute fluxes and pathways.\n",
    "\n",
    "Another method to get a model with fewer states are hidden Markov state models (HMM),\n",
    "introduced in [Notebook 07 ➜ 📓](07-hidden-markov-state-models.ipynb).\n",
    "In contrast to computing memberships of microstates to meta stable sets as in PCCA++,\n",
    "in HMMs we directly obtain a model with fewer states.\n",
    "\n",
    "While we will mostly rely on previously estimated/validated models, it will be helpful to understand the topics\n",
    "- data loading/visualization ([Notebook 01 ➜ 📓](01-data-io-and-featurization.ipynb))\n",
    "- dimension reduction ([Notebook 02 ➜ 📓](02-dimension-reduction-and-discretization.ipynb))\n",
    "- the estimation and validation process ([Notebook 03 ➜ 📓](03-msm-estimation-and-validation.ipynb))\n",
    "\n",
    "Here you can find literature on the used methods:\n",
    "- <a id=\"ref-1\" href=\"#cite-pcca_plus_plus\">roeblitz-weber-14</a>\n",
    "- <a id=\"ref-2\" href=\"#cite-weinan-tpt\">weinan-06</a>\n",
    "- <a id=\"ref-3\" href=\"#cite-metzner-msm-tpt\">metzner-09</a>\n",
    "\n",
    "Maintainers: [@cwehmeyer](https://github.com/cwehmeyer), [@marscher](https://github.com/marscher), [@thempel](https://github.com/thempel), [@psolsson](https://github.com/psolsson)\n",
    "\n",
    "**Remember**:\n",
    "- to run the currently highlighted cell, hold <kbd>&#x21E7; Shift</kbd> and press <kbd>&#x23ce; Enter</kbd>;\n",
    "- to get help for a specific function, place the cursor within the function's brackets, hold <kbd>&#x21E7; Shift</kbd>, and press <kbd>&#x21E5; Tab</kbd>;\n",
    "- you can find the full documentation at [PyEMMA.org](http://www.pyemma.org).\n",
    "\n",
    "---\n",
    "\n",
    "⚠️ We have assigned the integer numbers $1 \\dots $ `nstates` to PCCA++ metastable states.\n",
    "As PyEMMA is written in Python, it internally indexes states starting from $0$.\n",
    "In consequence, numbers in the code cells differ by $-1$ from the plot labels and markdown text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib as mpl\n",
    "import numpy as np\n",
    "import mdshare\n",
    "import pyemma"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Case 1: preprocessed, two-dimensional data (toy model)\n",
    "We start by loading the data and the previously analyzed MSM  (estimated in [Notebook 04 ➜ 📓](04-msm-analysis.ipynb)) from disk:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "file = mdshare.fetch('hmm-doublewell-2d-100k.npz', working_directory='data')\n",
    "with np.load(file) as fh:\n",
    "    data = fh['trajectory']\n",
    "\n",
    "msm = pyemma.load('nb4.pyemma', model_name='doublewell_msm')\n",
    "bayesian_msm = pyemma.load('nb4.pyemma', model_name='doublewell_bayesian_msm')\n",
    "cluster = pyemma.load('nb4.pyemma', model_name='doublewell_cluster')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We currently have an MSM with $50$ discrete states which was validated for two metastable states in the previous notebook.\n",
    "Internally, the metastable states have been computed using the Perron Cluster Cluster Analysis (PCCA++) method <a id=\"ref-4\" href=\"#cite-pcca_plus_plus\">roeblitz-14</a>.\n",
    "Let's analyze this in more detail here.\n",
    "We can explicitly compute it by calling `msm.pcca()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nstates = 2\n",
    "msm.pcca(nstates)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PCCA++ computes membership distributions, i.e., probabilities of micro-states to belong to the same metastable state.\n",
    "It does so by using the properties of slow processes in eigenvector space.\n",
    "Let us visualize the membership distributions in the same fashion as before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(1, 2, figsize=(10, 4))\n",
    "for i, ax in enumerate(axes.flat):\n",
    "    pyemma.plots.plot_contour(\n",
    "        *data.T, msm.metastable_distributions[i][cluster.dtrajs[0]], ax=ax, cmap='afmhot_r', \n",
    "        mask=True, method='nearest', cbar_label='metastable distribution {}'.format(i + 1))\n",
    "    ax.scatter(*cluster.clustercenters.T, s=15, c='k')\n",
    "    ax.set_xlabel('$x$')\n",
    "    ax.set_xlim(-4, 4)\n",
    "    ax.set_ylim(-4, 4)\n",
    "    ax.set_aspect('equal')\n",
    "axes[0].set_ylabel('$y$')\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected, PCCA++ has assigned metastable states to the basins of the double well.\n",
    "Since PCCA++, in simplified words, does a clustering in eigenvector space and the first eigenvector separated these states already, the nice separation comes to no surprise.\n",
    "\n",
    "It is important to note, though, that PCCA++ in general does not yield a coarse transition matrix.\n",
    "How to obtain this will be covered in [Notebook 07 ➜ 📓](07-hidden-markov-state-models.ipynb).\n",
    "However, we can compute mean first passage times (MFPTs) and equilibrium probabilities on the metastable sets and extract representative structures.\n",
    "\n",
    "The stationary probability of metastable states can simply be computed by summing over all of its micro-states\n",
    "(please note that the PCCA++ object returned by `msm.pcca()` also has a convenience function to do that.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i, s in enumerate(msm.metastable_sets):\n",
    "    print('π_{} = {:f}'.format(i + 1, msm.pi[s].sum()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the `mfpt()` method of the original MSM object to compute MFPTs between pairs of metastable sets\n",
    "(accessible via the `metastable_sets` attribute of the MSM object). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mfpt = np.zeros((nstates, nstates))\n",
    "for i in range(nstates):\n",
    "    for j in range(nstates):\n",
    "        mfpt[i, j] = msm.mfpt(\n",
    "            msm.metastable_sets[i],\n",
    "            msm.metastable_sets[j])\n",
    "\n",
    "from pandas import DataFrame\n",
    "print('MFPT / steps:')\n",
    "DataFrame(np.round(mfpt, decimals=2), index=range(1, nstates + 1), columns=range(1, nstates + 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As described above, the errors can be estimated from the Bayesian MSM.\n",
    "Instead of just printing means and confidence intervals, let's compute the samples explicitly and histogram them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mfpt_sample = np.zeros((nstates, nstates, bayesian_msm.nsamples))\n",
    "for i in range(nstates):\n",
    "    for j in range(nstates):\n",
    "        mfpt_sample[i, j] = bayesian_msm.sample_f(\n",
    "            'mfpt',\n",
    "            msm.metastable_sets[i],\n",
    "            msm.metastable_sets[j])\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.hist(mfpt_sample[0, 1], histtype='step', label='MS 1 -> MS 2', density=True)\n",
    "ax.hist(mfpt_sample[1, 0], histtype='step', label='MS 2 -> MS 1', density=True)\n",
    "ax.set_xlabel('MFPT (steps)')\n",
    "ax.set_title('Bayesian MFPT sample histograms')\n",
    "fig.legend(loc=10);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We clearly see that there is no overlap of the distributions approximated by the Bayesian MSM.\n",
    "\n",
    "To do a more detailed analysis of the transition paths, we make use of transition path theory (TPT) in its MSM formulation.\n",
    "We first analyze the flux between the two metastable sets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = msm.metastable_sets[0]\n",
    "B = msm.metastable_sets[1]\n",
    "flux = pyemma.msm.tpt(msm, A, B)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In TPT, many properties are derived from the committor functions.\n",
    "They describe the probability of reaching a set $A$ before set $B$ as a function of some state $x$.\n",
    "In order to understand this, we plot the committor of the previously defined sets as a function of the cluster centers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(5, 4))\n",
    "pyemma.plots.plot_contour(\n",
    "    *data.T,\n",
    "    flux.committor[cluster.dtrajs[0]],\n",
    "    ax=ax,\n",
    "    cmap='brg', \n",
    "    mask=True,\n",
    "    cbar_label=r'committor A $\\to$ B')\n",
    "ax.set_xlabel('$x$')\n",
    "ax.set_xlim(-4, 4)\n",
    "ax.set_ylim(-4, 4)\n",
    "ax.set_aspect('equal')\n",
    "ax.set_ylabel('$y$')\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the committor for the double well data approximates a step function between the two basins.\n",
    "In other words, the probability of transitioning from metastable state $A$ to $B$ is only $1$ if we already are in state $B$.\n",
    "If we are in $A$, it is $0$ by definition.\n",
    "The clustering did not resolve the transition region, so this particular example does not provide more information.\n",
    "In the next example we will see more.\n",
    "\n",
    "## Case 2: low-dimensional molecular dynamics data (alanine dipeptide)\n",
    "\n",
    "Again, we load the model that we have estimated previously."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pdb = mdshare.fetch('alanine-dipeptide-nowater.pdb', working_directory='data')\n",
    "files = mdshare.fetch('alanine-dipeptide-*-250ns-nowater.xtc', working_directory='data')\n",
    "\n",
    "feat = pyemma.coordinates.featurizer(pdb)\n",
    "feat.add_backbone_torsions(periodic=False)\n",
    "data = pyemma.coordinates.load(files, features=feat)\n",
    "data_concatenated = np.concatenate(data)\n",
    "\n",
    "msm = pyemma.load('nb4.pyemma', model_name='ala2_msm')\n",
    "bayesian_msm = pyemma.load('nb4.pyemma', model_name='ala2_bayesian_msm')\n",
    "cluster = pyemma.load('nb4.pyemma', model_name='ala2_cluster')\n",
    "\n",
    "# not to be used in MSM estimation (artificical transitions between individual trajectories)!\n",
    "dtrajs_concatenated = np.concatenate(cluster.dtrajs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous [Notebook 04 ➜ 📓](04-msm-analysis.ipynb), we saw that four metastable states are a reasonable choice for our MSM.\n",
    "We, thus, perform PCCA++ with this number of states for further analysis and print out the stationary probabilities of the metastable sets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nstates = 4\n",
    "msm.pcca(nstates)\n",
    "for i, s in enumerate(msm.metastable_sets):\n",
    "    print('π_{} = {:f}'.format(i + 1, msm.pi[s].sum()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We visualize the metastable memberships:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(1, 4, figsize=(15, 3))\n",
    "for i, ax in enumerate(axes.flat):\n",
    "    pyemma.plots.plot_contour(\n",
    "        *data_concatenated.T,\n",
    "        msm.metastable_distributions[i][dtrajs_concatenated],\n",
    "        ax=ax,\n",
    "        cmap='afmhot_r', \n",
    "        mask=True,\n",
    "        cbar_label='metastable distribution {}'.format(i + 1))\n",
    "    ax.set_xlabel('$\\Phi$')\n",
    "axes[0].set_ylabel('$\\Psi$')\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PCCA++ nicely separates the high-density regions and we find that each of the basins was assigned a metastable set.\n",
    "This indicates that our projection indeed describes the slow dynamics.\n",
    "\n",
    "We concatenate all three discrete trajectories and obtain a single trajectory of metastable states which we use to visualize the metastable state memberships of all datapoints.\n",
    "We further compute the state with the highest membership to a PCCA metastable state to plot a state label there.\n",
    "\n",
    "⚠️ Please remember that the concatenated discrete trajectories (dtrajs) are not meant to be used for MSM estimation (artificial transitions), but only for visualization and indexing purposes!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metastable_traj = msm.metastable_assignments[dtrajs_concatenated]\n",
    "highest_membership = msm.metastable_distributions.argmax(1)\n",
    "coarse_state_centers = cluster.clustercenters[msm.active_set[highest_membership]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we use the `mfpt()` method of the MSM object to compute MFPTs between pairs of metastable sets and compute the inverse MFPTs for visualization purposes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mfpt = np.zeros((nstates, nstates))\n",
    "for i in range(nstates):\n",
    "    for j in range(nstates):\n",
    "        mfpt[i, j] = msm.mfpt(\n",
    "            msm.metastable_sets[i],\n",
    "            msm.metastable_sets[j])\n",
    "\n",
    "inverse_mfpt = np.zeros_like(mfpt)\n",
    "nz = mfpt.nonzero()\n",
    "inverse_mfpt[nz] = 1.0 / mfpt[nz]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We visualize our model in backbone torsion space:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(10, 7))\n",
    "_, _, misc = pyemma.plots.plot_state_map(\n",
    "    *data_concatenated.T, metastable_traj, ax=ax, zorder=-1)\n",
    "misc['cbar'].set_ticklabels(range(1, nstates + 1))  # set state numbers 1 ... nstates\n",
    "\n",
    "pyemma.plots.plot_network(\n",
    "    inverse_mfpt,\n",
    "    pos=coarse_state_centers,\n",
    "    figpadding=0,\n",
    "    arrow_label_format='%.1f ps',\n",
    "    arrow_labels=mfpt,\n",
    "    size=12,\n",
    "    show_frame=True,\n",
    "    ax=ax)\n",
    "\n",
    "ax.set_xlabel('$\\Phi$')\n",
    "ax.set_ylabel('$\\Psi$')\n",
    "ax.set_xlim(-np.pi, np.pi)\n",
    "ax.set_ylim(-np.pi, np.pi)\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Have you noticed how well the metastable state coloring agrees with the eigenvector visualization of the three slowest processes?\n",
    "\n",
    "If we could afford a shorter lag time, we might even be able to resolve more processes and, thus,\n",
    "subdivide the metastable states three and four.\n",
    "We show how to do this with HMMs in [Notebook 07 ➜ 📓](07-hidden-markov-state-models.ipynb).\n",
    "\n",
    "Now we define a small function to visualize samples of metastable states with NGLView."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def visualize_metastable(samples, cmap, selection='backbone'):\n",
    "    \"\"\" visualize metastable states\n",
    "    Parameters\n",
    "    ----------\n",
    "    samples: list of mdtraj.Trajectory objects\n",
    "        each element contains all samples for one metastable state.\n",
    "    cmap: matplotlib.colors.ListedColormap\n",
    "        color map used to visualize metastable states before.\n",
    "    selection: str\n",
    "        which part of the molecule to selection for visualization. For details have a look here:\n",
    "        http://mdtraj.org/latest/examples/atom-selection.html#Atom-Selection-Language\n",
    "    \"\"\"\n",
    "    import nglview\n",
    "    from matplotlib.colors import to_hex\n",
    "\n",
    "    widget = nglview.NGLWidget()\n",
    "    widget.clear_representations()\n",
    "    ref = samples[0]\n",
    "    for i, s in enumerate(samples):\n",
    "        s = s.superpose(ref)\n",
    "        s = s.atom_slice(s.top.select(selection))\n",
    "        comp = widget.add_trajectory(s)\n",
    "        comp.add_ball_and_stick()\n",
    "\n",
    "    # this has to be done in a separate loop for whatever reason...\n",
    "    x = np.linspace(0, 1, num=len(samples))\n",
    "    for i, x_ in enumerate(x):\n",
    "        c = to_hex(cmap(x_))\n",
    "        widget.update_ball_and_stick(color=c, component=i, repr_index=i)\n",
    "        widget.remove_cartoon(component=i)\n",
    "    return widget"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now sample some representative structures and visualize these with the aid of NGLView.\n",
    "For the sake of clarity, we draw only the backbone atoms.\n",
    "Since we have obtained several samples for each metastable state, you can click the play button to iterate over all samples.\n",
    "For each iteration, the samples of all four states will be drawn.\n",
    "You can double click the molecule to show it at full screen.\n",
    "Press escape to go back. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cmap = mpl.cm.get_cmap('viridis', nstates)\n",
    "\n",
    "my_samples = [pyemma.coordinates.save_traj(files, idist, outfile=None, top=pdb)\n",
    "              for idist in msm.sample_by_distributions(msm.metastable_distributions, 50)]\n",
    "\n",
    "visualize_metastable(my_samples, cmap, selection='backbone')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Coming back to TPT, we now have more than two metastable states and can expect more insights from analyzing the transition paths.\n",
    "As an example, we will focus on the committor between metastable sets $0$ and $3$ as defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = msm.metastable_sets[0]\n",
    "B = msm.metastable_sets[3]\n",
    "flux = pyemma.msm.tpt(msm, A, B)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we go on with the visualization, let's coarse grain the flux with the metastable sets estimated with PCCA++:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cg, cgflux = flux.coarse_grain(msm.metastable_sets)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now show an overlay of the committor probabilities and the most likely transition path from the coarse graining TPT:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(figsize=(10, 7))\n",
    "\n",
    "pyemma.plots.plot_contour(\n",
    "    *data_concatenated.T,\n",
    "    flux.committor[dtrajs_concatenated],\n",
    "    cmap='brg',\n",
    "    ax=ax,\n",
    "    mask=True,\n",
    "    cbar_label=r'committor 1 $\\to$ 4',\n",
    "    alpha=0.8,\n",
    "    zorder=-1);\n",
    "\n",
    "pyemma.plots.plot_flux(\n",
    "    cgflux,\n",
    "    coarse_state_centers,\n",
    "    cgflux.stationary_distribution,\n",
    "    state_labels=['A','' ,'', 'B'], \n",
    "    ax=ax,\n",
    "    show_committor=False,\n",
    "    figpadding=0,\n",
    "    show_frame=True,\n",
    "    arrow_label_format='%2.e / ps');\n",
    "\n",
    "ax.set_xlabel('$\\Phi$')\n",
    "ax.set_ylabel('$\\Psi$')\n",
    "ax.set_xlim(-np.pi, np.pi)\n",
    "ax.set_ylim(-np.pi, np.pi)\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, the color map shows us a region with committor probability $\\approx 0.5$.\n",
    "This indicates that this particular metastable state is a transition state in the pathway from $A$ to $B$.\n",
    "Second, the `plot_flux()` function displays the most likely transition pathway along this path.\n",
    "There are other, less likely pathways included in the plot as well.\n",
    "The arrow thickness indicates the flux between the states.\n",
    "\n",
    "We can decompose the flux into these individual pathways by:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "paths, path_fluxes = cgflux.pathways(fraction=0.99)\n",
    "print('percentage       \\tpath')\n",
    "print('-------------------------------------')\n",
    "for i in range(len(paths)):\n",
    "    print(np.round(path_fluxes[i] / np.sum(path_fluxes), 3),' \\t', paths[i] + 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected, about $85\\%$ of the flux goes through only one pathway.\n",
    "To get a cleaner picture, the `plot_flux()` function supports a `minflux` keyword argument that can be increased to exclude very low fluxes from the plot.\n",
    "\n",
    "#### Exercise 1\n",
    "\n",
    "Define a `featurizer` that loads the heavy atom coordinates and load the data into memory.\n",
    "Also load the TICA object from [Notebook 04 ➜ 📓](04-msm-analysis.ipynb) to transform the featurized data.\n",
    "Further, the estimated MSM, Bayesian MSM, and Cluster objects should be loaded from disk. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "feat = #FIXME\n",
    "feat. #FIXME\n",
    "data = #FIXME\n",
    "\n",
    "tica = #FIXME\n",
    "tica_output = tica.transform(data)\n",
    "tica_concatenated = np.concatenate(tica_output)\n",
    "\n",
    "msm = #FIXME\n",
    "bayesian_msm = #FIXME\n",
    "\n",
    "cluster = #FIXME\n",
    "dtrajs_concatenated = #FIXME"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "solution2": "hidden"
   },
   "source": [
    "###### Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "feat = pyemma.coordinates.featurizer(pdb)\n",
    "pairs = feat.pairs(feat.select_Heavy())\n",
    "feat.add_distances(pairs, periodic=False)\n",
    "data = pyemma.coordinates.load(files, features=feat)\n",
    "\n",
    "tica = pyemma.load('nb4.pyemma', model_name='ala2tica_tica')\n",
    "tica_output = tica.transform(data)\n",
    "tica_concatenated = np.concatenate(tica_output)\n",
    "\n",
    "msm = pyemma.load('nb4.pyemma', model_name='ala2tica_msm')\n",
    "bayesian_msm = pyemma.load('nb4.pyemma', model_name='ala2tica_bayesian_msm')\n",
    "\n",
    "cluster = pyemma.load('nb4.pyemma', model_name='ala2tica_cluster')\n",
    "dtrajs_concatenated = np.concatenate(cluster.dtrajs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercise 2\n",
    "\n",
    "Do a PCCA++ analysis of the MSM with four metastable states,\n",
    "compute the probability of the metastable sets, and visualize the metastable state memberships."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "nstates = 4\n",
    "#FIXME (PCCA)\n",
    "\n",
    "for i, s in enumerate(msm.metastable_sets):\n",
    "    print('π_{} = {:f}'.format(i + 1, )) #FIXME\n",
    "\n",
    "    \n",
    "fig, axes = plt.subplots(1, 4, figsize=(15, 3))\n",
    "for i, ax in enumerate(axes.flat):\n",
    "    pyemma.plots.plot_contour(\n",
    "        *tica_concatenated.T,\n",
    "        msm.metastable_distributions[i][dtrajs_concatenated],\n",
    "        ax=ax,\n",
    "        cmap='afmhot_r', \n",
    "        mask=True,\n",
    "        cbar_label='metastable distribution {}'.format(i + 1))\n",
    "    ax.set_xlabel('IC 1')\n",
    "axes[0].set_ylabel('IC 2')\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "solution2": "hidden"
   },
   "source": [
    "###### Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "nstates = 4\n",
    "msm.pcca(nstates)\n",
    "\n",
    "for i, s in enumerate(msm.metastable_sets):\n",
    "    print('π_{} = {:f}'.format(i + 1, msm.pi[s].sum()))\n",
    "\n",
    "fig, axes = plt.subplots(1, 4, figsize=(15, 3))\n",
    "for i, ax in enumerate(axes.flat):\n",
    "    pyemma.plots.plot_contour(\n",
    "        *tica_concatenated.T,\n",
    "        msm.metastable_distributions[i][dtrajs_concatenated],\n",
    "        ax=ax,\n",
    "        cmap='afmhot_r', \n",
    "        mask=True,\n",
    "        cbar_label='metastable distribution {}'.format(i + 1))\n",
    "    ax.set_xlabel('IC 1')\n",
    "axes[0].set_ylabel('IC 2')\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Did you guess the metastable states correctly?\n",
    "\n",
    "Note the similarities between the MSM built from the backbone torsions and the MSM built from the TICA projection of heavy atom distances.\n",
    "Even though we started from different features, both models found the same kinetic information in the data.\n",
    "\n",
    "#### Exercise 3\n",
    "\n",
    "Compute the pairwise MFPTs and transition rates, and visualize the resulting kinetic network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "mfpt = np.zeros((nstates, nstates))\n",
    "for i in range(nstates):\n",
    "    for j in range(nstates):\n",
    "        mfpt[i, j] = #FIXME\n",
    "\n",
    "inverse_mfpt = np.zeros_like(mfpt)\n",
    "nz = mfpt.nonzero()\n",
    "inverse_mfpt[nz] = 1.0 / mfpt[nz]\n",
    "\n",
    "pyemma.plots.plot_network(\n",
    "    inverse_mfpt,\n",
    "    pos=np.asarray([[0, 0], [4, 0], [2, 4], [6, 4]]),\n",
    "    arrow_label_format='%.1f ps',\n",
    "    arrow_labels=mfpt,\n",
    "    arrow_scale=3.0,\n",
    "    state_labels=range(1, nstates + 1),\n",
    "    size=12);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "solution2": "hidden"
   },
   "source": [
    "###### Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "mfpt = np.zeros((nstates, nstates))\n",
    "for i in range(nstates):\n",
    "    for j in range(nstates):\n",
    "        mfpt[i, j] = msm.mfpt(\n",
    "            msm.metastable_sets[i],\n",
    "            msm.metastable_sets[j])\n",
    "\n",
    "inverse_mfpt = np.zeros_like(mfpt)\n",
    "nz = mfpt.nonzero()\n",
    "inverse_mfpt[nz] = 1.0 / mfpt[nz]\n",
    "\n",
    "pyemma.plots.plot_network(\n",
    "    inverse_mfpt,\n",
    "    pos=np.asarray([[0, 0], [4, 0], [2, 4], [6, 4]]),\n",
    "    arrow_label_format='%.1f ps',\n",
    "    arrow_labels=mfpt,\n",
    "    arrow_scale=3.0,\n",
    "    state_labels=range(1, nstates + 1),\n",
    "    size=12);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Exercise 4\n",
    "Compute the TPT object, coarse grain it onto the PCCA++ metastable sets and visualize the flux along with the committor probabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden",
    "solution2_first": true
   },
   "outputs": [],
   "source": [
    "A = msm.metastable_sets[0]\n",
    "B = msm.metastable_sets[3]\n",
    "flux = #FIXME\n",
    "\n",
    "cg, cgflux = #FIXME\n",
    "\n",
    "highest_membership = msm.metastable_distributions.argmax(1)\n",
    "coarse_state_centers = cluster.clustercenters[msm.active_set[highest_membership]]\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10, 7))\n",
    "\n",
    "pyemma.plots.plot_contour(\n",
    "    *tica_concatenated.T,\n",
    "    flux.committor[dtrajs_concatenated],\n",
    "    cmap='brg',\n",
    "    ax=ax,\n",
    "    mask=True,\n",
    "    cbar_label=r'committor 1 $\\to$ 4',\n",
    "    alpha=0.8,\n",
    "    zorder=-1)\n",
    "\n",
    "pyemma.plots.plot_flux(\n",
    "    cgflux,\n",
    "    coarse_state_centers,\n",
    "    cgflux.stationary_distribution,\n",
    "    ax=ax,\n",
    "    show_committor=False,\n",
    "    figpadding=0.2,\n",
    "    state_labels=['A', '', '', 'B'],\n",
    "    arrow_label_format='%2.e / ps')\n",
    "\n",
    "ax.set_aspect('equal')\n",
    "ax.set_xlim(tica_concatenated[:, 0].min(), tica_concatenated[:, 0].max())\n",
    "ax.set_ylim(tica_concatenated[:, 1].min(), tica_concatenated[:, 1].max())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "solution2": "hidden"
   },
   "source": [
    "###### Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "A = msm.metastable_sets[0]\n",
    "B = msm.metastable_sets[3]\n",
    "flux = pyemma.msm.tpt(msm, A, B)\n",
    "\n",
    "cg, cgflux = flux.coarse_grain(msm.metastable_sets)\n",
    "\n",
    "highest_membership = msm.metastable_distributions.argmax(1)\n",
    "coarse_state_centers = cluster.clustercenters[msm.active_set[highest_membership]]\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(10, 7))\n",
    "\n",
    "pyemma.plots.plot_contour(\n",
    "    *tica_concatenated.T,\n",
    "    flux.committor[dtrajs_concatenated],\n",
    "    cmap='brg',\n",
    "    ax=ax,\n",
    "    mask=True,\n",
    "    cbar_label=r'committor 1 $\\to$ 4',\n",
    "    zorder=-1)\n",
    "\n",
    "pyemma.plots.plot_flux(\n",
    "    cgflux,\n",
    "    coarse_state_centers,\n",
    "    cgflux.stationary_distribution,\n",
    "    ax=ax,\n",
    "    show_committor=False,\n",
    "    figpadding=0.2,\n",
    "    state_labels=['A', '', '', 'B'],\n",
    "    arrow_label_format='%2.e / ps')\n",
    "\n",
    "ax.set_xlabel('IC 1')\n",
    "ax.set_ylabel('IC 2')\n",
    "ax.set_aspect('equal')\n",
    "ax.set_xlim(tica_concatenated[:, 0].min(), tica_concatenated[:, 0].max())\n",
    "ax.set_ylim(tica_concatenated[:, 1].min(), tica_concatenated[:, 1].max())\n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrapping up\n",
    "In this notebook, we have learned how to use PCCA++ using an existing MSM and how to extract kinetic information from the model.\n",
    "In detail, we have used\n",
    "- the `pcca()` method of an MSM object to find metastable states,\n",
    "- the `mfpt()` method of an MSM object to compute mean first passage times between metastable states which, in turn, are accessible via\n",
    "- the `metastable_sets` and `metastable_assignments` attributes of an MSM object.\n",
    "\n",
    "For visualizing MSMs or kinetic networks we used\n",
    "- `pyemma.plots.plot_density()`, `pyemma.plots.plot_contour()` and\n",
    "- `pyemma.plots.plot_network()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "<a id=\"cite-pcca_plus_plus\"/><sup><a href=#ref-1>[^]</a><a href=#ref-4>[^]</a></sup>Susanna Röblitz and Marcus Weber. 2013. _Fuzzy spectral clustering by PCCA+: application to Markov state models and data classification_. [URL](https://doi.org/10.1007/s11634-013-0134-6)\n",
    "\n",
    "<a id=\"cite-weinan-tpt\"/><sup><a href=#ref-2>[^]</a></sup>Weinan E. and Eric Vanden-Eijnden. 2006. _Towards a Theory of Transition Paths_. [URL](https://doi.org/10.1007/s10955-005-9003-9)\n",
    "\n",
    "<a id=\"cite-metzner-msm-tpt\"/><sup><a href=#ref-3>[^]</a></sup>Philipp Metzner and Christof Schütte and Eric Vanden-Eijnden. 2009. _Transition Path Theory for Markov Jump Processes_. [URL](https://doi.org/10.1137/070699500)\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}