MSM-analysis for alanine-dipeptide
==================================

This notebook collects usage examples for MSM-analysis using members of
the pyemma.msm.analysis package.

A given MSM, estimated from alanine-dipeptide simulation data at lagtime
:math:`\tau=6ps`, is used as an example to carry out analysis.

The necessary inputs are:

1. the transition matrix, 'T.dat'
2. the centers of the :math:`(\phi, \psi)` dihedral angle space regular
   grid discretization, 'grid\_centers20x20.dat'
3. the largest set of connected microstates, 'lcc.dat'

Auxiliary functions in 'plotting.py' are used to generate figures of the
estimated quantities.

Use ipythons magic % commands to activate plotting within notebook cells
------------------------------------------------------------------------

.. code:: python

    %matplotlib inline

Imports are ordered as
----------------------

1. standard library imports
2. third party imports
3. local application/library specific imports

.. code:: python

    import numpy as np

.. code:: python

    from pyemma.msm.io import read_matrix

.. code:: python

    from pyemma.msm.analysis import stationary_distribution, eigenvectors, eigenvalues, timescales, pcca

.. code:: python

    import plotting

Load necessary input data
-------------------------

Use pyemma.msm.io.read\_matrix function to read dense arrays from ascii
files. The returned objects will be dense arrays (numpy.ndarray).

.. code:: python

    T=read_matrix('T.dat')

This notebook collects usage examples for MSM-analysis using members of
the pyemma.msm.analysis package.

Starting from

.. code:: python

    centers=read_matrix('grid_centers20x20.dat')

The optional dtype (data-type) keyword allows you to specify the type of
the read data. The default value is dtype=float.

.. code:: python

    lcc=read_matrix('lcc.dat', dtype=int)

Use the integer values given by the largest connected set as indices to
"slice" the array of grid-center points. The returned array contains
only those centers corresponding to the mircrostates in the largest
connected set.

.. code:: python

    centers=centers[lcc, :]

Compute the stationary distribution using the
pyemma.msm.analysis.stationary\_distribution method.

.. code:: python

    pi=stationary_distribution(T)

The (centers, pi) tuple is fed into an adapted plotting subroutine
producing a contour plot from the scattered data. Since scatterd data
can not directly be used to produce a contour plot over the whole
:math:`(\phi, \psi)`-plane the given data is interpolated onto a regular
grid before producing a contour plot. Some of the strange-looking low
probability iso-lines may be artefacts of the interpolation.
Interpolation on the level of free energies is probably a better idea.

.. code:: python

    plotting.stationary_distribution(centers, pi)


.. image:: analysis_files/analysis_19_0.png


For :math:`T=300K` we have :math:`\beta=0.4 \frac{mol}{kJ}`. The
free-energy is defined as :math:`A_i=-\frac{1}{\beta} \log \pi_i`

.. code:: python

    A=-1.0/0.4*np.log(pi)

Since we can only estimate free-energy differences we set the
:math:`\min{A_i}=0`

.. code:: python

    A=A-A.min()

For plotting we chose the equally spaced contour levels in the interval
:math:`[0, 30] \frac{kJ}{mol}`. For the interpolation onto a regular
grid we chose cubic splines. Grid points that lie outside of the
convex-hull of the given center points are assigned the maximum value of
:math:`A`.

The plot shows the separation of the dihedral plane into a low
(free-)energy region $ 0 $ and the region of high free-energy
:math:`\phi >0`.

-  The low lying region contains three metastable sets seperated by a
   barrier of approximately :math:`4 \frac{kJ}{mol}` and approximately
   :math:`10 \frac{kJ}{mol}`.

-  There are two metastable sets in the high energy region seperated by
   a barrier of approximately :math:`7 \frac{kJ}{mol}`.

-  The barrier between the low energy and the high energy region is
   approximately :math:`23 \frac{kJ}{mol}`.

.. code:: python

    plotting.free_energy(centers, A, levels=np.linspace(0.0, 30.0, 10), method='cubic', fill_value=A.max())


.. image:: analysis_files/analysis_25_0.png


Eigenvectors
------------

We compute the right eigenvectors corresponding to the 4 largest
eigenvalues.

.. code:: python

    R=eigenvectors(T, k=4)

The first eigenvector shows a sign change from the most stable region
with :math:`\phi \leq 0` to the :math:`\phi>0` region. The slowest
process corresponds to a transition between the two most stable states
and the metastable regions with :math:`\phi>0`.

.. code:: python

    ev=R[:, 1].real
    plotting.eigenvector(centers, ev, levels=np.linspace(ev.min(), ev.max(), 10))


.. image:: analysis_files/analysis_30_0.png


The second eigenvector shows a sign change from :math:`\phi \leq 0` to
:math:`\phi>0`. The second slowest process is the transition between the
low-probability region :math:`\phi>0` and the high probability region
:math:`\phi \leq 0`.

.. code:: python

    ev=R[:, 2].real
    plotting.eigenvector(centers, ev, levels=np.linspace(ev.min(), ev.max(), 11), fmt='%.e')


.. image:: analysis_files/analysis_32_0.png


The third eigenvector shows the transition process between the least
probable meta-stable state and the rest of the accessible state space.

.. code:: python

    ev=R[:, 3].real
    plotting.eigenvector(centers, ev, levels=np.linspace(ev.min(), ev.max(), 11), fmt='%.e')


.. image:: analysis_files/analysis_34_0.png


Eigenvalues
-----------

Compute the 10 largest eigenvalues of the MSM

.. code:: python

    eigvals=eigenvalues(T)[0:11]


The first :math:`5` eigenvalues are purely real. The remaining
eigenvalues occur in complex-conjugate pairs. That is because :math:`T`
is a matrix with purely real entries.

.. code:: python

    eigvals


.. parsed-literal::

    array([ 1.00000000+0.j        ,  0.94808553+0.j        ,
            0.94092025+0.j        ,  0.66447475+0.j        ,
            0.38530146+0.j        ,  0.34550046+0.00929879j,
            0.34550046-0.00929879j,  0.24977533+0.25204877j,
            0.24977533-0.25204877j,  0.23257796+0.19019451j,
            0.23257796-0.19019451j])


There is a distinct gap in the spectrum betwenn the third and the fourth
eigenvalue.

.. code:: python

    plotting.eigenvalues(eigvals)


.. image:: analysis_files/analysis_40_0.png


Implied time scales
-------------------

Implied time scales are computed via msm.analysis.timescales. The
lagtime of the Markov model, :math:`\tau=6 ps`, can be specified via the
optional keyword tau. The default value is tau=1.

.. code:: python

    ts=timescales(T, k=5, tau=6)

.. code:: python

    ts


.. parsed-literal::

    array([          inf,  112.54805277,   98.52720209,   14.67859726,
              6.29109374])


PCCA
----

.. code:: python

    membership=pcca(T, 5)

.. code:: python

    membership_crisp=np.where(membership>0.4)

PCCA gives accurate memberships for the high probability region.
Assigning correct memberships for the low probability states,
:math:`\phi>0`, is problematic.

.. code:: python

    plotting.pcca(centers, membership_crisp)


.. image:: analysis_files/analysis_48_0.png


Summary
-------

The pyemma.msm.analysis module can be used to analyse an estimated
transition matrix. Starting from the transition matrix :math:`T` It is
possible to

-  compute the stationary vector :math:`\pi` to analyze the free-energy
   landscape given suitable (low-dimensional) coordinates
-  compute the right eigenvectors to investigate slowest dynamical
   processes
-  compute eigenvalues and time scales as quantitative information about
   system-dynamics

The pyemma.msm-API is designed to allow fast and flexible scripting of
the whole estimation and analysis process. There is a multitude of
functions for MSM analysis provided in the emma2.msm.analysis module.
Further functions are

-  checks for stochasticity, ergodicity, etc.
-  commitor computation
-  TPT
-  mean-first-passage time (mfpt) computations
-  fingerprint: expectation and autocorrelation
-  decompositions in eigenvalues, left, and right eigenvectors

We are happy for your feedback and suggestions. Please feel free to
contact our mailing list at emma@lists.fu-berlin.de