pyemma.coordinates.transform.TICA

class pyemma.coordinates.transform.TICA(*args, **kwargs)

Time-lagged independent component analysis (TICA)

__init__(lag, dim=-1, var_cutoff=0.95, kinetic_map=True, commute_map=False, epsilon=1e-06, stride=1, skip=0, reversible=True, weights=None, ncov_max=inf)

Time-lagged independent component analysis (TICA) [1], [2], [3].

Parameters

lag (int) – lag time
dim (int, optional, default -1) – Maximum number of significant independent components to use to reduce dimension of input data. -1 means all numerically available dimensions (see epsilon) will be used unless reduced by var_cutoff. Setting dim to a positive value is exclusive with var_cutoff.
var_cutoff (float in the range [0,1], optional, default 0.95) – Determines the number of output dimensions by including dimensions until their cumulative kinetic variance exceeds the fraction var_cutoff. var_cutoff=1.0 means all numerically available dimensions (see epsilon) will be used, unless set by dim. Setting var_cutoff smaller than 1.0 is exclusive with dim.
kinetic_map (bool, optional, default True) – Eigenvectors will be scaled by eigenvalues. As a result, Euclidean distances in the transformed data approximate kinetic distances [4]. This is a good choice when the data is further processed by clustering.
commute_map (bool, optional, default False) – Eigenvector_i will be scaled by sqrt(timescale_i / 2). As a result, Euclidean distances in the transformed data will approximate commute distances [5].
epsilon (float, optional, default 1e-06) – Eigenvalue norm cutoff. Eigenvalues of C0 with norms <= epsilon will be cut off. The number of remaining eigenvalues defines the size of the output.
stride (int, optional, default = 1) – Use only every stride-th time step. By default, every time step is used.
skip (int, default=0) – Number of initial frames to skip in each trajectory.
reversible (bool, default=True) – symmetrize correlation matrices C_0, C_{tau}.
weights (object or list of ndarrays, optional, default = None) –
- An object that computes re-weighting factors for estimating equilibrium means and correlations from off-equilibrium data. The only requirement is that it has a method weights(X) that accepts a trajectory X (np.ndarray(T, n)) and returns a vector of re-weighting factors (np.ndarray(T,)).
- A list of ndarrays (ndim=1) that specifies the weights for each frame of each trajectory.
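The two accepted forms of weights can be sketched as follows. Only the interface (a weights(X) method returning one factor per frame, or a list of 1-D arrays) comes from the description above. The class name ExponentialBiasWeights, its beta parameter, and the weighting formula are hypothetical and chosen purely for illustration.

```python
import numpy as np


class ExponentialBiasWeights:
    """Hypothetical re-weighting object (illustration only).

    It satisfies the documented interface: a method weights(X) that takes
    a trajectory of shape (T, n) and returns an array of shape (T,).
    """

    def __init__(self, beta=1.0):
        self.beta = beta  # made-up strength parameter

    def weights(self, X):
        X = np.asarray(X, dtype=float)
        # Made-up rule: weight each frame by exp(beta * first feature).
        return np.exp(self.beta * X[:, 0])


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Form 1: an object exposing weights(X)
w_obj = ExponentialBiasWeights(beta=0.5)
w = w_obj.weights(X)  # one re-weighting factor per frame

# Form 2: a list of 1-D arrays, one array per trajectory
trajs = [rng.normal(size=(50, 3)), rng.normal(size=(80, 3))]
w_list = [w_obj.weights(traj) for traj in trajs]
```

Either w_obj or w_list would then be passed as the weights argument, with w_list requiring one entry per trajectory and one value per frame.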

Notes

Given a sequence of multivariate data \(X_t\), computes the mean-free covariance and time-lagged covariance matrix:

\[\begin{split}C_0 &= (X_t - \mu)^T (X_t - \mu) \\ C_{\tau} &= (X_t - \mu)^T (X_{t + \tau} - \mu)\end{split}\]

and solves the eigenvalue problem

\[C_{\tau} r_i = C_0 \lambda_i(\tau) r_i,\]

where \(r_i\) are the independent components and \(\lambda_i(\tau)\) are their respective normalized time-autocorrelations. The eigenvalues are related to the relaxation timescale by

\[t_i(\tau) = -\tau / \ln |\lambda_i(\tau)|.\]

When used as a dimension reduction method, the input data is projected onto the dominant independent components.
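The steps in these notes can be sketched in plain NumPy for a single trajectory. This is a minimal illustration under stated assumptions, not PyEMMA's implementation: the function name tica_sketch is hypothetical, the covariances are symmetrized as described for reversible=True, and the generalized eigenvalue problem is solved by whitening with C_0, discarding directions whose C_0 eigenvalue is at most epsilon.

```python
import numpy as np


def tica_sketch(X, lag, epsilon=1e-6):
    """Hypothetical single-trajectory TICA sketch (not the PyEMMA code)."""
    X = np.asarray(X, dtype=float)
    A, B = X[:-lag], X[lag:]                  # instantaneous / time-lagged frames
    mu = 0.5 * (A.mean(axis=0) + B.mean(axis=0))
    A, B = A - mu, B - mu
    n = A.shape[0]
    # Symmetrized covariance and time-lagged covariance estimates
    C0 = (A.T @ A + B.T @ B) / (2 * n)
    Ct = (A.T @ B + B.T @ A) / (2 * n)
    # Whiten with C0, dropping directions with eigenvalue <= epsilon
    s, U = np.linalg.eigh(C0)
    keep = s > epsilon
    L = U[:, keep] / np.sqrt(s[keep])
    # Ordinary symmetric eigenproblem in the whitened space
    lam, V = np.linalg.eigh(L.T @ Ct @ L)
    order = np.argsort(lam)[::-1]             # slowest components first
    lam, R = lam[order], L @ V[:, order]
    with np.errstate(divide="ignore"):
        timescales = -lag / np.log(np.abs(lam))
    return mu, lam, R, timescales


# Synthetic data: one slow AR(1) coordinate mixed with white noise
rng = np.random.default_rng(0)
T = 20000
slow = np.zeros(T)
for t in range(1, T):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
fast = rng.normal(size=T)
theta = 0.3
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
X = np.column_stack([slow, fast]) @ mix

mu, lam, R, timescales = tica_sketch(X, lag=10)
Y = (X - mu) @ R   # projection onto the independent components
```

In this synthetic example the first column of Y should recover the slow coordinate up to sign and scale, and its eigenvalue should be close to the lag-10 autocorrelation of the AR(1) process.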

References

[1] Perez-Hernandez G, F Paul, T Giorgino, G De Fabritiis and F Noe. 2013. Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 139, 015102. doi:10.1063/1.4811489
[2] Schwantes C, V S Pande. 2013. Improvements in Markov State Model Construction Reveal Many Non-Native Interactions in the Folding of NTL9. J. Chem. Theory. Comput. 9, 2000-2009. doi:10.1021/ct300878a
[3] L. Molgedey and H. G. Schuster. 1994. Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett. 72, 3634.
[4] Noe, F. and Clementi, C. 2015. Kinetic distance and kinetic maps from molecular dynamics simulation. J. Chem. Theory. Comput. doi:10.1021/acs.jctc.5b00553
[5] Noe, F., Banisch, R., Clementi, C. 2016. Commute maps: separating slowly-mixing molecular configurations for kinetic modeling. J. Chem. Theory. Comput. doi:10.1021/acs.jctc.6b00762

Methods

`_Loggable__create_logger`()
`_SerializableMixIn__interpolate`(state, klass)
`__delattr__`(name, /)	Implement delattr(self, name).
`__dir__`()	Default dir() implementation.
`__eq__`(value, /)	Return self==value.
`__format__`(format_spec, /)	Default object formatter.
`__ge__`(value, /)	Return self>=value.
`__getattribute__`(name, /)	Return getattr(self, name).
`__getstate__`()
`__gt__`(value, /)	Return self>value.
`__hash__`()	Return hash(self).
`__init__`(lag[, dim, var_cutoff, …])	Time-lagged independent component analysis (TICA) [1], [2], [3].
`__init_subclass__`(*args, **kwargs)	This method is called when a class is subclassed.
`__iter__`()
`__le__`(value, /)	Return self<=value.
`__lt__`(value, /)	Return self<value.
`__my_getstate__`()
`__my_setstate__`(state)
`__ne__`(value, /)	Return self!=value.
`__new__`(cls, *args, **kwargs)	Create and return a new object.
`__reduce__`()	Helper for pickle.
`__reduce_ex__`(protocol, /)	Helper for pickle.
`__repr__`()	Return repr(self).
`__setattr__`(name, value, /)	Implement setattr(self, name, value).
`__setstate__`(state)
`__sizeof__`()	Size of object in memory, in bytes.
`__str__`()	Return str(self).
`__subclasshook__`	Abstract classes can override this to customize issubclass().
`_check_estimated`()
`_chunk_finite`(data)
`_cleanup_logger`(logger_id, logger_name)
`_clear_in_memory`()
`_compute_default_cs`(dim, itemsize[, logger])
`_create_iterator`([skip, chunk, stride, …])	Should be implemented by non-abstract subclasses.
`_data_flow_chain`()	Get a list of all elements in the data flow graph.
`_diagonalize`()
`_estimate`(iterable, **kw)
`_get_classes_to_inspect`()	gets classes self derives from which 1.
`_get_interpolation_map`(cls)
`_get_param_names`()	Get parameter names for the estimator
`_get_private_field`(cls, name[, default])
`_get_serialize_fields`(cls)
`_get_state_of_serializeable_fields`(klass, state)	:return a dictionary {k:v} for k in self.serialize_fields and v=getattr(self, k)
`_get_traj_info`(filename)
`_get_version`(cls[, require])
`_get_version_for_class_from_state`(state, klass)	retrieves the version of the current klass from the state mapping from old locations to new ones.
`_logger_is_active`(level)	@param level: int log level (debug=10, info=20, warn=30, error=40, critical=50)
`_map_to_memory`([stride])	Maps results to memory.
`_set_random_access_strategies`()
`_set_state_from_serializeable_fields_and_state`(…)	set only fields from state, which are present in klass.__serialize_fields
`_source_from_memory`([data_producer])
`_transform_array`(X)	Projects the data onto the dominant independent components.
`describe`()	Get a descriptive string representation of this class.
`dimension`()	output dimension
`estimate`(X, **kwargs)	Chunk-based parameterization of TICA.
`fit`(X[, y])	Estimates parameters - for compatibility with sklearn.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_output`([dimensions, stride, skip, chunk])	Maps all input data of this transformer and returns it as an array or list of arrays
`get_params`([deep])	Get parameters for this estimator.
`iterator`([stride, lag, chunk, …])	creates an iterator to stream over the (transformed) data.
`load`(file_name[, model_name])	Loads a previously saved PyEMMA object from disk.
`n_chunks`(chunksize[, stride, skip])	how many chunks an iterator of this source will output, starting (eg.
`n_frames_total`([stride, skip])	Returns total number of frames.
`number_of_trajectories`([stride])	Returns the number of trajectories.
`output_type`()	By default transformers return single precision floats.
`partial_fit`(X)	incrementally update the covariances and mean.
`save`(file_name[, model_name, overwrite, …])	saves the current state of this object to given file and name.
`set_params`(**params)	Set the parameters of this estimator.
`trajectory_length`(itraj[, stride, skip])	Returns the length of trajectory of the requested index.
`trajectory_lengths`([stride, skip])	Returns the length of each trajectory.
`transform`(X)	Maps the input data through the transformer to correspondingly shaped output data array/list.
`write_to_csv`([filename, extension, …])	write all data to csv with numpy.savetxt
`write_to_hdf5`(filename[, group, …])	writes all data of this Iterable to a given HDF5 file.

Attributes

`_DEFAULT_VARIANCE_CUTOFF`
`_DataSource__serialize_fields`
`_Estimator__serialize_fields`
`_FALLBACK_CHUNKSIZE`
`_InMemoryMixin__serialize_fields`
`_InMemoryMixin__serialize_version`
`_Loggable__ids`
`_Loggable__refs`
`_SerializableMixIn__serialize_fields`
`_SerializableMixIn__serialize_modifications_map`
`_SerializableMixIn__serialize_version`
`_TICA__serialize_version`
`__abstractmethods__`
`__dict__`
`__doc__`
`__module__`
`__weakref__`	list of weak references to the object (if defined)
`_abc_impl`
`_estimated`
`_loglevel_CRITICAL`
`_loglevel_DEBUG`
`_loglevel_ERROR`
`_loglevel_INFO`
`_loglevel_WARN`
`_save_data_producer`
`_serialize_version`
`chunksize`	chunksize defines how much data is being processed at once.
`cov`	covariance matrix of input data.
`cov_tau`	covariance matrix of time-lagged input data.
`cumvar`	Cumulative sum of the TICA eigenvalues
`data_producer`	The data producer for this data source object (can be another data source object).
`default_chunksize`	How much data will be processed at once, in case no chunksize has been provided.
`dim`	output dimension (input parameter).
`eigenvalues`	Eigenvalues of the TICA problem (usually denoted \(\lambda\))
`eigenvectors`	Eigenvectors of the TICA problem, columnwise
`feature_TIC_correlation`	Instantaneous correlation matrix between mean-free input features and TICs
`filenames`	list of file names the data is originally being read from.
`in_memory`	are results stored in memory?
`is_random_accessible`	Check if self._is_random_accessible is set to true and if all the random access strategies are implemented.
`is_reader`	Property telling if this data source is a reader or not.
`lag`	lag time of correlation matrix \(C_{\tau}\)
`logger`	The logger for this class instance
`mean`	mean of input features
`model`	The model estimated by this Estimator
`name`	The name of this instance
`ndim`
`ntraj`
`ra_itraj_cuboid`	Implementation of random access with slicing that can be up to 3-dimensional, where the first dimension corresponds to the trajectory index, the second dimension corresponds to the frames and the third dimension corresponds to the dimensions of the frames.
`ra_itraj_jagged`	Behaves like ra_itraj_cuboid just that the trajectories are not truncated and returned as a list.
`ra_itraj_linear`	Implementation of random access that takes arguments as the default random access (i.e., up to three dimensions with trajs, frames and dims, respectively), but which considers the frame indexing to be contiguous.
`ra_linear`	Implementation of random access that takes a (maximal) two-dimensional slice where the first component corresponds to the frames and the second component corresponds to the dimensions.
`timescales`	Implied timescales of the TICA transformation
`var_cutoff`	Kinetic variance cutoff
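
How var_cutoff, cumvar, and the output dimension relate can be sketched as below. This rests on an assumption not stated on this page: that the kinetic variance carried by a component is its squared eigenvalue, so the cumulative kinetic variance is the normalized cumulative sum of squared eigenvalues. The helper select_dimension is hypothetical and not part of PyEMMA.

```python
import numpy as np


def select_dimension(eigenvalues, var_cutoff=0.95):
    """Hypothetical helper: smallest number of leading components whose
    cumulative kinetic variance reaches var_cutoff.

    Assumes eigenvalues are sorted in descending order and that kinetic
    variance is measured by squared eigenvalues.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    cumvar = np.cumsum(lam ** 2)
    cumvar /= cumvar[-1]              # normalize so the last entry is 1.0
    if var_cutoff >= 1.0:
        return lam.size, cumvar       # keep every available dimension
    dim = int(np.searchsorted(cumvar, var_cutoff)) + 1
    return min(dim, lam.size), cumvar


lam_example = [0.9, 0.5, 0.2, 0.1]
dim, cumvar = select_dimension(lam_example, var_cutoff=0.95)
# Squared eigenvalues are 0.81, 0.25, 0.04, 0.01 (sum 1.11), so the first
# two components carry 1.06 / 1.11 of the total, about 0.955, and dim == 2.
```

With var_cutoff=1.0 the same helper returns all four dimensions, mirroring the documented behaviour that 1.0 keeps every numerically available dimension.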