pyemma.coordinates.transform.PCA

class pyemma.coordinates.transform.PCA(*args, **kwargs)

Principal component analysis.

__init__(dim=-1, var_cutoff=0.95, mean=None, stride=1, skip=0)

Principal component analysis.

Given a sequence of multivariate data \(X_t\) with mean \(\mu\), PCA computes the mean-free covariance matrix

\[C = (X - \mu)^T (X - \mu)\]

and solves the eigenvalue problem

\[C r_i = \sigma_i r_i,\]

where \(r_i\) are the principal components and \(\sigma_i\) are their respective variances.

When used as a dimension reduction method, the input data is projected onto the dominant principal components.
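The definition above can be reproduced with plain NumPy. The following is a minimal sketch, not PyEMMA's implementation: the helper name pca_sketch and the synthetic data are assumptions made for illustration, and the covariance is left unnormalized to match the formula for \(C\).

```python
import numpy as np

def pca_sketch(X, dim):
    """Project X (n_frames x n_features) onto its `dim` dominant principal components.

    Hypothetical helper for illustration only; not part of PyEMMA.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                       # mean-free data
    C = Xc.T @ Xc                     # C = (X - mu)^T (X - mu), unnormalized
    sigma, R = np.linalg.eigh(C)      # eigh returns eigenvalues in ascending order
    order = np.argsort(sigma)[::-1]   # reorder so the largest variance comes first
    sigma, R = sigma[order], R[:, order]
    Y = Xc @ R[:, :dim]               # projection onto the dominant components
    return Y, sigma, R

# Synthetic data with very different variances along the three axes (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.1])
Y, sigma, R = pca_sketch(X, dim=2)
```

Here the columns of R play the role of the \(r_i\) and sigma holds the \(\sigma_i\). Because \(C\) is symmetric, a symmetric eigensolver is sufficient.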

Parameters
  • dim (int, optional, default -1) – the number of dimensions (principal components) to project onto. Transforming the data reduces the d-dimensional input to only dim dimensions such that the data preserves the maximum possible variance amongst dim-dimensional linear projections. -1 means all numerically available dimensions will be used unless reduced by var_cutoff. Setting dim to a positive value is exclusive with var_cutoff.

  • var_cutoff (float in the range [0,1], optional, default 0.95) – Determines the number of output dimensions by including dimensions until their cumulative variance exceeds the fraction var_cutoff. var_cutoff=1.0 means all numerically available dimensions will be used, unless set by dim. Setting var_cutoff smaller than 1.0 is exclusive with dim.

  • mean (ndarray, optional, default None) – Optionally pass pre-calculated means to avoid their re-computation. The shape has to match the input dimension.

  • skip (int, default 0) – Skip the first skip frames of each trajectory.
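To illustrate how var_cutoff could translate into an output dimension, the sketch below counts the leading eigenvalues needed to reach a given cumulative variance fraction. The function name n_dims_for_cutoff and the example eigenvalues are hypothetical, and the handling of an exact tie at the cutoff is an assumption that may differ from PyEMMA's behaviour.

```python
import numpy as np

def n_dims_for_cutoff(eigenvalues, var_cutoff):
    """Number of leading components whose cumulative variance fraction reaches var_cutoff.

    Hypothetical helper for illustration only; not part of PyEMMA.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    cumvar = np.cumsum(ev) / ev.sum()                         # cumulative variance fraction
    # First index at which cumvar >= var_cutoff, converted to a count.
    n = int(np.searchsorted(cumvar, var_cutoff)) + 1
    return min(n, ev.size)                                    # never exceed the available dimensions

eigenvalues = [6.0, 3.0, 0.9, 0.1]   # cumulative fractions: 0.6, 0.9, 0.99, 1.0
n_95 = n_dims_for_cutoff(eigenvalues, 0.95)
n_all = n_dims_for_cutoff(eigenvalues, 1.0)
```

With these example eigenvalues, a cutoff of 0.95 keeps three components, while a cutoff of 1.0 keeps all four.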

Methods

_Loggable__create_logger()

_SerializableMixIn__interpolate(state, klass)

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__getstate__()

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__([dim, var_cutoff, mean, stride, skip])

Principal component analysis.

__init_subclass__(*args, **kwargs)

This method is called when a class is subclassed.

__iter__()

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__my_getstate__()

__my_setstate__(state)

__ne__(value, /)

Return self!=value.

__new__(cls, *args, **kwargs)

Create and return a new object.

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__setstate__(state)

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

_check_estimated()

_chunk_finite(data)

_cleanup_logger(logger_id, logger_name)

_clear_in_memory()

_compute_default_cs(dim, itemsize[, logger])

_create_iterator([skip, chunk, stride, …])

Should be implemented by non-abstract subclasses.

_data_flow_chain()

Get a list of all elements in the data flow graph.

_diagonalize()

_estimate(iterable, **kw)

_get_classes_to_inspect()

gets classes self derives from which 1.

_get_interpolation_map(cls)

_get_param_names()

Get parameter names for the estimator

_get_private_field(cls, name[, default])

_get_serialize_fields(cls)

_get_state_of_serializeable_fields(klass, state)

Returns a dictionary {k: v} for k in self.serialize_fields and v = getattr(self, k).

_get_traj_info(filename)

_get_version(cls[, require])

_get_version_for_class_from_state(state, klass)

retrieves the version of the current klass from the state mapping from old locations to new ones.

_init_covar(partial_fit, n_chunks)

_logger_is_active(level)

level: int log level (debug=10, info=20, warn=30, error=40, critical=50)

_map_to_memory([stride])

Maps results to memory.

_set_random_access_strategies()

_set_state_from_serializeable_fields_and_state(…)

set only fields from state, which are present in klass.__serialize_fields

_source_from_memory([data_producer])

_transform_array(X)

Projects the data onto the dominant principal components.

describe()

Get a descriptive string representation of this class.

dimension()

output dimension

estimate(X, **kwargs)

Estimates the model given the data X

fit(X[, y])

Estimates parameters - for compatibility with sklearn.

fit_transform(X[, y])

Fit to data, then transform it.

get_output([dimensions, stride, skip, chunk])

Maps all input data of this transformer and returns it as an array or list of arrays

get_params([deep])

Get parameters for this estimator.

iterator([stride, lag, chunk, …])

creates an iterator to stream over the (transformed) data.

load(file_name[, model_name])

Loads a previously saved PyEMMA object from disk.

n_chunks(chunksize[, stride, skip])

how many chunks an iterator of this source will output, starting (eg.

n_frames_total([stride, skip])

Returns total number of frames.

number_of_trajectories([stride])

Returns the number of trajectories.

output_type()

By default transformers return single precision floats.

partial_fit(X)

save(file_name[, model_name, overwrite, …])

saves the current state of this object to given file and name.

set_params(**params)

Set the parameters of this estimator.

trajectory_length(itraj[, stride, skip])

Returns the length of trajectory of the requested index.

trajectory_lengths([stride, skip])

Returns the length of each trajectory.

transform(X)

Maps the input data through the transformer to correspondingly shaped output data array/list.

write_to_csv([filename, extension, …])

write all data to csv with numpy.savetxt

write_to_hdf5(filename[, group, …])

writes all data of this Iterable to a given HDF5 file.

Attributes

_DataSource__serialize_fields

_Estimator__serialize_fields

_FALLBACK_CHUNKSIZE

_InMemoryMixin__serialize_fields

_InMemoryMixin__serialize_version

_Loggable__ids

_Loggable__refs

_PCA__serialize_version

_SerializableMixIn__serialize_fields

_SerializableMixIn__serialize_modifications_map

_SerializableMixIn__serialize_version

__abstractmethods__

__dict__

__doc__

__module__

__weakref__

list of weak references to the object (if defined)

_abc_impl

_estimated

_loglevel_CRITICAL

_loglevel_DEBUG

_loglevel_ERROR

_loglevel_INFO

_loglevel_WARN

_save_data_producer

_serialize_version

chunksize

chunksize defines how much data is being processed at once.

cumvar

data_producer

The data producer for this data source object (can be another data source object).

default_chunksize

How much data will be processed at once, in case no chunksize has been provided.

eigenvalues

eigenvectors

feature_PC_correlation

Instantaneous correlation matrix between input features and PCs

filenames

list of file names the data is originally being read from.

in_memory

are results stored in memory?

is_random_accessible

Check if self._is_random_accessible is set to true and if all the random access strategies are implemented.

is_reader

Property telling if this data source is a reader or not.

logger

The logger for this class instance

mean

model

The model estimated by this Estimator

name

The name of this instance

ndim

ntraj

ra_itraj_cuboid

Implementation of random access with slicing that can be up to 3-dimensional, where the first dimension corresponds to the trajectory index, the second dimension corresponds to the frames and the third dimension corresponds to the dimensions of the frames.

ra_itraj_jagged

Behaves like ra_itraj_cuboid just that the trajectories are not truncated and returned as a list.

ra_itraj_linear

Implementation of random access that takes arguments as the default random access (i.e., up to three dimensions with trajs, frames and dims, respectively), but which considers the frame indexing to be contiguous.

ra_linear

Implementation of random access that takes a (maximal) two-dimensional slice where the first component corresponds to the frames and the second component corresponds to the dimensions.
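The feature_PC_correlation attribute listed above is described as the instantaneous correlation matrix between input features and PCs. As a rough sketch of what such a matrix can look like, the snippet below computes the ordinary Pearson correlation between each input feature and each principal component. The function name feature_pc_correlation, the synthetic data, and the choice of normalization are assumptions; PyEMMA's own attribute may be normalized differently.

```python
import numpy as np

def feature_pc_correlation(X, dim):
    """Pearson correlation between each input feature and each of the first `dim` PCs.

    Hypothetical helper for illustration only; not PyEMMA's implementation.
    """
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)                 # sample covariance of the features
    sigma, R = np.linalg.eigh(C)
    order = np.argsort(sigma)[::-1][:dim]        # indices of the dominant components
    sigma, R = sigma[order], R[:, order]
    feature_std = np.sqrt(np.diag(C))
    # cov(x_i, y_j) = r_ij * sigma_j and std(y_j) = sqrt(sigma_j), hence
    # corr(x_i, y_j) = r_ij * sqrt(sigma_j) / std(x_i).
    return R * np.sqrt(sigma) / feature_std[:, None]

# Synthetic correlated features (assumption).
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(1000, 3)) @ mixing
corr = feature_pc_correlation(X, dim=2)          # shape (n_features, dim)
```

Each row corresponds to an input feature and each column to a principal component, so large absolute entries indicate which features dominate a given component.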