pyemma.coordinates.data.FeatureReader

class pyemma.coordinates.data.FeatureReader(trajectories, topologyfile=None, chunksize=100, featurizer=None)

Reads features from MD data.

To select a feature, access the featurizer and call one of its feature-selecting methods (e.g. add_distances).

Parameters:

trajectories (list of strings) – paths to trajectory files
topologyfile (string) – path to topology file (e.g. pdb)

Examples

>>> from pyemma.coordinates.data import FeatureReader
>>> from pyemma.datasets import get_bpti_test_data

Create a reader for iterator access:

>>> data = get_bpti_test_data()
>>> reader = FeatureReader(data['trajs'], data['top'])

Optionally set a chunksize:

>>> reader.chunksize = 300

Store chunks by their trajectory index:

>>> chunks = {i : [] for i in range(reader.number_of_trajectories())}
>>> for itraj, X in reader:
...     chunks[itraj].append(X)

Compute some distances within the protein during feature reading:

>>> reader.featurizer.add_distances([[0, 3], [10, 15]])
>>> X = reader.get_output()

__init__(trajectories, topologyfile=None, chunksize=100, featurizer=None)

Methods

`__init__`(trajectories[, topologyfile, ...])
`describe`()	Returns a description of this transformer
`dimension`()	Returns the number of output dimensions
`fit`(X, **kwargs)	For compatibility with sklearn
`fit_transform`(X, **kwargs)	For compatibility with sklearn
`get_output`([dimensions, stride])	Maps all input data of this transformer and returns it as an array or list of arrays.
`iterator`([stride, lag])	Returns an iterator that allows access to the transformed data.
`map`(X)	Deprecated: use transform(X)
`n_frames_total`([stride])	Returns the total number of frames, over all trajectories
`number_of_trajectories`()	Returns the number of trajectories
`output_type`()	By default transformers return single precision floats.
`parametrize`([stride])
`register_progress_callback`(call_back[, stage])	Registers the progress reporter.
`trajectory_length`(itraj[, stride])	Returns the length of a trajectory
`trajectory_lengths`([stride])	Returns the length of each trajectory
`transform`(X)

Attributes

`chunksize`	chunksize defines how much data is being processed at once.
`data_producer`	where the transformer obtains its data.
`in_memory`	are results stored in memory?
`logger`	The logger for this class instance
`name`	The name of this instance
`ntraj`

chunksize: chunksize defines how much data is being processed at once.

data_producer: where the transformer obtains its data.

describe()

Returns a description of this transformer

Returns:

dimension()

Returns the number of output dimensions

Returns:

fit(X, **kwargs): For compatibility with sklearn

fit_transform(X, **kwargs): For compatibility with sklearn

get_output(dimensions=slice(0, None, None), stride=1)

Maps all input data of this transformer and returns it as an array or list of arrays.

Parameters:

dimensions (list-like of indexes or slice) – indices of the dimensions to keep, default = all
stride (int) – only take every n’th frame, default = 1

Returns:	output – the mapped data, where T is the number of time steps of the input data or, if stride > 1, floor(T_in / stride), and d is the output dimension of this transformer. If the input consists of a list of trajectories, the output will also be a corresponding list of trajectories.
Return type:	ndarray(T, d) or list of ndarray(T_i, d)

Notes

This function may be RAM intensive if stride is too small or too many dimensions are selected.
If the in_memory attribute is True, the results of this method are cached.
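
How dimensions and stride shrink the returned data can be sketched with nested lists. This is only an illustration under stated assumptions, not PyEMMA's implementation: the helper name `select_output` is hypothetical, plain lists stand in for ndarrays, and the stride is assumed to behave like ordinary Python slicing.

```python
def select_output(traj, dimensions=slice(0, None), stride=1):
    """Keep every `stride`-th frame and only the requested dimensions.

    Hypothetical helper for illustration only; not part of PyEMMA.
    """
    rows = traj[::stride]
    if isinstance(dimensions, slice):
        return [row[dimensions] for row in rows]
    # list-like of indexes
    return [[row[d] for d in dimensions] for row in rows]


# Toy trajectory: 6 frames, 3 output dimensions per frame.
traj = [[t, 10 * t, 100 * t] for t in range(6)]

full = select_output(traj)                                # 6 frames x 3 dims
reduced = select_output(traj, dimensions=(0, 2), stride=2)
print(reduced)  # [[0, 0], [2, 200], [4, 400]]
```

Here the reduced output holds 3 x 2 values instead of 6 x 3, which is why a larger stride and fewer dimensions lower the memory footprint.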

Example

Plotting trajectories:

>>> import pyemma.coordinates as coor 
>>> import matplotlib.pyplot as plt 

Create a TICA transformer (fill it with some actual data!):

>>> tica = coor.tica() 
>>> trajs = tica.get_output(dimensions=(0,), stride=100) 
>>> for traj in trajs: 
...     plt.figure() 
...     plt.plot(traj[:, 0]) 

in_memory: are results stored in memory?

iterator(stride=1, lag=0)

Returns an iterator that allows access to the transformed data.

Parameters:

stride (int) – Only transform every N’th frame, default = 1
lag (int) – Configure the iterator such that it will return time-lagged data with a lag time of lag. If lag is used together with stride the operation will work as if the striding operation is applied before the time-lagged trajectory is shifted by lag steps. Therefore the effective lag time will be stride*lag.
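
The interplay of stride and lag described above can be illustrated with plain Python lists. The following is a minimal sketch, not PyEMMA's implementation: the helper name `lagged_pairs` is hypothetical, and it assumes that the stride is applied by ordinary slicing before the lagged copy is shifted by lag steps.

```python
def lagged_pairs(frames, stride=1, lag=0):
    """Pair each strided frame with its time-lagged partner.

    Hypothetical helper for illustration only; not part of PyEMMA.
    """
    strided = frames[::stride]   # striding is applied first
    lagged = strided[lag:]       # then the strided data is shifted by `lag`
    # zip stops at the shorter sequence, so frames without a partner are dropped
    return list(zip(strided, lagged))


# Use the original frame indices as "data" so the time offsets are visible.
frames = list(range(20))
pairs = lagged_pairs(frames, stride=2, lag=3)

# Every pair is separated by stride * lag = 6 original frames.
offsets = {y - x for x, y in pairs}
print(pairs[:3])  # [(0, 6), (2, 8), (4, 10)]
print(offsets)    # {6}
```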

Returns:

iterator – If lag = 0, a call to the .next() method of this iterator will return the pair (itraj, X) : (int, ndarray(n, m)), where itraj corresponds to the input sequence number (e.g. trajectory index) and X is the transformed data; n = chunksize, or n < chunksize at the end of the input.

If lag > 0, a call to the .next() method of this iterator will return the tuple (itraj, X, Y) : (int, ndarray(n, m), ndarray(p, m)), where itraj and X are the same as above and Y contains the time-lagged data.

Return type:

a TransformerIterator
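
The chunk-wise access pattern for lag = 0 can be sketched without PyEMMA. This is an illustration under stated assumptions, not the library's TransformerIterator: the generator name `iter_chunks` is hypothetical, and plain lists stand in for the ndarrays.

```python
def iter_chunks(trajectories, chunksize=100):
    """Yield (itraj, X) pairs, where X holds at most `chunksize` frames.

    Hypothetical stand-in for illustration only; not PyEMMA's iterator.
    """
    for itraj, traj in enumerate(trajectories):
        for start in range(0, len(traj), chunksize):
            yield itraj, traj[start:start + chunksize]


# Two toy "trajectories" with 250 and 120 frames.
trajectories = [list(range(250)), list(range(120))]

chunk_sizes = {}
for itraj, X in iter_chunks(trajectories, chunksize=100):
    chunk_sizes.setdefault(itraj, []).append(len(X))

# The last chunk of each trajectory is shorter than chunksize.
print(chunk_sizes)  # {0: [100, 100, 50], 1: [100, 20]}
```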

logger: The logger for this class instance

map(X)

Deprecated: use transform(X)

Maps the input data through the transformer to correspondingly shaped output data array/list.

n_frames_total(stride=1)

Returns the total number of frames, over all trajectories

Parameters:	stride – return value is the number of frames in trajectories when running through them with a step size of stride
Returns:	the total number of frames, over all trajectories

name: The name of this instance

number_of_trajectories()

Returns the number of trajectories

Returns:	number of trajectories

output_type(): By default transformers return single precision floats.

register_progress_callback(call_back, stage=0)

Registers the progress reporter.

Parameters:

call_back (function) – This function will be called with the following arguments: stage (int); an instance of pyemma.utils.progressbar.ProgressBar; optional args and named keywords (kw), for future changes.
stage (int, optional, default=0) – The stage at which the given callback function should be fired.
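
A callback compatible with the argument list above might look like the following sketch. The function name `report_progress`, the `seen_stages` list and the stand-in object passed in place of a ProgressBar are hypothetical; no attributes of pyemma.utils.progressbar.ProgressBar are assumed here.

```python
seen_stages = []


def report_progress(stage, progressbar, *args, **kw):
    """Record the stage for which the callback fired.

    Accepts extra positional and keyword arguments so that it keeps
    working if more arguments are passed in the future.
    """
    seen_stages.append(stage)


# Simulate the calls a reader would make; `object()` stands in for the
# ProgressBar instance, whose interface is not assumed here.
report_progress(0, object())
report_progress(0, object(), "extra", detail=True)
print(seen_stages)  # [0, 0]
```

With a real reader, such a function would be passed as reader.register_progress_callback(report_progress, stage=0).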

trajectory_length(itraj, stride=1)

Returns the length of a trajectory

Parameters:

itraj – trajectory index
stride – return value is the number of frames in the trajectory when running through it with a step size of stride

Returns:	length of the trajectory

trajectory_lengths(stride=1)

Returns the length of each trajectory

Parameters:	stride – return value is the number of frames in trajectories when running through them with a step size of stride
Returns:	numpy array containing length of each trajectory
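
How stride affects the reported lengths can be sketched as follows. This is an illustration, not PyEMMA's code: it assumes that frames are visited at indices 0, stride, 2*stride, ... (ordinary Python slicing), and the helper names are hypothetical. The library's exact rounding at the end of a trajectory may differ.

```python
def strided_length(n_frames, stride=1):
    """Number of frames visited at indices 0, stride, 2*stride, ..."""
    return len(range(0, n_frames, stride))


def strided_lengths(raw_lengths, stride=1):
    """Per-trajectory frame counts for a given stride."""
    return [strided_length(n, stride) for n in raw_lengths]


raw_lengths = [1000, 400, 250]   # toy trajectory lengths in frames

lengths = strided_lengths(raw_lengths, stride=10)
total = sum(lengths)             # analogue of n_frames_total(stride=10)
print(lengths, total)            # [100, 40, 25] 165
```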