pyemma.coordinates.data.MDFeaturizer¶
-
class pyemma.coordinates.data.MDFeaturizer(topfile)¶
-
__init__(topfile)¶ Extracts features from MD trajectories.
Parameters: topfile (str) – a path to a topology file (pdb etc.)
Methods
__init__(topfile) – extracts features from MD trajectories.
add_all() – Adds all atom coordinates to the feature list.
add_angles(indexes[, deg, cossin]) – Adds the list of angles to the feature list
add_backbone_torsions([selstr, deg, cossin]) – Adds all backbone phi/psi angles or the ones specified in selstr to the feature list.
add_chi1_torsions([selstr, deg, cossin]) – Adds all chi1 angles or the ones specified in selstr to the feature list.
add_contacts(indices[, indices2, threshold, ...]) – Adds the contacts to the feature list.
add_custom_feature(feature) – Adds a custom feature to the feature list.
add_custom_func(func, dim[, desc]) – adds a user defined function to extract features
add_dihedrals(indexes[, deg, cossin]) – Adds the list of dihedrals to the feature list
add_distances(indices[, periodic, indices2]) – Adds the distances between atoms to the feature list.
add_distances_ca([periodic]) – Adds the distances between all Ca’s (except for 1- and 2-neighbors) to the feature list.
add_inverse_distances(indices[, periodic, ...]) – Adds the inverse distances between atoms to the feature list.
add_minrmsd_to_ref(ref[, ref_frame, ...]) – Adds the minimum root-mean-square-deviation (minrmsd) with respect to a reference structure to the feature list.
add_selection(indexes) – Adds the coordinates of the selected atom indexes to the feature list.
angles(func)
backbone_torsions(func)
contacts(func)
describe() – Returns a list of strings, one for each feature selected, with human-readable descriptions of the features.
dimension() – current dimension due to selected features
distances(func)
distancesCa(func)
inverse_distances(func)
map(traj) – Maps an mdtraj Trajectory object to the selected output features
pairs(sel) – Creates all pairs between indexes, except for 1 and 2-neighbors
select(selstring) – Returns the indexes of atoms matching the given selection
select_Backbone() – Returns the indexes of backbone C, CA and N atoms
select_Ca() – Returns the indexes of all Ca-atoms
select_Heavy() – Returns the indexes of all heavy atoms (Mass >= 2)
-
add_all()¶ Adds all atom coordinates to the feature list. The coordinates are flattened as follows: [x1, y1, z1, x2, y2, z2, ...]
-
add_angles(indexes, deg=False, cossin=False)¶ Adds the list of angles to the feature list
Parameters: - indexes (np.ndarray, shape=(num_pairs, 3), dtype=int) – an array with triplets of atom indices
- deg (bool, optional, default = False) – If False (default), angles will be computed in radians. If True, angles will be computed in degrees.
- cossin (bool, optional, default = False) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means in that space (e.g. TICA/PCA, clustering).
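The reason the cossin option helps with means can be sketched with plain NumPy. This is an illustration only, not pyemma code: the two angles are made up, and the column order that pyemma uses for several angles is not shown here.

```python
import numpy as np

# Two made-up angles (radians) on either side of the periodic boundary at +/- pi.
angles = np.array([np.pi - 0.1, -np.pi + 0.1])

# The naive mean in angle space lands near 0, on the opposite side of the circle.
naive_mean = angles.mean()

# One (sin(x), cos(x)) pair per angle, as described for cossin=True.
cossin = np.column_stack([np.sin(angles), np.cos(angles)])

# Averaging in that space and mapping back gives an angle near +/- pi.
mean_sin, mean_cos = cossin.mean(axis=0)
circular_mean = np.arctan2(mean_sin, mean_cos)
```

The mean taken in (sin, cos) space stays close to the two input angles, while the naive mean does not, which is why methods that rely on means tend to behave better on the transformed features.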
-
add_backbone_torsions(selstr=None, deg=False, cossin=False)¶ Adds all backbone phi/psi angles or the ones specified in selstr to the feature list.
Parameters: - selstr (str, optional, default = None) – selection string specifying the atom selection used to select a specific set of backbone angles. If None (default), all phi/psi angles found in the topology will be computed.
- deg (bool, optional, default = False) – If False (default), angles will be computed in radians. If True, angles will be computed in degrees.
- cossin (bool, optional, default = False) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means in that space (e.g. TICA/PCA, clustering).
-
add_chi1_torsions(selstr='', deg=False, cossin=False)¶ Adds all chi1 angles or the ones specified in selstr to the feature list.
Parameters: - selstr (str, optional, default = "") – selection string specifying the atom selection used to select a specific set of chi1 angles. If "" (default), all chi1 angles found in the topology will be computed.
- deg (bool, optional, default = False) – If False (default), angles will be computed in radians. If True, angles will be computed in degrees.
- cossin (bool, optional, default = False) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means in that space (e.g. TICA/PCA, clustering).
-
add_contacts(indices, indices2=None, threshold=5.0, periodic=True)¶ Adds the contacts to the feature list.
Parameters: - indices (can be of two types:) –
  - ndarray((n, 2), dtype=int): n x 2 array with the pairs of atoms between which the contacts shall be computed
  - iterable of integers (either list or ndarray(n, dtype=int)): indices (not pairs of indices) of the atoms between which the contacts shall be computed. Note that this will produce a pairlist different from the pairlist produced by pairs() in that this does not exclude 1-2 neighbors.
- indices2 (iterable of integers (either list or ndarray(n, dtype=int)), optional) – Only has effect if indices is an iterable of integers. Instead of the above behaviour, only the contacts between the atoms in indices and indices2 will be computed.
- threshold (float, optional, default = 5.0) – distances below this threshold will result in a feature 1.0, distances above will result in 0.0. The default is set with Angstrom distances in mind. Make sure that you know whether your coordinates are in Angstroms or nanometers when setting this threshold.
Note
When using the iterable of integers input, indices and indices2 will be sorted numerically and made unique before converting them to a pairlist. Please look carefully at the output of describe() to see what features exactly have been added.
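The thresholding described for the threshold parameter can be sketched as follows. The distance values are made up, and the array is assumed to hold one row per frame and one column per atom pair; a distance exactly equal to the threshold is avoided because the text does not say how it is treated.

```python
import numpy as np

# Made-up pair distances for 2 frames and 3 atom pairs, in Angstrom.
distances = np.array([[3.2, 5.5, 7.5],
                      [4.9, 6.1, 2.0]])
threshold = 5.0  # must be in the same unit as `distances`

# 1.0 where the pair is closer than the threshold, 0.0 otherwise.
contacts = (distances < threshold).astype(np.float32)
```

If the coordinates were in nanometers instead, the same geometry would give distances ten times smaller, and a threshold of 5.0 would mark every pair in this example as a contact.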
-
add_custom_feature(feature)¶ Adds a custom feature to the feature list.
Parameters: feature (object) – an object with interface like CustomFeature (map, describe methods)
-
add_custom_func(func, dim, desc='', *args, **kwargs)¶ Adds a user-defined function to extract features
Parameters: - func (function) – a user-defined function that accepts an mdtraj.Trajectory object as its first parameter and as many optional and named arguments as desired. It has to return a numpy.ndarray.
- dim (int) – output dimension of function
- desc (str) – description of your feature function
- args (list) – positional arguments passed to func
- kwargs (dictionary) – named arguments passed to func
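A feature function of the kind described above might look like the following sketch. The function name, its arguments, and the FakeTraj stand-in are hypothetical; only the xyz attribute of mdtraj.Trajectory (shape (T, n_atoms, 3)) is relied on, and the (T, dim) output shape is an assumption rather than something the text states.

```python
import numpy as np


def pair_distance(traj, atom_a, atom_b, scale=1.0):
    """Hypothetical feature: scaled distance between two atoms, per frame."""
    diff = traj.xyz[:, atom_a, :] - traj.xyz[:, atom_b, :]
    # One output column, so the matching `dim` argument would be 1.
    return (scale * np.linalg.norm(diff, axis=1)).reshape(-1, 1)


class FakeTraj:
    """Stand-in for mdtraj.Trajectory so the sketch runs without any files."""

    def __init__(self, xyz):
        self.xyz = xyz


# Two frames, two atoms.
traj = FakeTraj(np.array([[[0, 0, 0], [3, 4, 0]],
                          [[1, 1, 1], [1, 1, 3]]], dtype=np.float32))
out = pair_distance(traj, 0, 1, scale=2.0)
```

Following the signature above, such a function would presumably be registered with a call like add_custom_func(pair_distance, 1, 'scaled pair distance', 0, 1, scale=2.0), where the trailing positional and named arguments are passed on to func.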
-
add_dihedrals(indexes, deg=False, cossin=False)¶ Adds the list of dihedrals to the feature list
Parameters: - indexes (np.ndarray, shape=(num_pairs, 4), dtype=int) – an array with quadruplets of atom indices
- deg (bool, optional, default = False) – If False (default), angles will be computed in radians. If True, angles will be computed in degrees.
- cossin (bool, optional, default = False) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means in that space (e.g. TICA/PCA, clustering).
-
add_distances(indices, periodic=True, indices2=None)¶ Adds the distances between atoms to the feature list.
Parameters: - indices (can be of two types:) –
  - ndarray((n, 2), dtype=int): n x 2 array with the pairs of atoms between which the distances shall be computed
  - iterable of integers (either list or ndarray(n, dtype=int)): indices (not pairs of indices) of the atoms between which the distances shall be computed. Note that this will produce a pairlist different from the pairlist produced by pairs() in that this does not exclude 1-2 neighbors.
- indices2 (iterable of integers (either list or ndarray(n, dtype=int)), optional) – Only has effect if indices is an iterable of integers. Instead of the above behaviour, only the distances between the atoms in indices and indices2 will be computed.
Note
When using the iterable of integers input, indices and indices2 will be sorted numerically and made unique before converting them to a pairlist. Please look carefully at the output of describe() to see what features exactly have been added.
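The Note above can be made concrete with a small sketch of how integer input might expand into a pairlist. It assumes that all unordered pairs are formed when only indices is given and that all cross pairs are formed when indices2 is also given. expand_pairs is a hypothetical helper, not a pyemma function, and how pyemma treats an atom that appears in both lists is not stated here.

```python
from itertools import combinations, product


def expand_pairs(indices, indices2=None):
    """Hypothetical helper: sort, de-duplicate, then build the pairlist."""
    a = sorted(set(indices))
    if indices2 is None:
        # All unordered pairs; direct neighbors such as (1, 2) are kept.
        return list(combinations(a, 2))
    b = sorted(set(indices2))
    # Only pairs with one atom from each list.
    return list(product(a, b))


within = expand_pairs([5, 1, 1, 2])
between = expand_pairs([3, 0], [7, 7, 4])
```

Because the input order and duplicates are discarded, the resulting features may not line up with the order in which indices were passed, which is why checking describe() is recommended.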
-
add_distances_ca(periodic=True)¶ Adds the distances between all Ca’s (except for 1- and 2-neighbors) to the feature list.
-
add_inverse_distances(indices, periodic=True, indices2=None)¶ Adds the inverse distances between atoms to the feature list.
Parameters: - indices (can be of two types:) –
  - ndarray((n, 2), dtype=int): n x 2 array with the pairs of atoms between which the inverse distances shall be computed
  - iterable of integers (either list or ndarray(n, dtype=int)): indices (not pairs of indices) of the atoms between which the inverse distances shall be computed. Note that this will produce a pairlist different from the pairlist produced by pairs() in that this does not exclude 1-2 neighbors.
- indices2 (iterable of integers (either list or ndarray(n, dtype=int)), optional) – Only has effect if indices is an iterable of integers. Instead of the above behaviour, only the inverse distances between the atoms in indices and indices2 will be computed.
Note
When using the iterable of integers input, indices and indices2 will be sorted numerically and made unique before converting them to a pairlist. Please look carefully at the output of describe() to see what features exactly have been added.
-
add_minrmsd_to_ref(ref, ref_frame=0, atom_indices=None, precentered=False)¶ Adds the minimum root-mean-square-deviation (minrmsd) with respect to a reference structure to the feature list.
Parameters: - ref – Reference structure for computing the minrmsd. Can be of two types:
  - mdtraj.Trajectory object
  - filename for mdtraj to load. In this case, only the ref_frame of that file will be used.
- ref_frame (integer, default=0) – Reference frame of the filename specified in ref. This parameter has no effect if ref is not a filename.
- atom_indices (array_like, default=None) – Atoms that will be used for:
  - aligning the target and reference geometries.
  - computing rmsd after the alignment.
  If left to None, all atoms of ref will be used.
- precentered (bool, default=False) – Use this boolean at your own risk to let mdtraj know that the target conformations are already centered at the origin, i.e., their (uniformly weighted) center of mass lies at the origin. This will speed up the computation of the rmsd.
-
add_selection(indexes)¶ Adds the coordinates of the selected atom indexes to the feature list. The coordinates of the selection [1, 2, ...] are flattened as follows: [x1, y1, z1, x2, y2, z2, ...]
Parameters: indexes (ndarray((n), dtype=int)) – array with selected atom indexes
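The flattening described above can be sketched with plain NumPy. The coordinate array is made up and is assumed to have the shape (T, n_atoms, 3) that mdtraj uses for Trajectory.xyz; this illustrates the layout only and is not pyemma's implementation.

```python
import numpy as np

# Made-up coordinates: 2 frames, 4 atoms, xyz per atom.
xyz = np.arange(2 * 4 * 3, dtype=np.float32).reshape(2, 4, 3)
indexes = np.array([1, 3])  # selected atom indexes

# Pick the selected atoms, then flatten per frame so that each atom's
# x, y, z stay adjacent: [x1, y1, z1, x2, y2, z2, ...].
selected = xyz[:, indexes, :]
flat = selected.reshape(selected.shape[0], -1)
```

Each selected atom contributes three columns, so a selection of n atoms yields 3n features per frame.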
-
describe()¶ Returns a list of strings, one for each feature selected, with human-readable descriptions of the features.
Returns: labels – An ordered list of strings, one for each feature selected, with human-readable descriptions of the features.
Return type: list of str
-
dimension()¶ current dimension due to selected features
Returns: dim – total dimension due to all selected features
Return type: int
-
map(traj)¶ Maps an mdtraj Trajectory object to the selected output features
Parameters: traj (mdtraj Trajectory) – Trajectory object used as an input
Returns: out – Output features: for each of the T time steps in the given trajectory, a vector with all n selected output features.
Return type: ndarray((T, n), dtype=float32)
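The (T, n) output shape can be pictured as per-feature blocks placed side by side. The sketch below assumes that blocks are stacked column-wise in the order in which the features were added, which the text does not confirm; the block contents are placeholders.

```python
import numpy as np

T = 4  # number of frames

# Placeholder feature blocks, one row per frame.
coords_block = np.zeros((T, 6), dtype=np.float32)    # e.g. xyz of 2 atoms
distance_block = np.ones((T, 3), dtype=np.float32)   # e.g. 3 distances

# Stack the blocks column-wise to get one feature vector per frame.
out = np.hstack([coords_block, distance_block])

# The total width corresponds to what dimension() is described as returning.
dim = coords_block.shape[1] + distance_block.shape[1]
```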
-
static pairs(sel)¶ Creates all pairs between indexes, except for 1- and 2-neighbors
Parameters: sel – array with selected atom indexes
Returns: sel – m x 2 array with all pair indexes between different atoms that are at least 3 indexes apart, i.e. if i is the index of an atom, the pairs [i,i-2], [i,i-1], [i,i], [i,i+1], [i,i+2] will not be in sel. Moreover, the list is non-redundant, i.e. if [i,j] is in sel, then [j,i] is not.
Return type: ndarray((m, 2), dtype=int)
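The exclusion rule can be sketched in a few lines of standard-library Python. pairs_sketch is a hypothetical name, and ordering each pair as (smaller, larger) is an assumption; the text only states that the list is non-redundant.

```python
from itertools import combinations


def pairs_sketch(sel):
    """Hypothetical re-implementation of the rule described above."""
    idx = sorted(set(sel))
    # Keep a pair only if the two atom indexes are at least 3 apart.
    return [(i, j) for i, j in combinations(idx, 2) if j - i > 2]


result = pairs_sketch([0, 1, 2, 3, 4, 5])
```

For six consecutive indexes this leaves six pairs out of the fifteen possible ones, since every pair whose indexes differ by 1 or 2 is dropped.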
-
select(selstring)¶ Returns the indexes of atoms matching the given selection
Parameters: selstring (str) – Selection string. See mdtraj documentation for details: http://mdtraj.org/latest/atom_selection.html
Returns: indexes – array with selected atom indexes
Return type: ndarray((n), dtype=int)
-
select_Backbone()¶ Returns the indexes of backbone C, CA and N atoms
Returns: indexes – array with selected atom indexes
Return type: ndarray((n), dtype=int)
-
select_Ca()¶ Returns the indexes of all Ca-atoms
Returns: indexes – array with selected atom indexes
Return type: ndarray((n), dtype=int)
-
select_Heavy()¶ Returns the indexes of all heavy atoms (Mass >= 2)
Returns: indexes – array with selected atom indexes
Return type: ndarray((n), dtype=int)
-