Interaction fingerprint#
Calculate a Protein-Ligand Interaction Fingerprint — prolif.fingerprint
#
In [1]: import MDAnalysis as mda
In [2]: from rdkit.DataStructs import TanimotoSimilarity
In [3]: import prolif
In [4]: u = mda.Universe(prolif.datafiles.TOP, prolif.datafiles.TRAJ)
In [5]: prot = u.select_atoms("protein")
In [6]: lig = u.select_atoms("resname LIG")
In [7]: fp = prolif.Fingerprint(["HBDonor", "HBAcceptor", "PiStacking", "CationPi",
...: "Cationic"])
...:
In [8]: fp.run(u.trajectory[:5], lig, prot)
Out[8]: <prolif.fingerprint.Fingerprint: 5 interactions: ['HBAcceptor', 'HBDonor', 'Cationic', 'CationPi', 'PiStacking'] at 0x7fbb006be090>
In [9]: df = fp.to_dataframe()
In [10]: df
Out[10]:
ligand LIG1.G
protein ASP129.A VAL201.A SER212.A PHE331.B
interaction HBDonor Cationic HBAcceptor HBDonor PiStacking
Frame
0 True True True True True
1 True True True False True
2 True True True False False
3 True True True False True
4 True True True False False
In [11]: bv = fp.to_bitvectors()
In [12]: TanimotoSimilarity(bv[0], bv[1])
Out[12]: 0.8
# save results
In [13]: fp.to_pickle("fingerprint.pkl")
# load
In [14]: fp = prolif.Fingerprint.from_pickle("fingerprint.pkl")
- class prolif.fingerprint.Fingerprint(interactions=['Hydrophobic', 'HBDonor', 'HBAcceptor', 'PiStacking', 'Anionic', 'Cationic', 'CationPi', 'PiCation', 'VdWContact'], parameters=None, count=False, vicinity_cutoff=6.0)[source]#
Class that generates an interaction fingerprint between two molecules
While in most cases the fingerprint will be generated between a ligand and a protein, it is also possible to use this class for protein-protein interactions, or host-guest systems.
- Parameters:
interactions (list) – List of names (str) of interaction classes as found in the
prolif.interactions
module.parameters (dict, optional) – New parameters for the interactions. Mapping between an interaction name and a dict of parameters as they appear in the interaction class.
count (bool) –
For a given interaction class and pair of residues, there might be multiple combinations of atoms that satisfy the interaction constraints. This parameter controls how the fingerprint treats these different combinations:
False
: returns the first combination that satisfy the constraints (fast)True
: returns all the combinations (slow)
vicinity_cutoff (float) – Automatically restrict the analysis to residues within this range of the ligand. This parameter is ignored if the
residues
parameter of therun
methods is set to anything other thanNone
.
- interactions#
Dictionary of interaction classes indexed by class name. For more details, see
prolif.interactions
- Type:
- vicinity_cutoff#
Used when calling
prolif.utils.get_residues_near_ligand()
.- Type:
- ifp#
Dict of interaction fingerprints in a sparse format for the given trajectory or docking poses:
{<frame number>: <IFP>}
. See theIFP
class for more information.- Type:
dict, optional
- Raises:
NameError – Unknown interaction in the
interactions
orparameters
parameters.
Notes
You can use the fingerprint generator in multiple ways:
On a trajectory directly from MDAnalysis objects:
In [1]: prot = u.select_atoms("protein") In [2]: lig = u.select_atoms("resname LIG") In [3]: fp = prolif.Fingerprint(["HBDonor", "HBAcceptor", "PiStacking", ...: "Hydrophobic"]) ...: In [4]: fp.run(u.trajectory[:5], lig, prot) Out[4]: <prolif.fingerprint.Fingerprint: 4 interactions: ['Hydrophobic', 'HBAcceptor', 'HBDonor', 'PiStacking'] at 0x7fbb002e8a10> In [5]: fp.to_dataframe() Out[5]: ligand LIG1.G ... protein TYR109.A TRP125.A ... MET337.B PHE351.B interaction Hydrophobic Hydrophobic ... Hydrophobic Hydrophobic Frame ... 0 True True ... True True 1 False True ... True True 2 True True ... False True 3 False True ... True True 4 True True ... True True [5 rows x 19 columns]
On two single structures (from RDKit or MDAnalysis):
In [6]: u.trajectory[0] # use coordinates of the first frame Out[6]: < Timestep 0 with unit cell dimensions [ 89.6909 87.95726 102.02041 90. 90. 90. ] > In [7]: prot = prolif.Molecule.from_mda(prot) In [8]: lig = prolif.Molecule.from_mda(lig) In [9]: ifp = fp.generate(lig, prot, metadata=True) In [10]: prolif.to_dataframe({0: ifp}, fp.interactions) Out[10]: ligand LIG1.G ... protein TYR109.A TRP125.A ... MET337.B PHE351.B interaction Hydrophobic Hydrophobic ... Hydrophobic Hydrophobic Frame ... 0 True True ... True True [1 rows x 18 columns]
On a specific pair of residues for a specific interaction:
# ligand-protein In [11]: next(fp.hbdonor(lig, prot["ASP129.A"])) Out[11]: True # protein-protein In [12]: next(fp.hbacceptor(prot["ASP129.A"], prot["CYS133.A"])) Out[12]: True
You can also obtain the indices of atoms responsible for the interaction:
In [13]: fp.metadata(lig, prot["ASP129.A"]) Out[13]: {'HBDonor': ({'indices': {'ligand': (13, 52), 'protein': (8,)}, 'parent_indices': {'ligand': (13, 52), 'protein': (1489,)}, 'distance': 2.7534162113079534, 'DHA_angle': 158.09471525304645},)} In [14]: next(fp.hbdonor(lig, prot["ASP129.A"], metadata=True), None) Out[14]: {'indices': {'ligand': (13, 52), 'protein': (8,)}, 'parent_indices': {'ligand': (13, 52), 'protein': (1489,)}, 'distance': 2.7534162113079534, 'DHA_angle': 158.09471525304645}
Changed in version 1.0.0: Added pickle support
Changed in version 2.0.0: Changed the format of the
ifp
attribute to be a dictionary containing more complete interaction metadata instead of just atom indices. Removed thereturn_atoms
argument into_dataframe()
. Users should directly useifp
instead. Added theplot_lignetwork()
method to generate theLigNetwork
plot. Replaced theFingerprint.bitvector_atoms
method withFingerprint.metadata()
. Added avicinity_cutoff
parameter controlling the distance used to automatically restrict the IFP calculation to residues within the specified range of the ligand. Removed the__wrapped__
attribute on interaction methods that are available from the fingerprint object. These methods now accept ametadata
parameter instead. Added theparameters
argument to set the interaction classes parameters without having to create a new class. Added thecount
argument to control for a given pair of residues whether to return all occurences of an interaction or only the first one.- bitvector(res1, res2)[source]#
Generates the complete bitvector for the interactions between two residues. To access metadata for each interaction, see
metadata()
.- Parameters:
res1 (prolif.residue.Residue) – A residue, usually from a ligand
res2 (prolif.residue.Residue) – A residue, usually from a protein
- Returns:
bitvector – An array storing the encoded interactions between res1 and res2. Depending on
Fingerprint.count
, the dtype of the array will be bool or uint8.- Return type:
- classmethod from_pickle(path_or_bytes)[source]#
Creates a fingerprint object from a pickle dump.
- Parameters:
path_or_bytes (str, pathlib.Path or bytes) – The path to the pickle file, or bytes corresponding to a pickle dump.
See also
to_pickle()
for creating the pickle dump.New in version 1.0.0.
Changed in version 2.0.0: Switched to dill instead of pickle
- generate(lig, prot, residues=None, metadata=False)[source]#
Generates the interaction fingerprint between 2 molecules
- Parameters:
lig (prolif.molecule.Molecule) – Molecule for the ligand
prot (prolif.molecule.Molecule) – Molecule for the protein
residues (list or "all" or None) – A list of protein residues (
str
,int
orResidueId
) to take into account for the fingerprint extraction. If"all"
, all residues will be used. IfNone
, at each frame theget_residues_near_ligand()
function is used to automatically use protein residues that are distant of 6.0 Å or less from each ligand residue (seevicinity_cutoff
)metadata (bool) – For each residue pair and interaction, return an interaction metadata dictionary instead of bits.
- Returns:
ifp – A dictionary indexed by
(ligand, protein)
residue pairs. The format for values will depend onmetadata
:A single numpy array if
metadata=False
A sparse dictionary of metadata tuples indexed by interaction name if
metadata=True
- Return type:
Example
>>> u = mda.Universe("complex.pdb") >>> lig = prolif.Molecule.from_mda(u, "resname LIG") >>> prot = prolif.Molecule.from_mda(u, "protein") >>> fp = prolif.Fingerprint() >>> ifp = fp.generate(lig, prot)
New in version 0.3.2.
Changed in version 2.0.0:
return_atoms
replaced bymetadata
, and it now returns a sparse dictionary of metadata tuples indexed by interaction name instead of a tuple of arrays.
- static list_available(show_hidden=False)[source]#
List interactions available to the Fingerprint class.
- Parameters:
show_hidden (bool) – Show hidden classes (base classes meant to be inherited from to create custom interactions).
- metadata(res1, res2)[source]#
Generates a metadata dictionary for the interactions between two residues.
- Parameters:
res1 (prolif.residue.Residue) – A residue, usually from a ligand
res2 (prolif.residue.Residue) – A residue, usually from a protein
- Returns:
metadata – Dict containing tuples of metadata dictionaries indexed by interaction name. If a specific interaction is not present between residues, it is filtered out of the dictionary.
- Return type:
Changed in version 0.3.2: Atom indices are returned as two separate lists instead of a single list of tuples
Changed in version 2.0.0: Returns a dictionnary with all available metadata for each interaction, rather than just atom indices.
- plot_3d(ligand_mol: Molecule, protein_mol: Molecule, *, frame: int, size: Tuple[int, int] = (650, 600), display_all: bool = False)[source]#
Generate and display the complex in 3D with py3Dmol from a fingerprint object that has been used to run an analysis.
- Parameters:
ligand_mol (Molecule) – The ligand molecule to display.
protein_mol (Molecule) – The protein molecule to display.
frame (int) – The frame number chosen to select which interactions are going to be displayed.
size (Tuple[int, int] = (650, 600)) – The size of the py3Dmol widget view.
display_all (bool) – Display all occurences for a given pair of residues and interaction, or only the shortest one. Not relevant if
count=False
in theFingerprint
object.
See also
- plot_barcode(*, figsize: Tuple[int, int] = (8, 10), dpi: int = 100, interactive: bool = False, n_frame_ticks: int = 10, residues_tick_location: Literal['top', 'bottom'] = 'top', xlabel: str = 'Frame', subplots_kwargs: Optional[dict] = None, tight_layout_kwargs: Optional[dict] = None)[source]#
Generate and display a
Barcode
plot from a fingerprint object that has been used to run an analysis.- Parameters:
figsize (Tuple[int, int] = (8, 10)) – Size of the matplotlib figure.
dpi (int = 100) – DPI used for the matplotlib figure.
interactive (bool) – Add hover interactivity to the plot (only relevant for notebooks). You may need to add
%matplotlib notebook
or%matplotlib ipympl
for it to work as expected.n_frame_ticks (int = 10) – Number of ticks on the X axis. May use ±1 tick to have them evenly spaced.
residues_tick_location (Literal["top", "bottom"] = "top") – Whether the Y ticks appear at the top or at the bottom of the series of interactions of each residue.
xlabel (str = "Frame") – Label displayed for the X axis.
subplots_kwargs (Optional[dict] = None) – Other parameters passed to
matplotlib.pyplot.subplots()
.tight_layout_kwargs (Optional[dict] = None) – Other parameters passed to
matplotlib.figure.Figure.tight_layout()
.
See also
- plot_lignetwork(ligand_mol, *, kind: Literal['aggregate', 'frame'] = 'aggregate', frame: int = 0, display_all: bool = False, threshold: float = 0.3, use_coordinates: bool = False, flatten_coordinates: bool = True, kekulize: bool = False, molsize: int = 35, rotation: float = 0, carbon: float = 0.16, width: str = '100%', height: str = '500px')[source]#
Generate and display a
LigNetwork
plot from a fingerprint object that has been used to run an analysis.- Parameters:
ligand_mol (rdkit.Chem.rdChem.Mol) – Ligand molecule.
kind (str) – One of
"aggregate"
or"frame"
.frame (int) – Frame number (see
ifp
). Only applicable forkind="frame"
.display_all (bool) – Display all occurences for a given pair of residues and interaction, or only the shortest one. Only applicable for
kind="frame"
. Not relevant ifcount=False
in theFingerprint
object.threshold (float) – Frequency threshold, between 0 and 1. Only applicable for
kind="aggregate"
.use_coordinates (bool) – If
True
, uses the coordinates of the molecule directly, otherwise generates 2D coordinates from scratch. See alsoflatten_coordinates
.flatten_coordinates (bool) – If this is
True
anduse_coordinates=True
, generates 2D coordinates that are constrained to fit the 3D conformation of the ligand as best as possible.kekulize (bool) – Kekulize the ligand.
molsize (int) – Multiply the coordinates by this number to create a bigger and more readable depiction.
rotation (int) – Rotate the structure on the XY plane.
carbon (float) – Size of the carbon atom dots on the depiction. Use 0 to hide the carbon dots.
width (str) – Width of the IFrame window.
height (str) – Height of the IFrame window.
Notes
Two kinds of diagrams can be rendered: either for a designated frame or by aggregating the results on the whole IFP and optionnally discarding interactions that occur less frequently than a threshold. In the latter case (aggregate), only the group of atoms most frequently involved in each interaction is used to draw the edge.
See also
- run(traj, lig, prot, *, residues=None, converter_kwargs=None, progress=True, n_jobs=None)[source]#
Generates the fingerprint on a trajectory for a ligand and a protein
- Parameters:
traj (MDAnalysis.coordinates.base.ProtoReader or MDAnalysis.coordinates.base.FrameIteratorSliced) – Iterate over this Universe trajectory or sliced trajectory object to extract the frames used for the fingerprint extraction
lig (MDAnalysis.core.groups.AtomGroup) – An MDAnalysis AtomGroup for the ligand
prot (MDAnalysis.core.groups.AtomGroup) – An MDAnalysis AtomGroup for the protein (with multiple residues)
residues (list or "all" or None) – A list of protein residues (
str
,int
orResidueId
) to take into account for the fingerprint extraction. If"all"
, all residues will be used. IfNone
, at each frame theget_residues_near_ligand()
function is used to automatically use protein residues that are distant of 6.0 Å or less from each ligand residue.converter_kwargs (tuple[dict, dict], optional) – Tuple of kwargs passed to the underlying
RDKitConverter
from MDAnalysis: the first for the ligand, and the second for the proteinprogress (bool) – Display a
tqdm
progressbar while running the calculationn_jobs (int or None) – Number of processes to run in parallel. If
n_jobs=None
, the analysis will use all available CPU threads, while ifn_jobs=1
, the analysis will run in serial.
- Raises:
ValueError – If
n_jobs <= 0
- Returns:
The Fingerprint instance that generated the fingerprint
- Return type:
Example
>>> u = mda.Universe("top.pdb", "traj.nc") >>> lig = u.select_atoms("resname LIG") >>> prot = u.select_atoms("protein") >>> fp = prolif.Fingerprint().run(u.trajectory[:10], lig, prot)
See also
Fingerprint.generate()
to generate the fingerprint between two single structures.Fingerprint.run_from_iterable()
to generate the fingerprint between a protein and a collection of ligands.
Changed in version 0.3.2: Moved the
return_atoms
parameter from therun
method to the dataframe conversion codeChanged in version 1.0.0: Added support for multiprocessing
Changed in version 1.1.0: Added support for passing kwargs to the RDKitConverter through the
converter_kwargs
parameterChanged in version 2.0.0: Changed the format of the
ifp
attribute to be a dictionary containing more complete interaction metadata instead of just atom indices.
- run_from_iterable(lig_iterable, prot_mol, *, residues=None, progress=True, n_jobs=None)[source]#
Generates the fingerprint between a list of ligands and a protein
- Parameters:
lig_iterable (list or generator) – An iterable yielding ligands as
Molecule
objectsprot_mol (prolif.molecule.Molecule) – The protein
residues (list or "all" or None) – A list of protein residues (
str
,int
orResidueId
) to take into account for the fingerprint extraction. If"all"
, all residues will be used. IfNone
, at each frame theget_residues_near_ligand()
function is used to automatically use protein residues that are distant of 6.0 Å or less from each ligand residue.progress (bool) – Display a
tqdm
progressbar while running the calculationn_jobs (int or None) – Number of processes to run in parallel. If
n_jobs=None
, the analysis will use all available CPU threads, while ifn_jobs=1
, the analysis will run in serial.
- Raises:
ValueError – If
n_jobs <= 0
- Returns:
The Fingerprint instance that generated the fingerprint
- Return type:
Example
>>> prot = mda.Universe("protein.pdb") >>> prot = prolif.Molecule.from_mda(prot) >>> lig_iter = prolif.mol2_supplier("docking_output.mol2") >>> fp = prolif.Fingerprint() >>> fp.run_from_iterable(lig_iter, prot)
See also
Fingerprint.generate()
to generate the fingerprint between two single structuresChanged in version 0.3.2: Moved the
return_atoms
parameter from therun_from_iterable
method to the dataframe conversion codeChanged in version 1.0.0: Added support for multiprocessing
Changed in version 2.0.0: Changed the format of the
ifp
attribute to be a dictionary containing more complete interaction metadata instead of just atom indices.
- to_bitvectors()[source]#
Converts fingerprints to a list of RDKit ExplicitBitVector
- Returns:
bvs – A list of
ExplicitBitVect
for each frame- Return type:
- Raises:
AttributeError – If the
run()
method hasn’t been used
Example
>>> from rdkit.DataStructs import TanimotoSimilarity >>> bv = fp.to_bitvectors() >>> TanimotoSimilarity(bv[0], bv[1]) 0.42
- to_countvectors()[source]#
Converts fingerprints to a list of RDKit UIntSparseIntVect
- Returns:
cvs – A list of
UIntSparseIntVect
for each frame- Return type:
- Raises:
AttributeError – If the
run()
method hasn’t been used
Example
>>> from rdkit.DataStructs import TanimotoSimilarity >>> cv = fp.to_countvectors() >>> TanimotoSimilarity(cv[0], cv[1]) 0.42
- to_dataframe(*, count=None, dtype=None, drop_empty=True, index_col='Frame')[source]#
Converts fingerprints to a pandas DataFrame
- Parameters:
- Returns:
df – A DataFrame storing the results frame by frame and residue by residue. See
to_dataframe()
for more information- Return type:
- Raises:
AttributeError – If the
run()
method hasn’t been used
Example
>>> df = fp.to_dataframe(dtype=np.uint8) >>> print(df) ligand LIG1.G protein ILE59 ILE55 TYR93 interaction Hydrophobic HBAcceptor Hydrophobic Hydrophobic PiStacking Frame 0 0 1 0 0 0 ...
Changed in version 2.0.0: Removed the
return_atoms
parameter. You can access more metadata information directly throughifp
. Added thecount
parameter.
- to_pickle(path=None)[source]#
Dumps the fingerprint object as a pickle.
- Parameters:
path (str, pathlib.Path or None) – Output path. If
None
, the method returns the pickle as bytes.- Returns:
obj –
None
ifpath
is set, else the bytes corresponding to the pickle- Return type:
None or bytes
Example
>>> dump = fp.to_pickle() >>> saved_fp = Fingerprint.from_pickle(dump) >>> fp.to_pickle("data/fp.pkl") >>> saved_fp = Fingerprint.from_pickle("data/fp.pkl")
See also
from_pickle()
for loading the pickle dump.New in version 1.0.0.
Changed in version 2.0.0: Switched to dill instead of pickle
Storing interactions — prolif.ifp
#
- class prolif.ifp.IFP(dict=None, /, **kwargs)[source]#
Mapping between residue pairs and interaction fingerprint.
Notes
This class provides an easy way to access interaction data from the
ifp
dictionary. This class is a dictionary formatted as:{ tuple[<residue_id>, <residue_id>]: { <interaction name>: tuple[{ "indices": { "ligand": tuple[int, ...], "protein": tuple[int, ...] }, "parent_indices": { "ligand": tuple[int, ...], "protein": tuple[int, ...] }, <other metadata>: <value> }, ...] } }
Here
<residue_id>
corresponds to aResidueId
object. For convenience, one can directly use strings rather thanResidueId
objects when indexing the IFP, e.g.ifp[("LIG1.G", "ASP129.A")]
. You can also use a singleResidueId
or string to return a filtered IFP that only contains interactions with the specified residue, e.g.ifp["ASP129.A"]
.
Detecting interactions between residues — prolif.interactions.interactions
#
Note that some of the SMARTS patterns used in the interaction classes are inspired from Pharmit and RDKit.
- class prolif.interactions.interactions.Anionic(cation='[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]', anion='[-{1-},$(O=[C,S,P]-[O-])]', distance=4.5)#
Bases:
Cationic
Ionic interaction between a ligand (anion) and a residue (cation)
Changed in version 1.1.0: Handles resonance forms for common acids, amidine and guanidine.
- class prolif.interactions.interactions.CationPi(cation='[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]', pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'), distance=4.5, angle=(0, 30))[source]#
Bases:
Interaction
Cation-Pi interaction between a ligand (cation) and a residue (aromatic ring)
- Parameters:
cation (str) – SMARTS for cation
pi_ring (tuple) – SMARTS for aromatic rings (5 and 6 membered rings only)
distance (float) – Cutoff distance between the centroid and the cation
angle (tuple) – Min and max values for the angle between the vector normal to the ring plane and the vector going from the centroid to the cation
Changed in version 1.1.0: Handles resonance forms for amidine and guanidine as cations.
Changed in version 2.0.0:
angles
parameter renamed toangle
.
- class prolif.interactions.interactions.Cationic(cation='[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]', anion='[-{1-},$(O=[C,S,P]-[O-])]', distance=4.5)[source]#
Bases:
Distance
Ionic interaction between a ligand (cation) and a residue (anion)
Changed in version 1.1.0: Handles resonance forms for common acids, amidine and guanidine.
- class prolif.interactions.interactions.EdgeToFace(distance=6.5, plane_angle=(50, 90), normal_to_centroid_angle=(0, 30), pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'), intersect_radius=1.5)[source]#
Bases:
BasePiStacking
Edge-to-face Pi-Stacking interaction between a ligand and a residue
- Parameters:
distance (float) – Cutoff distance between each rings centroid
plane_angles (tuple) – Min and max values for the angle between the ring planes
normal_centroids_angles (tuple) – Min and max angles allowed between the vector normal to a ring’s plane, and the vector between the centroid of both rings.
pi_ring (list) – List of SMARTS for aromatic rings
intersect_radius (float) – Used to check whether the intersect point between ring planes falls within
intersect_radius
of the opposite ring’s centroid.
Changed in version 1.1.0: In addition to the changes made to the base pi-stacking interaction, this implementation makes sure that the intersection between the perpendicular ring’s plane and the other’s plane falls inside the ring.
Changed in version 2.0.0: Renamed
centroid_distance
to distance, addedintersect_radius
parameter.
- class prolif.interactions.interactions.FaceToFace(distance=5.5, plane_angle=(0, 35), normal_to_centroid_angle=(0, 33), pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'))[source]#
Bases:
BasePiStacking
Face-to-face Pi-Stacking interaction between a ligand and a residue
- Parameters:
distance (float) – Cutoff distance between each rings centroid
plane_angles (tuple) – Min and max values for the angle between the ring planes
normal_centroids_angles (tuple) – Min and max angles allowed between the vector normal to a ring’s plane, and the vector between the centroid of both rings.
pi_ring (list) – List of SMARTS for aromatic rings
Changed in version 2.0.0: Renamed
centroid_distance
to distance.
- class prolif.interactions.interactions.HBAcceptor(acceptor='[#7&!$([nX3])&!$([NX3]-*=[O,N,P,S])&!$([NX3]-[a])&!$([Nv4&+1]),O&!$([OX2](C)C=O)&!$(O(~a)~a)&!$(O=N-*)&!$([O-]-N=O),o+0,F&$(F-[#6])&!$(F-[#6][F,Cl,Br,I])]', donor='[$([O,S;+0]),$([N;v3,v4&+1]),n+0]-[H]', distance=3.5, DHA_angle=(130, 180))[source]#
Bases:
SingleAngle
Hbond interaction between a ligand (acceptor) and a residue (donor)
- Parameters:
Changed in version 1.1.0: The initial SMARTS pattern was too broad.
Changed in version 2.0.0:
angles
parameter renamed toDHA_angle
.
- class prolif.interactions.interactions.HBDonor(acceptor='[#7&!$([nX3])&!$([NX3]-*=[O,N,P,S])&!$([NX3]-[a])&!$([Nv4&+1]),O&!$([OX2](C)C=O)&!$(O(~a)~a)&!$(O=N-*)&!$([O-]-N=O),o+0,F&$(F-[#6])&!$(F-[#6][F,Cl,Br,I])]', donor='[$([O,S;+0]),$([N;v3,v4&+1]),n+0]-[H]', distance=3.5, DHA_angle=(130, 180))#
Bases:
HBAcceptor
Hbond interaction between a ligand (donor) and a residue (acceptor)
- Parameters:
Changed in version 1.1.0: The initial SMARTS pattern was too broad.
Changed in version 2.0.0:
angles
parameter renamed toDHA_angle
.
- class prolif.interactions.interactions.Hydrophobic(hydrophobic='[c,s,Br,I,S&H0&v2,$([D3,D4;#6])&!$([#6]~[#7,#8,#9])&!$([#6X4H0]);+0]', distance=4.5)[source]#
Bases:
Distance
Hydrophobic interaction
- Parameters:
Changed in version 1.1.0: The initial SMARTS pattern was too broad.
- class prolif.interactions.interactions.MetalAcceptor(metal='[Ca,Cd,Co,Cu,Fe,Mg,Mn,Ni,Zn]', ligand='[O,#7&!$([nX3])&!$([NX3]-*=[!#6])&!$([NX3]-[a])&!$([NX4]),-{1-};!+{1-}]', distance=2.8)#
Bases:
MetalDonor
Metal complexation interaction between a ligand (chelated) and a residue (metal)
- Parameters:
Changed in version 1.1.0: The initial SMARTS pattern was too broad.
- class prolif.interactions.interactions.MetalDonor(metal='[Ca,Cd,Co,Cu,Fe,Mg,Mn,Ni,Zn]', ligand='[O,#7&!$([nX3])&!$([NX3]-*=[!#6])&!$([NX3]-[a])&!$([NX4]),-{1-};!+{1-}]', distance=2.8)[source]#
Bases:
Distance
Metal complexation interaction between a ligand (metal) and a residue (chelated)
- Parameters:
Changed in version 1.1.0: The initial SMARTS pattern was too broad.
- class prolif.interactions.interactions.PiCation(cation='[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]', pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'), distance=4.5, angle=(0, 30))#
Bases:
CationPi
Cation-Pi interaction between a ligand (aromatic ring) and a residue (cation)
- Parameters:
cation (str) – SMARTS for cation
pi_ring (tuple) – SMARTS for aromatic rings (5 and 6 membered rings only)
distance (float) – Cutoff distance between the centroid and the cation
angle (tuple) – Min and max values for the angle between the vector normal to the ring plane and the vector going from the centroid to the cation
Changed in version 1.1.0: Handles resonance forms for amidine and guanidine as cations.
Changed in version 2.0.0:
angles
parameter renamed toangle
.
- class prolif.interactions.interactions.PiStacking(ftf_kwargs=None, etf_kwargs=None)[source]#
Bases:
Interaction
Pi-Stacking interaction between a ligand and a residue
- Parameters:
Changed in version 0.3.4: shortest_distance has been replaced by angle_normal_centroid
Changed in version 1.1.0: The implementation now directly calls
EdgeToFace
andFaceToFace
instead of overwriting the default parameters with more generic ones.
- class prolif.interactions.interactions.VdWContact(tolerance=0.0, vdwradii=None)[source]#
Bases:
Interaction
Interaction based on the van der Waals radii of interacting atoms.
- Parameters:
tolerance (float) – Tolerance added to the sum of vdW radii of atoms before comparing to the interatomic distance. If
distance <= sum_vdw + tolerance
the atoms are identified as a contactvdwradii (dict, optional) – Updates to the vdW radii dictionary, with elements (first letter uppercase) as a key and the radius as a value.
- Raises:
ValueError –
tolerance
parameter cannot be negative
Changed in version 2.0.0: Added the
vdwradii
parameter.
- class prolif.interactions.interactions.XBAcceptor(acceptor='[#7,#8,P,S,Se,Te,a;!+{1-}]!#[*]', donor='[#6,#7,Si,F,Cl,Br,I]-[Cl,Br,I,At]', distance=3.5, AXD_angle=(130, 180), XAR_angle=(80, 140))[source]#
Bases:
DoubleAngle
Halogen bonding between a ligand (acceptor) and a residue (donor).
- Parameters:
acceptor (str) – SMARTS for
[Acceptor]-[R]
donor (str) – SMARTS for
[Donor]-[Halogen]
distance (float) – Cutoff distance between the acceptor and halogen atoms
AXD_angle (tuple) – Min and max values for the
[Acceptor]...[Halogen]-[Donor]
angleXAR_angle (tuple) – Min and max values for the
[Halogen]...[Acceptor]-[R]
angle
Notes
Distance and angle adapted from Auffinger et al. PNAS 2004
Changed in version 2.0.0:
axd_angles
andxar_angles
parameters renamed toAXD_angle
andXAR_angle
.Changed in version 2.0.3: Fixed the SMARTS pattern for acceptors that was not allowing carbonyles and other groups with double bonds to match.
- class prolif.interactions.interactions.XBDonor(acceptor='[#7,#8,P,S,Se,Te,a;!+{1-}]!#[*]', donor='[#6,#7,Si,F,Cl,Br,I]-[Cl,Br,I,At]', distance=3.5, AXD_angle=(130, 180), XAR_angle=(80, 140))#
Bases:
XBAcceptor
Halogen bonding between a ligand (donor) and a residue (acceptor)
- Parameters:
acceptor (str) – SMARTS for
[Acceptor]-[R]
donor (str) – SMARTS for
[Donor]-[Halogen]
distance (float) – Cutoff distance between the acceptor and halogen atoms
AXD_angle (tuple) – Min and max values for the
[Acceptor]...[Halogen]-[Donor]
angleXAR_angle (tuple) – Min and max values for the
[Halogen]...[Acceptor]-[R]
angle
Notes
Distance and angle adapted from Auffinger et al. PNAS 2004
Changed in version 2.0.0:
axd_angles
andxar_angles
parameters renamed toAXD_angle
andXAR_angle
.Changed in version 2.0.3: Fixed the SMARTS pattern for acceptors that was not allowing carbonyles and other groups with double bonds to match.
Base interaction classes — prolif.interactions.base
#
This module contains the base classes used to build most of the interactions.
- class prolif.interactions.base.BasePiStacking(distance, plane_angle, normal_to_centroid_angle, pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'), intersect=False, intersect_radius=1.5)[source]#
Bases:
Interaction
Base class for Pi-Stacking interactions
- Parameters:
distance (float) – Cutoff distance between each rings centroid
plane_angles (tuple) – Min and max values for the angle between the ring planes
normal_centroids_angles (tuple) – Min and max angles allowed between the vector normal to a ring’s plane, and the vector between the centroid of both rings.
pi_ring (list) – List of SMARTS for aromatic rings
intersect (bool) – Whether to look for the point of intersection between both rings (for T-shaped interaction).
intersect_radius (float) – If
intersect=True
, used to check whether the intersect point falls withinintersect_radius
of the opposite ring’s centroid.
Changed in version 1.1.0: The implementation now relies on the angle between the vector normal to a ring’s plane and the vector between centroids (
normal_centroids_angles
) instead of theshortest_distance
parameter.Changed in version 2.0.0: Renamed
centroid_distance
to distance. Addedintersect
andring_radius
parameters.
- class prolif.interactions.base.Distance(lig_pattern, prot_pattern, distance)[source]#
Bases:
Interaction
Generic class for distance-based interactions
- class prolif.interactions.base.DoubleAngle(lig_pattern, prot_pattern, distance, L1P2P1_angle, L2L1P2_angle, distance_atoms=('L1', 'P2'), metadata_mapping=None)[source]#
Bases:
Interaction
Generic class for interactions using constraints on a distance and two angles.
- Parameters:
lig_pattern (str) – SMARTS for
[L1]-[L2]
prot_pattern (str) – SMARTS for
[P1]-[P2]
distance (float) – Distance threshold
L1P2P1_angle (tuple) – Min and max values for the
[L1]...[P2]-[P1]
angleL2L1P2_angle (tuple) – Min and max values for the
[L2]-[L1]...[P2]
angledistance_atoms (tuple[str, str]) – Which atoms to use for the distance calculation: L1 or L2, and P1 or P2
metadata_mapping (dict, optional) – Mapping for names used in the metadata dict for the distance and angle variables
New in version 2.0.0.
- class prolif.interactions.base.Interaction[source]#
Bases:
object
Base class for interactions
All interaction classes must inherit this class and define a
detect()
method.Changed in version 2.0.0: Changed the return type of interactions. Added some helper methods to easily update/derive interaction classes.
- class prolif.interactions.base.SingleAngle(lig_pattern, prot_pattern, distance, angle, distance_atom, metadata_mapping=None)[source]#
Bases:
Interaction
Generic class for interactions using constraints on a distance and an angle.
- Parameters:
lig_pattern (str) – SMARTS pattern for atoms in ligand residues.
prot_pattern (str) – SMARTS pattern for atoms in protein residues, should include (at least) two atoms, the first two atoms will be used for the angle calculation.
distance (float) – Distance threshold.
angle (tuple) – Min and max values for the
[P1]-[P2]...[L1]
angle, where L1 is the first atom in thelig_pattern
SMARTS, and P1 and P2 are the first two atoms in theprot_pattern
SMARTS.distance_atom (str) – Which atom from the protein (P1 or P2) to use for the distance calculation with L1.
metadata_mapping (dict, optional) – Mapping for names used in the metadata dict for the distance and angle variables
New in version 2.0.0.
Generating an IFP in parallel — prolif.parallel
#
This module provides two classes, TrajectoryPool
and MolIterablePool
to execute the analysis in parallel. These are used in the parallel implementation used
in run()
and
run_from_iterable()
respectively.
- class prolif.parallel.MolIterablePool(n_processes, fingerprint, prot_mol, residues, tqdm_kwargs)[source]#
Process pool for a parallelized IFP analysis on an iterable of ligands. Must be used in a
with
statement.- Parameters:
n_processes (int) – Max number of processes
fingerprint (prolif.fingerprint.Fingerprint) – Fingerprint instance used to generate the IFP
prot_mol (prolif.molecule.Molecule) – Protein molecule
residues (list, optional) – List of protein residues considered for the IFP
tqdm_kwargs (dict) – Parameters for the
tqdm
progress bar
- pool#
The underlying pool instance.
- Type:
- classmethod executor(mol)[source]#
Classmethod executed by each child process on a single ligand molecule from the input iterable.
- Returns:
ifp_data – A dictionary indexed by
(ligand, protein)
residue pairs, and each value is a sparse dictionary of metadata indexed by interaction name.- Return type:
- classmethod initializer(fingerprint, prot_mol, residues)[source]#
Initializer classmethod passed to the pool so that each child process can access these objects without copying them.
- process(args_iterable)[source]#
Maps the input iterable of molecules to the executor function.
- Parameters:
args_iterable (Iterable[prolif.molecule.Molecule]) – An iterable yielding ligand molecules
- Returns:
ifp – An iterable of
IFP
dictionaries.- Return type:
- class prolif.parallel.Progress(killswitch, tracker, **kwargs)[source]#
Helper class to track the number of frames processed by the
TrajectoryPool
.- Parameters:
killswitch (threading.Event) – A threading.Event instance created and controlled by the
TrajectoryPool
to kill the thread updating thetqdm
progress bar.tracker (multiprocess.Value) – Value holding a
ctypes.c_uint32
ctype updated by theTrajectoryPool
, storing how many frames were processed since the last progress bar update.**kwargs (object) – Used to create the
tqdm
progress bar.
- delay#
Delay between progress bar updates. This requires locking access to the
tracker
object which theTrajectoryPool
needs access to, so too small values might cause delays in the analysis.- Type:
float, default = 0.5
- tqdm_pbar#
The progress bar displayed
- Type:
tqdm.std.tqdm
- close()[source]#
Cleanup after the
TrajectoryPool
computation is done
- class prolif.parallel.TrajectoryPool(n_processes, fingerprint, residues, tqdm_kwargs, rdkitconverter_kwargs)[source]#
Process pool for a parallelized IFP analysis on an MD trajectory. Must be used in a
with
statement.- Parameters:
n_processes (int) – Max number of processes
fingerprint (prolif.fingerprint.Fingerprint) – Fingerprint instance used to generate the IFP
residues (list, optional) – List of protein residues considered for the IFP
tqdm_kwargs (dict) – Parameters for the
tqdm
progress barrdkitconverter_kwargs (tuple[dict, dict]) – Parameters for the
RDKitConverter
from MDAnalysis: the first for the ligand, and the second for the protein
- tracker#
Value holding a
ctypes.c_uint32
ctype storing how many frames were processed since the last progress bar update.- Type:
multiprocess.Value
- pool#
The underlying pool instance.
- Type:
- classmethod executor(args)[source]#
Classmethod executed by each child process on a chunk of the trajectory
- Returns:
ifp_chunk – A dictionary of
IFP
indexed by frame number- Return type:
- classmethod initializer(tracker, fingerprint, residues, rdkitconverter_kwargs)[source]#
Initializer classmethod passed to the pool so that each child process can access these objects without copying them.