Interaction fingerprint#

Calculate a Protein-Ligand Interaction Fingerprint — prolif.fingerprint#

In [1]: import MDAnalysis as mda

In [2]: from rdkit.DataStructs import TanimotoSimilarity

In [3]: import prolif

In [4]: u = mda.Universe(prolif.datafiles.TOP, prolif.datafiles.TRAJ)

In [5]: prot = u.select_atoms("protein")

In [6]: lig = u.select_atoms("resname LIG")

In [7]: fp = prolif.Fingerprint(["HBDonor", "HBAcceptor", "PiStacking", "CationPi",
   ...:                          "Cationic"])
   ...: 

In [8]: fp.run(u.trajectory[:5], lig, prot)
Out[8]: <prolif.fingerprint.Fingerprint: 5 interactions: ['HBAcceptor', 'HBDonor', 'Cationic', 'CationPi', 'PiStacking'] at 0x7fbb006be090>

In [9]: df = fp.to_dataframe()

In [10]: df
Out[10]: 
ligand        LIG1.G                                        
protein     ASP129.A            VAL201.A SER212.A   PHE331.B
interaction  HBDonor Cationic HBAcceptor  HBDonor PiStacking
Frame                                                       
0               True     True       True     True       True
1               True     True       True    False       True
2               True     True       True    False      False
3               True     True       True    False       True
4               True     True       True    False      False

In [11]: bv = fp.to_bitvectors()

In [12]: TanimotoSimilarity(bv[0], bv[1])
Out[12]: 0.8

# save results
In [13]: fp.to_pickle("fingerprint.pkl")

# load
In [14]: fp = prolif.Fingerprint.from_pickle("fingerprint.pkl")
class prolif.fingerprint.Fingerprint(interactions=['Hydrophobic', 'HBDonor', 'HBAcceptor', 'PiStacking', 'Anionic', 'Cationic', 'CationPi', 'PiCation', 'VdWContact'], parameters=None, count=False, vicinity_cutoff=6.0)[source]#

Class that generates an interaction fingerprint between two molecules

While in most cases the fingerprint will be generated between a ligand and a protein, it is also possible to use this class for protein-protein interactions, or host-guest systems.

Parameters:
  • interactions (list) – List of names (str) of interaction classes as found in the prolif.interactions module.

  • parameters (dict, optional) – New parameters for the interactions. Mapping between an interaction name and a dict of parameters as they appear in the interaction class.

  • count (bool) –

    For a given interaction class and pair of residues, there might be multiple combinations of atoms that satisfy the interaction constraints. This parameter controls how the fingerprint treats these different combinations:

    • False: returns the first combination that satisfy the constraints (fast)

    • True: returns all the combinations (slow)

  • vicinity_cutoff (float) – Automatically restrict the analysis to residues within this range of the ligand. This parameter is ignored if the residues parameter of the run methods is set to anything other than None.

interactions#

Dictionary of interaction classes indexed by class name. For more details, see prolif.interactions

Type:

dict

n_interactions#

Number of interaction functions registered by the fingerprint

Type:

int

vicinity_cutoff#

Used when calling prolif.utils.get_residues_near_ligand().

Type:

float

count#

Whether to keep track of all interaction occurences or just the first one.

Type:

bool

ifp#

Dict of interaction fingerprints in a sparse format for the given trajectory or docking poses: {<frame number>: <IFP>}. See the IFP class for more information.

Type:

dict, optional

Raises:

NameError – Unknown interaction in the interactions or parameters parameters.

Notes

You can use the fingerprint generator in multiple ways:

  • On a trajectory directly from MDAnalysis objects:

In [1]: prot = u.select_atoms("protein")

In [2]: lig = u.select_atoms("resname LIG")

In [3]: fp = prolif.Fingerprint(["HBDonor", "HBAcceptor", "PiStacking",
   ...:                          "Hydrophobic"])
   ...: 

In [4]: fp.run(u.trajectory[:5], lig, prot)
Out[4]: <prolif.fingerprint.Fingerprint: 4 interactions: ['Hydrophobic', 'HBAcceptor', 'HBDonor', 'PiStacking'] at 0x7fbb002e8a10>

In [5]: fp.to_dataframe()
Out[5]: 
ligand           LIG1.G              ...                        
protein        TYR109.A    TRP125.A  ...    MET337.B    PHE351.B
interaction Hydrophobic Hydrophobic  ... Hydrophobic Hydrophobic
Frame                                ...                        
0                  True        True  ...        True        True
1                 False        True  ...        True        True
2                  True        True  ...       False        True
3                 False        True  ...        True        True
4                  True        True  ...        True        True

[5 rows x 19 columns]
  • On two single structures (from RDKit or MDAnalysis):

In [6]: u.trajectory[0]  # use coordinates of the first frame
Out[6]: < Timestep 0 with unit cell dimensions [ 89.6909   87.95726 102.02041  90.       90.       90.     ] >

In [7]: prot = prolif.Molecule.from_mda(prot)

In [8]: lig = prolif.Molecule.from_mda(lig)

In [9]: ifp = fp.generate(lig, prot, metadata=True)

In [10]: prolif.to_dataframe({0: ifp}, fp.interactions)
Out[10]: 
ligand           LIG1.G              ...                        
protein        TYR109.A    TRP125.A  ...    MET337.B    PHE351.B
interaction Hydrophobic Hydrophobic  ... Hydrophobic Hydrophobic
Frame                                ...                        
0                  True        True  ...        True        True

[1 rows x 18 columns]
  • On a specific pair of residues for a specific interaction:

# ligand-protein
In [11]: next(fp.hbdonor(lig, prot["ASP129.A"]))
Out[11]: True

# protein-protein
In [12]: next(fp.hbacceptor(prot["ASP129.A"], prot["CYS133.A"]))
Out[12]: True

You can also obtain the indices of atoms responsible for the interaction:

In [13]: fp.metadata(lig, prot["ASP129.A"])
Out[13]: 
{'HBDonor': ({'indices': {'ligand': (13, 52), 'protein': (8,)},
   'parent_indices': {'ligand': (13, 52), 'protein': (1489,)},
   'distance': 2.7534162113079534,
   'DHA_angle': 158.09471525304645},)}

In [14]: next(fp.hbdonor(lig, prot["ASP129.A"], metadata=True), None)
Out[14]: 
{'indices': {'ligand': (13, 52), 'protein': (8,)},
 'parent_indices': {'ligand': (13, 52), 'protein': (1489,)},
 'distance': 2.7534162113079534,
 'DHA_angle': 158.09471525304645}

Changed in version 1.0.0: Added pickle support

Changed in version 2.0.0: Changed the format of the ifp attribute to be a dictionary containing more complete interaction metadata instead of just atom indices. Removed the return_atoms argument in to_dataframe(). Users should directly use ifp instead. Added the plot_lignetwork() method to generate the LigNetwork plot. Replaced the Fingerprint.bitvector_atoms method with Fingerprint.metadata(). Added a vicinity_cutoff parameter controlling the distance used to automatically restrict the IFP calculation to residues within the specified range of the ligand. Removed the __wrapped__ attribute on interaction methods that are available from the fingerprint object. These methods now accept a metadata parameter instead. Added the parameters argument to set the interaction classes parameters without having to create a new class. Added the count argument to control for a given pair of residues whether to return all occurences of an interaction or only the first one.

bitvector(res1, res2)[source]#

Generates the complete bitvector for the interactions between two residues. To access metadata for each interaction, see metadata().

Parameters:
Returns:

bitvector – An array storing the encoded interactions between res1 and res2. Depending on Fingerprint.count, the dtype of the array will be bool or uint8.

Return type:

numpy.ndarray

classmethod from_pickle(path_or_bytes)[source]#

Creates a fingerprint object from a pickle dump.

Parameters:

path_or_bytes (str, pathlib.Path or bytes) – The path to the pickle file, or bytes corresponding to a pickle dump.

See also

to_pickle() for creating the pickle dump.

New in version 1.0.0.

Changed in version 2.0.0: Switched to dill instead of pickle

generate(lig, prot, residues=None, metadata=False)[source]#

Generates the interaction fingerprint between 2 molecules

Parameters:
  • lig (prolif.molecule.Molecule) – Molecule for the ligand

  • prot (prolif.molecule.Molecule) – Molecule for the protein

  • residues (list or "all" or None) – A list of protein residues (str, int or ResidueId) to take into account for the fingerprint extraction. If "all", all residues will be used. If None, at each frame the get_residues_near_ligand() function is used to automatically use protein residues that are distant of 6.0 Å or less from each ligand residue (see vicinity_cutoff)

  • metadata (bool) – For each residue pair and interaction, return an interaction metadata dictionary instead of bits.

Returns:

ifp – A dictionary indexed by (ligand, protein) residue pairs. The format for values will depend on metadata:

  • A single numpy array if metadata=False

  • A sparse dictionary of metadata tuples indexed by interaction name if metadata=True

Return type:

prolif.ifp.IFP

Example

>>> u = mda.Universe("complex.pdb")
>>> lig = prolif.Molecule.from_mda(u, "resname LIG")
>>> prot = prolif.Molecule.from_mda(u, "protein")
>>> fp = prolif.Fingerprint()
>>> ifp = fp.generate(lig, prot)

New in version 0.3.2.

Changed in version 2.0.0: return_atoms replaced by metadata, and it now returns a sparse dictionary of metadata tuples indexed by interaction name instead of a tuple of arrays.

static list_available(show_hidden=False)[source]#

List interactions available to the Fingerprint class.

Parameters:

show_hidden (bool) – Show hidden classes (base classes meant to be inherited from to create custom interactions).

metadata(res1, res2)[source]#

Generates a metadata dictionary for the interactions between two residues.

Parameters:
Returns:

metadata – Dict containing tuples of metadata dictionaries indexed by interaction name. If a specific interaction is not present between residues, it is filtered out of the dictionary.

Return type:

dict[str, tuple[dict, …]]

Changed in version 0.3.2: Atom indices are returned as two separate lists instead of a single list of tuples

Changed in version 2.0.0: Returns a dictionnary with all available metadata for each interaction, rather than just atom indices.

plot_3d(ligand_mol: Molecule, protein_mol: Molecule, *, frame: int, size: Tuple[int, int] = (650, 600), display_all: bool = False)[source]#

Generate and display the complex in 3D with py3Dmol from a fingerprint object that has been used to run an analysis.

Parameters:
  • ligand_mol (Molecule) – The ligand molecule to display.

  • protein_mol (Molecule) – The protein molecule to display.

  • frame (int) – The frame number chosen to select which interactions are going to be displayed.

  • size (Tuple[int, int] = (650, 600)) – The size of the py3Dmol widget view.

  • display_all (bool) – Display all occurences for a given pair of residues and interaction, or only the shortest one. Not relevant if count=False in the Fingerprint object.

plot_barcode(*, figsize: Tuple[int, int] = (8, 10), dpi: int = 100, interactive: bool = False, n_frame_ticks: int = 10, residues_tick_location: Literal['top', 'bottom'] = 'top', xlabel: str = 'Frame', subplots_kwargs: Optional[dict] = None, tight_layout_kwargs: Optional[dict] = None)[source]#

Generate and display a Barcode plot from a fingerprint object that has been used to run an analysis.

Parameters:
  • figsize (Tuple[int, int] = (8, 10)) – Size of the matplotlib figure.

  • dpi (int = 100) – DPI used for the matplotlib figure.

  • interactive (bool) – Add hover interactivity to the plot (only relevant for notebooks). You may need to add %matplotlib notebook or %matplotlib ipympl for it to work as expected.

  • n_frame_ticks (int = 10) – Number of ticks on the X axis. May use ±1 tick to have them evenly spaced.

  • residues_tick_location (Literal["top", "bottom"] = "top") – Whether the Y ticks appear at the top or at the bottom of the series of interactions of each residue.

  • xlabel (str = "Frame") – Label displayed for the X axis.

  • subplots_kwargs (Optional[dict] = None) – Other parameters passed to matplotlib.pyplot.subplots().

  • tight_layout_kwargs (Optional[dict] = None) – Other parameters passed to matplotlib.figure.Figure.tight_layout().

plot_lignetwork(ligand_mol, *, kind: Literal['aggregate', 'frame'] = 'aggregate', frame: int = 0, display_all: bool = False, threshold: float = 0.3, use_coordinates: bool = False, flatten_coordinates: bool = True, kekulize: bool = False, molsize: int = 35, rotation: float = 0, carbon: float = 0.16, width: str = '100%', height: str = '500px')[source]#

Generate and display a LigNetwork plot from a fingerprint object that has been used to run an analysis.

Parameters:
  • ligand_mol (rdkit.Chem.rdChem.Mol) – Ligand molecule.

  • kind (str) – One of "aggregate" or "frame".

  • frame (int) – Frame number (see ifp). Only applicable for kind="frame".

  • display_all (bool) – Display all occurences for a given pair of residues and interaction, or only the shortest one. Only applicable for kind="frame". Not relevant if count=False in the Fingerprint object.

  • threshold (float) – Frequency threshold, between 0 and 1. Only applicable for kind="aggregate".

  • use_coordinates (bool) – If True, uses the coordinates of the molecule directly, otherwise generates 2D coordinates from scratch. See also flatten_coordinates.

  • flatten_coordinates (bool) – If this is True and use_coordinates=True, generates 2D coordinates that are constrained to fit the 3D conformation of the ligand as best as possible.

  • kekulize (bool) – Kekulize the ligand.

  • molsize (int) – Multiply the coordinates by this number to create a bigger and more readable depiction.

  • rotation (int) – Rotate the structure on the XY plane.

  • carbon (float) – Size of the carbon atom dots on the depiction. Use 0 to hide the carbon dots.

  • width (str) – Width of the IFrame window.

  • height (str) – Height of the IFrame window.

Notes

Two kinds of diagrams can be rendered: either for a designated frame or by aggregating the results on the whole IFP and optionnally discarding interactions that occur less frequently than a threshold. In the latter case (aggregate), only the group of atoms most frequently involved in each interaction is used to draw the edge.

run(traj, lig, prot, *, residues=None, converter_kwargs=None, progress=True, n_jobs=None)[source]#

Generates the fingerprint on a trajectory for a ligand and a protein

Parameters:
  • traj (MDAnalysis.coordinates.base.ProtoReader or MDAnalysis.coordinates.base.FrameIteratorSliced) – Iterate over this Universe trajectory or sliced trajectory object to extract the frames used for the fingerprint extraction

  • lig (MDAnalysis.core.groups.AtomGroup) – An MDAnalysis AtomGroup for the ligand

  • prot (MDAnalysis.core.groups.AtomGroup) – An MDAnalysis AtomGroup for the protein (with multiple residues)

  • residues (list or "all" or None) – A list of protein residues (str, int or ResidueId) to take into account for the fingerprint extraction. If "all", all residues will be used. If None, at each frame the get_residues_near_ligand() function is used to automatically use protein residues that are distant of 6.0 Å or less from each ligand residue.

  • converter_kwargs (tuple[dict, dict], optional) – Tuple of kwargs passed to the underlying RDKitConverter from MDAnalysis: the first for the ligand, and the second for the protein

  • progress (bool) – Display a tqdm progressbar while running the calculation

  • n_jobs (int or None) – Number of processes to run in parallel. If n_jobs=None, the analysis will use all available CPU threads, while if n_jobs=1, the analysis will run in serial.

Raises:

ValueError – If n_jobs <= 0

Returns:

The Fingerprint instance that generated the fingerprint

Return type:

prolif.fingerprint.Fingerprint

Example

>>> u = mda.Universe("top.pdb", "traj.nc")
>>> lig = u.select_atoms("resname LIG")
>>> prot = u.select_atoms("protein")
>>> fp = prolif.Fingerprint().run(u.trajectory[:10], lig, prot)

See also

Changed in version 0.3.2: Moved the return_atoms parameter from the run method to the dataframe conversion code

Changed in version 1.0.0: Added support for multiprocessing

Changed in version 1.1.0: Added support for passing kwargs to the RDKitConverter through the converter_kwargs parameter

Changed in version 2.0.0: Changed the format of the ifp attribute to be a dictionary containing more complete interaction metadata instead of just atom indices.

run_from_iterable(lig_iterable, prot_mol, *, residues=None, progress=True, n_jobs=None)[source]#

Generates the fingerprint between a list of ligands and a protein

Parameters:
  • lig_iterable (list or generator) – An iterable yielding ligands as Molecule objects

  • prot_mol (prolif.molecule.Molecule) – The protein

  • residues (list or "all" or None) – A list of protein residues (str, int or ResidueId) to take into account for the fingerprint extraction. If "all", all residues will be used. If None, at each frame the get_residues_near_ligand() function is used to automatically use protein residues that are distant of 6.0 Å or less from each ligand residue.

  • progress (bool) – Display a tqdm progressbar while running the calculation

  • n_jobs (int or None) – Number of processes to run in parallel. If n_jobs=None, the analysis will use all available CPU threads, while if n_jobs=1, the analysis will run in serial.

Raises:

ValueError – If n_jobs <= 0

Returns:

The Fingerprint instance that generated the fingerprint

Return type:

prolif.fingerprint.Fingerprint

Example

>>> prot = mda.Universe("protein.pdb")
>>> prot = prolif.Molecule.from_mda(prot)
>>> lig_iter = prolif.mol2_supplier("docking_output.mol2")
>>> fp = prolif.Fingerprint()
>>> fp.run_from_iterable(lig_iter, prot)

See also

Fingerprint.generate() to generate the fingerprint between two single structures

Changed in version 0.3.2: Moved the return_atoms parameter from the run_from_iterable method to the dataframe conversion code

Changed in version 1.0.0: Added support for multiprocessing

Changed in version 2.0.0: Changed the format of the ifp attribute to be a dictionary containing more complete interaction metadata instead of just atom indices.

to_bitvectors()[source]#

Converts fingerprints to a list of RDKit ExplicitBitVector

Returns:

bvs – A list of ExplicitBitVect for each frame

Return type:

list

Raises:

AttributeError – If the run() method hasn’t been used

Example

>>> from rdkit.DataStructs import TanimotoSimilarity
>>> bv = fp.to_bitvectors()
>>> TanimotoSimilarity(bv[0], bv[1])
0.42
to_countvectors()[source]#

Converts fingerprints to a list of RDKit UIntSparseIntVect

Returns:

cvs – A list of UIntSparseIntVect for each frame

Return type:

list

Raises:

AttributeError – If the run() method hasn’t been used

Example

>>> from rdkit.DataStructs import TanimotoSimilarity
>>> cv = fp.to_countvectors()
>>> TanimotoSimilarity(cv[0], cv[1])
0.42
to_dataframe(*, count=None, dtype=None, drop_empty=True, index_col='Frame')[source]#

Converts fingerprints to a pandas DataFrame

Parameters:
  • count (bool or None) – Whether to output a count fingerprint or not.

  • dtype (object or None) – Cast the dataframe values to this type. If None, uses np.uint8 if count=True, else bool.

  • drop_empty (bool) – Drop columns with only empty values

  • index_col (str) – Name of the index column in the DataFrame

Returns:

df – A DataFrame storing the results frame by frame and residue by residue. See to_dataframe() for more information

Return type:

pandas.DataFrame

Raises:

AttributeError – If the run() method hasn’t been used

Example

>>> df = fp.to_dataframe(dtype=np.uint8)
>>> print(df)
ligand             LIG1.G
protein             ILE59                  ILE55       TYR93
interaction   Hydrophobic HBAcceptor Hydrophobic Hydrophobic PiStacking
Frame
0                       0          1           0           0          0
...

Changed in version 2.0.0: Removed the return_atoms parameter. You can access more metadata information directly through ifp. Added the count parameter.

to_pickle(path=None)[source]#

Dumps the fingerprint object as a pickle.

Parameters:

path (str, pathlib.Path or None) – Output path. If None, the method returns the pickle as bytes.

Returns:

objNone if path is set, else the bytes corresponding to the pickle

Return type:

None or bytes

Example

>>> dump = fp.to_pickle()
>>> saved_fp = Fingerprint.from_pickle(dump)
>>> fp.to_pickle("data/fp.pkl")
>>> saved_fp = Fingerprint.from_pickle("data/fp.pkl")

See also

from_pickle() for loading the pickle dump.

New in version 1.0.0.

Changed in version 2.0.0: Switched to dill instead of pickle

Storing interactions — prolif.ifp#

class prolif.ifp.IFP(dict=None, /, **kwargs)[source]#

Mapping between residue pairs and interaction fingerprint.

Notes

This class provides an easy way to access interaction data from the ifp dictionary. This class is a dictionary formatted as:

{
    tuple[<residue_id>, <residue_id>]: {
        <interaction name>: tuple[{
            "indices": {
                "ligand": tuple[int, ...],
                "protein": tuple[int, ...]
            },
            "parent_indices": {
                "ligand": tuple[int, ...],
                "protein": tuple[int, ...]
            },
            <other metadata>: <value>
        }, ...]
    }
}

Here <residue_id> corresponds to a ResidueId object. For convenience, one can directly use strings rather than ResidueId objects when indexing the IFP, e.g. ifp[("LIG1.G", "ASP129.A")]. You can also use a single ResidueId or string to return a filtered IFP that only contains interactions with the specified residue, e.g. ifp["ASP129.A"].

Detecting interactions between residues — prolif.interactions.interactions#

Note that some of the SMARTS patterns used in the interaction classes are inspired from Pharmit and RDKit.

class prolif.interactions.interactions.Anionic(cation='[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]', anion='[-{1-},$(O=[C,S,P]-[O-])]', distance=4.5)#

Bases: Cationic

Ionic interaction between a ligand (anion) and a residue (cation)

Changed in version 1.1.0: Handles resonance forms for common acids, amidine and guanidine.

class prolif.interactions.interactions.CationPi(cation='[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]', pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'), distance=4.5, angle=(0, 30))[source]#

Bases: Interaction

Cation-Pi interaction between a ligand (cation) and a residue (aromatic ring)

Parameters:
  • cation (str) – SMARTS for cation

  • pi_ring (tuple) – SMARTS for aromatic rings (5 and 6 membered rings only)

  • distance (float) – Cutoff distance between the centroid and the cation

  • angle (tuple) – Min and max values for the angle between the vector normal to the ring plane and the vector going from the centroid to the cation

Changed in version 1.1.0: Handles resonance forms for amidine and guanidine as cations.

Changed in version 2.0.0: angles parameter renamed to angle.

class prolif.interactions.interactions.Cationic(cation='[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]', anion='[-{1-},$(O=[C,S,P]-[O-])]', distance=4.5)[source]#

Bases: Distance

Ionic interaction between a ligand (cation) and a residue (anion)

Changed in version 1.1.0: Handles resonance forms for common acids, amidine and guanidine.

class prolif.interactions.interactions.EdgeToFace(distance=6.5, plane_angle=(50, 90), normal_to_centroid_angle=(0, 30), pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'), intersect_radius=1.5)[source]#

Bases: BasePiStacking

Edge-to-face Pi-Stacking interaction between a ligand and a residue

Parameters:
  • distance (float) – Cutoff distance between each rings centroid

  • plane_angles (tuple) – Min and max values for the angle between the ring planes

  • normal_centroids_angles (tuple) – Min and max angles allowed between the vector normal to a ring’s plane, and the vector between the centroid of both rings.

  • pi_ring (list) – List of SMARTS for aromatic rings

  • intersect_radius (float) – Used to check whether the intersect point between ring planes falls within intersect_radius of the opposite ring’s centroid.

Changed in version 1.1.0: In addition to the changes made to the base pi-stacking interaction, this implementation makes sure that the intersection between the perpendicular ring’s plane and the other’s plane falls inside the ring.

Changed in version 2.0.0: Renamed centroid_distance to distance, added intersect_radius parameter.

class prolif.interactions.interactions.FaceToFace(distance=5.5, plane_angle=(0, 35), normal_to_centroid_angle=(0, 33), pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'))[source]#

Bases: BasePiStacking

Face-to-face Pi-Stacking interaction between a ligand and a residue

Parameters:
  • distance (float) – Cutoff distance between each rings centroid

  • plane_angles (tuple) – Min and max values for the angle between the ring planes

  • normal_centroids_angles (tuple) – Min and max angles allowed between the vector normal to a ring’s plane, and the vector between the centroid of both rings.

  • pi_ring (list) – List of SMARTS for aromatic rings

Changed in version 2.0.0: Renamed centroid_distance to distance.

class prolif.interactions.interactions.HBAcceptor(acceptor='[#7&!$([nX3])&!$([NX3]-*=[O,N,P,S])&!$([NX3]-[a])&!$([Nv4&+1]),O&!$([OX2](C)C=O)&!$(O(~a)~a)&!$(O=N-*)&!$([O-]-N=O),o+0,F&$(F-[#6])&!$(F-[#6][F,Cl,Br,I])]', donor='[$([O,S;+0]),$([N;v3,v4&+1]),n+0]-[H]', distance=3.5, DHA_angle=(130, 180))[source]#

Bases: SingleAngle

Hbond interaction between a ligand (acceptor) and a residue (donor)

Parameters:
  • acceptor (str) – SMARTS for [Acceptor]

  • donor (str) – SMARTS for [Donor]-[Hydrogen]

  • distance (float) – Distance threshold between the acceptor and donor atoms

  • DHA_angle (tuple) – Min and max values for the [Acceptor]...[Hydrogen]-[Donor] angle

Changed in version 1.1.0: The initial SMARTS pattern was too broad.

Changed in version 2.0.0: angles parameter renamed to DHA_angle.

class prolif.interactions.interactions.HBDonor(acceptor='[#7&!$([nX3])&!$([NX3]-*=[O,N,P,S])&!$([NX3]-[a])&!$([Nv4&+1]),O&!$([OX2](C)C=O)&!$(O(~a)~a)&!$(O=N-*)&!$([O-]-N=O),o+0,F&$(F-[#6])&!$(F-[#6][F,Cl,Br,I])]', donor='[$([O,S;+0]),$([N;v3,v4&+1]),n+0]-[H]', distance=3.5, DHA_angle=(130, 180))#

Bases: HBAcceptor

Hbond interaction between a ligand (donor) and a residue (acceptor)

Parameters:
  • acceptor (str) – SMARTS for [Acceptor]

  • donor (str) – SMARTS for [Donor]-[Hydrogen]

  • distance (float) – Distance threshold between the acceptor and donor atoms

  • DHA_angle (tuple) – Min and max values for the [Acceptor]...[Hydrogen]-[Donor] angle

Changed in version 1.1.0: The initial SMARTS pattern was too broad.

Changed in version 2.0.0: angles parameter renamed to DHA_angle.

class prolif.interactions.interactions.Hydrophobic(hydrophobic='[c,s,Br,I,S&H0&v2,$([D3,D4;#6])&!$([#6]~[#7,#8,#9])&!$([#6X4H0]);+0]', distance=4.5)[source]#

Bases: Distance

Hydrophobic interaction

Parameters:
  • hydrophobic (str) – SMARTS query for hydrophobic atoms

  • distance (float) – Distance threshold for the interaction

Changed in version 1.1.0: The initial SMARTS pattern was too broad.

class prolif.interactions.interactions.MetalAcceptor(metal='[Ca,Cd,Co,Cu,Fe,Mg,Mn,Ni,Zn]', ligand='[O,#7&!$([nX3])&!$([NX3]-*=[!#6])&!$([NX3]-[a])&!$([NX4]),-{1-};!+{1-}]', distance=2.8)#

Bases: MetalDonor

Metal complexation interaction between a ligand (chelated) and a residue (metal)

Parameters:
  • metal (str) – SMARTS for a transition metal

  • ligand (str) – SMARTS for a ligand

  • distance (float) – Cutoff distance

Changed in version 1.1.0: The initial SMARTS pattern was too broad.

class prolif.interactions.interactions.MetalDonor(metal='[Ca,Cd,Co,Cu,Fe,Mg,Mn,Ni,Zn]', ligand='[O,#7&!$([nX3])&!$([NX3]-*=[!#6])&!$([NX3]-[a])&!$([NX4]),-{1-};!+{1-}]', distance=2.8)[source]#

Bases: Distance

Metal complexation interaction between a ligand (metal) and a residue (chelated)

Parameters:
  • metal (str) – SMARTS for a transition metal

  • ligand (str) – SMARTS for a ligand

  • distance (float) – Cutoff distance

Changed in version 1.1.0: The initial SMARTS pattern was too broad.

class prolif.interactions.interactions.PiCation(cation='[+{1-},$([NX3&!$([NX3]-O)]-[C]=[NX3+])]', pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'), distance=4.5, angle=(0, 30))#

Bases: CationPi

Cation-Pi interaction between a ligand (aromatic ring) and a residue (cation)

Parameters:
  • cation (str) – SMARTS for cation

  • pi_ring (tuple) – SMARTS for aromatic rings (5 and 6 membered rings only)

  • distance (float) – Cutoff distance between the centroid and the cation

  • angle (tuple) – Min and max values for the angle between the vector normal to the ring plane and the vector going from the centroid to the cation

Changed in version 1.1.0: Handles resonance forms for amidine and guanidine as cations.

Changed in version 2.0.0: angles parameter renamed to angle.

class prolif.interactions.interactions.PiStacking(ftf_kwargs=None, etf_kwargs=None)[source]#

Bases: Interaction

Pi-Stacking interaction between a ligand and a residue

Parameters:
  • ftf_kwargs (dict) – Parameters to pass to the underlying FaceToFace class

  • etf_kwargs (dict) – Parameters to pass to the underlying EdgeToFace class

Changed in version 0.3.4: shortest_distance has been replaced by angle_normal_centroid

Changed in version 1.1.0: The implementation now directly calls EdgeToFace and FaceToFace instead of overwriting the default parameters with more generic ones.

class prolif.interactions.interactions.VdWContact(tolerance=0.0, vdwradii=None)[source]#

Bases: Interaction

Interaction based on the van der Waals radii of interacting atoms.

Parameters:
  • tolerance (float) – Tolerance added to the sum of vdW radii of atoms before comparing to the interatomic distance. If distance <= sum_vdw + tolerance the atoms are identified as a contact

  • vdwradii (dict, optional) – Updates to the vdW radii dictionary, with elements (first letter uppercase) as a key and the radius as a value.

Raises:

ValueErrortolerance parameter cannot be negative

Changed in version 2.0.0: Added the vdwradii parameter.

class prolif.interactions.interactions.XBAcceptor(acceptor='[#7,#8,P,S,Se,Te,a;!+{1-}]!#[*]', donor='[#6,#7,Si,F,Cl,Br,I]-[Cl,Br,I,At]', distance=3.5, AXD_angle=(130, 180), XAR_angle=(80, 140))[source]#

Bases: DoubleAngle

Halogen bonding between a ligand (acceptor) and a residue (donor).

Parameters:
  • acceptor (str) – SMARTS for [Acceptor]-[R]

  • donor (str) – SMARTS for [Donor]-[Halogen]

  • distance (float) – Cutoff distance between the acceptor and halogen atoms

  • AXD_angle (tuple) – Min and max values for the [Acceptor]...[Halogen]-[Donor] angle

  • XAR_angle (tuple) – Min and max values for the [Halogen]...[Acceptor]-[R] angle

Notes

Distance and angle adapted from Auffinger et al. PNAS 2004

Changed in version 2.0.0: axd_angles and xar_angles parameters renamed to AXD_angle and XAR_angle.

Changed in version 2.0.3: Fixed the SMARTS pattern for acceptors that was not allowing carbonyles and other groups with double bonds to match.

class prolif.interactions.interactions.XBDonor(acceptor='[#7,#8,P,S,Se,Te,a;!+{1-}]!#[*]', donor='[#6,#7,Si,F,Cl,Br,I]-[Cl,Br,I,At]', distance=3.5, AXD_angle=(130, 180), XAR_angle=(80, 140))#

Bases: XBAcceptor

Halogen bonding between a ligand (donor) and a residue (acceptor)

Parameters:
  • acceptor (str) – SMARTS for [Acceptor]-[R]

  • donor (str) – SMARTS for [Donor]-[Halogen]

  • distance (float) – Cutoff distance between the acceptor and halogen atoms

  • AXD_angle (tuple) – Min and max values for the [Acceptor]...[Halogen]-[Donor] angle

  • XAR_angle (tuple) – Min and max values for the [Halogen]...[Acceptor]-[R] angle

Notes

Distance and angle adapted from Auffinger et al. PNAS 2004

Changed in version 2.0.0: axd_angles and xar_angles parameters renamed to AXD_angle and XAR_angle.

Changed in version 2.0.3: Fixed the SMARTS pattern for acceptors that was not allowing carbonyles and other groups with double bonds to match.

Base interaction classes — prolif.interactions.base#

This module contains the base classes used to build most of the interactions.

class prolif.interactions.base.BasePiStacking(distance, plane_angle, normal_to_centroid_angle, pi_ring=('[a;r6]1:[a;r6]:[a;r6]:[a;r6]:[a;r6]:[a;r6]:1', '[a;r5]1:[a;r5]:[a;r5]:[a;r5]:[a;r5]:1'), intersect=False, intersect_radius=1.5)[source]#

Bases: Interaction

Base class for Pi-Stacking interactions

Parameters:
  • distance (float) – Cutoff distance between each rings centroid

  • plane_angles (tuple) – Min and max values for the angle between the ring planes

  • normal_centroids_angles (tuple) – Min and max angles allowed between the vector normal to a ring’s plane, and the vector between the centroid of both rings.

  • pi_ring (list) – List of SMARTS for aromatic rings

  • intersect (bool) – Whether to look for the point of intersection between both rings (for T-shaped interaction).

  • intersect_radius (float) – If intersect=True, used to check whether the intersect point falls within intersect_radius of the opposite ring’s centroid.

Changed in version 1.1.0: The implementation now relies on the angle between the vector normal to a ring’s plane and the vector between centroids (normal_centroids_angles) instead of the shortest_distance parameter.

Changed in version 2.0.0: Renamed centroid_distance to distance. Added intersect and ring_radius parameters.

class prolif.interactions.base.Distance(lig_pattern, prot_pattern, distance)[source]#

Bases: Interaction

Generic class for distance-based interactions

Parameters:
  • lig_pattern (str) – SMARTS pattern for atoms in ligand residues

  • prot_pattern (str) – SMARTS pattern for atoms in protein residues

  • distance (float) – Cutoff distance, measured between the first atom of each pattern

class prolif.interactions.base.DoubleAngle(lig_pattern, prot_pattern, distance, L1P2P1_angle, L2L1P2_angle, distance_atoms=('L1', 'P2'), metadata_mapping=None)[source]#

Bases: Interaction

Generic class for interactions using constraints on a distance and two angles.

Parameters:
  • lig_pattern (str) – SMARTS for [L1]-[L2]

  • prot_pattern (str) – SMARTS for [P1]-[P2]

  • distance (float) – Distance threshold

  • L1P2P1_angle (tuple) – Min and max values for the [L1]...[P2]-[P1] angle

  • L2L1P2_angle (tuple) – Min and max values for the [L2]-[L1]...[P2] angle

  • distance_atoms (tuple[str, str]) – Which atoms to use for the distance calculation: L1 or L2, and P1 or P2

  • metadata_mapping (dict, optional) – Mapping for names used in the metadata dict for the distance and angle variables

New in version 2.0.0.

class prolif.interactions.base.Interaction[source]#

Bases: object

Base class for interactions

All interaction classes must inherit this class and define a detect() method.

Changed in version 2.0.0: Changed the return type of interactions. Added some helper methods to easily update/derive interaction classes.

classmethod invert_role(name, docstring)[source]#

Creates a new interaction class where the role of the ligand and protein residues have been swapped. Usefull to create e.g. an acceptor class from a donor class.

metadata(lig_res, prot_res, lig_indices, prot_indices, **data)[source]#

Returns a dict containing the indices of atoms responsible for the interaction, alongside any other metrics (e.g. distance, angle…etc.).

class prolif.interactions.base.SingleAngle(lig_pattern, prot_pattern, distance, angle, distance_atom, metadata_mapping=None)[source]#

Bases: Interaction

Generic class for interactions using constraints on a distance and an angle.

Parameters:
  • lig_pattern (str) – SMARTS pattern for atoms in ligand residues.

  • prot_pattern (str) – SMARTS pattern for atoms in protein residues, should include (at least) two atoms, the first two atoms will be used for the angle calculation.

  • distance (float) – Distance threshold.

  • angle (tuple) – Min and max values for the [P1]-[P2]...[L1] angle, where L1 is the first atom in the lig_pattern SMARTS, and P1 and P2 are the first two atoms in the prot_pattern SMARTS.

  • distance_atom (str) – Which atom from the protein (P1 or P2) to use for the distance calculation with L1.

  • metadata_mapping (dict, optional) – Mapping for names used in the metadata dict for the distance and angle variables

New in version 2.0.0.

Generating an IFP in parallel — prolif.parallel#

This module provides two classes, TrajectoryPool and MolIterablePool to execute the analysis in parallel. These are used in the parallel implementation used in run() and run_from_iterable() respectively.

class prolif.parallel.MolIterablePool(n_processes, fingerprint, prot_mol, residues, tqdm_kwargs)[source]#

Process pool for a parallelized IFP analysis on an iterable of ligands. Must be used in a with statement.

Parameters:
  • n_processes (int) – Max number of processes

  • fingerprint (prolif.fingerprint.Fingerprint) – Fingerprint instance used to generate the IFP

  • prot_mol (prolif.molecule.Molecule) – Protein molecule

  • residues (list, optional) – List of protein residues considered for the IFP

  • tqdm_kwargs (dict) – Parameters for the tqdm progress bar

pool#

The underlying pool instance.

Type:

multiprocess.pool.Pool

classmethod executor(mol)[source]#

Classmethod executed by each child process on a single ligand molecule from the input iterable.

Returns:

ifp_data – A dictionary indexed by (ligand, protein) residue pairs, and each value is a sparse dictionary of metadata indexed by interaction name.

Return type:

prolif.ifp.IFP

classmethod initializer(fingerprint, prot_mol, residues)[source]#

Initializer classmethod passed to the pool so that each child process can access these objects without copying them.

process(args_iterable)[source]#

Maps the input iterable of molecules to the executor function.

Parameters:

args_iterable (Iterable[prolif.molecule.Molecule]) – An iterable yielding ligand molecules

Returns:

ifp – An iterable of IFP dictionaries.

Return type:

Iterable[prolif.ifp.IFP]

class prolif.parallel.Progress(killswitch, tracker, **kwargs)[source]#

Helper class to track the number of frames processed by the TrajectoryPool.

Parameters:
  • killswitch (threading.Event) – A threading.Event instance created and controlled by the TrajectoryPool to kill the thread updating the tqdm progress bar.

  • tracker (multiprocess.Value) – Value holding a ctypes.c_uint32 ctype updated by the TrajectoryPool, storing how many frames were processed since the last progress bar update.

  • **kwargs (object) – Used to create the tqdm progress bar.

delay#

Delay between progress bar updates. This requires locking access to the tracker object which the TrajectoryPool needs access to, so too small values might cause delays in the analysis.

Type:

float, default = 0.5

tqdm_pbar#

The progress bar displayed

Type:

tqdm.std.tqdm

close()[source]#

Cleanup after the TrajectoryPool computation is done

event_loop()[source]#

Event loop targeted by a separate thread

update()[source]#

Update the value displayed by the progress bar

class prolif.parallel.TrajectoryPool(n_processes, fingerprint, residues, tqdm_kwargs, rdkitconverter_kwargs)[source]#

Process pool for a parallelized IFP analysis on an MD trajectory. Must be used in a with statement.

Parameters:
  • n_processes (int) – Max number of processes

  • fingerprint (prolif.fingerprint.Fingerprint) – Fingerprint instance used to generate the IFP

  • residues (list, optional) – List of protein residues considered for the IFP

  • tqdm_kwargs (dict) – Parameters for the tqdm progress bar

  • rdkitconverter_kwargs (tuple[dict, dict]) – Parameters for the RDKitConverter from MDAnalysis: the first for the ligand, and the second for the protein

tracker#

Value holding a ctypes.c_uint32 ctype storing how many frames were processed since the last progress bar update.

Type:

multiprocess.Value

pool#

The underlying pool instance.

Type:

multiprocess.pool.Pool

classmethod executor(args)[source]#

Classmethod executed by each child process on a chunk of the trajectory

Returns:

ifp_chunk – A dictionary of IFP indexed by frame number

Return type:

dict[int, prolif.ifp.IFP]

classmethod initializer(tracker, fingerprint, residues, rdkitconverter_kwargs)[source]#

Initializer classmethod passed to the pool so that each child process can access these objects without copying them.

process(args_iterable)[source]#

Maps the input iterable of arguments to the executor function.

Parameters:

args_iterable (Iterable[tuple]) – Iterable of tuple of trajectory, ligand atomgroup, protein atomgroup, and array of frame indices.

Returns:

ifp – An iterable of dictionaries of IFP indexed by frame number.

Return type:

Iterable[dict[int, prolif.ifp.IFP]]