Interaction fingerprint

Calculate a Protein-Ligand Interaction Fingerprint — prolif.fingerprint

In [1]: import MDAnalysis as mda

In [2]: from rdkit.DataStructs import TanimotoSimilarity

In [3]: import prolif

In [4]: u = mda.Universe(prolif.datafiles.TOP, prolif.datafiles.TRAJ)

In [5]: prot = u.select_atoms("protein")

In [6]: lig = u.select_atoms("resname LIG")

In [7]: fp = prolif.Fingerprint(["HBDonor", "HBAcceptor", "PiStacking", "CationPi",
   ...:                          "Cationic"])
   ...: 

In [8]: fp.run(u.trajectory[::10], lig, prot)
Out[8]: <prolif.fingerprint.Fingerprint: 5 interactions: ['HBDonor', 'HBAcceptor', 'Cationic', 'CationPi', 'PiStacking'] at 0x7f9bb6d8f7f0>

In [9]: df = fp.to_dataframe()

In [10]: df
Out[10]: 
ligand        LIG1.G                    ...                                 
protein     ASP129.A          THR134.A  ...   PHE330.B   PHE331.B   PHE351.B
interaction  HBDonor Cationic  HBDonor  ... PiStacking PiStacking PiStacking
Frame                                   ...                                 
0               True     True    False  ...      False       True      False
10              True     True    False  ...       True       True       True
20              True     True    False  ...      False       True      False
30              True     True    False  ...      False      False       True
40              True     True    False  ...      False       True       True
50              True     True    False  ...      False       True       True
60              True     True    False  ...      False       True      False
70              True     True    False  ...      False       True      False
80              True     True    False  ...       True       True      False
90              True     True    False  ...      False      False       True
100             True     True    False  ...      False      False      False
110             True     True    False  ...       True       True      False
120             True     True    False  ...       True       True       True
130             True     True     True  ...      False      False      False
140             True     True    False  ...      False      False      False
150             True     True    False  ...      False      False      False
160             True     True    False  ...      False       True      False
170             True     True    False  ...      False       True      False
180             True     True    False  ...      False       True      False
190             True     True    False  ...      False       True      False
200             True     True    False  ...       True      False       True
210             True     True    False  ...       True       True       True
220             True     True    False  ...      False       True      False
230             True     True    False  ...      False       True       True
240             True     True    False  ...       True       True       True

[25 rows x 9 columns]

In [11]: bv = fp.to_bitvectors()

In [12]: TanimotoSimilarity(bv[0], bv[1])
Out[12]: 0.7142857142857143

# save results
In [13]: fp.to_pickle("fingerprint.pkl")

# load
In [14]: fp = prolif.Fingerprint.from_pickle("fingerprint.pkl")

In [15]: df.equals(fp.to_dataframe())
Out[15]: True
class prolif.fingerprint.Fingerprint(interactions=['Hydrophobic', 'HBDonor', 'HBAcceptor', 'PiStacking', 'Anionic', 'Cationic', 'CationPi', 'PiCation'])[source]

Class that generates an interaction fingerprint between two molecules

While in most cases the fingerprint will be generated between a ligand and a protein, it is also possible to use this class for protein-protein interactions, or host-guest systems.

Parameters

interactions (list) – List of names (str) of interaction classes as found in the prolif.interactions module

interactions

Dictionnary of interaction functions indexed by class name. For more details, see prolif.interactions

Type

dict

n_interactions

Number of interaction functions registered by the fingerprint

Type

int

ifp

List of interactions fingerprints for the given trajectory.

Type

list, optionnal

:raises NameError : Unknown interaction in the interactions parameter:

Notes

You can use the fingerprint generator in multiple ways:

  • On a trajectory directly from MDAnalysis objects:

In [1]: prot = u.select_atoms("protein")

In [2]: lig = u.select_atoms("resname LIG")

In [3]: fp = prolif.Fingerprint(["HBDonor", "HBAcceptor", "PiStacking",
   ...:                          "Hydrophobic"])
   ...: 

In [4]: fp.run(u.trajectory[:5], lig, prot)
Out[4]: <prolif.fingerprint.Fingerprint: 4 interactions: ['Hydrophobic', 'HBDonor', 'HBAcceptor', 'PiStacking'] at 0x7f9bb605a4f0>

In [5]: fp.to_dataframe()
Out[5]: 
ligand           LIG1.G              ...                        
protein         TYR38.A    TYR109.A  ...    THR355.B    TYR359.B
interaction Hydrophobic Hydrophobic  ... Hydrophobic Hydrophobic
Frame                                ...                        
0                 False        True  ...        True        True
1                 False       False  ...        True        True
2                 False        True  ...        True        True
3                  True       False  ...        True        True
4                  True        True  ...        True        True

[5 rows x 31 columns]
  • On two single structures (from RDKit or MDAnalysis):

In [6]: u.trajectory[0]  # use coordinates of the first frame
Out[6]: < Timestep 0 with unit cell dimensions [ 89.6909   87.95726 102.02041  90.       90.       90.     ] >

In [7]: prot = prolif.Molecule.from_mda(prot)

In [8]: lig = prolif.Molecule.from_mda(lig)

In [9]: ifp = fp.generate(lig, prot)

In [10]: ifp["Frame"] = 0

In [11]: prolif.to_dataframe([ifp], fp.interactions.keys())
Out[11]: 
ligand           LIG1.G              ...                        
protein        TYR109.A    TRP125.A  ...    THR355.B    TYR359.B
interaction Hydrophobic Hydrophobic  ... Hydrophobic Hydrophobic
Frame                                ...                        
0                  True        True  ...        True        True

[1 rows x 27 columns]
  • On a specific pair of residues for a specific interaction:

# ligand-protein
In [12]: fp.hbdonor(lig, prot["ASP129.A"])
Out[12]: True

# protein-protein (alpha helix)
In [13]: fp.hbacceptor(prot["ASP129.A"], prot["CYS133.A"])
Out[13]: True

You can also obtain the indices of atoms responsible for the interaction:

In [14]: fp.bitvector_atoms(lig, prot["ASP129.A"])
Out[14]: (array([ True,  True, False, False]), [8, 52, None, None], [7, 8, None, None])

In [15]: fp.hbdonor.__wrapped__(lig, prot["ASP129.A"])
Out[15]: (True, 52, 8)

Changed in version 1.0.0: Added pickle support

bitvector(res1, res2)[source]

Generates the complete bitvector for the interactions between two residues. To get the indices of atoms responsible for each interaction, see bitvector_atoms()

Parameters
Returns

bitvector – An array storing the encoded interactions between res1 and res2

Return type

numpy.ndarray

bitvector_atoms(res1, res2)[source]

Generates the complete bitvector for the interactions between two residues, and returns the indices of atoms responsible for these interactions

Parameters
Returns

  • bitvector (numpy.ndarray) – An array storing the encoded interactions between res1 and res2

  • lig_atoms (list) – A list containing indices for the ligand atoms responsible for each interaction

  • pro_atoms (list) – A list containing indices for the protein atoms responsible for each interaction

Changed in version 0.3.2: Atom indices are returned as two separate lists instead of a single list of tuples

classmethod from_pickle(path_or_bytes)[source]

Creates a fingerprint object from a pickle dump.

Parameters
  • path_or_bytes (str, pathlib.Path or bytes) – The path to the pickle file, or bytes corresponding to a pickle dump

  • seealso:: (.) – to_pickle() for creating the pickle dump.

  • versionadded: (.) – 1.0.0:

generate(lig, prot, residues=None, return_atoms=False)[source]

Generates the interaction fingerprint between 2 molecules

Parameters
  • lig (prolif.molecule.Molecule) – Molecule for the ligand

  • prot (prolif.molecule.Molecule) – Molecule for the protein

  • residues (list or "all" or None) – A list of protein residues (str, int or ResidueId) to take into account for the fingerprint extraction. If "all", all residues will be used. If None, at each frame the get_residues_near_ligand() function is used to automatically use protein residues that are distant of 6.0 Å or less from each ligand residue.

  • return_atoms (bool) – For each residue pair and interaction, return indices of atoms responsible for the interaction instead of bits.

Returns

ifp – A dictionnary indexed by (ligand, protein) residue pairs. The format for values will depend on return_atoms:

  • A single bitvector if return_atoms=False

  • A tuple of bitvector, ligand atom indices and protein atom indices if return_atoms=True

Return type

dict

Example

>>> u = mda.Universe("complex.pdb")
>>> lig = prolif.Molecule.from_mda(u, "resname LIG")
>>> prot = prolif.Molecule.from_mda(u, "protein")
>>> fp = prolif.Fingerprint()
>>> ifp = fp.generate(lig, prot)

New in version 0.3.2.

static list_available(show_hidden=False)[source]

List interactions available to the Fingerprint class.

Parameters

show_hidden (bool) – Show hidden classes (usually base classes whose name starts with an underscore _. Those are not supposed to be called directly)

run(traj, lig, prot, residues=None, progress=True, n_jobs=None)[source]

Generates the fingerprint on a trajectory for a ligand and a protein

Parameters

:raises ValueError : if n_jobs <= 0:

Returns

The Fingerprint instance that generated the fingerprint

Return type

prolif.fingerprint.Fingerprint

Example

>>> u = mda.Universe("top.pdb", "traj.nc")
>>> lig = u.select_atoms("resname LIG")
>>> prot = u.select_atoms("protein")
>>> fp = prolif.Fingerprint().run(u.trajectory[:10], lig, prot)

See also

Changed in version 0.3.2: Moved the return_atoms parameter from the run method to the dataframe conversion code

Changed in version 1.0.0: Added support for multiprocessing

run_from_iterable(lig_iterable, prot_mol, residues=None, progress=True, n_jobs=None)[source]

Generates the fingerprint between a list of ligands and a protein

Parameters
  • lig_iterable (list or generator) – An iterable yielding ligands as Molecule objects

  • prot_mol (prolif.molecule.Molecule) – The protein

  • residues (list or "all" or None) – A list of protein residues (str, int or ResidueId) to take into account for the fingerprint extraction. If "all", all residues will be used. If None, at each frame the get_residues_near_ligand() function is used to automatically use protein residues that are distant of 6.0 Å or less from each ligand residue.

  • progress (bool) –

    Use the tqdm package to display a progressbar while running the calculation

  • n_jobs (int or None) – Number of processes to run in parallel. If n_jobs=None, the analysis will use all available CPU threads, while if n_jobs=1, the analysis will run in serial.

:raises ValueError : if n_jobs <= 0:

Returns

The Fingerprint instance that generated the fingerprint

Return type

prolif.fingerprint.Fingerprint

Example

>>> prot = mda.Universe("protein.pdb")
>>> prot = prolif.Molecule.from_mda(prot)
>>> lig_iter = prolif.mol2_supplier("docking_output.mol2")
>>> fp = prolif.Fingerprint()
>>> fp.run_from_iterable(lig_iter, prot)

See also

Fingerprint.generate() to generate the fingerprint between two single structures

Changed in version 0.3.2: Moved the return_atoms parameter from the run_from_iterable method to the dataframe conversion code

Changed in version 1.0.0: Added support for multiprocessing

to_bitvectors()[source]

Converts fingerprints to a list of RDKit ExplicitBitVector

Returns

bvs – A list of ExplicitBitVect for each frame

Return type

list

Raises

AttributeError – If the run() method hasn’t been used

Example

>>> from rdkit.DataStructs import TanimotoSimilarity
>>> bv = fp.to_bitvectors()
>>> TanimotoSimilarity(bv[0], bv[1])
0.42
to_dataframe(**kwargs)[source]

Converts fingerprints to a pandas DataFrame

Parameters
  • dtype (object or None) – Cast the input of each bit in the bitvector to this type. If None, keep the data as is

  • drop_empty (bool) – Drop columns with only empty values

  • return_atoms (bool) – For each residue pair and interaction, return indices of atoms responsible for the interaction instead of bits

Returns

df – A DataFrame storing the results frame by frame and residue by residue. See to_dataframe() for more information

Return type

pandas.DataFrame

Raises

AttributeError – If the run() method hasn’t been used

Example

>>> df = fp.to_dataframe(dtype=np.uint8)
>>> print(df)
ligand             LIG1.G
protein             ILE59                  ILE55       TYR93
interaction   Hydrophobic HBAcceptor Hydrophobic Hydrophobic PiStacking
Frame
0                       0          1           0           0          0
...
to_pickle(path=None)[source]

Dumps the fingerprint object as a pickle.

Parameters

path (str, pathlib.Path or None) – Output path. If None, the method returns the pickle as bytes.

Returns

objNone if path is set, else the bytes corresponding to the pickle

Return type

None or bytes

Example

>>> dump = fp.to_pickle()
>>> saved_fp = Fingerprint.from_pickle(dump)
>>> fp.to_pickle("data/fp.pkl")
>>> saved_fp = Fingerprint.from_pickle("data/fp.pkl")

See also

from_pickle() for loading the pickle dump.

New in version 1.0.0.

Detecting interactions between residues — prolif.interactions

You can declare your own interaction class like this:

from scipy.spatial import distance_matrix

class CloseContact(prolif.Interaction):
    def detect(self, res1, res2, threshold=2.0):
        dist_matrix = distance_matrix(res1.xyz, res2.xyz)
        if (dist_matrix <= threshold).any():
            # return format: bool, ligand atom index, protein atom index
            return True, None, None
        return False, None, None

Warning

Your custom class must inherit from prolif.interactions.Interaction

The new “CloseContact” class is then automatically added to the list of interactions available to the fingerprint generator:

>>> u = mda.Universe(prolif.datafiles.TOP, prolif.datafiles.TRAJ)
>>> prot = u.select_atoms("protein")
>>> lig = u.select_atoms("resname LIG")
>>> fp = prolif.Fingerprint(interactions="all")
>>> df = fp.run(u.trajectory[0:1], lig, prot).to_dataframe()
>>> df.xs("CloseContact", level=1, axis=1)
   ASP129.A  VAL201.A
0         1         1
>>> lmol = prolif.Molecule.from_mda(lig)
>>> pmol = prolif.Molecule.from_mda(prot)
>>> fp.closecontact(lmol, pmol["ASP129.A"])
True
class prolif.interactions.Anionic(cation='[+{1-}]', anion='[-{1-}]', distance=4.5)[source]

Bases: prolif.interactions._BaseIonic

Ionic interaction between a ligand (anion) and a residue (cation)

class prolif.interactions.CationPi(cation='[+{1-}]', pi_ring=('a1:a:a:a:a:a:1', 'a1:a:a:a:a:1'), distance=4.5, angles=(0, 30))[source]

Bases: prolif.interactions._BaseCationPi

Cation-Pi interaction between a ligand (cation) and a residue (aromatic ring)

class prolif.interactions.Cationic(cation='[+{1-}]', anion='[-{1-}]', distance=4.5)[source]

Bases: prolif.interactions._BaseIonic

Ionic interaction between a ligand (cation) and a residue (anion)

class prolif.interactions.EdgeToFace(centroid_distance=6.0, plane_angles=(50, 90), **kwargs)[source]

Bases: prolif.interactions.PiStacking

Edge-to-face Pi-Stacking interaction between a ligand and a residue

class prolif.interactions.FaceToFace(centroid_distance=4.5, plane_angles=(0, 40), **kwargs)[source]

Bases: prolif.interactions.PiStacking

Face-to-face Pi-Stacking interaction between a ligand and a residue

class prolif.interactions.HBAcceptor(donor='[#7,#8,#16][H]', acceptor='[N,O,F,-{1-};!+{1-}]', distance=3.5, angles=(130, 180))[source]

Bases: prolif.interactions._BaseHBond

Hbond interaction between a ligand (acceptor) and a residue (donor)

class prolif.interactions.HBDonor(donor='[#7,#8,#16][H]', acceptor='[N,O,F,-{1-};!+{1-}]', distance=3.5, angles=(130, 180))[source]

Bases: prolif.interactions._BaseHBond

Hbond interaction between a ligand (donor) and a residue (acceptor)

class prolif.interactions.Hydrophobic(hydrophobic='[#6,#16,F,Cl,Br,I,At;+0]', distance=4.5)[source]

Bases: prolif.interactions._Distance

Hydrophobic interaction

Parameters
  • hydrophobic (str) – SMARTS query for hydrophobic atoms

  • distance (float) – Cutoff distance for the interaction

class prolif.interactions.Interaction[source]

Bases: abc.ABC

Abstract class for interactions

All interaction classes must inherit this class and define a detect() method

class prolif.interactions.MetalAcceptor(metal='[Ca,Cd,Co,Cu,Fe,Mg,Mn,Ni,Zn]', ligand='[O,N,-{1-};!+{1-}]', distance=2.8)[source]

Bases: prolif.interactions._BaseMetallic

Metallic interaction between a ligand (chelated) and a metal residue

class prolif.interactions.MetalDonor(metal='[Ca,Cd,Co,Cu,Fe,Mg,Mn,Ni,Zn]', ligand='[O,N,-{1-};!+{1-}]', distance=2.8)[source]

Bases: prolif.interactions._BaseMetallic

Metallic interaction between a metal and a residue (chelated)

class prolif.interactions.PiCation(cation='[+{1-}]', pi_ring=('a1:a:a:a:a:a:1', 'a1:a:a:a:a:1'), distance=4.5, angles=(0, 30))[source]

Bases: prolif.interactions._BaseCationPi

Cation-Pi interaction between a ligand (aromatic ring) and a residue (cation)

class prolif.interactions.PiStacking(centroid_distance=6.0, shortest_distance=3.8, plane_angles=(0, 90), pi_ring=('a1:a:a:a:a:a:1', 'a1:a:a:a:a:1'))[source]

Bases: prolif.interactions.Interaction

Pi-Stacking interaction between a ligand and a residue

Parameters
  • centroid_distance (float) – Cutoff distance between each rings centroid

  • shortest_distance (float) – Shortest distance allowed between the closest atoms of both rings

  • plane_angles (tuple) – Min and max values for the angle between the ring planes

  • pi_ring (list) – List of SMARTS for aromatic rings

class prolif.interactions.VdWContact(tolerance=0.5)[source]

Bases: prolif.interactions.Interaction

Interaction based on the van der Waals radii of interacting atoms.

Parameters

tolerance (float) – Tolerance added to the sum of vdW radii of atoms before comparing to the interatomic distance. If distance <= sum_vdw + tolerance the atoms are identified as a contact

:raises ValueError : tolerance parameter cannot be negative:

class prolif.interactions.XBAcceptor(donor='[#6,#7,Si,F,Cl,Br,I]-[Cl,Br,I,At]', acceptor='[#7,#8,P,S,Se,Te,a;!+{1-}][*]', distance=3.5, axd_angles=(130, 180), xar_angles=(80, 140))[source]

Bases: prolif.interactions._BaseXBond

Halogen bonding between a ligand (acceptor) and a residue (donor)

class prolif.interactions.XBDonor(donor='[#6,#7,Si,F,Cl,Br,I]-[Cl,Br,I,At]', acceptor='[#7,#8,P,S,Se,Te,a;!+{1-}][*]', distance=3.5, axd_angles=(130, 180), xar_angles=(80, 140))[source]

Bases: prolif.interactions._BaseXBond

Halogen bonding between a ligand (donor) and a residue (acceptor)

class prolif.interactions._BaseCationPi(cation='[+{1-}]', pi_ring=('a1:a:a:a:a:a:1', 'a1:a:a:a:a:1'), distance=4.5, angles=(0, 30))[source]

Bases: prolif.interactions.Interaction

Base class for cation-pi interactions

Parameters
  • cation (str) – SMARTS for cation

  • pi_ring (tuple) – SMARTS for aromatic rings (5 and 6 membered rings only)

  • distance (float) – Cutoff distance between the centroid and the cation

  • angles (tuple) – Min and max values for the angle between the vector normal to the ring plane and the vector going from the centroid to the cation

class prolif.interactions._BaseHBond(donor='[#7,#8,#16][H]', acceptor='[N,O,F,-{1-};!+{1-}]', distance=3.5, angles=(130, 180))[source]

Bases: prolif.interactions.Interaction

Base class for Hydrogen bond interactions

Parameters
  • donor (str) – SMARTS for [Donor]-[Hydrogen]

  • acceptor (str) – SMARTS for [Acceptor]

  • distance (float) – Cutoff distance between the donor and acceptor atoms

  • angles (tuple) – Min and max values for the [Donor]-[Hydrogen]...[Acceptor] angle

class prolif.interactions._BaseIonic(cation='[+{1-}]', anion='[-{1-}]', distance=4.5)[source]

Bases: prolif.interactions._Distance

Base class for ionic interactions

class prolif.interactions._BaseMetallic(metal='[Ca,Cd,Co,Cu,Fe,Mg,Mn,Ni,Zn]', ligand='[O,N,-{1-};!+{1-}]', distance=2.8)[source]

Bases: prolif.interactions._Distance

Base class for metal complexation

Parameters
  • metal (str) – SMARTS for a transition metal

  • ligand (str) – SMARTS for a ligand

  • distance (float) – Cutoff distance

class prolif.interactions._BaseXBond(donor='[#6,#7,Si,F,Cl,Br,I]-[Cl,Br,I,At]', acceptor='[#7,#8,P,S,Se,Te,a;!+{1-}][*]', distance=3.5, axd_angles=(130, 180), xar_angles=(80, 140))[source]

Bases: prolif.interactions.Interaction

Base class for Halogen bond interactions

Parameters
  • donor (str) – SMARTS for [Donor]-[Halogen]

  • acceptor (str) – SMARTS for [Acceptor]-[R]

  • distance (float) – Cutoff distance between the halogen and acceptor atoms

  • axd_angles (tuple) – Min and max values for the [Acceptor]...[Halogen]-[Donor] angle

  • xar_angles (tuple) – Min and max values for the [R]-[Acceptor]...[Halogen] angle

Notes

Distance and angle adapted from Auffinger et al. PNAS 2004

class prolif.interactions._Distance(lig_pattern, prot_pattern, distance)[source]

Bases: prolif.interactions.Interaction

Generic class for distance-based interactions

Parameters
  • lig_pattern (str) – SMARTS pattern for atoms in ligand residues

  • prot_pattern (str) – SMARTS pattern for atoms in protein residues

  • distance (float) – Cutoff distance, measured between the first atom of each pattern

class prolif.interactions._InteractionMeta(name, bases, namespace, **kwargs)[source]

Bases: abc.ABCMeta

Metaclass to register interactions automatically

prolif.interactions.get_mapindex(res, index)[source]

Get the index of the atom in the original molecule

Parameters
Returns

mapindex – The index of the atom in the Molecule

Return type

int