Advanced usage#

Prerequisite#

We’ll setup the protein and ligand files for the different usecases here:

import MDAnalysis as mda
import prolif as plf

# load protein
protein_file = str(plf.datafiles.datapath / "vina" / "rec.pdb")
u = mda.Universe(protein_file)
protein_mol = plf.Molecule.from_mda(u)

# load docking poses
poses_path = str(plf.datafiles.datapath / "vina" / "vina_output.sdf")
pose_iterable = plf.sdf_supplier(poses_path)

# load 1 ligand from poses
ligand_mol = pose_iterable[0]

Interactions#

This section shows how to modify existing interactions or defined new ones.

Modifying interaction parameters#

In ProLIF, each interaction is defined as a Python class. You can get a list of all the available interactions (including base abstract classes) with the following code snippet:

plf.Fingerprint.list_available(show_hidden=True)
['BasePiStacking',
 'Distance',
 'DoubleAngle',
 'SingleAngle',
 'Anionic',
 'CationPi',
 'Cationic',
 'EdgeToFace',
 'FaceToFace',
 'HBAcceptor',
 'HBDonor',
 'Hydrophobic',
 'MetalAcceptor',
 'MetalDonor',
 'PiCation',
 'PiStacking',
 'VdWContact',
 'XBAcceptor',
 'XBDonor']

When you create an interaction fingerprint with prolif.Fingerprint, those classes are created with default parameters and attached to the new fingerprint object as methods:

fp = plf.Fingerprint(["Hydrophobic", "HBDonor", "HBAcceptor"])
fp.hydrophobic
<prolif.interactions.interactions.Hydrophobic at 0x7f5ab8040b90>

These method will yield all occurences for that specific interaction between 2 residues (1 for the ligand and 1 for the protein).

In this example, we’ll reparametrize the hydrophobic interaction with a shorter distance and see how this affects the number of occurences of an interaction for a given pair of residues.

Note

To know which parameters are available for interactions, see the prolif.interactions module.

Let’s start with a test case. With the default parameters, TYR109 is interacting with our ligand in 4 different occasions:

fp = plf.Fingerprint()
hydrophobic_interactions_tyr109 = list(
    fp.hydrophobic(ligand_mol[0], protein_mol["TYR109.A"])
)
len(hydrophobic_interactions_tyr109)
4

Now we’ll simply change the distance threshold to 4.0 instead of the default 4.5. To do this we simply provide the parameters argument with a dictionnary mapping the name of the interaction to reconfigure with updated parameters.

This modification will affect all the interaction analysis code run by the fingerprint object, i.e. when using run or run_from_iterable.

fp = plf.Fingerprint(parameters={"Hydrophobic": {"distance": 4.0}})
hydrophobic_interactions_tyr109 = list(
    fp.hydrophobic(ligand_mol[0], protein_mol["TYR109.A"])
)
len(hydrophobic_interactions_tyr109)
0

As you can see, by reducing the distance threshold the interaction between our ligand and TYR109 is now ignored.

Repurposing an existing interaction#

In case you want to reuse an existing class for a different type of interaction, the easiest way is to define an interaction class that inherits one of the classes listed in the prolif.interactions module, and just update its __init__ method with the appropriate parameters.

There are some generic interactions, like the Distance class, if you just need to define two chemical moieties within a certain distance. Both the Hydrophobic, Ionic, and Metallic interactions inherit from this class!

In most cases, defining an interaction only based on a distance is not enough and requires one or two angles constraints as well. For this purpose, the SingleAngle and DoubleAngle interactions can be used.

Here we’ll define a C-H...O HBond interaction by reusing the existing HBond classes:

class CHOAcceptor(plf.interactions.HBAcceptor):
    def __init__(
        self,
        acceptor="O",
        donor="[#6]-[H]",
        distance=3.5,
        DHA_angle=(90, 180),
    ):
        super().__init__(
            acceptor=acceptor, donor=donor, distance=distance, DHA_angle=DHA_angle
        )


# create inverse interaction as well
CHODonor = CHOAcceptor.invert_role(
    "CHODonor",
    "C-H...O Hbond interaction between a ligand (donor) and a residue (acceptor)",
)

# calculate both classical and weak hbonds
fp = plf.Fingerprint(["HBAcceptor", "CHOAcceptor", "HBDonor", "CHODonor"])
fp.run_from_iterable(pose_iterable, protein_mol)
# show dataframe
df = fp.to_dataframe()
df
ligand UNL1
protein TYR38.A TYR109.A ASP123.A LEU126.A ASP129.A PRO184.A ARG188.A ... SER334.B MET337.B PHE346.B HSE347.B LEU348.B PHE351.B ASP352.B THR355.B TYR359.B
interaction HBAcceptor CHODonor CHODonor CHODonor CHOAcceptor CHODonor HBDonor CHODonor CHOAcceptor CHODonor ... CHODonor CHOAcceptor CHODonor CHODonor CHODonor CHOAcceptor CHODonor CHOAcceptor CHODonor CHODonor
Frame
0 False False True False False False True True False False ... False False False False False False True False False False
1 False False False False False True False True False False ... False False False False False False True False True False
2 False False False False False True False True False False ... False False False False False False True False True False
3 False False False False False False False True False False ... False False False False False False True False False False
4 True False False False False False False False False False ... False True True False True True False False False False
5 False False False False False False False True False False ... False False False False False False True False True False
6 False True False False False False False False False False ... False False False False False False False True False True
7 False False False False False False False True False False ... True False False True True False False False False False
8 False False False True True True False True True True ... False False False False False False False False False False

9 rows × 29 columns

We can also display these new interactions:

from prolif.plotting.complex3d import Complex3D

# assign colors for the new interactions on the 3D plot
Complex3D.COLORS.update(
    {
        "CHODonor": "red",
        "CHOAcceptor": "blue",
    }
)
# show specific docking pose
pose_index = 4
fp.plot_3d(pose_iterable[pose_index], protein_mol, frame=pose_index)

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

<py3Dmol.view at 0x7f5af1064390>

Defining a custom interaction#

Important

Before you dive into this section, make sure that there isn’t already an interaction that could just be repurposed to do what you want!

For this, the best is to check the prolif.interactions module.

With that being said, there are a few rules that you must respect when writing your own interaction:

Inherit the ProLIF Interaction class#

The Interaction class is the base class that provides some functionalities to automatically register the interactions in Fingerprint objects.

Naming convention#

For non-symmetrical interactions, like hydrogen bonds or salt-bridges, the convention used here is to name the class after the role of the ligand.

For example, the class HBDonor detects if a ligand acts as a hydrogen bond donor, and the class Cationic detects if a ligand acts as a cation.

Define a detect method#

This method takes exactly two positional arguments: a ligand Residue and a protein Residue (in this order).

Return value for the detect method#

You must yield a dictionary containing some basic metadata about the interaction when it is detected. To help with this process, the metadata method should be used (see example below) for which the arguments are listed here:

  • the input residues (lig_res and prot_res arguments, of type rdkit.Chem.Mol),

  • the indices of atoms responsible for the interaction, (lig_indices and prot_indices arguments, of type tuple[int, ...]),

  • any other relevant metric (distances or angles), named as you want. Distances should be in angstroms, and preferably named distance, and angles should be in degrees.

Note

You don’t have to return anything if no interaction is detected for a pair of residues.

Here’s an example implementing a close-contact interaction using numpy:

import numpy as np
from scipy.spatial import distance_matrix


class CloseContact(plf.interactions.Interaction):
    def __init__(self, contact_threshold=2.0):
        self.contact_threshold = contact_threshold

    def detect(self, ligand_residue, protein_residue):
        # distance matrix between atoms of both residues
        dist_matrix = distance_matrix(ligand_residue.xyz, protein_residue.xyz)
        below_threshold = dist_matrix < self.contact_threshold
        for ligand_indices in np.argwhere(below_threshold.any(axis=1)):
            ligand_index = int(ligand_indices[0])
            for protein_indices in np.argwhere(below_threshold[ligand_index]):
                protein_index = int(protein_indices[0])
                # yield dict with metadata on the interaction
                # required arguments: input residues, and tuple of indices of atoms
                #                     responsible for the interaction
                # optional arguments: any additional `key=value` pair (e.g. distance)
                yield self.metadata(
                    lig_res=ligand_residue,
                    prot_res=protein_residue,
                    lig_indices=(ligand_index,),
                    prot_indices=(protein_index,),
                    distance=dist_matrix[ligand_index, protein_index],
                )


# run analysis
fp = plf.Fingerprint(["CloseContact"])
fp.run_from_iterable([ligand_mol], protein_mol)

# show results
df = fp.to_dataframe()
df
ligand UNL1
protein VAL200.A VAL201.A THR209.A THR213.A PHE330.B PHE351.B
interaction CloseContact CloseContact CloseContact CloseContact CloseContact CloseContact
Frame
0 True True True True True True
# assign colors for the new interactions on the 3D plot
Complex3D.COLORS.update(
    {
        "CloseContact": "brown",
    }
)
# display
fp.plot_3d(ligand_mol, protein_mol, frame=0)

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

<py3Dmol.view at 0x7f5ab5edbc50>

Fingerprint generation#

This section contains information about modifying some aspects of the fingerprint generation.

Ignoring the protein backbone#

In some cases, you might want to dismiss backbone interactions. You cannot simply remove the backbone from your protein input file(s), as it will either result in charges added on the side-chains’ end, or you would need to add dummy atoms at the end, but these could also result in artifacts during the interaction detection.

One workaround is to use a substructure search (SMARTS) to delete the backbone atoms after the structures has been parsed by MDAnalysis.

from rdkit import Chem
from rdkit.Chem.AllChem import DeleteSubstructs

# SMARTS for backbone
backbone_smarts = "[C^2](=O)-[C;X4](-[H])-[N;+0]-[H]"
backbone_query = Chem.MolFromSmarts(backbone_smarts)


def make_mol_and_strip_backbone(atomgroup, **kwargs):
    mol = atomgroup.convert_to.rdkit(**kwargs)
    mol = DeleteSubstructs(mol, backbone_query)
    return plf.Molecule(mol)


# patch the `from_mda` method with our modified version
plf.Molecule.from_mda = make_mol_and_strip_backbone

You can then prepare your system and run the analysis as you normally would.