Advanced usage#
Prerequisite#
We’ll setup the protein and ligand files for the different usecases here:
import MDAnalysis as mda
import prolif as plf
# load protein
protein_file = str(plf.datafiles.datapath / "vina" / "rec.pdb")
u = mda.Universe(protein_file)
protein_mol = plf.Molecule.from_mda(u)
# load docking poses
poses_path = str(plf.datafiles.datapath / "vina" / "vina_output.sdf")
pose_iterable = plf.sdf_supplier(poses_path)
# load 1 ligand from poses
ligand_mol = pose_iterable[0]
Interactions#
This section shows how to modify existing interactions or defined new ones.
Modifying interaction parameters#
In ProLIF, each interaction is defined as a Python class. You can get a list of all the available interactions (including base abstract classes) with the following code snippet:
plf.Fingerprint.list_available(show_hidden=True)
['BasePiStacking',
'Distance',
'DoubleAngle',
'SingleAngle',
'Anionic',
'CationPi',
'Cationic',
'EdgeToFace',
'FaceToFace',
'HBAcceptor',
'HBDonor',
'Hydrophobic',
'MetalAcceptor',
'MetalDonor',
'PiCation',
'PiStacking',
'VdWContact',
'XBAcceptor',
'XBDonor']
When you create an interaction fingerprint with prolif.Fingerprint
, those classes are
created with default parameters and attached to the new fingerprint object as methods:
fp = plf.Fingerprint(["Hydrophobic", "HBDonor", "HBAcceptor"])
fp.hydrophobic
<prolif.interactions.interactions.Hydrophobic at 0x7f5ab8040b90>
These method will yield all occurences for that specific interaction between 2 residues (1 for the ligand and 1 for the protein).
In this example, we’ll reparametrize the hydrophobic interaction with a shorter distance and see how this affects the number of occurences of an interaction for a given pair of residues.
Note
To know which parameters are available for interactions, see the
prolif.interactions
module.
Let’s start with a test case. With the default parameters, TYR109 is interacting with our ligand in 4 different occasions:
fp = plf.Fingerprint()
hydrophobic_interactions_tyr109 = list(
fp.hydrophobic(ligand_mol[0], protein_mol["TYR109.A"])
)
len(hydrophobic_interactions_tyr109)
4
Now we’ll simply change the distance threshold to 4.0
instead of the default 4.5
. To do this we simply provide the parameters
argument with a dictionnary mapping the name of the interaction to reconfigure with updated parameters.
This modification will affect all the interaction analysis code run by the fingerprint object, i.e. when using
run
or run_from_iterable
.
fp = plf.Fingerprint(parameters={"Hydrophobic": {"distance": 4.0}})
hydrophobic_interactions_tyr109 = list(
fp.hydrophobic(ligand_mol[0], protein_mol["TYR109.A"])
)
len(hydrophobic_interactions_tyr109)
0
As you can see, by reducing the distance threshold the interaction between our ligand and TYR109 is now ignored.
Repurposing an existing interaction#
In case you want to reuse an existing class for a different type of interaction, the
easiest way is to define an interaction class that inherits one of the classes listed
in the prolif.interactions
module, and just update its __init__
method with the
appropriate parameters.
There are some generic interactions, like the Distance
class, if you just need to define two chemical moieties within a certain distance. Both
the Hydrophobic, Ionic, and Metallic interactions inherit from this class!
In most cases, defining an interaction only based on a distance is not enough and
requires one or two angles constraints as well. For this purpose, the
SingleAngle
and DoubleAngle
interactions can be used.
Here we’ll define a C-H...O
HBond interaction by reusing the existing HBond classes:
class CHOAcceptor(plf.interactions.HBAcceptor):
def __init__(
self,
acceptor="O",
donor="[#6]-[H]",
distance=3.5,
DHA_angle=(90, 180),
):
super().__init__(
acceptor=acceptor, donor=donor, distance=distance, DHA_angle=DHA_angle
)
# create inverse interaction as well
CHODonor = CHOAcceptor.invert_role(
"CHODonor",
"C-H...O Hbond interaction between a ligand (donor) and a residue (acceptor)",
)
# calculate both classical and weak hbonds
fp = plf.Fingerprint(["HBAcceptor", "CHOAcceptor", "HBDonor", "CHODonor"])
fp.run_from_iterable(pose_iterable, protein_mol)
# show dataframe
df = fp.to_dataframe()
df
ligand | UNL1 | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
protein | TYR38.A | TYR109.A | ASP123.A | LEU126.A | ASP129.A | PRO184.A | ARG188.A | ... | SER334.B | MET337.B | PHE346.B | HSE347.B | LEU348.B | PHE351.B | ASP352.B | THR355.B | TYR359.B | ||||
interaction | HBAcceptor | CHODonor | CHODonor | CHODonor | CHOAcceptor | CHODonor | HBDonor | CHODonor | CHOAcceptor | CHODonor | ... | CHODonor | CHOAcceptor | CHODonor | CHODonor | CHODonor | CHOAcceptor | CHODonor | CHOAcceptor | CHODonor | CHODonor |
Frame | |||||||||||||||||||||
0 | False | False | True | False | False | False | True | True | False | False | ... | False | False | False | False | False | False | True | False | False | False |
1 | False | False | False | False | False | True | False | True | False | False | ... | False | False | False | False | False | False | True | False | True | False |
2 | False | False | False | False | False | True | False | True | False | False | ... | False | False | False | False | False | False | True | False | True | False |
3 | False | False | False | False | False | False | False | True | False | False | ... | False | False | False | False | False | False | True | False | False | False |
4 | True | False | False | False | False | False | False | False | False | False | ... | False | True | True | False | True | True | False | False | False | False |
5 | False | False | False | False | False | False | False | True | False | False | ... | False | False | False | False | False | False | True | False | True | False |
6 | False | True | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | True |
7 | False | False | False | False | False | False | False | True | False | False | ... | True | False | False | True | True | False | False | False | False | False |
8 | False | False | False | True | True | True | False | True | True | True | ... | False | False | False | False | False | False | False | False | False | False |
9 rows × 29 columns
We can also display these new interactions:
from prolif.plotting.complex3d import Complex3D
# assign colors for the new interactions on the 3D plot
Complex3D.COLORS.update(
{
"CHODonor": "red",
"CHOAcceptor": "blue",
}
)
# show specific docking pose
pose_index = 4
fp.plot_3d(pose_iterable[pose_index], protein_mol, frame=pose_index)
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
<py3Dmol.view at 0x7f5af1064390>
Defining a custom interaction#
Important
Before you dive into this section, make sure that there isn’t already an interaction that could just be repurposed to do what you want!
For this, the best is to check the prolif.interactions
module.
With that being said, there are a few rules that you must respect when writing your own interaction:
Inherit the ProLIF Interaction class#
The Interaction
class is the base class that provides
some functionalities to automatically register the interactions in Fingerprint
objects.
Naming convention#
For non-symmetrical interactions, like hydrogen bonds or salt-bridges, the convention used here is to name the class after the role of the ligand.
For example, the class HBDonor
detects if a ligand acts as a hydrogen bond donor, and
the class Cationic
detects if a ligand acts as a cation.
Define a detect
method#
This method takes exactly two positional arguments: a ligand Residue and a protein Residue (in this order).
Return value for the detect
method#
You must yield a dictionary containing some basic metadata about the interaction when it
is detected. To help with this process, the metadata
method should be used (see
example below) for which the arguments are listed here:
the input residues (
lig_res
andprot_res
arguments, of typerdkit.Chem.Mol
),the indices of atoms responsible for the interaction, (
lig_indices
andprot_indices
arguments, of typetuple[int, ...]
),any other relevant metric (distances or angles), named as you want. Distances should be in angstroms, and preferably named
distance
, and angles should be in degrees.
Note
You don’t have to return anything if no interaction is detected for a pair of residues.
Here’s an example implementing a close-contact interaction using numpy:
import numpy as np
from scipy.spatial import distance_matrix
class CloseContact(plf.interactions.Interaction):
def __init__(self, contact_threshold=2.0):
self.contact_threshold = contact_threshold
def detect(self, ligand_residue, protein_residue):
# distance matrix between atoms of both residues
dist_matrix = distance_matrix(ligand_residue.xyz, protein_residue.xyz)
below_threshold = dist_matrix < self.contact_threshold
for ligand_indices in np.argwhere(below_threshold.any(axis=1)):
ligand_index = int(ligand_indices[0])
for protein_indices in np.argwhere(below_threshold[ligand_index]):
protein_index = int(protein_indices[0])
# yield dict with metadata on the interaction
# required arguments: input residues, and tuple of indices of atoms
# responsible for the interaction
# optional arguments: any additional `key=value` pair (e.g. distance)
yield self.metadata(
lig_res=ligand_residue,
prot_res=protein_residue,
lig_indices=(ligand_index,),
prot_indices=(protein_index,),
distance=dist_matrix[ligand_index, protein_index],
)
# run analysis
fp = plf.Fingerprint(["CloseContact"])
fp.run_from_iterable([ligand_mol], protein_mol)
# show results
df = fp.to_dataframe()
df
ligand | UNL1 | |||||
---|---|---|---|---|---|---|
protein | VAL200.A | VAL201.A | THR209.A | THR213.A | PHE330.B | PHE351.B |
interaction | CloseContact | CloseContact | CloseContact | CloseContact | CloseContact | CloseContact |
Frame | ||||||
0 | True | True | True | True | True | True |
# assign colors for the new interactions on the 3D plot
Complex3D.COLORS.update(
{
"CloseContact": "brown",
}
)
# display
fp.plot_3d(ligand_mol, protein_mol, frame=0)
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
<py3Dmol.view at 0x7f5ab5edbc50>
Fingerprint generation#
This section contains information about modifying some aspects of the fingerprint generation.
Ignoring the protein backbone#
In some cases, you might want to dismiss backbone interactions. You cannot simply remove the backbone from your protein input file(s), as it will either result in charges added on the side-chains’ end, or you would need to add dummy atoms at the end, but these could also result in artifacts during the interaction detection.
One workaround is to use a substructure search (SMARTS) to delete the backbone atoms after the structures has been parsed by MDAnalysis.
from rdkit import Chem
from rdkit.Chem.AllChem import DeleteSubstructs
# SMARTS for backbone
backbone_smarts = "[C^2](=O)-[C;X4](-[H])-[N;+0]-[H]"
backbone_query = Chem.MolFromSmarts(backbone_smarts)
def make_mol_and_strip_backbone(atomgroup, **kwargs):
mol = atomgroup.convert_to.rdkit(**kwargs)
mol = DeleteSubstructs(mol, backbone_query)
return plf.Molecule(mol)
# patch the `from_mda` method with our modified version
plf.Molecule.from_mda = make_mol_and_strip_backbone
You can then prepare your system and run the analysis as you normally would.