3. Protein-protein interactions

This notebooks shows how to compute a fingerprint for protein-protein interactions.

Here we will investigate the interactions in a G-protein coupled receptor (GPCR) between a particular helix (called TM3) and the rest of the protein.

This can obviously be applied to proteins that don’t belong to the same chain/segment, as long as you can figure out an appropriate MDAnalysis selection

There is also an example at the end of this tutorial for generating an IFP of PPI without considering the backbone.

[1]:
import MDAnalysis as mda
import prolif as plf
[2]:
# load traj
u = mda.Universe(plf.datafiles.TOP, plf.datafiles.TRAJ)
tm3 = u.select_atoms("resid 119:152")
prot = u.select_atoms("protein and not group tm3", tm3=tm3)
tm3, prot
[2]:
(<AtomGroup with 531 atoms>, <AtomGroup with 4457 atoms>)
[3]:
# prot-prot interactions
fp = plf.Fingerprint(["HBDonor", "HBAcceptor", "PiStacking", "PiCation", "CationPi", "Anionic", "Cationic"])
fp.run(u.trajectory[::10], tm3, prot)
[3]:
<prolif.fingerprint.Fingerprint: 7 interactions: ['HBDonor', 'HBAcceptor', 'Cationic', 'Anionic', 'PiCation', 'CationPi', 'PiStacking'] at 0x7f79207b0ca0>
[4]:
df = fp.to_dataframe()
df.head()
[4]:
ligand GLN119.A ... ALA150.A ILE151.A THR152.A
protein GLN189.A ALA190.A ALA192.A GLU194.A VAL196.A SER197.A ... ALA154.A ARG238.A ALA154.A ARG238.A ASN245.A ASP153.A GLU234.A ARG238.A LYS241.A
interaction HBAcceptor HBDonor HBDonor HBAcceptor HBDonor HBAcceptor HBDonor HBAcceptor HBDonor HBAcceptor ... HBAcceptor HBAcceptor HBAcceptor HBAcceptor HBAcceptor HBDonor HBAcceptor HBDonor HBAcceptor HBAcceptor
Frame
0 False False True False False True False False False False ... False False False False False False False False False False
10 False False False False False False False False False False ... False False False True False False False False False False
20 False False False False False False False False False True ... False False False True False False True False True False
30 False False False False False False False False False False ... False False False False False False False False False False
40 False False False False True False False True False False ... False False True True False False False False False False

5 rows × 55 columns

[5]:
# show interactions for a specific ligand residue
df.xs("ARG147.A", level="ligand", axis=1).head(5)
[5]:
protein ALA84.A GLU309.B THR313.B
interaction HBDonor HBDonor Cationic HBDonor
Frame
0 False False False False
10 False False False False
20 True False False False
30 False False False False
40 False True True False
[6]:
# same for a protein residue
df.xs("GLU309.B", level="protein", axis=1).head(5)
[6]:
ligand ARG147.A
interaction HBDonor Cationic
Frame
0 False False
10 False False
20 False False
30 False False
40 True True
[7]:
# display a specific type of interaction
df.xs("Cationic", level="interaction", axis=1).head(5)
[7]:
ligand ARG147.A
protein GLU309.B
Frame
0 False
10 False
20 False
30 False
40 True
[8]:
# calculate the occurence of each interaction on the trajectory
occ = df.mean()
# restrict to the frequent ones
occ.loc[occ > 0.3]
[8]:
ligand    protein   interaction
CYS122.A  GLY118.A  HBDonor        0.84
ASP123.A  LYS191.A  Anionic        0.32
TRP125.A  VAL102.A  HBDonor        0.60
          TYR109.A  PiStacking     0.48
          TRP115.A  PiStacking     0.80
SER127.A  SER181.A  HBDonor        0.32
ASP129.A  TYR359.B  HBAcceptor     0.96
HSD139.A  TRP174.A  PiStacking     0.36
ASP146.A  TYR157.A  HBAcceptor     0.88
          ARG161.A  HBAcceptor     0.48
                    Anionic        0.36
ARG147.A  GLU309.B  HBDonor        0.44
                    Cationic       0.44
TRP149.A  ASP153.A  HBAcceptor     0.36
dtype: float64
[9]:
# regroup all interactions together and do the same
g = (df.groupby(level=["ligand", "protein"], axis=1)
       .sum()
       .astype(bool)
       .mean())
g.loc[g > 0.3]
[9]:
ligand    protein
ARG147.A  GLU309.B    0.44
ASP123.A  LYS191.A    0.32
ASP129.A  TYR359.B    0.96
ASP146.A  ARG161.A    0.48
          TYR157.A    0.88
CYS122.A  GLY118.A    0.84
HSD139.A  TRP174.A    0.36
SER127.A  SER181.A    0.32
TRP125.A  TRP115.A    0.80
          TYR109.A    0.48
          VAL102.A    0.60
TRP149.A  ASP153.A    0.36
dtype: float64

3.1. Ignoring backbone interactions

In some cases, you might want to dismiss backbone interactions. While it might be tempting to just modify the MDAnalysis selection with "protein and not backbone", this won’t work as expected and will lead to adding a charges where the backbone was bonding with the sidechain.
However there is a temporary workaround (which will be directly included in the code in the near future):
[10]:
from rdkit import Chem
from rdkit.Chem import AllChem
from tqdm.auto import tqdm

# remove backbone
backbone = Chem.MolFromSmarts("[C^2](=O)-[C;X4](-[H])-[N;+0]")
fix_h = Chem.MolFromSmarts("[H&D0]")

def remove_backbone(atomgroup):
    mol = plf.Molecule.from_mda(atomgroup)
    mol = AllChem.DeleteSubstructs(mol, backbone)
    mol = AllChem.DeleteSubstructs(mol, fix_h)
    return plf.Molecule(mol)

# generate IFP
ifp = []
for ts in tqdm(u.trajectory[::10]):
    tm3_mol = remove_backbone(tm3)
    prot_mol = remove_backbone(prot)
    data = fp.generate(tm3_mol, prot_mol)
    data["Frame"] = ts.frame
    ifp.append(data)
df = plf.to_dataframe(ifp, fp.interactions.keys())
df.head()
[10]:
ligand GLN119.A CYS122.A ASP123.A PHE124.A TRP125.A ... ASP146.A ARG147.A TYR148.A TRP149.A THR152.A
protein GLN189.A SER197.A LYS191.A ARG188.A GLN189.A LYS191.A PHE185.A SER106.A ... TYR157.A ARG161.A GLU309.B THR313.B ARG230.A TYR157.A ASP153.A GLU234.A
interaction HBAcceptor HBAcceptor Anionic HBAcceptor Anionic HBAcceptor HBAcceptor Anionic PiStacking HBDonor ... HBAcceptor HBAcceptor Anionic HBDonor Cationic HBDonor HBAcceptor PiStacking HBDonor HBDonor
Frame
0 False False False False False False False False False True ... True False False False False False False False False False
10 False False False False False False False False False False ... True False False False False False False True False False
20 False True False False False False True True False False ... True False False False False False False True False False
30 False False False True True False False False False False ... True True True False False False False False False False
40 False False False False False False True True False True ... True False False True True False False False False False

5 rows × 28 columns