4. Visualisation¶

This notebook showcases different ways of visualizing lig-prot and prot-prot interactions, either with atomistic details or simply at the residue level.

[1]:

import MDAnalysis as mda
import prolif as plf
import numpy as np
# load topology
u = mda.Universe(plf.datafiles.TOP, plf.datafiles.TRAJ)
lig = u.select_atoms("resname LIG")
prot = u.select_atoms("protein")

[2]:

# create RDKit-like molecules for visualisation
lmol = plf.Molecule.from_mda(lig)
pmol = plf.Molecule.from_mda(prot)

/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:492: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if isinstance(value, (float, np.float)):
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:494: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif isinstance(value, (int, np.int)):

[3]:

# get lig-prot interactions with atom info
fp = plf.Fingerprint(["HBDonor", "HBAcceptor", "Cationic", "PiStacking"])
fp.run(u.trajectory[0:1], lig, prot, return_atoms=True)
df = fp.to_dataframe()
df.T

[3]:

		Frame	0
ligand	protein	interaction
LIG1.G	ASP129.A	Cationic	(13, 10)
	ASP129.A	HBDonor	(52, 10)
	PHE331.B	PiStacking	(3, 9)
	SER212.A	HBDonor	(44, 2)
	VAL201.A	HBAcceptor	(28, 6)

4.1. py3Dmol (3Dmol.js)¶

With py3Dmol we can easily display the interactions.

For interactions involving a ring (pi-cation, pi-stacking…etc.) ProLIF returns the index of one of the ring atoms, but for visualisation having the centroid of the ring looks nicer. We’ll start by writing a function to find the centroid, given the index of one of the ring atoms.

[4]:

from rdkit import Chem
from rdkit import Geometry

def get_ring_centroid(mol, index):
    # find ring using the atom index
    Chem.SanitizeMol(mol, Chem.SanitizeFlags.SANITIZE_SETAROMATICITY)
    ri = mol.GetRingInfo()
    for r in ri.AtomRings():
        if index in r:
            break
    else:
        raise ValueError("No ring containing this atom index was found in the given molecule")
    # get centroid
    coords = mol.xyz[list(r)]
    ctd = plf.utils.get_centroid(coords)
    return Geometry.Point3D(*ctd)

Finally, the actual visualisation code. The API of py3Dmol is exactly the same as the GLViewer class of 3Dmol.js, for which the documentation can be found here.

[5]:

import py3Dmol

colors = {
    "HBAcceptor": "blue",
    "HBDonor": "red",
    "Cationic": "green",
    "PiStacking": "purple",
}

# JavaScript functions
resid_hover = """function(atom,viewer) {{
    if(!atom.label) {{
        atom.label = viewer.addLabel('{0}:'+atom.atom+atom.serial,
            {{position: atom, backgroundColor: 'mintcream', fontColor:'black'}});
    }}
}}"""
hover_func = """
function(atom,viewer) {
    if(!atom.label) {
        atom.label = viewer.addLabel(atom.interaction,
            {position: atom, backgroundColor: 'black', fontColor:'white'});
    }
}"""
unhover_func = """
function(atom,viewer) {
    if(atom.label) {
        viewer.removeLabel(atom.label);
        delete atom.label;
    }
}"""

v = py3Dmol.view(650, 600)
v.removeAllModels()

models = {}
mid = -1
for i, row in df.T.iterrows():
    lresid, presid, interaction = i
    lindex, pindex = row[0]
    lres = lmol[lresid]
    pres = pmol[presid]
    # set model ids for reusing later
    for resid, res, style in [(lresid, lres, {"colorscheme": "cyanCarbon"}),
                              (presid, pres, {})]:
        if resid not in models.keys():
            mid += 1
            v.addModel(Chem.MolToMolBlock(res), "sdf")
            model = v.getModel()
            model.setStyle({}, {"stick": style})
            # add residue label
            model.setHoverable({}, True, resid_hover.format(resid), unhover_func)
            models[resid] = mid
    # get coordinates for both points of the interaction
    if interaction in ["PiStacking", "EdgeToFace", "FaceToFace", "PiCation"]:
        p1 = get_ring_centroid(lres, lindex)
    else:
        p1 = lres.GetConformer().GetAtomPosition(lindex)
    if interaction in ["PiStacking", "EdgeToFace", "FaceToFace", "CationPi"]:
        p2 = get_ring_centroid(pres, pindex)
    else:
        p2 = pres.GetConformer().GetAtomPosition(pindex)
    # add interaction line
    v.addCylinder({"start": dict(x=p1.x, y=p1.y, z=p1.z),
                   "end":   dict(x=p2.x, y=p2.y, z=p2.z),
                   "color": colors[interaction],
                   "radius": .15,
                   "dashed": True,
                   "fromCap": 1,
                   "toCap": 1,
                  })
    # add label when hovering the middle of the dashed line by adding a dummy atom
    c = Geometry.Point3D(*plf.utils.get_centroid([p1, p2]))
    modelID = models[lresid]
    model = v.getModel(modelID)
    model.addAtoms([{"elem": 'Z',
                     "x": c.x, "y": c.y, "z": c.z,
                     "interaction": interaction}])
    model.setStyle({"interaction": interaction}, {"clicksphere": {"radius": .5}})
    model.setHoverable(
        {"interaction": interaction}, True,
        hover_func, unhover_func)

# show protein:
# first we need to reorder atoms as in the original MDAnalysis file.
# needed because the RDKitConverter reorders them when infering bond order
# and 3Dmol.js doesn't like when atoms from the same residue are spread accross the whole file
order = np.argsort([atom.GetIntProp("_MDAnalysis_index") for atom in pmol.GetAtoms()])
mol = Chem.RenumberAtoms(pmol, order.astype(int).tolist())
mol = Chem.RemoveAllHs(mol)
pdb = Chem.MolToPDBBlock(mol, flavor=0x20 | 0x10)
v.addModel(pdb, "pdb")
model = v.getModel()
model.setStyle({}, {"cartoon": {"style":"edged"}})

v.zoomTo({"model": list(models.values())})

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

[5]:

<py3Dmol.view at 0x7f75285e1ac0>

4.2. RDKit¶

This script is given as a starting point for visualising the interactions with RDKit. It’s definitely not ideal for 3D visualisation as the drawing ends up being crowded and quite unreadable very quickly. I’m sure there’s a better way to do this but it will do as a code snippet.

[6]:

from rdkit.Chem import Draw
from IPython import display

colors = {
    "HBAcceptor": (0, 1, 1),
    "HBDonor": (1, .7,.3),
    "Cationic": (0, 1, 0),
    "PiStacking": (.5, 0, .5),
}

d = Draw.MolDraw2DSVG(650, 600)
opts = Draw.MolDrawOptions()
opts.fixedBondLength = 25
d.SetDrawOptions(opts)

displayed = []
for i, row in df.T.iterrows():
    lresid, presid, interaction = i
    lindex, pindex = row[0]
    lres = lmol[lresid]
    pres = pmol[presid]
    for resid, res in [(lresid, lres), (presid, pres)]:
        if resid not in displayed:
            displayed.append(resid)
            d.DrawMolecule(res)
    if interaction in ["PiStacking", "EdgeToFace", "FaceToFace", "PiCation"]:
        p1 = get_ring_centroid(lres, lindex)
    else:
        p1 = lres.GetConformer().GetAtomPosition(lindex)
    if interaction in ["PiStacking", "EdgeToFace", "FaceToFace", "CationPi"]:
        p2 = get_ring_centroid(pres, pindex)
    else:
        p2 = pres.GetConformer().GetAtomPosition(pindex)
    p1 = Geometry.Point2D(p1)
    p2 = Geometry.Point2D(p2)
    d.DrawWavyLine(p1, p2, colors[interaction], colors[interaction])

# display
d.FinishDrawing()
display.SVG(d.GetDrawingText())

[6]:

../_images/notebooks_visualisation_9_0.svg

4.3. NetworkX and pyvis¶

NetworkX is a great library for working with graphs, but the drawing options are quickly limited so we will use networkx to create a graph, and pyvis to create interactive plots. The following code snippet will calculate the IFP, each residue (ligand or protein) is converted to a node, each interaction to an edge, and the occurence of each interaction between residues will be used to control the weight and thickness of each edge.

[7]:

import networkx as nx
from pyvis.network import Network
from tqdm.auto import tqdm
from matplotlib import cm, colors
from IPython.display import IFrame

[8]:

# get lig-prot interactions and distance between residues

fp = plf.Fingerprint()
fp.run(u.trajectory[::10], lig, prot)
df = fp.to_dataframe()
df.head()

[8]:

ligand	LIG1.G
protein	TYR38.A	TYR109.A	THR110.A	TRP125.A	LEU126.A	ASP129.A			ILE130.A	CYS133.A	...	MET337.B	PRO338.B	ALA343.B	CYS344.B	LEU348.B	PHE351.B		ASP352.B	THR355.B	TYR359.B
interaction	Hydrophobic	Hydrophobic	Hydrophobic	Hydrophobic	Hydrophobic	Hydrophobic	HBDonor	Cationic	Hydrophobic	Hydrophobic	...	Hydrophobic	Hydrophobic	Hydrophobic	Hydrophobic	Hydrophobic	Hydrophobic	PiStacking	Hydrophobic	Hydrophobic	Hydrophobic
Frame
0	False	True	False	True	True	True	True	True	True	True	...	True	True	False	False	False	True	False	True	True	True
10	False	True	False	True	True	True	True	True	True	True	...	True	False	True	False	False	True	True	False	True	True
20	False	True	True	True	True	True	True	True	True	True	...	True	False	True	False	False	True	False	False	False	True
30	True	True	True	True	True	True	True	True	True	True	...	True	False	True	True	False	True	True	False	True	True
40	False	True	True	True	True	True	True	True	True	True	...	False	False	False	False	False	True	True	True	False	True

5 rows × 40 columns

[9]:

def make_graph(values, df=None,
               node_color=["#FFB2AC", "#ACD0FF"], node_shape="dot",
               edge_color="#a9a9a9", width_multiplier=1):
    """Convert a pandas DataFrame to a NetworkX object

    Parameters
    ----------
    values : pandas.Series
        Series with 'ligand' and 'protein' levels, and a unique value for
        each lig-prot residue pair that will be used to set the width and weigth
        of each edge. For example:

            ligand  protein
            LIG1.G  ALA216.A    0.66
                    ALA343.B    0.10

    df : pandas.DataFrame
        DataFrame obtained from the fp.to_dataframe() method
        Used to label each edge with the type of interaction

    node_color : list
        Colors for the ligand and protein residues, respectively

    node_shape : str
        One of ellipse, circle, database, box, text or image, circularImage,
        diamond, dot, star, triangle, triangleDown, square, icon.

    edge_color : str
        Color of the edge between nodes

    width_multiplier : int or float
        Each edge's width is defined as `width_multiplier * value`
    """
    lig_res = values.index.get_level_values("ligand").unique().tolist()
    prot_res = values.index.get_level_values("protein").unique().tolist()

    G = nx.Graph()
    # add nodes
    # https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network.add_node
    for res in lig_res:
        G.add_node(res, title=res, shape=node_shape,
                   color=node_color[0], dtype="ligand")
    for res in prot_res:
        G.add_node(res, title=res, shape=node_shape,
                   color=node_color[1], dtype="protein")

    for resids, value in values.items():
        label = "{} - {}<br>{}".format(*resids, "<br>".join([f"{k}: {v}"
                                       for k, v in (df.xs(resids,
                                                          level=["ligand", "protein"],
                                                          axis=1)
                                                      .sum()
                                                      .to_dict()
                                                      .items())]))
        # https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network.add_edge
        G.add_edge(*resids, title=label, color=edge_color,
                   weight=value, width=value*width_multiplier)

    return G

4.3.1. Regrouping all interactions¶

We will regroup all interactions as if they were equivalent.

[10]:

data = (df.groupby(level=["ligand", "protein"], axis=1)
          .sum()
          .astype(bool)
          .mean())

G = make_graph(data, df, width_multiplier=3)

# display graph
net = Network(width=600, height=500, notebook=True, heading="")
net.from_nx(G)
net.write_html("lig-prot_graph.html")
IFrame("lig-prot_graph.html", width=610, height=510)

[10]:

4.3.2. Only plotting a specific interaction¶

We can also plot a specific type of interaction.

[11]:

data = (df.xs("Hydrophobic", level="interaction", axis=1)
          .mean())

G = make_graph(data, df, width_multiplier=3)

# display graph
net = Network(width=600, height=500, notebook=True, heading="")
net.from_nx(G)
net.write_html("lig-prot_hydrophobic_graph.html")
IFrame("lig-prot_hydrophobic_graph.html", width=610, height=510)

[11]:

4.3.3. Protein-protein interaction¶

This kind of “residue-level” visualisation is especially suitable for protein-protein interactions. Here we’ll show the interactions between one helix of our G-Protein coupled receptor (transmembrane helix 3, or TM3) in red and the rest of the protein in blue.

[12]:

tm3 = u.select_atoms("resid 119:152")
prot = u.select_atoms("protein and not group tm3", tm3=tm3)
fp = plf.Fingerprint()
fp.run(u.trajectory[::10], tm3, prot)
df = fp.to_dataframe()
df.head()

/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:492: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if isinstance(value, (float, np.float)):
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:494: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif isinstance(value, (int, np.int)):

[12]:

ligand	GLN119.A										...	THR152.A
protein	GLY118.A	GLN189.A		ALA190.A	LYS191.A	ALA192.A			GLU193.A	GLU194.A	...	ASP153.A			ALA154.A	GLU234.A		ARG238.A		LYS241.A
interaction	Hydrophobic	Hydrophobic	HBAcceptor	HBDonor	Hydrophobic	Hydrophobic	HBDonor	HBAcceptor	Hydrophobic	Hydrophobic	...	Hydrophobic	HBDonor	HBAcceptor	Hydrophobic	Hydrophobic	HBDonor	Hydrophobic	HBAcceptor	Hydrophobic	HBAcceptor
Frame
0	True	False	False	False	False	False	True	False	True	True	...	True	False	False	False	True	False	False	False	False	False
10	True	False	False	False	False	False	False	False	False	False	...	True	False	False	True	True	False	False	False	False	False
20	True	False	False	False	False	False	False	False	True	False	...	True	False	True	True	True	False	False	True	False	False
30	True	False	False	False	False	True	False	False	False	False	...	True	False	False	False	False	False	True	False	False	False
40	True	False	False	False	True	False	False	False	False	False	...	True	False	False	False	False	False	False	False	False	False

5 rows × 232 columns

[13]:

data = (df.groupby(level=["ligand", "protein"], axis=1, sort=False)
          .sum()
          .astype(bool)
          .mean())

G = make_graph(data, df, width_multiplier=8)

# color each node based on its degree
max_nbr = len(max(G.adj.values(), key=lambda x: len(x)))
blues = cm.get_cmap('Blues', max_nbr)
reds = cm.get_cmap('Reds', max_nbr)
for n, d in G.nodes(data=True):
    n_neighbors = len(G.adj[n])
    # show TM3 in red and the rest of the protein in blue
    palette = reds if d["dtype"] == "ligand" else blues
    d["color"] = colors.to_hex( palette(n_neighbors / max_nbr) )

# convert to pyvis network
net = Network(width=640, height=500, notebook=True, heading="")
net.from_nx(G)
net.write_html("prot-prot_graph.html")
IFrame("prot-prot_graph.html", width=650, height=510)

[13]:

4.3.4. Residue interaction network¶

Another possible application is the visualisation of the residue interaction network of the whole protein. Since this protein is a GPCR, the graph will mostly display the HBond interactions reponsible for the secondary structure of the protein (7 alpha-helices). It would also show hydrophobic interactions between neighbor residues, so I’m simply going to disable it in the Fingerprint.

[14]:

prot = u.select_atoms("protein")
fp = plf.Fingerprint(['HBDonor', 'HBAcceptor', 'PiStacking', 'Anionic', 'Cationic', 'CationPi', 'PiCation'])
fp.run(u.trajectory[::10], prot, prot)
df = fp.to_dataframe()
df.head()

/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:492: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  if isinstance(value, (float, np.float)):
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:494: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  elif isinstance(value, (int, np.int)):

[14]:

ligand	TYR38.A									ILE39.A	...	ARG385.B		PHE386.B						LYS387.B
protein	TYR38.A	TYR40.A	GLN41.A	ASP42.A	ARG114.A	TRP345.B	PHE346.B	HSE347.B	ASP352.B	SER43.A	...	LEU383.B	LYS387.B	PHE380.B	HSD381.B			ILE384.B	PHE386.B	HSD381.B	ARG385.B
interaction	PiStacking	PiStacking	HBAcceptor	HBAcceptor	PiCation	PiStacking	HBDonor	PiStacking	HBDonor	HBAcceptor	...	HBDonor	HBAcceptor	PiStacking	HBDonor	HBAcceptor	PiStacking	HBDonor	PiStacking	HBAcceptor	HBDonor
Frame
0	True	False	False	False	False	False	False	False	False	False	...	True	False	False	True	False	True	False	True	False	False
10	True	False	False	False	False	False	True	False	False	False	...	True	False	False	True	False	True	False	True	True	False
20	True	False	False	False	False	False	False	False	False	False	...	True	False	False	True	False	False	False	True	False	False
30	True	False	False	False	False	False	True	True	False	False	...	False	False	False	True	True	False	False	True	False	False
40	True	False	False	False	False	False	True	True	False	False	...	False	False	False	True	False	False	False	True	False	False

5 rows × 1337 columns

To hide most of the HBond interactions responsible for the alpha-helix structuration, I will show how to do it on the pandas DataFrame for simplicity, but ideally you should copy-paste the source code inside the fp.run method and add the condition shown below before calculating the bitvector for a residue pair, then use the custom function instead of fp.run. This would make the analysis faster and more memory efficient.

[15]:

# remove interactions between residues i and i±4 or less
mask = []
for l, p, interaction in df.columns:
    lr = plf.ResidueId.from_string(l)
    pr = plf.ResidueId.from_string(p)
    if (pr == lr) or (abs(pr.number - lr.number) <= 4
                      and interaction in ["HBDonor", "HBAcceptor", "Hydrophobic"]):
        mask.append(False)
    else:
        mask.append(True)
df = df[df.columns[mask]]
df.head()

[15]:

ligand	TYR38.A						TYR40.A			GLN41.A	...	HSD381.B					PHE386.B				LYS387.B
protein	TYR40.A	ARG114.A	TRP345.B	PHE346.B	HSE347.B	ASP352.B	TYR38.A	HSE347.B	ASP352.B	LYS49.A	...	PHE380.B	PHE386.B			LYS387.B	PHE380.B	HSD381.B			HSD381.B
interaction	PiStacking	PiCation	PiStacking	HBDonor	PiStacking	HBDonor	PiStacking	PiStacking	HBDonor	HBAcceptor	...	PiStacking	HBDonor	HBAcceptor	PiStacking	HBDonor	PiStacking	HBDonor	HBAcceptor	PiStacking	HBAcceptor
Frame
0	False	False	False	False	False	False	False	False	True	False	...	True	False	True	True	False	False	True	False	True	False
10	False	False	False	True	False	False	False	False	True	False	...	False	False	True	True	True	False	True	False	True	True
20	False	False	False	False	False	False	False	False	True	False	...	False	False	True	False	False	False	True	False	False	False
30	False	False	False	True	True	False	False	False	True	False	...	True	True	True	False	False	False	True	True	False	False
40	False	False	False	True	True	False	False	False	True	False	...	False	False	True	False	False	False	True	False	False	False

5 rows × 416 columns

[16]:

data = (df.groupby(level=["ligand", "protein"], axis=1, sort=False)
          .sum()
          .astype(bool)
          .mean())

G = make_graph(data, df, width_multiplier=5)

# color each node based on its degree
max_nbr = len(max(G.adj.values(), key=lambda x: len(x)))
palette = cm.get_cmap('YlGnBu', max_nbr)
for n, d in G.nodes(data=True):
    n_neighbors = len(G.adj[n])
    d["color"] = colors.to_hex( palette(n_neighbors / max_nbr) )

# convert to pyvis network
net = Network(width=640, height=500,  notebook=True, heading="")
net.from_nx(G)

# use specific layout
layout = nx.circular_layout(G)
for node in net.nodes:
    node["x"] = layout[node["id"]][0] * 1000
    node["y"] = layout[node["id"]][1] * 1000
net.toggle_physics(False)

net.write_html("residue-network_graph.html")
IFrame("residue-network_graph.html", width=650, height=510)

[16]:

End of this notebook. If you have other suggestions for displaying interaction fingerprints, please create a new Discussion on GitHub 👍

List of files to be automatically copied to the docs: