4. Visualisation¶
This notebook showcases different ways of visualizing lig-prot and prot-prot interactions, either with atomistic details or simply at the residue level.
[1]:
import MDAnalysis as mda
import prolif as plf
import numpy as np
# load topology
u = mda.Universe(plf.datafiles.TOP, plf.datafiles.TRAJ)
lig = u.select_atoms("resname LIG")
prot = u.select_atoms("protein")
[2]:
# create RDKit-like molecules for visualisation
lmol = plf.Molecule.from_mda(lig)
pmol = plf.Molecule.from_mda(prot)
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:492: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
if isinstance(value, (float, np.float)):
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:494: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
elif isinstance(value, (int, np.int)):
[3]:
# get lig-prot interactions with atom info
fp = plf.Fingerprint(["HBDonor", "HBAcceptor", "Cationic", "PiStacking"])
fp.run(u.trajectory[0:1], lig, prot, return_atoms=True)
df = fp.to_dataframe()
df.T
[3]:
Frame | 0 | ||
---|---|---|---|
ligand | protein | interaction | |
LIG1.G | ASP129.A | Cationic | (13, 10) |
HBDonor | (52, 10) | ||
PHE331.B | PiStacking | (3, 9) | |
SER212.A | HBDonor | (44, 2) | |
VAL201.A | HBAcceptor | (28, 6) |
4.1. py3Dmol (3Dmol.js)¶
With py3Dmol we can easily display the interactions.
For interactions involving a ring (pi-cation, pi-stacking…etc.) ProLIF returns the index of one of the ring atoms, but for visualisation having the centroid of the ring looks nicer. We’ll start by writing a function to find the centroid, given the index of one of the ring atoms.
[4]:
from rdkit import Chem
from rdkit import Geometry
def get_ring_centroid(mol, index):
# find ring using the atom index
Chem.SanitizeMol(mol, Chem.SanitizeFlags.SANITIZE_SETAROMATICITY)
ri = mol.GetRingInfo()
for r in ri.AtomRings():
if index in r:
break
else:
raise ValueError("No ring containing this atom index was found in the given molecule")
# get centroid
coords = mol.xyz[list(r)]
ctd = plf.utils.get_centroid(coords)
return Geometry.Point3D(*ctd)
Finally, the actual visualisation code. The API of py3Dmol is exactly the same as the GLViewer class of 3Dmol.js, for which the documentation can be found here.
[5]:
import py3Dmol
colors = {
"HBAcceptor": "blue",
"HBDonor": "red",
"Cationic": "green",
"PiStacking": "purple",
}
# JavaScript functions
resid_hover = """function(atom,viewer) {{
if(!atom.label) {{
atom.label = viewer.addLabel('{0}:'+atom.atom+atom.serial,
{{position: atom, backgroundColor: 'mintcream', fontColor:'black'}});
}}
}}"""
hover_func = """
function(atom,viewer) {
if(!atom.label) {
atom.label = viewer.addLabel(atom.interaction,
{position: atom, backgroundColor: 'black', fontColor:'white'});
}
}"""
unhover_func = """
function(atom,viewer) {
if(atom.label) {
viewer.removeLabel(atom.label);
delete atom.label;
}
}"""
v = py3Dmol.view(650, 600)
v.removeAllModels()
models = {}
mid = -1
for i, row in df.T.iterrows():
lresid, presid, interaction = i
lindex, pindex = row[0]
lres = lmol[lresid]
pres = pmol[presid]
# set model ids for reusing later
for resid, res, style in [(lresid, lres, {"colorscheme": "cyanCarbon"}),
(presid, pres, {})]:
if resid not in models.keys():
mid += 1
v.addModel(Chem.MolToMolBlock(res), "sdf")
model = v.getModel()
model.setStyle({}, {"stick": style})
# add residue label
model.setHoverable({}, True, resid_hover.format(resid), unhover_func)
models[resid] = mid
# get coordinates for both points of the interaction
if interaction in ["PiStacking", "EdgeToFace", "FaceToFace", "PiCation"]:
p1 = get_ring_centroid(lres, lindex)
else:
p1 = lres.GetConformer().GetAtomPosition(lindex)
if interaction in ["PiStacking", "EdgeToFace", "FaceToFace", "CationPi"]:
p2 = get_ring_centroid(pres, pindex)
else:
p2 = pres.GetConformer().GetAtomPosition(pindex)
# add interaction line
v.addCylinder({"start": dict(x=p1.x, y=p1.y, z=p1.z),
"end": dict(x=p2.x, y=p2.y, z=p2.z),
"color": colors[interaction],
"radius": .15,
"dashed": True,
"fromCap": 1,
"toCap": 1,
})
# add label when hovering the middle of the dashed line by adding a dummy atom
c = Geometry.Point3D(*plf.utils.get_centroid([p1, p2]))
modelID = models[lresid]
model = v.getModel(modelID)
model.addAtoms([{"elem": 'Z',
"x": c.x, "y": c.y, "z": c.z,
"interaction": interaction}])
model.setStyle({"interaction": interaction}, {"clicksphere": {"radius": .5}})
model.setHoverable(
{"interaction": interaction}, True,
hover_func, unhover_func)
# show protein:
# first we need to reorder atoms as in the original MDAnalysis file.
# needed because the RDKitConverter reorders them when infering bond order
# and 3Dmol.js doesn't like when atoms from the same residue are spread accross the whole file
order = np.argsort([atom.GetIntProp("_MDAnalysis_index") for atom in pmol.GetAtoms()])
mol = Chem.RenumberAtoms(pmol, order.astype(int).tolist())
mol = Chem.RemoveAllHs(mol)
pdb = Chem.MolToPDBBlock(mol, flavor=0x20 | 0x10)
v.addModel(pdb, "pdb")
model = v.getModel()
model.setStyle({}, {"cartoon": {"style":"edged"}})
v.zoomTo({"model": list(models.values())})
You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol
[5]:
<py3Dmol.view at 0x7f75285e1ac0>
4.2. RDKit¶
This script is given as a starting point for visualising the interactions with RDKit. It’s definitely not ideal for 3D visualisation as the drawing ends up being crowded and quite unreadable very quickly. I’m sure there’s a better way to do this but it will do as a code snippet.
[6]:
from rdkit.Chem import Draw
from IPython import display
colors = {
"HBAcceptor": (0, 1, 1),
"HBDonor": (1, .7,.3),
"Cationic": (0, 1, 0),
"PiStacking": (.5, 0, .5),
}
d = Draw.MolDraw2DSVG(650, 600)
opts = Draw.MolDrawOptions()
opts.fixedBondLength = 25
d.SetDrawOptions(opts)
displayed = []
for i, row in df.T.iterrows():
lresid, presid, interaction = i
lindex, pindex = row[0]
lres = lmol[lresid]
pres = pmol[presid]
for resid, res in [(lresid, lres), (presid, pres)]:
if resid not in displayed:
displayed.append(resid)
d.DrawMolecule(res)
if interaction in ["PiStacking", "EdgeToFace", "FaceToFace", "PiCation"]:
p1 = get_ring_centroid(lres, lindex)
else:
p1 = lres.GetConformer().GetAtomPosition(lindex)
if interaction in ["PiStacking", "EdgeToFace", "FaceToFace", "CationPi"]:
p2 = get_ring_centroid(pres, pindex)
else:
p2 = pres.GetConformer().GetAtomPosition(pindex)
p1 = Geometry.Point2D(p1)
p2 = Geometry.Point2D(p2)
d.DrawWavyLine(p1, p2, colors[interaction], colors[interaction])
# display
d.FinishDrawing()
display.SVG(d.GetDrawingText())
[6]:
4.3. NetworkX and pyvis¶
NetworkX is a great library for working with graphs, but the drawing options are quickly limited so we will use networkx to create a graph, and pyvis to create interactive plots. The following code snippet will calculate the IFP, each residue (ligand or protein) is converted to a node, each interaction to an edge, and the occurence of each interaction between residues will be used to control the weight and thickness of each edge.
[7]:
import networkx as nx
from pyvis.network import Network
from tqdm.auto import tqdm
from matplotlib import cm, colors
from IPython.display import IFrame
[8]:
# get lig-prot interactions and distance between residues
fp = plf.Fingerprint()
fp.run(u.trajectory[::10], lig, prot)
df = fp.to_dataframe()
df.head()
[8]:
ligand | LIG1.G | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
protein | TYR38.A | TYR109.A | THR110.A | TRP125.A | LEU126.A | ASP129.A | ILE130.A | CYS133.A | ... | MET337.B | PRO338.B | ALA343.B | CYS344.B | LEU348.B | PHE351.B | ASP352.B | THR355.B | TYR359.B | |||
interaction | Hydrophobic | Hydrophobic | Hydrophobic | Hydrophobic | Hydrophobic | Hydrophobic | HBDonor | Cationic | Hydrophobic | Hydrophobic | ... | Hydrophobic | Hydrophobic | Hydrophobic | Hydrophobic | Hydrophobic | Hydrophobic | PiStacking | Hydrophobic | Hydrophobic | Hydrophobic |
Frame | |||||||||||||||||||||
0 | False | True | False | True | True | True | True | True | True | True | ... | True | True | False | False | False | True | False | True | True | True |
10 | False | True | False | True | True | True | True | True | True | True | ... | True | False | True | False | False | True | True | False | True | True |
20 | False | True | True | True | True | True | True | True | True | True | ... | True | False | True | False | False | True | False | False | False | True |
30 | True | True | True | True | True | True | True | True | True | True | ... | True | False | True | True | False | True | True | False | True | True |
40 | False | True | True | True | True | True | True | True | True | True | ... | False | False | False | False | False | True | True | True | False | True |
5 rows × 40 columns
[9]:
def make_graph(values, df=None,
node_color=["#FFB2AC", "#ACD0FF"], node_shape="dot",
edge_color="#a9a9a9", width_multiplier=1):
"""Convert a pandas DataFrame to a NetworkX object
Parameters
----------
values : pandas.Series
Series with 'ligand' and 'protein' levels, and a unique value for
each lig-prot residue pair that will be used to set the width and weigth
of each edge. For example:
ligand protein
LIG1.G ALA216.A 0.66
ALA343.B 0.10
df : pandas.DataFrame
DataFrame obtained from the fp.to_dataframe() method
Used to label each edge with the type of interaction
node_color : list
Colors for the ligand and protein residues, respectively
node_shape : str
One of ellipse, circle, database, box, text or image, circularImage,
diamond, dot, star, triangle, triangleDown, square, icon.
edge_color : str
Color of the edge between nodes
width_multiplier : int or float
Each edge's width is defined as `width_multiplier * value`
"""
lig_res = values.index.get_level_values("ligand").unique().tolist()
prot_res = values.index.get_level_values("protein").unique().tolist()
G = nx.Graph()
# add nodes
# https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network.add_node
for res in lig_res:
G.add_node(res, title=res, shape=node_shape,
color=node_color[0], dtype="ligand")
for res in prot_res:
G.add_node(res, title=res, shape=node_shape,
color=node_color[1], dtype="protein")
for resids, value in values.items():
label = "{} - {}<br>{}".format(*resids, "<br>".join([f"{k}: {v}"
for k, v in (df.xs(resids,
level=["ligand", "protein"],
axis=1)
.sum()
.to_dict()
.items())]))
# https://pyvis.readthedocs.io/en/latest/documentation.html#pyvis.network.Network.add_edge
G.add_edge(*resids, title=label, color=edge_color,
weight=value, width=value*width_multiplier)
return G
4.3.1. Regrouping all interactions¶
We will regroup all interactions as if they were equivalent.
[10]:
data = (df.groupby(level=["ligand", "protein"], axis=1)
.sum()
.astype(bool)
.mean())
G = make_graph(data, df, width_multiplier=3)
# display graph
net = Network(width=600, height=500, notebook=True, heading="")
net.from_nx(G)
net.write_html("lig-prot_graph.html")
IFrame("lig-prot_graph.html", width=610, height=510)
[10]:
4.3.2. Only plotting a specific interaction¶
We can also plot a specific type of interaction.
[11]:
data = (df.xs("Hydrophobic", level="interaction", axis=1)
.mean())
G = make_graph(data, df, width_multiplier=3)
# display graph
net = Network(width=600, height=500, notebook=True, heading="")
net.from_nx(G)
net.write_html("lig-prot_hydrophobic_graph.html")
IFrame("lig-prot_hydrophobic_graph.html", width=610, height=510)
[11]:
4.3.3. Protein-protein interaction¶
This kind of “residue-level” visualisation is especially suitable for protein-protein interactions. Here we’ll show the interactions between one helix of our G-Protein coupled receptor (transmembrane helix 3, or TM3) in red and the rest of the protein in blue.
[12]:
tm3 = u.select_atoms("resid 119:152")
prot = u.select_atoms("protein and not group tm3", tm3=tm3)
fp = plf.Fingerprint()
fp.run(u.trajectory[::10], tm3, prot)
df = fp.to_dataframe()
df.head()
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:492: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
if isinstance(value, (float, np.float)):
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:494: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
elif isinstance(value, (int, np.int)):
[12]:
ligand | GLN119.A | ... | THR152.A | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
protein | GLY118.A | GLN189.A | ALA190.A | LYS191.A | ALA192.A | GLU193.A | GLU194.A | ... | ASP153.A | ALA154.A | GLU234.A | ARG238.A | LYS241.A | ||||||||
interaction | Hydrophobic | Hydrophobic | HBAcceptor | HBDonor | Hydrophobic | Hydrophobic | HBDonor | HBAcceptor | Hydrophobic | Hydrophobic | ... | Hydrophobic | HBDonor | HBAcceptor | Hydrophobic | Hydrophobic | HBDonor | Hydrophobic | HBAcceptor | Hydrophobic | HBAcceptor |
Frame | |||||||||||||||||||||
0 | True | False | False | False | False | False | True | False | True | True | ... | True | False | False | False | True | False | False | False | False | False |
10 | True | False | False | False | False | False | False | False | False | False | ... | True | False | False | True | True | False | False | False | False | False |
20 | True | False | False | False | False | False | False | False | True | False | ... | True | False | True | True | True | False | False | True | False | False |
30 | True | False | False | False | False | True | False | False | False | False | ... | True | False | False | False | False | False | True | False | False | False |
40 | True | False | False | False | True | False | False | False | False | False | ... | True | False | False | False | False | False | False | False | False | False |
5 rows × 232 columns
[13]:
data = (df.groupby(level=["ligand", "protein"], axis=1, sort=False)
.sum()
.astype(bool)
.mean())
G = make_graph(data, df, width_multiplier=8)
# color each node based on its degree
max_nbr = len(max(G.adj.values(), key=lambda x: len(x)))
blues = cm.get_cmap('Blues', max_nbr)
reds = cm.get_cmap('Reds', max_nbr)
for n, d in G.nodes(data=True):
n_neighbors = len(G.adj[n])
# show TM3 in red and the rest of the protein in blue
palette = reds if d["dtype"] == "ligand" else blues
d["color"] = colors.to_hex( palette(n_neighbors / max_nbr) )
# convert to pyvis network
net = Network(width=640, height=500, notebook=True, heading="")
net.from_nx(G)
net.write_html("prot-prot_graph.html")
IFrame("prot-prot_graph.html", width=650, height=510)
[13]:
4.3.4. Residue interaction network¶
Another possible application is the visualisation of the residue interaction network of the whole protein. Since this protein is a GPCR, the graph will mostly display the HBond interactions reponsible for the secondary structure of the protein (7 alpha-helices). It would also show hydrophobic interactions between neighbor residues, so I’m simply going to disable it in the Fingerprint.
[14]:
prot = u.select_atoms("protein")
fp = plf.Fingerprint(['HBDonor', 'HBAcceptor', 'PiStacking', 'Anionic', 'Cationic', 'CationPi', 'PiCation'])
fp.run(u.trajectory[::10], prot, prot)
df = fp.to_dataframe()
df.head()
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:492: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
if isinstance(value, (float, np.float)):
/home/docs/checkouts/readthedocs.org/user_builds/prolif/conda/v0.3.1/lib/python3.9/site-packages/MDAnalysis/coordinates/RDKit.py:494: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
elif isinstance(value, (int, np.int)):
[14]:
ligand | TYR38.A | ILE39.A | ... | ARG385.B | PHE386.B | LYS387.B | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
protein | TYR38.A | TYR40.A | GLN41.A | ASP42.A | ARG114.A | TRP345.B | PHE346.B | HSE347.B | ASP352.B | SER43.A | ... | LEU383.B | LYS387.B | PHE380.B | HSD381.B | ILE384.B | PHE386.B | HSD381.B | ARG385.B | ||
interaction | PiStacking | PiStacking | HBAcceptor | HBAcceptor | PiCation | PiStacking | HBDonor | PiStacking | HBDonor | HBAcceptor | ... | HBDonor | HBAcceptor | PiStacking | HBDonor | HBAcceptor | PiStacking | HBDonor | PiStacking | HBAcceptor | HBDonor |
Frame | |||||||||||||||||||||
0 | True | False | False | False | False | False | False | False | False | False | ... | True | False | False | True | False | True | False | True | False | False |
10 | True | False | False | False | False | False | True | False | False | False | ... | True | False | False | True | False | True | False | True | True | False |
20 | True | False | False | False | False | False | False | False | False | False | ... | True | False | False | True | False | False | False | True | False | False |
30 | True | False | False | False | False | False | True | True | False | False | ... | False | False | False | True | True | False | False | True | False | False |
40 | True | False | False | False | False | False | True | True | False | False | ... | False | False | False | True | False | False | False | True | False | False |
5 rows × 1337 columns
To hide most of the HBond interactions responsible for the alpha-helix structuration, I will show how to do it on the pandas DataFrame for simplicity, but ideally you should copy-paste the source code inside the fp.run
method and add the condition shown below before calculating the bitvector for a residue pair, then use the custom function instead of fp.run
. This would make the analysis faster and more memory efficient.
[15]:
# remove interactions between residues i and i±4 or less
mask = []
for l, p, interaction in df.columns:
lr = plf.ResidueId.from_string(l)
pr = plf.ResidueId.from_string(p)
if (pr == lr) or (abs(pr.number - lr.number) <= 4
and interaction in ["HBDonor", "HBAcceptor", "Hydrophobic"]):
mask.append(False)
else:
mask.append(True)
df = df[df.columns[mask]]
df.head()
[15]:
ligand | TYR38.A | TYR40.A | GLN41.A | ... | HSD381.B | PHE386.B | LYS387.B | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
protein | TYR40.A | ARG114.A | TRP345.B | PHE346.B | HSE347.B | ASP352.B | TYR38.A | HSE347.B | ASP352.B | LYS49.A | ... | PHE380.B | PHE386.B | LYS387.B | PHE380.B | HSD381.B | HSD381.B | ||||
interaction | PiStacking | PiCation | PiStacking | HBDonor | PiStacking | HBDonor | PiStacking | PiStacking | HBDonor | HBAcceptor | ... | PiStacking | HBDonor | HBAcceptor | PiStacking | HBDonor | PiStacking | HBDonor | HBAcceptor | PiStacking | HBAcceptor |
Frame | |||||||||||||||||||||
0 | False | False | False | False | False | False | False | False | True | False | ... | True | False | True | True | False | False | True | False | True | False |
10 | False | False | False | True | False | False | False | False | True | False | ... | False | False | True | True | True | False | True | False | True | True |
20 | False | False | False | False | False | False | False | False | True | False | ... | False | False | True | False | False | False | True | False | False | False |
30 | False | False | False | True | True | False | False | False | True | False | ... | True | True | True | False | False | False | True | True | False | False |
40 | False | False | False | True | True | False | False | False | True | False | ... | False | False | True | False | False | False | True | False | False | False |
5 rows × 416 columns
[16]:
data = (df.groupby(level=["ligand", "protein"], axis=1, sort=False)
.sum()
.astype(bool)
.mean())
G = make_graph(data, df, width_multiplier=5)
# color each node based on its degree
max_nbr = len(max(G.adj.values(), key=lambda x: len(x)))
palette = cm.get_cmap('YlGnBu', max_nbr)
for n, d in G.nodes(data=True):
n_neighbors = len(G.adj[n])
d["color"] = colors.to_hex( palette(n_neighbors / max_nbr) )
# convert to pyvis network
net = Network(width=640, height=500, notebook=True, heading="")
net.from_nx(G)
# use specific layout
layout = nx.circular_layout(G)
for node in net.nodes:
node["x"] = layout[node["id"]][0] * 1000
node["y"] = layout[node["id"]][1] * 1000
net.toggle_physics(False)
net.write_html("residue-network_graph.html")
IFrame("residue-network_graph.html", width=650, height=510)
[16]:
End of this notebook. If you have other suggestions for displaying interaction fingerprints, please create a new Discussion on GitHub 👍
List of files to be automatically copied to the docs: