Helper functions#

Helper functions — prolif.utils#

prolif.utils.get_residues_near_ligand(lig: BaseRDKitMol, prot: BaseRDKitMol, cutoff: float = 6.0, use_segid: bool = False) list[prolif.residue.ResidueId][source]#

Detects residues close to a reference ligand

Parameters:
  • lig (prolif.molecule.Molecule or prolif.residue.Residue) – Select residues that are near this ligand/residue

  • prot (prolif.molecule.Molecule) – Protein containing the residues

  • cutoff (float) – If any interatomic distance between the ligand reference points and a residue is below or equal to this cutoff, the residue will be selected

  • use_segid (bool, default = False) – Use the segment number rather than the chain identifier as a chain.

Returns:

  • residues (list) – A list of unique ResidueId that are close to the ligand

  • .. versionchanged:: 2.1.0 – Added use_segid.

prolif.utils.select_over_trajectory(u: Universe, trajectory: Trajectory, selections: str, **kwargs: Any) AtomGroup[source]#
prolif.utils.select_over_trajectory(u: Universe, trajectory: Trajectory, *selections: str, **kwargs: Any) list[MDAnalysis.core.groups.AtomGroup]

Returns AtomGroup(s) that satisfy each distance-based selection over the entire trajectory rather than only the first frame. Only useful for distance-based selections. Use group {0} in any additional selection to refer to the AtomGroup generated by the first selection, or group {N-1} for the Nth selection.

prolif.utils.to_bitvectors(df: DataFrame) list[rdkit.DataStructs.cDataStructs.ExplicitBitVect][source]#

Converts an interaction DataFrame to a list of RDKit ExplicitBitVector

Parameters:

df (pandas.DataFrame) – A DataFrame where each column corresponds to an interaction between two residues

Returns:

bv – A list of ExplicitBitVect for each frame

Return type:

list

Example

>>> from rdkit.DataStructs import TanimotoSimilarity
>>> bv = prolif.to_bitvectors(df)
>>> TanimotoSimilarity(bv[0], bv[1])
0.42
prolif.utils.to_dataframe(ifp: IFPResults, interactions: Collection[str], count: bool = False, dtype: type | None = None, drop_empty: bool = True, index_col: str = 'Frame') DataFrame[source]#

Converts IFPs to a pandas DataFrame

Parameters:
  • ifp (dict) – A dict in the format {<frame number>: {(<residue_id>, <residue_id>): <interactions>}}. <interactions> is either a numpy.ndarray bitvector, or a tuple of dict in the format {<interaction name>: <metadata dict>}.

  • interactions (list) – A list of interactions, in the same order as used to detect the interactions.

  • count (bool) – Whether to output a count fingerprint or not.

  • dtype (object or None) – Cast the dataframe values to this type. If None, uses np.uint8 if count=True, else bool.

  • drop_empty (bool) – Drop columns with only empty values

  • index_col (str) – Name of the index column in the DataFrame

Returns:

df – A 3-levels DataFrame where each ligand residue, protein residue, and interaction type are in separate columns

Return type:

pandas.DataFrame

Example

>>> df = prolif.to_dataframe(results, fp.interactions.keys(), dtype=int)
>>> print(df)
ligand             LIG1.G
protein             ILE59                  ILE55       TYR93
interaction   Hydrophobic HBAcceptor Hydrophobic Hydrophobic PiStacking
Frame
0                       0          1           0           0          0
...

Changed in version 0.3.2: Moved the return_atoms parameter from the run methods to the dataframe conversion code

Changed in version 2.0.0: Removed the return_atoms parameter. Added the count parameter. Removed support for ifp containing np.ndarray bitvectors.

Molecule standardizer — prolif.io.molecule_standardizer#

This module provides the MoleculeStandardizer class for standardizing residue names and fixing bond orders in molecular structures.

Yu-Yuan (Stuart) Yang, 2025

class prolif.io.molecule_standardizer.MoleculeStandardizer(templates: collections.abc.Sequence[gemmi.cif.Document | tuple[str, rdkit.Chem.rdchem.Mol]] | None = None)[source]#

MoleculeStandardizer standardizes residue names and fixes bond orders when reading molecules with RDKit.

New in version 2.2.0.

Parameters:

templates (Sequence[gemmi.cif.Document | tuple[str, Chem.Mol]] | None, optional) –

The templates to use for standardizing the molecule. Each element can be:

  • A gemmi.cif.Document (e.g. from cif_template_reader()), where each block in the document provides a CIF-based template.

  • A (residue_name, rdkit.Chem.Mol) tuple for an RDKit-molecule-based template.

If None, only the standard amino acid templates are used. User-supplied templates take priority over the built-in ones (first match wins). Default is None.

engines#

Maps each residue name to its corresponding template engine.

Type:

dict[str, TemplateEngine]

Note

This class works with ProLIF Molecule instances or PDB files. It reads the input topology, standardizes the residue names, and fixes the bond orders based on the provided templates or the standard amino acid template.

Example

>>> import prolif as plf
>>> from prolif.io import MoleculeStandardizer, cif_template_reader
>>> from rdkit import Chem
>>> standardizer = MoleculeStandardizer(
...     templates=[
...         cif_template_reader("path/to/ligand.cif"),
...         ("EDO", Chem.MolFromSmiles("OCCO")),
...     ]
... )
>>> mol = standardizer("path/to/protein.pdb")
>>> plf.display_residues(mol)
static convert_to_standard_resname(resname: str, forcefield_name: str = 'unknown') str[source]#

Convert a residue name to its standard form based on the forcefield.

Parameters:
  • resname (str) – The residue name to convert.

  • forcefield_name (str) – The name of the forcefield for assigning the correct standard name for CYS. Default is “unknown”.

Returns:

The standard residue name.

Return type:

str

Note

This conversion is designed to distinguish residues with different possible H-bond donors at side chains, instead of the actual protonated states of residues.

For example, neutral and protonated arginine are both assigned to ARG, while neutral and deprotonated arginine in GROMOS force field are assigned to ARGN and ARG, respectively.

static forcefield_guesser(conv_resnames: set[str]) str[source]#

Guess the forcefield based on the residue names.

Parameters:

conv_resnames (set[str]) – Set of residue names in the protein molecule.

Returns:

The guessed forcefield name.

Return type:

str

static n_residue_heavy_atoms(residue: Residue) int[source]#

Count the number of heavy atoms in a residue.

Parameters:

residue (Residue) – The residue to count the heavy atoms in.

Returns:

The number of heavy atoms in the residue.

Return type:

int

Template engines — prolif.io.template_engine#

This module defines the TemplateEngine protocol and its concrete implementations for fixing bond orders on residues.

New in version 2.2.0.

class prolif.io.template_engine.CIFTemplateEngine(name: str, block: Block)[source]#

Template engine using a gemmi CIF block.

Parameters:
  • name (str) – The residue name this template matches.

  • block (gemmi.cif.Block) – The gemmi CIF block containing the bond and atom data.

class prolif.io.template_engine.RDKitMolTemplateEngine(name: str, mol: Mol)[source]#

Template engine using an RDKit Mol object.

Parameters:
  • name (str) – The residue name this template matches.

  • mol (Chem.Mol) – The RDKit molecule to use as a template.

class prolif.io.template_engine.TemplateEngine(*args, **kwargs)[source]#

Protocol for template engines that fix bond orders on a residue.

apply(residue: Residue) Residue[source]#

Fix the bond orders of the residue using this template.

n_heavy_atoms() int[source]#

Number of heavy atoms in this template (excluding OXT).

property name: str#

The residue name this template matches.

prolif.io.template_engine.assign_bond_orders_from_template(template_mol: Mol, mol: prolif.residue.Residue | rdkit.Chem.rdchem.Mol) Mol[source]#

Assign bond orders from an RDKit template molecule to a residue.

Parameters:
  • template_mol (Chem.Mol) – The template molecule with correct bond orders.

  • mol (Residue | Chem.Mol) – The residue or molecule to assign bond orders to.

Returns:

  • Chem.Mol – The molecule with assigned bond orders from the template.

  • .. versionchanged:: 2.2.0 – Renamed from assign_bond_orders_from_smiles and now takes an RDKit Mol object instead of a SMILES string.

prolif.io.template_engine.assign_intra_props(mol: Mol, residue_name: str, block: Block) Mol[source]#

Assign bonds and aromaticity based on residue and atom names from a gemmi CIF block.

Parameters:
  • mol (rdkit.Chem.Mol) – The input molecule, this is modified in place and returned.

  • residue_name (str) – The name of the residue being processed.

  • block (gemmi.cif.Block) – The gemmi CIF block from which to assign bonds.

Returns:

The modified molecule with assigned bonds and aromaticity.

Return type:

rdkit.Chem.Mol

Note

This function is adapted from the pdbinf/_pdbinf.py module’s assign_intra_props function, which is used to assign bonds and aromaticity based on the standard amino acid templates.

Source: OpenFreeEnergy/pdbinf

Changed in version 2.2.0: Now takes a gemmi.cif.Block instead of a custom dict, and added support for triple bonds (TRIP).

prolif.io.template_engine.strip_bonds(m: Mol) Mol[source]#

Strip all bonds and chiral tags from a molecule.

Parameters:

m (rdkit.Chem.Mol) – The input molecule to strip bonds from.

Returns:

The modified molecule with all bonds and chiral tags removed.

Return type:

rdkit.Chem.Mol

Note

This function is adapted from the pdbinf/_pdbinf.py module’s strip_bonds function.

Source: OpenFreeEnergy/pdbinf

prolif.io.cif.cif_template_reader(cif_filepath: pathlib.Path | str) Document[source]#

Reads a CIF file and returns a gemmi CIF Document.

New in version 2.2.0.

Parameters:

cif_filepath (Path | str) – The path to the CIF file to read.

Returns:

A gemmi CIF Document containing the parsed data blocks.

Return type:

gemmi.cif.Document