Helper functions#
Helper functions — prolif.utils#
- prolif.utils.get_residues_near_ligand(lig: BaseRDKitMol, prot: BaseRDKitMol, cutoff: float = 6.0, use_segid: bool = False) list[prolif.residue.ResidueId][source]#
Detects residues close to a reference ligand
- Parameters:
lig (prolif.molecule.Molecule or prolif.residue.Residue) – Select residues that are near this ligand/residue
prot (prolif.molecule.Molecule) – Protein containing the residues
cutoff (float) – If any interatomic distance between the ligand reference points and a residue is below or equal to this cutoff, the residue will be selected
use_segid (bool, default = False) – Use the segment number rather than the chain identifier as a chain.
- Returns:
residues (list) – A list of unique
ResidueIdthat are close to the ligand.. versionchanged:: 2.1.0 – Added use_segid.
- prolif.utils.select_over_trajectory(u: Universe, trajectory: Trajectory, selections: str, **kwargs: Any) AtomGroup[source]#
- prolif.utils.select_over_trajectory(u: Universe, trajectory: Trajectory, *selections: str, **kwargs: Any) list[MDAnalysis.core.groups.AtomGroup]
Returns AtomGroup(s) that satisfy each distance-based
selectionover the entire trajectory rather than only the first frame. Only useful for distance-based selections. Usegroup {0}in any additional selection to refer to the AtomGroup generated by the first selection, orgroup {N-1}for the Nth selection.
- prolif.utils.to_bitvectors(df: DataFrame) list[rdkit.DataStructs.cDataStructs.ExplicitBitVect][source]#
Converts an interaction DataFrame to a list of RDKit ExplicitBitVector
- Parameters:
df (pandas.DataFrame) – A DataFrame where each column corresponds to an interaction between two residues
- Returns:
bv – A list of
ExplicitBitVectfor each frame- Return type:
Example
>>> from rdkit.DataStructs import TanimotoSimilarity >>> bv = prolif.to_bitvectors(df) >>> TanimotoSimilarity(bv[0], bv[1]) 0.42
- prolif.utils.to_dataframe(ifp: IFPResults, interactions: Collection[str], count: bool = False, dtype: type | None = None, drop_empty: bool = True, index_col: str = 'Frame') DataFrame[source]#
Converts IFPs to a pandas DataFrame
- Parameters:
ifp (dict) – A dict in the format
{<frame number>: {(<residue_id>, <residue_id>): <interactions>}}.<interactions>is either anumpy.ndarraybitvector, or a tuple of dict in the format{<interaction name>: <metadata dict>}.interactions (list) – A list of interactions, in the same order as used to detect the interactions.
count (bool) – Whether to output a count fingerprint or not.
dtype (object or None) – Cast the dataframe values to this type. If
None, usesnp.uint8ifcount=True, elsebool.drop_empty (bool) – Drop columns with only empty values
index_col (str) – Name of the index column in the DataFrame
- Returns:
df – A 3-levels DataFrame where each ligand residue, protein residue, and interaction type are in separate columns
- Return type:
Example
>>> df = prolif.to_dataframe(results, fp.interactions.keys(), dtype=int) >>> print(df) ligand LIG1.G protein ILE59 ILE55 TYR93 interaction Hydrophobic HBAcceptor Hydrophobic Hydrophobic PiStacking Frame 0 0 1 0 0 0 ...
Changed in version 0.3.2: Moved the
return_atomsparameter from therunmethods to the dataframe conversion codeChanged in version 2.0.0: Removed the
return_atomsparameter. Added thecountparameter. Removed support forifpcontainingnp.ndarraybitvectors.
Molecule standardizer — prolif.io.molecule_standardizer#
This module provides the MoleculeStandardizer class for standardizing
residue names and fixing bond orders in molecular structures.
Yu-Yuan (Stuart) Yang, 2025
- class prolif.io.molecule_standardizer.MoleculeStandardizer(templates: collections.abc.Sequence[gemmi.cif.Document | tuple[str, rdkit.Chem.rdchem.Mol]] | None = None)[source]#
MoleculeStandardizerstandardizes residue names and fixes bond orders when reading molecules with RDKit.New in version 2.2.0.
- Parameters:
templates (Sequence[gemmi.cif.Document | tuple[str, Chem.Mol]] | None, optional) –
The templates to use for standardizing the molecule. Each element can be:
A
gemmi.cif.Document(e.g. fromcif_template_reader()), where each block in the document provides a CIF-based template.A
(residue_name, rdkit.Chem.Mol)tuple for an RDKit-molecule-based template.
If
None, only the standard amino acid templates are used. User-supplied templates take priority over the built-in ones (first match wins). Default isNone.
- engines#
Maps each residue name to its corresponding template engine.
- Type:
Note
This class works with ProLIF Molecule instances or PDB files. It reads the input topology, standardizes the residue names, and fixes the bond orders based on the provided templates or the standard amino acid template.
Example
>>> import prolif as plf >>> from prolif.io import MoleculeStandardizer, cif_template_reader >>> from rdkit import Chem >>> standardizer = MoleculeStandardizer( ... templates=[ ... cif_template_reader("path/to/ligand.cif"), ... ("EDO", Chem.MolFromSmiles("OCCO")), ... ] ... ) >>> mol = standardizer("path/to/protein.pdb") >>> plf.display_residues(mol)
- static convert_to_standard_resname(resname: str, forcefield_name: str = 'unknown') str[source]#
Convert a residue name to its standard form based on the forcefield.
- Parameters:
- Returns:
The standard residue name.
- Return type:
Note
This conversion is designed to distinguish residues with different possible H-bond donors at side chains, instead of the actual protonated states of residues.
For example, neutral and protonated arginine are both assigned to ARG, while neutral and deprotonated arginine in GROMOS force field are assigned to ARGN and ARG, respectively.
Template engines — prolif.io.template_engine#
This module defines the TemplateEngine protocol and its concrete
implementations for fixing bond orders on residues.
New in version 2.2.0.
- class prolif.io.template_engine.CIFTemplateEngine(name: str, block: Block)[source]#
Template engine using a gemmi CIF block.
- Parameters:
name (str) – The residue name this template matches.
block (gemmi.cif.Block) – The gemmi CIF block containing the bond and atom data.
- class prolif.io.template_engine.RDKitMolTemplateEngine(name: str, mol: Mol)[source]#
Template engine using an RDKit Mol object.
- Parameters:
name (str) – The residue name this template matches.
mol (Chem.Mol) – The RDKit molecule to use as a template.
- class prolif.io.template_engine.TemplateEngine(*args, **kwargs)[source]#
Protocol for template engines that fix bond orders on a residue.
- prolif.io.template_engine.assign_bond_orders_from_template(template_mol: Mol, mol: prolif.residue.Residue | rdkit.Chem.rdchem.Mol) Mol[source]#
Assign bond orders from an RDKit template molecule to a residue.
- Parameters:
template_mol (Chem.Mol) – The template molecule with correct bond orders.
mol (Residue | Chem.Mol) – The residue or molecule to assign bond orders to.
- Returns:
Chem.Mol – The molecule with assigned bond orders from the template.
.. versionchanged:: 2.2.0 – Renamed from
assign_bond_orders_from_smilesand now takes an RDKitMolobject instead of a SMILES string.
- prolif.io.template_engine.assign_intra_props(mol: Mol, residue_name: str, block: Block) Mol[source]#
Assign bonds and aromaticity based on residue and atom names from a gemmi CIF block.
- Parameters:
mol (rdkit.Chem.Mol) – The input molecule, this is modified in place and returned.
residue_name (str) – The name of the residue being processed.
block (gemmi.cif.Block) – The gemmi CIF block from which to assign bonds.
- Returns:
The modified molecule with assigned bonds and aromaticity.
- Return type:
rdkit.Chem.Mol
Note
This function is adapted from the pdbinf/_pdbinf.py module’s assign_intra_props function, which is used to assign bonds and aromaticity based on the standard amino acid templates.
Source: OpenFreeEnergy/pdbinf
Changed in version 2.2.0: Now takes a
gemmi.cif.Blockinstead of a custom dict, and added support for triple bonds (TRIP).
- prolif.io.template_engine.strip_bonds(m: Mol) Mol[source]#
Strip all bonds and chiral tags from a molecule.
- Parameters:
m (rdkit.Chem.Mol) – The input molecule to strip bonds from.
- Returns:
The modified molecule with all bonds and chiral tags removed.
- Return type:
rdkit.Chem.Mol
Note
This function is adapted from the pdbinf/_pdbinf.py module’s strip_bonds function.
Source: OpenFreeEnergy/pdbinf
- prolif.io.cif.cif_template_reader(cif_filepath: pathlib.Path | str) Document[source]#
Reads a CIF file and returns a gemmi CIF Document.
New in version 2.2.0.
- Parameters:
cif_filepath (Path | str) – The path to the CIF file to read.
- Returns:
A gemmi CIF Document containing the parsed data blocks.
- Return type:
gemmi.cif.Document