Molecules

Reading RDKit molecules — `prolif.rdkitmol`

class prolif.rdkitmol.BaseRDKitMol

Bases: Mol

Base molecular class that behaves like an RDKit Mol with extra attributes (see below). The sole purpose of this class is to define the common API between the Molecule and Residue classes. This class should not be instantiated by users.

Parameters:: mol (rdkit.Chem.rdchem.Mol) – A molecule (protein, ligand, or residue) with a single conformer

centroid

XYZ coordinates of the centroid of the molecule

Type:: rdkit.Geometry.rdGeometry.Point3D

xyz

XYZ coordinates of all atoms in the molecule

Type:: numpy.ndarray
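The two attributes are related: the centroid summarizes the per-atom coordinates stored in xyz. The pure-Python sketch below assumes the centroid is the unweighted mean of the atomic positions. The function name and sample coordinates are illustrative, and the actual implementation may differ (for example, it may exclude hydrogen atoms), so treat this as an illustration of the idea rather than a reimplementation.

```python
from typing import Sequence


def centroid(xyz: Sequence[Sequence[float]]) -> tuple[float, float, float]:
    """Return the unweighted mean of an N x 3 list of atomic coordinates.

    Illustrative sketch only: assumes every atom contributes equally.
    """
    n = len(xyz)
    if n == 0:
        raise ValueError("centroid of an empty molecule is undefined")
    # Average each Cartesian component independently.
    x = sum(atom[0] for atom in xyz) / n
    y = sum(atom[1] for atom in xyz) / n
    z = sum(atom[2] for atom in xyz) / n
    return (x, y, z)


coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0), (2.0, 4.0, 6.0)]
print(centroid(coords))  # (1.0, 2.0, 1.5)
```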

Reading proteins and ligands — `prolif.molecule`

class prolif.molecule.Molecule(mol: Mol, *, use_segid: bool = False, residues: list[prolif.residue.Residue] | None = None)

Bases: BaseRDKitMol

Main molecule class that behaves like an RDKit Mol with extra attributes (see examples below). The main purpose of this class is to access residues as fragments of the molecule.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – A ligand or protein with a single conformer
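use_segid (bool, default = False) – Use the segment number rather than the chain identifier as a chain.
residues (list[prolif.residue.Residue] | None, default = None) – Precomputed residues of mol. If provided, they are used directly instead of being recalculated with split_mol_by_residues().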

residues

A dictionary storing one or more Residue objects indexed by ResidueId. The residues are stored in sorted order.

Type:: prolif.residue.ResidueGroup

n_residues

Number of residues

Type:: int

Examples

In [1]: import MDAnalysis as mda

In [2]: import prolif

In [3]: u = mda.Universe(prolif.datafiles.TOP, prolif.datafiles.TRAJ)

In [4]: mol = u.select_atoms("protein").convert_to("RDKIT")

In [5]: mol = prolif.Molecule(mol)

In [6]: mol
Out[6]: <prolif.molecule.Molecule with 302 residues and 4988 atoms at 0x7a592e7b1da0>

You can also create a Molecule directly from a Universe:

In [7]: mol = prolif.Molecule.from_mda(u, "protein")

In [8]: mol
Out[8]: <prolif.molecule.Molecule with 302 residues and 4988 atoms at 0x7a59314a6020>

Notes

Residues can be accessed easily in different ways:

In [9]: mol["TYR38.A"] # by resid string (residue name + number + chain)
Out[9]: <prolif.residue.Residue TYR38.A at 0x7a592e7f5210>

In [10]: mol[42] # by index (from 0 to n_residues-1)
Out[10]: <prolif.residue.Residue LEU80.A at 0x7a592e7cd760>

In [11]: mol[prolif.ResidueId("TYR", 38, "A")] # by ResidueId
Out[11]: <prolif.residue.Residue TYR38.A at 0x7a592e7f5210>

See prolif.residue for more information on residues
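To illustrate the three access patterns in the Notes, here is a minimal pure-Python sketch of a residue container that accepts a residue string, an integer index, or a residue identifier. ResId and ResidueLookup are hypothetical stand-ins, not ProLIF classes. The string format (letters, a number, then an optional .chain) and the sort key (chain, then number) are assumptions made for the illustration.

```python
import re
from typing import Any, NamedTuple, Optional, Union


class ResId(NamedTuple):
    """Hypothetical stand-in for prolif.ResidueId: name, number, chain."""
    name: str
    number: int
    chain: Optional[str] = None

    def __str__(self) -> str:
        return f"{self.name}{self.number}" + (f".{self.chain}" if self.chain else "")

    @classmethod
    def from_string(cls, text: str) -> "ResId":
        # Assumed format: letters, an integer, then an optional ".CHAIN".
        match = re.fullmatch(r"([A-Za-z]+)(-?\d+)(?:\.(\w+))?", text)
        if match is None:
            raise ValueError(f"cannot parse residue id {text!r}")
        name, number, chain = match.groups()
        return cls(name, int(number), chain)


class ResidueLookup:
    """Hypothetical container supporting int, str and ResId keys."""

    def __init__(self, residues: dict[ResId, Any]) -> None:
        # Sort by (chain, number) so integer positions follow sequence order.
        self._ids = sorted(residues, key=lambda r: (r.chain or "", r.number))
        self._data = dict(residues)

    def __len__(self) -> int:
        return len(self._ids)

    def __getitem__(self, key: Union[int, str, ResId]) -> Any:
        if isinstance(key, ResId):  # checked first: ResId is a tuple subclass
            return self._data[key]
        if isinstance(key, int):
            return self._data[self._ids[key]]
        if isinstance(key, str):
            return self._data[ResId.from_string(key)]
        raise TypeError(f"unsupported key type: {type(key).__name__}")


lookup = ResidueLookup({ResId("LEU", 80, "A"): "leucine", ResId("TYR", 38, "A"): "tyrosine"})
print(lookup["TYR38.A"], lookup[0], lookup[ResId("TYR", 38, "A")])
```

All three lookups resolve to the same entry, which mirrors how mol["TYR38.A"] and mol[prolif.ResidueId("TYR", 38, "A")] return the same residue in the example above.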

Changed in version 2.1.0: Added use_segid.

Changed in version 2.2.0: Added residues to bypass split_mol_by_residues().

classmethod from_mda(obj: MDAObject, selection: str | None = None, *, use_segid: bool | None = None, **kwargs: Any) → Molecule

Creates a Molecule from an MDAnalysis object

Parameters:

obj (MDAnalysis.core.universe.Universe or MDAnalysis.core.groups.AtomGroup) – The MDAnalysis object to convert
selection (None or str) – Apply a selection to obj to create an AtomGroup. Uses all atoms in obj if selection=None
use_segid (bool | None, default = None) – Use the segment number rather than the chain identifier as a chain.
**kwargs (object) – Other arguments passed to the RDKitConverter of MDAnalysis

Example

In [1]: mol = prolif.Molecule.from_mda(u, "protein")

In [2]: mol
Out[2]: <prolif.molecule.Molecule with 302 residues and 4988 atoms at 0x7a592f1fb290>

Which is equivalent to:

In [3]: protein = u.select_atoms("protein")

In [4]: mol = prolif.Molecule.from_mda(protein)

In [5]: mol
Out[5]: <prolif.molecule.Molecule with 302 residues and 4988 atoms at 0x7a592e6b75b0>

Since MDAnalysis v2.10.0, it is possible to directly control how bond orders and charges are inferred from the topology using the inferrer parameter:

>>> from MDAnalysis.converters.RDKitInferring import TemplateInferrer
>>> from rdkit import Chem
>>> ligand_template = Chem.MolFromSmiles("NC(c2nc(=Cc1ccc([O-])cc1)c(=O)n2CC=O)C(C)O")
>>> ligand_inferrer = TemplateInferrer(ligand_template)
>>> mol = prolif.Molecule.from_mda(u, "resname LIG", inferrer=ligand_inferrer)

Changed in version 2.1.0: Added use_segid.

classmethod from_rdkit(mol: Mol, resname: str = 'UNL', resnumber: int = 1, chain: str = '', use_segid: bool = False) → Molecule

Creates a Molecule from an RDKit molecule

While directly instantiating a molecule with prolif.Molecule(mol) would also work, this method ensures that every atom is linked to an AtomPDBResidueInfo, which ProLIF requires.

Parameters:

mol (rdkit.Chem.rdchem.Mol) – The input RDKit molecule
resname (str) – The default residue name that is used if none was found
resnumber (int) – The default residue number that is used if none was found
chain (str) – The default chain Id that is used if none was found
use_segid (bool, default = False) – Use the segment number rather than the chain identifier as a chain

Notes

This method only checks for an existing AtomPDBResidueInfo in the first atom. If none was found, it will patch all atoms with the one created from the method’s arguments (resname, resnumber, chain).
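The check-then-patch logic described in these Notes can be sketched in pure Python. ResidueInfo, Atom and ensure_residue_info are hypothetical stand-ins rather than RDKit or ProLIF API; the default values mirror the from_rdkit signature. The second part of the test shows the consequence of only inspecting the first atom: a partially annotated molecule is left as-is.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResidueInfo:
    """Hypothetical stand-in for RDKit's AtomPDBResidueInfo."""
    resname: str
    resnumber: int
    chain: str


@dataclass
class Atom:
    """Hypothetical minimal atom: an element symbol plus optional residue info."""
    symbol: str
    info: Optional[ResidueInfo] = None


def ensure_residue_info(
    atoms: List[Atom], resname: str = "UNL", resnumber: int = 1, chain: str = ""
) -> List[Atom]:
    # Only the first atom is inspected: if it already has residue info,
    # the whole molecule is assumed to be annotated.
    if atoms and atoms[0].info is not None:
        return atoms
    # Otherwise every atom is patched with its own default residue info.
    for atom in atoms:
        atom.info = ResidueInfo(resname, resnumber, chain)
    return atoms


atoms = ensure_residue_info([Atom("C"), Atom("O")], resname="LIG")
print([a.info.resname for a in atoms])  # ['LIG', 'LIG']
```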

Changed in version 2.1.0: Added use_segid.

class prolif.molecule.mol2_supplier(path: Union[str, Path], cleanup_substructures: bool = True, sanitize: bool = True, **kwargs: Any)

Bases: Sequence[Molecule]

Supplies molecules, given a path to a MOL2 file

Parameters:

path (str or pathlib.Path) – A path to the .mol2 file
sanitize (bool) – Whether to sanitize each molecule or not.
cleanup_substructures (bool) – Toggles standardizing some substructures found in mol2 files, based on atom types.
resname (str) – Residue name for every ligand
resnumber (int) – Residue number for every ligand
chain (str) – Chain ID for every ligand

Returns:

suppl – A sequence that provides Molecule objects

Return type:

Sequence

Example

The supplier is typically used like this:

>>> from prolif.molecule import mol2_supplier
>>> lig_suppl = mol2_supplier("docking/output.mol2")
>>> for lig in lig_suppl:
...     # do something with each ligand
...     pass

Changed in version 1.0.0: Molecule suppliers are now sequences that can be reused, indexed, and can return their length, instead of single-use generators.

Changed in version 2.1.0: Added cleanup_substructures and sanitize parameters (default to True, same behavior as before).
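The difference between a single-use generator and a reusable sequence can be sketched with collections.abc.Sequence: once __len__ and __getitem__ are defined, iteration, reversal and membership come for free, and the object can be iterated any number of times. Mol2Records is hypothetical. It splits raw MOL2 text on @<TRIPOS>MOLECULE lines and returns text blocks, whereas mol2_supplier returns parsed Molecule objects; this sketch does not claim to reflect how ProLIF indexes records internally.

```python
from collections.abc import Sequence


class Mol2Records(Sequence):
    """Hypothetical sketch: a reusable, indexable view over MOL2 records."""

    MARKER = "@<TRIPOS>MOLECULE"

    def __init__(self, text: str) -> None:
        self._lines = text.splitlines(keepends=True)
        # Remember where each record starts; records are only joined on access.
        self._starts = [
            i for i, line in enumerate(self._lines) if line.startswith(self.MARKER)
        ]

    def __len__(self) -> int:
        return len(self._starts)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            # Sequence.__iter__ relies on IndexError to stop iteration.
            raise IndexError("record index out of range")
        start = self._starts[index]
        end = self._starts[index + 1] if index + 1 < len(self) else len(self._lines)
        return "".join(self._lines[start:end])


text = "@<TRIPOS>MOLECULE\nlig1\n@<TRIPOS>ATOM\n@<TRIPOS>MOLECULE\nlig2\n"
records = Mol2Records(text)
print(len(records))                 # 2
print(records[-1].splitlines()[1])  # lig2
```

The same pattern applies to SDF files, whose records are terminated by a $$$$ line.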

class prolif.molecule.pdbqt_supplier(paths: Iterable[Union[str, Path]], template: Mol, converter_kwargs: dict | None = None, **kwargs: Any)

Bases: Sequence[Molecule]

Supplies molecules, given paths to PDBQT files

Parameters:

paths (list) – A list (or any iterable) of PDBQT files
template (rdkit.Chem.rdchem.Mol) – A template molecule with the correct bond orders and charges. It must exactly match the molecule in the PDBQT file.
converter_kwargs (dict | None) – Keyword arguments passed to the RDKitConverter of MDAnalysis
resname (str) – Residue name for every ligand
resnumber (int) – Residue number for every ligand
chain (str) – Chain ID for every ligand

Returns:

suppl – A sequence that provides Molecule objects

Return type:

Sequence

Example

The supplier is typically used like this:

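>>> import glob
>>> from prolif.molecule import pdbqt_supplier
>>> # template: an RDKit molecule with the correct bond orders and charges
>>> pdbqts = glob.glob("docking/ligand1/*.pdbqt")
>>> lig_suppl = pdbqt_supplier(pdbqts, template)
>>> for lig in lig_suppl:
...     # do something with each ligand
...     pass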

Changed in version 1.0.0: Molecule suppliers are now sequences that can be reused, indexed, and can return their length, instead of single-use generators.

Changed in version 1.1.0: Because the PDBQT supplier needs to strip hydrogen atoms before assigning bond orders from the template, it used to replace them entirely with hydrogens containing new coordinates. It now directly uses the hydrogen atoms present in the file and won’t add explicit ones anymore, to prevent the fingerprint from detecting hydrogen bonds with “random” hydrogen atoms. A lot of irrelevant warnings and logs have been disabled as well.

class prolif.molecule.sdf_supplier(path: str, sanitize: bool = True, **kwargs: Any)

Bases: Sequence[Molecule]

Supplies molecules, given a path to an SDFile

Parameters:

path (str) – A path to the .sdf file
sanitize (bool) – Whether to sanitize each molecule or not.
resname (str) – Residue name for every ligand
resnumber (int) – Residue number for every ligand
chain (str) – Chain ID for every ligand

Returns:

suppl – A sequence that provides Molecule objects and can be indexed.

Return type:

Sequence

Example

The supplier is typically used like this:

>>> from prolif.molecule import sdf_supplier
>>> lig_suppl = sdf_supplier("docking/output.sdf")
>>> for lig in lig_suppl:
...     # do something with each ligand
...     pass

Changed in version 1.0.0: Molecule suppliers are now sequences that can be reused, indexed, and can return their length, instead of single-use generators.

Changed in version 2.1.0: Added sanitize parameter (defaults to True, same behavior as before).

prolif.molecule.split_molecule(mol: Molecule, predicate: Callable[[ResidueId], bool]) → tuple[prolif.molecule.Molecule, prolif.molecule.Molecule]

Splits a molecule into two based on a predicate function. The first molecule returned contains all residues of the input mol for which the predicate returned True; the second molecule contains the rest. This function is typically used to extract a ligand or water molecules from a file containing a solvated complex:

>>> from prolif.molecule import Molecule, split_molecule
>>> solvated_system = Molecule(...)
>>> water_mol, protein_mol = split_molecule(
...     solvated_system, lambda x: x.name == "WAT"
... )

New in version 2.1.0.

Changed in version 2.2.0: The underlying residues of the input molecule are copied instead of being recalculated for each new child molecule, which preserves information set by the MoleculeStandardizer class.
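The residue-level bookkeeping behind split_molecule, deciding which residues go into which output, amounts to partitioning residue identifiers with the predicate. The sketch below assumes only that the predicate is evaluated once per residue identifier. partition and ResId are hypothetical names, not ProLIF API, and building the two child Molecule objects from the selected residues is not shown.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional, Tuple, TypeVar

T = TypeVar("T")


class ResId(NamedTuple):
    """Hypothetical stand-in for prolif.ResidueId."""
    name: str
    number: int
    chain: Optional[str] = None


def partition(
    items: Iterable[T], predicate: Callable[[T], bool]
) -> Tuple[List[T], List[T]]:
    # Matching items go to the first list, all others to the second;
    # the original order is preserved in both.
    matches: List[T] = []
    rest: List[T] = []
    for item in items:
        (matches if predicate(item) else rest).append(item)
    return matches, rest


residues = [
    ResId("ALA", 1, "A"),
    ResId("WAT", 501, "W"),
    ResId("GLY", 2, "A"),
    ResId("WAT", 502, "W"),
]
water, protein = partition(residues, lambda x: x.name == "WAT")
print(len(water), len(protein))  # 2 2
```

Any predicate over the identifier works the same way, for example lambda x: x.chain == "A" to keep a single chain.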

class prolif.residue.Residue(mol: Chem.Mol, *, use_segid: bool = False)

Bases: BaseRDKitMol

A class for residues as RDKit molecules

Parameters:

mol (rdkit.Chem.rdchem.Mol) – The residue as an RDKit molecule
use_segid (bool, default = False) – Use the segment number rather than the chain identifier as a chain

resid

The residue identifier

Type:: prolif.residue.ResidueId

Notes

The name of the residue can be obtained as a string with str(residue).

Changed in version 2.1.0: Added use_segid.
