Molecule Values
Molecule objects have value semantics: transformation methods return new
molecule objects and leave the original object unchanged. Internally,
COSMolKit uses copy-on-write (COW) storage to share unchanged data
efficiently. This is intentionally different from common RDKit Python
workflows, where code often mutates an existing molecule or RWMol directly.
from cosmolkit import Molecule
mol = Molecule.from_smiles("CCO")
mol_h = mol.with_hydrogens()
assert mol is not mol_h
print(mol.to_smiles())
print(mol_h.to_smiles())
Do not write code that assumes mol.with_hydrogens() changes mol. Keep
the returned value and pass that value to later operations.
Common transformations include:
- with_hydrogens()
- without_hydrogens()
- with_kekulized_bonds()
- with_2d_coordinates()
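The idea of returning a new value while sharing unchanged data can be illustrated without COSMolKit. The sketch below is a simplified stand-in, not the library's implementation: `ToyMolecule` and `with_atom` are hypothetical names, and immutable tuples approximate the sharing that real copy-on-write storage provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical stand-in for a value-style molecule (not COSMolKit's class)."""

    atoms: tuple  # immutable, so several instances can share it safely
    bonds: tuple

    def with_atom(self, symbol):
        # Build a new atoms tuple; reuse the untouched bonds tuple as-is.
        return ToyMolecule(self.atoms + (symbol,), self.bonds)


ethanol = ToyMolecule(atoms=("C", "C", "O"), bonds=((0, 1), (1, 2)))
extended = ethanol.with_atom("H")

print(ethanol.atoms)                    # the original value is unchanged
print(extended.atoms)                   # the new value carries the change
print(extended.bonds is ethanol.bonds)  # unchanged data is shared, not copied
```

The caller has to keep `extended`; discarding the return value would lose the change, which is the same discipline the value-style Molecule methods require.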
In-Place Operations
Performance-sensitive code can opt into explicit in-place mutation. Every
public in-place Molecule method ends with _, and the trailing underscore is
used for nothing else in the Molecule API.
mol = Molecule.from_smiles("CCO")
mol.add_hydrogens_()
mol.compute_2d_coordinates_()
Common in-place operations include:
- add_hydrogens_()
- remove_hydrogens_()
- kekulize_()
- sanitize_()
- compute_2d_coordinates_()
If an in-place method returns an error, the molecule is not guaranteed to equal its pre-call value. Use the value-style method when failure-preserving behavior is required.
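The difference in failure behavior can be shown with a toy example. This is a sketch under stated assumptions, not COSMolKit code: `ToyMolecule`, `relabel_`, and `with_relabel` are hypothetical, and the failure is modeled as a Python `KeyError`.

```python
class ToyMolecule:
    """Hypothetical molecule-like object; not COSMolKit's class."""

    def __init__(self, atoms):
        self.atoms = list(atoms)

    def relabel_(self, mapping):
        # In-place: mutates self.atoms one entry at a time, so a failure
        # part-way through leaves the earlier changes applied.
        for i, symbol in enumerate(self.atoms):
            self.atoms[i] = mapping[symbol]  # KeyError on an unknown symbol

    def with_relabel(self, mapping):
        # Value-style: work on a copy and return it only on success.
        result = ToyMolecule(self.atoms)
        result.relabel_(mapping)
        return result


mapping = {"C": "Si"}  # no entry for "O", so relabeling fails on the last atom

safe = ToyMolecule(["C", "C", "O"])
try:
    safe.with_relabel(mapping)
except KeyError:
    pass
print(safe.atoms)  # still the pre-call value

unsafe = ToyMolecule(["C", "C", "O"])
try:
    unsafe.relabel_(mapping)
except KeyError:
    pass
print(unsafe.atoms)  # partially modified by the failed call
```

The value-style call pays for a copy, and in exchange the original survives any failure intact.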
SMILES Output
to_smiles() returns a SMILES string:
mol = Molecule.from_smiles("F[C@H](Cl)Br")
print(mol.to_smiles())
print(mol.to_smiles(isomeric_smiles=False))
SMILES writer options are available on both single molecules and batches:
benzene = Molecule.from_smiles("c1ccccc1")
ethanol = Molecule.from_smiles("CCO")
print(benzene.to_smiles(kekule=True))
print(ethanol.to_smiles(all_bonds_explicit=True))
print(ethanol.to_smiles(canonical=False, rooted_at_atom=2))
Explicit Editing
Use Molecule.edit() when you want to stage changes and commit them as one
new molecule:
editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
edited = editor.commit()
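The stage-then-commit pattern can be sketched in plain Python. Everything here is hypothetical (`ToyEditor`, its fields, and the tuple-based result) and only illustrates why staging keeps the source value untouched until `commit()`; it says nothing about how COSMolKit's editor is implemented.

```python
class ToyEditor:
    """Hypothetical staged editor; names and behavior are illustrative only."""

    def __init__(self, atoms, bonds):
        self._base_atoms = atoms  # immutable tuples from the source value
        self._base_bonds = bonds
        self._new_atoms = []      # staged changes live only in the editor
        self._new_bonds = []

    def add_atom(self, symbol):
        self._new_atoms.append(symbol)
        # Return the index the atom will have once the edit is committed.
        return len(self._base_atoms) + len(self._new_atoms) - 1

    def add_bond(self, begin, end):
        self._new_bonds.append((begin, end))

    def commit(self):
        # Produce one new (atoms, bonds) value containing all staged changes.
        return (
            self._base_atoms + tuple(self._new_atoms),
            self._base_bonds + tuple(self._new_bonds),
        )


base_atoms = ("C", "C", "O")
base_bonds = ((0, 1), (1, 2))

toy_editor = ToyEditor(base_atoms, base_bonds)
cl_index = toy_editor.add_atom("Cl")
toy_editor.add_bond(0, cl_index)
new_atoms, new_bonds = toy_editor.commit()

print(new_atoms, new_bonds)
print(base_atoms, base_bonds)  # the source value is unchanged
```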
Depictions
Molecules with 2D coordinates can be exported as SVG or PNG:
mol = Molecule.from_smiles("c1ccccc1O").with_2d_coordinates()
svg = mol.to_svg(width=400, height=300)
mol.write_svg("python/examples/output/phenol.svg", width=400, height=300)
mol.write_png("python/examples/output/phenol.png", width=400, height=300)
Stereo
COSMolKit keeps the atom-level CW/CCW chiral tag path available. This is the closest representation to the explicit chiral information carried by SMILES or RDKit atoms:
from cosmolkit import ChiralTag, Molecule
mol = Molecule.from_smiles("F[C@H](Cl)Br")
for atom in mol.atoms():
    if atom.chiral_tag() != ChiralTag.CHI_UNSPECIFIED:
        print(atom.idx(), atom.chiral_tag().name)
print(mol.find_chiral_centers(include_unassigned=False))
Atom and bond enum-valued fields return Python IntEnum members, so callers
can compare or match against ChiralTag, BondOrder, BondDirection,
and BondStereo instead of spelling chemistry states as strings. Read-only
maps such as BOND_ORDER_MAP and CHIRAL_TAG_MAP are available when a
string name from an external source needs to be converted to the enum member.
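The pattern of IntEnum members plus a read-only name map can be sketched with the standard library alone. `ToyBondOrder` and `TOY_BOND_ORDER_MAP` are hypothetical; the member names and integer values of COSMolKit's real enums and maps are not assumed here.

```python
from enum import IntEnum
from types import MappingProxyType


class ToyBondOrder(IntEnum):
    # Hypothetical members and values; COSMolKit's BondOrder may differ.
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3


# Read-only view: lookups work, but item assignment raises TypeError.
TOY_BOND_ORDER_MAP = MappingProxyType({m.name: m for m in ToyBondOrder})

# Convert a string from an external source into the enum member once,
# then compare against members instead of strings everywhere else.
order = TOY_BOND_ORDER_MAP["DOUBLE"]
print(order is ToyBondOrder.DOUBLE)
print(order == 2)  # IntEnum members also compare equal to plain integers
print(order.name)
```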
When code needs COSMolKit’s ordered-ligand tetrahedral representation, use
tetrahedral_stereo() as a separate view derived from those chiral tags:
mol = Molecule.from_smiles("F[C@H](Cl)Br")
print(mol.tetrahedral_stereo())
Conformer Generation And Force-Field Optimization
Conformer generation APIs create native 3D conformers through the source-ported distance-geometry path. The default value-style operation uses ETKDGv3 and returns a new molecule value.
from cosmolkit import EmbedParameters, Molecule
mol = Molecule.from_smiles("CC(=O)NC").with_hydrogens()
params = EmbedParameters.etkdg_v3()
params.random_seed = 0xF00D
params.num_threads = 1
params.track_failures = True
embedded = mol.with_3d_conformer(params)
print(embedded.num_conformers())
print(embedded.coordinates_3d())
print(params.failures)
For multi-conformer generation, setting an explicit seed makes the results deterministic. RMS pruning, sequential seed expansion, and terminal-group symmetrization for pruning follow the source-ported RDKit path.
params = EmbedParameters.etkdg()
params.random_seed = 123
params.num_threads = 1
params.prune_rms_thresh = 0.5
params.enable_sequential_random_seeds = True
pruned = mol.with_3d_conformers(5, params)
print(pruned.num_conformers())
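As a rough illustration of what an RMS pruning threshold does, the sketch below applies a greedy filter to toy coordinates. It is deliberately simplified and is not the source-ported algorithm: it performs no alignment and no terminal-group symmetrization, and `rms_distance` and `prune_by_rms` are hypothetical helper names.

```python
import math


def rms_distance(a, b):
    """Root-mean-square distance between two equally sized coordinate lists."""
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


def prune_by_rms(conformers, threshold):
    """Keep a conformer only if it is at least `threshold` from every kept one."""
    kept = []
    for conf in conformers:
        if all(rms_distance(conf, other) >= threshold for other in kept):
            kept.append(conf)
    return kept


# Three toy two-atom "conformers"; the second nearly duplicates the first.
conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]

kept = prune_by_rms(conformers, threshold=0.5)
print(len(kept))
```

Because near-duplicates are discarded, the number of conformers returned after pruning can be smaller than the number requested.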
UFF and MMFF optimization APIs operate on existing or generated 3D conformers and return new molecule values through result objects. They do not mutate the source molecule.
from cosmolkit import Molecule
mol = Molecule.from_smiles("CCO").with_hydrogens().with_3d_conformer()
if mol.has_uff_params():
    result = mol.with_uff_optimized(max_iters=200)
    optimized = result.molecule()
    print(not result.needs_more())
    print(result.status_code())
    print(result.energy())
    print(optimized.coordinates_3d())
if mol.has_mmff_params():
    result = mol.with_mmff_optimized(mmff_variant="MMFF94", max_iters=200)
    optimized = result.molecule()
    print(not result.needs_more())
    print(result.status_code())
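The relationship between an iteration cap and a "needs more iterations" flag can be sketched with a one-dimensional minimizer. This is an assumption-laden toy: `minimize` is hypothetical, and reading `needs_more()` as "the iteration limit was reached before convergence" is inferred from the method name, not confirmed by the library.

```python
def minimize(x, gradient, step=0.1, max_iters=200, tol=1e-6):
    """Plain gradient descent; returns (x, needs_more)."""
    for _ in range(max_iters):
        g = gradient(x)
        if abs(g) < tol:
            return x, False  # converged within the iteration budget
        x -= step * g
    # Budget exhausted: report whether the gradient is still above tolerance.
    return x, abs(gradient(x)) >= tol


def toy_gradient(x):
    # Gradient of the toy energy f(x) = (x - 2)**2, minimized at x = 2.
    return 2.0 * (x - 2.0)


x_full, needs_more_full = minimize(0.0, toy_gradient, max_iters=200)
x_short, needs_more_short = minimize(0.0, toy_gradient, max_iters=5)

print(needs_more_full)   # enough iterations to converge
print(needs_more_short)  # stopped early; a caller could rerun with more
```

A caller that sees the flag set can continue from the returned value with a larger iteration budget instead of starting over.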
Substructure And SMARTS
Substructure matching functions accept molecule queries:
import cosmolkit
from cosmolkit import Molecule
mol = Molecule.from_smiles("CCO")
query = Molecule.from_smiles("CO")
print(cosmolkit.has_substruct_match(mol, query))
print(cosmolkit.get_substruct_match(mol, query).atom_mapping())
parse_smarts() exposes the output of the Rust SMARTS parser as parse
metadata: it returns a SmartsMolecule query-tree value. Direct SMARTS query
matching is not yet available in the Python API; Python substructure
functions currently accept only molecule queries.
smarts = cosmolkit.parse_smarts("[#6]-O")
print(smarts.num_atoms())
print(smarts.num_bonds())