Quick Start
Value-Style Molecule Values
COSMolKit molecules use value semantics. Transform methods return a new
Molecule and leave the original object unchanged. Internally the library
uses copy-on-write (COW) storage to share unchanged data efficiently:
from cosmolkit import BatchErrorMode, BondOrder, ChiralTag, Molecule
mol = Molecule.from_smiles("CCO")
mol_h = mol.with_hydrogens()
assert mol is not mol_h
print(mol.to_smiles())
print(mol_h.to_smiles())
This is an intentional difference from common RDKit Python usage. Do not assume
that a transform mutates the existing object; always keep the returned
Molecule.
In-place molecule operations are explicit and always end with _:
mol = Molecule.from_smiles("CCO")
mol.add_hydrogens_()
mol.compute_2d_coordinates_()
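The difference between a transform and an in-place operation can be illustrated without the library. The following is a minimal plain-Python sketch, not COSMolKit's implementation: `ToyMolecule`, `with_atom`, `add_atom_`, and `copy` are hypothetical names, and the copy-on-write storage is modeled as an immutable tuple that values share until one of them is written to.

```python
class ToyMolecule:
    """Hypothetical value-style molecule; not the COSMolKit class."""

    def __init__(self, atoms=()):
        # Atoms live in an immutable tuple, so several values can share it.
        self._atoms = tuple(atoms)

    def copy(self):
        # Cheap copy: the new value points at the same tuple (shared storage).
        clone = ToyMolecule()
        clone._atoms = self._atoms
        return clone

    def with_atom(self, symbol):
        # Transform: returns a new value and leaves self untouched.
        return ToyMolecule(self._atoms + (symbol,))

    def add_atom_(self, symbol):
        # Explicit in-place variant: rebinds only this value's storage,
        # so copies that shared the old tuple are unaffected.
        self._atoms = self._atoms + (symbol,)

    def atoms(self):
        return list(self._atoms)


mol = ToyMolecule(["C", "C", "O"])
alias = mol.copy()
assert alias._atoms is mol._atoms      # storage is shared until a write

mol_h = mol.with_atom("H")             # original is unchanged
mol.add_atom_("N")                     # in-place edit affects only mol

print(mol.atoms())    # ['C', 'C', 'O', 'N']
print(mol_h.atoms())  # ['C', 'C', 'O', 'H']
print(alias.atoms())  # ['C', 'C', 'O']
```

In this toy model, a write never disturbs another value because it replaces the tuple rather than mutating it; the real library may manage sharing differently.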
Create a molecule from SMILES and export a depiction:
from cosmolkit import Molecule
mol = Molecule.from_smiles("c1ccccc1O")
drawn = mol.with_2d_coordinates()
print(mol.to_smiles())
drawn.write_png("python/examples/output/phenol.png", width=400, height=300)
Inspect atoms and bonds:
from cosmolkit import BondOrder, Molecule
mol = Molecule.from_smiles("c1ccccc1O")
for atom in mol.atoms():
    print(atom.idx(), atom.atomic_num(), atom.is_aromatic())
for bond in mol.bonds():
    if bond.bond_type() == BondOrder.SINGLE:
        print(bond.begin_atom_idx(), bond.end_atom_idx(), bond.bond_type().name)
Inspect chiral tags without converting to an ordered tetrahedral record:
chiral = Molecule.from_smiles("F[C@H](Cl)Br")
print(chiral.to_smiles())
print(chiral.to_smiles(isomeric_smiles=False))
for atom in chiral.atoms():
    if atom.chiral_tag() != ChiralTag.CHI_UNSPECIFIED:
        print(atom.idx(), atom.chiral_tag().name)
Read and write the first SDF record:
mol = Molecule.read_sdf("input.sdf", coordinate_dim="auto")
mol.write_sdf("python/examples/output/output.sdf", format="v2000")
Read MOL2 with the RDKit-style parser profile:
mol = Molecule.read_mol2("input.mol2")
Access coordinates as NumPy arrays:
mol2d = Molecule.from_smiles("CCO").with_2d_coordinates()
coords = mol2d.coordinates_2d()
print(coords.shape)
Generate a native 3D conformer with ETKDGv3:
from cosmolkit import EmbedParameters, Molecule
mol = Molecule.from_smiles("CC(=O)NC").with_hydrogens()
params = EmbedParameters.etkdg_v3()
params.random_seed = 0xF00D
params.num_threads = 1
params.track_failures = True
embedded = mol.with_3d_conformer(params)
print(embedded.num_conformers())
print(embedded.coordinates_3d().shape)
print(params.failures)
with_3d_conformer() follows RDKit’s ETKDG behavior for trusted molecular
graphs. A molecule without explicit hydrogens is embedded as a heavy-atom-only
conformer; the operation does not fail and does not automatically add
hydrogens. Add hydrogens first when you need all-atom geometry, force-field
optimization, or hydrogen-bond-sensitive coordinates.
Generate multiple conformers with RMS pruning:
params = EmbedParameters.etkdg()
params.random_seed = 123
params.num_threads = 1
params.prune_rms_thresh = 0.5
params.enable_sequential_random_seeds = True
conformers = mol.with_3d_conformers(5, params)
print(conformers.num_conformers())
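What RMS pruning does can be sketched in plain Python. This is an illustration only, under the assumption that pruning is greedy in the way RDKit documents for its RMS threshold: the first conformer is kept, and each later one is kept only if it is at least the threshold away from every conformer already kept. The helpers `rmsd` and `prune_by_rms` are hypothetical, the coordinates are made up, and no structural alignment is performed, which a real implementation would normally do first.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def prune_by_rms(conformers, threshold):
    """Greedy pruning: keep a conformer only if it is at least `threshold`
    away from every conformer kept so far. No alignment is performed."""
    kept = []
    for coords in conformers:
        if all(rmsd(coords, other) >= threshold for other in kept):
            kept.append(coords)
    return kept


# Three made-up two-atom conformers; the second nearly duplicates the first.
toy_conformers = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.1, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)],
]
kept = prune_by_rms(toy_conformers, threshold=0.5)
print(len(kept))  # 2
```

Because near-duplicates are discarded, the number of conformers returned can be smaller than the number requested.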
Optimize an existing 3D conformer with UFF:
mol = Molecule.read_sdf("input_3d.sdf", coordinate_dim="3d")
if mol.has_uff_params():
    result = mol.with_uff_optimized(max_iters=200)
    optimized = result.molecule()
    print(not result.needs_more())
    print(result.status_code())
    print(result.energy())
    print(optimized.coordinates_3d().shape)
if mol.has_mmff_params():
    result = mol.with_mmff_optimized(mmff_variant="MMFF94", max_iters=200)
    print(not result.needs_more())
    print(result.status_code())
Generate a Morgan fingerprint:
fp = Molecule.from_smiles("c1ccccc1O").fingerprint_morgan(
    radius=2,
    n_bits=2048,
)
print(fp.n_bits())
print(fp.on_bits())
on_bits() returns the sparse bit indexes set inside the fixed-length binary
fingerprint. It is not a dense neural embedding.
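To make the sparse representation concrete, here is a minimal plain-Python sketch that expands a list of on-bit indexes into a dense 0/1 vector. The helper `dense_bits` is hypothetical and the example indexes are made up; they are not COSMolKit output.

```python
def dense_bits(on_bits, n_bits):
    """Expand sparse on-bit indexes into a dense list of 0/1 values."""
    dense = [0] * n_bits
    for index in on_bits:
        if not 0 <= index < n_bits:
            raise ValueError(f"bit index {index} is outside 0..{n_bits - 1}")
        dense[index] = 1
    return dense


# Made-up indexes for illustration; a real fingerprint supplies its own.
example_on_bits = [3, 17, 2047]
vector = dense_bits(example_on_bits, n_bits=2048)

print(len(vector))   # 2048
print(sum(vector))   # 3
```

The sparse form stores only the positions of the set bits, which is far more compact when few of the 2048 positions are on.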
Parse SMARTS metadata:
import cosmolkit
query = cosmolkit.parse_smarts("[#6]-O")
print(query.num_atoms())
print(query.num_bonds())
parse_smarts() returns a SmartsMolecule parse tree value. Direct SMARTS
query matching is not yet a Python API.
Process a list of molecules:
from cosmolkit import BatchErrorMode, MoleculeBatch
batch = MoleculeBatch.from_smiles_list(
    ["CCO", "c1ccccc1", "not-smiles"],
    errors=BatchErrorMode.KEEP,
).with_parallel_jobs(8)
for error in batch.errors():
print(error.index(), error.operation(), error.message())
prepared = batch.with_2d_coordinates(errors=BatchErrorMode.KEEP)
fingerprints = prepared.fingerprint_morgan_list(n_bits=2048)
print(prepared.valid_mask())
print(prepared.to_smiles_list(canonical=True))
print([mol.to_smiles() if mol is not None else None for mol in prepared])
print([fp.on_bits() if fp is not None else None for fp in fingerprints])
MoleculeBatch preserves input order. When errors="keep" or
BatchErrorMode.KEEP is used, invalid records stay aligned with their input
positions and appear as None in molecule-valued outputs.
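The alignment rule can be sketched in plain Python. This only illustrates the contract described above and is not the MoleculeBatch implementation: `toy_parse` is a hypothetical validator that accepts a small set of SMILES characters, and the result list, mask, and error list are built by hand.

```python
def toy_parse(smiles):
    """Hypothetical parser: accepts only a tiny alphabet, raises otherwise."""
    allowed = set("CNOcno1=()")
    if not smiles or any(ch not in allowed for ch in smiles):
        raise ValueError(f"cannot parse {smiles!r}")
    return smiles


inputs = ["CCO", "c1ccccc1", "not-smiles"]
results = []   # one slot per input, None where parsing failed
errors = []    # (input index, message) pairs for the failed records

for index, smiles in enumerate(inputs):
    try:
        results.append(toy_parse(smiles))
    except ValueError as exc:
        results.append(None)
        errors.append((index, str(exc)))

valid_mask = [item is not None for item in results]

print(results)     # ['CCO', 'c1ccccc1', None]
print(valid_mask)  # [True, True, False]
print(errors)      # [(2, "cannot parse 'not-smiles'")]
```

Keeping a placeholder for each failed record means that position i in any output always refers to input i, so results can be joined back to the source data without re-indexing.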