API Reference

class cosmolkit.Atom

Read-only atom feature record returned by Molecule.atoms().

The methods on this object expose common atom properties such as atomic number, formal charge, aromaticity, chiral tag, hydrogen counts, and valence values.

class cosmolkit.BatchError

A per-record batch processing error.

Batch methods can keep invalid records when errors="keep" is used. In that case, MoleculeBatch.errors() returns BatchError objects describing the input index, operation, and message.

as_dict()

Return the error as key-value pairs.

index()

Return the zero-based input index that produced the error.

message()

Return the human-readable error message.

operation()

Return the operation name.
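As a hedged illustration of the accessors above, the following sketch turns a list of error records into log lines. It relies only on index(), operation(), and message() as documented. FakeBatchError is a hypothetical stand-in so the snippet runs without COSMolKit installed, and the sample values and output format are invented for illustration.

```python
class FakeBatchError:
    """Hypothetical stand-in exposing the documented BatchError accessors."""

    def __init__(self, index, operation, message):
        self._index, self._operation, self._message = index, operation, message

    def index(self):
        return self._index

    def operation(self):
        return self._operation

    def message(self):
        return self._message


def format_batch_errors(errors):
    """Return one log line per failed record, ordered by input index."""
    return [
        f"record {err.index()} [{err.operation()}]: {err.message()}"
        for err in sorted(errors, key=lambda e: e.index())
    ]


# With COSMolKit this would be format_batch_errors(batch.errors()).
# The operation names and messages below are invented sample data.
lines = format_batch_errors(
    [
        FakeBatchError(3, "from_smiles", "unclosed ring"),
        FakeBatchError(1, "from_smiles", "unknown element"),
    ]
)
```

Sorting by index() keeps the log aligned with the original input order even if errors are collected out of order.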

class cosmolkit.BatchErrorMode(value)

class cosmolkit.BatchExportReport

Summary returned by batch export methods.

The report records how many inputs were processed successfully and includes structured errors for records that could not be exported.

errors()

Return structured errors for failed records.

failed()

Return the number of records that failed during export.

success()

Return the number of records exported successfully.

total()

Return the total number of records considered for export.
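A minimal sketch of how the counters above might be combined into a one-line summary. It uses only success(), failed(), and total(). The stand-in report and its counts are hypothetical so the snippet runs without COSMolKit, and the summary format is an assumption.

```python
from types import SimpleNamespace


def summarize_export(report):
    """Build a one-line summary from a BatchExportReport-like object."""
    total = report.total()
    failed = report.failed()
    rate = failed / total if total else 0.0  # avoid dividing by zero on empty batches
    return f"{report.success()}/{total} exported, {failed} failed ({rate:.1%})"


# Hypothetical stand-in for a report; the counts are invented sample data.
fake_report = SimpleNamespace(success=lambda: 8, failed=lambda: 2, total=lambda: 10)
summary = summarize_export(fake_report)
```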

exception cosmolkit.BatchValidationError(message, error_count=0, reason=None, record_errors=None)

class cosmolkit.Bond

Read-only bond feature record returned by Molecule.bonds().

The methods on this object expose atom endpoints, bond type, direction, stereo labels, stereo atom indices, and aromaticity.

class cosmolkit.BondDirection(value)

class cosmolkit.BondOrder(value)

class cosmolkit.BondStereo(value)

class cosmolkit.ChiralTag(value)

class cosmolkit.Molecule

A molecule value.

Molecule stores atoms, bonds, stereochemistry, and optional coordinate data. Transformation methods such as with_hydrogens(), without_hydrogens(), with_kekulized_bonds(), and with_2d_coordinates() return new molecule values. The original molecule is left unchanged.

Internally COSMolKit uses copy-on-write storage to share unchanged molecular data efficiently, but the public Python contract is value semantics.

In-place methods mutate the receiver and always end with _. COSMolKit reserves the trailing underscore on public Molecule methods for this meaning only.

Examples

Create molecules with Molecule.from_smiles(), transform them with value methods such as with_2d_coordinates(), then export strings, arrays, or depiction files.
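The workflow just described can be sketched as a small helper. It calls only from_smiles(), with_2d_coordinates(), and to_svg() as documented on this page. The helper name and the molecule_cls parameter are hypothetical, and FakeMolecule is a stand-in so the snippet runs without COSMolKit installed.

```python
def smiles_to_svg(molecule_cls, smiles, width=300, height=300):
    """Parse SMILES, lay out 2D coordinates on a new value, and render SVG."""
    mol = molecule_cls.from_smiles(smiles)
    laid_out = mol.with_2d_coordinates()  # returns a new value; `mol` is unchanged
    return laid_out.to_svg(width=width, height=height)


class FakeMolecule:
    """Hypothetical stand-in that mimics the three documented methods."""

    def __init__(self, smiles, has_2d=False):
        self.smiles, self.has_2d = smiles, has_2d

    @classmethod
    def from_smiles(cls, smiles):
        return cls(smiles)

    def with_2d_coordinates(self):
        return FakeMolecule(self.smiles, has_2d=True)

    def to_svg(self, width=300, height=300):
        return f"<svg width='{width}' height='{height}'><!-- {self.smiles} --></svg>"


# With COSMolKit installed: smiles_to_svg(cosmolkit.Molecule, "CCO")
svg = smiles_to_svg(FakeMolecule, "CCO", width=200, height=100)
```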

add_hydrogens_()

Add explicit hydrogens in place.

This is the in-place version of with_hydrogens().

All public in-place Molecule methods end with _. If this method raises an error, the receiver is not guaranteed to equal its pre-call value; use with_hydrogens() when failure-preserving value semantics are required.
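A hedged sketch of the failure-preserving pattern mentioned above: call the value method and fall back to the untouched input when it fails. The helper name is hypothetical, the exception type raised by COSMolKit is not stated here (so the sketch catches Exception broadly, which is an assumption), and FailingMolecule is a stand-in used only to exercise the fallback.

```python
def hydrogens_or_original(mol):
    """Return a hydrogen-added copy, or the untouched input if that fails."""
    try:
        return mol.with_hydrogens()
    except Exception:  # assumption: the concrete exception type is not documented here
        return mol  # the value method never mutated `mol`, so it is safe to reuse


class FailingMolecule:
    """Hypothetical stand-in whose value method always fails."""

    def with_hydrogens(self):
        raise ValueError("simulated failure")


original = FailingMolecule()
result = hydrogens_or_original(original)
```

The same fallback would not be safe with add_hydrogens_(), because the receiver may have been partially modified before the error.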

atoms()

Return read-only atom feature records.

avalon_fingerprint(min_path=1, max_path=7, n_bits=2048, n_bits_per_hash=1, use_bond_order=True, use_hs=False, tautomeric_fingerprint=False, from_atoms=None)

Return an Avalon fingerprint.

bonds()

Return read-only bond feature records.

compute_2d_coordinates_()

Compute 2D coordinates in place.

This is the in-place version of with_2d_coordinates().

coordinates_2d()

Return 2D coordinates as a NumPy array with shape (num_atoms, 3).

The z column is zero-filled.
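Because each row is (x, y, z) with z zero-filled, 2D consumers can read only the first two columns. The sketch below computes a bounding box that way. The helper name is hypothetical, and the sample rows are invented data in the documented layout rather than output from COSMolKit.

```python
def bounding_box_2d(coords):
    """Return (min_x, min_y, max_x, max_y) from rows of (x, y, z)."""
    xs = [float(row[0]) for row in coords]
    ys = [float(row[1]) for row in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Invented sample rows in the documented (num_atoms, 3) layout; z is zero.
sample = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.25, 1.3, 0.0]]
box = bounding_box_2d(sample)
```

Row-wise iteration works for nested lists and for NumPy arrays alike. An input with no atoms would make min() raise ValueError, so callers may want to check num_atoms() first.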

coordinates_3d(conformer_index=0)

Return 3D coordinates as a NumPy array with shape (num_atoms, 3).

dg_bounds_matrix()

Return the distance-geometry bounds matrix as a NumPy array.

The returned array uses shape (num_atoms, num_atoms).

edit()

Create an explicit edit context for this molecule.

The edit context is useful when several changes should be staged and committed as one new molecule value.

embed_3d_conformer_(params=None)

Generate one 3D conformer in place.

This is the in-place version of with_3d_conformer().

embed_3d_conformer_result_(params=None)

Generate one 3D conformer in place and return the embedding result object.

This is the in-place version of with_3d_conformer_result().

embed_3d_conformers_(num_confs, params=None)

Generate multiple 3D conformers in place.

This is the in-place version of with_3d_conformers().

embed_3d_conformers_result_(num_confs, params=None)

Generate multiple 3D conformers in place and return the embedding result object.

This is the in-place version of with_3d_conformers_result().

find_chiral_centers(include_unassigned=True)

Return chiral center labels.

Parameters:

include_unassigned (bool, default True) – Include atoms with unspecified tetrahedral chirality.

fingerprint_morgan(radius=2, n_bits=2048, include_chirality=False, use_bond_types=True, count_simulation=False, count_bounds=None, only_nonzero_invariants=False, include_redundant_environments=False, from_atoms=None, ignore_atoms=None, custom_atom_invariants=None, custom_bond_invariants=None, atom_invariants_generator=None, atom_invariants_include_ring_membership=True, bond_invariants_generator=None, bond_invariants_use_bond_types=True, bond_invariants_use_chirality=False, num_bits_per_feature=1)

Return an RDKit-style Morgan fingerprint.

Parameters:
  • radius (int, default 2) – Morgan neighborhood radius.

  • n_bits (int, default 2048) – Output bit vector size.

  • include_chirality (bool, default False) – Include atom chirality in invariant updates.

  • use_bond_types (bool, default True) – Include bond order in invariant updates.

  • count_simulation (bool, default False) – Apply RDKit count-simulation bit expansion.

  • count_bounds (list[int], optional) – Count-simulation thresholds. Defaults to [1, 2, 4, 8].

  • only_nonzero_invariants (bool, default False) – Skip atoms whose starting invariant is zero.

  • include_redundant_environments (bool, default False) – Retain duplicate environments instead of applying RDKit redundancy checks.

  • from_atoms (list[int], optional) – Restrict environments to these root atoms.

  • ignore_atoms (list[int], optional) – Accepted for RDKit API parity; Morgan currently ignores this input.

  • custom_atom_invariants (list[int], optional) – Per-atom starting invariants.

  • custom_bond_invariants (list[int], optional) – Per-bond invariants.

  • atom_invariants_generator ({"connectivity", "morgan", "feature", "fcfp"}, optional) – Explicit atom invariant generator. None uses the Morgan connectivity default.

  • atom_invariants_include_ring_membership (bool, default True) – Include ring membership for the connectivity invariant generator.

  • bond_invariants_generator ({"morgan", "default", "bond"}, optional) – Explicit Morgan bond invariant generator. None uses the fingerprint defaults.

  • bond_invariants_use_bond_types (bool, default True) – Include bond order in the explicit bond invariant generator.

  • bond_invariants_use_chirality (bool, default False) – Include bond stereo in the explicit bond invariant generator.

  • num_bits_per_feature (int, default 1) – Number of bits set for each feature.
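The count_simulation and count_bounds parameters are easier to read with a small model. The sketch below follows the thresholding scheme commonly described for RDKit's fingerprint generators: each feature gets one bit per bound, and a bit is set when the feature's occurrence count reaches that bound. Whether COSMolKit lays bits out identically is an assumption, and the function name is hypothetical.

```python
def simulate_count_bits(count, bounds=(1, 2, 4, 8)):
    """Return one flag per bound, True when `count` reaches that bound."""
    return [count >= bound for bound in bounds]


# A feature seen three times reaches the bounds 1 and 2, but not 4 or 8.
bits_for_three = simulate_count_bits(3)
```

Under this model a plain bit vector can approximate counts, at the cost of spending len(count_bounds) bits per feature.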

fingerprint_morgan_with_output(radius=2, n_bits=2048, include_chirality=False, use_bond_types=True, count_simulation=False, count_bounds=None, only_nonzero_invariants=False, include_redundant_environments=False, from_atoms=None, ignore_atoms=None, custom_atom_invariants=None, custom_bond_invariants=None, atom_invariants_generator=None, atom_invariants_include_ring_membership=True, bond_invariants_generator=None, bond_invariants_use_bond_types=True, bond_invariants_use_chirality=False, num_bits_per_feature=1)

Return a Morgan fingerprint together with allocated RDKit-style additional output.

fragments()

Return the connected fragments as separate molecules.

classmethod from_mmcif_block(text, *, sanitize=True, remove_hs=True, flavor=0, proximity_bonding=True)

Create a molecule from an mmCIF block.

This uses COSMolKit’s mmCIF structure reader, then applies the same RDKit-compatible molecule conversion profile used by Molecule.from_pdb_block. RDKit has no direct Chem.MolFromMMCIFBlock equivalent to compare against, so this API layers COSMolKit’s own mmCIF structural reader onto the RDKit-compatible PDB molecule conversion.

Parameters:
  • text (str) – mmCIF block text.

  • sanitize (bool) – Whether to sanitize after molecule construction.

  • remove_hs (bool) – Whether sanitization should remove hydrogens.

  • flavor (int) – RDKit-compatible PDB parser flavor bit mask applied during molecule conversion.

  • proximity_bonding (bool) – Whether to add proximity bonds using RDKit’s PDB proximity-bond algorithm.

Returns:

Parsed molecule.

Return type:

Molecule

classmethod from_pdb_block(text, *, sanitize=True, remove_hs=True, flavor=0, proximity_bonding=True)

Create a molecule from a PDB block.

This follows the COSMolKit core PDB molecule conversion profile, which is designed to match RDKit Chem.MolFromPDBBlock for modeled molecule state. Structural parsing is handled by COSMolKit’s structure reader before molecule conversion.

Parameters:
  • text (str) – PDB block text.

  • sanitize (bool) – Whether to sanitize after PDB molecule construction.

  • remove_hs (bool) – Whether sanitization should remove hydrogens.

  • flavor (int) – RDKit-compatible PDB parser flavor bit mask.

  • proximity_bonding (bool) – Whether to add proximity bonds using RDKit’s PDB proximity-bond algorithm.

Returns:

Parsed molecule.

Return type:

Molecule

classmethod from_rdkit(rdmol, sanitize=None)

Create a molecule from an RDKit molecule object.

Parameters:
  • rdmol (object) – An object exposing RDKit’s molecule API.

  • sanitize (bool, optional) – Optional molecule preparation flag.

Returns:

COSMolKit molecule copied from the input object.

Return type:

Molecule

classmethod from_smiles(smiles, sanitize=None)

Create a molecule from a SMILES string.

Parameters:
  • smiles (str) – Input SMILES string.

  • sanitize (bool, optional) – Optional molecule preparation flag. COSMolKit applies the available preparation behavior during construction.

Returns:

Parsed molecule.

Return type:

Molecule

Examples

Use Molecule.from_smiles("CCO") to create a molecule and mol.to_smiles() to write it back.

classmethod from_xyz_block(text)

Create a molecule from an XYZ block.

XYZ contains atom identities and Cartesian coordinates only. This follows COSMolKit core’s RDKit-aligned MolFromXYZBlock behavior: atoms and one 3D conformer are parsed, and bonds are not inferred.

The returned molecule is coordinate-only. Topology-dependent operations such as adding hydrogens or ETKDG conformer generation require a trusted bond graph.

Parameters:

text (str) – XYZ block text.

Returns:

Parsed molecule with zero bonds and a 3D conformer when the atom count is nonzero.

Return type:

Molecule
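For reference, an XYZ block is an atom count line, a comment line, and one "symbol x y z" line per atom. The sketch below builds such text from tuples. The helper name is hypothetical and the water coordinates are approximate illustrative values, not reference geometry.

```python
def make_xyz_block(atoms, comment=""):
    """Format (symbol, x, y, z) tuples as XYZ text: count, comment, atom lines."""
    lines = [str(len(atoms)), comment]
    lines += [f"{sym} {x:.4f} {y:.4f} {z:.4f}" for sym, x, y, z in atoms]
    return "\n".join(lines) + "\n"


# Approximate, illustrative coordinates for a water molecule (angstroms).
water = [
    ("O", 0.0, 0.0, 0.1173),
    ("H", 0.0, 0.7572, -0.4692),
    ("H", 0.0, -0.7572, -0.4692),
]
xyz_text = make_xyz_block(water, comment="water")

# With COSMolKit installed: mol = Molecule.from_xyz_block(xyz_text)
# Bonds are not inferred, so the result is documented to have zero bonds.
```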

has_2d_coordinates()

Return whether the molecule has 2D coordinates.

has_mmff_params()

Return whether MMFF94 parameters are available for this molecule.

has_uff_params()

Return whether UFF parameters are available for every atom in this molecule.

hash()

Return a hash of the molecule.

hash_with_ranks(ranks)

Return a hash of the molecule using the provided atom ranks.

kekulize_(clear_aromatic_flags=None)

Convert aromatic bonds to an explicit Kekule form in place.

This is the in-place version of with_kekulized_bonds().

largest_fragment()

Return the largest connected fragment.

maccs_fingerprint(n_bits=166)

Return a MACCS fingerprint.

classmethod mol_from_binary(data)

Deserialize a molecule from COSMolKit binary data.

mol_to_binary()

Serialize the molecule to COSMolKit binary form.

murcko_scaffold()

Return the Murcko scaffold.

net_scaffold()

Return the net scaffold.

num_atoms()

Return the number of atoms.

num_bonds()

Return the number of bonds.

num_conformers()

Return the number of stored 3D conformers.

perceive_stereochemistry()

Perceive stereochemistry and validate stereo processing for this molecule.

classmethod read_mol(path, sanitize=None, coordinate_dim='auto')

Read one molecule from an MDL molfile.

The parser follows RDKit MolFromMolBlock record boundaries: it reads the molfile CTAB through the first M  END line and ignores unread trailing text, including SDF data fields and $$$$ record separators. Use read_sdf() or SdfDataset when SDF data fields must be parsed.

Parameters:
  • path (str) – Molfile path.

  • sanitize (bool, optional) – Optional molecule preparation flag.

  • coordinate_dim ({"auto", "2d", "3d"}, optional) – Coordinate interpretation mode. "auto" preserves the molfile header.
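To make the record boundary concrete, the sketch below isolates the part of a text that a molfile-only parser would consume under the rule described above: everything through the first M  END line. It is an illustration of the boundary rule, not COSMolKit's parser, and the helper name and schematic record are hypothetical.

```python
def molfile_portion(text):
    """Return the text up to and including the first 'M  END' line."""
    kept = []
    for line in text.splitlines():
        kept.append(line)
        if line.rstrip() == "M  END":
            break
    return "\n".join(kept)


# Schematic record: the CTAB body is elided; only the boundary markers matter.
record = "name\n  header\n\n(ctab lines)\nM  END\n> <ID>\n42\n\n$$$$\n"
mol_part = molfile_portion(record)
```

The SDF data field and the $$$$ separator fall after the boundary, which is why read_sdf() or SdfDataset is needed to see them.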

classmethod read_mol2(path, *, sanitize=True, remove_hs=True, variant='corina', cleanup_substructures=True)

Read one molecule from a Tripos MOL2 file.

The reader follows the source-ported RDKit Mol2FileToMol/MolFromMol2File profile. The exposed parameters map to RDKit Mol2ParserParams: sanitize, removeHs, variant, and cleanupSubstructures. The only currently supported variant is "corina", matching RDKit’s public enum.

Parameters:
  • path (str) – MOL2 file path.

  • sanitize (bool, optional) – Run RDKit-style MOL2 sanitization after parsing.

  • remove_hs (bool, optional) – Remove explicit hydrogens during MOL2 finalization.

  • variant ({"corina"}, optional) – MOL2 atom-type definition profile.

  • cleanup_substructures (bool, optional) – Run RDKit-style cleanup of common MOL2 substructures before charge assignment when formal charges are not present.

classmethod read_mol2_from_str(mol2_text, *, sanitize=True, remove_hs=True, variant='corina', cleanup_substructures=True)

Read one molecule from a Tripos MOL2 string.

The reader follows the source-ported RDKit Mol2BlockToMol/MolFromMol2Block profile. The exposed parameters map to RDKit Mol2ParserParams: sanitize, removeHs, variant, and cleanupSubstructures. The only currently supported variant is "corina", matching RDKit’s public enum.

classmethod read_mol_from_str(mol_text, sanitize=None, coordinate_dim='auto')

Read one molecule from an MDL molfile string.

The parser follows RDKit MolFromMolBlock record boundaries: it reads the molfile CTAB through the first M  END line and ignores unread trailing text, including SDF data fields and $$$$ record separators. Use read_sdf_from_str() when SDF data fields must be parsed.

classmethod read_sdf(path, sanitize=None, coordinate_dim='auto')

Read the first molecule record from an SDF file.

This uses the SDF reader, so SDF data fields after the molfile M  END line are parsed as record metadata. Use read_mol() for RDKit MolFromMolBlock-style molfile-only parsing.

Parameters:
  • path (str) – SDF file path.

  • sanitize (bool, optional) – Optional molecule preparation flag.

  • coordinate_dim ({"auto", "2d", "3d"}, optional) – Coordinate interpretation mode. "auto" preserves the molfile header.

classmethod read_sdf_from_str(sdf_text, sanitize=None, coordinate_dim='auto')

Read one molecule from an SDF record string.

This uses the SDF reader, so data fields after the molfile M  END line are parsed as SDF record metadata. Use read_mol_from_str() for RDKit MolFromMolBlock-style molfile-only parsing that ignores trailing SDF text.

remove_hydrogens_(sanitize=None)

Remove explicit hydrogens in place.

This is the in-place version of without_hydrogens().

tetrahedral_stereo()

Return ordered tetrahedral stereo ligand records.

Each record is (center_atom_index, ordered_ligands). Implicit hydrogen is represented as None.
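A short sketch of consuming these records, assuming each one unpacks as a (center_atom_index, ordered_ligands) pair with None marking an implicit hydrogen, as described above. The helper name is hypothetical and the sample records are invented.

```python
def centers_with_implicit_h(records):
    """Return center indices whose ordered ligands include an implicit hydrogen."""
    return [center for center, ligands in records if None in ligands]


# Invented sample records in the documented (center_atom_index, ordered_ligands) form.
sample_records = [(1, [0, 2, 3, None]), (5, [4, 6, 7, 8])]
centers = centers_with_implicit_h(sample_records)
```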

to_2d_sdf_string(format=None, include_stereo=True, kekulize=True)

Return the molecule as a 2D SDF record string.

If the molecule does not already have 2D coordinates, they are generated for this export. The original Molecule value is left unchanged.

to_3d_sdf_string(format=None, include_stereo=True, kekulize=True)

Return the molecule as a 3D SDF record string.

The molecule must already have a 3D conformer, for example from a 3D SDF record.

to_pdb_block(conf_id=Ellipsis, flavor=0)

Return a PDB block string.

to_png(width=300, height=300)

Render the molecule to PNG bytes.

to_smiles(isomeric_smiles=True, canonical=True, kekule=False, clean_stereo=True, all_bonds_explicit=False, all_hs_explicit=False, include_dative_bonds=True, ignore_atom_map_numbers=False, rooted_at_atom=None)

Return a SMILES string.

Parameters:
  • isomeric_smiles (bool, default True) – Include stereochemical and isotopic information when available.

  • canonical (bool, default True) – Return a canonical SMILES when supported.

  • kekule (bool, default False) – Write aromatic systems using Kekule bond notation.

  • clean_stereo (bool, default True) – Normalize stereo annotations before writing.

  • all_bonds_explicit (bool, default False) – Write explicit bond symbols.

  • all_hs_explicit (bool, default False) – Write explicit hydrogens.

  • include_dative_bonds (bool, default True) – Include dative bond notation.

  • ignore_atom_map_numbers (bool, default False) – Omit atom map numbers from canonical decisions.

  • rooted_at_atom (int, optional) – Start traversal from a selected atom index.

to_svg(width=300, height=300)

Render the molecule to an SVG string.

topological_fingerprint(min_path=1, max_path=7, n_bits=2048, n_bits_per_hash=2, use_bond_types=True, from_atoms=None, ignore_atoms=None)

Return a topological fingerprint.

with_2d_coordinates()

Return a new molecule with 2D coordinates.

with_3d_conformer(params=None)

Return a new molecule with one generated 3D conformer.

Parameters:

params (EmbedParameters, optional) – Distance-geometry embedding parameters. The default is EmbedParameters.etkdg_v3().

Returns:

A new molecule value containing one additional 3D conformer.

Return type:

Molecule

with_3d_conformer_result(params=None)

Return an embedding result object for one generated 3D conformer.

The result keeps the embedded molecule, the returned conformer id, and the final parameter snapshot so callers can inspect status and failure counters without relying on side effects on the input EmbedParameters object.

with_3d_conformers(num_confs, params=None)

Return a new molecule with multiple generated 3D conformers.

Parameters:
  • num_confs (int) – Number of conformers to request.

  • params (EmbedParameters, optional) – Distance-geometry embedding parameters.

Returns:

A new molecule value containing the generated 3D conformers.

Return type:

Molecule

with_3d_conformers_result(num_confs, params=None)

Return an embedding result object for multiple generated 3D conformers.

The result keeps the embedded molecule, the kept conformer ids, and the final parameter snapshot so callers can inspect pruning and tracked failures without reconstructing that state manually.

with_hydrogens()

Return a new molecule with explicit hydrogens added.

The original Molecule value is left unchanged.

with_kekulized_bonds(clear_aromatic_flags=None)

Return a new molecule with aromatic bonds converted to an explicit Kekule form.

The original Molecule value is left unchanged.

with_mmff_optimized(mmff_variant='MMFF94', max_iters=200, non_bonded_thresh=100.0, conf_id=Ellipsis, ignore_interfrag_interactions=True)

Return an MMFF optimization result with a new optimized molecule value.

The source molecule is not mutated. The molecule must already contain a 3D conformer. Supported variants follow the Rust core parser, including "MMFF94" and "MMFF94S".

with_mmff_optimized_confs(num_threads=1, max_iters=1000, mmff_variant='MMFF94', non_bonded_thresh=10.0, ignore_interfrag_interactions=True)

Return MMFF optimization results for all 3D conformers as a new molecule value.

with_uff_optimized(max_iters=1000, vdw_thresh=10.0, conf_id=Ellipsis, ignore_interfrag_interactions=True)

Return a UFF optimization result with a new optimized molecule value.

The source molecule is not mutated. The molecule must already contain a 3D conformer, for example from a 3D SDF, MOL, MOL2, or XYZ input.
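Since optimization needs an existing 3D conformer, a caller may want to embed one first. The sketch below chains only methods documented on this page: num_conformers(), with_hydrogens(), with_3d_conformer(), has_uff_params(), and with_uff_optimized(). The helper name is hypothetical, adding hydrogens before embedding is a design choice rather than a documented requirement, and FakeMolecule is a stand-in so the snippet runs without COSMolKit.

```python
def ensure_conformer_then_uff(mol, max_iters=1000):
    """Embed a conformer if none is stored, then request UFF optimization."""
    if mol.num_conformers() == 0:
        mol = mol.with_hydrogens().with_3d_conformer()
    if not mol.has_uff_params():
        return None  # UFF parameters are missing for at least one atom
    return mol.with_uff_optimized(max_iters=max_iters)


class FakeMolecule:
    """Hypothetical stand-in that tracks only a conformer count."""

    def __init__(self, conformers=0):
        self.conformers = conformers

    def num_conformers(self):
        return self.conformers

    def with_hydrogens(self):
        return FakeMolecule(self.conformers)

    def with_3d_conformer(self):
        return FakeMolecule(self.conformers + 1)

    def has_uff_params(self):
        return True

    def with_uff_optimized(self, max_iters=1000):
        # The real method returns a result object; a tuple stands in for it here.
        return ("optimized", self.conformers, max_iters)


result = ensure_conformer_then_uff(FakeMolecule())
```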

with_uff_optimized_confs(num_threads=1, max_iters=1000, vdw_thresh=10.0, ignore_interfrag_interactions=True)

Return UFF optimization results for all 3D conformers as a new molecule value.

without_hydrogens(sanitize=None)

Return a new molecule with explicit hydrogens removed.

The original Molecule value is left unchanged.

write_png(path, width=300, height=300)

Write a PNG depiction to a file.

write_sdf(path, format=None, include_stereo=True, kekulize=True)

Write the molecule as one SDF record.

write_sdf_to_directory(directory, file_name=None, format=None, include_stereo=True, kekulize=True)

Write the molecule as one SDF record inside a directory.

Returns:

The output path.

Return type:

str

write_svg(path, width=300, height=300)

Write an SVG depiction to a file.

class cosmolkit.MoleculeBatch

An ordered collection of molecules for batch workflows.

MoleculeBatch keeps input order and supports construction, transformation, filtering, rendering, and SDF export across many molecules. Methods that transform molecules return a new batch.

Parameters such as errors control invalid-record handling:

  • "raise" raises an exception when any record fails.

  • "keep" keeps failed records and exposes them through errors(). Export methods write valid records and count invalid records as skipped in the returned report.

Examples

Construct a batch with MoleculeBatch.from_smiles_list(), choose an errors mode for invalid records, and use with_parallel_jobs() when the same worker count should apply to later batch operations.

dg_bounds_matrix_list(n_jobs=None, progress_bar=None)

Return distance-geometry bounds matrices for all valid records.

errors()

Return structured errors collected for invalid records.

filter_valid()

Return a batch containing only valid molecules.

fingerprint_morgan_list(radius=2, n_bits=2048, include_chirality=False, use_bond_types=True, count_simulation=False, count_bounds=None, only_nonzero_invariants=False, include_redundant_environments=False, from_atoms=None, ignore_atoms=None, custom_atom_invariants=None, custom_bond_invariants=None, atom_invariants_generator=None, atom_invariants_include_ring_membership=True, bond_invariants_generator=None, bond_invariants_use_bond_types=True, bond_invariants_use_chirality=False, num_bits_per_feature=1, n_jobs=None, progress_bar=None)

Return Morgan fingerprints for valid batch records.

Invalid records are returned as None in their original positions.

fingerprint_morgan_with_output_list(radius=2, n_bits=2048, include_chirality=False, use_bond_types=True, count_simulation=False, count_bounds=None, only_nonzero_invariants=False, include_redundant_environments=False, from_atoms=None, ignore_atoms=None, custom_atom_invariants=None, custom_bond_invariants=None, atom_invariants_generator=None, atom_invariants_include_ring_membership=True, bond_invariants_generator=None, bond_invariants_use_bond_types=True, bond_invariants_use_chirality=False, num_bits_per_feature=1, n_jobs=None, progress_bar=None)

Return Morgan fingerprints and additional output for valid batch records.

Invalid records are returned as None in their original positions.

classmethod from_smiles_list(smiles, sanitize=None, errors=None, n_jobs=None)

Create a batch from a list of SMILES strings.

Parameters:
  • smiles (list[str]) – Input SMILES strings.

  • sanitize (bool, optional) – Optional molecule preparation flag. COSMolKit applies the available preparation behavior during construction.

  • errors ({"raise", "keep"}, optional) – Invalid-record handling mode. The default is "raise".

  • n_jobs (int, optional) – Number of worker threads to use. None uses the default scheduler.

Returns:

A batch preserving the input order for valid and kept records.

Return type:

MoleculeBatch

invalid_count()

Return the number of invalid records.

invalid_mask()

Return a boolean mask indicating which records are invalid.

parallel_jobs()

Return the batch-level default worker count, or None when unset.

progress_bar()

Return the batch-level progress-bar default, or None when unset.

classmethod read_sdf(path, errors=None, n_jobs=None, progress_bar=False, coordinate_dim='auto')

Read all molecule records from an SDF file into a batch.

Parameters:
  • path (str) – SDF file path.

  • errors ({"raise", "keep"}, optional) – Invalid-record handling mode. The default is "raise".

  • n_jobs (int, optional) – Number of worker threads to use for batch construction.

  • progress_bar (bool, optional) – Show a Rust-side progress bar while records are parsed. This builds a lightweight record index first so the total is known.

  • coordinate_dim ({"auto", "2d", "3d"}, optional) – Coordinate interpretation mode. "auto" preserves the molfile header.

classmethod read_sdf_records_from_str(sdf_text, errors=None, n_jobs=None, coordinate_dim='auto')

Read all molecule records from an SDF string.

Parameters:
  • sdf_text (str) – SDF text containing one or more records.

  • errors ({"raise", "keep"}, optional) – Invalid-record handling mode. The default is "raise".

  • n_jobs (int, optional) – Number of worker threads to use.

  • coordinate_dim ({"auto", "2d", "3d"}, optional) – Coordinate interpretation mode. "auto" preserves the molfile header.

sanitize(strict=None, errors=None, n_jobs=None, progress_bar=None)

Return a sanitized batch.

Parameters:
  • strict (bool, optional) – Optional strictness flag for available validation steps.

  • errors ({"raise", "keep"}, optional) – Invalid-record handling mode.

  • n_jobs (int, optional) – Number of worker threads to use.

to_images(out_dir, format=None, size=None, n_jobs=None, errors=None, report_path=None, filenames=None, progress_bar=None)

Write molecule depictions to a directory.

Parameters:
  • out_dir (str) – Output directory.

  • format ({"png", "svg"}, optional) – Image format. The default is "png".

  • size (tuple[int, int], optional) – Output image size as (width, height).

  • n_jobs (int, optional) – Number of worker threads to use.

  • errors ({"raise", "keep"}, optional) – Export error handling mode.

  • report_path (str, optional) – Write a JSON or CSV error report.

  • filenames (list[str | None], optional) – Per-record output filenames. Names are relative to out_dir; missing extensions are filled from format.

Returns:

Export summary.

Return type:

BatchExportReport

to_list()

Return batch records as a Python list.

Valid records become Molecule objects and invalid records become None.
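Because invalid records appear as None at their original positions, the list can be zipped back against the inputs. The sketch below does that for a list of SMILES strings. The helper name is hypothetical, the records list is a stand-in for the value returned by to_list(), and it assumes the batch was built with errors="keep" so that lengths match.

```python
def failed_inputs(inputs, records):
    """Return (index, input) pairs for positions whose record is None."""
    return [
        (i, text)
        for i, (text, rec) in enumerate(zip(inputs, records))
        if rec is None
    ]


smiles = ["CCO", "not-a-smiles", "c1ccccc1"]
# Stand-in for batch.to_list(); plain objects play the role of Molecule values.
records = [object(), None, object()]
failures = failed_inputs(smiles, records)
```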

to_sdf(path, format=None, errors=None, n_jobs=None, report_path=None, progress_bar=None)

Write valid molecules to an SDF file.

Parameters:
  • path (str) – Output SDF path.

  • format ({"auto", "v2000", "v3000"}, optional) – SDF output format.

  • errors ({"raise", "keep"}, optional) – Export error handling mode.

  • n_jobs (int, optional) – Number of worker threads to use.

  • report_path (str, optional) – Write a JSON or CSV error report.

to_sdf_files(out_dir, format=None, errors=None, n_jobs=None, report_path=None, filenames=None, progress_bar=None)

Write each valid molecule to its own SDF file in a directory.

Parameters:
  • out_dir (str) – Output directory.

  • format ({"auto", "v2000", "v3000"}, optional) – SDF output format.

  • errors ({"raise", "keep"}, optional) – Export error handling mode.

  • n_jobs (int, optional) – Number of worker threads to use.

  • report_path (str, optional) – Write a JSON or CSV error report.

  • filenames (list[str | None], optional) – Per-record output filenames. Names are relative to out_dir; missing extensions are written as .sdf.

to_smiles_list(isomeric_smiles=True, canonical=True, kekule=False, clean_stereo=True, all_bonds_explicit=False, all_hs_explicit=False, include_dative_bonds=True, ignore_atom_map_numbers=False, rooted_at_atom=None, n_jobs=None, progress_bar=None)

Return one SMILES string per record.

Invalid records are returned as None when they are kept in the batch.

Parameters:
  • isomeric_smiles (bool, default True) – Include stereochemical and isotopic information when available.

  • canonical (bool, default True) – Return canonical SMILES when enabled.

  • kekule (bool, default False) – Write aromatic systems in Kekule form.

  • clean_stereo (bool, default True) – Normalize stereo output where possible.

  • all_bonds_explicit (bool, default False) – Write explicit bond symbols.

  • all_hs_explicit (bool, default False) – Write explicit hydrogens.

  • include_dative_bonds (bool, default True) – Include dative bond notation.

  • ignore_atom_map_numbers (bool, default False) – Omit atom map numbers from canonical decisions.

  • rooted_at_atom (int, optional) – Start traversal from a selected atom index.

  • n_jobs (int, optional) – Number of worker threads to use.

to_svg_list(width=300, height=300, n_jobs=None, progress_bar=None)

Render each valid molecule to an SVG string.

valid_count()

Return the number of valid records.

valid_mask()

Return a boolean mask indicating which records are valid.

with_2d_coordinates(errors=None, n_jobs=None, progress_bar=None)

Return a new batch with 2D coordinates computed for each valid molecule.

with_hydrogens(errors=None, n_jobs=None, progress_bar=None)

Return a new batch with explicit hydrogens added to each valid molecule.

with_kekulized_bonds(clear_aromatic_flags=None, errors=None, n_jobs=None, progress_bar=None)

Return a new batch with aromatic bonds converted to an explicit Kekule form.

with_parallel_jobs(n_jobs)

Return a new batch configured to use this worker count by default.

Pass None to clear the batch-level default and let Rayon choose the worker count. Method-level n_jobs arguments still override this setting for that one call.
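The precedence described above can be modeled in a few lines. This is a sketch of the documented rule, not COSMolKit's implementation, and the function name is hypothetical.

```python
def effective_n_jobs(method_n_jobs=None, batch_default=None):
    """Resolve the worker count; None means the scheduler decides."""
    if method_n_jobs is not None:
        return method_n_jobs  # a per-call argument wins for that one call
    return batch_default  # otherwise fall back to the batch-level default


per_call = effective_n_jobs(method_n_jobs=2, batch_default=8)
from_batch = effective_n_jobs(batch_default=8)
unset = effective_n_jobs()
```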

with_progress_bar(progress_bar)

Return a new batch configured to show Rust-side progress bars by default.

Pass None to clear the batch-level default. Method-level progress_bar arguments still override this setting for that one call.

without_hydrogens(errors=None, n_jobs=None, progress_bar=None)

Return a new batch with explicit hydrogens removed from each valid molecule.

class cosmolkit.MoleculeEdit

An explicit molecule editing context.

Use Molecule.edit() to create an editor, apply changes, and call commit() to receive a new Molecule.

Examples

Create an editor with mol.edit(), apply atom and bond changes, then call commit() to produce a new Molecule.

add_atom(element)

Add an atom by element symbol and return its atom index.

add_bond(begin, end, order)

Add a bond between two atom indices.

Parameters:
  • begin (int) – Begin atom index.

  • end (int) – End atom index.

  • order ({"single", "double", "triple", "aromatic", "dative", "unspecified"}) – Bond order.

commit(sanitize=None)

Commit staged edits and return a new molecule.

set_atom_charge(atom_index, charge)

Set an atom formal charge.
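A hedged sketch of the staged-edit workflow using only edit(), add_atom(), add_bond(), and commit() as documented here. The helper name is hypothetical, and FakeMolecule and FakeEdit are stand-ins that merely record the staged changes so the snippet runs without COSMolKit.

```python
def append_hydroxyl(mol, atom_index):
    """Stage an O atom and a single bond to `atom_index`, then commit."""
    editor = mol.edit()
    oxygen = editor.add_atom("O")  # documented to return the new atom index
    editor.add_bond(atom_index, oxygen, "single")
    return editor.commit()  # a new molecule; `mol` itself is left unchanged


class FakeEdit:
    """Hypothetical stand-in that records staged changes."""

    def __init__(self, n_atoms):
        self.n_atoms, self.bonds = n_atoms, []

    def add_atom(self, element):
        self.n_atoms += 1
        return self.n_atoms - 1

    def add_bond(self, begin, end, order):
        self.bonds.append((begin, end, order))

    def commit(self, sanitize=None):
        return {"num_atoms": self.n_atoms, "bonds": list(self.bonds)}


class FakeMolecule:
    """Hypothetical stand-in exposing only edit()."""

    def __init__(self, n_atoms):
        self.n_atoms = n_atoms

    def edit(self):
        return FakeEdit(self.n_atoms)


new_mol = append_hydroxyl(FakeMolecule(2), atom_index=1)
```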

class cosmolkit.ResidueCode(value)

class cosmolkit.ResidueInfo

Gemmi-derived tabulated residue information.

Use ResidueInfo.code() and ResidueInfo.kind() for enum matching instead of matching raw residue-name strings.

code()

Return the tabulated residue code as ResidueCode.

kind()

Return the Gemmi residue-info kind as ResidueInfoKind.

kind_name()

Return the Gemmi residue-info kind name.

name()

Return the tabulated residue name.

class cosmolkit.ResidueInfoKind(value)

class cosmolkit.SdfDataset

Indexed, seekable SDF dataset.

SdfDataset builds a lightweight in-memory index of record byte ranges first. After opening, len(dataset) is cheap, dataset[i] parses only that record, dataset[:n] returns a MoleculeBatch, and dataset.batches(size=...) yields bounded MoleculeBatch chunks.

Use MoleculeBatch.read_sdf() when you intentionally want the whole file in memory. Use SdfDataset for large seekable files where random access, metadata inspection, or chunked processing matter.
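Chunked processing might look like the sketch below, which sums valid_count() over the batches yielded by batches(size=...). How an SdfDataset is opened is not shown on this page, so the helper takes an already opened dataset. The helper name is hypothetical, and FakeDataset and FakeBatch are stand-ins so the snippet runs without COSMolKit.

```python
def count_valid_records(dataset, chunk_size=1000):
    """Sum valid_count() over bounded chunks instead of loading the whole file."""
    return sum(batch.valid_count() for batch in dataset.batches(size=chunk_size))


class FakeBatch:
    """Hypothetical stand-in for MoleculeBatch with only valid_count()."""

    def __init__(self, flags):
        self.flags = flags

    def valid_count(self):
        return sum(self.flags)


class FakeDataset:
    """Hypothetical stand-in; each flag marks whether a record is valid."""

    def __init__(self, flags):
        self.flags = flags

    def __len__(self):
        return len(self.flags)

    def batches(self, size):
        for start in range(0, len(self.flags), size):
            yield FakeBatch(self.flags[start:start + size])


dataset = FakeDataset([True, True, False, True, True])
valid_total = count_valid_records(dataset, chunk_size=2)
```

Only one chunk is held at a time, which is the main reason to prefer this over MoleculeBatch.read_sdf() for large files.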

class cosmolkit.SdfReader

Forward-only SDF reader for one-pass workflows.

Use SdfReader for non-indexed stream-style processing. For seekable files where random access or accurate record-count progress matters, prefer SdfDataset.

class cosmolkit.SdfRecord

One parsed SDF record returned by SdfDataset.

The record exposes the parsed molecule plus SDF data fields.

class cosmolkit.SdfRecordMetadata

Lightweight metadata for one indexed SDF record.

Metadata is available from SdfDataset without parsing the molecule graph.

cosmolkit.expand_one_letter(code, kind)

Expand a one-letter amino-acid, RNA, or DNA residue code using Gemmi’s table.

cosmolkit.expand_one_letter_sequence(seq, kind)

Expand a one-letter amino-acid, RNA, or DNA residue sequence using Gemmi’s table.

cosmolkit.expand_protein_one_letter(code)

Expand a protein one-letter residue code (deprecated Gemmi alias).

cosmolkit.expand_protein_one_letter_string(seq)

Expand a protein one-letter residue sequence (deprecated Gemmi alias).

cosmolkit.find_tabulated_residue(name)

Return Gemmi-derived tabulated residue information for a residue name.

cosmolkit.find_tabulated_residue_idx(name)

Return the Gemmi tabulated residue index for a residue name.

cosmolkit.get_residue_info(idx)

Return Gemmi-derived tabulated residue information by table index.

cosmolkit.mmff_has_all_molecule_params(mol)

Return whether MMFF94 parameters are available for a molecule.

cosmolkit.mmff_optimize_molecule(mol, mmff_variant='MMFF94', max_iters=200, non_bonded_thresh=100.0, conf_id=Ellipsis, ignore_interfrag_interactions=True)

Optimize one existing 3D conformer with MMFF and return a result object.

The input molecule is not mutated. Supported variants include "MMFF94" and "MMFF94S".

cosmolkit.mmff_optimize_molecule_confs(mol, num_threads=1, max_iters=1000, mmff_variant='MMFF94', non_bonded_thresh=10.0, ignore_interfrag_interactions=True)

Optimize all existing 3D conformers with MMFF and return a result object.

The input molecule is not mutated. Supported variants include "MMFF94" and "MMFF94S".

cosmolkit.mol_from_binary(data)

Deserialize a molecule from COSMolKit binary bytes.

cosmolkit.mol_to_binary(mol)

Serialize a molecule to COSMolKit binary bytes.

Use mol_to_binary() / mol_from_binary() or the matching Molecule methods when you need an exact COSMolKit round-trip format instead of text IO.

cosmolkit.parse_smarts(smarts)

Parse SMARTS text into a SmartsMolecule query-tree value.

This exposes SMARTS parse metadata in Python. Direct SMARTS query matching is not yet a Python API.

cosmolkit.residue_code_from_name(name)

Return the Gemmi tabulated residue code for a residue name.

cosmolkit.uff_has_all_molecule_params(mol)

Return whether UFF parameters are available for every atom in a molecule.

cosmolkit.uff_optimize_molecule(mol, max_iters=1000, vdw_thresh=10.0, conf_id=Ellipsis, ignore_interfrag_interactions=True)

Optimize one existing 3D conformer with UFF and return a result object.

The input molecule is not mutated.

cosmolkit.uff_optimize_molecule_confs(mol, num_threads=1, max_iters=1000, vdw_thresh=10.0, ignore_interfrag_interactions=True)

Optimize all existing 3D conformers with UFF and return a result object.

The input molecule is not mutated.
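
The parameter checks and optimizers can be combined into an MMFF-first, UFF-fallback policy. This is a sketch of one possible policy, not a recommendation from the library. The ff argument is expected to expose the four documented functions (in real use, the cosmolkit module itself); _stand_in is a hypothetical placeholder so the example runs without COSMolKit, and the shape of the returned result object is not assumed.

```python
from types import SimpleNamespace


def optimize_with_fallback(mol, ff):
    """Return (force_field_name, result), or (None, None) if neither applies."""
    if ff.mmff_has_all_molecule_params(mol):
        return "MMFF94", ff.mmff_optimize_molecule(mol, mmff_variant="MMFF94")
    if ff.uff_has_all_molecule_params(mol):
        return "UFF", ff.uff_optimize_molecule(mol)
    return None, None


# Hypothetical stand-in: pretends MMFF parameters are missing and UFF
# parameters are available, and returns marker strings instead of results.
_stand_in = SimpleNamespace(
    mmff_has_all_molecule_params=lambda mol: False,
    uff_has_all_molecule_params=lambda mol: True,
    mmff_optimize_molecule=lambda mol, mmff_variant="MMFF94": "mmff-result",
    uff_optimize_molecule=lambda mol: "uff-result",
)

chosen, result = optimize_with_fallback("mol-with-3d-conformer", _stand_in)
```

Since the input molecule is not mutated by either optimizer, the caller can keep the original and inspect the returned result separately.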

Protein API

class cosmolkit.Protein

A protein-focused structural value.

Protein is the default high-level protein API. It keeps amino-acid residues and excludes ligands, nucleic acids, and waters by default.

Use Protein.from_pdb() for PDB files, Protein.from_pdb_str() for PDB text, Protein.from_mmcif() for mmCIF files, and Protein.from_mmcif_str() for mmCIF text.

atoms()

Return all protein atoms as ProteinAtom views.

chains()

Return the protein chains as ProteinChain views.

classmethod from_mmcif(path)

Read an mmCIF file as a protein-focused structural value.

The result uses the same protein projection as Protein.from_pdb().

classmethod from_mmcif_str(text, path)

Read mmCIF text as a protein-focused structural value.

path is used for format context and diagnostic messages.

classmethod from_pdb(path)

Read a PDB file as a protein-focused structural value.

The returned Protein keeps amino-acid residues and exposes chain, residue, and atom traversal. Use Molecule.from_pdb_block() instead when the desired result is an RDKit-compatible molecule conversion.

classmethod from_pdb_str(text)

Read PDB text as a protein-focused structural value.

This is the in-memory counterpart to Protein.from_pdb().

num_atoms()

Return the number of protein atoms.

num_chains()

Return the number of protein chains.

num_models()

Return the number of coordinate models in the protein structure.

num_residues()

Return the number of protein residues.

residues()

Return all protein residues as ProteinResidue views.

class cosmolkit.ProteinChain
atoms()

Return atoms belonging to this chain.

index()

Return the zero-based chain index.

kind()

Return the chain kind, for example Protein.

residues()

Return residues belonging to this chain.

class cosmolkit.ProteinResidue
atoms()

Return atoms belonging to this residue.

code()

Return the Gemmi tabulated residue code as ResidueCode.

fasta_code()

Return Gemmi’s FASTA code for this residue.

index()

Return the zero-based residue index.

info()

Return the Gemmi-derived tabulated residue information.

is_standard()

Return whether Gemmi marks this residue as standard.

kind()

Return the residue kind.

name()

Return the residue name, for example ALA.

one_letter_code()

Return Gemmi’s one-letter code for this residue.

class cosmolkit.ProteinAtom
element()

Return the element symbol as a string.

index()

Return the zero-based atom index.

name()

Return the atom name, for example CA.

position()

Return (x, y, z) coordinates, or None when absent.