Fingerprints

COSMolKit exposes RDKit-style Morgan fingerprints as fixed-length bit vectors. The Python Fingerprint object is a sparse view over that binary vector: on_bits() returns the bit indexes whose value is 1. It is not a dense floating-point neural embedding.
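
To make the sparse-versus-dense distinction concrete, the sketch below expands a list of on-bit indexes into a dense 0/1 list. It is plain Python and does not call COSMolKit; the helper name on_bits_to_dense and the example indexes are hypothetical, chosen only for illustration.

```python
def on_bits_to_dense(on_bits, n_bits):
    """Expand a sparse list of set-bit indexes into a dense 0/1 list."""
    dense = [0] * n_bits
    for idx in on_bits:
        if not 0 <= idx < n_bits:
            raise ValueError(f"bit index {idx} out of range for {n_bits} bits")
        dense[idx] = 1
    return dense


# Hypothetical on-bit indexes for an 8-bit vector (illustrative values only).
example_on_bits = [1, 4, 6]
print(on_bits_to_dense(example_on_bits, n_bits=8))
```

With a real fingerprint, the two arguments would presumably come from fp.on_bits() and fp.n_bits().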

Single Molecules

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1O")  # phenol
fp = mol.fingerprint_morgan(radius=2, n_bits=2048)

print(fp.n_bits())   # length of the bit vector
print(fp.on_bits())  # indexes of the bits set to 1

Tanimoto similarity is computed directly on Fingerprint values:

phenol = Molecule.from_smiles("c1ccccc1O").fingerprint_morgan()
benzene = Molecule.from_smiles("c1ccccc1").fingerprint_morgan()

print(phenol.tanimoto(benzene))
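
For reference, the Tanimoto coefficient of two bit vectors is the number of bits set in both divided by the number of bits set in either. The sketch below computes it from two on-bit lists in plain Python. The function name and the example indexes are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here, not something the text above specifies for COSMolKit.

```python
def tanimoto_from_on_bits(a_on_bits, b_on_bits):
    """Tanimoto coefficient |A & B| / |A | B| over two sets of on-bit indexes."""
    a, b = set(a_on_bits), set(b_on_bits)
    union = a | b
    if not union:
        # Both fingerprints are empty; this convention is an assumption.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical on-bit indexes: 2 shared bits, 5 distinct bits in total.
print(tanimoto_from_on_bits([1, 4, 6, 9], [1, 4, 7]))
```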

Additional Output

fingerprint_morgan_with_output() returns a MorganFingerprintResult with the fingerprint and RDKit-style provenance data:

result = mol.fingerprint_morgan_with_output(radius=2, n_bits=2048)
output = result.additional_output()

print(result.fingerprint().on_bits())
print(output.atom_counts())
print(output.atom_to_bits())
print(output.bit_info_map())
print(output.atoms_per_bit())
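
One way to use this provenance data is to ask which bits a given atom contributes to. The sketch below assumes that bit_info_map() follows the RDKit convention of mapping each bit index to a list of (atom index, radius) pairs; that shape is an assumption, not confirmed above, and the dictionary here is made-up stand-in data rather than real COSMolKit output.

```python
from collections import defaultdict


def bits_by_atom(bit_info_map):
    """Invert {bit: [(atom_idx, radius), ...]} into {atom_idx: sorted bits}."""
    per_atom = defaultdict(set)
    for bit, environments in bit_info_map.items():
        for atom_idx, _radius in environments:
            per_atom[atom_idx].add(bit)
    return {atom: sorted(bits) for atom, bits in sorted(per_atom.items())}


# Hypothetical bit info map (assumed RDKit-style shape, illustrative values).
example_bit_info = {
    98: [(0, 0), (1, 0)],   # bit 98 set by radius-0 environments of atoms 0 and 1
    412: [(0, 1)],          # bit 412 set by the radius-1 environment of atom 0
    1380: [(2, 0)],         # bit 1380 set by the radius-0 environment of atom 2
}
print(bits_by_atom(example_bit_info))
```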

Supported Parameters

The Python binding exposes the following RDKit-style Morgan generator options:

  • radius and n_bits

  • include_chirality and use_bond_types

  • count_simulation and count_bounds

  • only_nonzero_invariants

  • include_redundant_environments

  • from_atoms and ignore_atoms

  • custom_atom_invariants and custom_bond_invariants

  • atom_invariants_generator="connectivity" | "morgan" | "feature" | "fcfp"

  • atom_invariants_include_ring_membership

  • bond_invariants_generator="morgan" | "default" | "bond"

  • bond_invariants_use_bond_types

  • bond_invariants_use_chirality

  • num_bits_per_feature
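
As a rough illustration of what count_simulation and count_bounds are for, the sketch below encodes feature counts into a bit vector by giving each feature one bit per bound and setting a bit when the count reaches that bound. This mirrors the scheme commonly described for RDKit-style count simulation, but whether COSMolKit lays out bits exactly this way is an assumption; the function name, the bounds (1, 2, 4, 8), and the input data are all illustrative.

```python
def simulate_counts(feature_counts, n_bits, count_bounds=(1, 2, 4, 8)):
    """Encode {feature_hash: count} as on-bit indexes using count thresholds.

    Assumed layout: the vector is split into n_bits // len(count_bounds)
    slots, and each slot holds one bit per bound.
    """
    n_bounds = len(count_bounds)
    n_slots = n_bits // n_bounds
    on_bits = set()
    for feature_hash, count in feature_counts.items():
        base = (feature_hash % n_slots) * n_bounds
        for offset, bound in enumerate(count_bounds):
            if count >= bound:
                on_bits.add(base + offset)
    return sorted(on_bits)


# Hypothetical features: hash 5 seen once, hash 12 seen three times.
print(simulate_counts({5: 1, 12: 3}, n_bits=64))
```

Under this layout, a feature seen three times sets the bits for the bounds 1 and 2 but not 4 or 8, so a binary vector retains a coarse record of how often each environment occurs.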

Batch Fingerprints

MoleculeBatch exposes matching batch APIs. When a batch is built with errors="keep", invalid records are retained, and each one yields None at its position in the output list.

from cosmolkit import MoleculeBatch

batch = MoleculeBatch.from_smiles_list(
    ["CCO", "not-smiles", "CCCO"],  # the second entry is not valid SMILES
    errors="keep",                  # keep invalid records instead of dropping them
).with_parallel_jobs(8)
fps = batch.fingerprint_morgan_list(n_bits=2048)

print([fp.on_bits() if fp is not None else None for fp in fps])
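
Because None entries keep their positions, the output list stays aligned with the input list, which makes it easy to separate successes from failures. The sketch below shows one way to do that in plain Python. The helper split_valid is hypothetical, and the result list uses stand-in on-bit lists in place of real Fingerprint objects.

```python
def split_valid(inputs, results):
    """Pair each input with its result and separate out the failed inputs."""
    if len(inputs) != len(results):
        raise ValueError("inputs and results must have the same length")
    valid = [(item, res) for item, res in zip(inputs, results) if res is not None]
    failed = [item for item, res in zip(inputs, results) if res is None]
    return valid, failed


smiles = ["CCO", "not-smiles", "CCCO"]
# Stand-in results: on-bit lists for valid records, None for the invalid one.
results = [[3, 17], None, [3, 21]]

valid, failed = split_valid(smiles, results)
print(valid)
print(failed)
```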