Protein Structures
Use Protein when a workflow starts from PDB or mmCIF structural data and
needs protein-chain, residue, and atom traversal.
Read a PDB file directly:
from cosmolkit import Protein, ResidueCode
protein = Protein.from_pdb("1crn.pdb")
print(protein.num_models())
print(protein.num_chains())
print(protein.num_residues())
print(protein.num_atoms())
Read PDB text that is already in memory:
protein = Protein.from_pdb_str(pdb_text)
Read mmCIF input with the same high-level protein projection:
protein = Protein.from_mmcif("1crn.cif")
protein = Protein.from_mmcif_str(cif_text, path="1crn.cif")
Protein keeps amino-acid residues and excludes ligands, nucleic acids, and
waters by default. Use it for protein-focused traversal rather than low-level
mixed structural tables.
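To make the default filtering concrete, here is a minimal standard-library sketch of the idea: keep only residues whose names are amino acids and skip everything else. It is not how cosmolkit or Gemmi implements the projection. The function name `protein_residue_keys` and the `STANDARD_AA` set are hypothetical, and the sketch assumes fixed-column PDB text and recognizes only the 20 standard residue names, so modified residues that a tabulated classification would accept are dropped here.

```python
# Hypothetical illustration only: not cosmolkit's or Gemmi's implementation.
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def protein_residue_keys(pdb_text):
    """Return ordered (chain_id, res_seq, res_name) keys for amino-acid residues."""
    keys = {}  # dict preserves insertion order and removes per-atom duplicates
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        res_name = line[17:20].strip()  # PDB columns 18-20
        if res_name not in STANDARD_AA:
            continue  # waters (HOH), ligands, and nucleotides are skipped
        chain_id = line[21]             # PDB column 22
        res_seq = int(line[22:26])      # PDB columns 23-26
        keys[(chain_id, res_seq, res_name)] = None
    return list(keys)
```

Each residue appears once in the result even though it spans several atom records, which mirrors the residue-level view described above.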
Chains, Residues, And Atoms
Protein behaves like a chain collection. len(protein) returns the
number of protein chains, and protein[i] returns a ProteinChain.
first_chain = protein[0]
print(first_chain.index(), first_chain.kind(), len(first_chain))
for chain in protein.chains():
    for residue in chain.residues():
        if residue.code() == ResidueCode.MET:
            print("methionine", residue.index(), residue.fasta_code())
        print(residue.index(), residue.name(), residue.code(), len(residue))
        for atom in residue.atoms():
            print(atom.index(), atom.name(), atom.element(), atom.position())
atom.position() returns None when the atom has no Cartesian coordinate
in the selected structure data; otherwise it returns (x, y, z).
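Because a position can be None, code that aggregates coordinates has to filter first. The helper below is a minimal sketch of that pattern. The name `centroid` is hypothetical and not part of cosmolkit; it works on any iterable of `(x, y, z)` tuples or None values.

```python
from typing import Iterable, Optional, Tuple

Position = Optional[Tuple[float, float, float]]


def centroid(positions: Iterable[Position]) -> Position:
    """Average the available coordinates, ignoring atoms without a position."""
    known = [p for p in positions if p is not None]
    if not known:
        return None  # nothing to average
    n = len(known)
    return (
        sum(p[0] for p in known) / n,
        sum(p[1] for p in known) / n,
        sum(p[2] for p in known) / n,
    )
```

With the traversal shown above, a per-residue call would look like `centroid(atom.position() for atom in residue.atoms())`.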
Residue Information
ProteinResidue.name() returns the raw residue name from the structure.
Use ProteinResidue.code() for enum matching against Gemmi’s tabulated
residue vocabulary, and ProteinResidue.info() when you need the
source-derived classification fields. Sequence expansion follows Gemmi’s
expand_one_letter and expand_one_letter_sequence residue tables.
from cosmolkit import (
    ResidueCode,
    ResidueInfoKind,
    expand_one_letter_sequence,
    find_tabulated_residue,
)
info = find_tabulated_residue("MSE")
assert info.code() == ResidueCode.MSE
assert info.kind() == ResidueInfoKind.AA
assert info.fasta_code() == "X"
assert expand_one_letter_sequence("ACD(MSE)", ResidueInfoKind.AA) == [
    "ALA",
    "CYS",
    "ASP",
    "MSE",
]
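The parenthesized form in the assertion above can be read as "one letter per standard residue, full name in parentheses for anything else". The following pure-Python sketch shows that reading for amino acids only. It is an assumption-laden illustration, not Gemmi's or cosmolkit's implementation: `expand_sequence_sketch` and `ONE_TO_THREE` are hypothetical names, and the real functions may treat whitespace, unknown letters, and other residue kinds differently.

```python
# Hypothetical stand-in for a residue table: the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def expand_sequence_sketch(seq):
    """Expand one-letter codes; names in parentheses are copied verbatim."""
    names = []
    i = 0
    while i < len(seq):
        ch = seq[i]
        if ch == "(":
            end = seq.find(")", i)
            if end == -1:
                raise ValueError("unclosed parenthesis in sequence")
            names.append(seq[i + 1:end])  # e.g. "MSE"
            i = end + 1
        elif ch in ONE_TO_THREE:
            names.append(ONE_TO_THREE[ch])
            i += 1
        else:
            raise ValueError(f"unknown one-letter code: {ch!r}")
    return names
```

The sketch raises on unknown letters rather than guessing, which keeps mistakes in hand-written sequences visible.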
Protein vs Molecule PDB APIs
Use Protein.from_pdb() or Protein.from_pdb_str() when the desired
object is a protein structural view:
protein = Protein.from_pdb("input.pdb")
Use Molecule.from_pdb_block() only when the desired object is an
RDKit-compatible molecule conversion from PDB text:
from cosmolkit import Molecule
mol = Molecule.from_pdb_block(
    pdb_text,
    sanitize=True,
    remove_hs=True,
    proximity_bonding=True,
)
The molecule conversion path is useful for cheminformatics-style molecule
operations. The Protein path is the ergonomic path for protein chain,
residue, and atom access.