chemistry-rdkit¶
Stages: paper-drafting
RDKit Cheminformatics Best Practice¶
Molecular I/O¶
- Create molecules from SMILES: `mol = Chem.MolFromSmiles('CCO')`
- Always check for None: `MolFromSmiles` returns None on invalid input
- Convert to canonical SMILES: `Chem.MolToSmiles(mol)`
- Read SDF files: `suppl = Chem.SDMolSupplier('file.sdf')`
- Read SMILES files: `suppl = Chem.SmilesMolSupplier('file.smi')`
- Write molecules: `writer = Chem.SDWriter('output.sdf')`
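The I/O calls above can be combined into a small round trip. The following is a minimal sketch, not a definitive pipeline: the helper names (`parse_smiles`, `write_sdf`, `read_sdf`), the sample SMILES, and the temporary output path are all illustrative assumptions.

```python
import os
import tempfile

from rdkit import Chem


def parse_smiles(smiles_list):
    """Parse SMILES strings, separating valid molecules from failures."""
    mols, failed = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # returns None on invalid input
        if mol is None:
            failed.append(smi)
        else:
            mols.append(mol)
    return mols, failed


def write_sdf(mols, path):
    """Write molecules to an SDF file; closing the writer flushes it."""
    writer = Chem.SDWriter(path)
    for mol in mols:
        writer.write(mol)
    writer.close()


def read_sdf(path):
    """Read an SDF file, skipping entries that failed to parse."""
    return [mol for mol in Chem.SDMolSupplier(path) if mol is not None]


# Hypothetical input: two valid SMILES and one deliberately invalid string.
mols, failed = parse_smiles(["CCO", "c1ccccc1", "not_a_smiles"])
canonical = [Chem.MolToSmiles(m) for m in mols]

# Hypothetical output location in a temporary directory.
sdf_path = os.path.join(tempfile.mkdtemp(), "output.sdf")
write_sdf(mols, sdf_path)
reloaded = read_sdf(sdf_path)
```

Collecting the failed inputs rather than silently dropping them makes it easier to report how many records were lost during parsing.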
Molecular Descriptors¶
- Molecular weight: `Descriptors.MolWt(mol)`
- LogP (lipophilicity): `Descriptors.MolLogP(mol)`
- TPSA (polar surface area): `Descriptors.TPSA(mol)`
- H-bond donors/acceptors: `Descriptors.NumHDonors(mol)`, `Descriptors.NumHAcceptors(mol)`
- Rotatable bonds: `Descriptors.NumRotatableBonds(mol)`
- Lipinski Rule of 5: MW <= 500, LogP <= 5, HBD <= 5, HBA <= 10
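As an illustration, the descriptors above can be gathered into a profile and checked against the Rule of 5 thresholds. This is a sketch under stated assumptions: the function names `lipinski_profile` and `lipinski_violations` are hypothetical, and aspirin is used only as a convenient test molecule.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def lipinski_profile(mol):
    """Collect the descriptors listed above into a dictionary."""
    return {
        "MW": Descriptors.MolWt(mol),
        "LogP": Descriptors.MolLogP(mol),
        "HBD": Descriptors.NumHDonors(mol),
        "HBA": Descriptors.NumHAcceptors(mol),
        "TPSA": Descriptors.TPSA(mol),
        "RotB": Descriptors.NumRotatableBonds(mol),
    }


def lipinski_violations(profile):
    """Count how many of the four Rule of 5 thresholds are exceeded."""
    checks = [
        profile["MW"] > 500,
        profile["LogP"] > 5,
        profile["HBD"] > 5,
        profile["HBA"] > 10,
    ]
    return sum(checks)


# Aspirin, chosen here purely as an example input.
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
profile = lipinski_profile(aspirin)
violations = lipinski_violations(profile)
```

Returning a violation count instead of a boolean leaves the pass/fail policy (for example, how many violations to tolerate) to the caller.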
Fingerprints and Similarity¶
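- Morgan (circular) fingerprints: `AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)` (recent RDKit releases deprecate this call in favor of `rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)`)
- RDKit fingerprints: `Chem.RDKFingerprint(mol)`
- MACCS keys: `MACCSkeys.GenMACCSKeys(mol)`
- Tanimoto similarity: `DataStructs.TanimotoSimilarity(fp1, fp2)`
- Use radius=2 (ECFP4 equivalent) as default for most applications
- For virtual screening, a Tanimoto score above roughly 0.7 on Morgan fingerprints is a common rule of thumb for structural similarity; the appropriate cutoff depends on the fingerprint type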
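A similarity ranking built from the calls above might look like the following sketch. The helper `morgan_fp` and the tiny two-compound library are assumptions made for illustration; the fingerprint call is the one named in the list, which newer RDKit versions may flag as deprecated.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def morgan_fp(smiles, radius=2, n_bits=2048):
    """Build a Morgan bit-vector fingerprint from a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return AllChem.GetMorganFingerprintAsBitVect(mol, radius=radius, nBits=n_bits)


# Query molecule: aspirin (example input).
query = morgan_fp("CC(=O)Oc1ccccc1C(=O)O")

# Hypothetical two-compound library.
library = {
    "salicylic acid": morgan_fp("O=C(O)c1ccccc1O"),
    "ethanol": morgan_fp("CCO"),
}

self_similarity = DataStructs.TanimotoSimilarity(query, query)
scores = {
    name: DataStructs.TanimotoSimilarity(query, fp)
    for name, fp in library.items()
}

# Names ordered from most to least similar to the query.
ranked = sorted(scores, key=scores.get, reverse=True)
```

For larger libraries, `DataStructs.BulkTanimotoSimilarity(query, list_of_fps)` computes the same scores in a single call.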
Substructure Search¶
- SMARTS patterns: `pattern = Chem.MolFromSmarts('[OH]')`
- Check match: `mol.HasSubstructMatch(pattern)`
- Get all matches: `mol.GetSubstructMatches(pattern)`
- Common SMARTS: `[#6](=O)[OH]` (carboxylic acid), `[NH2]` (primary amine; also matches the NH2 of a primary amide)
- Filter compound libraries by functional group presence
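Functional-group filtering with these patterns can be sketched as follows. The `PATTERNS` dictionary, the helper names, and the four-compound library are illustrative assumptions; the SMARTS strings are the ones listed above.

```python
from rdkit import Chem

# SMARTS patterns compiled once and reused for every molecule.
PATTERNS = {
    "carboxylic acid": Chem.MolFromSmarts("[#6](=O)[OH]"),
    "primary amine": Chem.MolFromSmarts("[NH2]"),
}


def functional_groups(mol):
    """Return the atom-index matches for each named pattern."""
    return {name: mol.GetSubstructMatches(patt) for name, patt in PATTERNS.items()}


def filter_by_group(smiles_list, pattern):
    """Keep only the SMILES whose molecule contains the pattern."""
    hits = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is not None and mol.HasSubstructMatch(pattern):
            hits.append(smi)
    return hits


# Hypothetical library: acetic acid, ethylamine, ethanol, glycine.
library = ["CC(=O)O", "CCN", "CCO", "NCC(=O)O"]
acids = filter_by_group(library, PATTERNS["carboxylic acid"])
amines = filter_by_group(library, PATTERNS["primary amine"])

# Glycine carries both groups; matches are tuples of atom indices.
glycine_matches = functional_groups(Chem.MolFromSmiles("NCC(=O)O"))
```

`HasSubstructMatch` is sufficient for filtering, while `GetSubstructMatches` is useful when the matched atom indices are needed, for example to highlight the group in a depiction.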
Property Calculation Patterns¶
- Batch processing: iterate over `SDMolSupplier`, skip None entries
- Use `Chem.Descriptors.descList`, a list of (name, function) pairs, for all available descriptors
- For ADMET filtering, calculate Lipinski, Veber, and PAINS filters
- Generate 3D coordinates: `AllChem.EmbedMolecule(mol, AllChem.ETKDG())`
- Minimize energy: `AllChem.MMFFOptimizeMolecule(mol)`
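Two of the patterns above, evaluating the full descriptor list and generating an optimized 3D structure, can be sketched together. The helper names, the fixed random seed, and the use of ethanol as input are assumptions for illustration only.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, Descriptors


def all_descriptors(mol):
    """Evaluate every (name, function) pair in Descriptors.descList."""
    return {name: func(mol) for name, func in Descriptors.descList}


def embed_and_optimize(mol, seed=42):
    """Return a copy of mol with explicit hydrogens and one 3D conformer,
    together with the status code of the MMFF optimization."""
    mol_h = Chem.AddHs(mol)  # AddHs returns a new molecule
    params = AllChem.ETKDG()
    params.randomSeed = seed  # assumption: fixed seed for reproducible coordinates
    conf_id = AllChem.EmbedMolecule(mol_h, params)
    if conf_id == -1:
        raise RuntimeError("3D embedding failed")
    # Status: 0 = converged, 1 = more iterations needed, -1 = MMFF setup failed
    status = AllChem.MMFFOptimizeMolecule(mol_h)
    return mol_h, status


# Ethanol, used here only as a small example molecule.
ethanol = Chem.MolFromSmiles("CCO")
desc = all_descriptors(ethanol)
mol3d, status = embed_and_optimize(ethanol)
```

Checking the return values of both `EmbedMolecule` and `MMFFOptimizeMolecule` matters in batch runs, since either step can fail for individual molecules without raising an exception.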
Common Pitfalls¶
- Always sanitize molecules (the default behavior); disable sanitization only when needed
- Add hydrogens explicitly for 3D work: `Chem.AddHs(mol)`
- Handle stereochemistry: use `Chem.AssignStereochemistry(mol)`
- Large SDF files: use `ForwardSDMolSupplier` for memory efficiency
- Kekulization errors usually indicate invalid SMILES input, often an aromatic nitrogen written without its hydrogen (for example `c1ccnc1` instead of `c1cc[nH]c1` for pyrrole)
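When a SMILES string fails to load, it can help to find out which stage rejected it. The sketch below parses without sanitization and then runs `Chem.SanitizeMol` with `catchErrors=True`, which returns the first sanitization step that failed instead of raising. The function name `diagnose_smiles` and its return labels are hypothetical.

```python
from rdkit import Chem


def diagnose_smiles(smiles):
    """Classify why a SMILES string does or does not load cleanly."""
    # Parse only; skip sanitization so syntax and chemistry errors are separated.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return "syntax error"
    # With catchErrors=True, SanitizeMol returns the first failing operation.
    failed = Chem.SanitizeMol(mol, catchErrors=True)
    if failed == Chem.SanitizeFlags.SANITIZE_NONE:
        return "ok"
    if failed == Chem.SanitizeFlags.SANITIZE_KEKULIZE:
        return "kekulization failed"
    if failed == Chem.SanitizeFlags.SANITIZE_PROPERTIES:
        return "valence problem"
    return "other sanitization problem"


results = {
    smi: diagnose_smiles(smi)
    for smi in [
        "c1cc[nH]c1",      # pyrrole with its aromatic N-H written out
        "c1ccnc1",         # same ring without the H: cannot be kekulized
        "C(C)(C)(C)(C)C",  # five bonds on one carbon
        "CC(",             # unbalanced parenthesis
    ]
}
```

This keeps sanitization on for every molecule that is actually used while still giving a specific reason for each rejected input.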