`biology-biopython`

Pack: AutoResearchClaw skills

Category: drafting

Field: —

License: MIT

Updated: 2026-04-23

Stages: paper-drafting

Read FASTA: for rec in SeqIO.parse("file.fasta", "fasta"): ...
Read GenBank: for rec in SeqIO.parse("file.gb", "genbank"): ...
Read a file with exactly one record: rec = SeqIO.read("file.fasta", "fasta") (raises ValueError if there are zero or several records)
Write sequences: SeqIO.write(records, "output.fasta", "fasta")
Convert formats: SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")
Index large files for random access by ID: idx = SeqIO.index("large.fasta", "fasta"); rec = idx["ID"]
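A minimal sketch of the SeqIO calls above, assuming Biopython is installed. The helpers `summarize_fasta`, `write_long_records` and `lookup_by_id` are hypothetical names for illustration, and the toy FASTA text stands in for a real file. `SeqIO.parse` and `SeqIO.write` accept open handles as well as paths, while `SeqIO.index` needs a real file path, so the last helper writes the toy data to a temporary file first.

```python
import io
import os
import tempfile

from Bio import SeqIO

# Toy input (hypothetical); in practice pass a file path or an open handle.
FASTA_TEXT = """>seq1 first toy record
ATGCGTACGTTAG
>seq2 second toy record
GGCCTTAA
"""


def summarize_fasta(handle):
    """Return (id, length, GC fraction) for each record in a FASTA handle."""
    rows = []
    for rec in SeqIO.parse(handle, "fasta"):
        seq = str(rec.seq).upper()
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        rows.append((rec.id, len(seq), gc))
    return rows


def write_long_records(in_handle, out_handle, min_len):
    """Copy records with at least `min_len` bases; SeqIO.write returns the count."""
    keep = (r for r in SeqIO.parse(in_handle, "fasta") if len(r.seq) >= min_len)
    return SeqIO.write(keep, out_handle, "fasta")


def lookup_by_id(path, rec_id):
    """Fetch one sequence by ID without loading the whole file into memory."""
    idx = SeqIO.index(path, "fasta")
    try:
        return str(idx[rec_id].seq)
    finally:
        idx.close()  # release the file handle held by the index


if __name__ == "__main__":
    print(summarize_fasta(io.StringIO(FASTA_TEXT)))
    out = io.StringIO()
    print(write_long_records(io.StringIO(FASTA_TEXT), out, min_len=10))  # prints 1
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.fasta")
        with open(path, "w") as fh:
            fh.write(FASTA_TEXT)
        print(lookup_by_id(path, "seq2"))  # prints GGCCTTAA
```

Filtering with a generator keeps memory flat for large inputs, since `SeqIO.write` consumes records one at a time.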

Online BLAST: from Bio.Blast import NCBIWWW; result = NCBIWWW.qblast("blastn", "nt", seq)
Parse results: from Bio.Blast import NCBIXML; records = NCBIXML.parse(result)
Local BLAST: run the BLAST+ programs (e.g. blastn) via subprocess with -outfmt 5 (XML output), then parse the output file with NCBIXML
Always set Entrez.email before any NCBI access
Filter results by e-value (typically < 1e-5) and coverage
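A minimal sketch of local BLAST plus filtering, assuming BLAST+ `blastn` is on PATH and a database was already built with `makeblastdb`. The helper names, the database name, and the 0.5 coverage default are hypothetical choices for illustration. Coverage here means the fraction of the query spanned by a single HSP, computed from `hsp.query_start`/`hsp.query_end` and a query length you pass in; other coverage definitions (e.g. merging HSPs) are equally valid.

```python
import subprocess

from Bio.Blast import NCBIXML


def blastn_command(query_fasta, db, out_xml, evalue=1e-5):
    """Build a blastn command line that writes XML (-outfmt 5)."""
    return ["blastn", "-query", query_fasta, "-db", db,
            "-evalue", str(evalue), "-outfmt", "5", "-out", out_xml]


def run_local_blastn(query_fasta, db, out_xml, evalue=1e-5):
    """Run blastn and return one parsed record per query sequence."""
    subprocess.run(blastn_command(query_fasta, db, out_xml, evalue), check=True)
    with open(out_xml) as handle:
        return list(NCBIXML.parse(handle))  # consume before the file closes


def significant_hits(blast_record, query_len, max_evalue=1e-5, min_coverage=0.5):
    """Yield (hit title, e-value, query coverage) for HSPs passing both filters."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            covered = abs(hsp.query_end - hsp.query_start) + 1
            coverage = covered / query_len
            if hsp.expect < max_evalue and coverage >= min_coverage:
                yield alignment.title, hsp.expect, coverage
```

The online route (`NCBIWWW.qblast`) returns a handle that `NCBIXML.parse` consumes the same way, so `significant_hits` applies to either source.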

Always set email: from Bio import Entrez; Entrez.email = "your@email.com"
Search: handle = Entrez.esearch(db="pubmed", term="query"); ids = Entrez.read(handle)["IdList"]
Fetch records: handle = Entrez.efetch(db="nucleotide", id="ID", rettype="fasta", retmode="text")
Set Entrez.api_key to raise the rate limit from 3 to 10 requests per second
Respect NCBI rate limits; add delays between batch requests
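A minimal sketch of a search-then-fetch workflow that batches IDs and pauses between requests, assuming Biopython and network access. The helpers `batches`, `request_delay`, `search_ids` and `fetch_fasta`, the batch size of 100, and the 0.34 s pause (just under 3 requests per second) are hypothetical choices, not values prescribed by NCBI or Biopython.

```python
import time

from Bio import Entrez

Entrez.email = "your@email.com"  # placeholder: use your real address
# Entrez.api_key = "..."         # optional: allows 10 requests/s instead of 3


def batches(items, size):
    """Split a list of IDs into consecutive chunks of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def request_delay(api_key=None):
    """Seconds to pause between calls: ~3 req/s without a key, 10 req/s with one."""
    return 0.1 if api_key else 0.34


def search_ids(term, db="nucleotide", retmax=20):
    """Run esearch and return the matching UIDs as plain strings."""
    handle = Entrez.esearch(db=db, term=term, retmax=retmax)
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return [str(uid) for uid in result["IdList"]]


def fetch_fasta(ids, db="nucleotide", batch_size=100):
    """Fetch FASTA text for many IDs with one efetch call per batch."""
    chunks = []
    for batch in batches(ids, batch_size):
        handle = Entrez.efetch(db=db, id=",".join(batch),
                               rettype="fasta", retmode="text")
        try:
            data = handle.read()
        finally:
            handle.close()
        chunks.append(data.decode() if isinstance(data, bytes) else data)
        time.sleep(request_delay(Entrez.api_key))
    return "".join(chunks)
```

Illustrative usage (the query string is only an example): `ids = search_ids("BRCA1[Gene] AND Homo sapiens[Organism]", retmax=5)`, then `fasta_text = fetch_fasta(ids)`. The returned text can be parsed with `SeqIO.parse(io.StringIO(fasta_text), "fasta")`.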

Parse PDB: from Bio.PDB import PDBParser; parser = PDBParser(); structure = parser.get_structure("id", "file.pdb")
Hierarchy: Structure > Model > Chain > Residue > Atom
Get atoms: iterate through structure.get_atoms()
Calculate distances: atom1 - atom2 returns the Euclidean distance in Å; atom.coord holds the NumPy coordinate vector
For mmCIF files: use MMCIFParser() instead of PDBParser()
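A minimal sketch of walking the Structure > Model > Chain > Residue > Atom hierarchy and measuring a distance, assuming Biopython is installed. To stay self-contained it builds a two-atom toy PDB in memory with a hypothetical `atom_line` helper that follows the fixed-column ATOM record layout; with real data you would pass a file path to `get_structure` instead.

```python
import io

from Bio.PDB import PDBParser


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format one fixed-column PDB ATOM record (toy helper for this example)."""
    return (f"ATOM  {serial:>5d}  {name:<3s} {resname:>3s} {chain}{resseq:>4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}\n")


# Two alpha carbons placed 5 Å apart (a 3-4-5 triangle) on chain A.
TOY_PDB = (atom_line(1, "CA", "GLY", "A", 1, 0.0, 0.0, 0.0, "C")
           + atom_line(2, "CA", "GLY", "A", 2, 3.0, 4.0, 0.0, "C"))


def load_structure(handle, structure_id="toy"):
    """Parse a PDB path or open handle; QUIET suppresses construction warnings."""
    return PDBParser(QUIET=True).get_structure(structure_id, handle)


def walk_hierarchy(structure):
    """List (model id, chain id, residue number, atom name) for every atom."""
    rows = []
    for model in structure:
        for chain in model:
            for residue in chain:
                for atom in residue:
                    rows.append((model.id, chain.id, residue.id[1], atom.get_name()))
    return rows


if __name__ == "__main__":
    s = load_structure(io.StringIO(TOY_PDB))
    print(walk_hierarchy(s))
    ca1 = s[0]["A"][1]["CA"]
    ca2 = s[0]["A"][2]["CA"]
    print(ca1 - ca2)  # Euclidean distance between the two CA atoms
```

`structure.get_atoms()` flattens the same nested loops when only the atoms are needed.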
