Rosalind: Rust genomics engine runs reproducible variant calling on a laptop
Original source: "Rosalind: A genomics toolkit in Rust running whole-genome pipelines on a laptop" (Hacker News)
Rosalind is a single-threaded Rust library and CLI that performs read alignment and variant calling with memory bounded by local read coverage rather than input file size, making whole-genome-style pipelines feasible on commodity hardware. It builds an FM-index over a single reference contig, aligns reads with exact-match seeding and banded affine-gap refinement, then streams a pileup to call germline SNVs or somatic SNVs and simple indels from tumor/normal pairs using a binomial log-likelihood-ratio model. Outputs in BAM and VCF are emitted in canonical order and engineered to be byte-for-byte identical across repeated runs with identical inputs.
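To make the exact-match seeding idea concrete, here is a minimal Rust sketch of FM-index backward search, the standard technique for counting exact occurrences of a pattern in an indexed reference. It is not Rosalind's code: the `FmIndex` type and its `new`, `occ`, and `count` methods are hypothetical names, the suffix array is built by naive sorting, and occurrence counts are computed by a linear scan, where a practical index would use sampled rank tables.

```rust
/// Toy FM-index over a byte string (hypothetical type, for illustration only).
struct FmIndex {
    /// Burrows-Wheeler transform of the text plus a trailing sentinel byte 0.
    bwt: Vec<u8>,
    /// c[b] = number of symbols in the text that are strictly smaller than b.
    c: [usize; 256],
}

impl FmIndex {
    /// Build the index naively: sort all suffixes, then read off the BWT.
    /// Assumes the text itself never contains the byte 0.
    fn new(text: &[u8]) -> Self {
        let mut t = text.to_vec();
        t.push(0); // sentinel, smaller than every real symbol
        let n = t.len();

        // Naive suffix array: O(n^2 log n) in the worst case, fine for a demo.
        let mut sa: Vec<usize> = (0..n).collect();
        sa.sort_by(|&a, &b| t[a..].cmp(&t[b..]));

        // BWT[i] is the symbol preceding the i-th smallest suffix.
        let bwt: Vec<u8> = sa.iter().map(|&i| t[(i + n - 1) % n]).collect();

        // Cumulative symbol counts.
        let mut counts = [0usize; 256];
        for &b in &t {
            counts[b as usize] += 1;
        }
        let mut c = [0usize; 256];
        let mut sum = 0;
        for b in 0..256 {
            c[b] = sum;
            sum += counts[b];
        }
        FmIndex { bwt, c }
    }

    /// Number of occurrences of `b` in bwt[..i] (linear scan in this sketch).
    fn occ(&self, b: u8, i: usize) -> usize {
        self.bwt[..i].iter().filter(|&&x| x == b).count()
    }

    /// Count exact occurrences of `pattern` via backward search.
    fn count(&self, pattern: &[u8]) -> usize {
        // Half-open suffix-array interval [lo, hi) matching the suffix seen so far.
        let (mut lo, mut hi) = (0usize, self.bwt.len());
        for &b in pattern.iter().rev() {
            lo = self.c[b as usize] + self.occ(b, lo);
            hi = self.c[b as usize] + self.occ(b, hi);
            if lo >= hi {
                return 0; // interval is empty: no match
            }
        }
        hi - lo
    }
}
```

With the text `ACGTACGTAC`, `FmIndex::new(b"ACGTACGTAC").count(b"AC")` returns 3 and `count(b"TT")` returns 0. Each pattern symbol narrows the suffix-array interval in one step, which is why seeding cost depends on read length rather than reference length.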
The toolkit deliberately scopes itself to small-to-moderate references, targeted regions, and per-sample streaming workloads. Extensibility comes through a GenomicPlugin trait for custom per-block analyses on the same bounded-memory evaluator, plus PyO3 bindings so the engine can be driven from Python alongside pandas or NumPy. Truth-set evaluation against a VCF with left-align/trim normalization is built in, and a determinism test suite enforces stable outputs and FM-index invariants.
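The left-align/trim normalization used in truth-set evaluation can be sketched as follows. This is a common formulation of variant normalization for a single biallelic record, not Rosalind's implementation: the function name `left_align_and_trim`, its signature, and the use of 0-based positions are assumptions made for illustration.

```rust
/// Normalize one biallelic variant against a reference sequence.
/// Hypothetical helper: `pos` is the 0-based index of the first base of
/// `ref_allele` in `reference`. Returns the normalized (pos, ref, alt).
fn left_align_and_trim(
    reference: &[u8],
    mut pos: usize,
    ref_allele: &[u8],
    alt_allele: &[u8],
) -> (usize, Vec<u8>, Vec<u8>) {
    let mut r = ref_allele.to_vec();
    let mut a = alt_allele.to_vec();

    // Left-align: repeatedly drop a shared trailing base, and whenever an
    // allele becomes empty, extend both alleles one base to the left.
    loop {
        let mut changed = false;
        if !r.is_empty() && !a.is_empty() && r.last() == a.last() {
            r.pop();
            a.pop();
            changed = true;
        }
        if (r.is_empty() || a.is_empty()) && pos > 0 {
            pos -= 1;
            r.insert(0, reference[pos]);
            a.insert(0, reference[pos]);
            changed = true;
        }
        if !changed {
            break;
        }
    }

    // Trim: drop shared leading bases while both alleles keep at least one base.
    while r.len() >= 2 && a.len() >= 2 && r[0] == a[0] {
        r.remove(0);
        a.remove(0);
        pos += 1;
    }

    (pos, r, a)
}
```

For the reference `GGCACACACT`, a `CA` deletion written as `ACA` to `A` at position 5 normalizes to `GCA` to `G` at position 1, and a padded substitution `CAC` to `CGC` at position 2 trims to `A` to `G` at position 3. Comparing calls to a truth VCF after this step avoids counting the same indel as a mismatch just because two tools placed it at different offsets within a repeat.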
The target audiences are edge and field sequencing where predictable memory matters more than throughput, reproducibility-sensitive pipelines that need auditable outputs, and educators or builders who want a readable, hackable Rust implementation of FM-index alignment and streaming variant calling rather than a black-box pipeline. The project is dual-licensed under Apache-2.0 and MIT.
This is an AI-generated summary. Read the original for the full story.