Serverless GPU Orchestration for Genomic LLMs

bioinformatics · serverless-gpu · modal · evo2 · genomics · python

Running state-of-the-art biological foundation models like Evo2 requires massive VRAM (H100/A100 GPUs), which are prohibitively expensive to keep running 24/7. To make Biofly a viable tool, I needed a way to trigger high-performance inference on-demand without incurring thousands of dollars in monthly cloud overhead.

Architected a decoupled inference pipeline using Modal for serverless GPU execution: the system spins up NVIDIA H100 workers on demand when a genomic variant is submitted and scales back to zero when idle.

Persistent G5/P4 EC2 Instances

Pros
  • Zero 'cold start' latency
  • Simpler deployment via standard Docker containers
Cons
  • Extremely high idle costs ($2k+/month for H100 availability)
  • Complex manual scaling logic required for traffic spikes

On-Premise GPU Workstation

Pros
  • One-time capital expenditure
  • Total data privacy/control
Cons
  • Limited scalability for concurrent users
  • Significant maintenance and power overhead

Modal provided a Python-native way to bridge the gap between AI research and production. By defining the infrastructure as code, I could utilize H100s for the 30-60 seconds needed for genomic tokenization and pathogenicity prediction, only paying for the exact compute time used while maintaining a seamless API connection to the Next.js frontend.

Bridging Biological Data and High-Performance Compute

Biofly transforms raw DNA sequences into clinical insights. The primary engineering hurdle was managing the massive computational requirements of genomic foundation models within a responsive web application.

1. The Decoupled Architecture

To ensure a smooth user experience, I separated the concerns of the application into two distinct layers:

  • The Orchestrator: A T3-stack Next.js application that handles user state, genomic visualizations, and cross-referencing established clinical databases like ClinVar.
  • The Inference Engine: A FastAPI backend deployed on Modal that encapsulates the Evo2 weights. When a prediction is requested, Modal dynamically provisions an H100, loads the necessary genomic context, and returns the pathogenicity score.
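The inference layer described above can be sketched as Modal infrastructure-as-code. This is a minimal illustration, not Biofly's actual deployment: the image contents, timeout, and the scoring logic inside the function are assumptions (the GC-content delta is a stand-in for Evo2's real likelihood comparison, which would load model weights into VRAM here).

```python
import modal

app = modal.App("biofly-inference")

# Pre-baked image layers: heavy dependencies install once at build time,
# not on every cold start. Package list is illustrative.
evo2_image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch", "fastapi[standard]"
)

@app.function(gpu="H100", image=evo2_image, timeout=600)
def predict_pathogenicity(ref_seq: str, alt_seq: str) -> dict:
    """Score a variant by comparing the reference and alternate sequences.

    Placeholder scoring: a real deployment would load Evo2 weights and
    compare model log-likelihoods. GC-content delta stands in here.
    """
    def gc(s: str) -> float:
        return (s.count("G") + s.count("C")) / max(len(s), 1)

    return {"delta_score": gc(alt_seq) - gc(ref_seq)}

@app.local_entrypoint()
def main():
    # .remote() provisions an H100, runs the function, and lets the
    # container scale back to zero after the idle timeout.
    print(predict_pathogenicity.remote("ACGTACGT", "ACTTACGT"))
```

The key property is that the GPU requirement lives on the function decorator, so the Next.js orchestrator never needs to know anything about GPU provisioning.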

2. High-Fidelity Sequence Streaming

Genomic data is too large to pass through traditional REST payloads efficiently. I implemented a “Proxy-Fetch” strategy:

  • The user provides a genomic coordinate (e.g., chr17:43044295).
  • The backend streams the relevant 7kb context window directly from the UCSC Genome Browser REST API.
  • This ensures that the AI model always sees the most accurate, up-to-date reference genome without Biofly needing to store terabytes of reference sequence files locally.
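The coordinate-to-request translation in the steps above can be sketched with stdlib Python. The endpoint path and parameter names reflect my understanding of UCSC's public REST API (`api.genome.ucsc.edu/getData/sequence`), and centering the 7kb window on the variant is an assumption about how Biofly positions it:

```python
from urllib.parse import urlencode

UCSC_API = "https://api.genome.ucsc.edu/getData/sequence"
WINDOW = 7_000  # 7kb context window around the variant

def context_window_url(coordinate: str, genome: str = "hg38") -> str:
    """Build a UCSC sequence-fetch URL for the window centred on a variant.

    `coordinate` is a 1-based position like "chr17:43044295".
    """
    chrom, pos_str = coordinate.split(":")
    pos = int(pos_str)
    # UCSC's REST API uses 0-based, half-open [start, end) coordinates.
    start = max(pos - 1 - WINDOW // 2, 0)
    end = start + WINDOW
    params = urlencode({"genome": genome, "chrom": chrom, "start": start, "end": end})
    return f"{UCSC_API}?{params}"

url = context_window_url("chr17:43044295")
```

The backend then forwards the fetched sequence straight to the inference engine, so no reference data is ever persisted on Biofly's side.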

3. Mitigating Cold Starts

Serverless GPUs suffer from “cold starts” while the model weights load into VRAM. I mitigated this by implementing a Warm-up Pattern:

  • Small, frequent health checks keep a pool of workers warm during peak research hours.
  • Modal’s optimized image layers ensure the Evo2 environment (Python/PyTorch/CUDA) initializes in seconds rather than minutes.

Impact on Genomic Research

  • Democratized Access: Researchers can now run Evo2-level predictions from a browser on a standard laptop, bypassing the need for local Linux clusters.
  • Cost Efficiency: Infrastructure costs were reduced by over 90% compared to a persistent GPU instance model, as compute only triggers during active analysis.
  • Clinical Contextualization: By visualizing AI predictions alongside established ClinVar classifications, the tool provides a “sanity check” for researchers, identifying where AI aligns with—or challenges—known clinical literature.
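The cost-efficiency claim above can be checked with back-of-the-envelope arithmetic. The persistent-instance figure comes from the trade-off analysis earlier in this post; the serverless H100 hourly rate and monthly request volume are assumptions for illustration:

```python
H100_HOURLY_USD = 4.50         # assumed serverless H100 rate; real pricing varies
PERSISTENT_MONTHLY_USD = 2000  # always-on availability figure cited above
SECONDS_PER_PREDICTION = 45    # midpoint of the 30-60s inference window

def serverless_monthly_cost(predictions_per_month: int) -> float:
    """Billed GPU-seconds only: compute runs solely during active analysis."""
    gpu_hours = predictions_per_month * SECONDS_PER_PREDICTION / 3600
    return gpu_hours * H100_HOURLY_USD

cost = serverless_monthly_cost(1000)          # 12.5 GPU-hours => $56.25
savings = 1 - cost / PERSISTENT_MONTHLY_USD   # well over 90% cheaper
```

Even at several thousand predictions a month, per-second billing stays an order of magnitude below the cost of keeping an H100 provisioned around the clock.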

Results

Biofly successfully demonstrates that the “SaaS-ification” of complex biological models is possible through serverless orchestration. The platform currently supports the full hg38 human assembly, providing real-time, LLM-driven variant effect predictions that were previously locked behind complex CLI research scripts.

