Genomic Variant Predictor
Deployed a production-grade genomic analysis tool that uses Evo2 to predict the pathogenicity of DNA mutations.
Overview
Architected and built a full-stack genomic platform that leverages the Evo2 large language model to predict the pathogenicity of Single Nucleotide Variants (SNVs).
Problem
Traditional variant effect prediction tools lack real-time accessibility and high-fidelity sequence modeling. Genomic data is massive, and moving from research scripts to a usable, low-latency web interface requires specialized GPU orchestration.
Constraints
- High-performance H100 GPU orchestration required for model inference
- Bridge the gap between Python-based AI research and a TypeScript web stack
- Near-zero idle infrastructure cost via serverless GPU compute
- Real-time integration with established genomic databases (UCSC/NCBI)
Approach
Adopted a decoupled architecture: a Python/FastAPI backend handles the heavy GPU lifting on Modal, while a T3-stack Next.js frontend manages complex state and genomic visualization. A 'proxy-fetch' strategy streams genome sequences directly from the UCSC APIs, sketched below.
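The sketch below illustrates the sequence-window fetch against the public UCSC REST API in Python (the production path proxies this through the Next.js frontend); the window size and helper name are illustrative assumptions, not the exact production logic.

```python
# Minimal sketch: fetch a reference window around a variant from the public
# UCSC REST API (api.genome.ucsc.edu). Window size, helper name, and coordinate
# handling are illustrative assumptions.
import requests

UCSC_SEQUENCE_URL = "https://api.genome.ucsc.edu/getData/sequence"

def fetch_reference_window(chrom: str, pos: int, window: int = 4096,
                           genome: str = "hg38") -> str:
    """Return the reference DNA sequence centered on a 1-based position."""
    start = max(pos - 1 - window // 2, 0)  # the UCSC API takes 0-based starts
    end = pos - 1 + window // 2
    resp = requests.get(
        UCSC_SEQUENCE_URL,
        params={"genome": genome, "chrom": chrom, "start": start, "end": end},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["dna"].upper()

# Example: a window around a position inside the BRCA1 locus on chr17
# seq = fetch_reference_window("chr17", 43_045_712)
```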
Key Decisions
Serverless GPU Deployment via Modal
Enables on-demand access to H100 GPUs for the Evo2 model without paying for idle persistent instances. Modal’s Pythonic interface allowed for seamless scaling of inference workers; a deployment sketch follows the rejected alternatives below.
- Persistent EC2 G5 instances (Cost-prohibitive)
- Lambda functions (Insufficient compute/VRAM)
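A minimal deployment sketch under stated assumptions: the Modal scaffolding (modal.App, @app.cls, @modal.enter, @modal.method, gpu="H100") is Modal’s documented API, while the Evo2 import path, checkpoint name, and scoring call are placeholders for the actual inference code.

```python
# Sketch of the serverless H100 pattern on Modal. Only the Modal scaffolding is
# meant literally; the Evo2-specific lines are placeholders.
import modal

app = modal.App("evo2-variant-predictor")

image = modal.Image.debian_slim().pip_install("torch", "evo2")  # assumed deps

@app.cls(gpu="H100", image=image, timeout=600)
class Evo2Worker:
    @modal.enter()
    def load_model(self):
        # Load the model once per container so warm requests reuse the weights.
        from evo2 import Evo2  # assumed import path
        self.model = Evo2("evo2_7b")  # assumed checkpoint name

    @modal.method()
    def score_variant(self, ref_seq: str, alt_seq: str) -> float:
        # Placeholder scoring call: return the likelihood delta between the
        # reference and mutated windows (see the scoring sketch under
        # Additional Context).
        ref_score, alt_score = self.model.score_sequences([ref_seq, alt_seq])
        return float(alt_score - ref_score)
```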
Direct API Integration for Genome Browsing
By utilizing the UCSC and NCBI ClinVar APIs for real-time data fetching, the system avoids the overhead of local genomic indexing while ensuring the most up-to-date reference data; a lookup sketch follows the rejected alternatives below.
- Local genome indexing with BigWig/BAM files
- Static ClinVar exports (Quickly becomes outdated)
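A lookup sketch using NCBI’s E-utilities (esearch and esummary are documented endpoints; the query term format and the exact classification field names in the summaries are assumptions):

```python
# Minimal sketch of a live ClinVar lookup via NCBI E-utilities. The endpoints
# are real; the search-term format and summary field names are assumptions.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def clinvar_lookup(term: str) -> list[dict]:
    """Search ClinVar for a variant term and return its record summaries."""
    search = requests.get(
        f"{EUTILS}/esearch.fcgi",
        params={"db": "clinvar", "term": term, "retmode": "json"},
        timeout=30,
    ).json()
    ids = search["esearchresult"]["idlist"]
    if not ids:
        return []
    summary = requests.get(
        f"{EUTILS}/esummary.fcgi",
        params={"db": "clinvar", "id": ",".join(ids), "retmode": "json"},
        timeout=30,
    ).json()
    # Each record carries the clinical classification shown as "ground truth"
    # in the comparison view (field names vary across ClinVar schema versions).
    return [summary["result"][uid] for uid in ids]

# Example (hypothetical HGVS-style query):
# records = clinvar_lookup('"NM_007294.4(BRCA1):c.68_69del"')
```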
Tech Stack
- Evo2 LLM
- Python
- FastAPI
- Modal (H100 Serverless)
- Next.js
- TypeScript
- Tailwind CSS
- Shadcn UI
Result & Impact
- Compute Power: NVIDIA H100 GPU
- Inference Latency: Optimized via FastAPI on Modal
- Data Scope: Full hg38 assembly support
Successfully translated complex biological theory into a functional engineering tool. The system allows users to cross-reference AI-driven predictions against established clinical data (ClinVar) in a unified, responsive UI.
Learnings
- Mapping biological coordinates to LLM tokenization requires extreme precision in the backend logic (see the coordinate sketch after this list)
- Serverless GPU cold starts can be mitigated through strategic FastAPI warm-up patterns
- Interpreting AI 'black-box' predictions becomes significantly easier when visualized alongside known clinical classifications
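A small sketch of the coordinate bookkeeping behind the first learning, assuming the backend works on a fetched reference window (names and error handling are illustrative):

```python
# Genomic positions are 1-based; Python string indices are 0-based. The SNV
# must land at the exact offset inside the extracted window, and the reference
# base should be verified before scoring.
def apply_snv(window_seq: str, window_start: int,
              variant_pos: int, ref: str, alt: str) -> str:
    """Apply a single-nucleotide variant to a reference window.

    window_start and variant_pos are both 1-based genomic coordinates.
    """
    offset = variant_pos - window_start  # 0-based index into the window
    if window_seq[offset] != ref:
        raise ValueError(
            f"Reference mismatch at offset {offset}: "
            f"expected {ref}, found {window_seq[offset]}"
        )
    return window_seq[:offset] + alt + window_seq[offset + 1:]
```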
Additional Context
This project was a unique challenge in bridging AI research and production engineering. The core objective was to take the Evo2 Large Language Model and make it accessible to researchers without requiring them to manage their own local GPU environments.
The architecture relies on Modal to handle the heavy lifting. When a user inputs a mutation (e.g., in the BRCA1 gene), the TypeScript frontend communicates with a Python/FastAPI worker. This worker orchestrates the Evo2 model to predict whether the mutation is pathogenic or benign.
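The text above does not spell out the scoring rule, but a common zero-shot pattern for variant effect prediction with sequence models is a likelihood delta between the reference and mutated windows; the sketch below uses a hypothetical score_sequence callable and a purely illustrative threshold.

```python
# Hedged sketch of zero-shot SNV scoring: compare the model's log-likelihood of
# the reference window vs. the mutated window and threshold the delta.
# `score_sequence` is a hypothetical stand-in for the actual Evo2 call, and the
# threshold of 0.0 is illustrative only.
def predict_pathogenicity(score_sequence, ref_window: str, alt_window: str,
                          threshold: float = 0.0) -> dict:
    ref_ll = score_sequence(ref_window)  # log-likelihood of the reference
    alt_ll = score_sequence(alt_window)  # log-likelihood of the mutated window
    delta = alt_ll - ref_ll              # a large drop suggests a disruptive change
    return {
        "delta_likelihood": delta,
        "prediction": "pathogenic" if delta < threshold else "benign",
    }
```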
By integrating ClinVar data directly into the comparison view, the app provides a “ground truth” for users to verify AI predictions against documented clinical cases. This builds trust in the model’s outputs and demonstrates how modern LLMs can be applied to the most critical frontiers of personalized medicine.