Genomic Variant Predictor
Deployed a production-grade genomic analysis tool that uses Evo2 to predict the pathogenicity of DNA mutations.
Overview
Architected and built a full-stack genomic platform that leverages the Evo2 large language model to predict the pathogenicity of Single Nucleotide Variants (SNVs).
Problem
Traditional variant effect prediction tools lack real-time accessibility and high-fidelity sequence modeling. Genomic data is massive, and moving from research scripts to a usable, low-latency web interface requires specialized GPU orchestration.
Constraints
- High-performance H100 GPU orchestration required for model inference
- Bridge the gap between Python-based AI research and a TypeScript web stack
- Near-zero idle infrastructure cost via serverless GPU compute
- Real-time integration with established genomic databases (UCSC/NCBI)
Approach
Adopted a decoupled architecture: a Python/FastAPI backend handles the heavy GPU lifting on Modal, while a T3-stack Next.js frontend manages complex state and genomic visualization. A 'proxy-fetch' strategy streams genome sequences directly from the UCSC APIs, sketched below.
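The sketch below illustrates the sequence-window fetch against the public UCSC REST API in Python (the production path proxies this through the Next.js frontend); the window size and helper name are illustrative assumptions, not the exact production logic.

```python
# Minimal sketch: fetch a reference window around a variant from the public
# UCSC REST API (api.genome.ucsc.edu). Window size, helper name, and coordinate
# handling are illustrative assumptions.
import requests

UCSC_SEQUENCE_URL = "https://api.genome.ucsc.edu/getData/sequence"

def fetch_reference_window(chrom: str, pos: int, window: int = 4096,
                           genome: str = "hg38") -> str:
    """Return the reference DNA sequence centered on a 1-based position."""
    start = max(pos - 1 - window // 2, 0)  # the UCSC API takes 0-based starts
    end = pos - 1 + window // 2
    resp = requests.get(
        UCSC_SEQUENCE_URL,
        params={"genome": genome, "chrom": chrom, "start": start, "end": end},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["dna"].upper()

# Example: a window around a position inside the BRCA1 locus on chr17
# seq = fetch_reference_window("chr17", 43_045_712)
```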
Key Decisions
Serverless GPU Deployment via Modal
Enables on-demand access to H100 GPUs for the Evo2 model without paying for idle persistent instances. Modal’s Pythonic interface allowed for seamless scaling of inference workers; a deployment sketch follows the rejected alternatives below.
- Persistent EC2 G5 instances (Cost-prohibitive)
- Lambda functions (Insufficient compute/VRAM)
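A minimal deployment sketch under stated assumptions: the Modal scaffolding (modal.App, @app.cls, @modal.enter, @modal.method, gpu="H100") is Modal’s documented API, while the Evo2 import path, checkpoint name, and scoring call are placeholders for the actual inference code.

```python
# Sketch of the serverless H100 pattern on Modal. Only the Modal scaffolding is
# meant literally; the Evo2-specific lines are placeholders.
import modal

app = modal.App("evo2-variant-predictor")

image = modal.Image.debian_slim().pip_install("torch", "evo2")  # assumed deps

@app.cls(gpu="H100", image=image, timeout=600)
class Evo2Worker:
    @modal.enter()
    def load_model(self):
        # Load the model once per container so warm requests reuse the weights.
        from evo2 import Evo2  # assumed import path
        self.model = Evo2("evo2_7b")  # assumed checkpoint name

    @modal.method()
    def score_variant(self, ref_seq: str, alt_seq: str) -> float:
        # Placeholder scoring call: return the likelihood delta between the
        # reference and mutated windows (see the scoring sketch under
        # Additional Context).
        ref_score, alt_score = self.model.score_sequences([ref_seq, alt_seq])
        return float(alt_score - ref_score)
```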
Direct API Integration for Genome Browsing
By utilizing the UCSC and NCBI ClinVar APIs for real-time data fetching, the system avoids the overhead of local genomic indexing while ensuring the most up-to-date reference data; a lookup sketch follows the rejected alternatives below.
- Local genome indexing with BigWig/BAM files
- Static ClinVar exports (Quickly becomes outdated)
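A lookup sketch using NCBI’s E-utilities (esearch and esummary are documented endpoints; the query term format and the exact classification field names in the summaries are assumptions):

```python
# Minimal sketch of a live ClinVar lookup via NCBI E-utilities. The endpoints
# are real; the search-term format and summary field names are assumptions.
import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def clinvar_lookup(term: str) -> list[dict]:
    """Search ClinVar for a variant term and return its record summaries."""
    search = requests.get(
        f"{EUTILS}/esearch.fcgi",
        params={"db": "clinvar", "term": term, "retmode": "json"},
        timeout=30,
    ).json()
    ids = search["esearchresult"]["idlist"]
    if not ids:
        return []
    summary = requests.get(
        f"{EUTILS}/esummary.fcgi",
        params={"db": "clinvar", "id": ",".join(ids), "retmode": "json"},
        timeout=30,
    ).json()
    # Each record carries the clinical classification shown as "ground truth"
    # in the comparison view (field names vary across ClinVar schema versions).
    return [summary["result"][uid] for uid in ids]

# Example (hypothetical HGVS-style query):
# records = clinvar_lookup('"NM_007294.4(BRCA1):c.68_69del"')
```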
Tech Stack
- Evo2 LLM
- Python
- FastAPI
- Modal (H100 Serverless)
- Next.js
- TypeScript
- Tailwind CSS
- Shadcn UI
Result & Impact
- Compute Power: NVIDIA H100 GPU
- Inference Latency: Optimized via FastAPI on Modal
- Data Scope: Full hg38 assembly support
Successfully translated complex biological theory into a functional engineering tool. The system allows users to cross-reference AI-driven predictions against established clinical data (ClinVar) in a unified, responsive UI.
Learnings
- Mapping biological coordinates to LLM tokenization requires extreme precision in the backend logic (see the coordinate sketch after this list)
- Serverless GPU cold starts can be mitigated through strategic FastAPI warm-up patterns
- Interpreting AI 'black-box' predictions becomes significantly easier when visualized alongside known clinical classifications
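A small sketch of the coordinate bookkeeping behind the first learning, assuming the backend works on a fetched reference window (names and error handling are illustrative):

```python
# Genomic positions are 1-based; Python string indices are 0-based. The SNV
# must land at the exact offset inside the extracted window, and the reference
# base should be verified before scoring.
def apply_snv(window_seq: str, window_start: int,
              variant_pos: int, ref: str, alt: str) -> str:
    """Apply a single-nucleotide variant to a reference window.

    window_start and variant_pos are both 1-based genomic coordinates.
    """
    offset = variant_pos - window_start  # 0-based index into the window
    if window_seq[offset] != ref:
        raise ValueError(
            f"Reference mismatch at offset {offset}: "
            f"expected {ref}, found {window_seq[offset]}"
        )
    return window_seq[:offset] + alt + window_seq[offset + 1:]
```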
Additional Context
This project was a unique challenge in bridging AI research and production engineering. The core objective was to take the Evo2 Large Language Model and make it accessible to researchers without requiring them to manage their own local GPU environments.
The architecture relies on Modal to handle the heavy lifting. When a user inputs a mutation (e.g., in the BRCA1 gene), the TypeScript frontend communicates with a Python/FastAPI worker. This worker orchestrates the Evo2 model to predict whether the mutation is pathogenic or benign.
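The text above does not spell out the scoring rule, but a common zero-shot pattern for variant effect prediction with sequence models is a likelihood delta between the reference and mutated windows; the sketch below uses a hypothetical score_sequence callable and a purely illustrative threshold.

```python
# Hedged sketch of zero-shot SNV scoring: compare the model's log-likelihood of
# the reference window vs. the mutated window and threshold the delta.
# `score_sequence` is a hypothetical stand-in for the actual Evo2 call, and the
# threshold of 0.0 is illustrative only.
def predict_pathogenicity(score_sequence, ref_window: str, alt_window: str,
                          threshold: float = 0.0) -> dict:
    ref_ll = score_sequence(ref_window)  # log-likelihood of the reference
    alt_ll = score_sequence(alt_window)  # log-likelihood of the mutated window
    delta = alt_ll - ref_ll              # a large drop suggests a disruptive change
    return {
        "delta_likelihood": delta,
        "prediction": "pathogenic" if delta < threshold else "benign",
    }
```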
By integrating ClinVar data directly into the comparison view, the app provides a “ground truth” for users to verify AI predictions against documented clinical cases. This builds trust in the model’s outputs and demonstrates how modern LLMs can be applied to the most critical frontiers of personalized medicine.