Genomic Variant Predictor

AI Scientist & Back-end Engineer · 2026 · Ongoing · 1 person

Deployed a production-grade genomic analysis tool that uses Evo2 to predict how likely a DNA mutation is to cause disease.

Overview

Architected and built a full-stack genomic platform that uses the Evo2 genomic language model to predict the pathogenicity of single nucleotide variants (SNVs).

Problem

Traditional variant effect prediction tools are rarely accessible in real time and often lack high-fidelity modeling. Genomic data is massive, and a model such as Evo2 needs specialized GPU orchestration before research scripts can become a usable, low-latency web interface.

Constraints

  • High-performance H100 GPU orchestration required for model inference
  • Bridging Python-based AI research and a TypeScript web stack
  • No-cost infrastructure via serverless GPU compute
  • Real-time integration with legacy genomic databases (UCSC/NCBI)

Approach

Adopted a decoupled architecture: a Python/FastAPI backend runs the GPU-heavy inference on Modal, while a T3-based Next.js frontend manages complex state and genomic visualization. Used a 'proxy-fetch' strategy to stream genome sequences from the UCSC APIs on demand.
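
The sequence-fetching step can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes the UCSC REST endpoint `https://api.genome.ucsc.edu/getData/sequence` with `genome`, `chrom`, `start`, and `end` parameters (0-based, half-open coordinates, separated by `;` as in UCSC's documented examples) and a JSON response whose `dna` field carries the bases. The function names and the interval are hypothetical.

```python
import json
from urllib.parse import quote

UCSC_SEQUENCE_URL = "https://api.genome.ucsc.edu/getData/sequence"


def build_sequence_url(chrom: str, start: int, end: int, genome: str = "hg38") -> str:
    """Build a request URL for the 0-based, half-open interval [start, end)."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    params = {"genome": genome, "chrom": chrom, "start": start, "end": end}
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{UCSC_SEQUENCE_URL}?{query}"


def parse_sequence_response(body: str, expected_length: int) -> str:
    """Extract the DNA string from a response body and check its length."""
    payload = json.loads(body)
    if "dna" not in payload:
        raise ValueError(payload.get("error", "response has no 'dna' field"))
    dna = payload["dna"].upper()
    if len(dna) != expected_length:
        raise ValueError(f"expected {expected_length} bases, got {len(dna)}")
    return dna


# Usage with a canned response, so the sketch runs without network access.
url = build_sequence_url("chr17", 43_044_294, 43_044_304)
sample_body = json.dumps({"genome": "hg38", "chrom": "chr17", "dna": "acgtacgtac"})
sequence = parse_sequence_response(sample_body, expected_length=10)
```

Checking the returned length against the requested interval is a cheap guard against silently scoring a truncated window.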

Key Decisions

Serverless GPU Deployment via Modal

Reasoning:

Enables on-demand access to H100 GPUs for the Evo2 model without the cost of idle persistent instances. Modal’s Pythonic interface made it straightforward to scale inference workers.

Alternatives considered:
  • Persistent EC2 G5 instances (Cost-prohibitive)
  • AWS Lambda functions (No GPU support; insufficient compute/VRAM)

Direct API Integration for Genome Browsing

Reasoning:

Fetching data in real time from the UCSC and NCBI ClinVar APIs avoids the overhead of local genomic indexing and keeps the reference data current.

Alternatives considered:
  • Local genome indexing with BigWig/BAM files
  • Static ClinVar exports (Quickly become outdated)

Tech Stack

  • Evo2 LLM
  • Python
  • FastAPI
  • Modal (H100 Serverless)
  • Next.js
  • TypeScript
  • Tailwind CSS
  • Shadcn UI

Result & Impact

  • Compute Power: NVIDIA H100 GPU
  • Inference Latency: Optimized via FastAPI on Modal
  • Data Scope: Full hg38 Assembly Support

Successfully translated complex biological theory into a functional engineering tool. The system allows users to cross-reference AI-driven predictions against established clinical data (ClinVar) in a unified, responsive UI.

Learnings

  • Mapping biological coordinates to LLM tokenization requires extreme precision in the backend logic
  • Serverless GPU cold starts can be mitigated through strategic FastAPI warm-up patterns
  • Interpreting AI 'black-box' predictions becomes significantly easier when visualized alongside known clinical classifications
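
The coordinate-mapping point can be made concrete with a small sketch. It assumes that the model consumes one token per nucleotide, that variant positions arrive 1-based (as in VCF), and that the reference window was fetched with a 0-based start (as the UCSC API reports it). The function name and error handling are illustrative, not the project's actual backend logic.

```python
def apply_snv(window: str, window_start: int, position: int, ref: str, alt: str) -> str:
    """Return `window` with the base at 1-based genomic `position` replaced by `alt`.

    `window_start` is the 0-based genomic coordinate of window[0].
    """
    bases = {"A", "C", "G", "T"}
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid SNV: {ref}>{alt}")
    # Convert the 1-based position to a 0-based offset inside the window.
    index = (position - 1) - window_start
    if not 0 <= index < len(window):
        raise ValueError(f"position {position} lies outside the window")
    # An off-by-one error usually surfaces here as a reference mismatch.
    if window[index].upper() != ref:
        raise ValueError(f"expected {ref} at position {position}, found {window[index]}")
    return window[:index] + alt + window[index + 1:]


# The window covers 0-based [100, 108); 1-based position 104 is window[3].
reference = "ACGTACGT"
mutated = apply_snv(reference, window_start=100, position=104, ref="T", alt="G")
```

Validating the reference base before substituting is what turns a silent off-by-one into a loud failure.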

Additional Context

This project was a challenge in bridging AI research and production engineering. The core objective was to take the Evo2 model and make it accessible to researchers without requiring them to manage their own local GPU environments.

The architecture relies on Modal to handle the heavy lifting. When a user inputs a mutation (e.g., in the BRCA1 gene), the TypeScript frontend communicates with a Python/FastAPI worker. This worker orchestrates the Evo2 model to predict whether the mutation is pathogenic or benign.
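
One common way to turn a sequence model into a variant classifier is to compare the model's log-likelihood of the mutated window against that of the reference window. The sketch below assumes that approach; the source does not state how the worker scores variants. `score_sequence` stands in for a real Evo2 call, `toy_score` is a made-up scorer used only so the example runs, and the threshold is a hypothetical placeholder that would need calibrating against labelled variants.

```python
import math
from typing import Callable, Tuple


def classify_variant(
    reference: str,
    mutated: str,
    score_sequence: Callable[[str], float],
    threshold: float = -0.001,  # hypothetical cut-off, not a calibrated value
) -> Tuple[str, float]:
    """Label a variant from the change in model log-likelihood (mutated minus reference)."""
    delta = score_sequence(mutated) - score_sequence(reference)
    # A large drop in likelihood suggests the model finds the mutated sequence implausible.
    label = "likely pathogenic" if delta < threshold else "likely benign"
    return label, delta


def toy_score(sequence: str) -> float:
    """Stand-in scorer: mean log-probability under a fixed, made-up base distribution."""
    probs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
    return sum(math.log(probs[base]) for base in sequence) / len(sequence)


label, delta = classify_variant("ACGTACGT", "ACGGACGT", toy_score)
```

Passing the scorer in as a callable keeps the classification logic testable without a GPU.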

By integrating ClinVar data directly into the comparison view, the app provides a “ground truth” for users to verify AI predictions against documented clinical cases. This builds trust in the model’s outputs and demonstrates how modern LLMs can be applied to the most critical frontiers of personalized medicine.
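
A comparison view of this kind has to reduce ClinVar's clinical significance terms to something comparable with a binary prediction. The sketch below is one hypothetical way to do that. The groupings and function names are assumptions, and any term outside the two groups (such as "Uncertain significance") is treated as not comparable.

```python
from typing import Optional

PATHOGENIC_TERMS = {"pathogenic", "likely pathogenic", "pathogenic/likely pathogenic"}
BENIGN_TERMS = {"benign", "likely benign", "benign/likely benign"}


def normalize_label(text: str) -> Optional[str]:
    """Collapse a classification string to 'pathogenic', 'benign', or None."""
    term = text.strip().lower()
    if term in PATHOGENIC_TERMS:
        return "pathogenic"
    if term in BENIGN_TERMS:
        return "benign"
    return None


def agreement(predicted: str, clinvar_significance: str) -> str:
    """Compare a model prediction with a ClinVar classification."""
    guess = normalize_label(predicted)
    truth = normalize_label(clinvar_significance)
    if guess is None or truth is None:
        return "not comparable"
    return "agree" if guess == truth else "disagree"


result = agreement("likely pathogenic", "Pathogenic/Likely pathogenic")
```

Keeping "not comparable" as its own outcome avoids counting uncertain or conflicting ClinVar entries as model errors.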