Building production multi-agent systems and reasoning-traceable AI for science.
8+ years production ML & bioinformatics | PhD NTU Singapore | Singapore (PR)
LangGraph · Claude API · Multi-Agent Systems · AWS · Nextflow · Clinical Genomics
🧬 genome-ft: Fine-tuning a Genomic Foundation Model
New | PyTorch · Hugging Face Transformers · Nucleotide Transformer v2
Full weight-level fine-tuning of Nucleotide Transformer v2 50M (InstaDeep, a genomic foundation model) on two GenomicBenchmarks DNA classification tasks. All 53.8M parameters are updated, not LoRA or a frozen head. The emphasis is the evaluation protocol rather than the headline number: a leakage-free train/validation/test split, the test set scored exactly once, three random seeds with reported variance, and the base model measured under the identical pipeline.
Tech: PyTorch (Apple MPS), Hugging Face Transformers, AdamW with warmup + cosine decay, reverse-complement augmentation, validation-based checkpoint selection
git clone https://github.com/ankurgenomics/genome-ft
cd genome-ft && pip install -r requirements.txt
python prepare_data.py && python train.py # reproduces the fine-tune end-to-end
- Enhancers (human_enhancers_cohn, 3 seeds): accuracy 0.735, MCC 0.478 ± 0.003 (base model at chance, MCC within 0.01 of zero)
- Promoters (human_nontata_promoters): accuracy 0.872, MCC 0.747
- Validation MCC peaked at epoch 1-2 then declined, so a low learning rate and validation-based selection matter more than longer training
- Model card: ankur0050/nucleotide-transformer-v2-50m-genomicbenchmarks-ft
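For readers unfamiliar with the metric, here is a minimal, dependency-free sketch of how a binary MCC and a mean ± deviation over seeds can be computed. It is not taken from the genome-ft repository: the function names `mcc` and `summarize_seeds` are hypothetical, and using the sample standard deviation is an assumption, since the kind of spread reported above is not stated.

```python
import math
import statistics


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def summarize_seeds(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    # Assumption: sample (n - 1) standard deviation; needs at least two seeds.
    return statistics.mean(scores), statistics.stdev(scores)
```

A model that predicts a single class for every input scores an MCC of 0 under this convention, which is why MCC is a stricter yardstick than accuracy on imbalanced tasks.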
🩺 Reviewer2: Autonomous ACMG Variant Second Reviewer
New | LangGraph · FastMCP · MCP Server · Pydantic v2 · Python
Clinical genomics has a consistency problem. Two trained analysts applying the same 28-criterion ACMG framework to the same variant routinely reach different calls. Reviewer2 acts as an autonomous second reviewer: feed it a variant and a proposed classification, and it independently re-derives the ACMG call from evidence using a 4-node LangGraph pipeline, then flags exactly where the two calls diverge. Every criterion that fires is backed by a specific evidence sentence. No evidence, no flag -- enforced at the Pydantic model level, not by convention.
Tech: LangGraph, FastMCP (Model Context Protocol), Pydantic v2, Ollama / Anthropic / OpenAI / Gemini, uv, ruff, mypy
git clone https://github.com/ankurgenomics/Reviewer2
cd Reviewer2 && uv sync
uv run reviewer2 demo # 3 live cases, no API key needed with Ollama
- 21 tests passing, 86% action-band concordance vs expert-panel ClinVar classifications
- Pydantic v2 validators enforce at the model level: no criterion can fire without grounding evidence
- MCP server ready: any LLM agent that speaks Model Context Protocol can call it as a tool
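The "no evidence, no flag" rule can be illustrated without any dependencies. The sketch below uses a standard-library dataclass with `__post_init__` as a stand-in for a Pydantic v2 validator. The class name `CriterionCall` and its fields are hypothetical and are not Reviewer2's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionCall:
    """One ACMG criterion decision plus its grounding (hypothetical schema)."""

    code: str           # criterion code, e.g. "PM2"
    fired: bool         # whether the criterion applies to this variant
    evidence: str = ""  # the sentence that supports the call

    def __post_init__(self):
        # A fired criterion with blank evidence is rejected at construction time,
        # so an ungrounded flag can never exist as an object.
        if self.fired and not self.evidence.strip():
            raise ValueError(f"{self.code} cannot fire without grounding evidence")
```

Putting the check in the constructor rather than in calling code means every downstream consumer can assume that any fired criterion it sees carries evidence.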
🧫 sentinel-amr: Explainable AMR Classifier
New | XGBoost · SHAP · LangGraph · Python
Explainable, agentic antimicrobial resistance classifier for bacterial pathogen surveillance. Predicts resistance, names the genes that drove the call via SHAP TreeExplainer, detects organisms outside the training distribution, and escalates uncertainty to a human, with a full audit trail at every step.
Tech: XGBoost, SHAP TreeExplainer, LangGraph StateGraph, alignment-free k-NN novelty detection, BV-BRC / CARD / VFDB
git clone https://github.com/ankurgenomics/sentinel-amr
cd sentinel-amr && ./run.sh # train + explain + novelty + demo, fully offline
make demo-novel # shows the novel-organism escalation path
- ROC-AUC 0.894 on a group-aware holdout: related isolates never leak across the split
- Top drivers independently rediscovered: CTX-M-15, acrAB efflux pump, DNA topoisomerase IV
- Novel-organism gate explicitly blocks the automated call when the sample has no database match
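A group-aware holdout can be sketched in a few lines, assuming each isolate carries a group label such as a lineage or cluster ID. How sentinel-amr actually defines its groups is not stated here, and the function name `group_holdout_split` is hypothetical. Whole groups are assigned to one side of the split, so related isolates cannot appear in both.

```python
import random


def group_holdout_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so that no group label appears on both sides.

    groups: one group label per sample (hypothetical input format).
    Returns (train_indices, test_indices).
    """
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out whole groups; always keep at least one group for testing.
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx
```

Note that `test_fraction` here is a fraction of groups, not of samples, so the test set size varies with group sizes. scikit-learn's `GroupShuffleSplit` provides the same guarantee for projects that already depend on it.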
🦠 outbreak-agent: Infectious Disease Triage Pipeline
New | LangGraph · Python · matplotlib · ReportLab
4-node LangGraph state machine for infectious disease outbreak triage. Built around the April 2026 MV Hondius / Andes virus event -- the first confirmed human-to-human hantavirus transmission on a cruise ship. Self-correcting critic loop re-evaluates when outputs are inconsistent.
Tech: LangGraph 0.6, LangChain, matplotlib, ReportLab, pytest
git clone https://github.com/ankurgenomics/outbreak-agent
cd outbreak-agent && pip install -r requirements.txt
python demo.py --case hondius # CRITICAL 98/100, under 2 seconds, no API key
- 33 tests passing, completely free to run (no API key required)
- Generates 3-panel risk dashboard (PNG) + structured PDF triage report automatically
- Blog: When an AI Agent Boards a Cruise Ship
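The control flow of a self-correcting critic loop can be shown in plain Python, without LangGraph. Everything in this sketch is hypothetical: the signal names, weights, thresholds, and function names are illustrative and are not those used by outbreak-agent. It shows only the shape of the loop: score, label, check consistency, and re-derive the label when the critic objects, with a bounded number of rounds.

```python
def score_case(case):
    """Toy risk score in [0, 100] from two hypothetical boolean signals."""
    score = 0
    if case.get("human_to_human"):
        score += 60
    if case.get("closed_setting"):
        score += 30
    return min(score, 100)


def label_for(score):
    """Map a score to a severity band (illustrative thresholds)."""
    if score >= 80:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    return "LOW"


def critic(state):
    """Return True when the label is consistent with the score."""
    return state["label"] == label_for(state["score"])


def triage(case, draft_label=None, max_rounds=3):
    """Score the case, then loop until the critic accepts the label."""
    state = {"score": score_case(case), "label": draft_label, "rounds": 0}
    while state["rounds"] < max_rounds:
        state["rounds"] += 1
        if state["label"] is not None and critic(state):
            return state
        # Missing or inconsistent label: re-derive it and let the critic re-check.
        state["label"] = label_for(state["score"])
    return state
```

Calling `triage({"human_to_human": True, "closed_setting": True}, draft_label="LOW")` corrects the inconsistent draft on the first round and is accepted on the second. The `max_rounds` bound keeps a misbehaving critic from looping forever.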
🧬 agentic-genomics: GenomicsCopilot
Flagship open-source project | LangGraph · Claude · Python
Reasoning-traceable agent for variant interpretation. 7 deterministic nodes (VCF ingest → gnomAD/ClinVar lookup → ACMG-lite classification → HPO phenotype scoring) + LLM synthesizer + critic for fact-checking. Every call leaves a full audit trail.
Tech: LangGraph, Claude/Anthropic API, Pydantic v2, pysam, Streamlit, Typer CLI
pip install agentic-genomics
genomics-copilot analyze variants.vcf --phenotypes HPO:0001250
- Why agentic AI for genomics?
- System architecture
- Limitations & prior art
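One way to picture "deterministic nodes plus an audit trail" is a runner that applies each node in order and logs what it wrote. This is a sketch under stated assumptions, not GenomicsCopilot's code: the nodes `ingest` and `classify`, the placeholder variant ID, and the state keys are all hypothetical.

```python
def ingest(state):
    """Hypothetical node: stands in for VCF parsing, returns a placeholder variant."""
    return {"variants": ["chr1:12345:A>G"]}


def classify(state):
    """Hypothetical node: attaches a placeholder label to every variant."""
    return {"labels": {v: "VUS" for v in state["variants"]}}


def run_pipeline(state, nodes):
    """Run nodes in order and record which node wrote which state keys."""
    trail = []
    for node in nodes:
        updates = node(state)
        state = {**state, **updates}  # nodes never mutate the previous state
        trail.append({"node": node.__name__, "wrote": sorted(updates)})
    return state, trail


final_state, audit_trail = run_pipeline({"vcf": "variants.vcf"}, [ingest, classify])
```

Because each node returns only its updates, the trail doubles as a record of which step is responsible for every field in the final state.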
🛠️ genomics-skills: Agent-Callable Skill Library
8 production-quality genomics skills | Python · Claude Haiku · REST APIs
The downstream skill layer for agentic-genomics. Each skill is agent-discoverable with a SKILL.md contract, CLI entrypoint, and deterministic outputs (TSV + PNG/SVG). LLM-powered routing via Claude Haiku maps natural-language queries to the right skill.
Skills: TCGA pan-cancer expression (9,479 real samples) · Kaplan-Meier survival (Cox PH) · GO/KEGG enrichment · PubMed search · Protein variant mapper · 3D structure viewer · Volcano plots
Tech: Python, Claude Haiku (LLM routing), cBioPortal/MyVariant/NCBI/PDB APIs, Pandas, Matplotlib
genomics-skill suggest "show me survival data for BRCA1 in breast cancer"
genomics-skill run tcga-expression --gene TP53 --mode pan-cancer
Personal side project | Multi-agent orchestration · Claude API · RAG
5 specialized agents (Trigger → Log Fetcher → RAG → Classifier → JIRA Writer) built on weekends to explore autonomous diagnosis of genomic pipeline failures (DRAGEN, ICA, SGE/HPC).
Tech: Multi-agent orchestration, Claude API, RAG, Python, JIRA/Confluence APIs
Production cloud infrastructure | AWS · Nextflow · Step Functions
Self-optimizing WGS/RNA-seq workflows on AWS with adaptive resource allocation and automated QC gating. Processed 6,000+ samples with minimal human intervention.
Impact:
- 40% ↓ compute costs
- 50% ↓ storage footprint
- 400 TB genomic data managed
Tech: Nextflow (DSL2), AWS Batch, Lambda, Step Functions, Docker, IaC
Related work: gwas_nf, a Nextflow pipeline for GWAS
PhD Thesis · NTU Singapore · 2021: Age-dependent transcriptional and epigenetic alterations in mouse hepatocytes
Technical Writeup · Open Source: Why agentic AI for genomics? Designing reasoning-traceable variant interpretation
Open-source ML · 2026: genome-ft, full fine-tuning of Nucleotide Transformer v2 (50M) on GenomicBenchmarks with leakage-free, multi-seed evaluation; model card on Hugging Face
Technical Blog · Amazon Web Services · 2025: Using serverless for cross-organizations information exchange in genomic analysis
Blog Post · May 2026: When an AI Agent Boards a Cruise Ship: Hantavirus, LangGraph, and the Future of Outbreak Triage
Conference Poster · Cell Symposia, Chicago · 2019: Significance of hepatocyte polyploidization in liver physiology and pathology
Peer-Reviewed · Frontiers in Microbiology · 2018: Antiproliferative and antioxidative bioactive compounds in marine-derived endophytic fungus
I am open to relevant roles globally (across industry, research, and startups) where agentic AI, ML, or computational biology intersects with real-world impact.
If you are working on something ambitious at the intersection of AI and science, I'd love to hear from you.
Based in Singapore (PR); open to remote, hybrid, or relocation anywhere in the world.
- Email: ankurs103@gmail.com
- LinkedIn: linkedin.com/in/ankurit
- Portfolio: ankurgenomics.github.io
- Resume: Download CV (PDF)
