Ankur Sharma, PhD

Agentic AI · ML · Computational Biology

Building production multi-agent systems and reasoning-traceable AI for science.

8+ years production ML & bioinformatics | PhD NTU Singapore | Singapore (PR)

LangGraph · Claude API · Multi-Agent Systems · AWS · Nextflow · Clinical Genomics

🤖 Featured Projects

🧬 genome-ft — Fine-tuning a Genomic Foundation Model

New | PyTorch · Hugging Face Transformers · Nucleotide Transformer v2

Full weight-level fine-tuning of Nucleotide Transformer v2 50M (a genomic foundation model from InstaDeep) on two GenomicBenchmarks DNA classification tasks. All 53.8M parameters are updated; this is neither LoRA nor a classification head trained on a frozen backbone. The emphasis is the evaluation protocol rather than the headline number: a leakage-free train/validation/test split, the test set scored exactly once, three random seeds with reported variance, and the base model measured under the identical pipeline.

Tech: PyTorch (Apple MPS), Hugging Face Transformers, AdamW with warmup + cosine decay, reverse-complement augmentation, validation-based checkpoint selection

git clone https://github.com/ankurgenomics/genome-ft
cd genome-ft && pip install -r requirements.txt
python prepare_data.py && python train.py   # reproduces the fine-tune end-to-end

Enhancers (human_enhancers_cohn, 3 seeds): accuracy 0.735, MCC 0.478 ± 0.003 (base model at chance, MCC −0.009)
Promoters (human_nontata_promoters): accuracy 0.872, MCC 0.747
Validation MCC peaked at epoch 1–2 then declined, so a low learning rate and validation-based selection matter more than longer training
🤗 Model card: ankur0050/nucleotide-transformer-v2-50m-genomicbenchmarks-ft
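
A minimal sketch of two pieces named above, reverse-complement augmentation and MCC reported across seeds. This is not the repository's code: the function names are hypothetical, and it assumes the "±" figure is a sample standard deviation over seeds, which the text does not state.

```python
import math
import statistics

# Complement table for DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def summarize(scores: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a metric across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)
```

MCC is a useful headline metric here because a model at chance scores about 0 regardless of class balance, which is what makes the base-model comparison readable.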

🩺 Reviewer2 — Autonomous ACMG Variant Second Reviewer

New | LangGraph · FastMCP · MCP Server · Pydantic v2 · Python

Clinical genomics has a consistency problem: two trained analysts applying the same 28-criterion ACMG framework to the same variant routinely reach different calls. Reviewer2 acts as an autonomous second reviewer. Feed it a variant and a proposed classification, and it independently re-derives the ACMG call from evidence using a 4-node LangGraph pipeline, then flags exactly where the two calls diverge. Every criterion that fires is backed by a specific evidence sentence. No evidence, no flag: the rule is enforced at the Pydantic model level, not by convention.

Tech: LangGraph, FastMCP (Model Context Protocol), Pydantic v2, Ollama / Anthropic / OpenAI / Gemini, uv, ruff, mypy

git clone https://github.com/ankurgenomics/Reviewer2
cd Reviewer2 && uv sync
uv run reviewer2 demo   # 3 live cases, no API key needed with Ollama

21 tests passing, 86% action-band concordance vs expert-panel ClinVar classifications
Pydantic v2 validators enforce at the model level: no criterion can fire without grounding evidence
MCP server ready: any LLM agent that speaks Model Context Protocol can call it as a tool
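
The "no evidence, no flag" rule can be illustrated with a small sketch. To stay dependency-free it uses a standard-library dataclass as a stand-in for the Pydantic v2 validators the project actually uses. `CriterionCall` and `diverging_criteria` are hypothetical names, not Reviewer2's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionCall:
    """One ACMG criterion decision (hypothetical schema)."""
    code: str           # criterion code, e.g. "PM2"
    fired: bool
    evidence: str = ""  # the sentence that grounds the call

    def __post_init__(self) -> None:
        # Reject the object at construction time: a fired criterion
        # without a non-blank evidence sentence can never exist.
        if self.fired and not self.evidence.strip():
            raise ValueError(f"{self.code}: fired without grounding evidence")


def diverging_criteria(
    proposed: set[str], rederived: list[CriterionCall]
) -> dict[str, set[str]]:
    """Report where the proposed and re-derived criteria sets differ."""
    fired = {c.code for c in rederived if c.fired}
    return {
        "only_proposed": proposed - fired,
        "only_rederived": fired - proposed,
    }
```

Putting the check in the constructor means downstream nodes never need to re-validate: any `CriterionCall` they receive already satisfies the rule.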

🧫 sentinel-amr — Explainable AMR Classifier

New | XGBoost · SHAP · LangGraph · Python

Explainable, agentic antimicrobial resistance classifier for bacterial pathogen surveillance. Predicts resistance, names the genes that drove the call via SHAP TreeExplainer, detects organisms outside the training distribution, and escalates uncertainty to a human — with a full audit trail at every step.

Tech: XGBoost, SHAP TreeExplainer, LangGraph StateGraph, alignment-free k-NN novelty detection, BV-BRC / CARD / VFDB

git clone https://github.com/ankurgenomics/sentinel-amr
cd sentinel-amr && ./run.sh   # train + explain + novelty + demo, fully offline
make demo-novel               # shows the novel-organism escalation path

ROC-AUC 0.894 on a group-aware holdout — related isolates never leak across the split
Top drivers independently rediscovered: CTX-M-15, acrAB efflux pump, DNA topoisomerase IV
Novel-organism gate explicitly blocks the automated call when the sample has no database match
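
A sketch of what a group-aware holdout means in practice: whole groups of related isolates are assigned to one side of the split, so no group contributes to both train and test. How sentinel-amr defines a group is not stated here, so the group labels below are an assumption, and `group_holdout_split` is a hypothetical helper. scikit-learn's `GroupShuffleSplit` provides the same guarantee.

```python
import random
from collections import defaultdict


def group_holdout_split(groups, test_fraction=0.2, seed=0):
    """Split sample indices so each group lands wholly in train or test.

    `groups[i]` is the group label of sample i (for example a cluster ID).
    `test_fraction` applies to the number of groups, not samples.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)

    keys = sorted(by_group)              # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_fraction))

    test = sorted(i for g in keys[:n_test] for i in by_group[g])
    train = sorted(i for g in keys[n_test:] for i in by_group[g])
    return train, test
```

A plain random split over samples would let near-identical isolates sit on both sides, which inflates ROC-AUC without improving generalization.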

🦠 outbreak-agent — Infectious Disease Triage Pipeline

New | LangGraph · Python · matplotlib · ReportLab

A 4-node LangGraph state machine for infectious disease outbreak triage, built around the April 2026 MV Hondius / Andes virus event, the first confirmed human-to-human hantavirus transmission on a cruise ship. A self-correcting critic loop re-evaluates the assessment when outputs are inconsistent.

Tech: LangGraph 0.6, LangChain, matplotlib, ReportLab, pytest

git clone https://github.com/ankurgenomics/outbreak-agent
cd outbreak-agent && pip install -r requirements.txt
python demo.py --case hondius   # CRITICAL 98/100, under 2 seconds, no API key

33 tests passing, completely free to run (no API key required)
Generates 3-panel risk dashboard (PNG) + structured PDF triage report automatically
Blog: When an AI Agent Boards a Cruise Ship
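
The critic loop can be sketched without LangGraph as a bounded retry around an assessment step. Everything below is illustrative: the node functions, the score formula, and the 80-point threshold are assumptions made for the example, not outbreak-agent's actual scoring.

```python
def run_with_critic(state, assess, critic, max_rounds=3):
    """Re-run `assess` until `critic` finds no issues or the budget is spent."""
    for round_no in range(1, max_rounds + 1):
        state = assess(state)
        issues = critic(state)
        state = {**state, "rounds": round_no, "issues": issues}
        if not issues:
            break
    return state


def assess(state):
    """Hypothetical assessment node: score the case and assign a level."""
    score = state["signals"] * 20
    level = state.get("level", "LOW")
    if state.get("issues"):
        # On a retry, derive the level from the score so the two agree.
        level = "CRITICAL" if score >= 80 else "LOW"
    return {**state, "score": score, "level": level}


def critic(state):
    """Hypothetical critic node: flag a level that contradicts the score."""
    expected = "CRITICAL" if state["score"] >= 80 else "LOW"
    if state["level"] == expected:
        return []
    return [f"level {state['level']} contradicts score {state['score']}"]


result = run_with_critic({"signals": 5}, assess, critic)
```

The `max_rounds` cap matters: without it, a critic that can never be satisfied would loop forever.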

🧬 agentic-genomics — GenomicsCopilot

Flagship open-source project | LangGraph · Claude · Python

Reasoning-traceable agent for variant interpretation. 7 deterministic nodes (VCF ingest → gnomAD/ClinVar lookup → ACMG-lite classification → HPO phenotype scoring) + LLM synthesizer + critic for fact-checking. Every call leaves a full audit trail.

Tech: LangGraph, Claude/Anthropic API, Pydantic v2, pysam, Streamlit, Typer CLI

pip install agentic-genomics
genomics-copilot analyze variants.vcf --phenotypes HPO:0001250
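
One way to picture the audit trail: run the deterministic nodes in order and record, for each one, which state keys it added. This is a sketch under stated assumptions, not the package's internals. `run_pipeline`, `ingest`, and `annotate` are hypothetical, and the data is placeholder text.

```python
from typing import Callable

Node = Callable[[dict], dict]


def run_pipeline(
    state: dict, nodes: list[tuple[str, Node]]
) -> tuple[dict, list[dict]]:
    """Run nodes in order and log which state keys each node added."""
    trail = []
    for name, node in nodes:
        before = set(state)
        state = node(dict(state))  # pass a copy so a node cannot mutate history
        trail.append({"node": name, "added": sorted(set(state) - before)})
    return state, trail


def ingest(state: dict) -> dict:
    """Placeholder for VCF ingest."""
    return {**state, "variants": ["variant-1"]}


def annotate(state: dict) -> dict:
    """Placeholder for a database lookup step."""
    return {**state, "annotated": len(state["variants"])}


final, trail = run_pipeline(
    {"vcf": "variants.vcf"}, [("ingest", ingest), ("annotate", annotate)]
)
```

Because the nodes are deterministic, replaying the same input reproduces the same trail, which is what makes the record usable for review.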

🛠️ genomics-skills — Agent-Callable Skill Library

8 production-quality genomics skills | Python · Claude Haiku · REST APIs

The downstream skill layer for agentic-genomics. Each skill is agent-discoverable with a SKILL.md contract, CLI entrypoint, and deterministic outputs (TSV + PNG/SVG). LLM-powered routing via Claude Haiku maps natural-language queries to the right skill.

Skills: TCGA pan-cancer expression (9,479 real samples) · Kaplan-Meier survival (Cox PH) · GO/KEGG enrichment · PubMed search · Protein variant mapper · 3D structure viewer · Volcano plots

Tech: Python, Claude Haiku (LLM routing), cBioPortal/MyVariant/NCBI/PDB APIs, Pandas, Matplotlib

genomics-skill suggest "show me survival data for BRCA1 in breast cancer"
genomics-skill run tcga-expression --gene TP53 --mode pan-cancer

🔧 GenomicsOps AI

Personal side project | Multi-agent orchestration · Claude API · RAG

5 specialized agents (Trigger → Log Fetcher → RAG → Classifier → JIRA Writer) built on weekends to explore autonomous diagnosis of genomic pipeline failures (DRAGEN, ICA, SGE/HPC).

Tech: Multi-agent orchestration, Claude API, RAG, Python, JIRA/Confluence APIs

☁️ Autonomous Genomic Pipelines

Production cloud infrastructure | AWS · Nextflow · Step Functions

Self-optimizing WGS/RNA-seq workflows on AWS with adaptive resource allocation and automated QC gating. Processed 6,000+ samples with minimal human intervention.

Impact:

40% ↓ compute costs
50% ↓ storage footprint
400 TB genomic data managed

Tech: Nextflow (DSL2), AWS Batch, Lambda, Step Functions, Docker, IaC

Related work: gwas_nf — Nextflow pipeline for GWAS

🛠️ Technical Stack

Agentic AI & LLMs · ML & Data Science · Cloud & Infrastructure · Bioinformatics

📚 Publications & Research

PhD Thesis · NTU Singapore · 2021: Age-dependent transcriptional and epigenetic alterations in mouse hepatocytes

Technical Writeup · Open Source: Why agentic AI for genomics? Designing reasoning-traceable variant interpretation

Open-source ML · 2026: genome-ft, full fine-tuning of Nucleotide Transformer v2 (50M) on GenomicBenchmarks with leakage-free, multi-seed evaluation; model card on Hugging Face

Technical Blog · Amazon Web Services · 2025: Using serverless for cross-organizations information exchange in genomic analysis

Blog Post · May 2026: When an AI Agent Boards a Cruise Ship: Hantavirus, LangGraph, and the Future of Outbreak Triage

Conference Poster · Cell Symposia, Chicago · 2019: Significance of hepatocyte polyploidization in liver physiology and pathology

Peer-Reviewed · Frontiers in Microbiology · 2018: Antiproliferative and antioxidative bioactive compounds in marine-derived endophytic fungus

🎯 Open to Relevant Opportunities

I am open to relevant roles globally — across industry, research, and startups — where agentic AI, ML, or computational biology intersects with real-world impact.

If you are working on something ambitious at the intersection of AI and science, I'd love to hear from you.

Based in Singapore (PR) — open to remote, hybrid, or relocation anywhere in the world.

📫 Let's Connect

📧 Email: ankurs103@gmail.com
💼 LinkedIn: linkedin.com/in/ankurit
🌐 Portfolio: ankurgenomics.github.io
📄 Resume: Download CV (PDF)
