
Vuong Tran Dinh Minh

AI Engineer  ·  Agent-First Development  ·  Harness Engineering

VinUniversity  ·  Vietnam  🇻🇳

I turn fresh research papers into runnable, verifiable demos.
A series I call Build with Paper — read the paper, ship the demo, prove it runs.


About

  • Build with Paper — I pick a recent arXiv paper, reproduce its core idea, and ship a demo you can actually run and check.
  • Focus — LLM routing & deliberation, agentic evaluation (honest oracles), RL post-training (GRPO / self-play), and data engines for Vietnamese.
  • Harness engineering — the scaffolding that makes agents reliable, cheap, and verifiable.
  • Local-first — iterate on a Mac (MLX / Metal); rent GPUs (H100 / H200) only when weights need to move.

Featured — Build with Paper

Routing & deliberation

  • super-agent — a unified cost-aware router: model × reasoning-depth on one Thompson bandit, drift-aware memory, and a GRPO-internalized Qwen3-4B depth policy.
  • System-III-Router — learned deliberation routing from Critique of Agent Model (arXiv:2606.23991); a bandit over reasoning depths (direct → cot → plan+verify), live on Vi-GSM8K.
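The shared routing idea above (a bandit that trades answer quality against reasoning cost) can be illustrated with a Beta-Bernoulli Thompson sampler. This is a minimal sketch under stated assumptions, not code from super-agent or System-III-Router. The class name `CostAwareThompsonRouter`, the `simulate` helper, the per-depth costs, the linear cost weight `lam` and the simulated accuracies are all hypothetical.

```python
import random


class CostAwareThompsonRouter:
    """Beta-Bernoulli Thompson sampling over reasoning depths.

    Hypothetical sketch: the arm names, per-arm costs and the linear cost
    weight `lam` are illustrative assumptions, not taken from either repo.
    """

    def __init__(self, costs, lam=0.1, seed=None):
        self.costs = dict(costs)                       # arm -> relative cost (assumed units)
        self.lam = lam                                 # accuracy-vs-cost trade-off weight
        self.alpha = {arm: 1.0 for arm in self.costs}  # Beta(1, 1) prior: successes + 1
        self.beta = {arm: 1.0 for arm in self.costs}   # Beta(1, 1) prior: failures + 1
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a success rate per arm, subtract its cost penalty, pick the best."""
        scores = {
            arm: self.rng.betavariate(self.alpha[arm], self.beta[arm]) - self.lam * cost
            for arm, cost in self.costs.items()
        }
        return max(scores, key=scores.get)

    def update(self, arm, correct):
        """Bernoulli reward: 1 if the answer was verified correct, else 0."""
        if correct:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0


def simulate(router, true_acc, rounds):
    """Run the router against made-up per-arm accuracies; return pick counts."""
    picks = {arm: 0 for arm in true_acc}
    for _ in range(rounds):
        arm = router.choose()
        picks[arm] += 1
        router.update(arm, router.rng.random() < true_acc[arm])
    return picks


if __name__ == "__main__":
    # Hypothetical numbers: deeper reasoning is more accurate but costs more.
    costs = {"direct": 0.0, "cot": 1.0, "plan+verify": 3.0}
    true_acc = {"direct": 0.55, "cot": 0.80, "plan+verify": 0.85}
    router = CostAwareThompsonRouter(costs, lam=0.1, seed=0)
    print(simulate(router, true_acc, rounds=2000))
```

With these made-up numbers the cost-adjusted values are 0.55, 0.70 and 0.55, so picks should concentrate on `cot`. super-agent is described as also putting the model choice on the same bandit (a factored model × depth policy); that extension is not shown here.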

Honest evaluation

  • rtl-gauntlet — a two-tier honest-evaluation harness for agentic RTL design, backed by a formal oracle.
  • agent-memory-lab — a runnable lab measuring invalidation & staleness in agent memory (arXiv:2606.24775).

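Data & post-training

  • vi-gsm8k-agentic: a Vietnamese GSM8K agentic self-instruct dataset, shown to beat machine-translated data (Qwen3-4B: 81.0% vs 76.5%).
  • spiced-mini: hands-on, runnable demos of SPICED (self-play plus a pinch of human data, arXiv:2606.19370) at tiny scale on a Mac M5: demo-regularized self-play, the RLHF KL knob, and LoRA on a real LLM.
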
Multi-objective & control

  • svh-mol — first public repro of Annealed Stein Variational Hypernetworks (arXiv:2506.06715): one hypernetwork traces the whole Pareto front, plus GRPO-learned annealing and an LLM multi-objective-alignment bridge. Multi-seed verified, $0 on a Mac.

Toolbox

Python · PyTorch · Transformers · TRL / GRPO · vLLM · MLX · Hugging Face · Docker
Agents: Claude Code · custom harnesses · Thompson-bandit routers


GitHub  ·  vuongsky55.cv@gmail.com

Read the paper. Ship the demo. Prove it runs.

Pinned

  1. agent-memory-lab (Public)

    Runnable lab measuring invalidation and staleness in agent memory. Paper: "Are We Ready For An Agent-Native Memory System?" (arXiv:2606.24775)

    Python

  2. rtl-gauntlet (Public)

    Two-tier honest-evaluation harness for agentic RTL design (research, WIP)

    Python

  3. spiced-mini (Public)

    Hands-on, runnable demos of SPICED (self-play plus a pinch of human data, arXiv:2606.19370) at tiny scale on a Mac M5: demo-regularized self-play, the RLHF KL knob, and LoRA on a real LLM.

    Python

  4. super-agent (Public)

    Unified cost-aware router: model × reasoning-depth on one Thompson bandit (factored policy), bi-temporal forget-on-drift statistics memory plus an auto-drift proxy, and a GRPO-internalized Qwen3-4B depth policy.

    Python

  5. System-III-Router (Public)

    Learned deliberation routing (System III Configurator) from 'Critique of Agent Model' (arXiv:2606.23991): a cost-aware bandit over reasoning depths (direct/cot/plan+verify), live on Vi-GSM8K, with …

    Python

  6. vi-gsm8k-agentic (Public)

    Vietnamese GSM8K agentic self-instruct dataset — proven to beat machine-translated data (Qwen3-4B: 81.0% vs 76.5%)

    Python