Senior backend engineer (10 years) building production-grade AI systems. I focus on the infrastructure that makes LLM applications reliable at scale — evaluation harnesses, retrieval pipelines, observability, and resilience patterns. The kind of engineering that separates a demo from a product.
| Repository | What It Demonstrates |
|---|---|
| ShipIt | Capstone: Production AI Readiness Scanner — Live Demo |
| agent-exercises | Agentic AI patterns — tool use, multi-step reasoning, ReAct loops, human-in-the-loop, multi-agent orchestration |
| eval-exercises | LLM evaluation — assertion frameworks, scoring rubrics, LLM-as-judge, comparative eval, regression testing, end-to-end pipelines |
| rag-exercises | Production RAG — chunking strategies, BM25, hybrid search, reranking, query transformation, retrieval metrics |
| observability-exercises | LLM observability — token tracking, caching, latency monitoring, streaming, budget controls, structured logging, dashboards |
| resilience-exercises | LLM resilience — retry with backoff, circuit breakers, graceful degradation, hallucination detection, prompt injection defense |
- LLM Infrastructure: Building the systems around the model — not just calling the API, but making it production-ready
- Reliability Engineering: Circuit breakers, retry budgets, and cost guards applied to non-deterministic AI systems
- Evaluation & Quality: Measuring LLM output quality systematically rather than by vibes
- Retrieval Systems: End-to-end RAG pipelines from chunking to evaluation metrics
Python · Google Gemini API · pytest · distributed systems · event-driven architecture
ShipIt — my capstone project. Paste LLM code, get a production-readiness scorecard. The app itself uses every pattern it checks for (circuit breaker, caching, token budgets, structured logging, RAG).
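
For readers unfamiliar with the circuit breaker pattern named above, here is a minimal sketch in Python. It is illustrative only and is not ShipIt's actual implementation: `CircuitBreaker`, `CircuitOpenError`, and the parameters `failure_threshold`, `reset_timeout`, and `clock` are hypothetical names chosen for this example. After a run of consecutive failures the breaker rejects calls immediately. Once the timeout elapses it lets a single trial call through to probe whether the upstream has recovered.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical helper for illustration; not taken from any repository above.
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock                         # injectable so tests can control time
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast instead of hitting a dependency that is likely still down.
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: half-open state, allow this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In use, the model call would be wrapped as `breaker.call(generate, prompt)`, where `generate` stands in for whatever function issues the LLM request. The caller catches `CircuitOpenError` to fall back to a cached or degraded response.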