2 days ago
My latest open source project, though a bit different this time
Very cool!
What lets the same algorithm work across all three domains is that they share two key properties:
- Closure under negation and intersection — so the difference between two candidates is itself expressible.
- Sampling from that difference — so the system can show the user concrete cases where the candidates disagree.
3 days ago
How do we navigate the world of LLMs, and how they understand (or rather, don’t) the concept of truth? How can we use them for what they’re good for, rather than throwing them at every problem? Let’s discuss ways of constraining them, and ways to improve how they’re used.
via: https://www.aeracode.org/2026/06/01/constraining-llms/
It seems a bit counterintuitive that test case reduction would lead to the discovery of new bugs since we might expect that the space of inputs to a well-tested software system is mostly non-bug-triggering with a few isolated pockets of bug-triggering inputs scattered here and there. I am afraid that that view might not be realistic. Rather, all of the inputs we usually see occupy a tiny portion of the space of inputs, and it is surrounded by huge overlapping clouds of bug-triggering inputs. Fuzzers can push the boundaries of the space of inputs that we can test, but not by as much as people generally think. Proofs remain the only way to actually show that a piece of software does the right thing any significant chunk of its input space. But I digress. The important fact is that reducers are decently effective mutation-based fuzzers.
via: https://tratt.net/laurie/blog/2026/test_case_reducers_are_underappreciated_debugging_tools.html
my reducer has no understanding of my interestingness test or, by extension, the underlying program the interestingness test is running. In other words, test-case reduction has done something useful despite having almost no understanding of why what it’s doing is useful. This lack of understanding is the key to the success of test-case reducers. I can run my test-case reducer on any text file, and it will work.
4 days ago
09 Jun 26
07 Jun 26
I introduce my next idea – inspired by the Aleutian kayak, also known as a baidarka, and a video of a dead fish.
Interesting video.
via: https://surjan.substack.com/p/a-two-minute-video-two-years-in-the
27 May 26
A deep dive into Claude Code for daily users. Covers the .claude directory, CLAUDE.md the way Boris writes it, CLAUDE.local.md, Skills with real examples, custom subagents, plugins, underused commands like /goal and /insights, MCPs, and the workflow patterns the Anthropic team actually uses.
Coding agents can now produce code faster than humans can review it. Techniques that minimize the amount of code that can affect system correctness are therefore a major goal in the field. The Axon project used a coding agent to build a complete verified compiler — source language, operational semantics, optimizations, machine model, and mechanically checked correctness proofs — in 34 days under one supervisor.
The talk discusses how the project combined four validation techniques — testing, credible compilation, formal verification, and code audits - to maximize development velocity while minimizing the amount of unverified/unchecked code.
We present the overall architecture of the compiler, the roles that the different validation techniques played in the development, results from the Axon implementation, and implications for other agent centric development efforts.
see: https://arxiv.org/abs/2605.01660
26 May 26
Illustrating the Engineering Around Us
19 May 26
14 May 26
For those unfamiliar, emacs-devel is the primary development discussion list for GNU Emacs – where design decisions get made, patches get reviewed, and occasionally where people spend 200 messages arguing about version control software. This is the story of that last one.
13 May 26
12 May 26
Overall verification resulted in a remarkably robust and rigorous codebase. AFL and Claude had a really hard time finding errors. But they did still find issues. Verification is only as strong as the questions you think to ask and the foundations you choose to trust.
09 May 26
My takes and experinces with secrets management on NixOS
07 May 26
We like to write software! Coding is a zigzag journey of problem-solving, and the destination is less important than some might think.
06 May 26
Principia Softwarica is a series of books explaining how things work in a computer by describing with full details all the source code of all the essential programs used by a programmer. All the programs come from Plan 9 from Bell Labs, an operating system designed as the successor to Unix whose code is remarkably elegant and small.