GitHub Agentic Workflows


Custom Linters in Practice: Sergo, Linter Miner, and LintMonster

gh-aw now registers 35 custom Go analyzers in cmd/linters/main.go. That linter surface is not maintained by hand alone. It is grown, audited, and applied by three separate workflows:

  • Linter Miner proposes new analyzers from recurring patterns.
  • Sergo stress-tests those analyzers for false positives, false negatives, and suppression gaps.
  • LintMonster runs the custom suite and turns findings into tracked cleanup work.

The interesting part is not that each workflow exists. It is that they form a loop: one workflow adds lint rules, another challenges them, and a third drives the codebase toward compliance.

Linter Miner’s workflow definition is explicit about its job: mine discussions, issues, and Go source, pick one new linter idea, implement it, and open a PR. GitHub search currently shows a long run of [linter-miner] PRs, and the recent examples are concrete:

  • fprintlnsprintf flags fmt.Fprintln(w, fmt.Sprintf(...)) and links to ADR 34498.
  • timeafterleak catches time.After(...) inside for+select loops.
  • errorfwrapv flags fmt.Errorf(...%v..., err) where %w should preserve the error chain.
  • wgdonenotdeferred catches non-deferred sync.WaitGroup.Done() calls.
  • lenstringsplit rewrites len(strings.Split(s, sep)) to strings.Count(s, sep)+1 when the separator is provably non-empty.
  • stringreplaceminusone rewrites strings.Replace(..., -1) to strings.ReplaceAll(...).

This is not a one-off burst. The same theme appears in the blog’s own weekly updates: May 25, June 15, and June 22. Those posts document fprintlnsprintf, timeafterleak, errorfwrapv, and deferinloop as shipped work rather than aspirational ideas.

Sergo pressure-tests the linters after they land


Where Linter Miner expands the rule set, Sergo does the adversarial follow-up. The workflow is focused on actionable Go analysis using Serena, and its issue history shows a steady pattern: find a precision gap, write a tightly scoped issue, and let the next PR harden the analyzer.

The clearest evidence is the issue-to-PR chain:

  • Issue #40244 found that errstringmatch only handled strings.Contains(err.Error(), ...); PR #40248 extended coverage to HasPrefix, HasSuffix, EqualFold, Index, LastIndex, and Compare.
  • Issue #41377 found missing //nolint: support across four context-family linters; PR #41382 added suppression parity.
  • Issue #41376 found a false negative in manualmutexunlock when two struct instances shared the same mutex field; PR #41383 fixed the keying model.
  • Issue #40947 found that wgdonenotdeferred missed goroutine closures launched inside loops; PR #41026 fixed the function-literal scope boundary.
  • Issue #41163 found that lenstringsplit mishandled an empty raw-string separator; PR #41188 fixed the false positive and the broken autofix.

There is also useful evidence in the failures. Sergo’s Issue #40243 bundled several package-identity precision fixes into one proposed change, and PR #40247 closed unmerged after sprawling into a large branch. The narrower follow-up work still landed, including PR #40248. That is a good sign: the workflow is producing reviewable problems, not just optimistic reports.

LintMonster turns diagnostics into repository work


LintMonster operates later in the loop. It runs make golint-custom, groups findings by root cause, creates or updates issues, and can assign up to three Copilot agent sessions to fix them.

Its evidence trail is easy to follow:

  • Issue #40932 grouped four resource-lifecycle and context-propagation findings; PR #41589 merged the targeted fixes.
  • Issue #40933 tracked hard-coded path constants; PR #41611 replaced the flagged literals with existing constants.
  • Issue #39314 established an authoritative function-length backlog for 653 findings.
  • Issue #41466 refreshed that same backlog at 660 findings and kept it consolidated instead of spawning duplicate tracking issues.

This is what makes the custom linter suite operational instead of decorative. Rules only matter if they change the repository. LintMonster is the workflow that turns diagnostics into queues, slices, assignments, and merged cleanup work.
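A minimal sketch of a grouping-and-capping step, under loud assumptions. The Finding type, the analyzer names, and grouping by analyzer name as a stand-in for root cause are all hypothetical. LintMonster is an agentic workflow, and this sketch does not reproduce its logic. The cap of three mirrors the three Copilot agent sessions described above.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical, simplified lint diagnostic.
type Finding struct {
	Analyzer string
	File     string
	Line     int
}

// Group is one bucket of findings, approximated here by analyzer name.
type Group struct {
	Analyzer string
	Findings []Finding
}

// groupFindings buckets findings by analyzer and orders buckets by
// size (largest first), breaking ties by name for stable output.
func groupFindings(fs []Finding) []Group {
	byName := map[string][]Finding{}
	for _, f := range fs {
		byName[f.Analyzer] = append(byName[f.Analyzer], f)
	}
	groups := make([]Group, 0, len(byName))
	for name, list := range byName {
		groups = append(groups, Group{Analyzer: name, Findings: list})
	}
	sort.Slice(groups, func(i, j int) bool {
		if len(groups[i].Findings) != len(groups[j].Findings) {
			return len(groups[i].Findings) > len(groups[j].Findings)
		}
		return groups[i].Analyzer < groups[j].Analyzer
	})
	return groups
}

// pickForAssignment caps how many groups are handed out per run.
func pickForAssignment(groups []Group, limit int) []Group {
	if len(groups) > limit {
		return groups[:limit]
	}
	return groups
}

func main() {
	// Hypothetical analyzer names and locations.
	fs := []Finding{
		{"funclen", "pkg/a.go", 10},
		{"funclen", "pkg/b.go", 20},
		{"ctxprop", "pkg/c.go", 5},
		{"pathconst", "pkg/d.go", 7},
		{"closeleak", "pkg/e.go", 3},
	}
	for _, g := range pickForAssignment(groupFindings(fs), 3) {
		fmt.Println(g.Analyzer, len(g.Findings))
	}
}
```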

Taken together, the workflows separate three jobs that usually get conflated:

  1. Invent a rule from a real pattern. Linter Miner does this with new analyzers such as timeafterleak and lenstringsplit.
  2. Challenge the rule’s correctness. Sergo does this with issues such as #40947 and #41163.
  3. Apply the rule to production code. LintMonster does this with issue-to-PR chains such as #40932 → #41589 and #40933 → #41611.

That split is why the system looks durable. New rules keep arriving. Old rules keep getting corrected. The repository keeps absorbing the results.

If you want to inspect the trail directly, start with the issues and PRs cited above.

This is a useful pattern beyond gh-aw: treat static analysis as a living workflow system, not just a binary that runs in CI.

Weekly Update – June 22, 2026

Another packed week at github/gh-aw! Over 20 pull requests merged between June 15 and June 22, covering a significant performance regression fix, a new Go linter, a major feature flag rollout, and a handful of targeted reliability improvements. Here’s what shipped.

Performance: +320% Compiler Regression Fixed


PR #40662 fixes a nasty regression in BenchmarkCompileComplexWorkflow that had quietly pushed compile times from ~3 ms/op to ~12.7 ms/op — a 320% slowdown. The culprit was validateTemplateInjection triggering a full yaml.Unmarshal on every pass through hasAnyExpressionInRunContent, even when skipValidation=true (the default in NewCompiler()). Eliminating that redundant unmarshal brings benchmark performance back to baseline. If your workflows felt slower to compile lately, this is the fix.
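The shape of the fix is a familiar one: check the cheap flag before doing the expensive work. This is a minimal sketch with hypothetical names (only skipValidation comes from the PR description). It uses encoding/json as a standard-library stand-in for yaml.Unmarshal. It is not the compiler’s code, and the actual PR may have restructured the call differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// compiler is a hypothetical stand-in; only the skipValidation name is
// taken from the PR description.
type compiler struct {
	skipValidation bool
	decodes        int // counts expensive decodes for illustration
}

// validateExpressions returns before any decoding when validation is
// skipped, so the default fast path does no parsing at all.
func (c *compiler) validateExpressions(doc []byte) error {
	if c.skipValidation {
		return nil
	}
	c.decodes++
	var parsed map[string]any
	if err := json.Unmarshal(doc, &parsed); err != nil {
		return fmt.Errorf("decode workflow: %w", err)
	}
	// A real check would inspect parsed for injectable expressions here.
	return nil
}

func main() {
	doc := []byte(`{"jobs":{"build":{"steps":[{"run":"echo hi"}]}}}`)
	fast := &compiler{skipValidation: true}
	full := &compiler{}
	for i := 0; i < 3; i++ {
		_ = fast.validateExpressions(doc)
		_ = full.validateExpressions(doc)
	}
	fmt.Println(fast.decodes, full.decodes) // 0 3
}
```

Counting decodes makes this class of regression testable: an assertion of zero decodes on the skip path fails as soon as an expensive call sneaks back in.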

PR #40679 adds a new Go analysis linter — deferinloop — that flags defer statements placed inside for-loop bodies. A defer inside a loop doesn’t fire at the end of each iteration; it fires when the enclosing function returns, causing resource leaks (file handles, connections) and confusing LIFO cleanup ordering. gocritic covers this pattern but is currently disabled due to golangci-lint v2 bugs, so this custom analyzer fills the gap and is now enforced in CI.
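You can show the analyzer’s core idea with only the standard library’s go/parser and go/ast packages. This sketch is not gh-aw’s deferinloop source. It tracks whether the walk is inside a for or range body and resets that state at every function literal, because a defer inside func() { ... }() runs when the literal returns. That function-literal boundary is the same kind of scope rule Issue #40947 found missing in wgdonenotdeferred.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// visit appends the line of every defer that sits inside a for/range
// body of its own enclosing function. Loop headers (Init, Cond, Post,
// range expressions) are not walked, to keep the sketch short.
func visit(n ast.Node, inLoop bool, fset *token.FileSet, out *[]int) {
	ast.Inspect(n, func(c ast.Node) bool {
		switch c := c.(type) {
		case *ast.FuncLit:
			// New function scope: its defers run when the literal returns.
			visit(c.Body, false, fset, out)
			return false
		case *ast.ForStmt:
			visit(c.Body, true, fset, out)
			return false
		case *ast.RangeStmt:
			visit(c.Body, true, fset, out)
			return false
		case *ast.DeferStmt:
			if inLoop {
				*out = append(*out, fset.Position(c.Pos()).Line)
			}
		}
		return true
	})
}

// findDeferInLoop parses src and returns the lines of flagged defers.
func findDeferInLoop(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	visit(file, false, fset, &lines)
	return lines, nil
}

const sample = `package p

import "os"

func bad(paths []string) {
	for _, p := range paths {
		f, _ := os.Open(p)
		defer f.Close()
	}
}

func good(paths []string) {
	for _, p := range paths {
		func() {
			f, _ := os.Open(p)
			defer f.Close()
		}()
	}
}
`

func main() {
	lines, err := findDeferInLoop(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(lines) // [8]
}
```

The good function in the sample shows the usual fix: move per-iteration cleanup into a function literal or helper so each defer fires at the end of its own call.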

gh-aw-detection Rolls Out to 50% of Workflows


PR #40698 expands the gh-aw-detection feature flag from 20% (43 workflows) to 50% of agentic workflows (107 out of 214). The rollout targets workflows alphabetically and adds features: gh-aw-detection: true to the 64 newly included workflows. If you’re watching detection coverage metrics, expect a notable jump.
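The rounding rule is not spelled out here, so treat this as an assumption: a ceiling over a 214-workflow catalog reproduces both published figures, 43 workflows at 20% (42.8 rounded up) and 107 at 50%. The sketch below selects workflows alphabetically under that assumption. rolloutCount and selectRollout are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// rolloutCount returns how many of n workflows a pct% rollout covers,
// rounding up (an assumption) and clamping to [0, n].
func rolloutCount(n, pct int) int {
	c := (n*pct + 99) / 100
	if c > n {
		c = n
	}
	if c < 0 {
		c = 0
	}
	return c
}

// selectRollout sorts a copy of names and returns the first pct%.
func selectRollout(names []string, pct int) []string {
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	return sorted[:rolloutCount(len(sorted), pct)]
}

func main() {
	fmt.Println(rolloutCount(214, 20), rolloutCount(214, 50))                // 43 107
	fmt.Println(selectRollout([]string{"zeta", "alpha", "mid", "beta"}, 50)) // [alpha beta]
}
```

Sorting makes the cohort deterministic: the same catalog always yields the same set of workflows, which keeps a percentage rollout reproducible and easy to audit.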

PR #40715 fixes a bug where handleMessage in the MCP server was surfacing [object Object] in error responses. The root cause: the catch block used String(e) for non-Error thrown values, but safe_outputs_handlers.cjs throws plain objects for validation errors — giving callers a useless stringification. The fix detects plain objects and serializes them correctly, and also enforces valid JSON-RPC error codes for all thrown values.

PR #40684 fixes a sparse checkout path typing issue in Skillet’s pre-activation skills checkout. A type mismatch was causing silent failures when resolving sparse checkout paths — the kind of bug that’s nearly invisible until it bites you.

Daily Observability Report Artifact Fetching


PR #40705 ensures the daily-observability-report workflow explicitly requests agent and detection artifact sets during log fetches. Without this, report generation could silently proceed without the required telemetry inputs, producing incomplete or noop outcomes.

PR #40696 replaces SHA-256 with FNV-1a for heredoc delimiter generation. FNV-1a is dramatically faster for this use case — heredoc delimiters don’t need cryptographic-strength hashing, and the switch reduces overhead in the compiler’s string-processing path.

PR #40695 reduces ambient prompt surface in high-traffic workflows. Trimming unnecessary context from the initial system prompt means fewer tokens on every invocation — the savings add up quickly when a workflow runs hundreds of times a day.


Agent of the Week: delight

Your repository’s resident UX guardian: it scans documentation, CLI help text, workflow messages, and validation code for clarity, professionalism, and usability gaps, and files targeted single-file improvement tasks when it finds something worth fixing.

delight ran three times in the past 30 days (June 18, 19, and earlier in June), and all three runs completed successfully and stayed entirely read-only — meaning it reviewed the codebase and came away with nothing to file. For a workflow whose whole job is finding UX rough edges, that’s a quiet kind of compliment to the team. Each run, it randomly samples 1–2 documentation files, 1–2 CLI commands, 1–2 workflow message configurations, and 1 validation file, then evaluates them against five enterprise UX design principles: clarity, professional communication, efficiency, trust, and documentation quality.

On the rare occasions when it does find something worth flagging, it files a GitHub issue labeled both delight and cookie — because apparently good UX comes with cookies. It’s capped at 2 issues per run so it never floods your backlog, and it keeps a rolling memory of past findings to avoid flagging the same thing twice.

Usage tip: Run delight in any repo where user-facing quality matters — its single-file task constraint means every improvement it suggests is scoped, reviewable, and completable in an afternoon.

View the workflow on GitHub


Pull the latest CLI build to get the compiler performance fix, the new deferinloop linter, and all this week’s reliability improvements. As always, feedback and contributions are welcome at github/gh-aw.

Weekly Update – June 15, 2026

No releases this week, but the merge queue more than made up for it. Over 50 pull requests landed in github/gh-aw between June 9 and June 15, touching everything from Go reliability to docs, linters, and cost optimization. Here are the highlights.

Reliability: Eliminating time.After Timer Leaks


PR #39188 landed one of the most satisfying fixes of the week: every looped time.After call in the CLI was replaced with a properly cancelled timer, and a new timeafterleak Go linter was wired into CI to guard against regressions. In a loop, time.After allocates a fresh timer on every iteration, and on toolchains before Go 1.23 each of those timers stays live until it fires, even after the select has moved on: a slow drip of leaked timers. Now that drip is plugged.

New Linters: errorfwrapv and timeafterleak


Two new Go analysis linters shipped this week:

  • timeafterleak — flags time.After inside for+select loops where the timer would never be cancelled.
  • errorfwrapv — flags fmt.Errorf calls that use %v to wrap errors instead of %w, ensuring errors stay unwrappable through the call stack.

Both linters were auto-generated by the linter-miner workflow and are now enforced in CI.

PR #39118 raises the default max-patch-size from 1 MB to 4 MB and improves the error message when a patch exceeds the limit. If your workflows were running into patch-size rejections on larger changesets, you’ll want to pull in the latest CLI — this headroom matters for repos with big generated files.

Cross-Repo safe-outputs Dispatch Allowlists


PR #39080 adds support for cross-repo dispatch-workflow allowlists in safe-outputs. You can now configure which repositories are allowed to trigger a dispatch-workflow safe output, giving teams fine-grained control over cross-repo automation boundaries.

Two PRs improve what you see when workflows fail:

  • #39122: Failure issues now include the last 5 tool calls when a tool denial triggers, so instead of “tool was denied,” you get the full context of what the agent was trying to do.
  • #39069: When the AI credits guardrail fires, failure issues now include an “Optimize token consumption” section with concrete suggestions for reducing costs.

Two more PRs expand the documentation:

  • #39241: Anthropic Workload Identity Federation (WIF) is now documented as a first-class Claude authentication option, so there’s no more hunting through PRs to figure out how to set it up.
  • #39226: The experiments docs were expanded with concrete examples covering custom models, sub-agents, and sub-skills.

Several PRs this week focused on reducing unnecessary token consumption:

  • #39280: Reduced first-request token overhead in smoke-copilot and test-quality-sentinel by trimming ambient context.
  • #39157: Reduced ambient-context payload across daily and PR workflows by sharing prompt imports more efficiently.

Agent of the Week: aw-failure-investigator


Your tireless overnight watchman — scans every workflow run in the repository, diagnoses root causes, and files structured GitHub issues before you’ve had your morning coffee.

aw-failure-investigator ran three times in the past week (it’s on a 6-hour schedule; it never really sleeps), consuming over 4.7 million tokens and 60 turns across those runs. In its most recent run on June 15 at 1:38 AM, it filed two P1/P2 issues. One alerted that the Daily Model Inventory Checker had been 100% broken for six consecutive days because a 60-second session.idle timeout was exhausting all retry attempts. The other flagged that both Azure OpenAI smoke variants were false-failing in lockstep due to Azure 429 throttling. Earlier in the week it also identified that Code Simplifier was silently hitting the api-proxy invocation cap (50/50 LLM calls), causing a 100% failure rate with no existing tracking issue.

It ran its June 14 morning investigation in 16.6 minutes, used 1.8M tokens, and still filed 3 detailed issues — including one that caught a failure the team hadn’t noticed yet. Impressive dedication for an agent that technically has no idea what time it is.

Usage tip: Deploy aw-failure-investigator in any repo with multiple scheduled workflows — catching silent regressions at 2 AM beats discovering them at the next sprint review.

View the workflow on GitHub


All of this week’s improvements ship with the latest CLI build. Pull the newest version and explore the expanded patch-size headroom, the new linters, and the improved failure diagnostics. As always, contributions are welcome at github/gh-aw.

Effective Tokens replaced by AI Credits

In the latest gh-aw build, Effective Tokens (ET) have been replaced by AI Credits (AIC) as the primary spend metric.

This change reflects GitHub Copilot billing and models.dev pricing, so spend tracking now aligns directly with monetary cost instead of a normalized token proxy.

  • gh aw audit and gh aw logs report AI Credits as the primary spend metric.
  • Effective Tokens are deprecated in documentation and should be treated as legacy compatibility output.
  • Cost reporting and budget discussions should use AIC values.

For repositories that need automatic workflow updates, run:

gh aw fix --write

  • AI Credits (AIC): primary spend metric (1 AIC = $0.01 USD)
  • Effective Tokens (ET): deprecated legacy metric
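Because 1 AIC is exactly one US cent, converting AIC figures to dollars is plain integer arithmetic. This minimal sketch assumes non-negative, whole-credit amounts. aicToUSD is a hypothetical helper.

```go
package main

import "fmt"

// aicToUSD formats a non-negative, whole AI Credit amount as dollars,
// using 1 AIC = $0.01 USD. Integer cents avoid floating-point rounding.
func aicToUSD(aic int64) string {
	return fmt.Sprintf("$%d.%02d", aic/100, aic%100)
}

func main() {
	fmt.Println(aicToUSD(250))  // $2.50
	fmt.Println(aicToUSD(7))    // $0.07
	fmt.Println(aicToUSD(1200)) // $12.00
}
```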

Agent of the Day – June 2, 2026

Agent of the Day – June 2, 2026: The Data Detective


You know that feeling when a bill arrives and it’s higher than you expected — and the line items are all vague? That’s what staring at aggregate AI token consumption looks like without good tooling. The number goes up, the curve bends, and everyone shrugs. Was it a new workflow? A prompt gone feral? A perfectly normal Monday?

That’s the exact problem Scout was built for.


Scout is gh-aw’s on-demand research agent: a workflow you invoke with a question and return to for an answer. It doesn’t file PRs or leave comments as part of a pipeline. It reads, reasons, and reports, turning an open-ended research prompt into structured evidence a team can actually act on.

On May 31, 2026 (run #26709587451), Scout received a deceptively simple prompt on issue #36100: investigate token usage trends from the agentic-token-audit and agentic-token-optimizer workflows across April and May.

Eight turns and 8.1 minutes later, it had the answer — and it wasn’t pretty.


The headline: daily token consumption in gh-aw nearly doubled over two months, peaking at 138 million tokens on May 29 — the highest single day in the entire dataset.

| Window | Avg tokens/day | Avg action-min/day |
| --- | --- | --- |
| April 2026 (21 days) | ~80.1M | ~713 |
| Early May (days 1–5) | ~62.1M | |
| Late May (days 20–29) | ~101.8M | ~900 |

Run counts stayed nearly flat the whole time — capped near 100/day by the collector’s limit. More runs weren’t the culprit. The growth was coming from within each run.

Scout traced it to two compounding forces. First, heavy-hitter workflows: the May 29 spike was dominated by PR Sous Chef (15.7M tokens across 5 runs, averaging ~186 turns per run), Safe Output Health Monitor (8.7M, single run), and Go Logger Enhancement (8.5M). Token variance tracked workflow mix and turn count almost exactly. Second, catalog growth: ~111 new agentic workflow .md files were added between April and May, pushing the repository to over 237 workflows. More workflows meant more scheduled runners pulling heavier daily reporters and analyzers into the mix.

There’s a silver lining. The agentic-token-optimizer workflow is doing its job — flagging concrete savings targets and driving commits. After Scout’s predecessor run flagged go-logger at 1.7M tokens per run on May 31, commit #36088 (“Trim go-logger workflow prompt and validation overhead”) landed quickly. The feedback loop works.

The gap is velocity: new workflows are arriving faster than optimizations land, so the net curve still bends upward.


What makes this run compelling isn’t just the findings — it’s how Scout approached the problem. It used 37 distinct tool types across 8 turns, drawing on Tavily’s research suite (search, crawl, extract, map, and research) to pull historical snapshot data and cross-reference it against repository commits. It made 61 network requests with zero firewall blocks, querying the memory/token-audit branch for the daily snapshot history and reconciling gaps in the mid-May data (several dates had empty downloads from API rate-limit failures during collection).

The result was a structured research report posted directly to issue #36100, complete with a data table, a trend attribution section, caveats about data quality during the blind-spot window (May 6–19), and concrete recommendations — all in a single comment.

No pipeline. No scaffolding. Just: “here’s a hard question” → “here’s a rigorous answer.”


Scout is a good reminder that not every agent needs to do something to be valuable. Some of the highest-leverage work in a complex system is the work of seeing clearly — quantifying what’s happening, attributing root causes, and giving a team a shared picture to reason from. Without that, optimization work is guesswork.

When your token bill doubles in six weeks, you want a Scout.


Want to run your own research agent or explore the full gh-aw workflow catalog? Check out the project at github.com/github/gh-aw.