
feat(channels): native channel-instance dimension — multi-bot substrate #2733

Merged: gavrielc merged 6 commits into main from feat/channel-instances on Jun 11, 2026

@gavrielc

Collaborator

Type of Change

  • Feature skill - adds a channel or integration (source code changes + SKILL.md)
  • Utility skill - adds a standalone tool (code files in .claude/skills/<name>/, no source changes)
  • Operational/container skill - adds a workflow or agent skill (SKILL.md only, no source changes)
  • Fix - bug fix or security fix to source code
  • Simplification - reduces or simplifies source code
  • Documentation - docs, README, or CONTRIBUTING changes only

(None of the boxes fit: this is core substrate — the "watch for hotspots → add a proper hook" maintainer commitment from docs/skills-model.md. It adds a dimension that turns multi-bot skills into clean adds; the bots themselves stay skills.)

Description

What

A native channel-instance dimension: N adapter instances per platform (e.g. three Slack apps in one workspace), keyed by an instance name that defaults to channelType.

  • messaging_groups.instance: NOT NULL, backfilled to channel_type, with UNIQUE(channel_type, platform_id, instance). Migration 016 recreates the table with foreign keys off; the runner gains disableForeignKeys and fails only on FK violations the migration itself introduces. Pre-existing latent orphans are logged and carried over, never turned into a boot crash-loop.
  • Adapter registry keyed by instance ?? channelType. Delivery and typing dispatch by exact key, so a named instance whose adapter is offline gets offline handling, never a cross-identity send through a sibling bot. The channelType fallback survives only for channelType-only callers (cold DMs, approval delivery) and warns when it resolves through a different instance.
  • ChatSdkBridgeConfig.instance — one field driving the registry key, the webhook route (/webhook/<instance>), and the Chat SDK state namespace. State keys are namespaced only when instance !== adapter.name, so existing installs' chat_sdk_* rows stay byte-identical
  • registerWebhookAdapter(chat, adapterName, routingPath?), adopted verbatim from #2617 (feat(chat-sdk-bridge): add channelType override + webhook routingPath; credit: @mmahmed): routes by path, dispatches by adapter name
  • instance threaded through router (lookup + auto-create), delivery (origin-session-first resolution), and typing
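A minimal TypeScript sketch of the registry keying and the two lookup paths described above. The `ChannelAdapter` shape, `registerAdapter`, `getAdapterExact` and the module-level `Map` are hypothetical simplifications, not the PR's code; `getChannelAdapter` mirrors only the described fallback behavior.

```typescript
// Hypothetical, simplified adapter shape: only the fields the registry needs.
interface ChannelAdapter {
  channelType: string; // semantic platform key, e.g. "slack"
  instance?: string; // host-side routing identity; unset means the default instance
}

// Keyed by `instance ?? channelType`, so a default adapter keeps its channelType key.
const activeAdapters = new Map<string, ChannelAdapter>();

function registerAdapter(adapter: ChannelAdapter): void {
  const key = adapter.instance ?? adapter.channelType;
  if (activeAdapters.has(key)) {
    // Duplicate keys warn and overwrite (the later registration wins).
    console.warn(`duplicate channel instance key "${key}"; overwriting`);
  }
  activeAdapters.set(key, adapter);
}

// Delivery/typing path: exact key only. A miss means offline handling,
// never a send through a sibling bot of the same platform.
function getAdapterExact(key: string): ChannelAdapter | undefined {
  return activeAdapters.get(key);
}

// channelType-only callers (cold DMs, approval delivery): exact key first, then
// the first-registered adapter of that channel type (Map keeps insertion
// order), warning when the match comes from a differently keyed instance.
function getChannelAdapter(key: string): ChannelAdapter | undefined {
  const exact = activeAdapters.get(key);
  if (exact) return exact;
  for (const [registeredKey, adapter] of Array.from(activeAdapters)) {
    if (adapter.channelType === key) {
      console.warn(`channelType "${key}" resolved via instance "${registeredKey}"`);
      return adapter;
    }
  }
  return undefined;
}
```

With Slack apps registered as `worker` and then `supervisor`, `getAdapterExact("slack")` misses, while `getChannelAdapter("slack")` returns `worker` and logs a warning.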

channelType stays the semantic platform key everywhere — user ids (<channel>:<handle>), formatting, container config. Containers stay instance-blind: zero session-DB and zero container/ changes.

Why

Same-channel multi-bot is the one shape the current keying can't express: the registry, UNIQUE(channel_type, platform_id), webhook routes, and Chat SDK state all assume one adapter per platform. A real fork (PR Factory: worker/supervisor/tester Slack trio) needed reach-ins across 12 core files to add this dimension — the definition of a hotspot under docs/skills-model.md's maintainer commitments. With this substrate, that becomes a channel skill that just registers two more adapters. Refs #1804, refs #2653 (both are instance-shaped demand). Complementary to #2617, which targets multi-workspace (where changing the effective channelType is correct); this PR covers multi-bot within one workspace/channel, where channelType must stay stable or user identity fractures.

How it was tested

  • 397 host tests (48 new: migration fresh/aged/idempotent + seeded-FK-children recreate, registry exact-vs-fallback, bridge identity/state-namespace/URL-safety, webhook path-vs-name split, delivery origin-instance preference, typing threading, router no-hijack auto-create)
  • Every new integration point mutation-verified: each guard was confirmed to go red when its wiring is deleted/reverted, then green on restore (~30 mutations executed)
  • build, typecheck clean; lint byte-identical to the dc34ceb baseline (114/13, all pre-existing)
  • Container suite: 101 tests, code untouched (git diff dc34ceb -- container/ is empty). Note: the poll-loop /upload-trace test flaked during verification because it makes a live Hugging Face call that races a 5s timeout. It reproduces on pristine dc34ceb code and is unrelated to this PR (worth a separate issue).
  • channels-branch compat verified: all native adapters and all 13 Chat-SDK channel modules use the default path unchanged. One follow-up for the branch: it carries a stale copy of chat-sdk-bridge.ts that should be synced after this lands

Notes for review

  • An earlier draft staggered same-platform gateway boots by 10s (rate-limit etiquette); cut as operational policy, not substrate — reintroducible skill-side if gateway-mode multi-instance lands.
  • .claude/skills/add-github and add-linear wiring INSERTs gain the instance column (NOT NULL has no SQL default).

🤖 Generated with Claude Code

gavrielc and others added 6 commits June 10, 2026 21:49
Adds the channel-instance dimension to the schema: an `instance` column
(NOT NULL, default instance = channel_type) on messaging_groups, relaxing
UNIQUE(channel_type, platform_id) to the triple so N adapter instances of
one platform can each own a row per chat.

SQLite can't relax a table-level UNIQUE in place, and DROP TABLE fails FK
integrity on live DBs with child rows (the failure that forced migration
011 to abandon its rebuild) — so the migration runner grows an opt-in
`disableForeignKeys` flag: foreign_keys=OFF around the transaction (the
pragma is a no-op inside one), PRAGMA foreign_key_check inside it so a
violating recreate rolls back atomically.

Query semantics (deliberately asymmetric, both documented):
- getMessagingGroupWithAgentCount (router fast path): exact-on-instance,
  no fallback — an unknown named instance returns null so the router
  auto-creates a per-instance group instead of hijacking a sibling's row.
  Default param (= channelType) keeps existing callers identical.
- getMessagingGroupByPlatform (outbound/cold-DM/setup): unset instance
  resolves default-instance-first with a deterministic ORDER BY; set
  instance is exact-only.
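The asymmetric lookups can be modeled over an in-memory array. This is a sketch: `MessagingGroup`, `findForRouter`, `resolveInbound` and `findByPlatform` are hypothetical names, and the tiebreak after default-first (instance name) is an assumption; the real query's ORDER BY may differ.

```typescript
// Simplified messaging_groups row (hypothetical camelCase field names).
interface MessagingGroup {
  id: number;
  channelType: string;
  platformId: string;
  instance: string; // NOT NULL; existing rows backfilled to channelType
}

// Router fast path: exact on instance, no fallback. `instance` defaults to
// channelType so callers that never pass it behave as before.
function findForRouter(
  rows: MessagingGroup[],
  channelType: string,
  platformId: string,
  instance: string = channelType,
): MessagingGroup | null {
  const match = rows.find(
    (r) => r.channelType === channelType && r.platformId === platformId && r.instance === instance,
  );
  return match ?? null;
}

// Inbound: an unknown named instance gets its own row (persisting the
// instance) instead of hijacking a sibling's row. The id scheme is illustrative.
function resolveInbound(
  rows: MessagingGroup[],
  channelType: string,
  platformId: string,
  instance: string,
): MessagingGroup {
  const existing = findForRouter(rows, channelType, platformId, instance);
  if (existing) return existing;
  const created: MessagingGroup = { id: rows.length + 1, channelType, platformId, instance };
  rows.push(created);
  return created;
}

// Outbound/cold-DM/setup: a set instance is exact-only; an unset one prefers
// the default instance (instance === channelType), then instance name order.
function findByPlatform(
  rows: MessagingGroup[],
  channelType: string,
  platformId: string,
  instance?: string,
): MessagingGroup | null {
  const matches = rows.filter((r) => r.channelType === channelType && r.platformId === platformId);
  if (instance !== undefined) return matches.find((r) => r.instance === instance) ?? null;
  const rank = (r: MessagingGroup): number => (r.instance === channelType ? 0 : 1);
  matches.sort(
    (a, b) =>
      rank(a) - rank(b) || (a.instance < b.instance ? -1 : a.instance > b.instance ? 1 : 0),
  );
  return matches[0] ?? null;
}
```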

Existing rows are backfilled instance = channel_type, so single-instance
installs see zero behavior change and need no operator action.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… fallback

ChannelAdapter and InboundEvent gain an optional `instance` field — the
host-side routing identity for N adapters of one platform. channelType
stays the semantic platform key (user ids, formatting, container config).

Registry changes:
- activeAdapters keys by `adapter.instance ?? adapter.channelType`, so the
  default instance keeps today's channelType key byte-identically. A
  duplicate instance key warns loudly and overwrites (today's boot
  semantics, made visible).
- getChannelAdapter(key) resolves the exact instance key first, then falls
  back to the first-registered adapter of that channel type — channelType-
  only callers (cold DMs, user-id prefix resolution, approval delivery)
  still resolve deterministically when every instance of a platform is
  named.
- initChannelAdapters staggers same-channelType setups by 10s so two
  gateway bots of one platform don't identify simultaneously from one IP.
  Inert when no two registrations share a channelType.

No adapter sets `instance` today, so every existing install boots
identically.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…utes

ChatSdkBridgeConfig gains `instance`. The bridge keeps channelType =
adapter.name (semantic platform identity is untouched) and threads the
instance into three places:

- Registry identity: bridge.name / bridge.instance follow config.instance.
- Chat SDK state: SqliteStateAdapter takes an optional namespace and
  prefixes every key at a single choke point (k()). All bridges share the
  chat_sdk_* tables and two same-platform instances see identical
  thread/message ids — without the namespace, the SDK's
  dedupe:${adapter.name}:${message.id} key makes the second bot silently
  drop every message the first processed, locks serialize across bots, and
  subscriptions leak engagement. The namespace applies ONLY when instance
  is set AND differs from adapter.name: the default instance stays on the
  legacy UNPREFIXED keyspace byte-identically, so live installs' existing
  subscriptions/kv/locks/lists rows are never orphaned. enqueue does not
  prefix (appendToList does) — layout is ns:queue:<tid>; acquireLock
  returns the raw threadId and release/extend re-apply k() at their SQL
  sites.
- Webhook route: registerWebhookAdapter(chat, adapterName, routingPath =
  adapterName) splits the URL segment from the chat.webhooks handler key,
  so each same-platform instance gets its own URL (and signing secret).
  Signature adopted verbatim from PR #2617 (credit @davekim917's #1804
  prototype); the handler body needed zero change — dispatch already read
  entry.adapterName, not the route key.
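A minimal sketch of the namespace rule and the single `k()` choke point, assuming a plain `Map` in place of the shared chat_sdk_* tables. `stateNamespace` and `NamespacedState` are hypothetical names, not `SqliteStateAdapter`'s API. It shows why two same-platform bots stop colliding on a dedupe key of the shape `dedupe:${adapter.name}:${message.id}`.

```typescript
// The namespace applies only when an instance is set AND differs from the
// adapter name, so the default instance stays on the legacy unprefixed keys.
function stateNamespace(adapterName: string, instance?: string): string | undefined {
  return instance !== undefined && instance !== adapterName ? instance : undefined;
}

// Hypothetical stand-in for a state adapter over shared tables: every key
// passes through one prefixing function, k().
class NamespacedState {
  private readonly shared: Map<string, string>;
  private readonly ns: string | undefined;

  constructor(shared: Map<string, string>, ns: string | undefined) {
    this.shared = shared;
    this.ns = ns;
  }

  private k(key: string): string {
    return this.ns === undefined ? key : `${this.ns}:${key}`;
  }

  get(key: string): string | undefined {
    return this.shared.get(this.k(key));
  }

  set(key: string, value: string): void {
    this.shared.set(this.k(key), value);
  }
}
```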

Instance names are validated URL-safe (no '/', '?', ':' or whitespace) at
bridge construction: the route regex is [^/?]+ and ':' is the namespace
delimiter. The Chat instance's inner adapters map stays keyed adapter.name
(the SDK resolves adapters via channelId.split(':')[0] and serializes by
adapter.name) — instance identity lives entirely outside the Chat.
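The URL-safety rule and the route-key/handler-key split can be sketched as follows. `assertUrlSafeInstance`, `registerWebhookRoute` and `resolveWebhook` are hypothetical stand-ins (the real `registerWebhookAdapter` takes a Chat instance), and validating the routing path at registration time is this sketch's own choice.

```typescript
// Rejects names that would break the /webhook/<instance> route (segment
// matched by [^/?]+) or the ':' state-namespace delimiter, plus empty and
// whitespace names.
function assertUrlSafeInstance(instance: string): void {
  if (instance.length === 0 || /[\/?:\s]/.test(instance)) {
    throw new Error(`invalid instance name: ${JSON.stringify(instance)}`);
  }
}

// Hypothetical webhook table: the URL segment (routingPath) is decoupled from
// the handler key (adapterName), defaulting to it as in
// registerWebhookAdapter(chat, adapterName, routingPath = adapterName).
const webhookRoutes = new Map<string, { adapterName: string }>();

function registerWebhookRoute(adapterName: string, routingPath: string = adapterName): void {
  assertUrlSafeInstance(routingPath);
  webhookRoutes.set(routingPath, { adapterName });
}

// Route by path, dispatch by the adapter name stored on the entry.
function resolveWebhook(url: string): string | undefined {
  const m = /^\/webhook\/([^\/?]+)/.exec(url);
  return m ? webhookRoutes.get(m[1])?.adapterName : undefined;
}
```

Two Slack instances then get distinct URLs (`/webhook/slack` and `/webhook/worker`) while both dispatch to the `slack` handler key.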

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… typing

Inbound: src/index.ts onInbound stamps `instance: adapter.instance ??
adapter.channelType` — the single host-side stamping seam; adapters stay
instance-blind and onInboundEvent (CLI) passes events through unchanged.
The router resolves the thread-policy adapter and the messaging group by
the receiving instance (exact-only — an unknown named instance auto-creates
its own group, persisting the instance, instead of hijacking a sibling's
row).

Outbound: ChannelDeliveryAdapter.deliver/setTyping grow a trailing
`instance` param (host-internal interface only — messages_out, destinations
and session_routing schemas are untouched; containers never see instance).
deliverMessage resolves the messaging group ORIGIN-SESSION-FIRST, so a
named instance's session replies through its own adapter even when a
sibling default row shares the same (channel_type, platform_id); dispatch
goes through getChannelAdapter(instance ?? channelType).
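A sketch of origin-session-first resolution followed by dispatch, written against the semantics from the later review-round commit (exact registry key, no channelType fallback). `GroupAddress`, `DeliveryAdapter`, `resolveDeliveryGroup` and `deliverTo` are hypothetical; `byPlatform` stands in for the default-first platform lookup.

```typescript
// Simplified shapes (hypothetical): a messaging group address and an adapter
// that records what it sent.
interface GroupAddress {
  channelType: string;
  platformId: string;
  instance: string;
}

interface DeliveryAdapter {
  sent: string[];
}

// Origin-session-first: if the replying session's own messaging group matches
// the destination, use it (and its instance) before the default-first lookup.
function resolveDeliveryGroup(
  origin: GroupAddress | undefined,
  channelType: string,
  platformId: string,
  byPlatform: (channelType: string, platformId: string) => GroupAddress | null,
): GroupAddress | null {
  if (origin && origin.channelType === channelType && origin.platformId === platformId) {
    return origin;
  }
  return byPlatform(channelType, platformId);
}

// Dispatch by exact registry key: a missing adapter is reported offline
// rather than routed through a sibling bot of the same platform.
function deliverTo(
  adapters: Map<string, DeliveryAdapter>,
  group: GroupAddress,
  text: string,
): "sent" | "offline" {
  const adapter = adapters.get(group.instance);
  if (!adapter) return "offline";
  adapter.sent.push(text);
  return "sent";
}
```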

Typing: TypingTarget stores the instance and all three tick sites
(immediate, 4s interval, re-trigger) forward it, so the indicator fires
through the bot that owns the chat.
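A sketch of a typing tracker that stores the instance alongside the rest of the address, assuming a callback in place of `setTyping` and leaving out the timers. `TypingAddress` and `TypingTracker` are hypothetical names. Re-triggering replaces the whole address in one step, the "no torn entries" property the review round added.

```typescript
// Address a typing indicator is sent to (simplified, hypothetical shape).
interface TypingAddress {
  channelType: string;
  platformId: string;
  threadId: string | null;
  instance: string;
}

// Hypothetical tracker keyed by session id. Every fire forwards the stored
// instance, and a re-trigger swaps the entire address, so a session spanning
// instances never keeps a mix of old and new fields.
class TypingTracker {
  private readonly targets = new Map<string, TypingAddress>();
  private readonly fire: (address: TypingAddress) => void;

  constructor(fire: (address: TypingAddress) => void) {
    this.fire = fire;
  }

  // Start or re-trigger: store a copy of the full address and fire immediately.
  trigger(sessionId: string, address: TypingAddress): void {
    const snapshot = { ...address }; // copy so later caller mutation can't tear it
    this.targets.set(sessionId, snapshot);
    this.fire(snapshot);
  }

  // Called from the periodic interval (4s in the PR; the timer is omitted here).
  tick(sessionId: string): void {
    const address = this.targets.get(sessionId);
    if (address) this.fire(address);
  }

  stop(sessionId: string): void {
    this.targets.delete(sessionId);
  }
}
```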

Also updates a raw-SQL fixture in groups.test.ts for the NOT NULL instance
column.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- CLAUDE.md entity model: instance on messaging_groups.
- db-central.md: updated messaging_groups DDL (instance NOT NULL, triple
  UNIQUE, denied_at), instance semantics (default = channel_type via
  migration 016 backfill; inbound exact-on-instance, outbound
  default-first), and the user_dms per-platform (not per-instance)
  cold-DM note.
- architecture.md: same DDL update in the schema appendix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…n-safe skill snippets

Review-round fixes on the instance dimension:
- delivery/typing resolve adapters by exact registry key, never the
  channelType fallback — a named instance with an offline adapter gets
  offline handling, not a cross-identity send through a sibling bot;
  the fallback scan (channelType-only callers) now warns when it
  resolves through a differently-keyed instance
- migration runner only fails on FK violations a migration introduced:
  pre-existing latent orphans (FK-OFF CLI surgery) are logged and
  carried, not turned into a boot crash-loop
- typing re-trigger updates the full address (channelType, platformId,
  threadId, instance) together — no torn entries on agent-shared
  sessions spanning instances
- bridge rejects empty/whitespace instance names (URL-route and
  state-namespace safety)
- add-github / add-linear SKILL.md wiring inserts include the NOT NULL
  instance column
- drop the 10s same-platform boot stagger: operational policy, not
  substrate — reintroducible skill-side for gateway-mode installs

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
gavrielc requested a review from gabi-simons as a code owner June 11, 2026 08:45
github-actions (bot) added the follows-guidelines label (PR was created using the current contributing template) Jun 11, 2026
gavrielc added a commit that referenced this pull request Jun 11, 2026
…'s instance migration

Both PRs branched from the same base and picked the next free number.
The runner dedupes by name so runtime behavior is unaffected either way;
the renumber avoids a barrel symbol conflict for whichever merges second.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
gavrielc merged commit 0b31695 into main Jun 11, 2026
2 checks passed
gavrielc deleted the feat/channel-instances branch June 11, 2026 16:39
gavrielc added a commit that referenced this pull request Jun 11, 2026
…files

The instance route-split suite (from #2733) keeps src/webhook-server.test.ts;
this branch's raw-route suite moves to src/webhook-server-raw.test.ts —
incompatible lifecycle setups (fixed port + afterEach vs random port +
afterAll) make a single merged file wrong. webhook-server.ts auto-merge
verified: raw routes take dispatch priority, stop clears both maps.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>