Probe GPU device count out-of-process on Windows (#988 follow-up) by aittalam · Pull Request #994 · mozilla-ai/llamafile

aittalam · 2026-06-04T10:55:30Z

Background

#988 was reported as an uncaught SIGSEGV when GPU init fails, killing CPU fallback. The fix in #989 (0.10.3) wrapped the get_device_count() probe in an in-process signal guard (gpu_run_guarded): catch the fault with a handler, siglongjmp out, and treat it as "no device" so AUTO mode falls back to CPU.

That stopped the crash but introduced a new failure on Windows: the process would no longer crash — it would silently exit later, during model load (reported here). --gpu disable worked, --gpu auto did not.

Root cause

On Windows the Vulkan probe fault isn't a plain segfault — it's a C++/SEH exception thrown by the GPU driver during instance init, surfaced across the cosmo_dlopen/ms_abi boundary as SIGSEGV (the strace shows the 0xE06D7363 "msc" exception code). siglongjmp-ing out of an in-flight foreign exception skips the unwind and leaves the C++/SEH runtime and the partially-initialized driver in a corrupted state. The damage is latent and only bites later, under the memory-heavy model load, as a silent termination.

So catching the fault in-process and continuing is fundamentally unsafe here: once the foreign exception has been thrown, the process is already compromised.
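For illustration, the in-process guard pattern described in the background looks roughly like the sketch below. It is a minimal POSIX approximation, not the actual gpu_run_guarded code: run_guarded, guard_handler, and the two probe stand-ins are hypothetical names, and raise(SIGSEGV) stands in for the driver fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf guard_env; /* jump target for the fault handler */

/* On a fault, abandon the faulting call and jump back into run_guarded(). */
static void guard_handler(int sig) {
    (void)sig;
    siglongjmp(guard_env, 1);
}

/* Runs fn() with SIGSEGV trapped; returns fn's result, or -1 if it faulted. */
int run_guarded(int (*fn)(void)) {
    struct sigaction sa, old;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = guard_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(guard_env, 1) == 0)
        result = fn();   /* normal path */
    else
        result = -1;     /* reached via siglongjmp: treat as "no device" */

    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    return result;
}

/* Hypothetical stand-ins for a device-count probe. */
int probe_ok(void) { return 2; }
int probe_faults(void) { raise(SIGSEGV); return 0; }
```

Here run_guarded(probe_ok) returns the count and run_guarded(probe_faults) returns -1. A plain fault recovers cleanly this way; the problem described above is that on Windows the jump cuts through an in-flight foreign exception, which this sketch does not reproduce.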

Why not just "print a message and exit"?

We considered detecting the fault and exiting with a "re-run with --gpu disable" message. Live testing on a Windows VM showed that's too aggressive: a probe fault is the common, survivable no-GPU case (any machine with vulkan-1.dll present but no valid ICD — headless servers, fresh installs, VMs). Both a 0.8B and a 4.5B model loaded fine after a caught "shallow" fault. We can't distinguish a survivable catch from a corrupting one at runtime, so exiting would break working CPU fallback for that whole class of users.

This change

Run the device-count probe in a short-lived child process (a re-exec of the binary), on Windows only:

- A __attribute__((constructor)) in gpu_backend.c (linked into every binary via gpu.a) checks two env vars; if both are set, it cosmo_dlopens the DSO, calls the count symbol, and _Exit()s with the device count. Crash signals are converted to a clean _Exit(255): no longjmp, no unwind, nothing to corrupt.
- gpu_backend_probe_oop() posix_spawns the child with a private envp, then maps its exit code: 1..253 → available, 0 → no devices, 254/255/signaled → crashed. Same user-facing log messages as before.
- cuda.c / vulkan.c dispatch IsWindows() ? gpu_backend_probe_oop(b) : gpu_backend_probe(b).
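The child side of the list above could be sketched as follows. This is an approximation under stated assumptions, not the code in gpu_backend.c: the env var names (EXAMPLE_PROBE_DSO, EXAMPLE_PROBE_SYM), the set of trapped signals, the clamp to 253, and the fake count functions are all hypothetical, and a function pointer stands in for the symbol that the real code resolves with cosmo_dlopen. The probe_exit_code harness uses fork() rather than a re-exec only so the sketch can be tried on a POSIX host.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { PROBE_EXIT_LOAD_FAILED = 254, PROBE_EXIT_CRASHED = 255 };

/* Report a crash through the exit code; _Exit is async-signal-safe. */
static void crash_to_exit(int sig) {
    (void)sig;
    _Exit(PROBE_EXIT_CRASHED);
}

/* Child body: call the count function and exit with its result. Never returns. */
void probe_child_run(int (*count_fn)(void)) {
    static const int crash_signals[] = {SIGSEGV, SIGBUS, SIGILL, SIGFPE, SIGABRT};
    struct sigaction sa;
    size_t i;
    int n;

    if (count_fn == NULL)
        _Exit(PROBE_EXIT_LOAD_FAILED); /* missing DSO or symbol */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_to_exit;
    sigemptyset(&sa.sa_mask);
    for (i = 0; i < sizeof crash_signals / sizeof crash_signals[0]; i++)
        sigaction(crash_signals[i], &sa, NULL);

    n = count_fn();
    if (n < 0) n = 0;
    if (n > 253) n = 253; /* assumption: keep counts clear of 254/255 */
    _Exit(n);
}

/* Hypothetical stand-ins for the driver's count symbol. */
int fake_count_two(void) { return 2; }
int fake_count_crash(void) { raise(SIGSEGV); return 0; }

/* Runs before main(); only acts when both (hypothetical) env vars are set. */
__attribute__((constructor)) static void probe_child_entry(void) {
    if (getenv("EXAMPLE_PROBE_DSO") && getenv("EXAMPLE_PROBE_SYM"))
        probe_child_run(fake_count_two); /* real code would resolve the symbol from the DSO */
}

/* Demo harness: run the child body in a fork and return its exit code (-1 on failure). */
int probe_exit_code(int (*count_fn)(void)) {
    int status;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) probe_child_run(count_fn);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

With these stand-ins, probe_exit_code(fake_count_two) yields the device count, a crashing probe yields 255, and a NULL function pointer yields 254, matching the transport described in the list.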

A crash now dies in the child; the parent never executes the faulting driver-init code for a device-less backend, so CPU fallback stays clean regardless of how deep the fault is. Linux/macOS keep the existing in-process guard unchanged.
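On the parent side, the spawn-and-classify step might look like the sketch below. It uses the exit-code ranges stated in this PR, but the function and enum names are hypothetical, and treating a failed posix_spawn as a crash is an assumption rather than something the PR specifies. The real gpu_backend_probe_oop() re-execs the binary itself; here the caller passes the path, argv, and private envp explicitly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

typedef enum { PROBE_NO_DEVICES, PROBE_AVAILABLE, PROBE_CRASHED } probe_outcome;

/* Maps a waitpid() status to an outcome: 0 = no devices, 1..253 = count,
 * 254/255 or death by signal = crashed. */
probe_outcome classify_probe_status(int status, int *count) {
    int code;
    *count = 0;
    if (!WIFEXITED(status))
        return PROBE_CRASHED; /* child was killed by a signal */
    code = WEXITSTATUS(status);
    if (code == 0)
        return PROBE_NO_DEVICES;
    if (code >= 254)
        return PROBE_CRASHED;
    *count = code;
    return PROBE_AVAILABLE;
}

/* Spawns the probe child with a private environment and classifies its exit. */
probe_outcome probe_out_of_process(const char *path, char *const argv[],
                                   char *const envp[], int *count) {
    pid_t pid;
    int status;

    *count = 0;
    if (posix_spawn(&pid, path, NULL, NULL, argv, envp) != 0)
        return PROBE_CRASHED; /* assumption: spawn failure counts as a failed probe */

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return PROBE_CRASHED;
    }
    return classify_probe_status(status, count);
}
```

To try it without the real binary, a shell can stand in for the child: spawning /bin/sh with argv {"sh", "-c", "exit 3", NULL} classifies as PROBE_AVAILABLE with a count of 3, while "exit 255" classifies as PROBE_CRASHED. Because each backend gets its own child, a crash in one probe cannot affect the result of another.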

Verification

- gpu_backend_test: all 23 checks pass (in-process path untouched).
- Child transport validated on host (missing DSO/symbol → 254, count → exit N, crash → 255).
- Windows VM with a deliberately broken Vulkan ICD (real driver-probe crash) + CUDA hidden, Bielik-4.5B-Q8 on CPU:
  - vulkan: Vulkan crashed during device probe; trying next backend (caught in the child)
  - model loaded → server is listening → /health: ok
  - POST /completion "The capital of Poland is" → \boxed{Warsaw}.
  - No silent exit (note: the original silent exit could not be reproduced on our setup in the first place).

Notes / trade-offs

- Startup cost (Windows): up to 2 extra process spawns (CUDA + Vulkan probes) re-exec'ing the APE at startup with --gpu auto. Per-backend isolation is required so one backend's crash can't take down another's result.
- On the success path (≥1 device) the parent still runs driver init when it registers/uses the backend. This is safe, because the child already proved init succeeds.
- Scope is Windows-only by design; extending the OOP probe to all platforms (retiring gpu_run_guarded) is a straightforward follow-up if desired.
- Metal is unaffected (compiled at runtime, no device-count gate).

Fixes the silent-exit follow-up to #988.

aittalam · 2026-06-04T11:39:46Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

aittalam · 2026-06-05T16:09:26Z

Tested both in a replicated scenario and on the failing setup; both work as expected.

Probe GPU device count out-of-process on Windows to survive driver-in…

03179e2

…it crashes Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot added the llamafile label Jun 4, 2026

aittalam mentioned this pull request Jun 4, 2026

Bug: uncaught SIGSEGV in llamafile-0.10.2 #988

Closed

Distinguish library load-failure from probe crash in OOP fallback log

6d3c29f

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

aittalam merged commit b1b7814 into main Jun 5, 2026
2 checks passed

aittalam deleted the fix-missing-cpu-fallback branch June 5, 2026 16:09

aittalam merged 2 commits into main from fix-missing-cpu-fallback
