Merge next/kelvin/409 to develop #882

Merged
pkova merged 478 commits into develop from next/kelvin/409
Sep 24, 2025

@pkova pkova commented Sep 24, 2025

No description provided.

joemfb and others added 30 commits May 20, 2025 11:52
Supersedes #773.

## Overview

Restages @joemfb's original work from urbit/urbit#5596. 

### Deadlock

The `libuv` deadlock bug (which was observed originally in 2022 and was
the reason for the original PR's reversion) seems to have vanished.

Why? I cannot reproduce the bug: I have run a live moon on this restaged
work for over two weeks, whereas the bug was previously "easily and
quickly reproducible" with live ships. I therefore cannot determine the
exact cause of its resolution (or whether it is truly resolved!).
Many things have changed since then, including the build system (`nix`
to `bazel` to now `zig`), repository split, `musl` compiler version, and
`libuv` version.

Despite the uncertainty of why the deadlock bug has seemingly resolved,
I propose to merge this to `edge` for further livenet testing by
developers. If the bug rears its ugly head once again, further analysis
and engineering can be done to fix it. If not, we can declare the bug
effectively fixed.

### King/Serf vs. Mars/Urth

The Mars/Urth split is essentially a modification of Vere's dual process
architecture, moving from a "king" and a "serf" to a "mars" and an
"urth":

| Responsibility          | Serf | King | Mars | Urth |
|-------------------------|------|------|------|------|
| Persistence (snapshot)  | X    |      | X    |      |
| Persistence (event log) |      | X    | X    |      |
| Nock computation        | X    |      | X    |      |
| I/O                     |      | X    |      | X    |
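
The table can also be read as a small lookup from responsibility to owning process. The sketch below only transcribes the table; the names `OWNERS` and `owned_by` are hypothetical and do not appear in Vere.

```python
# Ownership of each responsibility, transcribed from the table above.
# Keys and helper names are illustrative only, not Vere identifiers.
OWNERS = {
    "snapshot":  {"king/serf": "serf", "mars/urth": "mars"},
    "event log": {"king/serf": "king", "mars/urth": "mars"},
    "nock":      {"king/serf": "serf", "mars/urth": "mars"},
    "i/o":       {"king/serf": "king", "mars/urth": "urth"},
}


def owned_by(process, architecture):
    """Return the sorted responsibilities held by `process` under `architecture`."""
    return sorted(
        name for name, owner in OWNERS.items()
        if owner[architecture] == process
    )
```

Calling `owned_by("mars", "mars/urth")` lists both kinds of persistence plus Nock computation, which is the key change: event log persistence moves out of the I/O process.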

### Disk Migrations

Since Mars is now responsible for event log persistence, the epoch
system disk migrations need to be re-tested with the new architecture:

- [x] `vere-v1.x-aarch64-linux` (thank you @midden-fabler)
- [x] `vere-v2.x-aarch64-linux` (thank you @midden-fabler)
- [x] `vere-v1.x-aarch64-macos`
- [x] `vere-v2.x-aarch64-macos`
- [x] `vere-v1.x-x86_64-linux`
- [x] `vere-v2.x-x86_64-linux`
- [ ] `vere-v1.x-x86_64-macos` (waiting on @mopfel-winrux)
- [ ] `vere-v2.x-x86_64-macos` (waiting on @mopfel-winrux)

(If anyone else is able to test migrations on other platforms, please go
ahead and report back in the comments. I will update this list
accordingly.)

### To-dos
- [x] Rip out replay in subprocess
- [x] Put disk migration call in mars after replay
- [x] Get rid of all async replay stuff in `pier.c`
- [x] Square `u3_mars_init` with epoch system
- [ ] ~~Fix `kick: lost %mass on /arvo` printed at the end of `|mass`
output~~ (this bug isn't caused by this PR)
- [x] Use old code for disk read stuff in Mars/Urth
- [x] Fix `_disk_acquire` callsites
- [x] Fix `_disk_release` callsites
- [x] Handle fatal errors in `_lord_on_plea_boot`
- [x] Move appropriate subprocess arguments we used to send to work over
to `u3_lord_boot`
- [x] Handle pace in `mars.c`
- [x] Rift needs to be deconstructed in `king.c` and event processed in
`u3_mars_boot`
- [x] Ensure double boot protection is still working properly (note:
double boot from keyfile protection does not work on `develop`, not just
here)

### Future work

Rectification of names (there are some nuances here):
```
%s/serf/mars/g
%s/king/urth/g
```
joemfb and others added 23 commits September 17, 2025 21:50
This was left in by my Windows work because Windows requires 16-byte alignment
for the jump buffers, which we were unable to provide before Joe's allocator work.
I somehow managed to not remove this before merge.
I realized that branch elimination introduced in #850 led to crashes
like in this example:

```
.*(0 [%6 [%5 [%0 1] %1 0] [%1 42] [%6 [%1 %0] [%0 0 0] [%1 42]]])
```

When the outer %6 is compiled, `_n_formulaic` checks the validity of the
formulas in its branches, replacing each malformed formula with a single
`BAIL` instruction, which makes it always crash. From the point of view of
`_n_formulaic`, the inner %6 is valid, since its condition expression and at
least one of its branch expressions are valid. But when the inner %6 is
compiled, branch elimination means we compile only the invalid branch,
leading to a crash.

For that reason, `_n_formulaic` has to be stricter in the presence of
branch elimination.
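
The difference between the two checks can be sketched with a toy model. This is not the Vere implementation: nouns are modelled as Python ints (atoms) and 2-tuples (cells), only opcodes 0, 1, 5 and 6 are handled, and the function `formulaic` with its `strict` flag is a hypothetical stand-in for the behaviour described above. In particular, the rule that a constant condition `[1 0]` or `[1 1]` makes validity depend only on the surviving branch is an assumption drawn from this description.

```python
# Toy model of the validity check. Nouns are ints (atoms) or 2-tuples (cells).
# Only opcodes 0, 1, 5 and 6 are modelled; anything else counts as malformed.

def is_cell(noun):
    return isinstance(noun, tuple)


def formulaic(fol, strict):
    """Return True if `fol` looks like a compilable formula in this toy model."""
    if not is_cell(fol):
        return False
    op, args = fol
    if op == 0:                       # [0 axis]: the axis must be an atom
        return not is_cell(args)
    if op == 1:                       # [1 constant]: any noun is acceptable
        return True
    if op == 5:                       # [5 a b]: both sub-formulas must be valid
        return (is_cell(args)
                and formulaic(args[0], strict)
                and formulaic(args[1], strict))
    if op == 6:                       # [6 test yes no]
        if not is_cell(args) or not is_cell(args[1]):
            return False
        test, (yes, no) = args
        if not formulaic(test, strict):
            return False
        # Assumed strict rule: with a constant test only one branch is
        # compiled, so that branch alone decides validity.
        if strict and test == (1, 0):
            return formulaic(yes, strict)
        if strict and test == (1, 1):
            return formulaic(no, strict)
        # Loose rule from the description: one valid branch is enough.
        return formulaic(yes, strict) or formulaic(no, strict)
    return False


# The formulas from the example above.
inner = (6, ((1, 0), ((0, (0, 0)), (1, 42))))           # [6 [1 0] [0 0 0] [1 42]]
outer = (6, ((5, ((0, 1), (1, 0))), ((1, 42), inner)))  # the full formula
```

With `strict=False` the inner %6 passes the check even though the only branch that survives elimination, `[0 0 0]`, is malformed. With `strict=True` the inner %6 is rejected, so in this model it would be the formula replaced by `BAIL`, while the outer formula stays valid through its `[1 42]` branch.
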
Resolves #861

To check: 
- [x] `parsers.c`
- [x] `urwasm.c`
This was left in by my Windows work because Windows requires 16-byte
alignment for the jump buffers, which we were unable to provide before
Joe's allocator work. I somehow managed to not remove this before merge.
Contrary to the initial comment, there is no need to allocate a
character for the opening brace: we replace the first leading space with
it when we are done rendering. We do, however, need space for **two**
extra characters: the closing brace and the null terminator. ASAN was
reporting a buffer overflow on line 1883; it no longer does.
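
The size arithmetic can be illustrated with a small sketch. The function `render_braced` and its `extra` parameter are hypothetical, the item format is assumed (each item written with one leading space, non-empty list), and Python's `IndexError` on a fixed-size `bytearray` stands in for the overflow that ASAN reported in the C code.

```python
def render_braced(items, extra=2):
    """Render byte strings as b"{a b c}\\0" into a fixed-size buffer.

    Assumes `items` is non-empty. `extra` is the number of bytes reserved
    beyond the rendered items.
    """
    # Each item is written with one leading space: b" a b c".
    body_len = sum(1 + len(item) for item in items)
    # No byte is reserved for "{": it overwrites the first leading space.
    # Two bytes are needed beyond the body: "}" and the NUL terminator.
    buf = bytearray(body_len + extra)
    pos = 0
    for item in items:
        buf[pos] = ord(" ")
        pos += 1
        for byte in item:             # byte-wise writes, as C code would do
            buf[pos] = byte
            pos += 1
    buf[0] = ord("{")                 # replace the first leading space
    buf[pos] = ord("}")               # first extra byte
    buf[pos + 1] = 0                  # second extra byte: NUL terminator
    return bytes(buf)
```

Calling `render_braced([b"a", b"bc"])` fills the buffer exactly, while `render_braced([b"a", b"bc"], extra=1)` raises `IndexError` when writing the terminator, mirroring the one-byte overflow.
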
Removes redundant files from the road struct and adds support for
lightweight bytecode migrations (see #831).
@pkova pkova requested a review from a team as a code owner September 24, 2025 15:13
@pkova pkova merged commit 146b7ae into develop Sep 24, 2025
4 checks passed
@pkova pkova deleted the next/kelvin/409 branch September 24, 2025 19:21