cmd/link: linkname directive on userspace variable can override runtime variable #72032

@srosenberg

Description

Go version

go version go1.23.6 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/stan_cockroachlabs_com/.cache/go-build'
GOENV='/home/stan_cockroachlabs_com/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/stan_cockroachlabs_com/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/stan_cockroachlabs_com/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.6'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/stan_cockroachlabs_com/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1873938282=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Background

For years, CockroachDB has used the go:linkname hack to access runtime internals that have no public API. In this particular case, we have been using go:linkname sched runtime.sched to periodically get a precise count of runnable goroutines; as [1] shows, this has been in place since at least Go 1.15.
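For comparison, the supported runtime/metrics package (Go 1.16+) exposes a goroutine count without go:linkname. The sketch below is not what CockroachDB does: the /sched/goroutines:goroutines metric counts all live goroutines, not only runnable ones, so it is not a substitute for the precise runnable count read from runtime.sched. The helper name goroutineCount is hypothetical.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// goroutineCount reads the total number of live goroutines through the
// supported runtime/metrics API. Unlike the runtime.sched linkname hack,
// it cannot tell runnable goroutines apart from blocked ones.
func goroutineCount() (uint64, bool) {
	samples := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		// KindBad means this toolchain does not support the metric.
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	if n, ok := goroutineCount(); ok {
		fmt.Println("live goroutines:", n)
	}
}
```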

The regression described below slipped into our master branch when the Go 1.23 upgrade PR [2] was merged. Luckily, one of our nightly tests tripped over it [3]. What followed was a long (several-hour) investigation. The issue is best illustrated with a pared-down example.

Reproduction

Build the following program with both Go 1.22 and Go 1.23:

cat linkname_regression.go

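package main

// The blank import of unsafe is required for //go:linkname to be accepted.
import _ "unsafe"

// schedt is a deliberately empty stand-in for the runtime's (much larger)
// schedt type.
type schedt struct{}

//go:linkname sched runtime.sched
var sched schedt

func main() {
	select {
	default:
		println("Hello World!")
	}
}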

Output of ./linkname_regression_1_22:

Hello World!

Output of GOTRACEBACK=crash ./linkname_regression_1_23:

Quit (core dumped)

dlv core ./linkname_regression_1_23 core.linkname_regres

(dlv) bt
0  0x0000000000465763 in runtime.futex
   at /usr/local/go/src/runtime/sys_linux_amd64.s:558
1  0x000000000042b250 in runtime.futexsleep
   at /usr/local/go/src/runtime/os_linux.go:69
2  0x0000000000409bc5 in runtime.lock2
   at /usr/local/go/src/runtime/lock_futex.go:116
3  0x0000000000432b93 in runtime.lockWithRank
   at /usr/local/go/src/runtime/lockrank_off.go:24
4  0x0000000000432b93 in runtime.lock
   at /usr/local/go/src/runtime/lock_futex.go:52
5  0x0000000000432b93 in runtime.mcommoninit
   at /usr/local/go/src/runtime/proc.go:932
6  0x0000000000432612 in runtime.schedinit
   at /usr/local/go/src/runtime/proc.go:823
7  0x0000000000461a7c in runtime.rt0_go
   at /usr/local/go/src/runtime/asm_amd64.s:349
   
(dlv) print runtime.sched
main.schedt {}

Summary

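As of Go 1.23, given go:linkname sched runtime.sched, the runtime's sched variable is resolved to the local sched definition instead of the other way around. Because the local struct is only a subset of the runtime's schedt (in the reproduction it is empty), the runtime reads and writes past the end of that definition and corrupts memory; in the reproduction, the process blocks in runtime.lock during schedinit, as the backtrace above shows. The regression is also visible with objdump when comparing the size of the linked runtime.sched symbol: in the 1.23 binary it is 0 bytes (in .noptrbss), whereas in the 1.22 binary it has the expected size (0x1aa8 bytes, in .bss).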

objdump -t ./linkname_regression_1_23 |grep "runtime.sched$"
000000000050d300 g     O .noptrbss	0000000000000000 runtime.sched
objdump -t ./linkname_regression_1_22 |grep "runtime.sched$"
00000000004daf00 g     O .bss	0000000000001aa8 runtime.sched

[1] https://github.com/search?q=%22go%3Alinkname+sched+runtime.sched%22&type=code
[2] cockroachdb/cockroach#140626
[3] cockroachdb/cockroach#141977 (comment)

What did you see happen?

Test runs failed non-deterministically, suggesting some form of memory corruption. In hindsight, there was no obvious change in [2] that could have caused it.

What did you expect to see?

We expected the go:linkname hack to keep working. The changes described in [4] don't mention that a "Handshake" would result in the local (user) struct being linked into the runtime. Granted, go:linkname is a hack, but it had been tacitly supported until 1.23.

We have a fix, which essentially moves our code into a forked version of the runtime. Still, I think it is worth pointing out that this was a surprising regression. I realize it will likely not be addressed; nevertheless, perhaps this issue can serve as a warning sign for others. Hacking the runtime can cause you pain many years later :)

[4] golang/go#67401

Metadata

Labels: BugReport, NeedsFix, compiler/runtime

Status: Done