Description
Go version
go version go1.23.6 linux/amd64
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/stan_cockroachlabs_com/.cache/go-build'
GOENV='/home/stan_cockroachlabs_com/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/stan_cockroachlabs_com/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/stan_cockroachlabs_com/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/stan_cockroachlabs_com/go/src/github.com/cockroachdb/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.6'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/stan_cockroachlabs_com/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1873938282=/tmp/go-build -gno-record-gcc-switches'
What did you do?
Background
For years, CockroachDB has been using the go:linkname hack to stay ahead of the runtime. In this particular case, we've been using go:linkname sched runtime.sched to (periodically) get a precise count of the runnable goroutines; as [1] shows, this has been in use since at least Go 1.15.
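For comparison, here is a minimal sketch of a supported, though coarser, way to observe goroutine counts through the runtime/metrics package. It is not what CockroachDB does: the metric /sched/goroutines:goroutines counts all live goroutines, not only the runnable ones that runtime.sched exposes, and the helper name liveGoroutines is hypothetical.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// liveGoroutines is a hypothetical helper that reads the supported
// "/sched/goroutines:goroutines" metric. It reports the number of live
// goroutines (runnable or not), and false if the metric is unavailable.
func liveGoroutines() (uint64, bool) {
	samples := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindUint64 {
		// KindBad means this Go version does not provide the metric.
		return 0, false
	}
	return samples[0].Value.Uint64(), true
}

func main() {
	if n, ok := liveGoroutines(); ok {
		fmt.Println("live goroutines:", n)
	} else {
		fmt.Println("metric not supported by this Go version")
	}
}
```

This trades precision for stability: it needs no go:linkname, but it cannot distinguish runnable goroutines from blocked ones.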
The regression I'll describe below slipped into our master branch when the PR upgrading to Go 1.23 was merged [2]. Luckily, one of our nightlies tripped on it [3]. A very long (several hours) investigation followed. The issue is best illustrated with a pared-down example.
Reproduction
Build the following code using go 1.22 and go 1.23:
cat linkname_regression.go
package main

import _ "unsafe"

type schedt struct{}

//go:linkname sched runtime.sched
var sched schedt

func main() {
	select {
	default:
		println("Hello World!")
	}
}
Output of ./linkname_regression_1_22:
Hello World!
Output of GOTRACEBACK=crash ./linkname_regression_1_23:
Quit (core dumped)
dlv core ./linkname_regression_1_23 core.linkname_regres
(dlv) bt
0 0x0000000000465763 in runtime.futex
at /usr/local/go/src/runtime/sys_linux_amd64.s:558
1 0x000000000042b250 in runtime.futexsleep
at /usr/local/go/src/runtime/os_linux.go:69
2 0x0000000000409bc5 in runtime.lock2
at /usr/local/go/src/runtime/lock_futex.go:116
3 0x0000000000432b93 in runtime.lockWithRank
at /usr/local/go/src/runtime/lockrank_off.go:24
4 0x0000000000432b93 in runtime.lock
at /usr/local/go/src/runtime/lock_futex.go:52
5 0x0000000000432b93 in runtime.mcommoninit
at /usr/local/go/src/runtime/proc.go:932
6 0x0000000000432612 in runtime.schedinit
at /usr/local/go/src/runtime/proc.go:823
7 0x0000000000461a7c in runtime.rt0_go
at /usr/local/go/src/runtime/asm_amd64.s:349
(dlv) print runtime.sched
main.schedt {}
Summary
As of Go 1.23, with go:linkname sched runtime.sched, runtime.sched is linked to the local sched instead of the other way around. This obviously causes memory corruption, since the local struct is a subset of the target/remote one: the runtime accesses its full scheduler state through a symbol sized for the smaller local type. The regression can be further illustrated via objdump by comparing the sizes of the linked runtime.sched symbol; in 1.23 it's empty (size 0), and in 1.22 it has the expected size.
objdump -t ./linkname_regression_1_23 |grep "runtime.sched$"
000000000050d300 g O .noptrbss 0000000000000000 runtime.sched
objdump -t ./linkname_regression_1_22 |grep "runtime.sched$"
00000000004daf00 g O .bss 0000000000001aa8 runtime.sched
[1] https://github.com/search?q=%22go%3Alinkname+sched+runtime.sched%22&type=code
[2] cockroachdb/cockroach#140626
[3] cockroachdb/cockroach#141977 (comment)
What did you see happen?
Test executions would fail non-deterministically, suggesting some form of memory corruption. In hindsight, there wasn't any obvious change in [2] that could have caused it.
What did you expect to see?
We expected the go:linkname hack to continue working. The changes described in [1] don't mention that a "Handshake" would result in linking the local (user) struct into the runtime. Granted, the use of go:linkname is a hack, but it had been tacitly supported until 1.23.
We have a fix, which basically moves our code into a forked version of the runtime. I think it's still worth mentioning that this was a surprising regression. I realize it will likely not be addressed. Nevertheless, perhaps this issue could be a warning sign for others. Hacking the runtime can cause you delayed pain many years after :)
[1] #67401