runtime: scheduler: go-routine starvation #21053

@prasannavl

What version of Go are you using (go version)?

go version go1.8.3 windows/amd64

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=D:\Workspace\Golang
set GORACE=
set GOROOT=D:\Apps\Scoop\apps\go\current
set GOTOOLDIR=D:\Apps\Scoop\apps\go\current\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
set PKG_CONFIG=pkg-config
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2

What did you do?

Consider the below program:

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

func main() {
	var ops uint64

	// This is the key value. When it is >= the number of
	// physical cores, the program never terminates.
	// Otherwise it works as expected.
	iters := 4
	for i := 0; i < iters; i++ {
		go func() {
			for {
				atomic.AddUint64(&ops, 1)
			}
		}()
	}

	// Wait a second to allow some ops to accumulate.
	time.Sleep(time.Second)

	opsFinal := atomic.LoadUint64(&ops)
	fmt.Println("ops:", opsFinal)
}

I was trying to estimate the throughput of atomics and mutexes, and wrote the example above. When mutexes are used instead of atomics, it works as expected. With atomics, however, once the number of goroutines is greater than or equal to the number of physical threads on the system, the goroutine waiting on the timer (the main goroutine) is starved. I'm guessing this is because the CAS semantics of atomics starve the runtime through spin-waits.

What did you expect to see?

The program completes in about a second.

What did you see instead?

The program never ends and uses 100% of the CPU.

This tells me that the scheduler ends up starving the goroutine that's waiting on the timer. Is it possible to improve the scheduler so that it detects this and avoids the starvation? The problem seems specific to the way goroutines are scheduled. I come from the .NET ecosystem, where Task units run on a thread pool backed by physical threads and the OS scheduler handles this smoothly, so this was quite a surprise to me, though it makes sense considering the nature of goroutines.

UPDATE

Surprisingly, it looks like this doesn't actually have anything to do with atomics. Replacing the loop body with an empty loop still starves the main goroutine. I'm quite confused about what to make of this. Is this some kind of scheduler bug?
