Previously, trying out the nemesis scheduler on chap03 yielded good performance
results. However, using the default sherwood scheduler with work stealing
turned off did not, which means there is some additional (and in some cases
pretty substantial) overhead from using multi-threaded shepherds even without
work stealing.
This turns on the nemesis scheduler for chap03 and chap04. For chap03, I expect
this to resolve all performance issues. For chap04, I'm not as sure. I don't
think it will resolve all of them, since there are still 2 key differences
between the qthreads setup and the fifo one on chap04: we only use the cores
for qthreads (dataParTasksPerLocale=8) instead of the hyperthreads as well, and
we also have affinity turned on for qthreads, which could have more of an
impact on chap04 than on chap03.