<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.8.1">Jekyll</generator><link href="https://tbrindus.ca/feed.xml" rel="self" type="application/atom+xml" /><link href="https://tbrindus.ca/" rel="alternate" type="text/html" /><updated>2022-01-04T21:59:04+00:00</updated><id>https://tbrindus.ca/feed.xml</id><title type="html">Tudor Brindus</title><subtitle>Hi, I'm Tudor, and this is my slice of the internet :)</subtitle><entry><title type="html">Sometimes, the kernel lies about process memory usage</title><link href="https://tbrindus.ca/sometimes-the-kernel-lies-about-process-memory-usage/" rel="alternate" type="text/html" title="Sometimes, the kernel lies about process memory usage" /><published>2021-07-05T00:00:00+00:00</published><updated>2021-07-05T00:00:00+00:00</updated><id>https://tbrindus.ca/sometimes-the-kernel-lies-about-process-memory-usage</id><content type="html" xml:base="https://tbrindus.ca/sometimes-the-kernel-lies-about-process-memory-usage/">&lt;p&gt;Here's a short systems debugging story.&lt;/p&gt;

&lt;p&gt;On &lt;a href=&quot;https://dmoj.ca&quot;&gt;dmoj.ca&lt;/a&gt;, we run user-submitted solutions to algorithmic programming problems against a set of input files, and judge their output for correctness. One metric by which solutions are ranked on our leaderboards is memory usage. A user recently pointed out that one of their submissions was reported as having consumed 4 KiB of memory, despite allocating a 128 KiB array. How come?&lt;/p&gt;

&lt;p&gt;This is a story about how sometimes, the kernel lies about memory usage — all in the name of performance.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;To start with, here's the (slightly edited) submission in question, solving&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; &lt;a href=&quot;https://dmoj.ca/problem/ccc11s1&quot;&gt;this problem&lt;/a&gt; in Zig:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@import&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;std&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;pub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nb&quot;&gt;@setRuntimeSafety&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;allocator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;mem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;Allocator&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;heap&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;page_allocator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;u8&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;allocator&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;alloc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;131072&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stdin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;io&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;getStdIn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;inStream&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stdin&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;readAll&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;u32&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;u32&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'s'&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'S'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'t'&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;or&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;'T'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stdout&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;io&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;getStdOut&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;outStream&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stdout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;writeAll&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;French&quot;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;English&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now, I don't actually know Zig, but this code seems to pretty clearly allocate a 128 KiB array on the heap.&lt;/p&gt;

&lt;p&gt;An early thought I had was, &quot;what if the array only has virtual address space reserved for it, but is never fully faulted in because the input to the program is small?&quot; Then I checked, and it turns out the input to this problem is quite large.&lt;/p&gt;

&lt;p&gt;The way the judge determines memory usage is by &lt;code class=&quot;highlighter-rouge&quot;&gt;wait4(2)&lt;/code&gt;ing on the submission process until it exits, and then parsing the contents of &lt;code class=&quot;highlighter-rouge&quot;&gt;/proc/${pid}/status&lt;/code&gt; for &lt;code class=&quot;highlighter-rouge&quot;&gt;VmHWM&lt;/code&gt;&lt;sup id=&quot;fnref:2&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; — the &quot;high watermark RSS usage&quot; of the process.&lt;/p&gt;
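
&lt;p&gt;As a rough illustration of the parsing half of that, here is a minimal Python sketch. It assumes the &lt;code class=&quot;highlighter-rouge&quot;&gt;Field: N kB&lt;/code&gt; line layout used by &lt;code class=&quot;highlighter-rouge&quot;&gt;/proc/${pid}/status&lt;/code&gt;; the helper names &lt;code class=&quot;highlighter-rouge&quot;&gt;parse_status_kib&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;read_vm_hwm_kib&lt;/code&gt; are made up for this example and are not the judge's actual code.&lt;/p&gt;

```python
def parse_status_kib(status_text, field):
    """Return the number on a `Field:   N kB` line, or None if the field is absent."""
    for line in status_text.splitlines():
        # Lines look like "VmHWM:\t     4 kB"; split the name from the value.
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])
    return None


def read_vm_hwm_kib(pid):
    """Read VmHWM for a process that still exists (hypothetical helper)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_status_kib(status_file.read(), "VmHWM")
```

&lt;p&gt;Keeping the parser separate from the file read means it can be exercised on a captured status dump, without needing a live process.&lt;/p&gt;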

&lt;p&gt;Anyway, enough beating around the bush, let's run some code in GDB.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ zig build-exe --release-safe test.zig --name test
$ gdb ./test
(gdb) catch syscall exit_group
Catchpoint 1 (syscall 'exit_group' [231])
(gdb) run
Starting program: /tmp/test 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I set up a breakpoint on &lt;code class=&quot;highlighter-rouge&quot;&gt;exit_group(2)&lt;/code&gt; so that we can inspect the state right before the process exits.&lt;/p&gt;

&lt;p&gt;Since the program asked for some input, I gave it some sample test data from the problem.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;3
The red cat sat on the mat.
Why are you so sad cat?
Don't ask that.
^D

English

Catchpoint 1 (call to syscall exit_group), std.os.linux.exit_group (status=&amp;lt;optimized out&amp;gt;)
    at /opt/zig/lib/zig/std/os/linux.zig:556
556	   unreachable;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The program read the input, printed the answer (&lt;code class=&quot;highlighter-rouge&quot;&gt;English&lt;/code&gt;), and hit our breakpoint on &lt;code class=&quot;highlighter-rouge&quot;&gt;exit_group(2)&lt;/code&gt;. Time to poke around and see what we can find.&lt;/p&gt;

&lt;p&gt;First, we can confirm that &lt;code class=&quot;highlighter-rouge&quot;&gt;VmRSS&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;VmHWM&lt;/code&gt; for this process indeed still say 4 KiB.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ grep -E 'Vm(HWM|RSS)' /proc/9253/status 
VmHWM:	     4 kB
VmRSS:	     4 kB
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That's certainly odd, but confirms what the user reported.&lt;/p&gt;

&lt;p&gt;Back to GDB, where is this &lt;code class=&quot;highlighter-rouge&quot;&gt;input&lt;/code&gt; array actually located?&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(gdb) info proc mappings
process 9253
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
            0x200000           0x202000     0x2000        0x0 /tmp/test
            0x202000           0x213000    0x11000     0x1000 /tmp/test
            0x213000           0x214000     0x1000    0x11000 /tmp/test
      0x7ffff7fd9000     0x7ffff7ff9000    0x20000        0x0 
      0x7ffff7ff9000     0x7ffff7ffd000     0x4000        0x0 [vvar]
      0x7ffff7ffd000     0x7ffff7fff000     0x2000        0x0 [vdso]
      0x7ffffffde000     0x7ffffffff000    0x21000        0x0 [stack]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Making an educated guess, it's located at &lt;code class=&quot;highlighter-rouge&quot;&gt;0x7ffff7fd9000&lt;/code&gt;, since the size &lt;code class=&quot;highlighter-rouge&quot;&gt;0x20000&lt;/code&gt; is 128 KiB.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(gdb) x/32c 0x7ffff7fd9000
0x7ffff7fd9000:	51 '3'	10 '\n'	84 'T'	104 'h'	101 'e'	32 ' '	114 'r'	101 'e'
0x7ffff7fd9008:	100 'd'	32 ' '	99 'c'	97 'a'	116 't'	32 ' '	115 's'	97 'a'
0x7ffff7fd9010:	116 't'	32 ' '	111 'o'	110 'n'	32 ' '	116 't'	104 'h'	101 'e'
0x7ffff7fd9018:	32 ' '	109 'm'	97 'a'	116 't'	46 '.'	10 '\n'	87 'W'	104 'h'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Bingo, that's our sample input.&lt;/p&gt;

&lt;p&gt;After staring at this for a few minutes, I had a bit of inspiration and took a look at &lt;code class=&quot;highlighter-rouge&quot;&gt;/proc/${pid}/smaps&lt;/code&gt;, which reports per-segment information in more detail.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cat /proc/9253/smaps
...
7ffff7fd9000-7ffff7ff9000 rw-p 00000000 00:00 0 
Size:                128 kB
...
Rss:                 128 kB
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here, &lt;code class=&quot;highlighter-rouge&quot;&gt;Rss&lt;/code&gt; is clearly being reported as 128 KiB for our allocation. Why is &lt;code class=&quot;highlighter-rouge&quot;&gt;VmRSS&lt;/code&gt; not agreeing?&lt;/p&gt;
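
&lt;p&gt;One way to make that disagreement concrete is to total the per-mapping &lt;code class=&quot;highlighter-rouge&quot;&gt;Rss&lt;/code&gt; lines and compare the sum with &lt;code class=&quot;highlighter-rouge&quot;&gt;VmRSS&lt;/code&gt;. Here is a small Python sketch of the summing step, assuming the &lt;code class=&quot;highlighter-rouge&quot;&gt;smaps&lt;/code&gt; layout shown above; &lt;code class=&quot;highlighter-rouge&quot;&gt;total_rss_kib&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def total_rss_kib(smaps_text):
    """Sum every per-mapping `Rss:   N kB` line in a /proc/PID/smaps dump."""
    total = 0
    for line in smaps_text.splitlines():
        # Only the per-mapping "Rss:" lines count; "Size:" and others are skipped.
        if line.startswith("Rss:"):
            total += int(line.split()[1])
    return total
```

&lt;p&gt;For the process in this post, the anonymous 128 KiB mapping alone contributes 128 to that total, which is already far more than the 4 kB that &lt;code class=&quot;highlighter-rouge&quot;&gt;VmRSS&lt;/code&gt; claims.&lt;/p&gt;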

&lt;p&gt;Armed with the knowledge that something funky is up with the RSS reporting, I turned to the time-honored tradition of grepping the kernel source for vague strings. In this case, searching for &lt;code class=&quot;highlighter-rouge&quot;&gt;rss&lt;/code&gt; quickly brings up a likely culprit in &lt;a href=&quot;https://elixir.bootlin.com/linux/v5.10/source/mm/memory.c#L191&quot;&gt;lines 191-200 of &lt;code class=&quot;highlighter-rouge&quot;&gt;mm/memory.c&lt;/code&gt;&lt;/a&gt; (as of kernel v5.10).&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cm&quot;&gt;/* sync counter once per 64 page faults */&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#define TASK_RSS_EVENTS_THRESH	(64)
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;check_sync_rss_stat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;task_struct&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unlikely&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;current&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
		&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;unlikely&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rss_stat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;events&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TASK_RSS_EVENTS_THRESH&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
		&lt;span class=&quot;n&quot;&gt;sync_mm_rss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;task&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#else &lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* SPLIT_RSS_COUNTING */&lt;/span&gt;&lt;span class=&quot;cp&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&quot;Sync counter once per 64 page faults&quot;: well, that'd do it. When, and why, was this code added?&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;((v5.10)) $ git log -L191,+1:mm/memory.c
commit 34e55232e59f7b19050267a05ff1226e5cd122a5
Author: KAMEZAWA Hiroyuki &amp;lt;kamezawa.hiroyu@jp.fujitsu.com&amp;gt;
Date:   Fri Mar 5 13:41:40 2010 -0800

    mm: avoid false sharing of mm_counter
    
    Considering the nature of per mm stats, it's the shared object among
    threads and can be a cache-miss point in the page fault path.
    
    This patch adds per-thread cache for mm_counter.  RSS value will be
    counted into a struct in task_struct and synchronized with mm's one at
    events.
    
    Now, in this patch, the event is the number of calls to handle_mm_fault.
    Per-thread value is added to mm at each 64 calls.
    
     rough estimation with small benchmark on parallel thread (2threads) shows
     [before]
         4.5 cache-miss/faults
     [after]
         4.0 cache-miss/faults
     Anyway, the most contended object is mmap_sem if the number of threads grows.
    
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: KAMEZAWA Hiroyuki &amp;lt;kamezawa.hiroyu@jp.fujitsu.com&amp;gt;
    Cc: Minchan Kim &amp;lt;minchan.kim@gmail.com&amp;gt;
    Cc: Christoph Lameter &amp;lt;cl@linux-foundation.org&amp;gt;
    Cc: Lee Schermerhorn &amp;lt;lee.schermerhorn@hp.com&amp;gt;
    Cc: David Rientjes &amp;lt;rientjes@google.com&amp;gt;
    Signed-off-by: Andrew Morton &amp;lt;akpm@linux-foundation.org&amp;gt;
    Signed-off-by: Linus Torvalds &amp;lt;torvalds@linux-foundation.org&amp;gt;

diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -125,0 +152,1 @@
+/* sync counter once per 64 page faults */
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, this is a performance optimization. Instead of all threads contending on updates to the same global RSS counters, each thread maintains its own counters and only updates the global counters every 64 page faults. From afar and without any deeper context, this seems reasonable.&lt;/p&gt;
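
&lt;p&gt;To get a feel for the effect, here is a toy Python model of the batching. It is only a sketch: the class and function names are invented for illustration, every fault is assumed to map exactly one new page, and the only pieces taken from the kernel source are the threshold of 64 and the post-increment comparison in &lt;code class=&quot;highlighter-rouge&quot;&gt;check_sync_rss_stat&lt;/code&gt;.&lt;/p&gt;

```python
# Toy model of split RSS counting. Names are illustrative, not kernel
# identifiers, apart from the threshold constant quoted above.
TASK_RSS_EVENTS_THRESH = 64


class SharedCounters:
    """Stands in for the process-wide RSS counter that VmRSS is derived from."""

    def __init__(self):
        self.rss_pages = 0


class ThreadCounters:
    """Stands in for one thread's private cache of pending RSS updates."""

    def __init__(self, shared):
        self.shared = shared
        self.pending_pages = 0
        self.events = 0

    def page_fault(self):
        # Mirror `events++ > THRESH`: compare the old count, then bump it.
        previous_events = self.events
        self.events += 1
        if previous_events > TASK_RSS_EVENTS_THRESH:
            self.sync()
        # Assumption: each fault brings in exactly one new page.
        self.pending_pages += 1

    def sync(self):
        # Flush the private count into the shared counter and start over.
        self.shared.rss_pages += self.pending_pages
        self.pending_pages = 0
        self.events = 0


def simulate(faults):
    """Return (pages visible in the shared counter, pages still pending)."""
    shared = SharedCounters()
    thread = ThreadCounters(shared)
    for _ in range(faults):
        thread.page_fault()
    return shared.rss_pages, thread.pending_pages
```

&lt;p&gt;In this model, &lt;code class=&quot;highlighter-rouge&quot;&gt;simulate(32)&lt;/code&gt; returns &lt;code class=&quot;highlighter-rouge&quot;&gt;(0, 32)&lt;/code&gt;: the 32 pages behind a 128 KiB array all sit in the per-thread cache, and none of them are visible in the shared counter.&lt;/p&gt;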

&lt;p&gt;Since Zig doesn't link libc in all its 1.8 MiB glory, our Zig program finishes executing having faulted fewer than 64 pages. Its per-thread counts are never flushed, so the global counters (and thus &lt;code class=&quot;highlighter-rouge&quot;&gt;VmHWM&lt;/code&gt;) incorrectly report it as having faulted only a single 4 KiB page.&lt;/p&gt;

&lt;p&gt;Three things stand out to me from this:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;There is no real workaround if you care about the HWM, like we do. &lt;code class=&quot;highlighter-rouge&quot;&gt;/proc/${pid}/smaps&lt;/code&gt; provides accurate info for RSS, but not HWM (which only makes sense globally, not per-segment).&lt;/li&gt;
  &lt;li&gt;There is no way to turn this off, even at compile time. For getting accurate results on &lt;a href=&quot;https://dmoj.ca&quot;&gt;dmoj.ca&lt;/a&gt;, we're now running a patched kernel where we quite literally comment this code out. Maintaining patched kernels makes me sad.&lt;/li&gt;
  &lt;li&gt;The global counters are not synced on thread exit. This one kind of sounds like a bug; one can imagine a program &lt;code class=&quot;highlighter-rouge&quot;&gt;mmap(2)&lt;/code&gt;ing a large chunk of memory and spinning up many threads that each fault 63 pages before exiting. &lt;code class=&quot;highlighter-rouge&quot;&gt;VmRSS&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;VmHWM&lt;/code&gt; sound like they'd be wildly off in this case.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It turns out we're not the first to run into this inaccuracy. Prior to a patch from October 2020 titled &quot;&lt;a href=&quot;https://lore.kernel.org/linux-man/20201012153313.GI29725@dhcp22.suse.cz/T/#m777c32932711d629353b3bb000695f8f6325fdc2&quot;&gt;Document inaccurate RSS due to &lt;code class=&quot;highlighter-rouge&quot;&gt;SPLIT_RSS_COUNTING&lt;/code&gt;&lt;/a&gt;&quot;, this behavior was totally undocumented. (The patch updates &lt;code class=&quot;highlighter-rouge&quot;&gt;man 5 proc&lt;/code&gt; with a note regarding the inaccuracy in RSS accounting, but as of this writing there's still no mention in &lt;code class=&quot;highlighter-rouge&quot;&gt;man 2 getrusage&lt;/code&gt;&lt;sup id=&quot;fnref:3&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.)&lt;/p&gt;

&lt;p&gt;The thread is worth a read, but in summary:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;There are weird cases where the accounting can be off by more than 63 pages per thread; and&lt;/li&gt;
  &lt;li&gt;There is uncertainty among the kernel maintainers about whether the performance benefit of split counters actually outweighs the poor accounting. That's a +1 from me, at least.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…and, that's all I got for today.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;This code does a &quot;classic&quot; competitive programming trick of pre-buffering the entire input in order to avoid calling &lt;code class=&quot;highlighter-rouge&quot;&gt;read(2)&lt;/code&gt; more than necessary. System calls are expensive! &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot;&gt;
      &lt;p&gt;Why not just use the &lt;code class=&quot;highlighter-rouge&quot;&gt;struct rusage *rusage&lt;/code&gt; populated by &lt;code class=&quot;highlighter-rouge&quot;&gt;wait4(2)&lt;/code&gt;, and grab &lt;code class=&quot;highlighter-rouge&quot;&gt;ru_maxrss&lt;/code&gt; from it instead of parsing &lt;code class=&quot;highlighter-rouge&quot;&gt;VmHWM&lt;/code&gt; out of &lt;code class=&quot;highlighter-rouge&quot;&gt;/proc/${pid}/status&lt;/code&gt;? This is subtle enough that &lt;a href=&quot;https://quantum2.xyz&quot;&gt;Guanzhong&lt;/a&gt; had to point it out while reading an early draft of this post, but &lt;code class=&quot;highlighter-rouge&quot;&gt;ru_maxrss&lt;/code&gt; is reset on &lt;code class=&quot;highlighter-rouge&quot;&gt;fork(2)&lt;/code&gt;, while &lt;code class=&quot;highlighter-rouge&quot;&gt;VmHWM&lt;/code&gt; is reset on &lt;code class=&quot;highlighter-rouge&quot;&gt;exec(2)&lt;/code&gt;. If we were to use &lt;code class=&quot;highlighter-rouge&quot;&gt;ru_maxrss&lt;/code&gt;, the minimum possible memory usage reported by a submission would be that of the judge process at &lt;code class=&quot;highlighter-rouge&quot;&gt;fork(2)&lt;/code&gt; time. The judge is written in Python, so this would be tens of megabytes. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot;&gt;
      &lt;p&gt;Time permitting, I intend to send in a patch updating &lt;code class=&quot;highlighter-rouge&quot;&gt;man 2 getrusage&lt;/code&gt;, and maybe another for syncing the counters on thread exit. Or maybe &lt;code class=&quot;highlighter-rouge&quot;&gt;getrusage(2)&lt;/code&gt; is somehow fine? I haven't checked this. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Here's a short systems debugging story. On dmoj.ca, we run user-submitted solutions to algorithmic programming problems against a set of input files, and judge their output for correctness. One metric by which solutions are ranked on our leaderboards is memory usage. A user recently reported that some code they had submitted was reported as having consumed 4 KiB of memory, despite their code allocating a 128 KiB array. How come? This is a story about how sometimes, the kernel lies about memory usage — all in the name of performance.</summary></entry><entry><title type="html">Peeking under the hood of GCC's `__builtin_expect`</title><link href="https://tbrindus.ca/how-builtin-expect-works/" rel="alternate" type="text/html" title="Peeking under the hood of GCC's `__builtin_expect`" /><published>2020-03-23T00:00:00+00:00</published><updated>2020-03-23T00:00:00+00:00</updated><id>https://tbrindus.ca/how-builtin-expect-works</id><content type="html" xml:base="https://tbrindus.ca/how-builtin-expect-works/">&lt;p&gt;If you've ever poked at high-performance C code, you've probably seen &lt;a href=&quot;https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect&quot;&gt;GCC's
&lt;code class=&quot;highlighter-rouge&quot;&gt;__builtin_expect&lt;/code&gt; extension&lt;/a&gt; being used to manually hint the likelihood of a
branch being taken a particular way.&lt;/p&gt;

&lt;p&gt;The Linux kernel famously contains macros for &lt;code class=&quot;highlighter-rouge&quot;&gt;likely&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;unlikely&lt;/code&gt; branches,
which perform the appropriate &lt;code class=&quot;highlighter-rouge&quot;&gt;__builtin_expect&lt;/code&gt; incantations.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#define unlikely(expr) __builtin_expect(!!(expr), 0)
#define likely(expr)   __builtin_expect(!!(expr), 1)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;…but, how does this all work? What does &quot;hinting&quot; mean, exactly, and how does
&lt;code class=&quot;highlighter-rouge&quot;&gt;__builtin_expect&lt;/code&gt; translate to generated assembly?&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Let's write a short exploratory program to find out.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;volatile&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;__builtin_expect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EXPECT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The program returns 1 if the number of parameters it is passed is even, and 0
otherwise (remember that the executable name is, &lt;a href=&quot;https://stackoverflow.com/questions/2050961/is-argv0-name-of-executable-an-accepted-standard-or-just-a-common-conventi&quot;&gt;&lt;em&gt;by convention&lt;/em&gt;&lt;/a&gt;, always
passed in &lt;code class=&quot;highlighter-rouge&quot;&gt;argv[0]&lt;/code&gt;). We use a compile-time define, &lt;code class=&quot;highlighter-rouge&quot;&gt;EXPECT&lt;/code&gt;, as a parameter to
&lt;code class=&quot;highlighter-rouge&quot;&gt;__builtin_expect&lt;/code&gt;. Our return value, &lt;code class=&quot;highlighter-rouge&quot;&gt;x&lt;/code&gt;, is marked as volatile to prevent the
compiler from optimizing it out.&lt;/p&gt;

&lt;p&gt;We can compile two versions of this binary: one with &lt;code class=&quot;highlighter-rouge&quot;&gt;EXPECT = 1&lt;/code&gt; and one
with &lt;code class=&quot;highlighter-rouge&quot;&gt;EXPECT = 0&lt;/code&gt;, and see how they differ. Recall that for &lt;code class=&quot;highlighter-rouge&quot;&gt;EXPECT = 1&lt;/code&gt;, we
are telling the compiler that we expect the &lt;code class=&quot;highlighter-rouge&quot;&gt;x = 1&lt;/code&gt; branch to be more likely,
and vice-versa.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;gcc &lt;span class=&quot;nt&quot;&gt;-g&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-O2&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-DEXPECT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;1 expect.c &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; expect1
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;gcc &lt;span class=&quot;nt&quot;&gt;-g&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-O2&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-DEXPECT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;0 expect.c &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; expect0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are many ways to view the generated assembly of a binary, but for this
post I'll be using an invocation of &lt;code class=&quot;highlighter-rouge&quot;&gt;gdb&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ gdb -batch -ex &quot;disassemble/m main&quot; ./expect1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;(If you are following along online, you can &lt;a href=&quot;https://gcc.godbolt.org/z/DtfPRa&quot;&gt;check out this code on gcc.godbolt.org&lt;/a&gt;, and play with the value of &lt;code class=&quot;highlighter-rouge&quot;&gt;EXPECT&lt;/code&gt; in the top-right box.)&lt;/p&gt;

&lt;p&gt;Without further ado, below is the disassembly of &lt;code class=&quot;highlighter-rouge&quot;&gt;expect1&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-asm&quot;&gt;Dump of assembler code for function main:
1	int main(int argc, char **argv) {

2	  volatile int x;

3	
4	  if (__builtin_expect(argc % 2, EXPECT)) {
   0x0000000000001040 &amp;lt;+0&amp;gt;:	and    edi,0x1
   0x0000000000001043 &amp;lt;+3&amp;gt;:	je     0x1052 &amp;lt;main+18&amp;gt;

5	    x = 1;
   0x0000000000001045 &amp;lt;+5&amp;gt;:	mov    DWORD PTR [rsp-0x4],0x1

6	  } else {
7	    x = 0;
   0x0000000000001052 &amp;lt;+18&amp;gt;:	mov    DWORD PTR [rsp-0x4],0x0
   0x000000000000105a &amp;lt;+26&amp;gt;:	jmp    0x104d &amp;lt;main+13&amp;gt;

8	  }
9	
10	  return x;
   0x000000000000104d &amp;lt;+13&amp;gt;:	mov    eax,DWORD PTR [rsp-0x4]
   0x0000000000001051 &amp;lt;+17&amp;gt;:	ret

End of assembler dump.
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A short summary of what's happening here:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;edi&lt;/code&gt; stores the value of &lt;code class=&quot;highlighter-rouge&quot;&gt;argc&lt;/code&gt; (&lt;a href=&quot;https://wiki.osdev.org/System_V_ABI#x86-64&quot;&gt;System V x86-64 ABI&lt;/a&gt;; recall &lt;code class=&quot;highlighter-rouge&quot;&gt;edi&lt;/code&gt; is
the lower 32 bits of &lt;code class=&quot;highlighter-rouge&quot;&gt;rdi&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;Since division is expensive, GCC has replaced our &lt;code class=&quot;highlighter-rouge&quot;&gt;% 2&lt;/code&gt; with &lt;code class=&quot;highlighter-rouge&quot;&gt;&amp;amp; 1&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;If the lowest bit in &lt;code class=&quot;highlighter-rouge&quot;&gt;argc&lt;/code&gt; is 0, &lt;code class=&quot;highlighter-rouge&quot;&gt;je&lt;/code&gt; will jump to a &lt;code class=&quot;highlighter-rouge&quot;&gt;mov&lt;/code&gt; for &lt;code class=&quot;highlighter-rouge&quot;&gt;x = 0&lt;/code&gt;, followed by a &lt;code class=&quot;highlighter-rouge&quot;&gt;jmp&lt;/code&gt; back to the end of &lt;code class=&quot;highlighter-rouge&quot;&gt;main&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Otherwise, execution falls through to the &lt;code class=&quot;highlighter-rouge&quot;&gt;mov&lt;/code&gt; for &lt;code class=&quot;highlighter-rouge&quot;&gt;x = 1&lt;/code&gt;, and then straight into the end of &lt;code class=&quot;highlighter-rouge&quot;&gt;main&lt;/code&gt; before
&lt;code class=&quot;highlighter-rouge&quot;&gt;ret&lt;/code&gt;-urning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still, there's nothing that immediately stands out for where the branch
predictor hinting is happening. There's no magical &lt;code class=&quot;highlighter-rouge&quot;&gt;hint&lt;/code&gt; instruction, at any
rate.&lt;/p&gt;

&lt;p&gt;What if we were to &lt;code class=&quot;highlighter-rouge&quot;&gt;diff&lt;/code&gt; the disassembly of &lt;code class=&quot;highlighter-rouge&quot;&gt;expect0&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;expect1&lt;/code&gt; instead?&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;gdb &lt;span class=&quot;nt&quot;&gt;-batch&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-ex&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;disassemble/m main&quot;&lt;/span&gt; ./expect1 &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; expect_1
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;gdb &lt;span class=&quot;nt&quot;&gt;-batch&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-ex&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;disassemble/m main&quot;&lt;/span&gt; ./expect0 &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; expect_0
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;git diff expect_0 expect_1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now we're getting somewhere!&lt;/p&gt;

&lt;div class=&quot;language-diff highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;gh&quot;&gt;diff --git a/expect_0 b/expect_1
index a80a1bd..17f3458 100644
&lt;/span&gt;&lt;span class=&quot;gd&quot;&gt;--- a/expect_0
&lt;/span&gt;&lt;span class=&quot;gi&quot;&gt;+++ b/expect_1
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;@@ -6,15 +6,15 @@&lt;/span&gt; Dump of assembler code for function main:
 3
 4        if (__builtin_expect(argc % 2, EXPECT)) {
    0x0000000000001040 &amp;lt;+0&amp;gt;:    and    edi,0x1
&lt;span class=&quot;gd&quot;&gt;-   0x0000000000001043 &amp;lt;+3&amp;gt;:    jne    0x1052 &amp;lt;main+18&amp;gt;
&lt;/span&gt;&lt;span class=&quot;gi&quot;&gt;+   0x0000000000001043 &amp;lt;+3&amp;gt;:    je     0x1052 &amp;lt;main+18&amp;gt;
&lt;/span&gt;
 5          x = 1;
&lt;span class=&quot;gd&quot;&gt;-   0x0000000000001052 &amp;lt;+18&amp;gt;:   mov    DWORD PTR [rsp-0x4],0x1
-   0x000000000000105a &amp;lt;+26&amp;gt;:   jmp    0x104d &amp;lt;main+13&amp;gt;
&lt;/span&gt;&lt;span class=&quot;gi&quot;&gt;+   0x0000000000001045 &amp;lt;+5&amp;gt;:    mov    DWORD PTR [rsp-0x4],0x1
&lt;/span&gt;
 6        } else {
 7          x = 0;
&lt;span class=&quot;gd&quot;&gt;-   0x0000000000001045 &amp;lt;+5&amp;gt;:    mov    DWORD PTR [rsp-0x4],0x0
&lt;/span&gt;&lt;span class=&quot;gi&quot;&gt;+   0x0000000000001052 &amp;lt;+18&amp;gt;:   mov    DWORD PTR [rsp-0x4],0x0
+   0x000000000000105a &amp;lt;+26&amp;gt;:   jmp    0x104d &amp;lt;main+13&amp;gt;
&lt;/span&gt;
 8        }
 9
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Clearly, the branch order is reversed, and &lt;code class=&quot;highlighter-rouge&quot;&gt;jne&lt;/code&gt; is used in place of &lt;code class=&quot;highlighter-rouge&quot;&gt;je&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In other words, the &quot;preferred path&quot; for the branch predictor, when it has no
historical branch data to base a prediction off of, is the fall-through path
(i.e. the code placed directly after the conditional jump). &lt;code class=&quot;highlighter-rouge&quot;&gt;__builtin_expect&lt;/code&gt; then simply reorders code such that
the fall-through path contains the programmer-specified most-likely path, and
inverts the jump condition (&lt;code class=&quot;highlighter-rouge&quot;&gt;je&lt;/code&gt; versus &lt;code class=&quot;highlighter-rouge&quot;&gt;jne&lt;/code&gt;) as necessary.&lt;/p&gt;
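
&lt;p&gt;As a hedged illustration of how this tends to be used in practice, here is a minimal sketch built on the kernel-style &lt;code class=&quot;highlighter-rouge&quot;&gt;likely&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;unlikely&lt;/code&gt; macros mentioned at the start of this post. The function &lt;code class=&quot;highlighter-rouge&quot;&gt;sum_or_error&lt;/code&gt; is made up for this example, and the exact layout GCC picks will depend on compiler version and flags:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;

#define likely(expr)   __builtin_expect(!!(expr), 1)
#define unlikely(expr) __builtin_expect(!!(expr), 0)

/* Hypothetical example: sums n ints, treating a NULL buffer as a rare error. */
long sum_or_error(const int *values, size_t n) {
  long total = 0;

  /* Hint that the error branch is cold, so the compiler is free to move it
     out of line and keep the loop below on the fall-through path. */
  if (unlikely(values == NULL)) {
    return -1;
  }

  for (size_t i = 0; i &amp;lt; n; i++) {
    total += values[i];
  }

  return total;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Diffing the disassembly with and without the hint, as done above, is the easiest way to check whether it changed anything for a given function.&lt;/p&gt;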

&lt;p&gt;To me at least, this behaviour &lt;em&gt;does&lt;/em&gt; seem pretty magical, and I was surprised
that I couldn't readily find it mentioned online in the conventional
sources of programmer wisdom (i.e. StackOverflow).&lt;/p&gt;

&lt;p&gt;If one digs deep enough in the &lt;a href=&quot;https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf&quot;&gt;Intel 64 and IA-32 Architectures Optimization
Reference Manual&lt;/a&gt;, one can find a reference for this behaviour on page
105 (emphasis mine):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3.4.1.6 Branch Type Selection&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;The default predicted target for indirect branches and calls is the
fall-through path.&lt;/strong&gt; Fall-through prediction is overridden if and when a hardware
prediction is available for that branch. The predicted branch target from branch
prediction hardware for an indirect branch is the previously executed branch
target.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The more you know.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">If you've ever poked at high-performance C code, you've probably seen GCC's __builtin_expect extension being used to manually hint the likelihood of a branch being taken a particular way. The Linux kernel famously contains macros for likely and unlikely branches, which perform the appropriate __builtin_expect incantations. #define unlikely(expr) __builtin_expect(!!(expr), 0) #define likely(expr) __builtin_expect(!!(expr), 1) …but, how does this all work? What does &quot;hinting&quot; mean, exactly, and how does __builtin_expect translate to generated assembly?</summary></entry><entry><title type="html">On online judging, part 5: optimizing `ptrace` filtering with `seccomp`</title><link href="https://tbrindus.ca/on-online-judging-part-5/" rel="alternate" type="text/html" title="On online judging, part 5: optimizing `ptrace` filtering with `seccomp`" /><published>2019-01-04T00:00:00+00:00</published><updated>2019-01-04T00:00:00+00:00</updated><id>https://tbrindus.ca/on-online-judging-part-5</id><content type="html" xml:base="https://tbrindus.ca/on-online-judging-part-5/">&lt;p&gt;In &lt;a href=&quot;https://tbrindus.ca/on-online-judging-part-1/&quot;&gt;part 1 of this series&lt;/a&gt;, I mentioned that the overhead of a pure
&lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt;-based sandbox is about 10%. In hindsight, this number is very optimistic — it can be as high as 50%
for some workloads — but understanding &lt;em&gt;why&lt;/em&gt; requires a bit of background on how the judge keeps track of
submission time.&lt;/p&gt;

&lt;p&gt;In this post, we'll discuss both submission time-keeping, and a simple but effective method to reduce sandboxing
overhead using &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; alongside &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt;.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;understanding-the-problem&quot;&gt;Understanding the Problem&lt;/h2&gt;
&lt;p&gt;Any judge needs to keep track of how long a submission runs, to implement things like time limits.&lt;/p&gt;

&lt;p&gt;A simple method of accounting for time spent in a submission involves continuously waiting for a process to be
signalled, and keeping track of how long was spent &lt;code class=&quot;highlighter-rouge&quot;&gt;wait&lt;/code&gt;-ing. This is more &quot;fair&quot; than strictly timing how long
it takes until the process exits, since it excludes the time spent filtering syscalls in the judge code.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;total_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;start&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;wait&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;process&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;be&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;signalled&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;total_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;total_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time_limit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;kill&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;process&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;# If signal was for a syscall event (SIGTRAP) and not SIGWINCH, validate it
&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;resume&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;process&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We receive signals for all syscall invocations when &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt;-ing, but in case a process doesn't make any for a long
time, we have also set up a task to periodically send &lt;code class=&quot;highlighter-rouge&quot;&gt;SIGWINCH&lt;/code&gt; (a harmless signal that's normally ignored) just to
force &lt;code class=&quot;highlighter-rouge&quot;&gt;wait&lt;/code&gt; to return for time-keeping purposes.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;signal&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;process&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SIGWINCH&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, it's very simple to compute a naive measure of overhead: divide &lt;code class=&quot;highlighter-rouge&quot;&gt;total_time&lt;/code&gt; by the total CPU time used
by the process. Indeed, for most submissions, this figure will be less than 10% — but it doesn't tell the
whole story. What we don't (and can't) easily measure is the overhead of the context switch from the submission to the
tracer.&lt;/p&gt;

&lt;p&gt;Every time a context switch happens, the submission's performance suffers from the invalidation of the memory
it was relying on to be cached. For instance, tight loops over multi-dimensional matrices are very common, so
you'll often see user code that looks like this:&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# 10005 by 10005 matrix
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10005&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10005&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matrix&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;# Some computation using the indices
&lt;/span&gt;    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;some&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;condition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# This will cause a `write` syscall
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This code should be very easy for the processor to optimize: the memory accesses are predictable, and there would likely
be very few page faults when running, as a result of the memory pages used (for &lt;code class=&quot;highlighter-rouge&quot;&gt;matrix&lt;/code&gt;) being in the processor's TLB
(cache).&lt;/p&gt;

&lt;p&gt;When we force a context switch, however, we greatly increase the likelihood of those cache entries being flushed by
the time we resume it (on processors without process-context identifiers, e.g. pre-2010 Intel processors, we basically
guarantee a TLB flush). So, when the process resumes, it will bear the overhead of having to miss the cache for memory
accesses that &lt;em&gt;should&lt;/em&gt; have been in cache if running without the tracer.&lt;/p&gt;

&lt;p&gt;As a result, when comparing the time the process takes to execute without the tracer versus with the tracer, we can
(for some extreme cases) see 2x or greater speedups, despite the overhead continuing to be reported as less than 10%.&lt;/p&gt;

&lt;h2 id=&quot;a-brief-introduction-to-seccomp&quot;&gt;A Brief Introduction to &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Now that we've dissected the problem, let's talk about the solution: &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; was initially introduced in kernel version 2.6.12 (2005), and allowed a one-way transition for a process to a
secure state where it could only invoke the &lt;code class=&quot;highlighter-rouge&quot;&gt;read&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;write&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;exit&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;sigreturn&lt;/code&gt; syscalls. This may be useful for
some secure computing projects, but it's not enough for running modern runtimes like Python. Thankfully, that's no longer the case.&lt;/p&gt;

&lt;p&gt;In 2012, as part of the 3.5 kernel, &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; was extended to provide programmable BPF filters. With this new
functionality, it's possible to write more expressive filters to implement syscall validation in the kernel, without
trapping into &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt;. &lt;a href=&quot;https://lwn.net/Articles/656307/&quot;&gt;This LWN article&lt;/a&gt; provides a good, accessible overview
of how &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; BPF works.&lt;/p&gt;

&lt;p&gt;The key point is that we can add rules to &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; to unconditionally allow very frequent, safe syscalls, and instruct it to
hand control of the process over to a &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt; tracer when a dangerous syscall is encountered.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;// In the tracer:&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ptrace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PTRACE_SETOPTIONS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;NULL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PTRACE_O_TRACESECCOMP&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* instead of PTRACE_O_TRACESYSGOOD */&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// In the submission, before `exec`-ing:&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;scmp_filter_ctx&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seccomp_init&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SCMP_ACT_TRACE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;seccomp_rule_add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCMP_ACT_ALLOW&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCMP_SYS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;seccomp_rule_add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCMP_ACT_ALLOW&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SCMP_SYS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;seccomp_load&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;seccomp_release&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;execve&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;With the above filter, &lt;code class=&quot;highlighter-rouge&quot;&gt;read&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;write&lt;/code&gt; will always be unconditionally allowed, while e.g. &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt; will cause a &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt;
event and stop the process for inspection. That's just what we need!&lt;/p&gt;
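
&lt;p&gt;On the tracer side, such a stop has to be told apart from ordinary signal stops. Here is a minimal sketch, assuming the wait-status encoding documented in the &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace(2)&lt;/code&gt; man page. The helper name &lt;code class=&quot;highlighter-rouge&quot;&gt;is_seccomp_stop&lt;/code&gt; is made up for illustration and is not taken from the judge's source:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;signal.h&amp;gt;
#include &amp;lt;sys/ptrace.h&amp;gt;
#include &amp;lt;sys/wait.h&amp;gt;

/* Hypothetical helper: returns nonzero if `status` (as filled in by waitpid)
 * describes a PTRACE_EVENT_SECCOMP stop, i.e. a syscall that the filter
 * answered with a trace action. Assumes PTRACE_O_TRACESECCOMP was set. */
int is_seccomp_stop(int status) {
  return WIFSTOPPED(status) &amp;amp;&amp;amp;
         (status &amp;gt;&amp;gt; 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP &amp;lt;&amp;lt; 8));
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Syscalls that the filter allows never produce such a stop in the first place, which is where the savings come from.&lt;/p&gt;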

&lt;h2 id=&quot;comparing-ptrace-and-ptrace--seccomp-tracing&quot;&gt;Comparing &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt; + &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; Tracing&lt;/h2&gt;
&lt;p&gt;So far, we've established that &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt; is slow, and that &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; can make things better.&lt;/p&gt;

&lt;p&gt;However, their interfaces are somewhat different, so it's worth discussing how standard &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt; actions (like cancelling syscalls
and changing their return values) can be implemented with &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt;-enabled &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's say a user submission wants to &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt; a file, and we end up denying it because it's a file they shouldn't be accessing. We want &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt; to
return &lt;code class=&quot;highlighter-rouge&quot;&gt;ENOENT&lt;/code&gt;, so that the user submission can respond accordingly. Here's roughly what happens under the hood with &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;user submission calls &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;[pre-syscall event transfers control to judge]&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;judge reads syscall number and arguments from registers; validates them&lt;/li&gt;
  &lt;li&gt;judge sets syscall number to something harmless and fast (&lt;code class=&quot;highlighter-rouge&quot;&gt;getpid&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;[judge resumes process]&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;kernel executes &lt;code class=&quot;highlighter-rouge&quot;&gt;getpid&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;[post-syscall event transfers control to judge]&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;judge sets return value register to &lt;code class=&quot;highlighter-rouge&quot;&gt;ENOENT&lt;/code&gt;, thereby &quot;cancelling&quot; the &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt; syscall&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;[judge resumes process]&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;user submission's &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt; call returns with &lt;code class=&quot;highlighter-rouge&quot;&gt;ENOENT&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can see that for every syscall the user submission performs, we trap and stop the process two times. This is not at all
cache-friendly to the submission.&lt;/p&gt;

&lt;p&gt;When tracing with &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt;-enabled &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt;, however, we do not have pre- and post-syscall events: we are only notified
before a syscall takes place. To support functionality like cancelling syscalls, &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; allows tracers to
set the syscall register to &lt;code class=&quot;highlighter-rouge&quot;&gt;-1&lt;/code&gt; on the pre-syscall event. This instructs the kernel to skip the syscall,
returning the register set as-is.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;user submission calls &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;[pre-syscall event transfers control to judge]&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;judge reads syscall number and arguments from registers; validates them&lt;/li&gt;
  &lt;li&gt;judge sets syscall number to &lt;code class=&quot;highlighter-rouge&quot;&gt;-1&lt;/code&gt;, and return value register to &lt;code class=&quot;highlighter-rouge&quot;&gt;ENOENT&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;[judge resumes process]&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;user submission's &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt; call returns with &lt;code class=&quot;highlighter-rouge&quot;&gt;ENOENT&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
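
&lt;p&gt;The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it targets x86-64 Linux (where the syscall number lives in &lt;code class=&quot;highlighter-rouge&quot;&gt;orig_rax&lt;/code&gt; and the return value in &lt;code class=&quot;highlighter-rouge&quot;&gt;rax&lt;/code&gt;), the function names &lt;code class=&quot;highlighter-rouge&quot;&gt;plan_deny&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;deny_syscall&lt;/code&gt; are made up, and it is not the judge's actual code. Note that at the register level, errors are returned as negated &lt;code class=&quot;highlighter-rouge&quot;&gt;errno&lt;/code&gt; values.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;#include &amp;lt;errno.h&amp;gt;
#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;sys/ptrace.h&amp;gt;
#include &amp;lt;sys/types.h&amp;gt;
#include &amp;lt;sys/user.h&amp;gt;

/* Hypothetical helper (x86-64 only): rewrites a register snapshot so that the
 * pending syscall is skipped and appears to fail with `err` (e.g. ENOENT). */
void plan_deny(struct user_regs_struct *regs, int err) {
  regs-&amp;gt;orig_rax = (unsigned long long)-1;           /* syscall number -1: skip */
  regs-&amp;gt;rax = (unsigned long long)-(long long)err;   /* kernel ABI uses -errno */
}

/* To be called while `pid` sits in a seccomp stop. Returns 0 on success. */
int deny_syscall(pid_t pid, int err) {
  struct user_regs_struct regs;

  if (ptrace(PTRACE_GETREGS, pid, NULL, &amp;amp;regs) == -1) return -1;
  plan_deny(&amp;amp;regs, err);
  if (ptrace(PTRACE_SETREGS, pid, NULL, &amp;amp;regs) == -1) return -1;

  /* Resume the process; no post-syscall stop is requested. */
  return ptrace(PTRACE_CONT, pid, NULL, NULL) == -1 ? -1 : 0;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;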

&lt;p&gt;Here, we only have two context switches as opposed to four, but more importantly, we only run through this logic
on unsafe syscalls like &lt;code class=&quot;highlighter-rouge&quot;&gt;open&lt;/code&gt;. Syscalls that dominate a submission's lifespan, like &lt;code class=&quot;highlighter-rouge&quot;&gt;read&lt;/code&gt; or &lt;code class=&quot;highlighter-rouge&quot;&gt;write&lt;/code&gt;, never trigger
&lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; to signal the process. That's a huge win for performance.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;As of January 2019, we've been taking a &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt; + &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace&lt;/code&gt; approach in the sandbox for the &lt;a href=&quot;https://dmoj.ca/&quot;&gt;DMOJ&lt;/a&gt;,
so as always you may &lt;a href=&quot;https://github.com/DMOJ/judge/tree/master/dmoj/cptbox&quot;&gt;peek at the source code&lt;/a&gt; to see how the
ideas expressed in this post can be implemented in practice.&lt;/p&gt;

&lt;p&gt;Empirically, speedups have been noticeable since upgrading the sandbox to use &lt;code class=&quot;highlighter-rouge&quot;&gt;seccomp&lt;/code&gt;. They are particularly apparent
for interactive tasks, which require frequent flushing of standard output, but can be felt to some extent across most problems.&lt;/p&gt;

&lt;p&gt;There are further optimizations possible with this approach (for instance, the order in which rules are added to the filter matters,
since they're evaluated in the order they were added, so there's an incentive to have the most common syscalls listed
first), but perhaps they'll be the subject of a future post.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">In part 1 of this series, I mentioned that the overhead of a pure ptrace-based sandbox is about 10%. In hindsight, this number is very optimistic — it can be as high as 50% for some workloads — but understanding why requires a bit of background on how the judge keeps track of submission time. In this post, we'll discuss both submission time-keeping, and a simple but effective method to reduce sandboxing overhead using seccomp alongside ptrace.</summary></entry><entry><title type="html">Emulating microprocessors with macros</title><link href="https://tbrindus.ca/emulating-microprocessors-with-macros/" rel="alternate" type="text/html" title="Emulating microprocessors with macros" /><published>2018-12-11T00:00:00+00:00</published><updated>2018-12-11T00:00:00+00:00</updated><id>https://tbrindus.ca/emulating-microprocessors-with-macros</id><content type="html" xml:base="https://tbrindus.ca/emulating-microprocessors-with-macros/">&lt;p&gt;Whenever I work on an emulator (having written several in the past), I try to make my life as interesting as possible. After all, implementing hundreds of opcodes can be a very dull task.&lt;/p&gt;

&lt;p&gt;Most recently, I joked that C macros were powerful enough for it to be feasible to implement a simple architecture in them. One thing led to another, with the result being &lt;a href=&quot;https://github.com/Xyene/macro8080&quot;&gt;an Intel 8080 emulator core implemented purely with C macros&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, I'll go over the awful hacks that helped make this monstrosity a reality… and why perhaps it's not such a bad idea to write an emulator in macros.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;quick-intel-8080-refresher&quot;&gt;Quick Intel 8080 Refresher&lt;/h2&gt;

&lt;p&gt;The Intel 8080 is an 8-bit microprocessor with a 16-bit address bus, released back in 1974. It features 8-bit general-purpose registers &lt;code class=&quot;highlighter-rouge&quot;&gt;B&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;C&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;D&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;E&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;H&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;L&lt;/code&gt;, alongside an accumulator &lt;code class=&quot;highlighter-rouge&quot;&gt;A&lt;/code&gt;. The general-purpose registers can be treated as 16-bit &quot;pairs&quot; &lt;code class=&quot;highlighter-rouge&quot;&gt;BC&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;DE&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;HL&lt;/code&gt;, allowing for 16-bit operations to be performed. Memory is addressed through the register pair &lt;code class=&quot;highlighter-rouge&quot;&gt;HL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A table of supported opcodes and their mnemonics may be found &lt;a href=&quot;http://pastraiser.com/cpu/i8080/i8080_opcodes.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;motivation-for-macros&quot;&gt;Motivation for Macros&lt;/h2&gt;

&lt;p&gt;Macros don't just needlessly complicate the development of an emulator. There is, in fact, a very tangible benefit to using macros in implementing instructions — performance.&lt;/p&gt;

&lt;p&gt;A typical 8080 instruction is encoded as an 8-bit value, with register operands embedded in the opcode. The encoding for all &lt;code class=&quot;highlighter-rouge&quot;&gt;MOV&lt;/code&gt; instructions is shown below.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;01 aaa bbb
|   |   └── source register
|   └── destination register
└── MOV prefix   
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For instance, the assembly &lt;code class=&quot;highlighter-rouge&quot;&gt;MOV A, B&lt;/code&gt; is encoded as &lt;code class=&quot;highlighter-rouge&quot;&gt;01 111 000&lt;/code&gt;; &lt;code class=&quot;highlighter-rouge&quot;&gt;111&lt;/code&gt; identifies register &lt;code class=&quot;highlighter-rouge&quot;&gt;A&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;000&lt;/code&gt; identifies register &lt;code class=&quot;highlighter-rouge&quot;&gt;B&lt;/code&gt;.
A traditional emulator might implement &lt;code class=&quot;highlighter-rouge&quot;&gt;MOV&lt;/code&gt; with a generic, lookup-based approach:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MOV&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;opcode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;src&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;opcode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;opcode&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mh&quot;&gt;0x7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;set_reg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dst&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_reg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// set_reg and get_reg are switch-based lookups&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From a software engineering standpoint, this is an ideal implementation. &lt;code class=&quot;highlighter-rouge&quot;&gt;MOV&lt;/code&gt; is succinct, and &lt;code class=&quot;highlighter-rouge&quot;&gt;set_reg&lt;/code&gt;/&lt;code class=&quot;highlighter-rouge&quot;&gt;get_reg&lt;/code&gt; are dedicated helpers that can be reused in future code. A+ for code quality.&lt;/p&gt;

&lt;p&gt;And yet, this approach is suboptimal when we're optimizing for performance rather than readability. There is a large overhead (in terms of host machine cycles) for the &lt;code class=&quot;highlighter-rouge&quot;&gt;set_reg&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;get_reg&lt;/code&gt; routines that can't easily be eliminated. The compiler might end up inlining the routines and removing the overhead of the two calls, but the approach still requires mapping a 3-bit register ID to the actual register, twice per instruction.&lt;/p&gt;

&lt;p&gt;Instead, what if we used macros to generate code for all possible variants of an instruction? Illustrating with an example, &lt;code class=&quot;highlighter-rouge&quot;&gt;MOV&lt;/code&gt; can be implemented like this:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#define MOV(X, Y) \
{                 \
    X = Y;        \
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of course, we need to make sure that we generate code for all valid permutations of &lt;code class=&quot;highlighter-rouge&quot;&gt;X&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;Y&lt;/code&gt;, but this is easy to do programmatically.&lt;/p&gt;
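&lt;p&gt;As a rough sketch of what &quot;programmatically&quot; could look like, the X-macro pattern below stamps out one specialized handler per register pair. The &lt;code class=&quot;highlighter-rouge&quot;&gt;regs_t&lt;/code&gt; struct and the &lt;code class=&quot;highlighter-rouge&quot;&gt;FOR_EACH_SRC&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;FOR_EACH_DST&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;DEFINE_MOV&lt;/code&gt; names are invented for this illustration and are not taken from macro8080. It only covers register-to-register moves and ignores the memory operand.&lt;/p&gt;

```c
#include <stdint.h>

/* Hypothetical register file; field names mirror the 8080's 8-bit registers. */
typedef struct {
  uint8_t A, B, C, D, E, H, L;
} regs_t;

/* Inner list: pairs one fixed destination with every source register. */
#define FOR_EACH_SRC(X, dst) \
  X(dst, A) X(dst, B) X(dst, C) X(dst, D) X(dst, E) X(dst, H) X(dst, L)

/* Outer list: repeats the inner list for every destination register. */
#define FOR_EACH_DST(X)                                       \
  FOR_EACH_SRC(X, A) FOR_EACH_SRC(X, B) FOR_EACH_SRC(X, C)    \
  FOR_EACH_SRC(X, D) FOR_EACH_SRC(X, E) FOR_EACH_SRC(X, H)    \
  FOR_EACH_SRC(X, L)

/* Emits mov_<dst>_<src>(), a handler with both operands fixed at compile time. */
#define DEFINE_MOV(dst, src) \
  static inline void mov_##dst##_##src(regs_t *r) { r->dst = r->src; }

/* Expands to 49 functions: mov_A_A, mov_A_B, ..., mov_L_L. */
FOR_EACH_DST(DEFINE_MOV)
```

&lt;p&gt;Each generated handler touches exactly two fixed registers, so no register ID has to be decoded at run time. Two lists are used because a C macro cannot invoke itself recursively.&lt;/p&gt;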

&lt;h2 id=&quot;dispatching-instructions-to-macros&quot;&gt;Dispatching Instructions to Macros&lt;/h2&gt;

&lt;p&gt;At the core of every emulator is a tight loop that increments the program counter, fetches an opcode, and transfers control to the appropriate opcode handler. If all our opcodes are implemented as macros, how can we achieve this?&lt;/p&gt;

&lt;p&gt;One option is to make use of GCC's support for &lt;a href=&quot;https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html&quot;&gt;taking the address of labels&lt;/a&gt; by generating a label for each opcode, and storing its address in a lookup table that we can later branch into. A straightforward application of this idea would look something like this:&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#define DONE goto done_opcode;
#define MOV(X, Y) \
{                 \
  X = Y;          \
  DONE;           \
}
&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;run_forever&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// Declare all registers, memory, etc.&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  
  &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ops&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MOV_A_B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MOV_A_C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MOV_A_D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;goto&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ops&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;memory&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]];&lt;/span&gt;
    &lt;span class=&quot;nl&quot;&gt;done_opcode:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;nl&quot;&gt;MOV_A_B:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;MOV&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;B&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nl&quot;&gt;MOV_A_C:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;MOV&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;nl&quot;&gt;MOV_A_D:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;MOV&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The observant reader will notice that it's almost as if we're treating &lt;code class=&quot;highlighter-rouge&quot;&gt;MOV&lt;/code&gt; as a function, substituting its invocation and return with &lt;code class=&quot;highlighter-rouge&quot;&gt;goto&lt;/code&gt;s. If we didn't want to rely on GCC-specific extensions, we could instead implement this functionality with regular functions. That said, by branching within the same function, we avoid the overhead that stack frame management imposes.&lt;/p&gt;

&lt;p&gt;You can &lt;a href=&quot;https://gcc.godbolt.org/z/UCntF9&quot;&gt;check out this example on godbolt.org&lt;/a&gt; to view a colorized disassembly of the above code. It's worth noting that in the end, the greatest overhead for our &lt;code class=&quot;highlighter-rouge&quot;&gt;MOV&lt;/code&gt; instructions becomes &lt;code class=&quot;highlighter-rouge&quot;&gt;goto *ops[memory[PC++]]&lt;/code&gt;, which is an operation we'd have to perform regardless of how our opcode was implemented — good!&lt;/p&gt;
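&lt;p&gt;For a version that can be compiled and run on its own, here is a minimal sketch of the same dispatch structure on a toy machine. The opcode numbering, the &lt;code class=&quot;highlighter-rouge&quot;&gt;run&lt;/code&gt; function, and its parameters are made up for illustration and do not match the real 8080 encoding. It assumes GCC or Clang (for labels as values) and does no bounds checking on opcodes.&lt;/p&gt;

```c
#include <stdint.h>

/* Toy opcode numbering, invented for this sketch (not the 8080 encoding). */
enum { OP_HALT, OP_MOV_A_B, OP_MOV_B_C, OP_INR_A };

#define DONE goto done_opcode

/* Runs `program` until OP_HALT and returns the final value of A. */
uint8_t run(const uint8_t *program, uint8_t b, uint8_t c) {
  /* Label addresses indexed by opcode; order must match the enum above. */
  static void *ops[] = { &&halt, &&mov_a_b, &&mov_b_c, &&inr_a };
  uint8_t A = 0, B = b, C = c;
  uint16_t PC = 0;

  while (1) {
    goto *ops[program[PC++]];  /* fetch and dispatch in one step */
  done_opcode: ;
  }

mov_a_b:
  A = B;
  DONE;
mov_b_c:
  B = C;
  DONE;
inr_a:
  A++;
  DONE;
halt:
  return A;
}
```

&lt;p&gt;Running the program &lt;code class=&quot;highlighter-rouge&quot;&gt;{ OP_MOV_B_C, OP_MOV_A_B, OP_INR_A, OP_HALT }&lt;/code&gt; with &lt;code class=&quot;highlighter-rouge&quot;&gt;b = 1&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;c = 41&lt;/code&gt; copies 41 through &lt;code class=&quot;highlighter-rouge&quot;&gt;B&lt;/code&gt; into &lt;code class=&quot;highlighter-rouge&quot;&gt;A&lt;/code&gt;, increments it, and returns 42.&lt;/p&gt;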

&lt;h2 id=&quot;handling-register-pairs&quot;&gt;Handling Register Pairs&lt;/h2&gt;

&lt;p&gt;As I mentioned earlier, the 8080 is an 8-bit processor that can work on 16-bit data in the form of &quot;register pairs&quot; &lt;code class=&quot;highlighter-rouge&quot;&gt;BC&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;DE&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;HL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If we want to be able to use macros everywhere, we need some efficient way to enforce consistency between the values assigned to &lt;code class=&quot;highlighter-rouge&quot;&gt;BC&lt;/code&gt;, and the values assigned to the individual &lt;code class=&quot;highlighter-rouge&quot;&gt;B&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;C&lt;/code&gt; registers. A simple solution here is to make use of a union of two 8-bit values and a 16-bit value, and some macros to hide the underlying complexity. Thankfully, &lt;a href=&quot;https://gcc.godbolt.org/z/CXjK7G&quot;&gt;GCC does a good job of optimizing these accesses&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;union&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;uint16_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regpair_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;register&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;uint8_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;register&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;regpair_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;de&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;cp&quot;&gt;#define B bc.reg[1]
#define C bc.reg[0]
#define BC bc.pair
#define D de.reg[1]
#define E de.reg[0]
#define DE de.pair
#define H hl.reg[1]
#define L hl.reg[0]
#define HL hl.pair
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This will magically work on little-endian systems, due to the order in which the bytes of &lt;code class=&quot;highlighter-rouge&quot;&gt;pair&lt;/code&gt; are stored. You're out of luck on a big-endian system, but those are pretty rare to come by these days.&lt;/p&gt;
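&lt;p&gt;If big-endian hosts ever mattered, one possible approach is to pick the array indices at compile time. This is only a sketch: it relies on the &lt;code class=&quot;highlighter-rouge&quot;&gt;__BYTE_ORDER__&lt;/code&gt; predefined macro from GCC and Clang, falls back to assuming little-endian when that macro is missing, and the &lt;code class=&quot;highlighter-rouge&quot;&gt;HI_INDEX&lt;/code&gt;/&lt;code class=&quot;highlighter-rouge&quot;&gt;LO_INDEX&lt;/code&gt; names are made up here.&lt;/p&gt;

```c
#include <stdint.h>

typedef union {
  uint16_t pair;
  uint8_t reg[2];
} regpair_t;

/* Choose which byte of `pair` holds the high half on this host. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HI_INDEX 0
#define LO_INDEX 1
#else /* assumption: anything else is treated as little-endian */
#define HI_INDEX 1
#define LO_INDEX 0
#endif

static regpair_t bc;

#define B bc.reg[HI_INDEX]
#define C bc.reg[LO_INDEX]
#define BC bc.pair
```

&lt;p&gt;With these definitions, assigning &lt;code class=&quot;highlighter-rouge&quot;&gt;BC = 0x1234&lt;/code&gt; leaves &lt;code class=&quot;highlighter-rouge&quot;&gt;B&lt;/code&gt; as &lt;code class=&quot;highlighter-rouge&quot;&gt;0x12&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;C&lt;/code&gt; as &lt;code class=&quot;highlighter-rouge&quot;&gt;0x34&lt;/code&gt; on either byte order.&lt;/p&gt;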

&lt;h2 id=&quot;putting-it-all-together&quot;&gt;Putting it All Together&lt;/h2&gt;

&lt;p&gt;With all the plumbing done, all that's left is to implement the rest of the opcodes. This isn't particularly hard, nor is it enlightening. One thing that deserves special attention is that due to our macros resembling functions, it is easy to forget that they are, in fact, macros — and that any use of a parameter re-evaluates it. This can lead to very subtle, hard-to-find bugs.&lt;/p&gt;
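&lt;p&gt;To make the re-evaluation pitfall concrete, here is a small sketch. The &lt;code class=&quot;highlighter-rouge&quot;&gt;FETCH&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;LOAD_BUGGY&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;LOAD_SAFE&lt;/code&gt; macros and the state they touch are hypothetical, written for this example rather than taken from macro8080.&lt;/p&gt;

```c
#include <stdint.h>

/* Hypothetical emulator state, made up for this example. */
static uint8_t memory[] = { 0x00, 0x07 };
static uint16_t PC;
static int zero_flag;

/* Reads the byte at PC, then advances PC (a side effect). */
#define FETCH() (memory[PC++])

/* Buggy: X is expanded twice, so an argument like FETCH() runs twice. */
#define LOAD_BUGGY(DST, X)  \
  do {                      \
    (DST) = (X);            \
    zero_flag = ((X) == 0); \
  } while (0)

/* Safer: evaluate X exactly once into a temporary, then reuse it. */
#define LOAD_SAFE(DST, X)      \
  do {                         \
    uint8_t value_ = (X);      \
    (DST) = value_;            \
    zero_flag = (value_ == 0); \
  } while (0)
```

&lt;p&gt;Starting from &lt;code class=&quot;highlighter-rouge&quot;&gt;PC = 0&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;LOAD_BUGGY(a, FETCH())&lt;/code&gt; advances &lt;code class=&quot;highlighter-rouge&quot;&gt;PC&lt;/code&gt; by two and computes the zero flag from the second byte, while &lt;code class=&quot;highlighter-rouge&quot;&gt;LOAD_SAFE(a, FETCH())&lt;/code&gt; advances it once and flags the byte that was actually loaded.&lt;/p&gt;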

&lt;p&gt;In the end, a purely macro-based approach appears to be about an order of magnitude faster than a traditional function approach. This was tested (admittedly not rigorously) by measuring the time it took to complete all the test runs of the famous &lt;code class=&quot;highlighter-rouge&quot;&gt;8080EXER.COM&lt;/code&gt; program between &lt;a href=&quot;https://github.com/Xyene/macro8080&quot;&gt;macro8080&lt;/a&gt; (the embodiment of the ideas expressed in this post) and several other hobby emulators found on GitHub.&lt;/p&gt;

&lt;p&gt;I guess you &lt;em&gt;can&lt;/em&gt; sacrifice readability for performance!&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Whenever I work on an emulator (having written several in the past), I try to make my life as interesting as possible. After all, implementing hundreds of opcodes can be a very dull task. Most recently, I joked that C macros were powerful enough for it to be feasible to implement a simple architecture in them. One thing led to another, with the result being an Intel 8080 emulator core implemented purely with C macros. In this post, I'll go over the awful hacks that helped make this monstrosity a reality… and why perhaps it's not such a bad idea to write an emulator in macros.</summary></entry><entry><title type="html">Correct usage of `LD_PRELOAD` for hooking `libc` functions</title><link href="https://tbrindus.ca/correct-ld-preload-hooking-libc/" rel="alternate" type="text/html" title="Correct usage of `LD_PRELOAD` for hooking `libc` functions" /><published>2018-11-18T00:00:00+00:00</published><updated>2018-11-18T00:00:00+00:00</updated><id>https://tbrindus.ca/correct-ld-preload-hooking-libc</id><content type="html" xml:base="https://tbrindus.ca/correct-ld-preload-hooking-libc/">&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;LD_PRELOAD&lt;/code&gt; is a very powerful feature supported by the dynamic linker on most Unixes that allows shared libraries to be loaded before others (including &lt;code class=&quot;highlighter-rouge&quot;&gt;libc&lt;/code&gt;). This makes it very useful for hooking &lt;code class=&quot;highlighter-rouge&quot;&gt;libc&lt;/code&gt; functions to observe or modify the behaviour of 3rd-party applications whose source you do not control.&lt;/p&gt;

&lt;p&gt;Unfortunately, a lot of what's been written on the subject online is subtly wrong: not wrong enough to fail outright, but just enough to bite you when you least expect it. In this post I'll first go over the incorrect approach often described, analyze why it's wrong, and then describe the easy fix.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;a-simple-program&quot;&gt;A simple program&lt;/h2&gt;
&lt;p&gt;Let's consider a simple C program that we'll be using to test. Our goal will be to track what files it's opening using &lt;code class=&quot;highlighter-rouge&quot;&gt;LD_PRELOAD&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;FILE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ptr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;/etc/hosts&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;r&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;fclose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Nothing special going on here — we can save it to &lt;code class=&quot;highlighter-rouge&quot;&gt;test.c&lt;/code&gt; and compile with:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;gcc test.c &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;incorrectly-using-ld_preload-to-hook-fopen&quot;&gt;(Incorrectly) using &lt;code class=&quot;highlighter-rouge&quot;&gt;LD_PRELOAD&lt;/code&gt; to hook &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Strictly speaking, &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt; is not the lowest level you can get for opening files. &lt;code class=&quot;highlighter-rouge&quot;&gt;open(2)&lt;/code&gt; (and friends) is the syscall everything eventually trickles down to, but we can't intercept the syscall itself with an &lt;code class=&quot;highlighter-rouge&quot;&gt;LD_PRELOAD&lt;/code&gt; hook; that's what &lt;code class=&quot;highlighter-rouge&quot;&gt;ptrace(2)&lt;/code&gt; is for. At most, we could intercept its &lt;code class=&quot;highlighter-rouge&quot;&gt;libc&lt;/code&gt; wrapper. Nonetheless, hooking &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt; is enough for demonstration purposes.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#define _GNU_SOURCE
&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;dlfcn.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;FILE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fopen_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pathname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fopen_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;real_fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;FILE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pathname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;fprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stderr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;called fopen(%s, %s)&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pathname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;real_fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pathname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;__attribute__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;constructor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;setup&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;real_fopen&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dlsym&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RTLD_NEXT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;fopen&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;fprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stderr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;called setup()&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can compile this code as a position-independent shared library, linking &lt;code class=&quot;highlighter-rouge&quot;&gt;libdl&lt;/code&gt; for &lt;code class=&quot;highlighter-rouge&quot;&gt;dlsym&lt;/code&gt;. Then, by passing its full path in the &lt;code class=&quot;highlighter-rouge&quot;&gt;LD_PRELOAD&lt;/code&gt; environment variable, it gets loaded before &lt;code class=&quot;highlighter-rouge&quot;&gt;libc&lt;/code&gt;, so &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt; gets resolved to our definition.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;gcc &lt;span class=&quot;nt&quot;&gt;-shared&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-fPIC&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-ldl&lt;/span&gt; preload_test.c &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; preload_test.so
&lt;span class=&quot;nv&quot;&gt;$ LD_PRELOAD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$PWD&lt;/span&gt;/preload_test.so ./test
called setup&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
called fopen&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;/etc/hosts, r&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Let's provide a bit more background on what's going on. &lt;code class=&quot;highlighter-rouge&quot;&gt;__attribute__((constructor))&lt;/code&gt; is a GCC extension (that's supported by Clang too) which places a pointer to &lt;code class=&quot;highlighter-rouge&quot;&gt;setup&lt;/code&gt; in &lt;code class=&quot;highlighter-rouge&quot;&gt;preload_test&lt;/code&gt;'s &lt;code class=&quot;highlighter-rouge&quot;&gt;.ctors&lt;/code&gt; section. The loader then knows to execute the function before anything else (in particular, before &lt;code class=&quot;highlighter-rouge&quot;&gt;main&lt;/code&gt; is called). In our setup function, we ask &lt;code class=&quot;highlighter-rouge&quot;&gt;libdl&lt;/code&gt; for the next (&lt;code class=&quot;highlighter-rouge&quot;&gt;RTLD_NEXT&lt;/code&gt;) resolution of &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt;, which should be &lt;code class=&quot;highlighter-rouge&quot;&gt;libc&lt;/code&gt;'s, and keep a pointer to it. When our &lt;code class=&quot;highlighter-rouge&quot;&gt;test&lt;/code&gt; executable runs and opens &lt;code class=&quot;highlighter-rouge&quot;&gt;/etc/hosts&lt;/code&gt;, our hooked &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt; is called.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is what a lot of articles online get wrong.&lt;/strong&gt; Sure, it seems to work for our simple test, but let's try a &quot;real&quot; application, like &lt;code class=&quot;highlighter-rouge&quot;&gt;ssh&lt;/code&gt;. If you're following along on your own, note that &lt;code class=&quot;highlighter-rouge&quot;&gt;ssh&lt;/code&gt; may or may not exhibit this behaviour on your system, depending on how it was compiled and how your system is set up.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ LD_PRELOAD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$PWD&lt;/span&gt;/preload_test.so ssh
called fopen&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;/proc/filesystems, r&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
Segmentation fault
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Oops!&lt;/p&gt;

&lt;p&gt;So, what's going on? It's clear that our &lt;code class=&quot;highlighter-rouge&quot;&gt;setup&lt;/code&gt; was never called, which means that when we try to invoke &lt;code class=&quot;highlighter-rouge&quot;&gt;real_fopen&lt;/code&gt;, we're dealing with a null pointer. Basic stuff, but why? We can use &lt;code class=&quot;highlighter-rouge&quot;&gt;valgrind&lt;/code&gt; to get a better idea of what's going on (some &lt;code class=&quot;highlighter-rouge&quot;&gt;valgrind&lt;/code&gt; output omitted for brevity).&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ LD_PRELOAD=$PWD/preload_test.so valgrind --tool=memcheck ssh
called setup()
called setup()
called fopen(/proc/filesystems, r)
==2108== Jump to the invalid address stated on the next line
==2108==    at 0x0: ???
==2108==    by 0x5048B0D: selinuxfs_exists (in /lib/x86_64-linux-gnu/libselinux.so.1)
==2108==    by 0x5040D97: ??? (in /lib/x86_64-linux-gnu/libselinux.so.1)
==2108==    by 0x400F859: call_init.part.0 (dl-init.c:72)
==2108==    by 0x400F96A: call_init (dl-init.c:30)
==2108==    by 0x400F96A: _dl_init (dl-init.c:120)
==2108==    by 0x4000C59: ??? (in /lib/x86_64-linux-gnu/ld-2.24.so)
==2108==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==2108== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This paints a clear picture of what's happening. &lt;code class=&quot;highlighter-rouge&quot;&gt;ssh&lt;/code&gt; depends on &lt;code class=&quot;highlighter-rouge&quot;&gt;libselinux&lt;/code&gt;, which defines its own constructor that tries &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt;-ing &lt;code class=&quot;highlighter-rouge&quot;&gt;/proc/filesystems&lt;/code&gt;. At this point in time, our &lt;code class=&quot;highlighter-rouge&quot;&gt;setup&lt;/code&gt; has not been called by the linker, but &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt; &lt;em&gt;has&lt;/em&gt; been resolved to ours. As a result, we end up invoking an uninitialized pointer and segfault.&lt;/p&gt;

&lt;h2 id=&quot;correctly-using-ld_preload-to-hook-fopen&quot;&gt;(Correctly) using &lt;code class=&quot;highlighter-rouge&quot;&gt;LD_PRELOAD&lt;/code&gt; to hook &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;With our investigation over, the fix is very simple: don't depend on a constructor to resolve &lt;code class=&quot;highlighter-rouge&quot;&gt;libc&lt;/code&gt;'s &lt;code class=&quot;highlighter-rouge&quot;&gt;fopen&lt;/code&gt;, and do it on demand when it's first needed.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#define _GNU_SOURCE
&lt;/span&gt;
&lt;span class=&quot;cp&quot;&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;dlfcn.h&amp;gt;
&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;FILE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fopen_t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pathname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;fopen_t&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;real_fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kt&quot;&gt;FILE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pathname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;fprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stderr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;called fopen(%s, %s)&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pathname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
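  &lt;span class=&quot;cm&quot;&gt;/* Resolve libc's fopen lazily: another library's constructor may call us before setup() runs. */&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;real_fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;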
    &lt;span class=&quot;n&quot;&gt;real_fopen&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dlsym&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RTLD_NEXT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;fopen&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;real_fopen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pathname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;__attribute__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;constructor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;setup&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;fprintf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stderr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;called setup()&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And now, after recompiling, we can see that it works as expected:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;gcc &lt;span class=&quot;nt&quot;&gt;-shared&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-fPIC&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-ldl&lt;/span&gt; preload_test.c &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; preload_test.so
&lt;span class=&quot;nv&quot;&gt;$ LD_PRELOAD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$PWD&lt;/span&gt;/preload_test.so ssh
called fopen&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;/proc/filesystems, r&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
called fopen&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;/proc/mounts, r&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
called setup&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
called fopen&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;/etc/passwd, rme&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
usage: ssh &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-1246AaCfGgKkMNnqsTtVvXxYy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-b&lt;/span&gt; bind_address] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; cipher_spec]
           &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-D&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;bind_address:]port] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-E&lt;/span&gt; log_file] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-e&lt;/span&gt; escape_char]
           &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-F&lt;/span&gt; configfile] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-I&lt;/span&gt; pkcs11] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; identity_file]
           &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-J&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;user@]host[:port]] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-L&lt;/span&gt; address] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-l&lt;/span&gt; login_name] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-m&lt;/span&gt; mac_spec]
           &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-O&lt;/span&gt; ctl_cmd] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; option] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-p&lt;/span&gt; port] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-Q&lt;/span&gt; query_option] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-R&lt;/span&gt; address]
           &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-S&lt;/span&gt; ctl_path] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-W&lt;/span&gt; host:port] &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-w&lt;/span&gt; local_tun[:remote_tun]]
           &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;user@]hostname &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;command&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;libselinux&lt;/code&gt;'s constructor opens &lt;code class=&quot;highlighter-rouge&quot;&gt;/proc/filesystems&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;/proc/mounts&lt;/code&gt; before the linker passes control to our &lt;code class=&quot;highlighter-rouge&quot;&gt;setup&lt;/code&gt;, and &lt;code class=&quot;highlighter-rouge&quot;&gt;/etc/passwd&lt;/code&gt; is read as part of &lt;code class=&quot;highlighter-rouge&quot;&gt;ssh&lt;/code&gt;'s initialization procedures.&lt;/p&gt;

&lt;p&gt;Overall, this is a simple fix to a problem that might otherwise go undetected during testing, but I hope the analysis of what can go wrong when relying on constructors to execute in a particular order was entertaining to read.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">LD_PRELOAD is a very powerful feature supported by the dynamic linker on most Unixes that allows shared libraries to be loaded before others (including libc). This makes it very useful for hooking libc functions to observe or modify the behaviour of 3rd-party applications to which you do not control the source. Unfortunately, a lot of what's been written on the subject online is subtly wrong — not wrong enough to fail outright — but just enough to bite you once when you expect it the least. In this post I'll first go over the incorrect approach often described, analyze why it's wrong, and then describe the easy fix.</summary></entry><entry><title type="html">Low-latency static sites with Scaleway and Cloudflare</title><link href="https://tbrindus.ca/low-latency-static-sites-scaleway-cloudflare/" rel="alternate" type="text/html" title="Low-latency static sites with Scaleway and Cloudflare" /><published>2018-09-03T00:00:00+00:00</published><updated>2018-09-03T00:00:00+00:00</updated><id>https://tbrindus.ca/low-latency-static-sites-scaleway-cloudflare</id><content type="html" xml:base="https://tbrindus.ca/low-latency-static-sites-scaleway-cloudflare/">&lt;p&gt;For a while now, I'd been searching for a cheap but reliable hosting solution for this website.&lt;/p&gt;

&lt;p&gt;The option of hosting with Github Pages and similar services exists and has a minimal barrier to entry, but I like to be in control of my servers, so that I can occasionally use them for other tasks than just purely hosting. For instance, the machine serving this page runs both a Tor relay and acts as a backup for my large but non-sensitive files.&lt;/p&gt;

&lt;p&gt;Now, I think I've found a good solution: a €2.99/mo Scaleway plan coupled with Cloudflare for fast page load times worldwide.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;setting-up-the-server&quot;&gt;Setting up the Server&lt;/h2&gt;

&lt;p&gt;Scaleway's lowest-tier &lt;a href=&quot;https://www.scaleway.com/pricing/&quot;&gt;C1 plan&lt;/a&gt; offers 4 bare-metal ARMv7 cores, 2GB RAM, a 50GB SSD and unmetered 200 Mbit/s bandwidth for €2.99/mo. (There are x86 plans too, but ARM is cool.) They also offer extra SSD storage priced at €1/50GB/mo. That's a pretty sweet deal, with the downside that their only datacenters are located in Paris and Amsterdam, at least an extra 100ms away for users in North America compared to more traditional hosting locations like New York or Montreal.&lt;/p&gt;

&lt;p&gt;That's where Cloudflare comes in.&lt;/p&gt;

&lt;p&gt;If you're hosting a site, chances are you're already using Cloudflare, or heard of it. In short, it acts as a proxy in front of your site, so that requests to your domain are routed through Cloudflare before hitting your server. This allows Cloudflare to filter traffic and protect you against DDoS attacks, but at first glance it would seem that an extra proxy step would only increase latency to your content.&lt;/p&gt;

&lt;p&gt;For dynamic websites, this may well be true. However, if you're running a mostly-static site, you can leverage Cloudflare's edge node caching to speed things up tremendously. By default, Cloudflare will cache typically-static content like images, CSS, JavaScript, etc. — that means that your server would only be hit for the HTML markup of your site, while static content would be served directly from Cloudflare's edge nodes &lt;a href=&quot;https://www.cloudflare.com/network/&quot;&gt;around the world&lt;/a&gt;. It's also free.&lt;/p&gt;

&lt;p&gt;One upside of running a static website is you can easily get Cloudflare to cache your HTML, too. For most requests, your users would experience only the latency to their local Cloudflare edge node (check &lt;a href=&quot;https://tbrindus.ca/cdn-cgi/trace&quot;&gt;here&lt;/a&gt; to see yours). In principle, a user from Australia should have the same fast loading time as a user from Canada, despite your server being a cheapo €2.99/mo box in Europe.&lt;/p&gt;

&lt;h2 id=&quot;configuring-the-site&quot;&gt;Configuring the Site&lt;/h2&gt;

&lt;p&gt;Configuration on Cloudflare's end is easy. Simply navigate to &lt;em&gt;Page Rules&lt;/em&gt; and add a new rule targeting your desired pages. Set &lt;em&gt;Cache Level&lt;/em&gt; to &lt;em&gt;Cache Everything&lt;/em&gt; to force HTML caching and &lt;em&gt;Edge Cache TTL&lt;/em&gt; to &lt;em&gt;a day&lt;/em&gt; (or something similarly long), and you're off!&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/posts/low-latency-static-sites-scaleway-cloudflare/cloudflare-page-rules.png&quot; width=&quot;1200&quot; style=&quot;width: 100%;margin: 0px auto;display: block;margin:1.5em 0&quot; /&gt;&lt;/p&gt;

&lt;p&gt;That leaves making your site interact nicely with such aggressive caching. In particular, you probably don't want changes to your site to take an entire day to propagate to your users. This is easy to deal with by triggering Cloudflare's cache purge API whenever your site is rebuilt. You can obtain an API token from the bottom of &lt;a href=&quot;https://dash.cloudflare.com/profile&quot;&gt;your profile page&lt;/a&gt;, and your site's zone ID from its main overview page, after which purging Cloudflare's cache is just a simple cURL away:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;curl &lt;span class=&quot;nt&quot;&gt;-X&lt;/span&gt; POST &lt;span class=&quot;s2&quot;&gt;&quot;https://api.cloudflare.com/client/v4/zones/&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;zone_id&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;/purge_cache&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;X-Auth-Email: &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;auth_email&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;X-Auth-Key: &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;auth_key&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     &lt;span class=&quot;nt&quot;&gt;-H&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Content-Type: application/json&quot;&lt;/span&gt; &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
     &lt;span class=&quot;nt&quot;&gt;--data&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{&quot;purge_everything&quot;:true}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This should play well with most static site generators.&lt;/p&gt;

&lt;p&gt;You could build upon this to only purge pages that were changed via a filesystem watching process, and so on — for my purposes, purging everything was acceptable.&lt;/p&gt;
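
&lt;p&gt;As a rough sketch of what a selective purge could look like: the same &lt;code class=&quot;highlighter-rouge&quot;&gt;purge_cache&lt;/code&gt; endpoint also accepts a &lt;code class=&quot;highlighter-rouge&quot;&gt;files&lt;/code&gt; array of full URLs in place of &lt;code class=&quot;highlighter-rouge&quot;&gt;purge_everything&lt;/code&gt;. The helper names (&lt;code class=&quot;highlighter-rouge&quot;&gt;purge_payload&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;purge_changed&lt;/code&gt;) and the &lt;code class=&quot;highlighter-rouge&quot;&gt;example.com&lt;/code&gt; base URL are made up for illustration, and how the list of changed paths gets collected is left to your build process.&lt;/p&gt;

```shell
# Build the JSON body for a selective purge from site-relative paths.
# Assumes the paths contain no characters that would need JSON escaping.
purge_payload() {
  base_url=$1
  shift
  body='{"files":['
  sep=''
  for path in "$@"; do
    body="${body}${sep}\"${base_url}${path}\""
    sep=','
  done
  printf '%s\n' "${body}]}"
}

# Hypothetical wrapper that purges only the given paths (defined, not called here).
# zone_id, auth_email and auth_key are the same variables used in the command above.
purge_changed() {
  curl -X POST "https://api.cloudflare.com/client/v4/zones/${zone_id}/purge_cache" \
       -H "X-Auth-Email: ${auth_email}" \
       -H "X-Auth-Key: ${auth_key}" \
       -H "Content-Type: application/json" \
       --data "$(purge_payload "$@")"
}

# Show the request body that would be sent for two changed pages.
purge_payload "https://example.com" /index.html /feed.xml
```

&lt;p&gt;Calling &lt;code class=&quot;highlighter-rouge&quot;&gt;purge_changed&lt;/code&gt; with the site's base URL followed by the changed paths would then send that body to Cloudflare.&lt;/p&gt;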

&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;That's all I have to say on this subject; hopefully you found it interesting :)&lt;/p&gt;

&lt;p&gt;It's not revolutionary by any means, but I know that I, along with some of my colleagues, was surprised at just how effective this approach was at lowering page load times while saving on hosting bills.&lt;/p&gt;

&lt;p&gt;Till next time!&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">For a while now, I'd been searching for a cheap but reliable hosting solution for this website. The option of hosting with Github Pages and similar services exists and has a minimal barrier to entry, but I like to be in control of my servers, so that I can occasionally use them for other tasks than just purely hosting. For instance, the machine serving this page runs both a Tor relay and acts as a backup for my large but non-sensitive files. Now, I think I've found a good solution: a €2.99/mo Scaleway plan coupled with Cloudflare for fast page load times worldwide.</summary></entry><entry><title type="html">Mining for Tor v3 onions in the cloud</title><link href="https://tbrindus.ca/tor-v3-vanity-mining/" rel="alternate" type="text/html" title="Mining for Tor v3 onions in the cloud" /><published>2018-03-22T00:00:00+00:00</published><updated>2018-03-22T00:00:00+00:00</updated><id>https://tbrindus.ca/tor-v3-vanity-mining</id><content type="html" xml:base="https://tbrindus.ca/tor-v3-vanity-mining/">&lt;p&gt;Tor supports a new hidden service protocol as of v0.3.2.1-alpha, &lt;a href=&quot;https://blog.torproject.org/tor-0321-alpha-released-support-next-gen-onion-services-and-kist-scheduler&quot;&gt;released back in October 2017&lt;/a&gt;, and is now in stable branches. Dubbed the &quot;v3&quot; onion service protocol, among other changes, it replaces SHA1/DH/RSA1024 with SHA3/ed25519/curve25519 for much improved cryptographic security.&lt;/p&gt;

&lt;p&gt;I already had a v2 onion site up at &lt;a href=&quot;http://tbrindus6tjv6wpi.onion&quot;&gt;tbrindus6tjv6wpi.onion&lt;/a&gt;, so I thought it would be an interesting exercise to mine a v3 vanity domain prefixed with &lt;code class=&quot;highlighter-rouge&quot;&gt;tbrindus&lt;/code&gt;. For this, I set up 15 servers to mine for a matching prefix — more on this below!&lt;/p&gt;

&lt;p&gt;It took well over a week of mining, but as of today, this site can also be accessed through the v3 hidden service &lt;a href=&quot;http://tbrindusxnnqwmzov5qof56hyion6usmciqwykffxqsawswhk73aq5yd.onion/&quot;&gt;tbrindusxnnqwmzov5qof56hyion6usmciqwykffxqsawswhk73aq5yd.onion&lt;/a&gt;!&lt;/p&gt;

&lt;!--more--&gt;

&lt;style&gt;
td:nth-child(5), td:nth-child(6), td:nth-child(7) {
    text-align: right;
}
&lt;/style&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
  window.MathJax = {
      messageStyle: 'none',
      tex2jax: {
          inlineMath: [ ['$','$'], [&quot;\\(&quot;,&quot;\\)&quot;] ],
          displayMath: [ ['$$','$$'], [&quot;\\[&quot;,&quot;\\]&quot;] ],
          processEscapes: true
      },
      showMathMenu: false
  };
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; src=&quot;//cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS_HTML&quot;&gt;&lt;/script&gt;

&lt;h2 id=&quot;a-bit-of-background&quot;&gt;A bit of background&lt;/h2&gt;

&lt;p&gt;Tor hidden service domain &quot;names&quot; aren't really domain names in the sense most people are used to. You can enter them in your (Tor) browser, but you can't buy a particular domain you want: a hidden service hostname is a prefix of the base32-encoded public key of the service.&lt;/p&gt;

&lt;p&gt;If you want a particular onion, you must randomly generate billions of keys until one happens to hash into a string starting with the prefix you're looking for. In the case of &lt;code class=&quot;highlighter-rouge&quot;&gt;tbrindus&lt;/code&gt;, an 8-letter prefix, there are $32^8 = 1\,099\,511\,627\,776$ possible combinations. Every additional letter increases the space (and hence expected computation time) by a factor of 32.&lt;/p&gt;
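
&lt;p&gt;To put rough numbers on that, here's a small shell sketch of the arithmetic. The function names and the 1 MH/s rate are placeholders rather than measurements, and the result is only the average case: with $32^n$ equally likely prefixes, a random search needs about $32^n$ attempts on average, and any single run can finish much sooner or much later.&lt;/p&gt;

```shell
# Number of possible base32 prefixes of a given length (32 choices per letter).
keyspace() {
  n=1
  i=0
  while [ "$i" -lt "$1" ]; do
    n=$(( n * 32 ))
    i=$(( i + 1 ))
  done
  echo "$n"
}

# Average number of seconds to hit one specific prefix, given a prefix length
# and a hashes-per-second rate (integer division, so the result is rounded down).
expected_seconds() {
  echo $(( $(keyspace "$1") / $2 ))
}

keyspace 8                   # prints 1099511627776
keyspace 9                   # one more letter, 32 times larger: 35184372088832
expected_seconds 8 1000000   # prints 1099511, roughly 12.7 days at an assumed 1 MH/s
```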

&lt;p&gt;V2 onions have been around for a long time, so there exist GPU-based miners like &lt;a href=&quot;https://github.com/lachesis/scallion&quot;&gt;Scallion&lt;/a&gt;, which can hash at frightening rates (several gigahashes a second). In fact, Scallion was used to brute-force 32-bit GPG key IDs to demonstrate that 32-bit IDs are insecure (see &lt;a href=&quot;https://evil32.com/&quot;&gt;evil32.com&lt;/a&gt; for more on that).&lt;/p&gt;

&lt;p&gt;Tor's switch to ed25519 means that existing tools for generating vanity names like Scallion can't be used — at the time of writing, the best bet for v3 vanity names is &lt;a href=&quot;https://github.com/cathugger/mkp224o&quot;&gt;mkp224o&lt;/a&gt;, a CPU-based miner.&lt;/p&gt;

&lt;p&gt;I expected &lt;code class=&quot;highlighter-rouge&quot;&gt;mkp224o&lt;/code&gt; to be orders of magnitude slower than GPU-based mining, so I spun up 15 servers across several providers (I'm looking for a new host, and thought this would be a good opportunity to test some new ones out).&lt;/p&gt;

&lt;h2 id=&quot;setting-up-the-servers&quot;&gt;Setting up the servers&lt;/h2&gt;
&lt;p&gt;Getting &lt;code class=&quot;highlighter-rouge&quot;&gt;mkp224o&lt;/code&gt; set up and running is fairly simple. On most development machines you'd probably have everything required preinstalled, with perhaps the exception of &lt;code class=&quot;highlighter-rouge&quot;&gt;libsodium-dev&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;On a typical Debian-based distro, you can get everything you need to get running with:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;apt &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;autoconf build-essential git libsodium-dev
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;git clone https://github.com/cathugger/mkp224o.git
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;mkp224o
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;./autogen.sh
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;./configure &lt;span class=&quot;c&quot;&gt;# see below&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;make
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For ARM servers, I passed &lt;code class=&quot;highlighter-rouge&quot;&gt;--enable-donna&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;configure&lt;/code&gt;, while for x86_64 boxes I used either &lt;code class=&quot;highlighter-rouge&quot;&gt;--enable-amd64-51-30k&lt;/code&gt; or &lt;code class=&quot;highlighter-rouge&quot;&gt;--enable-amd64-64-24k&lt;/code&gt;, whichever provided the greatest hashrate.&lt;/p&gt;
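
&lt;p&gt;When provisioning a mixed fleet, one way to script that choice is to key off &lt;code class=&quot;highlighter-rouge&quot;&gt;uname -m&lt;/code&gt;. This is only a sketch: &lt;code class=&quot;highlighter-rouge&quot;&gt;configure_flag_for&lt;/code&gt; is a made-up helper, and the x86_64 branch arbitrarily picks one of the two flags mentioned above, so it's still worth benchmarking both on each box.&lt;/p&gt;

```shell
# Map a machine architecture (as printed by `uname -m`) to a configure flag.
configure_flag_for() {
  case "$1" in
    x86_64|amd64) echo "--enable-amd64-51-30k" ;;  # or --enable-amd64-64-24k; benchmark both
    arm*|aarch64) echo "--enable-donna" ;;
    *)            echo "" ;;                       # unknown arch: use configure's defaults
  esac
}

# Print the configure invocation for the current machine instead of running it.
flag=$(configure_flag_for "$(uname -m)")
echo "./configure $flag"
```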

&lt;p&gt;For mining, I specified a filter for &lt;code class=&quot;highlighter-rouge&quot;&gt;tbrindus&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;./mkp224o &lt;span class=&quot;nt&quot;&gt;-s&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-T&lt;/span&gt; tbrindus
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;…and waited. I waited a long time.&lt;/p&gt;

&lt;h2 id=&quot;mining-results&quot;&gt;Mining results&lt;/h2&gt;
&lt;p&gt;V2 onions can be hashed incredibly fast on common GPUs with Scallion, with many cards capable of several gigahashes per second. On my laptop's GTX 960M, Scallion pulled in 1 GH/s, and mined &lt;a href=&quot;http://tbrindus6tjv6wpi.onion&quot;&gt;tbrindus6tjv6wpi.onion&lt;/a&gt; in under 10 minutes.&lt;/p&gt;

&lt;p&gt;For comparison, the 15 servers I ran &lt;code class=&quot;highlighter-rouge&quot;&gt;mkp224o&lt;/code&gt; on for 6 days pulled in an aggregate 5 MH/s, or 0.5% of what my fairly standard laptop graphics card can compute.&lt;/p&gt;

&lt;p&gt;Below, I've put together a table of the setups I ran to compute &lt;a href=&quot;http://tbrindusxnnqwmzov5qof56hyion6usmciqwykffxqsawswhk73aq5yd.onion/&quot;&gt;tbrindusxnnqwmzov5qof56hyion6usmciqwykffxqsawswhk73aq5yd.onion&lt;/a&gt;.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Host&lt;/th&gt;
      &lt;th&gt;Plan&lt;/th&gt;
      &lt;th&gt;OS&lt;/th&gt;
      &lt;th&gt;CPU&lt;/th&gt;
      &lt;th&gt;RAM&lt;/th&gt;
      &lt;th&gt;Hashes/s&lt;/th&gt;
      &lt;th&gt;Contrib.&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;sup&gt;&lt;a href=&quot;#fn1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
      &lt;td&gt;C2S&lt;/td&gt;
      &lt;td&gt;Debian 9.0&lt;/td&gt;
      &lt;td&gt;4x Intel Atom C2550 @ 2.3GHz&lt;/td&gt;
      &lt;td&gt;8GB&lt;/td&gt;
      &lt;td&gt;229,400&lt;/td&gt;
      &lt;td&gt;4.76%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;/td&gt;
      &lt;td&gt;ARM64-16GB&lt;/td&gt;
      &lt;td&gt;Debian 9.0&lt;/td&gt;
      &lt;td&gt;16x ARMv8 Cavium ThunderX&lt;/td&gt;
      &lt;td&gt;16GB&lt;/td&gt;
      &lt;td&gt;1,300,000&lt;/td&gt;
      &lt;td&gt;26.97%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;/td&gt;
      &lt;td&gt;ARM64-8GB&lt;/td&gt;
      &lt;td&gt;Ubuntu 16.04&lt;/td&gt;
      &lt;td&gt;8x ARMv8 Cavium ThunderX&lt;/td&gt;
      &lt;td&gt;8GB&lt;/td&gt;
      &lt;td&gt;626,000&lt;/td&gt;
      &lt;td&gt;12.99%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;/td&gt;
      &lt;td&gt;ARM64-2GB&lt;/td&gt;
      &lt;td&gt;Ubuntu 16.04&lt;/td&gt;
      &lt;td&gt;4x ARMv8 Cavium ThunderX&lt;/td&gt;
      &lt;td&gt;2GB&lt;/td&gt;
      &lt;td&gt;314,000&lt;/td&gt;
      &lt;td&gt;6.51%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;sup&gt;&lt;a href=&quot;#fn2&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
      &lt;td&gt;ARM64-2GB&lt;/td&gt;
      &lt;td&gt;Debian 9.3&lt;/td&gt;
      &lt;td&gt;4x ARMv8 Cavium ThunderX&lt;/td&gt;
      &lt;td&gt;2GB&lt;/td&gt;
      &lt;td&gt;218,000&lt;/td&gt;
      &lt;td&gt;4.52%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;/td&gt;
      &lt;td&gt;C1&lt;/td&gt;
      &lt;td&gt;Debian 9.0&lt;/td&gt;
      &lt;td&gt;2x Intel Atom C2750 @ 2.3GHz&lt;/td&gt;
      &lt;td&gt;2GB&lt;/td&gt;
      &lt;td&gt;113,500&lt;/td&gt;
      &lt;td&gt;2.35%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;DigitalOcean&lt;/td&gt;
      &lt;td&gt;Compute 4GB&lt;/td&gt;
      &lt;td&gt;Debian 9.4&lt;/td&gt;
      &lt;td&gt;2x Intel Xeon E5-2697A v4 @ 2.5GHz&lt;/td&gt;
      &lt;td&gt;4GB&lt;/td&gt;
      &lt;td&gt;470,000&lt;/td&gt;
      &lt;td&gt;9.75%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Azure&lt;/td&gt;
      &lt;td&gt;Standard B2s&lt;/td&gt;
      &lt;td&gt;Ubuntu 16.04&lt;/td&gt;
      &lt;td&gt;2x Intel Xeon E5-2673 v4 @ 2.294GHz&lt;/td&gt;
      &lt;td&gt;4GB&lt;/td&gt;
      &lt;td&gt;68,000&lt;/td&gt;
      &lt;td&gt;1.41%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Azure&lt;/td&gt;
      &lt;td&gt;Standard B2s&lt;/td&gt;
      &lt;td&gt;Debian 9.3&lt;/td&gt;
      &lt;td&gt;2x Intel Xeon E5-2673 v4 @ 2.294GHz&lt;/td&gt;
      &lt;td&gt;4GB&lt;/td&gt;
      &lt;td&gt;80,000&lt;/td&gt;
      &lt;td&gt;1.66%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Azure&lt;/td&gt;
      &lt;td&gt;Standard B2s&lt;/td&gt;
      &lt;td&gt;FreeBSD 11.1&lt;/td&gt;
      &lt;td&gt;2x Intel Xeon E5-2673 v4 @ 2.294GHz&lt;/td&gt;
      &lt;td&gt;4GB&lt;/td&gt;
      &lt;td&gt;69,000&lt;/td&gt;
      &lt;td&gt;1.43%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSDNodes&lt;sup&gt;&lt;a href=&quot;#fn3&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
      &lt;td&gt;8GB KVM&lt;/td&gt;
      &lt;td&gt;Debian 9.3&lt;/td&gt;
      &lt;td&gt;2x Intel (Skylake, IBRS) @ 2.299GHz&lt;/td&gt;
      &lt;td&gt;8GB&lt;/td&gt;
      &lt;td&gt;274,500&lt;/td&gt;
      &lt;td&gt;5.69%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSDNodes&lt;sup&gt;&lt;a href=&quot;#fn3&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
      &lt;td&gt;16GB KVM&lt;/td&gt;
      &lt;td&gt;Debian 9.3&lt;/td&gt;
      &lt;td&gt;4x Intel (Skylake, IBRS) @ 2.299GHz&lt;/td&gt;
      &lt;td&gt;16GB&lt;/td&gt;
      &lt;td&gt;540,000&lt;/td&gt;
      &lt;td&gt;11.20%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSDNodes&lt;/td&gt;
      &lt;td&gt;8GB Container&lt;/td&gt;
      &lt;td&gt;Debian 9.4&lt;/td&gt;
      &lt;td&gt;4x Intel Xeon E5-2697 v3 @ 766MHz&lt;/td&gt;
      &lt;td&gt;8GB&lt;/td&gt;
      &lt;td&gt;78,000&lt;/td&gt;
      &lt;td&gt;1.62%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;—&lt;sup&gt;&lt;a href=&quot;#fn4&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
      &lt;td&gt;Raspberry Pi 3&lt;/td&gt;
      &lt;td&gt;Raspbian 9.1&lt;/td&gt;
      &lt;td&gt;4x ARM Cortex-A53 @ 1.2GHz&lt;/td&gt;
      &lt;td&gt;1GB&lt;/td&gt;
      &lt;td&gt;70,000&lt;/td&gt;
      &lt;td&gt;1.45%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;—&lt;sup&gt;&lt;a href=&quot;#fn4&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
      &lt;td&gt;Optiplex 960&lt;/td&gt;
      &lt;td&gt;Ubuntu 16.04&lt;/td&gt;
      &lt;td&gt;4x Intel Core 2 Quad Q9400 @ 2.659GHz&lt;/td&gt;
      &lt;td&gt;4GB&lt;/td&gt;
      &lt;td&gt;370,000&lt;/td&gt;
      &lt;td&gt;7.68%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;—&lt;/td&gt;
      &lt;td&gt;—&lt;/td&gt;
      &lt;td&gt;—&lt;/td&gt;
      &lt;td&gt;—&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;4,820,400&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;100.00%&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p id=&quot;fn1&quot;&gt;1. This was a dedicated machine.&lt;/p&gt;
&lt;p id=&quot;fn2&quot;&gt;2. This machine was provisioned with the same specs as the other ARM64-2GB instance, but was also running a Tor relay, which explains the difference in hashrate.&lt;/p&gt;
&lt;p id=&quot;fn3&quot;&gt;3. CPU steal time on these machines was constantly at 20% or higher.&lt;/p&gt;
&lt;p id=&quot;fn4&quot;&gt;4. I ran these machines uninterrupted at home.&lt;/p&gt;

&lt;h2 id=&quot;a-quick-statistical-analysis&quot;&gt;A quick statistical analysis&lt;/h2&gt;
&lt;p&gt;OK, so it took a long time. I accumulated far more in server expenses than I had originally planned on, but at least I got a sense of pride and accomplishment from it. &lt;!-- thanks jason --&gt;&lt;/p&gt;

&lt;p&gt;The search for a hash prefix of &lt;code class=&quot;highlighter-rouge&quot;&gt;tbrindus&lt;/code&gt; is probabilistic and memoryless: you never get &quot;closer&quot; to mining a hash; every hash has an equal probability $\frac 1 {32^{\text{length(prefix)}}} = \frac 1 {32^8}$ of matching, since each of the 8 prefix characters is one of 32 base32 symbols. It's essentially a Poisson process, so we can use an exponential distribution to estimate how long it takes, on average, for a match to be found.&lt;/p&gt;

&lt;p&gt;The CDF of an exponential distribution has the form $1 - e^{-\lambda x}$.&lt;/p&gt;

&lt;p&gt;Across all machines, we can perform 4,820,400 hashes per second, and there are 86,400 seconds in a day. Each hash has a probability of $\frac 1 {32^8}$ of matching, so the expected number of matches per day is $\lambda = \frac{86\,400 \times 4\,820\,400}{32^8}$. With this $\lambda$, the probability that we'll find a match within $x$ days (let's call it $f(x)$ for simplicity) is:&lt;/p&gt;

&lt;script type=&quot;math/tex; mode=display&quot;&gt;f(x) = 1 - e^{-\lambda x} = 1 - e^{-\frac{86\,400 \times 4\,820\,400}{32^8} x}&lt;/script&gt;

&lt;p&gt;Since I like graphs, let's graph this function.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/posts/tor-v3-vanity/hash-probability.png&quot; width=&quot;1000&quot; style=&quot;width: 75%;margin: 0px auto;display: block;margin-top:2em&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The expected value of an exponential distribution is given by $\frac 1 \lambda$, so plugging in our $\lambda$ gives the expected number of days to generate a prefix of 8 characters:&lt;/p&gt;

&lt;script type=&quot;math/tex; mode=display&quot;&gt;\frac 1 \lambda = \frac 1 {\frac{86\,400 \times 4\,820\,400}{32^8}} \approx 2.64\text{ days}&lt;/script&gt;

&lt;p&gt;Alright, so I definitely overshot that.&lt;/p&gt;
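
&lt;p&gt;The arithmetic above is easy to reproduce. Here is a minimal Python sketch, assuming the combined hashrate of 4,820,400 hashes per second holds steady; the names &lt;code class=&quot;highlighter-rouge&quot;&gt;lam&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;match_probability&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
import math

HASHES_PER_SECOND = 4_820_400   # combined hashrate from the table above
SECONDS_PER_DAY = 86_400
PREFIX_LENGTH = 8               # len("tbrindus")
ALPHABET_SIZE = 32              # base32 symbols per character

# Rate parameter: expected number of matching hashes per day.
lam = SECONDS_PER_DAY * HASHES_PER_SECOND / ALPHABET_SIZE ** PREFIX_LENGTH


def match_probability(days):
    """f(x) = 1 - e^(-lambda * x): chance of at least one match within `days` days."""
    return 1.0 - math.exp(-lam * days)


expected_days = 1.0 / lam          # mean of the exponential distribution
median_days = math.log(2) / lam    # point where f(x) crosses 50%

print(f"lambda        = {lam:.4f} matches/day")
print(f"expected wait = {expected_days:.2f} days")
print(f"median wait   = {median_days:.2f} days")
for days in (1, 3, 7):
    print(f"P(match within {days} days) = {match_probability(days):.1%}")
```

&lt;p&gt;The median is included because the exponential distribution is skewed: half of all searches finish earlier than the mean suggests, while an unlucky run can take several times longer.&lt;/p&gt;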

&lt;h2 id=&quot;bonus-unixbench-of-the-servers&quot;&gt;Bonus: UnixBench of the servers&lt;/h2&gt;
&lt;p&gt;Since I had all these servers up and running already, I figured it'd be interesting to compare UnixBench scores to see how they
correlated with hashrate. In the table below, I've included the hashrate of several servers I was particularly interested in, as well
as their single-core and multi-core performance determined by running UnixBench on an unloaded system.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Host&lt;/th&gt;
      &lt;th&gt;Plan&lt;/th&gt;
      &lt;th&gt;OS&lt;/th&gt;
      &lt;th&gt;Hashes/s&lt;/th&gt;
      &lt;th&gt;Num. Cores&lt;/th&gt;
      &lt;th&gt;Single core perf.&lt;/th&gt;
      &lt;th&gt;Multi-core perf.&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;/td&gt;
      &lt;td&gt;ARM64-16GB&lt;/td&gt;
      &lt;td&gt;Debian 9.0&lt;/td&gt;
      &lt;td&gt;1,300,000&lt;/td&gt;
      &lt;td&gt;16&lt;/td&gt;
      &lt;td&gt;401.2&lt;/td&gt;
      &lt;td&gt;1641.6&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;/td&gt;
      &lt;td&gt;ARM64-8GB&lt;/td&gt;
      &lt;td&gt;Ubuntu 16.04 LTS&lt;/td&gt;
      &lt;td&gt;626,000&lt;/td&gt;
      &lt;td&gt;8&lt;/td&gt;
      &lt;td&gt;380.5&lt;/td&gt;
      &lt;td&gt;1514.1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;/td&gt;
      &lt;td&gt;ARM64-2GB&lt;/td&gt;
      &lt;td&gt;Ubuntu 16.04 LTS&lt;/td&gt;
      &lt;td&gt;314,000&lt;/td&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;400.9&lt;/td&gt;
      &lt;td&gt;1020.3&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Scaleway&lt;/td&gt;
      &lt;td&gt;C1&lt;/td&gt;
      &lt;td&gt;Debian 9.0&lt;/td&gt;
      &lt;td&gt;113,500&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;621.0&lt;/td&gt;
      &lt;td&gt;1047.7&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Azure&lt;/td&gt;
      &lt;td&gt;Standard B2s&lt;/td&gt;
      &lt;td&gt;Ubuntu 16.04&lt;/td&gt;
      &lt;td&gt;68,000&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;472.2&lt;/td&gt;
      &lt;td&gt;340.0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSDNodes&lt;/td&gt;
      &lt;td&gt;16GB KVM&lt;/td&gt;
      &lt;td&gt;Debian 9.3&lt;/td&gt;
      &lt;td&gt;540,000&lt;/td&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;472.3&lt;/td&gt;
      &lt;td&gt;1363.2&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SSDNodes&lt;/td&gt;
      &lt;td&gt;8GB KVM&lt;/td&gt;
      &lt;td&gt;Debian 9.3&lt;/td&gt;
      &lt;td&gt;274,500&lt;/td&gt;
      &lt;td&gt;4&lt;/td&gt;
      &lt;td&gt;616.8&lt;/td&gt;
      &lt;td&gt;1382.8&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
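
&lt;p&gt;To put a rough number on that correlation, here is a small Python sketch that computes the Pearson coefficient between hashrate and each UnixBench score, using only the seven rows of the table above. The &lt;code class=&quot;highlighter-rouge&quot;&gt;pearson&lt;/code&gt; helper is written out by hand for illustration.&lt;/p&gt;

```python
import math

# Rows copied from the table above: (hashes/s, single-core score, multi-core score).
SERVERS = {
    "Scaleway ARM64-16GB": (1_300_000, 401.2, 1641.6),
    "Scaleway ARM64-8GB": (626_000, 380.5, 1514.1),
    "Scaleway ARM64-2GB": (314_000, 400.9, 1020.3),
    "Scaleway C1": (113_500, 621.0, 1047.7),
    "Azure Standard B2s": (68_000, 472.2, 340.0),
    "SSDNodes 16GB KVM": (540_000, 472.3, 1363.2),
    "SSDNodes 8GB KVM": (274_500, 616.8, 1382.8),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


hashrates = [row[0] for row in SERVERS.values()]
single_core = [row[1] for row in SERVERS.values()]
multi_core = [row[2] for row in SERVERS.values()]

r_single = pearson(hashrates, single_core)
r_multi = pearson(hashrates, multi_core)
print(f"hashrate vs single-core score: r = {r_single:+.2f}")
print(f"hashrate vs multi-core score:  r = {r_multi:+.2f}")
```

&lt;p&gt;With only seven machines, treat the result as a rough indication rather than a real statistical finding.&lt;/p&gt;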

&lt;p&gt;I've also attached the raw UnixBench logs below, for convenience.&lt;/p&gt;

&lt;details&gt;&lt;summary&gt;Scaleway &amp;mdash; ARM64-16GB&lt;/summary&gt;&lt;pre&gt;&lt;code&gt;========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.9.23-std-1 -- #1 SMP Mon Apr 24 13:18:14 UTC 2017
   Machine: aarch64 (unknown)
   Language: en_US.utf8 (charmap=&quot;UTF-8&quot;, collate=&quot;UTF-8&quot;)
   05:13:38 up 3 days,  1:08,  1 user,  load average: 11.74, 15.14, 15.76; runlevel 2018-03-15

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:13:38 - 05:41:33
16 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        8372406.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1825.0 MWIPS (9.9 s, 7 samples)
Execl Throughput                               1014.4 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        181638.7 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           51750.8 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        422317.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                              476739.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  29308.4 lps   (10.0 s, 7 samples)
Process Creation                               2046.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2597.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1107.5 lpm   (60.0 s, 2 samples)
System Call Overhead                         863802.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    8372406.5    717.4
Double-Precision Whetstone                       55.0       1825.0    331.8
Execl Throughput                                 43.0       1014.4    235.9
File Copy 1024 bufsize 2000 maxblocks          3960.0     181638.7    458.7
File Copy 256 bufsize 500 maxblocks            1655.0      51750.8    312.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     422317.9    728.1
Pipe Throughput                               12440.0     476739.6    383.2
Pipe-based Context Switching                   4000.0      29308.4     73.3
Process Creation                                126.0       2046.2    162.4
Shell Scripts (1 concurrent)                     42.4       2597.0    612.5
Shell Scripts (8 concurrent)                      6.0       1107.5   1845.8
System Call Overhead                          15000.0     863802.9    575.9
                                                                   ========
System Benchmarks Index Score                                         401.2

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:41:33 - 06:09:37
16 CPUs in system; running 16 parallel copies of tests

Dhrystone 2 using register variables      132993486.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    29057.8 MWIPS (10.0 s, 7 samples)
Execl Throughput                               7995.3 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        137360.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           29373.3 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        630759.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                             7424668.0 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 401144.7 lps   (10.0 s, 7 samples)
Process Creation                              10546.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  15213.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   2003.4 lpm   (60.2 s, 2 samples)
System Call Overhead                        1277419.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  132993486.5  11396.2
Double-Precision Whetstone                       55.0      29057.8   5283.2
Execl Throughput                                 43.0       7995.3   1859.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     137360.5    346.9
File Copy 256 bufsize 500 maxblocks            1655.0      29373.3    177.5
File Copy 4096 bufsize 8000 maxblocks          5800.0     630759.5   1087.5
Pipe Throughput                               12440.0    7424668.0   5968.4
Pipe-based Context Switching                   4000.0     401144.7   1002.9
Process Creation                                126.0      10546.1    837.0
Shell Scripts (1 concurrent)                     42.4      15213.0   3588.0
Shell Scripts (8 concurrent)                      6.0       2003.4   3339.0
System Call Overhead                          15000.0    1277419.8    851.6
                                                                   ========
System Benchmarks Index Score                                        1641.6
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
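&lt;p&gt;A note on reading these logs: the numbers are consistent with each INDEX value being RESULT divided by BASELINE, times 10, and the final index score being the geometric mean of the INDEX column. The Python sketch below recomputes the single-copy score from the ARM64-16GB log above under that assumption; the function names are made up for illustration.&lt;/p&gt;

```python
import math

# (BASELINE, RESULT) pairs copied from the single-copy run in the ARM64-16GB log above.
RUNS = [
    (116700.0, 8372406.5),  # Dhrystone 2 using register variables
    (55.0, 1825.0),         # Double-Precision Whetstone
    (43.0, 1014.4),         # Execl Throughput
    (3960.0, 181638.7),     # File Copy 1024 bufsize 2000 maxblocks
    (1655.0, 51750.8),      # File Copy 256 bufsize 500 maxblocks
    (5800.0, 422317.9),     # File Copy 4096 bufsize 8000 maxblocks
    (12440.0, 476739.6),    # Pipe Throughput
    (4000.0, 29308.4),      # Pipe-based Context Switching
    (126.0, 2046.2),        # Process Creation
    (42.4, 2597.0),         # Shell Scripts (1 concurrent)
    (6.0, 1107.5),          # Shell Scripts (8 concurrent)
    (15000.0, 863802.9),    # System Call Overhead
]


def index_value(baseline, result):
    """Per-test index: the result relative to the baseline, scaled by 10."""
    return result / baseline * 10.0


def index_score(pairs):
    """Geometric mean of the per-test index values."""
    logs = [math.log(index_value(b, r)) for b, r in pairs]
    return math.exp(sum(logs) / len(logs))


score = index_score(RUNS)
print(f"recomputed single-copy index score: {score:.1f}")
```

&lt;p&gt;Because the score is a geometric mean, one very poor test (such as pipe-based context switching here) pulls the total down less than it would in an arithmetic average.&lt;/p&gt;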
&lt;details&gt;&lt;summary&gt;Scaleway &amp;mdash; ARM64-8GB&lt;/summary&gt;&lt;pre&gt;&lt;code&gt;   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.4.121-mainline-rev1 -- #1 SMP Sun Mar 11 16:44:34 UTC 2018
   Machine: aarch64 (aarch64)
   Language: en_US.utf8 (charmap=&quot;UTF-8&quot;, collate=&quot;UTF-8&quot;)
   05:13:17 up 2 days, 53 min,  1 user,  load average: 5.56, 7.47, 7.82; runlevel 2018-03-16

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:13:17 - 05:41:24
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        8502417.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1741.0 MWIPS (10.1 s, 7 samples)
Execl Throughput                               1112.8 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        165427.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           54377.8 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        343939.2 KBps  (30.0 s, 2 samples)
Pipe Throughput                              462211.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  14746.0 lps   (10.0 s, 7 samples)
Process Creation                               2370.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2677.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1050.2 lpm   (60.0 s, 2 samples)
System Call Overhead                         998124.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    8502417.0    728.6
Double-Precision Whetstone                       55.0       1741.0    316.6
Execl Throughput                                 43.0       1112.8    258.8
File Copy 1024 bufsize 2000 maxblocks          3960.0     165427.5    417.7
File Copy 256 bufsize 500 maxblocks            1655.0      54377.8    328.6
File Copy 4096 bufsize 8000 maxblocks          5800.0     343939.2    593.0
Pipe Throughput                               12440.0     462211.7    371.6
Pipe-based Context Switching                   4000.0      14746.0     36.9
Process Creation                                126.0       2370.8    188.2
Shell Scripts (1 concurrent)                     42.4       2677.5    631.5
Shell Scripts (8 concurrent)                      6.0       1050.2   1750.4
System Call Overhead                          15000.0     998124.5    665.4
                                                                   ========
System Benchmarks Index Score                                         380.5

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:41:24 - 06:09:38
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables       67785992.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    13990.1 MWIPS (10.1 s, 7 samples)
Execl Throughput                               5098.5 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        285233.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           73046.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1005166.1 KBps  (30.0 s, 2 samples)
Pipe Throughput                             3663311.5 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 222918.1 lps   (10.0 s, 7 samples)
Process Creation                               8125.0 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  10717.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1391.7 lpm   (60.2 s, 2 samples)
System Call Overhead                        3636949.3 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   67785992.9   5808.6
Double-Precision Whetstone                       55.0      13990.1   2543.7
Execl Throughput                                 43.0       5098.5   1185.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     285233.4    720.3
File Copy 256 bufsize 500 maxblocks            1655.0      73046.0    441.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    1005166.1   1733.0
Pipe Throughput                               12440.0    3663311.5   2944.8
Pipe-based Context Switching                   4000.0     222918.1    557.3
Process Creation                                126.0       8125.0    644.8
Shell Scripts (1 concurrent)                     42.4      10717.2   2527.6
Shell Scripts (8 concurrent)                      6.0       1391.7   2319.6
System Call Overhead                          15000.0    3636949.3   2424.6
                                                                   ========
System Benchmarks Index Score                                        1514.1
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
&lt;details&gt;&lt;summary&gt;Scaleway &amp;mdash; ARM64-2GB&lt;/summary&gt;&lt;pre&gt;&lt;code&gt;========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.4.121-mainline-rev1 -- #1 SMP Sun Mar 11 16:44:34 UTC 2018
   Machine: aarch64 (aarch64)
   Language: en_US.utf8 (charmap=&quot;UTF-8&quot;, collate=&quot;UTF-8&quot;)
   05:14:10 up 3 days,  7:45,  1 user,  load average: 2.75, 3.74, 3.91; runlevel 2018-03-14

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:14:10 - 05:42:12
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        8555429.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1747.9 MWIPS (10.1 s, 7 samples)
Execl Throughput                               1224.4 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        184524.9 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           58246.7 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        438788.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                              465226.2 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  14792.3 lps   (10.0 s, 7 samples)
Process Creation                               2629.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3095.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    884.2 lpm   (60.0 s, 2 samples)
System Call Overhead                        1011139.0 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    8555429.5    733.1
Double-Precision Whetstone                       55.0       1747.9    317.8
Execl Throughput                                 43.0       1224.4    284.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     184524.9    466.0
File Copy 256 bufsize 500 maxblocks            1655.0      58246.7    351.9
File Copy 4096 bufsize 8000 maxblocks          5800.0     438788.5    756.5
Pipe Throughput                               12440.0     465226.2    374.0
Pipe-based Context Switching                   4000.0      14792.3     37.0
Process Creation                                126.0       2629.9    208.7
Shell Scripts (1 concurrent)                     42.4       3095.2    730.0
Shell Scripts (8 concurrent)                      6.0        884.2   1473.6
System Call Overhead                          15000.0    1011139.0    674.1
                                                                   ========
System Benchmarks Index Score                                         400.9

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:42:12 - 06:10:18
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       34136207.1 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     6989.1 MWIPS (10.2 s, 7 samples)
Execl Throughput                               3526.3 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        218968.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           61412.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        830973.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1848545.0 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 121851.3 lps   (10.0 s, 7 samples)
Process Creation                               6271.4 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7046.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    955.8 lpm   (60.1 s, 2 samples)
System Call Overhead                        3570647.2 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   34136207.1   2925.1
Double-Precision Whetstone                       55.0       6989.1   1270.7
Execl Throughput                                 43.0       3526.3    820.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     218968.8    553.0
File Copy 256 bufsize 500 maxblocks            1655.0      61412.5    371.1
File Copy 4096 bufsize 8000 maxblocks          5800.0     830973.8   1432.7
Pipe Throughput                               12440.0    1848545.0   1486.0
Pipe-based Context Switching                   4000.0     121851.3    304.6
Process Creation                                126.0       6271.4    497.7
Shell Scripts (1 concurrent)                     42.4       7046.2   1661.8
Shell Scripts (8 concurrent)                      6.0        955.8   1593.0
System Call Overhead                          15000.0    3570647.2   2380.4
                                                                   ========
System Benchmarks Index Score                                        1020.3
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
&lt;details&gt;&lt;summary&gt;Scaleway &amp;mdash; C1&lt;/summary&gt;&lt;pre&gt;&lt;code&gt;========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.9.20-std-1 -- #1 SMP Tue Apr 4 12:56:17 UTC 2017
   Machine: x86_64 (unknown)
   Language: en_US.utf8 (charmap=&quot;UTF-8&quot;, collate=&quot;UTF-8&quot;)
   CPU 0: Intel(R) Atom(TM) CPU C2750 @ 2.40GHz (4787.8 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Atom(TM) CPU C2750 @ 2.40GHz (4787.8 bogomips)
          x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   05:14:11 up 3 days,  1:28,  1 user,  load average: 2.01, 2.14, 2.06; runlevel 2018-03-15

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:14:12 - 05:42:08
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       12323865.3 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     2014.1 MWIPS (9.9 s, 7 samples)
Execl Throughput                               1223.1 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        415672.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          120361.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        985611.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1170708.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  46541.0 lps   (10.0 s, 7 samples)
Process Creation                               3049.4 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3348.8 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    685.8 lpm   (60.1 s, 2 samples)
System Call Overhead                        1446516.0 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   12323865.3   1056.0
Double-Precision Whetstone                       55.0       2014.1    366.2
Execl Throughput                                 43.0       1223.1    284.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     415672.5   1049.7
File Copy 256 bufsize 500 maxblocks            1655.0     120361.9    727.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     985611.5   1699.3
Pipe Throughput                               12440.0    1170708.3    941.1
Pipe-based Context Switching                   4000.0      46541.0    116.4
Process Creation                                126.0       3049.4    242.0
Shell Scripts (1 concurrent)                     42.4       3348.8    789.8
Shell Scripts (8 concurrent)                      6.0        685.8   1142.9
System Call Overhead                          15000.0    1446516.0    964.3
                                                                   ========
System Benchmarks Index Score                                         621.0

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:42:08 - 06:10:06
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       24552470.3 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4016.2 MWIPS (10.0 s, 7 samples)
Execl Throughput                               2918.3 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        485532.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          131304.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1365028.4 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2329059.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 116038.9 lps   (10.0 s, 7 samples)
Process Creation                               7104.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   5589.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    722.6 lpm   (60.1 s, 2 samples)
System Call Overhead                        2260798.6 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   24552470.3   2103.9
Double-Precision Whetstone                       55.0       4016.2    730.2
Execl Throughput                                 43.0       2918.3    678.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     485532.6   1226.1
File Copy 256 bufsize 500 maxblocks            1655.0     131304.9    793.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    1365028.4   2353.5
Pipe Throughput                               12440.0    2329059.7   1872.2
Pipe-based Context Switching                   4000.0     116038.9    290.1
Process Creation                                126.0       7104.8    563.9
Shell Scripts (1 concurrent)                     42.4       5589.5   1318.3
Shell Scripts (8 concurrent)                      6.0        722.6   1204.3
System Call Overhead                          15000.0    2260798.6   1507.2
                                                                   ========
System Benchmarks Index Score                                        1047.7
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
&lt;details&gt;&lt;summary&gt;Azure &amp;mdash; Standard B2s&lt;/summary&gt;&lt;pre&gt;&lt;code&gt;========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.13.0-1011-azure -- #14-Ubuntu SMP Thu Feb 15 16:15:39 UTC 2018
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap=&quot;UTF-8&quot;, collate=&quot;UTF-8&quot;)
   CPU 0: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (4589.4 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (4589.4 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   05:22:33 up 6 days,  8:37,  1 user,  load average: 0.08, 0.62, 1.38; runlevel 2018-03-11

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:22:33 - 05:50:38
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       28065805.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3310.3 MWIPS (8.7 s, 7 samples)
Execl Throughput                               2546.1 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        257690.1 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           55889.7 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        535177.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                              315663.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  25281.3 lps   (10.0 s, 7 samples)
Process Creation                               3911.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2343.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    862.3 lpm   (60.0 s, 2 samples)
System Call Overhead                         268361.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   28065805.5   2405.0
Double-Precision Whetstone                       55.0       3310.3    601.9
Execl Throughput                                 43.0       2546.1    592.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     257690.1    650.7
File Copy 256 bufsize 500 maxblocks            1655.0      55889.7    337.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     535177.7    922.7
Pipe Throughput                               12440.0     315663.7    253.7
Pipe-based Context Switching                   4000.0      25281.3     63.2
Process Creation                                126.0       3911.9    310.5
Shell Scripts (1 concurrent)                     42.4       2343.0    552.6
Shell Scripts (8 concurrent)                      6.0        862.3   1437.2
System Call Overhead                          15000.0     268361.9    178.9
                                                                   ========
System Benchmarks Index Score                                         472.2

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:50:38 - 06:18:55
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       12561408.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1364.4 MWIPS (10.5 s, 7 samples)
Execl Throughput                               1285.0 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        108284.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           29067.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        813617.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                              195193.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  59307.3 lps   (10.0 s, 7 samples)
Process Creation                               2751.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3681.4 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    322.3 lpm   (60.1 s, 2 samples)
System Call Overhead                         280762.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   12561408.5   1076.4
Double-Precision Whetstone                       55.0       1364.4    248.1
Execl Throughput                                 43.0       1285.0    298.8
File Copy 1024 bufsize 2000 maxblocks          3960.0     108284.8    273.4
File Copy 256 bufsize 500 maxblocks            1655.0      29067.9    175.6
File Copy 4096 bufsize 8000 maxblocks          5800.0     813617.5   1402.8
Pipe Throughput                               12440.0     195193.3    156.9
Pipe-based Context Switching                   4000.0      59307.3    148.3
Process Creation                                126.0       2751.5    218.4
Shell Scripts (1 concurrent)                     42.4       3681.4    868.3
Shell Scripts (8 concurrent)                      6.0        322.3    537.1
System Call Overhead                          15000.0     280762.9    187.2
                                                                   ========
System Benchmarks Index Score                                         340.0
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
&lt;details&gt;&lt;summary&gt;SSDNodes &amp;mdash; KVM 16GB&lt;/summary&gt;&lt;pre&gt;&lt;code&gt;========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.9.0-5-amd64 -- #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
   Machine: x86_64 (unknown)
   Language: en_US.utf8 (charmap=&quot;UTF-8&quot;, collate=&quot;UTF-8&quot;)
   CPU 0: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 2: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 3: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   05:32:38 up 24 days,  9:44,  2 users,  load average: 0.86, 0.95, 2.01; runlevel 2018-02-21

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:32:39 - 06:00:50
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       18638854.5 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3603.5 MWIPS (9.3 s, 7 samples)
Execl Throughput                                543.0 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        326203.0 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          107831.8 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        782124.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                              772372.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  27040.7 lps   (10.0 s, 7 samples)
Process Creation                               1912.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   1867.0 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    685.7 lpm   (60.1 s, 2 samples)
System Call Overhead                         603214.3 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   18638854.5   1597.2
Double-Precision Whetstone                       55.0       3603.5    655.2
Execl Throughput                                 43.0        543.0    126.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     326203.0    823.7
File Copy 256 bufsize 500 maxblocks            1655.0     107831.8    651.6
File Copy 4096 bufsize 8000 maxblocks          5800.0     782124.6   1348.5
Pipe Throughput                               12440.0     772372.4    620.9
Pipe-based Context Switching                   4000.0      27040.7     67.6
Process Creation                                126.0       1912.9    151.8
Shell Scripts (1 concurrent)                     42.4       1867.0    440.3
Shell Scripts (8 concurrent)                      6.0        685.7   1142.9
System Call Overhead                          15000.0     603214.3    402.1
                                                                   ========
System Benchmarks Index Score                                         472.3

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 06:00:50 - 06:29:14
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       63227839.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    14671.5 MWIPS (9.4 s, 7 samples)
Execl Throughput                               4394.5 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        347374.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          109273.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        830966.2 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2702931.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 307088.8 lps   (10.0 s, 7 samples)
Process Creation                               4009.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   6331.9 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1825.1 lpm   (60.1 s, 2 samples)
System Call Overhead                        2090415.5 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   63227839.0   5418.0
Double-Precision Whetstone                       55.0      14671.5   2667.5
Execl Throughput                                 43.0       4394.5   1022.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     347374.8    877.2
File Copy 256 bufsize 500 maxblocks            1655.0     109273.0    660.3
File Copy 4096 bufsize 8000 maxblocks          5800.0     830966.2   1432.7
Pipe Throughput                               12440.0    2702931.3   2172.8
Pipe-based Context Switching                   4000.0     307088.8    767.7
Process Creation                                126.0       4009.3    318.2
Shell Scripts (1 concurrent)                     42.4       6331.9   1493.4
Shell Scripts (8 concurrent)                      6.0       1825.1   3041.8
System Call Overhead                          15000.0    2090415.5   1393.6
                                                                   ========
System Benchmarks Index Score                                        1363.2
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;
&lt;details&gt;&lt;summary&gt;SSDNodes &amp;mdash; KVM 8GB&lt;/summary&gt;&lt;pre&gt;&lt;code&gt;========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: redacted: GNU/Linux
   OS: GNU/Linux -- 4.9.0-5-amd64 -- #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04)
   Machine: x86_64 (unknown)
   Language: en_US.utf8 (charmap=&quot;UTF-8&quot;, collate=&quot;UTF-8&quot;)
   CPU 0: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel Core Processor (Skylake, IBRS) (4600.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   05:27:18 up 24 days,  9:39,  2 users,  load average: 1.83, 2.76, 2.63; runlevel 2018-02-21

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:27:18 - 05:55:29
2 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       20712375.2 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4089.5 MWIPS (10.0 s, 7 samples)
Execl Throughput                                869.8 lps   (29.6 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        414717.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          118528.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1037781.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                              839599.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  39673.2 lps   (10.0 s, 7 samples)
Process Creation                               2367.3 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3917.3 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1015.1 lpm   (60.0 s, 2 samples)
System Call Overhead                         646058.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   20712375.2   1774.8
Double-Precision Whetstone                       55.0       4089.5    743.5
Execl Throughput                                 43.0        869.8    202.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     414717.4   1047.3
File Copy 256 bufsize 500 maxblocks            1655.0     118528.4    716.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    1037781.8   1789.3
Pipe Throughput                               12440.0     839599.1    674.9
Pipe-based Context Switching                   4000.0      39673.2     99.2
Process Creation                                126.0       2367.3    187.9
Shell Scripts (1 concurrent)                     42.4       3917.3    923.9
Shell Scripts (8 concurrent)                      6.0       1015.1   1691.8
System Call Overhead                          15000.0     646058.8    430.7
                                                                   ========
System Benchmarks Index Score                                         616.8

------------------------------------------------------------------------
Benchmark Run: Sun Mar 18 2018 05:55:29 - 06:23:42
2 CPUs in system; running 2 parallel copies of tests

Dhrystone 2 using register variables       38935462.3 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     8156.1 MWIPS (10.0 s, 7 samples)
Execl Throughput                               4726.3 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        692577.9 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          203840.1 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       1799195.8 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1621602.5 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 211656.3 lps   (10.0 s, 7 samples)
Process Creation                               9135.5 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   7138.5 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   1195.9 lpm   (60.1 s, 2 samples)
System Call Overhead                        1202392.1 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   38935462.3   3336.4
Double-Precision Whetstone                       55.0       8156.1   1482.9
Execl Throughput                                 43.0       4726.3   1099.1
File Copy 1024 bufsize 2000 maxblocks          3960.0     692577.9   1748.9
File Copy 256 bufsize 500 maxblocks            1655.0     203840.1   1231.7
File Copy 4096 bufsize 8000 maxblocks          5800.0    1799195.8   3102.1
Pipe Throughput                               12440.0    1621602.5   1303.5
Pipe-based Context Switching                   4000.0     211656.3    529.1
Process Creation                                126.0       9135.5    725.0
Shell Scripts (1 concurrent)                     42.4       7138.5   1683.6
Shell Scripts (8 concurrent)                      6.0       1195.9   1993.2
System Call Overhead                          15000.0    1202392.1    801.6
                                                                   ========
System Benchmarks Index Score                                        1382.8
&lt;/code&gt;&lt;/pre&gt;&lt;/details&gt;

&lt;p&gt;&lt;/p&gt;

&lt;p&gt;These benchmarks should be taken with a grain of salt, since UnixBench tests a fair bit more than just CPU throughput. Still, one thing seems
fairly clear: although the ARMv8 cores are 20-30% slower than the mix of competing x86_64 cores in single-core performance,
they win out in multi-core hashrate simply because there are more of them.&lt;/p&gt;
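
&lt;p&gt;When reading the tables above, it may help to see how the index columns appear to be derived. As far as I can tell from the numbers, each test's index is ten times the ratio of its result to the baseline, and the overall score is the geometric mean of those indexes. Here is a minimal sketch in Java, fed with the single-copy SSDNodes KVM 8GB figures from above; the class and method names are made up for illustration and are not part of UnixBench.&lt;/p&gt;

```java
// Illustrative only: reproduces the index arithmetic seen in the tables above.
final class UnixBenchIndex {
    // Index for a single test: ten times the ratio of result to baseline.
    static double index(double baseline, double result) {
        return result / baseline * 10.0;
    }

    // Overall score: geometric mean of the per-test indexes,
    // computed through logarithms to avoid a huge intermediate product.
    static double score(double[] baselines, double[] results) {
        double logSum = 0.0;
        int i = 0;
        for (double baseline : baselines) {
            logSum += Math.log(index(baseline, results[i]));
            i++;
        }
        return Math.exp(logSum / baselines.length);
    }

    public static void main(String[] args) {
        // BASELINE and RESULT columns of the SSDNodes KVM 8GB single-copy run.
        double[] baselines = {116700.0, 55.0, 43.0, 3960.0, 1655.0, 5800.0,
                              12440.0, 4000.0, 126.0, 42.4, 6.0, 15000.0};
        double[] results = {20712375.2, 4089.5, 869.8, 414717.4, 118528.4, 1037781.8,
                            839599.1, 39673.2, 2367.3, 3917.3, 1015.1, 646058.8};

        System.out.printf("Dhrystone index: %.1f%n", index(baselines[0], results[0]));
        System.out.printf("Overall score:   %.1f%n", score(baselines, results));
    }
}
```

&lt;p&gt;The overall score this prints should land very close to the 616.8 reported for that run. It also shows why the headline number deserves caution: it blends CPU-bound tests with file copy, pipe and process creation tests, none of which matter much for mining.&lt;/p&gt;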

&lt;p&gt;I suppose this isn't really a thrilling discovery — it makes immediate sense — but I found it fairly interesting that it's cheaper to scale out
in number of cores rather than up in per-core performance… at least when it comes to mining vanity Tor domains.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Overall, this was a larger undertaking than I would have assumed at first, and I spent a long time monitoring (nonexistent) progress. In the end, it was fun to do, so hopefully it was fun to read about too!&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Tor supports a new hidden service protocol as of v0.3.2.1-alpha, released back in October 2017, and is now in stable branches. Dubbed the &quot;v3&quot; onion service protocol, among other changes, it replaces SHA1/DH/RSA1024 with SHA3/ed25519/curve25519 for much improved cryptographic security. I already had a v2 onion site up at tbrindus6tjv6wpi.onion, so I thought it would be an interesting exercise to mine a v3 vanity domain prefixed with tbrindus. For this, I set up 15 servers to mine for a matching prefix — more on this below! It took well over a week of mining, but as of today, this site can also be accessed through the v3 hidden service tbrindusxnnqwmzov5qof56hyion6usmciqwykffxqsawswhk73aq5yd.onion!</summary></entry><entry><title type="html">Setting up an SSTP VPN on Windows Server with LetsEncrypt</title><link href="https://tbrindus.ca/windows-sstp-vpn-letsencrypt/" rel="alternate" type="text/html" title="Setting up an SSTP VPN on Windows Server with LetsEncrypt" /><published>2018-03-08T00:00:00+00:00</published><updated>2018-03-08T00:00:00+00:00</updated><id>https://tbrindus.ca/windows-sstp-vpn-letsencrypt</id><content type="html" xml:base="https://tbrindus.ca/windows-sstp-vpn-letsencrypt/">&lt;p&gt;Setting up a VPN on Windows Server for remote access to company resources comes up often enough, and a great deal has been written on the subject online.&lt;/p&gt;

&lt;p&gt;However, back when I first went through the whole process, I found it time-consuming to sift through all the outdated information
floating around, so I wrote up these instructions for personal reference. I've since had the opportunity to test them on a number
of fresh installs, and worked out a bunch of kinks that way.&lt;/p&gt;

&lt;p&gt;These instructions assume a brand new install of Windows Server 2016, but they should be easily adaptable to other scenarios.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;installing-the-necessary-software&quot;&gt;Installing the Necessary Software&lt;/h2&gt;

&lt;h3 id=&quot;iis-and-rras&quot;&gt;IIS and RRAS&lt;/h3&gt;
&lt;p&gt;In Server Manager, go to Manage → Add Roles and Features, and check &lt;strong&gt;Remote Access&lt;/strong&gt; and &lt;strong&gt;Web Server (IIS)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Under the &lt;em&gt;Features&lt;/em&gt; pane, select &lt;strong&gt;Remote Server Administration Tools&lt;/strong&gt; and all submodules, and under &lt;em&gt;Remote Access Role Services&lt;/em&gt;,
select &lt;strong&gt;DirectAccess and VPN&lt;/strong&gt; and &lt;strong&gt;Routing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Install.&lt;/p&gt;

&lt;h3 id=&quot;win-acme&quot;&gt;win-acme&lt;/h3&gt;
&lt;p&gt;Grab a copy of win-acme &lt;a href=&quot;https://github.com/PKISharp/win-acme&quot;&gt;from GitHub&lt;/a&gt;; we'll be using it to streamline requesting SSL
certificates from LetsEncrypt.&lt;/p&gt;

&lt;h2 id=&quot;setting-up-the-routing-and-remote-access-service&quot;&gt;Setting up the Routing and Remote Access Service&lt;/h2&gt;
&lt;p&gt;First, we must get RRAS set up.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Run &lt;code class=&quot;highlighter-rouge&quot;&gt;rrasmgmt.msc&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;Right-click the server → Configure → Custom Configuration → VPN Access &amp;amp; Demand-dial connections&lt;/li&gt;
  &lt;li&gt;Start the service&lt;/li&gt;
  &lt;li&gt;Right-click the server → Properties&lt;/li&gt;
  &lt;li&gt;On the IPv4 tab, select static address pool and choose an appropriate IP range for VPN clients (e.g. 192.168.26.0 to 192.168.26.50)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, ensure that the Default Web Site host in IIS has an HTTPS binding, and furthermore has its &lt;strong&gt;Require Server Name Indication&lt;/strong&gt; box
&lt;strong&gt;unticked&lt;/strong&gt;: the host used for an SSTP VPN must not require SNI.&lt;/p&gt;

&lt;p&gt;With the binding in place, we can move on to the certificate. First, we should get rid of any existing certificates for the VPN host.&lt;/p&gt;

&lt;div class=&quot;language-powershell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$hostname&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;vpn.company.com&quot;&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Get-ChildItem&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-Path&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Cert:\LocalMachine\My&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Where-Object&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;$_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Subject&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-match&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$hostname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Remove-Item&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Typically, certificates for IIS are stored in the WebHosting certificate store. However, RRAS can only use certificates under the
Personal certificate store, so we must ask win-acme to place the certificate in the Personal store explicitly.&lt;/p&gt;

&lt;div class=&quot;language-powershell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;/letsencrypt.exe&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;--plugin&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;iisbinding&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;--manualhost&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$hostname&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;  &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;--certificatestore&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;My&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;--notaskscheduler&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then, we can fetch a PowerShell object referencing our certificate. By default, win-acme removes old (expired) certificates when
requesting a new one for the same host, so we can just filter by hostname.&lt;/p&gt;

&lt;div class=&quot;language-powershell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$cert&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Get-ChildItem&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-Path&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;Cert:\LocalMachine\My&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Where-Object&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;$_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Subject&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-match&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$hostname&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Finally, we can import the RRAS module, and set our RRAS cert to the one we just created.&lt;/p&gt;

&lt;div class=&quot;language-powershell highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Import-Module&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;RemoteAccess&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Stop-Service&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;RemoteAccess&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Set-RemoteAccess&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;-SslCertificate&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$cert&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;err&quot;&gt;$&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Start-Service&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;nx&quot;&gt;RemoteAccess&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;LetsEncrypt certificates expire every 3 months, so it's a good idea to make this script run periodically in Task Scheduler, so that
you're not faced with unexpected VPN outages.&lt;/p&gt;

&lt;p&gt;Next, since RRAS doesn't start up by default on machine boot, we should make it do so:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Open up &lt;code class=&quot;highlighter-rouge&quot;&gt;Services&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Find the &lt;em&gt;Remote Access Connection Manager&lt;/em&gt; service, right-click → &lt;em&gt;Properties&lt;/em&gt; → &lt;em&gt;Startup type: Automatic&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;testing-the-vpn&quot;&gt;Testing the VPN&lt;/h2&gt;
&lt;p&gt;Now, you can set up the VPN connection on a Windows client machine and try connecting. The VPN should connect, but without any connectivity to the
internet or the host machine yet. If the VPN immediately disconnects upon a connection attempt, you can use the &lt;code class=&quot;highlighter-rouge&quot;&gt;rasdial&lt;/code&gt; command (&lt;code class=&quot;highlighter-rouge&quot;&gt;rasdial /?&lt;/code&gt;
for usage help) in a command prompt on the client to get more detailed error information than the regular Windows
interface provides.&lt;/p&gt;

&lt;p&gt;At this point, you may or may not be able to ping the host machine from your client when connected to the VPN (you can use 
&lt;code class=&quot;highlighter-rouge&quot;&gt;ipconfig /all&lt;/code&gt; on the host to determine its VPN IP, and try &lt;code class=&quot;highlighter-rouge&quot;&gt;ping&lt;/code&gt;-ing it from the client).&lt;/p&gt;

&lt;p&gt;Note that you will not be able to access the internet. To fix this, you must configure your client not to attempt to use the server
gateway (because it doesn't exist). Open the Network and Sharing Center, and click into &lt;em&gt;Change adapter settings&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Right-click the VPN connection you just created, and select &quot;Properties&quot;. Switch to the Networking tab. Select the &lt;em&gt;Internet Protocol
Version 4 (TCP/IPv4)&lt;/em&gt; list item, then click the &lt;em&gt;Properties&lt;/em&gt; button. Click &lt;em&gt;Advanced&lt;/em&gt;, and uncheck &lt;em&gt;Use default gateway on remote network&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id=&quot;troubleshooting&quot;&gt;Troubleshooting&lt;/h2&gt;
&lt;h3 id=&quot;vpn-user-must-be-allowed-to-dial-in&quot;&gt;VPN user must be allowed to dial-in&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Run &lt;code class=&quot;highlighter-rouge&quot;&gt;mmc.exe&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Add the &lt;em&gt;Local Users and Groups&lt;/em&gt; snap-in from the File menu&lt;/li&gt;
  &lt;li&gt;Right-click your user account, then click &lt;em&gt;Properties&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;&lt;em&gt;Dial-in&lt;/em&gt; tab, &lt;em&gt;Allow access&lt;/em&gt; under &lt;em&gt;Network Access Permission&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;network-policy-server-must-allow-vpn-connections&quot;&gt;Network Policy Server must allow VPN connections&lt;/h3&gt;
&lt;p&gt;If you have NPS enabled, you will have to configure it to allow VPN connections.&lt;/p&gt;

&lt;p&gt;In the NPS snap-in from &lt;code class=&quot;highlighter-rouge&quot;&gt;mmc.exe&lt;/code&gt;, go to &lt;em&gt;Advanced Configuration&lt;/em&gt; → &lt;em&gt;Network Policies&lt;/em&gt;, and select &lt;em&gt;Grant access&lt;/em&gt; for both policies relating to VPN connections (they deny access by default).&lt;/p&gt;

&lt;h3 id=&quot;host-machine-must-be-discoverable&quot;&gt;Host machine must be discoverable&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Open up the &lt;strong&gt;Network and Sharing Center&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;Click &lt;em&gt;Advanced sharing settings&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;Expand the &lt;em&gt;Private&lt;/em&gt; and &lt;em&gt;Guest or Public&lt;/em&gt; groups, and turn on &lt;em&gt;Network Discovery&lt;/em&gt; and &lt;em&gt;File and printer sharing&lt;/em&gt; on both&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;/h2&gt;
&lt;p&gt;At this point, clients should be able to connect to the VPN host, and any file shares created on it should be mountable. A minor caveat to be aware of is that LetsEncrypt certificates expire every 3 months, so you must either have a reminder in your calendar to renew the certificate, or have a scheduled script
to request a new certificate and reconfigure RRAS to use it.&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Setting up a VPN on Windows Server for remote access to company resources comes up often enough, and a great deal has been written on the subject online. However, back when I first went through the whole process, I found it time-consuming to sift through all the outdated information floating around, so I created this document for personal reference. I've had the opportunity to test them out on a number of fresh installs, and worked out a bunch of kinks that way. These instructions assume a brand new install of Windows Server 2016, but they should be easily adaptable to other scenarios.</summary></entry><entry><title type="html">Blazing-fast Java2D rendering</title><link href="https://tbrindus.ca/blazing-fast-java2d/" rel="alternate" type="text/html" title="Blazing-fast Java2D rendering" /><published>2017-10-18T00:00:00+00:00</published><updated>2017-10-18T00:00:00+00:00</updated><id>https://tbrindus.ca/blazing-fast-java2d</id><content type="html" xml:base="https://tbrindus.ca/blazing-fast-java2d/">&lt;p&gt;Anyone who has ever attempted to draw anything more than almost-static scenes with Java2D can attest that it sluggishly chugs along. 
Some will even say it's unusable for repainting at 60Hz or higher without taking a toll on the CPU.&lt;/p&gt;

&lt;p&gt;Today, we'll look at what we can do to speed up rendering, in ways that (at the time of writing) I have not seen discussed anywhere
online. Probably because it's a big hack.&lt;/p&gt;

&lt;!--more--&gt;

&lt;h2 id=&quot;vanilla-java2d-rendering-and-caveats&quot;&gt;Vanilla Java2D Rendering, and Caveats&lt;/h2&gt;

&lt;p&gt;The reader may be familiar with the way Java handles drawing in Swing. If not,
&lt;a href=&quot;http://www.oracle.com/technetwork/java/painting-140037.html&quot;&gt;the official documentation&lt;/a&gt; is a good starting place.&lt;/p&gt;

&lt;p&gt;What's important to note is that when you wish to update a frame in your Java2D application, you must first &lt;em&gt;request&lt;/em&gt; a repaint,
which is processed by putting a repaint event onto the event queue. If there are things earlier in the event queue, they must
first get processed before you can repaint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's not even guaranteed that a repaint request will cause a repaint&lt;/strong&gt; — sometimes,
multiple repaint events can get &quot;squashed&quot; into one, causing jittery animations. Of course, there are workarounds like
&lt;a href=&quot;http://docs.oracle.com/javase/6/docs/api/javax/swing/JComponent.html#paintImmediately%28int,%20int,%20int,%20int%29&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;paintImmediately&lt;/code&gt;&lt;/a&gt;,
but none provide outstanding performance for even the simplest scenes: there's simply too much abstraction,
which is a killer when every millisecond counts to obtain an immersive rendering experience.&lt;/p&gt;
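
&lt;p&gt;To make the squashing behaviour concrete, here is a toy model in plain Java. It is not Swing's actual implementation, only a sketch of the idea under the assumption that at most one repaint can be pending at a time; the class &lt;code class=&quot;highlighter-rouge&quot;&gt;CoalescingRepaintModel&lt;/code&gt; and its methods are hypothetical names chosen for this example.&lt;/p&gt;

```java
// Toy model of repaint coalescing. This is an assumption-laden sketch,
// not how Swing is actually implemented internally.
final class CoalescingRepaintModel {
    private boolean repaintPending = false; // is a repaint already waiting in the queue?
    private int requests = 0;               // how many times a repaint was asked for
    private int paints = 0;                 // how many times we actually painted

    // Stands in for repaint(): a request made while another one is still
    // pending adds nothing new, so the two are merged into one.
    void requestRepaint() {
        requests++;
        repaintPending = true;
    }

    // Stands in for the event thread finally reaching the queued repaint.
    void processEvents() {
        if (repaintPending) {
            repaintPending = false;
            paints++;
        }
    }

    int requests() {
        return requests;
    }

    int paints() {
        return paints;
    }

    public static void main(String[] args) {
        CoalescingRepaintModel model = new CoalescingRepaintModel();

        // Three animation frames are requested while the event thread is busy...
        model.requestRepaint();
        model.requestRepaint();
        model.requestRepaint();

        // ...and only then does the event thread get around to painting.
        model.processEvents();

        System.out.println(model.requests() + " requests, " + model.paints() + " paint(s)");
    }
}
```

&lt;p&gt;Running it prints &lt;code class=&quot;highlighter-rouge&quot;&gt;3 requests, 1 paint(s)&lt;/code&gt;: if the event thread is busy while three frames are requested, only one of them is ever drawn, which is the kind of jitter described above.&lt;/p&gt;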

&lt;h3 id=&quot;a-simple-benchmark&quot;&gt;A simple benchmark&lt;/h3&gt;

&lt;p&gt;Below is a simple Swing application that does nothing more than draw a red, full-window rectangle. We'll be using it as a
benchmark for the purposes of this post — though it is not a particularly good real-life example, the effects of Java2D
abstraction are fairly uniform across the entire API: if we can get this to run quickly, so too will everything else.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PaintFrame&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;JFrame&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;frameCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;setSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;720&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;680&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;JPanel&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;canvas&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;canvas&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;JPanel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
            &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;paint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Graphics&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gfx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;gfx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setColor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;RED&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;gfx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fillRect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getWidth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getHeight&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;frameCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BorderLayout&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;CENTER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;setLocationRelativeTo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;setVisible&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Thread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;canvas&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;paintImmediately&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getWidth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getHeight&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We can also hook our frame up to a simple, but illustrative, test.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PaintTest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;PaintFrame&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;frame&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PaintFrame&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Timer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;schedule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TimerTask&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

            &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
            &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
                &lt;span class=&quot;nc&quot;&gt;System&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Averaging %.2f fps!\n&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;frame&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;frameCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Running on my Intel HD 5500 integrated graphics, the example code above averages ~1,200 frames per second. This quickly drops
to below 400fps when making the window fullscreen, which is unacceptable for anything where framerate matters.&lt;/p&gt;

&lt;p&gt;The significance of this deserves a bit more explanation (400fps is far more than the human eye can see!).
Here, we're doing nothing more than drawing a red rectangle as fast as possible; there is no application logic taking up resources
at the same time, and yet a single frame takes 2.5ms (1000ms / 400 frames) to process.&lt;/p&gt;

&lt;p&gt;To maintain a 120fps framerate, each frame should be processed in ~8.3ms. If we're taking 2.5ms just to draw a single red rectangle,
that leaves 5.8ms per frame for application logic: rendering would consume ~30% of application time. Naturally, rendering time
increases the more you have to draw per frame, and our 2.5ms measurement is for a single rectangle.&lt;/p&gt;
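&lt;p&gt;As a quick sanity check of the arithmetic above, here is a minimal sketch that recomputes the frame budget from the two framerates quoted in this post. The &lt;code class=&quot;highlighter-rouge&quot;&gt;FrameBudget&lt;/code&gt; class and its method names are made up for illustration; only the 400fps and 120fps figures come from the measurements above.&lt;/p&gt;

```java
// Back-of-the-envelope frame budget (hypothetical helper, not part of the benchmark).
class FrameBudget {
    // Milliseconds available (or spent) per frame at a given framerate.
    static double frameTimeMs(double fps) {
        return 1000.0 / fps;
    }

    // Fraction of the per-frame budget consumed by rendering alone.
    static double renderShare(double renderMs, double targetFps) {
        return renderMs / frameTimeMs(targetFps);
    }

    public static void main(String[] args) {
        double renderMs = frameTimeMs(400); // fullscreen rate measured above
        double budgetMs = frameTimeMs(120); // framerate we would like to sustain
        System.out.printf("render: %.1f ms, budget: %.1f ms, left: %.1f ms, share: %.0f%%%n",
                renderMs, budgetMs, budgetMs - renderMs, 100 * renderShare(renderMs, 120));
    }
}
```

&lt;p&gt;Swapping in the framerate of a heavier scene shows how quickly the rendering share grows as the measured rate falls.&lt;/p&gt;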

&lt;p&gt;Now that we've seen how vanilla Java2D rendering performs, let's see if we can do better.&lt;/p&gt;

&lt;h2 id=&quot;a-hack-for-fast-rendering&quot;&gt;A Hack for Fast Rendering&lt;/h2&gt;

&lt;p&gt;Java2D provides output with OpenGL, Direct3D, GDI, and more, depending on platform. Most of these are inherently active-rendering APIs,
so there should be no technical barrier preventing us from rendering directly to them… except for abstraction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: if your application needs to run on more than just Oracle's VM (or equivalently, OpenJDK), your mileage may vary
with this approach. As I mentioned earlier, it's a hack specific to the internals of the Oracle API implementation, so it's unlikely
to work anywhere else.&lt;/p&gt;

&lt;p&gt;Let's start off with an observation. If we try printing out a &lt;code class=&quot;highlighter-rouge&quot;&gt;Graphics&lt;/code&gt; object passed to &lt;code class=&quot;highlighter-rouge&quot;&gt;paint&lt;/code&gt;, we'll see that it's implemented by
&lt;a href=&quot;http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/java2d/SunGraphics2D.java&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;sun.java2d.SunGraphics2D&lt;/code&gt;&lt;/a&gt;.
We can also see that we're passed a different object each frame, so that's already a waste of GC resources, if we're pumping out 
hundreds of frames a second.&lt;/p&gt;

&lt;p&gt;If we could construct our own &lt;code class=&quot;highlighter-rouge&quot;&gt;SunGraphics2D&lt;/code&gt; object, we'd be able to reuse it and any underlying resources outside of our
&lt;code class=&quot;highlighter-rouge&quot;&gt;paint&lt;/code&gt; method, directly in our rendering thread. The &lt;code class=&quot;highlighter-rouge&quot;&gt;SunGraphics2D&lt;/code&gt; constructor is pretty benign, so that's promising.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SunGraphics2D&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;SurfaceData&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Color&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Color&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Font&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;At first glance, this seems fairly mild for such a fundamental class. The only thing that appears tricky is the &lt;code class=&quot;highlighter-rouge&quot;&gt;SurfaceData&lt;/code&gt; parameter.&lt;/p&gt;

&lt;h3 id=&quot;obtaining-a-surfacedata&quot;&gt;Obtaining a &lt;code class=&quot;highlighter-rouge&quot;&gt;SurfaceData&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;SurfaceData&lt;/code&gt; sounds exactly like what one would expect an abstraction of a native surface to be called, and if we
&lt;a href=&quot;http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/sun/java2d/SurfaceData.java#SurfaceData&quot;&gt;dig into its source&lt;/a&gt;,
it becomes evident that &lt;code class=&quot;highlighter-rouge&quot;&gt;SurfaceData&lt;/code&gt; implementations (the class itself is marked &lt;code class=&quot;highlighter-rouge&quot;&gt;abstract&lt;/code&gt;) do the heavy lifting
in rendering Java2D. If we search for implementations, we get names like &lt;code class=&quot;highlighter-rouge&quot;&gt;D3DWindowSurfaceData&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;GDIWindowSurfaceData&lt;/code&gt;,
&lt;code class=&quot;highlighter-rouge&quot;&gt;XSurfaceData&lt;/code&gt;, and so on.&lt;/p&gt;

&lt;p&gt;It's clear that any rendering we do will have to be platform-dependent, so let's stick to the
&lt;code class=&quot;highlighter-rouge&quot;&gt;GDIWindowSurfaceData&lt;/code&gt; for now. Naturally, this will work only on Windows, but the idea is what's important, and it generalizes to other
platform-specific surface implementations.&lt;/p&gt;

&lt;p&gt;If we take a look at the
&lt;a href=&quot;http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/java2d/windows/GDIWindowSurfaceData.java#128&quot;&gt;source for &lt;code class=&quot;highlighter-rouge&quot;&gt;GDIWindowSurfaceData&lt;/code&gt;&lt;/a&gt;,
we find a very helpful function:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;GDIWindowSurfaceData&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;createData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;WComponentPeer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;peer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;SurfaceType&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sType&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getSurfaceType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;peer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getDeviceColorModel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;GDIWindowSurfaceData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;peer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;…and all we need to use it is a &lt;code class=&quot;highlighter-rouge&quot;&gt;WComponentPeer&lt;/code&gt;, which we can obtain from our panel's (deprecated) &lt;a href=&quot;#&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;getPeer&lt;/code&gt;&lt;/a&gt; method!
&lt;strong&gt;Note: &lt;code class=&quot;highlighter-rouge&quot;&gt;getPeer&lt;/code&gt; is removed in the Java 9 EAP&lt;/strong&gt;; there, you can use reflection to fetch the &lt;code class=&quot;highlighter-rouge&quot;&gt;peer&lt;/code&gt; field directly instead.&lt;/p&gt;
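&lt;p&gt;For illustration, reading a non-public field through reflection might look like the sketch below. &lt;code class=&quot;highlighter-rouge&quot;&gt;FieldReader&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;FakeComponent&lt;/code&gt; are hypothetical names: &lt;code class=&quot;highlighter-rouge&quot;&gt;FakeComponent&lt;/code&gt; merely stands in for &lt;code class=&quot;highlighter-rouge&quot;&gt;java.awt.Component&lt;/code&gt; so that the example runs without a display. It assumes the field is still called &lt;code class=&quot;highlighter-rouge&quot;&gt;peer&lt;/code&gt; in your JDK.&lt;/p&gt;

```java
import java.lang.reflect.Field;

// Hypothetical helper: reads a (possibly non-public) field by name,
// walking up the superclass chain until a declaration is found.
class FieldReader {
    static Object readField(Object target, String name) {
        for (Class c = target.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field field = c.getDeclaredField(name);
                field.setAccessible(true); // bypass Java-level access checks
                return field.get(target);
            } catch (NoSuchFieldException e) {
                // Not declared on this class; keep looking in the parent.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + name, e);
            }
        }
        throw new IllegalArgumentException("no field named " + name);
    }

    // Stand-in for java.awt.Component, assumed to keep its native peer in a "peer" field.
    static class FakeComponent {
        private final String peer = "native peer";
    }

    public static void main(String[] args) {
        System.out.println(readField(new FakeComponent(), "peer"));
    }
}
```

&lt;p&gt;Against a real component, you would pass the panel itself and cast the result to &lt;code class=&quot;highlighter-rouge&quot;&gt;WComponentPeer&lt;/code&gt;. On modular JDKs, the &lt;code class=&quot;highlighter-rouge&quot;&gt;setAccessible&lt;/code&gt; call may additionally need a flag such as &lt;code class=&quot;highlighter-rouge&quot;&gt;--add-opens java.desktop/java.awt=ALL-UNNAMED&lt;/code&gt; to succeed.&lt;/p&gt;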

&lt;p&gt;Importantly, all &lt;code class=&quot;highlighter-rouge&quot;&gt;SurfaceData&lt;/code&gt; implementations provide a &lt;code class=&quot;highlighter-rouge&quot;&gt;createData&lt;/code&gt; static method, so it's possible to use reflection to make 
accessing code more portable and elegant. But that's beyond the scope of this post.&lt;/p&gt;

&lt;h3 id=&quot;an-improved-benchmark&quot;&gt;An improved benchmark&lt;/h3&gt;

&lt;p&gt;Putting these together, we can come up with a solution that allows us to draw at our own pace, outside of Swing/AWT entirely.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PaintFrame&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;JFrame&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;frameCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;setSize&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;720&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;680&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// Note that this is now a heavyweight Panel: JPanels don't have real native peers&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;Panel&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;canvas&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Panel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;canvas&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BorderLayout&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;CENTER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;setLocationRelativeTo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;setVisible&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;nc&quot;&gt;ComponentPeer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;peer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;canvas&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getPeer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;SurfaceData&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;surfaceData&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;GDIWindowSurfaceData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;WComponentPeer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;peer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;SunGraphics2D&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gfx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SunGraphics2D&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;surfaceData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;BLACK&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;BLACK&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Thread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;gfx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setColor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;RED&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;gfx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fillRect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getWidth&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getHeight&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;frameCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the same hardware, &lt;strong&gt;our new rendering approach can pump out ~14,000 frames per second&lt;/strong&gt;, which drops to ~6,000fps when fullscreen.
&lt;strong&gt;Going by these numbers, that's roughly a 12x speedup windowed and 15x fullscreen over regular rendering!&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;a-practical-conclusion&quot;&gt;A Practical Conclusion&lt;/h2&gt;

&lt;p&gt;It's nice to be able to say we can render an order of magnitude faster by employing this approach. But it's not without caveats: you need to implement
a backend for each platform you wish to be able to render on, or at least provide a fallback to regular Swing drawing when you
cannot.&lt;/p&gt;

&lt;p&gt;In other words, it's not practical for simple one-off Swing applications. Nor is it practical for speeding up general rendering of
Swing &lt;em&gt;components&lt;/em&gt;. However, if your task involves repainting a large component as fast as possible, this is definitely the fastest
you can get without linking third-party libraries to perform Direct3D/OpenGL rendering yourself.&lt;/p&gt;

&lt;p&gt;You can view a &lt;a href=&quot;https://github.com/Xyene/Nitrous-Emulator/tree/master/src/main/java/nitrous/renderer&quot;&gt;more complete implementation&lt;/a&gt;
of the ideas expressed in this post in a Game Boy Color emulator I wrote, where Java2D was taking more time to render than the rest
of the emulation combined (which spurred me to develop this approach).&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Anyone who has ever attempted to draw anything more than almost-static scenes with Java2D can attest that it sluggishly chugs along. Some will even say it's even unusable for repainting at 60Hz or higher without taking a toll on CPU. Today, we'll look at what we can do to speed up rendering, in ways that (at the time of writing) I have not seen discussed anywhere online. Probably because it's a big hack.</summary></entry><entry><title type="html">Java internals, or when `true != true`</title><link href="https://tbrindus.ca/when-true-ne-true/" rel="alternate" type="text/html" title="Java internals, or when `true != true`" /><published>2017-10-14T00:00:00+00:00</published><updated>2017-10-14T00:00:00+00:00</updated><id>https://tbrindus.ca/when-true-ne-true</id><content type="html" xml:base="https://tbrindus.ca/when-true-ne-true/">&lt;p&gt;Most programmers have heard jokes about inserting a Greek question mark (&lt;code class=&quot;highlighter-rouge&quot;&gt;;&lt;/code&gt;, U+037E) into Java code in place of a
semicolon to cause &quot;inexplicable&quot; compilation errors.&lt;/p&gt;

&lt;p&gt;But that's too easy to discover. What about something that manifests itself at runtime, but when inspected
(either by printing to &lt;code class=&quot;highlighter-rouge&quot;&gt;stdout&lt;/code&gt; or through a debugger) shows nothing amiss?&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;Using the internal &lt;a href=&quot;http://www.docjar.com/docs/api/sun/misc/Unsafe.html&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;sun.misc.Unsafe&lt;/code&gt;&lt;/a&gt; (and targeting HotSpot VMs),
we can create a boolean that compares equal to neither &lt;code class=&quot;highlighter-rouge&quot;&gt;true&lt;/code&gt; nor &lt;code class=&quot;highlighter-rouge&quot;&gt;false&lt;/code&gt;, but when inspected, will always manifest
itself as &lt;code class=&quot;highlighter-rouge&quot;&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let's take a look.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;sun.misc.Unsafe&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;java.lang.reflect.Field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Tainted&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;toTaint&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;argv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;Field&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_unsafe&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;forName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;sun.misc.Unsafe&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getDeclaredField&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;theUnsafe&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;_unsafe&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setAccessible&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;nc&quot;&gt;Unsafe&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Unsafe&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_unsafe&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;unsafe&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;putInt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tainted&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;unsafe&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;staticFieldOffset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tainted&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getDeclaredField&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;toTaint&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toTaint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toTaint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toTaint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;toTaint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;test&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nc&quot;&gt;System&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;printf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;%s == %s: %s\n&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The output of the above code is shown below.&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;
&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;
&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So, what's going on?&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;Unsafe&lt;/code&gt; class allows us to play around with the raw data backing Java objects. Since this is inherently unsafe,
we have to jump through a few hoops: specifically, we must use reflection to grab the &lt;code class=&quot;highlighter-rouge&quot;&gt;Unsafe&lt;/code&gt; instance (this can be blocked
by a security manager, for security-conscious applications). The alternative is to put our classes on the &lt;code class=&quot;highlighter-rouge&quot;&gt;bootclasspath&lt;/code&gt;
and use &lt;a href=&quot;http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/sun/misc/Unsafe.java#83&quot;&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;Unsafe.getUnsafe()&lt;/code&gt;&lt;/a&gt;
directly, but that's less elegant.&lt;/p&gt;

&lt;p&gt;Once we have our &lt;code class=&quot;highlighter-rouge&quot;&gt;Unsafe&lt;/code&gt; instance, we can use it to determine where our &lt;code class=&quot;highlighter-rouge&quot;&gt;toTaint&lt;/code&gt;
boolean lives in memory, as an offset from the base of our class. Then, we can use &lt;code class=&quot;highlighter-rouge&quot;&gt;putInt&lt;/code&gt; to overwrite the value of &lt;code class=&quot;highlighter-rouge&quot;&gt;toTaint&lt;/code&gt; with the integer 2.&lt;/p&gt;

&lt;p&gt;But what does this mean?&lt;/p&gt;

&lt;p&gt;If we look into the internals of the JVM, we can find &lt;a href=&quot;http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/javavm/export/jni.h#l57&quot;&gt;the declaration of &lt;code class=&quot;highlighter-rouge&quot;&gt;jboolean&lt;/code&gt;&lt;/a&gt; (the native representation of a &lt;code class=&quot;highlighter-rouge&quot;&gt;boolean&lt;/code&gt; value)
in &lt;code class=&quot;highlighter-rouge&quot;&gt;jni.h&lt;/code&gt; as an &lt;code class=&quot;highlighter-rouge&quot;&gt;unsigned char&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-cpp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;char&lt;/span&gt;   &lt;span class=&quot;n&quot;&gt;jboolean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;unsigned&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;short&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;jchar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;short&lt;/span&gt;           &lt;span class=&quot;n&quot;&gt;jshort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;           &lt;span class=&quot;n&quot;&gt;jfloat&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;typedef&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt;          &lt;span class=&quot;n&quot;&gt;jdouble&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This makes sense: C has no addressable data type that stores just one bit, and an &lt;code class=&quot;highlighter-rouge&quot;&gt;unsigned char&lt;/code&gt; is guaranteed to be &lt;em&gt;at least&lt;/em&gt; 8 bits.
That is, the storage behind a boolean can actually hold any number in the range 0 to 255, and we're setting it to the integer value 2.&lt;/p&gt;

&lt;p&gt;Internally, when the JVM does equality comparisons, &lt;strong&gt;it doesn't only check one specific bit of both boolean values&lt;/strong&gt; (that'd be a silly waste of time);
instead, &lt;strong&gt;it simply compares all 8 bits&lt;/strong&gt;. A real &lt;code class=&quot;highlighter-rouge&quot;&gt;true&lt;/code&gt; value has only the least significant bit set (i.e., is equal to the integer 1). So, our tainted
boolean (set to 2) compares equal to neither a real &lt;code class=&quot;highlighter-rouge&quot;&gt;true&lt;/code&gt; (stored as 1) nor a real &lt;code class=&quot;highlighter-rouge&quot;&gt;false&lt;/code&gt; (stored as 0); it is only equal to itself.&lt;/p&gt;

&lt;p&gt;However, this boolean is functionally equivalent otherwise: conditional branching operations look to see only if the value is nonzero, so an &lt;code class=&quot;highlighter-rouge&quot;&gt;if (toTaint)&lt;/code&gt;
block of code would still execute as expected.&lt;/p&gt;

&lt;p&gt;With that in mind, we can take a look at the &lt;a href=&quot;http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/Boolean.java#187&quot;&gt;code of the &lt;code class=&quot;highlighter-rouge&quot;&gt;Boolean&lt;/code&gt; class&lt;/a&gt; to explain the final bit of the puzzle:&lt;/p&gt;

&lt;div class=&quot;language-java highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;true&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;false&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When we print our boolean, &lt;code class=&quot;highlighter-rouge&quot;&gt;toString&lt;/code&gt; must ultimately be called on an object, so the &lt;code class=&quot;highlighter-rouge&quot;&gt;boolean&lt;/code&gt; is autoboxed to a &lt;code class=&quot;highlighter-rouge&quot;&gt;Boolean&lt;/code&gt;, and
the above code is called. As we've discussed already, branch operations treat any nonzero value as &lt;code class=&quot;highlighter-rouge&quot;&gt;true&lt;/code&gt;, so our tainted boolean will always be
rendered as the string &lt;code class=&quot;highlighter-rouge&quot;&gt;true&lt;/code&gt;.&lt;/p&gt;
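
&lt;p&gt;To make the mechanism concrete, here's a minimal sketch that models a boolean as a raw byte. It's only an illustration under that assumption, not how the JVM is actually implemented: the &lt;code class=&quot;highlighter-rouge&quot;&gt;TaintedBooleanModel&lt;/code&gt; class and its &lt;code class=&quot;highlighter-rouge&quot;&gt;rawEquals&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;render&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;describe&lt;/code&gt; helpers are hypothetical names made up for this example. Equality looks at the whole byte, while the string conversion only branches on nonzero.&lt;/p&gt;

```java
// A byte-level model of the behaviour described above. This is NOT how
// the JVM is implemented; every name in this class is hypothetical.
public class TaintedBooleanModel {
    public static final byte REAL_FALSE = 0;
    public static final byte REAL_TRUE = 1;
    public static final byte TAINTED = 2; // the value written through Unsafe

    // Equality compares the whole stored byte, not just the lowest bit.
    public static boolean rawEquals(byte a, byte b) {
        return a == b;
    }

    // Branches (like the ternary in toString) only test for nonzero.
    public static String render(byte raw) {
        return raw != 0 ? "true" : "false";
    }

    // Mirrors the post's test(a, b) helper, but operates on raw bytes.
    public static String describe(byte a, byte b) {
        return String.format("%s == %s: %s", render(a), render(b), rawEquals(a, b));
    }

    public static void main(String[] args) {
        System.out.println(describe(TAINTED, REAL_FALSE)); // true == false: false
        System.out.println(describe(TAINTED, REAL_TRUE));  // true == true: false
        System.out.println(describe(TAINTED, TAINTED));    // true == true: true
    }
}
```

&lt;p&gt;Running this model prints the same three lines as the real program, without touching &lt;code class=&quot;highlighter-rouge&quot;&gt;Unsafe&lt;/code&gt; at all.&lt;/p&gt;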

&lt;p&gt;And that wraps up our goal! The &lt;code class=&quot;highlighter-rouge&quot;&gt;Unsafe&lt;/code&gt; class has many practical uses for legitimate applications, but sometimes trying out
illegitimate things is the best way to learn something new — which hopefully this post has helped with!&lt;/p&gt;</content><author><name></name></author><category term="jekyll" /><category term="update" /><summary type="html">Most programmers have heard jokes about inserting a Greek question mark (;, U+037E) into Java code in place of a semicolon to cause &quot;inexplicable&quot; compilation errors. But, it's too easy to discover. What about something that manifests itself at runtime, but when inspected — either by printing to stdout or through a debugger — shows nothing amiss?</summary></entry></feed>