
related overhead. But doing so in practice is far from trivial [8, 44]
and can adversely affect both latency and throughput. We survey
these approaches and contrast them with ours in Section 2.
Our approach rests on the observation that the high interrupt
rates experienced by a core running an I/O-intensive guest are
mostly generated by devices assigned to the guest. Indeed, we
measure rates of over 150K physical interrupts per second, even
while employing standard techniques to reduce the number of
interrupts, such as interrupt coalescing [4, 43, 54] and hybrid
polling [17, 45]. As noted, the resulting guest/host context switches
are nearly exclusively responsible for the inferior performance
relative to bare metal. To eliminate these switches, we propose
ELI (ExitLess Interrupts), a software-only approach for handling
physical interrupts directly within the guest in a secure manner.
With ELI, physical interrupts are delivered directly to guests,
allowing them to process their devices’ interrupts without host in-
volvement; ELI makes sure that each guest forwards all other inter-
rupts to the host. With x86 hardware, interrupts are delivered using
a software-controlled table of pointers to functions, such that the
hardware invokes the k-th function whenever an interrupt of type
k fires. Instead of utilizing the guest’s table, ELI maintains, manip-
ulates, and protects a “shadow table”, such that entries associated
with assigned devices point to the guest’s code, whereas the other
entries are set to trigger an exit to the host. We describe x86 interrupt
handling relevant to ELI and ELI itself in Section 3 and Section 4,
respectively. ELI leads to a mostly exitless execution as depicted in
Figure 1(c).
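The shadow-table idea above can be sketched as an array of handler pointers in which entries for the guest’s assigned devices dispatch directly to guest code, while every other entry funnels into a stub that exits to the host. The following is a simplified user-space sketch, not ELI’s actual implementation; the names (`shadow_idt`, `guest_handler`, `exit_to_host`) and the counters standing in for real handlers are illustrative.

```c
#include <stdbool.h>

#define NUM_VECTORS 256

/* Stand-ins for "handled directly by the guest" vs. "forwarded to
 * the host"; in reality these would be the guest's interrupt
 * handlers and a trap into the host, respectively. */
static int handled_in_guest, forwarded_to_host;

static void guest_handler(int vector) { (void)vector; handled_in_guest++; }
static void exit_to_host(int vector)  { (void)vector; forwarded_to_host++; }

/* The shadow table: one function pointer per interrupt vector. */
typedef void (*handler_t)(int vector);
static handler_t shadow_idt[NUM_VECTORS];

/* Host-controlled setup: only vectors of devices assigned to the
 * guest point at guest code; all other entries trigger an exit. */
static void build_shadow_table(const bool assigned[NUM_VECTORS]) {
    for (int v = 0; v < NUM_VECTORS; v++)
        shadow_idt[v] = assigned[v] ? guest_handler : exit_to_host;
}

/* Hardware analogue: on an interrupt of type k, invoke entry k. */
static void deliver_interrupt(int vector) {
    shadow_idt[vector](vector);
}
```

The key property is that the guest never sees or controls the table itself: the host builds it, so a vector not belonging to an assigned device cannot be handled in guest context.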
We experimentally evaluate ELI in Section 5 with micro and
macro benchmarks. Our baseline configuration employs standard
techniques to reduce (coalesce) the number of interrupts, demon-
strating ELI’s benefit beyond the state-of-the-art. We show that ELI
improves the throughput and latency of guests by 1.3x–1.6x. No-
tably, whereas I/O-intensive guests were so far limited to 60%–65%
of bare-metal throughput, with ELI they reach performance that is
within 97%–100% of the optimum. Consequently, ELI makes it
possible to, e.g., consolidate traditional data-center workloads that
nowadays remain non-virtualized due to unacceptable performance
loss.
In Section 6 we describe how ELI protects the aforementioned
table, maintaining security and isolation while still allowing guests
to handle interrupts directly. In Section 7 we discuss potential
hardware support that would simplify ELI’s design and implementa-
tion. Finally, in Section 8 we discuss the applicability of ELI and
our future work directions, and in Section 9 we conclude.
2. Motivation and Related Work
For the past several decades, interrupts have been the main method
by which hardware devices can send asynchronous events to the
operating system [13]. The main advantage of using interrupts
to receive notifications from devices, rather than polling them, is
that the processor is free to perform other tasks while waiting for
an interrupt. This advantage applies when interrupts happen rela-
tively infrequently [39], as was the case until high-performance
storage and network adapters came into existence. With these
devices, the CPU can be overwhelmed with interrupts, leaving no
time to execute code other than the interrupt handler [34]. When
the operating system runs in a guest, interrupts have a higher cost,
since every interrupt causes multiple exits [2, 9, 26].
In the remainder of this section we survey the existing ap-
proaches for reducing the overhead induced by interrupts, and we
highlight the novelty of ELI in comparison to these approaches. We
subdivide the approaches into two categories.
2.1 Generic Interrupt Handling Approaches
We now survey approaches that equally apply to bare metal and
virtualized environments.
Polling disables interrupts entirely and polls the device for new
events at regular intervals. The benefit is that handling device events
becomes synchronous, allowing the operating system to decide
when to poll and thus limit the number of handler invocations. The
drawbacks are added latency and wasted cycles when no events are
pending. If polling is done on a different core, latency is improved,
yet a core is wasted. Polling also consumes power since the processor
cannot enter an idle state.
A hybrid approach for reducing interrupt-handling overhead
is to dynamically switch between using interrupts and polling [17,
22, 34]. Linux uses this approach by default through the NAPI
mechanism [45]. Switching between interrupts and polling does
not always work well in practice, partly due to the complexity of
predicting the number of interrupts a device will issue in the future.
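The switching policy can be sketched as a small state machine: an interrupt masks further interrupts and enters polling mode, and each poll processes at most a fixed budget of events, re-enabling interrupts only once the queue is drained. This is a minimal NAPI-like sketch under assumed names (`on_interrupt`, `poll_once`, `dev_state`), not the actual Linux implementation.

```c
/* Simplified NAPI-like state: interrupt-driven vs. polling mode. */
enum mode { MODE_INTERRUPT, MODE_POLLING };

struct dev_state {
    enum mode mode;
    int pending_events;   /* events queued by the (simulated) device */
    int events_processed;
};

/* Interrupt handler: mask further interrupts, switch to polling. */
static void on_interrupt(struct dev_state *d) {
    if (d->mode == MODE_INTERRUPT)
        d->mode = MODE_POLLING;   /* interrupts masked from here on */
}

/* Poll loop body: process at most `budget` events per invocation.
 * Re-enable interrupts only when the queue is fully drained. */
static void poll_once(struct dev_state *d, int budget) {
    if (d->mode != MODE_POLLING)
        return;
    while (budget-- > 0 && d->pending_events > 0) {
        d->pending_events--;
        d->events_processed++;
    }
    if (d->pending_events == 0)
        d->mode = MODE_INTERRUPT; /* quiet again: unmask interrupts */
}
```

The budget bounds the time spent in the handler under load, while the drain condition is exactly the prediction problem mentioned above: it implicitly guesses that a drained queue means the interrupt rate has dropped.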
Another approach is interrupt coalescing [4, 43, 54], in which
the OS programs the device to send one interrupt in a time interval
or one interrupt per several events, as opposed to one interrupt per
event. As with the hybrid approaches, coalescing delays interrupts
and hence might suffer from the same shortcomings in terms of
latency. In addition, coalescing has other adverse effects and cannot
be used as the only interrupt mitigation technique. Zec et al. [54]
show that coalescing can make TCP traffic bursty when it was not
bursty beforehand. It also increases latency [28, 40], since the
operating system can handle the first packet of a series only once
the last coalesced interrupt for the series arrives. Deciding on the
right model and parameters for coalescing is complex and depends
on the workload, particularly when the workload runs within a
guest [15].
Getting it right for a wide variety of workloads is hard if not
impossible [4, 44]. Unlike coalescing, ELI does not reduce the
number of interrupts; instead it streamlines the handling of interrupts
targeted at virtual machines. Coalescing and ELI are therefore
complementary: coalescing reduces the number of interrupts, and
ELI reduces their price. Furthermore, with ELI, if a guest decides
to employ coalescing, it can directly control the interrupt rate and
latency, leading to predictable results. Without ELI, the interrupt rate
and latency cannot be easily manipulated by changing the coalescing
parameters, since the host’s involvement in the interrupt path adds
variability and uncertainty.
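The coalescing policy described above amounts to a device-side decision: raise one interrupt only after a batch of events has accumulated, or after a bounded wait since the first unreported event. A minimal sketch follows; the structure and parameter names (`max_events`, `max_wait`) are illustrative, not any particular device’s interface.

```c
/* Device-side coalescing: one interrupt per batch of events,
 * rather than one interrupt per event. */
struct coalescer {
    int  max_events;        /* fire after this many buffered events */
    long max_wait;          /* ...or this long after the first one   */
    int  buffered;
    long first_event_time;  /* time of oldest unreported event, or -1 */
};

/* Returns 1 if this event should raise an interrupt now, 0 if it
 * is folded into a pending batch. `now` is an abstract timestamp. */
static int coalesce_event(struct coalescer *c, long now) {
    if (c->buffered == 0)
        c->first_event_time = now;
    c->buffered++;
    if (c->buffered >= c->max_events ||
        now - c->first_event_time >= c->max_wait) {
        c->buffered = 0;           /* batch reported: reset */
        c->first_event_time = -1;
        return 1;                  /* one interrupt for the batch */
    }
    return 0;                      /* delayed: no interrupt yet */
}
```

The latency cost discussed above is visible directly in the sketch: an event that does not complete a batch waits up to `max_wait` before any interrupt fires.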
All evaluations in Section 5 were performed with the default
Linux configuration, which combines the hybrid approach (via
NAPI) and coalescing.
2.2 Virtualization-Specific Approaches
Using an emulated or paravirtual [7, 41] device provides much flexi-
bility on the host side, but its performance is much lower than that of
device assignment, not to mention bare metal. Liu [31] shows that
device assignment of SR-IOV devices [16] can achieve throughput
close to bare metal at the cost of as much as 2x higher CPU utiliza-
tion. He also demonstrates that interrupts have a great impact on
performance and are a major expense for both the transmit and re-
ceive paths. For this reason, although applicable to the emulated and
paravirtual case as well, ELI’s main focus is on improving device
assignment.
Interrupt overhead is amplified in virtualized environments. The
Turtles project [9] shows interrupt handling to cause a 25% increase
in CPU utilization for a single-level virtual machine when compared
with bare metal, and a 300% increase in CPU utilization for a nested
virtual machine. There are software techniques [3] to reduce the
number of exits by finding blocks of exiting instructions and exiting
only once for the whole block. These techniques can increase the
efficiency of running a virtual machine when the main reason for
the overhead is in the guest code. When the reason is in external