Spectre Attacks: Exploiting Speculative Execution
Paul Kocher1, et al.
1 Independent (www.paulkocher.com), 2 Google Project Zero,
3 G DATA Advanced Analytics, 4 University of Pennsylvania and University of Maryland, 5 Graz University of Technology, 6 Cyberus Technology,
7 Rambus, Cryptography Research Division, 8 University of Adelaide and Data61
Abstract—Modern processors use branch prediction and speculative execution to maximize performance. For example, if the destination of a branch depends on a memory value that is in the process of being read, CPUs will try to guess the destination and attempt to execute ahead. When the memory value finally arrives, the CPU either discards or commits the speculative computation. Speculative logic is unfaithful in how it executes, can access the victim’s memory and registers, and can perform operations with measurable side effects.
Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim’s confidential information via a side channel to the adversary. This paper describes practical attacks that combine methodology from side channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim’s process. More broadly, the paper shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms, including operating system process separation, containerization, just-in-time (JIT) compilation, and countermeasures to cache timing and side-channel attacks. These attacks represent a serious threat to actual systems since vulnerable speculative execution capabilities are found in microprocessors from Intel, AMD, and ARM that are used in billions of devices.
While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.
I. INTRODUCTION
Computations performed by physical devices often leave observable side effects beyond the computation’s nominal outputs. Side-channel attacks focus on exploiting these side effects to extract otherwise-unavailable secret information. Since their introduction in the late 90’s [43], many physical effects such as power consumption [41, 42], electromagnetic radiation [58], or acoustic noise [20] have been leveraged to extract cryptographic keys as well as other secrets.
Physical side-channel attacks can also be used to extract secret information from complex devices such as PCs and mobile phones [21, 22]. However, because these devices often execute code from a potentially unknown origin, they face additional threats in the form of software-based attacks, which do not require external measurement equipment. While some attacks exploit software vulnerabilities (such as buffer overflows [5] or double-free errors [12]), other software attacks leverage hardware vulnerabilities to leak sensitive information. Attacks of the latter type include microarchitectural attacks exploiting cache timing [8, 30, 48, 52, 55, 69, 74], branch prediction history [1, 2], branch target buffers [14, 44] or open DRAM rows [56]. Software-based techniques have also been used to mount fault attacks that alter physical memory [39] or internal CPU values [65].
Several microarchitectural design techniques have facilitated the increase in processor speed over the past decades. One such advancement is speculative execution, which is widely used to increase performance and involves having the CPU guess likely future execution directions and prematurely execute instructions on these paths. More specifically, consider an example where the program’s control flow depends on an uncached value located in external physical memory. As this memory is much slower than the CPU, it often takes several hundred clock cycles before the value becomes known. Rather than wasting these cycles by idling, the CPU attempts to guess the direction of control flow, saves a checkpoint of its register state, and proceeds to speculatively execute the program on the guessed path. When the value eventually arrives from memory, the CPU checks the correctness of its initial guess. If the guess was wrong, the CPU discards the incorrect speculative execution by reverting the register state back to the stored checkpoint, resulting in performance comparable to idling. However, if the guess was correct, the speculative execution results are committed, yielding a significant performance gain as useful work was accomplished during the delay.
From a security perspective, speculative execution involves executing a program in possibly incorrect ways. However, because CPUs are designed to maintain functional correctness by reverting the results of incorrect speculative executions to their prior states, these errors were previously assumed to be safe.
A. Our Results
In this paper, we analyze the security implications of such incorrect speculative execution. We present a class of microarchitectural attacks which we call Spectre attacks. At a high level, Spectre attacks trick the processor into speculatively executing instruction sequences that should not have been executed under correct program execution. As the effects of these instructions on the nominal CPU state are eventually reverted, we call them transient instructions. By influencing which transient instructions are speculatively executed, we are able to leak information from within the victim’s memory address space.
We empirically demonstrate the feasibility of Spectre attacks by exploiting transient instruction sequences to leak information across security domains both from unprivileged native code, as well as from portable JavaScript code.
Attacks using Native Code. As a proof-of-concept, we create a simple victim program that contains secret data within its memory address space. Next, we search the compiled victim binary and the operating system’s shared libraries for instruction sequences that can be used to leak information from the victim’s address space. Finally, we write an attacker program that exploits the CPU’s speculative execution feature to execute the previously-found sequences as transient instructions. Using this technique, we are able to read memory from the victim’s address space, including the secrets stored within it.
Attacks using JavaScript and eBPF. In addition to violating process isolation boundaries using native code, Spectre attacks can also be used to violate sandboxing, e.g., by mounting them via portable JavaScript code. Empirically demonstrating this, we show a JavaScript program that successfully reads data from the address space of the browser process running it. In addition, we demonstrate attacks leveraging the eBPF interpreter and JIT in Linux.
B. Our Techniques
At a high level, Spectre attacks violate memory isolation boundaries by combining speculative execution with data exfiltration via microarchitectural covert channels. More specifically, to mount a Spectre attack, an attacker starts by locating or introducing a sequence of instructions within the process address space which, when executed, acts as a covert channel transmitter that leaks the victim’s memory or register contents. The attacker then tricks the CPU into speculatively and erroneously executing this instruction sequence, thereby leaking the victim’s information over the covert channel. Finally, the attacker retrieves the victim’s information over the covert channel. While the changes to the nominal CPU state resulting from this erroneous speculative execution are eventually reverted, previously leaked information or changes to other microarchitectural states of the CPU, e.g., cache contents, can survive nominal state reversion.
The above description of Spectre attacks is general, and needs to be concretely instantiated with a way to induce erroneous speculative execution as well as with a microarchitectural covert channel. While many choices are possible for the covert channel component, the implementations described in this work use cache-based covert channels [64], i.e., Flush+Reload [74] and Evict+Reload [25, 45].
We now proceed to describe our techniques for inducing and influencing erroneous speculative execution.
Variant 1: Exploiting Conditional Branches. In this variant of Spectre attacks, the attacker mistrains the CPU’s branch predictor into mispredicting the direction of a branch, causing the CPU to temporarily violate program semantics by executing code that would not have been executed otherwise. As we show, this incorrect speculative execution allows an attacker to read secret information stored in the program’s address space. Indeed, consider the following code example:

    if (x < array1_size)
        y = array2[array1[x] * 4096];

In the example above, assume that the variable x contains attacker-controlled data. To ensure the validity of the memory access to array1, the above code contains an if statement whose purpose is to verify that the value of x is within a legal range. We show how an attacker can bypass this if statement, thereby reading potentially secret data from the process’s address space.
First, during an initial mistraining phase, the attacker invokes the above code with valid inputs, thereby training the branch predictor to expect that the if will be true. Next, during the exploit phase, the attacker invokes the code with a value of x outside the bounds of array1. Rather than waiting for determination of the branch result, the CPU guesses that the bounds check will be true and already speculatively executes instructions that evaluate array2[array1[x]*4096] using the malicious x. Note that the read from array2 loads data into the cache at an address that is dependent on array1[x] using the malicious x, scaled so that accesses go to different cache lines and to avoid hardware prefetching effects.
When the result of the bounds check is eventually determined, the CPU discovers its error and reverts any changes made to its nominal architectural state. However, changes made to the cache state are not reverted, so the attacker can analyze the cache contents and find the value of the potentially secret byte retrieved in the out-of-bounds read from the victim’s memory.
Variant 2: Exploiting Indirect Branches. Drawing from return-oriented programming (ROP) [63], in this variant the attacker chooses a gadget from the victim’s address space and influences the victim to speculatively execute the gadget. Unlike ROP, the attacker does not rely on a vulnerability in the victim code. Instead, the attacker trains the Branch Target Buffer (BTB) to mispredict a branch from an indirect branch instruction to the address of the gadget, resulting in speculative execution of the gadget. As before, while the effects of incorrect speculative execution on the CPU’s nominal state are eventually reverted, their effects on the cache are not, thereby allowing the gadget to leak sensitive information via a cache side channel. We empirically demonstrate this, and show how careful gadget selection allows this method to read arbitrary memory from the victim.
To mistrain the BTB, the attacker finds the virtual address of the gadget in the victim’s address space, then performs indirect branches to this address. This training is done from the attacker’s address space. It does not matter what resides at the gadget address in the attacker’s address space; all that is required is that the attacker’s virtual addresses during training match (or alias to) those of the victim. In fact, as long as the attacker handles exceptions, the attack can work even if there is no code mapped at the virtual address of the gadget in the attacker’s address space.
Other Variants. Further attacks can be designed by varying both the method of achieving speculative execution and the method used to leak the information. Examples include mistraining return instructions, leaking information via timing variations, and contention on arithmetic units.
C. Targeted Hardware and Current Status
Hardware. We have empirically verified the vulnerability of several Intel processors to Spectre attacks, including Haswell, Broadwell, Skylake, and Kaby Lake processors. We have also verified the attack’s applicability to AMD Ryzen CPUs. Finally, we have also successfully mounted Spectre attacks on several ARM-based Samsung and Qualcomm processors found in popular mobile phones.
Current Status. Using the practice of responsible disclosure, disjoint groups of authors of this paper provided preliminary versions of our results to partially overlapping groups of CPU vendors and other affected companies. In coordination with industry, the authors also participated in an embargo of the results. The Spectre family of attacks is documented under CVE-2017-5753 and CVE-2017-5715.
D. Meltdown
Meltdown [47] is a related microarchitectural attack which exploits out-of-order execution to leak kernel memory. Meltdown is distinct from Spectre attacks in two main ways. First, unlike Spectre, Meltdown does not use branch prediction. Instead, it relies on the observation that when an instruction causes a trap, following instructions are executed out-of-order before being terminated. Second, Meltdown exploits a vulnerability specific to many Intel and some ARM processors which allows certain speculatively executed instructions to bypass memory protection. Combining these issues, Meltdown accesses kernel memory from user space. This access causes a trap, but before the trap is issued, the instructions that follow the access leak the contents of the accessed memory through a cache covert channel.
In contrast, Spectre attacks work on a wider range of processors, including most AMD and ARM processors. Furthermore, the KAISER mechanism [29], which has been widely applied as a mitigation to the Meltdown attack, does not protect against Spectre.
II. BACKGROUND
In this section, we describe some of the microarchitectural components of modern high-speed processors, how they improve performance, and how they can leak information from running programs. We also describe return-oriented programming (ROP) and gadgets.
A. Out-of-order Execution
An out-of-order execution paradigm increases the utilization of the processor’s components by allowing instructions further down the instruction stream of a program to be executed in parallel with, and sometimes before, preceding instructions.
Modern processors internally work with micro-ops, emulating the instruction set of the architecture, i.e., instructions are decoded into micro-ops [15]. Once all of the micro-ops corresponding to an instruction, as well as all preceding instructions, have been completed, the instructions can be retired, committing their changes to registers and other architectural state and freeing the reorder buffer space. As a result, instructions are retired in program execution order.
B. Speculative Execution
Often, the processor does not know the future instruction stream of a program. For example, this occurs when out-of-order execution reaches a conditional branch instruction whose direction depends on preceding instructions whose execution is not completed yet. In such cases, the processor can preserve its current register state, make a prediction as to the path that the program will follow, and speculatively execute instructions along the path. If the prediction turns out to be correct, the results of the speculative execution are committed (i.e., saved), yielding a performance advantage over idling during the wait. Otherwise, when the processor determines that it followed the wrong path, it abandons the work it performed speculatively by reverting its register state and resuming along the correct path.
We refer to instructions which are performed erroneously (i.e., as the result of a misprediction), but may leave microarchitectural traces, as transient instructions. Although the speculative execution maintains the architectural state of the program as if execution followed the correct path, microarchitectural elements may be in a different (but valid) state than before the transient execution.
Speculative execution on modern CPUs can run several hundred instructions ahead. The limit is typically governed by the size of the reorder buffer in the CPU. For instance, on the Haswell microarchitecture, the reorder buffer has sufficient space for 192 micro-ops [15]. Since there is not a one-to-one relationship between the number of micro-ops and instructions, the limit depends on which instructions are used.
C. Branch Prediction
During speculative execution, the processor makes guesses as to the likely outcome of branch instructions. Better predictions improve performance by increasing the number of speculatively executed operations that can be successfully committed.
The branch predictors of modern Intel processors have multiple prediction mechanisms for direct and indirect branches. Indirect branch instructions can jump to arbitrary target addresses computed at runtime. For example, x86 instructions can jump to an address in a register, memory location, or on the stack, e.g., “jmp eax”, “jmp [eax]”, and “ret”. Indirect branches are also supported on ARM (e.g., “MOV pc, r14”), MIPS (e.g., “jr $ra”), RISC-V (e.g., “jalr x0,x1,0”), and other processors. To compensate for the additional flexibility as compared to direct branches, indirect jumps and calls are optimized using at least two different prediction mechanisms [35].
Intel [35] describes that the processor predicts
• “Direct Calls and Jumps” in a static or monotonic manner,
• “Indirect Calls and Jumps” either in a monotonic manner, or in a varying manner, which depends on recent program behavior, and for
• “Conditional Branches” the branch target and whether the branch will be taken.
Consequently, several processor components are used for predicting the outcome of branches. The Branch Target Buffer (BTB) keeps a mapping from addresses of recently executed branch instructions to destination addresses [44]. Processors can use the BTB to predict future code addresses even before decoding the branch instructions. Evtyushkin et al. [14] analyzed the BTB of an Intel Haswell processor and concluded that only the 31 least significant bits of the branch address are used to index the BTB.
For conditional branches, recording the target address is not necessary for predicting the outcome of the branch since the destination is typically encoded in the instruction while the condition is determined at runtime. To improve predictions, the processor maintains a record of branch outcomes, both for recent direct and indirect branches. Bhattacharya et al. [9] analyzed the structure of branch history prediction in recent Intel processors.
Although return instructions are a type of indirect branch, a separate mechanism for predicting the destination address is often used in modern CPUs. The Return Stack Buffer (RSB) maintains a copy of the most recently used portion of the call stack [15]. If no data is available in the RSB, different processors will either stall the execution or use the BTB as a fallback [15].
Branch-prediction logic, e.g., BTB and RSB, is typically not shared across physical cores [19]. Hence, the processor learns only from previous branches executed on the same core.
D. The Memory Hierarchy
To bridge the speed gap between the faster processor and the slower memory, processors use a hierarchy of