No-FAT: Architectural Support for Low Overhead Memory Safety Checks
Mohamed Tarek Ibn Ziad, Miguel A. Arroyo, Evgeny Manzhosov, Ryan Piersma, and Simha Sethumadhavan
Department of Computer Science, Columbia University
{mtarek, miguel, evgeny, ryan.piersma, simha}@cs.columbia.edu
Abstract—Memory safety continues to be a significant software reliability and security problem, and low overhead and low complexity hardware solutions have eluded computer designers. In this paper, we explore a pathway to deployable memory safety defenses. Our technique builds on a recent trend in software: the usage of binning memory allocators. We observe that if memory allocation sizes (e.g., malloc sizes) are made an architectural feature, then it is possible to overcome many of the thorny issues with traditional approaches to memory safety such as compatibility with unsecured software and significant performance degradation. We show that our architecture, No-FAT, incurs an overhead of 8% on SPEC CPU2017 benchmarks, and our VLSI measurements show low power and area overheads. Finally, as No-FAT’s hardware is aware of the memory allocation sizes, it effectively mitigates certain speculative attacks (e.g., Spectre-V1) with no additional cost. When our solution is used for pre-deployment fuzz testing it can improve fuzz testing bandwidth by an order of magnitude compared to state-of-the-art approaches.
Index Terms—Fuzzing, Memory Safety, Microarchitecture, Spectre-V1, Systems Security.
I. INTRODUCTION
Memory safety violations in programs have provided a significant opportunity for exploitation by attackers. For instance, Microsoft recently revealed that the root cause of around 70% of all exploits targeting their products is software memory safety violations [37]. Similarly, the Project Zero team at Google reports that memory corruption issues are the root cause of 68% of listed CVEs for zero-day vulnerabilities between 2014 and 2019 [22].
To address the threat of memory safety, software checking tools (e.g., AddressSanitizer [52]) and fuzz testing are widely deployed. In software fuzz testing, binaries are instrumented with a tool like AddressSanitizer to detect memory safety vulnerabilities and run with inputs mutated from a set of exemplary inputs in the hopes of detecting bugs before deployment. Google has reported that it has been fuzzing on about 25,000 machines continuously since 2016, which has resulted in the identification of many critical bugs in software such as Google Chrome and several open source projects [7]. Assuming 15 cents per CPU hour for large memory machines—a requirement for reasonable performance on fuzz testing—the investment in software fuzzing for detecting memory errors could be close to a billion dollars at just one company.
Despite a Herculean effort by software vendors, memory safety vulnerabilities continue to slip through, ending up in deployed systems. Recognizing that pre-deployment fuzz tests can never be complete, companies have also proposed post-deployment crowdsourced fuzz testing [38], [59]. For instance, Mozilla recently created a framework for fuzzing software using a cluster of fuzzers made by users who are willing to trade and contribute their CPU resources (e.g., using office workstations after-hours for fuzz testing) [38]. Assuming that many companies participate and these tests run for enough time, on a global scale, the amount of energy invested in producing reliable software through crowdsourced testing may be even higher than the amount of energy spent running the software. Thus, increasing the efficiency of memory error detection can have significant green benefits in addition to improving security and reliability.
Researchers and commercial vendors have also stepped up to the call to reduce inefficiencies in software testing and security. There is a long history of academic proposals that have continuously chipped away at these overheads for detecting memory safety vulnerabilities over the past 25 years ([51], [3], [47], [26], [14], [39], [61], [56], [50], [55], [27]). Commercial vendors have also proposed or manufactured hardware with support to mitigate these overheads (Intel’s MPX [43], ARM’s MTE [2], and Oracle’s ADI [45]).
In this paper, we show that the overheads of providing memory safety can be decreased even further with novel hardware support. Traditional memory safety techniques incur overheads from two sources: (1) storage of metadata to detect memory safety violations, and (2) computational overheads of memory safety checks based on the stored metadata. No-FAT1, our system, eliminates all metadata attached to pointers, and hides the computational overheads of the metadata checks by performing them in parallel with regular lookups.
The technological change that facilitates these improvements in No-FAT is the increasing adoption of binning allocators. Binning memory allocators use collections of pages (called bins), where each bin is used to allocate objects of the same size. Using bins enables the allocator to quickly serve allocation requests and increases performance by maintaining allocation locality [21], [20], [18], [34].
1The name is an allusion to No-Fat Milk, which has fewer calories. Also, closely related works in this area refer to their schemes as Fat and Low Fat pointers.
Fig. 1: A high level overview of how No-FAT makes allocation size an architectural feature. (Figure content: the source code "ptr = malloc(4*sizeof(char)); ptr[1] = 'A';" is compiled to "ptr = malloc(4)" and "secure_store ptr[1], 'A', ptr"; at runtime the hardware computes the allocation base of the accessed address and compares it with the allocation base carried by the instruction.)
No-FAT, when used with a binning allocator, is able to implicitly derive allocation bounds information (i.e., the base address and size) from the pointer itself without relying on explicit metadata. The hardware/software contract has to be tweaked slightly to facilitate No-FAT and binning allocators working together: the standard allocation sizes used by a binning allocator need to be supplied to the hardware, and special load and store instructions are created to access the allocation sizes. In other words, the memory allocation size (e.g., malloc size) becomes an architectural feature.
We illustrate one type of protection offered by No-FAT with an example shown in Figure 1. The program allocates an array of four characters and writes ‘A’ to the second element, ptr[1]. To make the allocation an architectural feature, No-FAT modifies the compiler to propagate the allocation base address (i.e., ptr) to memory instructions (i.e., secure_store). Before accessing memory, No-FAT computes the allocation base of the memory address (i.e., ptr[1]). No-FAT will then compare the computed base address against the compiler-propagated base address (i.e., ptr) and raise an exception in the case of a mismatch. Moreover, No-FAT enforces temporal protection by generating a random tag upon memory allocation and storing it in the currently unused upper bits of the pointer. Then, upon executing memory instructions, No-FAT verifies that tags match.
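To make the transformation concrete, the following is a minimal software-level sketch of what the compiler emits for the Figure 1 example. The __secure_store helper is hypothetical (it is not part of the paper's toolchain); it stands in for a secure_store instruction, and the check it would perform is only described in comments.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the secure_store instruction. In hardware, the
 * allocation base of addr is recomputed and compared against the trusted
 * base operand; a mismatch raises an exception. Here it simply stores.    */
static void __secure_store(char *addr, char value, char *base) {
    (void)base;              /* trusted base operand, checked by hardware  */
    *addr = value;
}

void example(void) {
    char *ptr = malloc(4 * sizeof(char));  /* request rounded to a bin size */
    /* "ptr[1] = 'A';" is compiled so the store carries the allocation base */
    __secure_store(&ptr[1], 'A', /* trusted base */ ptr);
    free(ptr);
}
```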
No-FAT also provides resilience against a subset of speculative execution attacks, namely Spectre-V1 [29]. Spectre-V1 attacks exploit speculative execution to access out-of-bounds memory, effectively bypassing software-based bounds checks. No-FAT’s memory instructions are aware of allocation bounds information. Thus, allocation bounds information can be used to verify if memory accesses are within valid bounds even for speculatively executed instructions.
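As a concrete illustration (not taken from the paper), consider the classic Spectre-V1 gadget shape below; the array1/array2 names follow the original Spectre example. On a conventional core, a mispredicted branch lets array1[x] be read speculatively with an attacker-controlled out-of-bounds x. With No-FAT, that access would be a secure_load carrying array1's allocation base, so the out-of-bounds address can be detected from the allocation bounds before the dependent, cache-leaking access to array2 executes.

```c
#include <stdint.h>
#include <stddef.h>

/* Classic Spectre-V1 gadget shape. The branch may be speculated as taken
 * even when x >= array1_size, so array1[x] can be read out of bounds and
 * its value leaked through the cache footprint of the array2 access.     */
uint8_t victim(const uint8_t *array1, size_t array1_size,
               const uint8_t *array2, size_t x) {
    if (x < array1_size)
        return array2[array1[x] * 64];
    return 0;
}
```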
A challenge with prior memory safety proposals, or for that matter any proposal that involves ISA changes, is practical deployment considerations. Are the new instructions and interfaces compatible with older software? Will there be performance degradation when using older software? Does code have to be rewritten or can it be simply recompiled? Luckily, the key ideas described in this paper, viz., the idea of standardizing
memory allocation structures, and using that information to provide memory safety, have been very well tested with software implementations [1], [16], [19]. Moreover, No-FAT has three advantages over prior works. First, prior works suffer from high performance overheads (100%), which this work mitigates. Second, we show that with simple modifications our implementation can offer a degree of temporal safety and intra-object spatial safety beyond prior software works. Third, we improve over prior works by providing support for arbitrary sized allocations (as opposed to power-of-two allocation sizes).
All of No-FAT’s software transformations are performed using the Clang/LLVM compiler framework [32]. Our experimental results with the SPEC CPU2017 benchmark suite indicate that the overheads of No-FAT are on average 8% with very conservative measurements. Our VLSI implementation results with 45nm NangateOpenCell show that No-FAT can be efficiently added to modern processors with negligible performance, area, and power overheads.
In summary, this paper makes the case for standardizing memory allocation sizes and explicitly making this information available to the architecture. We observe at least three distinct benefits:
• Improving fuzz-testing time. Currently, companies spend hundreds of millions of dollars testing software programs for bugs. A majority of these bugs tend to be intricate memory safety bugs. Exposing allocation sizes to hardware simplifies the checks for memory safety and improves the fuzz testing bandwidth by over 10x compared to state-of-the-art solutions (e.g., AddressSanitizer based fuzzing).
• Improving run-time security. Despite the best efforts of software engineers to produce bug-free code, some of these bugs do end up in production, and pose a risk to end users. If users wish to protect against this residual risk, our solution offers the lowest overhead protection among all published memory safety solutions that thwart data corruption attacks.
• Improving resilience to Spectre-V1 attacks. Exposing allocation sizes to the hardware allows the hardware to effectively perform bounds checking even for speculative memory accesses.
II. NO-FAT SYSTEM OVERVIEW
A. Preliminaries
Binning memory allocators have gained prominence in the past decade and are now widely used [21], [20], [18], [34]. In a binning allocator, the heap is divided into regions where each region is used to allocate objects of a pre-determined size. Thus, the memory size returned to a program is rounded up to one of the standard sizes offered by the allocator. For example, allocation requests that are less than 16 bytes come from the first region, allocation requests for 16 to 32 bytes come from the second region, and so on. In contrast, non-binning allocators can provide the exact amount of memory requested by the program at the cost of an additional allocation header to store its size [33]. Binning allocators trade off a little memory fragmentation for faster allocation and deallocation times, and practically speaking, the fragmentation overheads tend to be negligible for most programs. In this paper, we expose the pre-determined sizes offered by a binning memory allocator to the hardware to provide memory safety.2
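The following sketch illustrates the size-class behavior described above; the particular bin sizes are illustrative only, and real binning allocators (e.g., tcmalloc, jemalloc) use larger, more finely tuned class tables.

```c
#include <stddef.h>

/* Illustrative power-of-two size classes; real binning allocators use
 * denser, tuned class tables.                                          */
static const size_t kBinSizes[] = {16, 32, 64, 128, 256, 512, 1024};
#define NUM_BINS (sizeof(kBinSizes) / sizeof(kBinSizes[0]))

/* Round a request up to the bin (region) it will be served from.
 * Returns the bin index, or -1 if the request is too large for the
 * small-object bins and falls back to a large-object path.            */
static int bin_for_request(size_t request) {
    for (size_t i = 0; i < NUM_BINS; i++) {
        if (request <= kBinSizes[i])
            return (int)i;
    }
    return -1;  /* large allocation: handled outside the binned regions */
}
```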
B. How does No-FAT provide Inter-allocation Spatial Memory Safety?
The goal of inter-allocation spatial memory safety is to be able to identify pointer-based accesses that access addresses outside the region of memory allocated to that pointer. To perform this check we need three pieces of information: (1) the starting address of the allocation, (2) the size of the space allocated to the pointer, and (3) the address of the pointer-based access. The benefit of binning allocators is that (1) and (2) can be computed from (3) using simple arithmetic and concurrently with the data access.
Given a pointer address, we determine the region that the pointer is from. Say each region spans S bytes and the heap starts at address H. Then the region index of the pointer is (ptr − H) >> log2(S). Once the region is known, we know the size of the allocation because all allocations from the region are of the same size. The base address of the allocation can be computed as ⌊ptr/size⌋ ∗ size, where the combination of integer division and multiplication has the effect of rounding ptr down to the nearest size-aligned boundary, which is the base address.
For example, let us assume that the heap starting address is H = 0x380000000000 and the memory allocator uses 64 bins (i.e., regions) each of size 32 GiB, where the third region is used to store allocations of size 32B. When the program executes char* A = malloc(32), the memory allocator might return the following base address: 0x381000000040. Now, given an arbitrary pointer ptr = 0x381000000045, the hardware computes the region number by subtracting the heap starting address (i.e., 0x001000000045) and ignoring the 35 LSBs (i.e., 0x002, which is the third region). Then, the hardware retrieves the allocation size from the hardware table. Finally, the base address can be computed as ⌊0x381000000045/32⌋ ∗ 32 = 0x381000000040.
2A recent study [60] proposes passing semantic information from software to hardware to achieve better resource utilization and enhance performance. However, neither allocation size nor fine-grained security was included.
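The lookups in the example above can be mimicked in a few lines of C. This is only a software model of what the hardware computes; the per-region size table below is hypothetical except for the 32-byte third region used in the example.

```c
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

#define HEAP_BASE    0x380000000000ULL
#define REGION_SHIFT 35                 /* 32 GiB regions = 2^35 bytes */
#define NUM_REGIONS  64

/* Per-region allocation sizes, as configured by the binning allocator.
 * Only region 2 (32-byte allocations) matters for this example; the
 * other entries are placeholders.                                      */
static const uint64_t region_size[NUM_REGIONS] = {
    [0] = 16, [1] = 24, [2] = 32 /* remaining regions elided */
};

static uint64_t alloc_size(uint64_t ptr) {
    uint64_t region = (ptr - HEAP_BASE) >> REGION_SHIFT;
    return region_size[region];
}

static uint64_t alloc_base(uint64_t ptr) {
    uint64_t size = alloc_size(ptr);
    return (ptr / size) * size;  /* round down to a size-aligned boundary */
}

int main(void) {
    uint64_t ptr = 0x381000000045ULL;
    /* Prints: region 2, size 32, base 0x381000000040 */
    printf("region %" PRIu64 ", size %" PRIu64 ", base 0x%" PRIx64 "\n",
           (ptr - HEAP_BASE) >> REGION_SHIFT, alloc_size(ptr), alloc_base(ptr));
    return 0;
}
```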
How does this information help protect against attacks? Let us say an attacker has the ability to control the index variable of a dynamically allocated array. With this ability, the attacker can cause the pointer to go out-of-bounds and subvert the memory instruction in order to read/write from a different allocation. If we simply calculated the base address from the attacker modified address we would not be able to catch the attack since we do not have an expectation of what the base address ought to have been. To avoid this case, No-FAT extends memory access instructions with an extra operand that carries a trusted base address. The trusted base address is simply the base address returned by malloc. This way we can verify the correctness of the access by computing the base address of the input pointer and matching it against the trusted base address, which is part of the instruction.
Computing the base address of a pointer for every memory access instruction is a costly operation as it includes a 64-bit division operation followed by a 64-bit multiplication. Division is a relatively expensive operation even on modern CPUs. To simplify the bounds checking operation, No-FAT uses the following check: isValid(ptr, base) = ptr − base < size(base). The idea is simple. As the instruction holds the trusted base address, we first compute its corresponding size by extracting the region number as explained before. Then, we compare this size to the difference between the input pointer and the trusted (i.e., instruction-based) base address. If the pointer overflows to an adjacent allocation, the difference will be larger than the computed size. If the pointer underflows to a previous allocation, ptr − base will be a negative number that will be interpreted as a large positive number that is ≥ size(base) as we use unsigned arithmetic.
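A software rendering of this check is shown below. The size argument stands in for size(base), which the hardware derives from the region that base falls in (as in the previous sketch); this is a minimal illustration, not the paper's implementation.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* No-FAT's simplified check: isValid(ptr, base) = (ptr - base) < size(base).
 * The subtraction is unsigned, so an underflowing pointer (ptr < base) wraps
 * to a huge value and fails the comparison just like an overflow does.      */
static bool is_valid(uint64_t ptr, uint64_t base, uint64_t size) {
    return (ptr - base) < size;
}

int main(void) {
    uint64_t base = 0x381000000040ULL;          /* from the earlier example  */
    assert( is_valid(base + 5,  base, 32));     /* in bounds                 */
    assert(!is_valid(base + 32, base, 32));     /* overflow to next object   */
    assert(!is_valid(base - 1,  base, 32));     /* underflow wraps around    */
    return 0;
}
```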
To make No-FAT compatible with unprotected code, memory instructions that need to perform the check are emitted using special instructions. Specifically, No-FAT uses Secure Load (secure_load) and Secure Store (secure_store) instructions (see Section III) that use the allocation base address as a distinct operand. This operand is propagated in the binary using a compiler pass (see Section V). This way secure_load and secure_store can verify access boundaries using the isValid check, as described above. On machines that do not have hardware support for No-FAT, secure_load can be interpreted as a regular load and the third operand will be ignored.
C. How does No-FAT provide Intra-allocation Spatial Memory Safety?
The goal of intra-allocation spatial memory safety is to prevent overflows from one field to another within the same allocation. The strategy used by No-FAT for intra-allocation safety is to convert the intra-allocation memory safety problem to an inter-allocation problem. No-FAT uses a source-to-source transformation, Buf2Ptr, which has been previously used in the area of data layout optimizations for enhancing performance [25], [49], [66]. Buf2Ptr promotes buffer fields, which
exist in C/C++ structs (or classes), into their own allocations. To illustrate Buf2Ptr, consider the example in Listing 2. The array field, buf[10], within the struct, Foo, is replaced with a promoted pointer, p_buf, and a new variable for the original array is defined (Foo_buf[10]). As a result of this transformation, allocations, deallocations, and usages of the original field must also be properly promoted. For example, an allocation for a composite data type (e.g., Foo) becomes separate allocations based on the number of fields promoted (e.g., Foo_buf). As the standalone allocations have their own base address, they can be protected with No-FAT, as described above.
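Listing 2 is referenced but not reproduced in this excerpt; the sketch below is a reconstruction of the kind of rewrite Buf2Ptr performs, using the p_buf/Foo_buf naming from the text but otherwise hypothetical.

```c
#include <stdlib.h>

/* Before Buf2Ptr: overflowing buf corrupts the adjacent field x without
 * ever leaving the Foo allocation, so an inter-allocation check misses it. */
struct Foo {
    char buf[10];
    int  x;
};

/* After Buf2Ptr: the array field is promoted to its own allocation, turning
 * an overflow of the buffer into an inter-allocation violation that the
 * base-address check catches.                                              */
struct Foo_promoted {
    char *p_buf;   /* points at the standalone allocation (Foo_buf)         */
    int   x;
};

struct Foo_promoted *make_foo(void) {
    struct Foo_promoted *f = malloc(sizeof(*f));
    f->p_buf = malloc(10 * sizeof(char));   /* separate allocation for buf  */
    f->x = 0;
    return f;
}
```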
Instead, we use a different, simpler abstraction. Whenever a data pointer goes out of context (i.e., passed to another function or spilled to memory), we first verify that it is an in-bounds pointer using a Verify Bounds (verify_bounds) instruction (see Section III). When a pointer is loaded from memory, we first compute its base address using a compute_base instruction and propagate this base address to all the following memory instructions as a third operand.
With this approach, can the attacker abuse pointers that escape to another function? This is not possible because (1) we verify the bounds of the pointer before spilling it to memory and (2) we protect the memory with No-FAT, so we are assured that the pointer stored in memory cannot be overwritten. This abstraction also permits No-FAT to use only intra-procedural analysis, which simplifies the implementation. Going back to our example, we first verify the bounds of q before calling Bar(q). This is done with one verify_bounds instruction that takes the base address of q as an operand and matches it against the computed base address of p + 16. Inside Bar, we first call compute_base with q as an operand to retrieve its base address and propagate this base address to all memory instructions that use q as an address.
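The following sketch models the q/Bar example with hypothetical helpers __verify_bounds and __compute_base standing in for the verify_bounds and compute_base instructions; their bodies are software stand-ins so the example compiles, and the real checks happen in hardware from the pointer's region.

```c
#include <stdlib.h>

/* Software stand-ins for the verify_bounds and compute_base instructions. */
static void  __verify_bounds(char *ptr, char *base) { (void)ptr; (void)base; }
static char *__compute_base(char *ptr)  { return ptr; /* placeholder only */ }

void Bar(char *q) {
    /* q arrives without a compile-time-known base: recover it once and
     * propagate it to the secure loads/stores that use q.                 */
    char *q_base = __compute_base(q);
    (void)q_base;
    q[0] = 'B';       /* would be emitted as secure_store q[0], 'B', q_base */
}

void caller(void) {
    char *p = malloc(64);
    char *q = p + 16;
    __verify_bounds(q, p);  /* check q is still inside p's allocation
                               before it escapes to Bar                     */
    Bar(q);
    free(p);
}
```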
III. ARCHITECTURE SUPPORT

No-FAT adds the following instructions to the ISA:
• secure_store/secure_load
• verify_bounds
• compute_base