Using Address Independent Seed Encryption and Bonsai Merkle Trees to Make Secure Processors OS- and Performance-Friendly ∗
Dept. of Electrical and Computer Engineering, North Carolina State University
College of Computing Georgia Institute of Technology
Abstract

In today’s digital world, computer security issues have become increasingly important. In particular, researchers have proposed designs for secure processors which utilize hardware-based memory encryption and integrity verification to protect the privacy and integrity of computation even from sophisticated physical attacks. However, currently proposed schemes remain hampered by problems that make them impractical for use in today’s computer systems: lack of virtual memory and Inter-Process Communication support as well as excessive storage and performance overheads. In this paper, we propose 1) Address Independent Seed Encryption (AISE), a counter-mode based memory encryption scheme using a novel seed composition, and 2) Bonsai Merkle Trees (BMT), a novel Merkle Tree-based memory integrity verification technique, to eliminate these system and performance issues associated with prior counter-mode memory encryption and Merkle Tree based memory integrity verification schemes. We present both a qualitative discussion and a quantitative analysis to illustrate the advantages of our techniques over previously proposed approaches in terms of complexity, feasibility, performance, and storage. Our results show that AISE+BMT reduces the overhead of prior memory encryption and integrity verification schemes from 12% to 2% on average, while eliminating critical system-level problems.
1. Introduction
With the tremendous amount of digital information stored on today’s computer systems, and with the increasing motivation and ability of malicious attackers to target this wealth of information, computer security has become an increasingly important topic. An important research effort towards such computer security issues focuses on protecting the privacy and integrity of computation to prevent attackers from stealing or modifying critical information. This type of protection is important for enabling many important features of secure computing such as enforcement of Digital Rights Management, prevention of reverse engineering and software piracy, and trusted distributed computing.
One important emerging security threat exploits the fact that most current computer systems communicate data in its plaintext form along wires between the processor chip and other chips such as the main memory. Also, the data is stored in its plaintext form in the main memory. This presents a situation where, by dumping the memory content and scanning it, attackers may gain a lot of valuable sensitive information such as passwords [12]. Another serious
∗This work is supported in part by the National Science Foundation through grants CCF-0347425, CCF-0447783, and CCF-0541080.
and feasible threat is physical or hardware attacks which involve placing a bus analyzer that snoops data communicated between the processor chip and other chips [7, 8]. Although physical attacks may be more difficult to perform than software-based attacks, they are very powerful as they can bypass any software security protection employed in the system. The proliferation of mod-chips that bypass Digital Rights Management protection in game systems has demonstrated that given sufficient financial payoffs, physical attacks are very realistic threats.
Recognizing these threats, computer architecture researchers have recently proposed various types of secure processor architectures [4, 5, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26]. Secure processors assume that off-chip communication is vulnerable to attack and that the chip boundary provides a natural security boundary. Under these assumptions, secure processors seek to provide private and tamper-resistant execution environments [23] through memory encryption [5, 13, 14, 16, 18, 19, 20, 22, 23, 24, 25, 26] and memory integrity verification [4, 14, 16, 17, 18, 20, 22, 23, 24, 26]. The chip industry also recognizes the need for secure processors, as evident, for example, in the recent effort by IBM in the SecureBlue project [9] and by Dallas Semiconductor [15]. Memory encryption protects computation privacy from passive attacks, where an adversary attempts to silently observe critical information, by encrypting and decrypting code and data as it moves on and off the processor chip. Memory integrity verification protects computation integrity from active attacks, where an adversary attempts to modify values in off-chip storage or communication channels, by computing and verifying Message Authentication Codes (MACs) as code and data moves on and off the processor chip.
Unfortunately, current memory encryption and integrity verification designs are not yet suitable for use in general purpose computing systems. In particular, we show in this paper that current secure processor designs are incompatible with important features such as virtual memory and Inter-Process Communication (IPC), in addition to having large performance and storage overheads. The challenges are detailed as follows:
Memory Encryption. Recently proposed memory encryption schemes for secure processors have utilized counter-mode encryption due to its ability to hide cryptographic delays on the critical path of memory fetches. This is achieved by applying a block cipher to a seed to generate a cryptographic pad, which is then bit-wise XORed with the memory block to encrypt or decrypt it. A seed is selected to be independent from the data block value so that pad generation can be started while the data block is being fetched.
In counter-mode encryption, the choice of seed value is critical for both security and performance. The security of counter-mode requires the uniqueness of each pad value, which implies that each
seed must be unique. In prior studies [16, 18, 19, 20, 23, 24, 25, 26], to ensure that pads are unique across different blocks in memory (spatial uniqueness), the block address is used as one of the seed’s components. To ensure that pads are unique across different values of a particular block over time (temporal uniqueness), a counter value which is incremented on each write back is also used as a seed component. From the performance point of view, if most cache misses find the counters of the missed blocks available on-chip, either because they are cached or predicted, then seeds can be composed at the cache miss time, and pad generation can occur in parallel with fetching the blocks from memory.
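The seed construction described above can be sketched in a few lines. This is a toy model, not the hardware design: a hash stands in for the AES block cipher, and the field widths (8-byte address and counter, 16-byte blocks) are illustrative assumptions of ours.

```python
import hashlib

BLOCK_SIZE = 16  # bytes per encrypted chunk; illustrative, not the paper's parameters

def pad(key: bytes, seed: bytes) -> bytes:
    # Stand-in for a block cipher: derive a pseudo-random pad from the seed.
    # A real design would apply AES to the seed under a secret on-chip key.
    return hashlib.sha256(key + seed).digest()[:BLOCK_SIZE]

def make_seed(block_addr: int, counter: int) -> bytes:
    # Spatial uniqueness from the address, temporal uniqueness from the counter.
    return block_addr.to_bytes(8, "big") + counter.to_bytes(8, "big")

def encrypt(key: bytes, block_addr: int, counter: int, block: bytes) -> bytes:
    # Pad generation needs only the seed, so it can overlap the memory fetch.
    p = pad(key, make_seed(block_addr, counter))
    return bytes(a ^ b for a, b in zip(block, p))

# Decryption is the same XOR with the same pad.
key = b"secret-key"
ct = encrypt(key, 0x1000, 5, b"hello, world!!!!")
pt = encrypt(key, 0x1000, 5, ct)
```

Note that the counter must be incremented on every writeback of the block; reusing a (address, counter) pair reuses the pad, which breaks the scheme.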
However, using the address (virtual or physical) as a seed component causes a significant system-level dilemma in general purpose computing systems that must support virtual memory and Inter-Process Communication (IPC). A virtual memory mechanism typically involves managing pages to provide process isolation and sharing between processes. It often manages the main memory by extending the physical memory to swap memory located on the disk.
Using the physical address as a seed component creates re-encryption work on page swapping. When a page is swapped out to disk and then back into memory, it will likely reside at a new physical address. This requires the blocks of the page to be decrypted using their previous physical addresses and re-encrypted with their new physical addresses. In addition, encrypted pages in memory cannot simply be swapped out to disk as this creates potential pad reuse between the swapped-out page and the new page at that physical address in memory. This leaves an open problem as to how to protect pages on disk. We could entrust the OS to encrypt and decrypt swapped pages in software if the OS is assumed to be authentic, trusted, and executing on the secure processor. However, this is likely not the most desirable solution because it makes the secure processor’s hardware-based security mechanisms contingent on a secure and uncompromised OS. Alternatively, we could rely on hardware to re-encrypt swapped pages; however, this solution has its own set of problems. First, it requires supporting two encryption methods in hardware. Second, there is the issue of who can request the page re-encryptions, and how these requests are made, which requires an extra authentication mechanism.
Using the virtual address as a seed component can lead to vulnerable pad reuse because different processes use the same virtual addresses. While we can prevent this by adding the process ID to the seed [24], this solution creates a new set of serious system-level problems. First, it renders process IDs non-reusable, and current OSes have a limited range of process IDs. Second, shared-memory based inter-process communication (IPC) mechanisms are infeasible to use (e.g. mmap). The reason is that different processes access a shared page in memory using different combinations of virtual address and process ID. This results in different encryptions and decryptions of the shared data. Third, other OS features that also utilize page sharing cannot be supported. For example, process forking cannot utilize the copy-on-write optimization because the page in the parent and child are encrypted differently. This also holds true for shared libraries. This lack of IPC support is especially problematic in the era of CMPs. Finally, storage is required for virtual addresses at the lowest-level on-chip cache, which is typically physically indexed and tagged.
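The pad-reuse hazard can be demonstrated concretely. In this hypothetical sketch (a toy hash-based pad again stands in for a block cipher), two processes encrypt different plaintexts at the same virtual address with the same counter value. XORing the two ciphertexts cancels the shared pad and exposes the XOR of the plaintexts, with no knowledge of the key.

```python
import hashlib

def pad(key: bytes, seed: bytes) -> bytes:
    # Toy stand-in for the keyed block cipher used to generate pads.
    return hashlib.sha256(key + seed).digest()[:16]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"secret-key"
# Same virtual address (0x4000) and same counter (1) in two different
# processes produce the same seed, and therefore the same pad.
seed = (0x4000).to_bytes(8, "big") + (1).to_bytes(8, "big")
p1, p2 = b"process A secret", b"process B secret"
c1 = xor(p1, pad(key, seed))
c2 = xor(p2, pad(key, seed))

# Pad reuse: the XOR of the ciphertexts equals the XOR of the plaintexts.
leak = xor(c1, c2)
```

If either plaintext is known or guessable, the other is recovered directly, which is why per-process seed disambiguation (or the logical identifiers proposed in this paper) is required.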
The root cause of these problems is that the address is already a fundamental component of memory management. Using the address also as a basis for security intermingles security and memory management in undesirable ways.
Memory Integrity Verification. Recently proposed memory integrity verification schemes for secure processors have leveraged a variety of techniques [4, 9, 14, 17, 20, 22, 23, 24]. However, the security of Merkle Tree based schemes [4] has been shown to be stronger than other schemes because every block read from memory is verified individually (as opposed to [23]), and data replay attacks can be detected in addition to spoofing and splicing attacks, which are detectable by simply associating a single MAC per data block [14]. In Merkle Tree memory integrity verification, a tree of MAC values is built over the memory. The root of this tree never goes off-chip, as a special on-chip register is used to hold its current value. When a memory block is fetched, its integrity can be checked by verifying its chain of MAC values up to the root MAC. Since the on-chip root MAC contains information about every block in the physical memory, an attacker cannot modify or replay any value in memory.
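A minimal software sketch of this verification chain, assuming a binary tree over a power-of-two number of blocks and HMAC-SHA256 in place of the hardware MAC function (both are our illustrative choices, not the paper's design):

```python
import hmac
import hashlib

KEY = b"on-chip-mac-key"  # hypothetical secret held on-chip

def mac(data: bytes) -> bytes:
    return hmac.new(KEY, data, hashlib.sha256).digest()

def build_tree(blocks):
    # Leaf level: one MAC per memory block; each parent MACs the
    # concatenation of its two children, up to a single root.
    levels = [[mac(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([mac(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels  # levels[-1][0] is the root, kept in a secure register

def verify(blocks, index, levels):
    # Recompute the chain of MACs from the fetched block up to the root;
    # siblings come from (potentially untrusted) stored tree nodes.
    node = mac(blocks[index])
    for level in levels[:-1]:
        sibling = level[index ^ 1]
        node = mac(node + sibling) if index % 2 == 0 else mac(sibling + node)
        index //= 2
    return node == levels[-1][0]

memory = [b"block0", b"block1", b"block2", b"block3"]
tree = build_tree(memory)
ok = verify(memory, 2, tree)  # recomputed chain matches the on-chip root
```

A real implementation would cache recently verified internal nodes on-chip so most verifications stop well before the root, as discussed below.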
Despite its strong security, Merkle Tree integrity verification suffers from two significant issues. First, since a Merkle Tree built over the main memory computes MACs on memory events (cache misses and writebacks) generated by the processor, it covers the physical memory, but not swap memory which resides on disk. Hence, although Merkle Tree schemes can prevent attacks against values read from memory, there is no protection for data brought into memory from the disk. This is a significant security vulnerability since, by tampering with swap memory on disk, attackers can indirectly tamper with main memory. One option would be to entrust the OS to protect pages swapped to and from the disk; however, as with memory encryption, this requires the assumption of a trusted OS. Another option, as discussed in [22], is to associate one Merkle Tree and on-chip secure root per process. However, managing multiple Merkle Trees results in extra on-chip storage and complexity.
Another significant problem is the storage overhead of internal Merkle Tree nodes in both the on-chip cache and main memory. To avoid repeated computation of internal Merkle Tree nodes as blocks are read from memory, a popular optimization lets recently accessed internal nodes be cached on-chip. Using this optimization, the verification of a memory block only needs to proceed up the tree until the first cached node is found. Thus, it is not necessary to fetch and verify all nodes up to the root on each memory access, significantly improving memory bandwidth consumption and verification performance. However, our results show that Merkle Tree nodes can occupy as much as 50% of the total L2 cache space, which causes the application to suffer from a large number of cache capacity misses.
Contributions. In this paper, we investigate system-level issues in secure processors, and propose mechanisms to address these issues that are simple yet effective. Our first contribution is Address Independent Seed Encryption (AISE), which decouples security and memory management by composing seeds using logical identifiers instead of virtual or physical addresses. The logical identifier of a block is the concatenation of a logical page identifier with the page offset of the block. Each page has a logical page identifier which is distinct across the entire memory and over the lifetime of the system. It is assigned to the page the first time the page is allocated or when it is loaded from disk. AISE provides better security since it provides complete seed/pad uniqueness for every block in the system (both in the physical and swap memory). At the same time, it also easily supports virtual memory and shared-memory based IPC mechanisms, and simplifies page swap mechanisms by not requiring decryption and re-encryption on a page swap.
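The idea can be sketched as follows (the names and field widths here are our own hypothetical choices): the seed depends only on the logical page identifier and the page offset, so it is unchanged when a page moves between physical frames or is swapped to disk and back.

```python
# Hypothetical logical page identifier (LPID) allocation: drawn from a
# global counter so every page gets a value that is never reused over
# the lifetime of the system.
_next_lpid = 0

def assign_lpid() -> int:
    global _next_lpid
    _next_lpid += 1
    return _next_lpid

def aise_seed(lpid: int, page_offset: int, counter: int) -> bytes:
    # Address-independent: the seed uses the logical identifier plus the
    # block's page offset and per-block counter, never the virtual or
    # physical address of the block.
    return (lpid.to_bytes(8, "big")
            + page_offset.to_bytes(4, "big")
            + counter.to_bytes(8, "big"))

lpid = assign_lpid()                    # assigned at first allocation
seed_before = aise_seed(lpid, 0x40, 3)  # page resident at, say, frame 7
seed_after = aise_seed(lpid, 0x40, 3)   # page swapped out, reloaded at frame 12
```

Because the seed is identical before and after the move, no decryption and re-encryption is needed on a page swap, and processes sharing the page compute the same seed regardless of the virtual address each maps it at.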
The second contribution of this paper is a novel and efficient extension to Merkle Tree based memory integrity verification that allows extending the Merkle Tree to protect off-chip data (i.e. both physical and swap memory) with a single Merkle Tree and secure root MAC over the physical memory. Essentially, our approach allows pages in the swap memory to be incorporated into the Merkle Tree so that they can be verified when they are reloaded into memory.
Finally, we propose Bonsai Merkle Trees (BMTs), a novel organization of the Merkle Tree that naturally leverages counter-mode encryption to reduce its memory storage and performance overheads. We observe that if each data block has a MAC value computed over the data and its counter, a replay attack must attempt to replay an old data, MAC, and counter value together. A Merkle Tree built over the memory is able to detect any changes to the data MAC, which prevents any undetected changes to counter values or data. Our key insight is that: (1) there are many more MACs of data than MACs of counters, since counters are much smaller than data blocks, (2) a Merkle Tree that protects counters prevents any undetected counter modification, (3) if counter modification is thus prevented, the Merkle Tree does not need to be built over data MACs, and (4) the Merkle Tree over counters is much smaller and significantly shallower than the one over data. As a result, we can build such a Bonsai Merkle Tree over the counters which prevents data replay attacks using a much smaller tree, for less memory storage overhead, fewer MACs to cache, and a better worst-case scenario if we miss on all levels of the tree up to the root. As our results show, BMT memory integrity verification reduces the performance overhead significantly, from 12.1% to 1.8% across all SPEC 2000 benchmarks [21], along with reducing the storage overhead in memory from 33.5% to 21.5%.
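To see why the counter tree is so much smaller, here is a back-of-the-envelope sketch with illustrative parameters (1 GB protected memory, 64-byte blocks, 8-byte counters, 8-ary tree; these are our assumptions, not the paper's exact configuration):

```python
import math

MEM_BYTES = 1 << 30   # 1 GB of protected memory (illustrative)
BLOCK_BYTES = 64      # data block / cache line size
CTR_BYTES = 8         # per-block counter size
ARITY = 8             # children per internal tree node

data_blocks = MEM_BYTES // BLOCK_BYTES
# Counters are packed into memory blocks, so the counter tree has
# BLOCK_BYTES / CTR_BYTES = 8x fewer leaves than a tree over the data.
ctr_blocks = data_blocks * CTR_BYTES // BLOCK_BYTES

def tree_levels(leaves: int, arity: int = ARITY) -> int:
    # Number of levels needed to reduce `leaves` nodes to a single root.
    levels = 0
    while leaves > 1:
        leaves = math.ceil(leaves / arity)
        levels += 1
    return levels

standard_depth = tree_levels(data_blocks)  # tree over all data blocks
bonsai_depth = tree_levels(ctr_blocks)     # Bonsai tree over counter blocks only
```

With these parameters the counter tree has one-eighth the leaves and one fewer level, so it costs less memory, pollutes the cache with fewer MACs, and shortens the worst-case walk to the root; a per-block data MAC (covering data plus counter) then ties each data block to its tree-protected counter.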
In the remainder of this paper, we discuss related work in Section 2. Section 3 describes our assumed attack model. Section 4 describes our proposed encryption technique, while Section 5 describes our proposed integrity verification techniques in detail. Section 6 shows our experimental setup, and Section 7 discusses our results and findings. Finally, Section 8 summarizes our main contributions and results.
2. Related Work
Research on secure processor architectures [4, 5, 9, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26] consists of memory encryption for ensuring data privacy and memory integrity verification for ensuring data integrity. Early memory encryption schemes utilized direct encryption modes [5, 13, 14, 22], in which a block cipher such as AES [2] is applied directly on a memory block to generate the plaintext or ciphertext when the block is read from or written to memory. Since, on a cache miss for a block, the block must first be fetched on chip before it can be decrypted, the long latency of decryption is added directly to the memory fetch latency, resulting in execution time overheads of up to 35% (almost 17% on average) [25]. In addition, there is a security concern with using direct encryption because different blocks having the same data value would result in the same encrypted value (ciphertext). This property implies that the statistical distribution of plaintext values matches the statistical distribution of ciphertext values, and may be exploited by attackers.
As a result of these concerns, recent studies have leveraged counter-mode encryption techniques [16, 18, 19, 20, 23, 24, 25, 26]. Counter-mode encryption overlaps decryption and memory fetch
by decoupling them. This decoupling is achieved by applying a block cipher to a seed value to generate a cryptographic pad. The actual encryption or decryption is performed through an XOR of the plaintext or ciphertext with this pad. The security of counter-mode depends on the guarantee that each pad value (and thus each seed) is only used once. Consequently, a block’s seed is typically constructed by concatenating the address of the block with a per-block counter value which is incremented each time the block is encrypted [19, 23, 24, 25]. If the seed components are available on chip at cache miss time, decryption can be started while the block is fetched from memory. Per-block counters can be cached on chip [23, 24, 25] or predicted [19].
Several different approaches have previously been studied for memory integrity verification in secure processors. These approaches include a MAC-based scheme where a MAC is computed and stored with each memory block when the processor writes to memory, and the MAC is verified when the processor reads from memory [14]. In [23], a Log Hash scheme was proposed where the overhead of memory integrity verification is reduced by checking the integrity of a series of values read from memory at periodic intervals during a program’s execution using incremental, multiset hash functions. Merkle Tree based schemes have also been proposed where a tree of MAC values is stored over the physical memory [4]. The root of the tree, which stores information about every block in memory, is kept in a secure register on-chip. Merkle Tree integrity verification is often preferable over other schemes because of its security strength. In addition to spoofing and splicing attacks, replay attacks can also be prevented. We note that the Log Hash scheme can also prevent replay attacks, but as shown in [20], the long time intervals between integrity checks can leave the system open to attack.
The scheme proposed in this study differs from prior studies in the following ways. Our memory encryption avoids intermingling security with memory management by using logical identifiers (rather than addresses) as seed components. Our memory integrity verification scheme extends protection to the disk in a novel way, and our BMT scheme significantly reduces the Merkle Tree size. The implications of this design will be discussed in detail in the following sections.
3. Attack Model and Assumptions
As in prior studies on hardware-based memory encryption and integrity verification, our attack model