Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors
Yoongu Kim1 1 Lee1 Donghyuk Lee1 2 1
1 University 2Intel Labs
Abstract. Memory isolation is a key property of a reliable and secure computing system: an access to one memory address should not have unintended side effects on data stored in other addresses. However, as DRAM process technology scales down to smaller dimensions, it becomes more difficult to prevent DRAM cells from electrically interacting with each other. In this paper, we expose the vulnerability of commodity DRAM chips to disturbance errors. By reading from the same address in DRAM, we show that it is possible to corrupt data in nearby addresses. More specifically, activating the same row in DRAM corrupts data in nearby rows. We demonstrate this phenomenon on Intel and AMD systems using a malicious program that generates many DRAM accesses. We induce errors in most DRAM modules (110 out of 129) from three major DRAM manufacturers. From this we conclude that many deployed systems are likely to be at risk. We identify the root cause of disturbance errors as the repeated toggling of a DRAM row’s wordline, which stresses inter-cell coupling effects that accelerate charge leakage from nearby rows. We provide an extensive characterization study of disturbance errors and their behavior using an FPGA-based testing platform. Among our key findings, we show that (i) it takes as few as 139K accesses to induce an error and (ii) up to one in every 1.7K cells is susceptible to errors. After examining various potential ways of addressing the problem, we propose a low-overhead solution to prevent the errors.
1. Introduction
The continued scaling of DRAM process technology has enabled smaller cells to be placed closer to each other. Cramming more DRAM cells into the same area has the well-known advantage of reducing the cost-per-bit of memory. Increasing the cell density, however, also has a negative impact on memory reliability for three reasons. First, a small cell can hold only a limited amount of charge, which reduces its noise margin and renders it more vulnerable to data loss [14, 47, 72]. Second, the close proximity of cells introduces electromagnetic coupling effects between them, causing them to interact with each other in undesirable ways [14, 42, 47, 55]. Third, higher variation in process technology increases the number of outlier cells that are exceptionally susceptible to inter-cell crosstalk, exacerbating the two effects described above.
As a result, high-density DRAM is more likely to suffer from disturbance, a phenomenon in which different cells interfere with each other’s operation. If a cell is disturbed beyond its noise margin, it malfunctions and experiences a disturbance error. Historically, DRAM manufacturers have been aware of disturbance errors since as early as the Intel 1103, the first commercialized DRAM chip [58]. To mitigate disturbance errors, DRAM manufacturers have been employing a two-pronged approach: (i) improving inter-cell isolation through circuit-level techniques [22, 32, 49, 61, 73] and (ii) screening for disturbance errors during post-production testing [3, 4, 64]. We demonstrate that their efforts to contain disturbance errors have not always been successful, and that erroneous DRAM chips have been slipping into the field.1
Work done while at University. 978-1-4799-4394-4/14/$31.00 © 2014 IEEE
In this paper, we expose the existence and the widespread nature of disturbance errors in commodity DRAM chips sold and used today. Among 129 DRAM modules we analyzed (comprising 972 DRAM chips), we discovered disturbance errors in 110 modules (836 chips). In particular, all modules manufactured in the past two years (2012 and 2013) were vulnerable, which implies that the appearance of disturbance errors in the field is a relatively recent phenomenon affecting more advanced generations of process technology. We show that it takes as few as 139K reads to a DRAM address (more generally, to a DRAM row) to induce a disturbance error. As a proof of concept, we construct a user-level program that continuously accesses DRAM by issuing many loads to the same address while flushing the cache-line in between. We demonstrate that such a program induces many disturbance errors when executed on Intel or AMD machines.
We identify the root cause of DRAM disturbance errors as voltage fluctuations on an internal wire called the wordline. DRAM comprises a two-dimensional array of cells, where each row of cells has its own wordline. To access a cell within a particular row, the row’s wordline must be enabled by raising its voltage (i.e., the row must be activated). When there are many activations to the same row, they force the wordline to toggle on and off repeatedly. According to our observations, such voltage fluctuations on a row’s wordline have a disturbance effect on nearby rows, inducing some of their cells to leak charge at an accelerated rate. If such a cell loses too much charge before it is restored to its original value (i.e., refreshed), it experiences a disturbance error.
We comprehensively characterize DRAM disturbance errors on an FPGA-based testing platform to understand their behavior and symptoms. Based on our findings, we examine a number of potential solutions (e.g., error-correction and frequent refreshes), which all have some limitations. We propose an effective and low-overhead solution, called PARA, that prevents disturbance errors by probabilistically refreshing only those rows that are likely to be at risk. In contrast to other solutions, PARA does not require expensive hardware structures or incur large performance penalties.
This paper makes the following contributions.
1 The industry has been aware of this problem since at least 2012, which is when a number of patent applications were filed by Intel regarding the problem of “row hammer” [6, 7, 8, 9, 23, 24]. Our paper was under review when the earliest of these patents was released to the public.
- To our knowledge, this is the first paper to expose the widespread existence of disturbance errors in commodity DRAM chips from recent years.
- We construct a user-level program that induces disturbance errors on real systems (Intel/AMD). Simply by reading from DRAM, we show that such a program could potentially breach memory protection and corrupt data stored in pages that it should not be allowed to access.
- We provide an extensive characterization of DRAM disturbance errors using an FPGA-based testing platform and 129 DRAM modules. We identify the root cause of disturbance errors as the repeated toggling of a row’s wordline. We observe that the resulting voltage fluctuation could disturb cells in nearby rows, inducing them to lose charge at an accelerated rate. Among our key findings, we show that (i) disturbable cells exist in 110 out of 129 modules, (ii) up to one in 1.7K cells is disturbable, and (iii) toggling the wordline as few as 139K times causes a disturbance error.
- After examining a number of possible solutions, we propose PARA (probabilistic adjacent row activation), a low-overhead way of preventing disturbance errors. Every time a wordline is toggled, PARA refreshes the nearby rows with a very small probability (p ≪ 1). As a wordline is toggled many times, the increasing disturbance effects are offset by the higher likelihood of refreshing the nearby rows.
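The intuition behind PARA can be checked with a few lines of arithmetic. The sketch below is an illustration, not the authors' implementation: it assumes that each toggle independently refreshes a given nearby row with probability p, and both the value p = 0.001 and the function name are hypothetical choices made only for this example.

```python
import math


def prob_neighbor_never_refreshed(p: float, toggles: int) -> float:
    """Chance that a given nearby row is refreshed in none of `toggles` trials.

    Assumes independent trials with per-toggle refresh probability p (0 <= p < 1).
    """
    # (1 - p) ** toggles, computed in log space to stay accurate for tiny p.
    return math.exp(toggles * math.log1p(-p))


# Hypothetical p; 139K is the smallest toggle count reported to cause an error.
p = 0.001
for toggles in (1_000, 10_000, 139_000):
    print(toggles, prob_neighbor_never_refreshed(p, toggles))
```

Under these assumptions, the probability that a nearby row goes unrefreshed shrinks geometrically with the number of toggles, which is the offsetting effect described in the last contribution.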
2. DRAM Background
In this section, we provide the necessary background on DRAM organization and operation to understand the cause and symptoms of disturbance errors.
2.1. High-Level Organization
DRAM chips are manufactured in a variety of configurations [34], currently ranging in capacities of 1–8 Gbit and in data-bus widths of 4–16 pins. (A particular capacity does not imply a particular data-bus width.) By itself, an individual DRAM chip has only a small capacity and a narrow data-bus. That is why multiple DRAM chips are commonly ganged together to provide a large capacity and a wide data-bus (typically 64-bit). Such a “gang” of DRAM chips is referred to as a DRAM rank. One or more ranks are soldered onto a circuit board to form a DRAM module.
2.2. Low-Level Organization
As Figure 1a shows, DRAM comprises a two-dimensional array of DRAM cells, each of which consists of a capacitor and an access-transistor. Depending on whether its capacitor is fully charged or fully discharged, a cell is in either the charged state or the discharged state, respectively. These two states are used to represent a binary data value.
As Figure 1b shows, every cell lies at the intersection of two perpendicular wires: a horizontal wordline and a vertical bitline. A wordline connects to all cells in the horizontal direction (row) and a bitline connects to all cells in the vertical direction (column). When a row’s wordline is raised to a high voltage, it enables all of the access-transistors within the row, which in turn connects all of the capacitors to their respective bitlines. This allows the row’s data (in the form of charge) to be transferred into the row-buffer shown in Figure 1a. Better known as sense-amplifiers, the row-buffer reads out the charge from the cells (a process that destroys the data in the cells) and immediately writes the charge back into the cells [38, 41, 43]. Subsequently, all accesses to the row are served by the row-buffer on behalf of the row. When there are no more accesses to the row, the wordline is lowered to a low voltage, disconnecting the capacitors from the bitlines.
[Figure 1. DRAM consists of cells. Panel a: rows of cells (row 0 through row 4) above a row-buffer.]
A group of rows is called a bank, each of which has its own dedicated row-buffer. (The organization of a bank is similar to what was shown in Figure 1a.) Finally, multiple banks come together to form a rank. For example, Figure 2 shows a 2GB rank whose 256K rows are vertically partitioned into eight banks of 32K rows, where each row is 8KB (=64Kb) in size [34]. Having multiple banks increases parallelism because accesses to different banks can be served concurrently.
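As a quick sanity check, the geometry of the example rank can be recomputed from the figures quoted above. The variable names are illustrative only; the numbers (eight banks, 32K rows per bank, 8KB per row) are taken from the text.

```python
KB = 1024

banks = 8
rows_per_bank = 32 * 1024  # 32K rows in each bank
row_bytes = 8 * KB         # each row holds 8KB

rows_per_rank = banks * rows_per_bank
rank_bytes = rows_per_rank * row_bytes
row_bits = row_bytes * 8

print(rows_per_rank // 1024, "K rows per rank")
print(rank_bytes // 1024**3, "GB per rank")
print(row_bits // 1024, "Kb per row")
```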
[Figure 1, panel b: a single cell.]
[Figure 2. Memory controller, buses (data, cmd, addr), rank, and banks.]
2.3. Accessing DRAM
An access to a rank occurs in three steps: (i) “opening” the desired row within a desired bank, (ii) accessing the desired columns from the row-buffer, and (iii) “closing” the row.
1. Open Row. A row is opened by raising its wordline. This connects the row to the bitlines, transferring all of its data into the bank’s row-buffer.
2. Read/Write Columns. The row-buffer’s data is accessed by reading or writing any of its columns as needed.
3. Close Row. Before a different row in the same bank can be opened, the original row must be closed by lowering its wordline. In addition, the row-buffer is cleared.
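The three steps above can be sketched as a small model that turns a stream of reads to one bank into commands. This is a simplified illustration under stated assumptions, not a description of any real memory controller: it models a single bank, keeps a row open until a different row is requested, and the function name and command strings are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def commands_for(requests: Iterable[Tuple[int, int]]) -> List[str]:
    """Translate (row, column) reads to a single bank into a command list."""
    open_row: Optional[int] = None
    commands: List[str] = []
    for row, column in requests:
        if open_row != row:
            if open_row is not None:
                commands.append("PRE")          # step 3: close the old row
            commands.append(f"ACT row={row}")   # step 1: open the new row
            open_row = row
        commands.append(f"READ col={column}")   # step 2: served by the row-buffer
    if open_row is not None:
        commands.append("PRE")                  # close the last open row
    return commands


print(commands_for([(3, 0), (3, 1)]))  # same row: one activation
print(commands_for([(3, 0), (7, 0)]))  # different rows: two activations
```

In this model, repeated reads to an already open row never raise the wordline again, whereas alternating between two rows of the same bank forces an open and a close on every access.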
The memory controller, which typically resides in the processor (Figure 2), guides the rank through the three steps by issuing commands and addresses as summarized in Table 1. After a rank accepts a command, some amount of delay is required before it becomes ready to accept another command. This delay is referred to as a DRAM timing constraint [34]. For example, the timing constraint defined between a pair of ACTIVATEs to the same row (in the same bank) is referred to as tRC (row cycle time), whose typical value is 50 nanoseconds [34]. When trying to open and close the same row as quickly as possible, tRC becomes the bottleneck, limiting the maximum rate to once every tRC.
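To put tRC in perspective, the short calculation below derives an upper bound on the activation rate of a single row. It is a back-of-the-envelope sketch: it assumes back-to-back activations limited only by the typical tRC of 50 ns quoted above, and the helper name is hypothetical. The 139K figure is the smallest number of accesses reported earlier in this paper to induce an error.

```python
T_RC_NS = 50  # typical row cycle time quoted in the text, in nanoseconds


def max_activations(window_ns: int, t_rc_ns: int = T_RC_NS) -> int:
    """Upper bound on activations of one row within a time window."""
    return window_ns // t_rc_ns


per_second = max_activations(1_000_000_000)
time_for_139k_ms = 139_000 * T_RC_NS / 1e6

print(per_second)        # activations of the same row per second, at most
print(time_for_139k_ms)  # milliseconds needed for 139K activations
```

Under this idealized assumption, one row can be activated up to 20 million times per second, so 139K activations fit into roughly 7 milliseconds.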
Operation                Command            Address(es)
1. Open Row              ACTIVATE (ACT)     Bank, Row
2. Read/Write Column     READ/WRITE         Bank, Column
3. Close Row             PRECHARGE (PRE)    Bank
Refresh (Section 2.4)    REFRESH (REF)
Table 1. DRAM commands and addresses [34]
2.4. Refreshing DRAM
The charge stored in a DRAM cell is not persistent. This is due to various leakage mechanisms by which charge can disperse: e.g., subthreshold leakage [56] and gate-induced drain leakage [57]. Eventually, the cell’s charge-level would deviate beyond the noise margin, causing it to lose data; in other words, a cell has only a limited retention time. Before this time expires, the cell’s charge must be restored (i.e., refreshed) to its original value: fully charged or fully discharged. The DDR3 DRAM specifications [34] guarantee a retention time of at least 64 milliseconds, meaning that all cells within a rank need to be refreshed at least once during this time window.
Refreshing a cell can be accomplished by opening the row to which the cell belongs. Not only does the row-buffer read the cell’s altered charge value but, at the same time, it restores the charge to full value (Section 2.2). In fact, refreshing a row and opening a row are identical operations from a circuits perspective. Therefore, one possible way for the memory controller to refresh a rank is to issue an ACT command to every row in succession.
In practice, there exists a separate REF command which refreshes many rows at a time (Table 1). When a rank receives a REF, it automatically refreshes several of its least-recently-refreshed rows by internally generating ACT and PRE pairs to them. Within any given 64ms time window, the memory controller issues a sufficient number of REF commands to ensure that every row is refreshed exactly once. For a DDR3 DRAM rank, the memory controller issues 8192 REF commands during 64ms, once every 7.8µs (=64ms/8192) [34].
3. Mechanics of Disturbance Errors
In general, disturbance errors occur whenever there is a strong enough interaction between two circuit components (e.g., capacitors, transistors, wires) that should be isolated from each other. Depending on which component interacts with which other component and also how they interact, many different modes of disturbance are possible.
Among them, we identify one particular disturbance mode that afflicts commodity DRAM chips from all three major manufacturers. When a wordline’s voltage is toggled repeatedly, some cells in nearby rows leak charge at a much faster rate. Such cells cannot retain charge for even 64ms, the time interval at which they are refreshed. Ultimately, this leads to the cells losing data and experiencing disturbance errors.
Without analyzing DRAM chips at the device-level, we cannot make definitive claims about how a wordline interacts with nearby cells to increase their leakiness. We hypothesize, based on past studies and findings, that there may be three ways of interaction.2 First, changing the voltage of a wordline could inject noise into an adjacent wordline through electromagnetic coupling [15, 49, 55]. This partially enables the adjacent row of access-transistors for a short amount of time and facilitates the leakage of charge. Second, bridges are a well-known class of DRAM faults in which conductive channels are formed between unrelated wires and/or capacitors [3, 4]. One study on embedded DRAM (eDRAM) found that toggling a wordline could accelerate the flow of charge between two bridged cells [29]. Third, it has been reported that toggling a wordline for hundreds of hours can permanently damage it by hot-carrier injection [17]. If some of the hot-carriers are injected into the neighboring rows, this could modify the amount of charge in their cells or alter the characteristics of their access-transistors to increase their leakiness.
Disturbance errors occur only when the cumulative interference effects of a wordline become strong enough to disrupt the state of nearby cells. In the next section, we demonstrate a small piece of software that achieves this by continuously reading from the same row in DRAM.
2 At least one major DRAM manufacturer has confirmed these hypotheses as potential causes of disturbance errors.
4. Real System Demonstration
We induce DRAM disturbance errors on Intel ( , , and Haswell) and AMD (Piledriver) systems using a 2GB DDR3 module. We do so by running Code 1a, which is a program that generates a read to DRAM on every data access. First, the two mov instructions read from DRAM at addresses X and Y and install the data into a register and also the cache. Second, the two clflush instructions evict the data that was just installed into the cache. Third, the mfence instruction ensures that the data is fully flushed before any subsequent memory instruction is executed.3 Finally, the code jumps back to the first instruction for another iteration of reading from DRAM. (Note that Code 1a does not require elevated privileges to execute any of its instructions.)
code1a:
  mov (X), %eax
  mov (Y), %ebx
  clflush (X)
  clflush (Y)
  mfence
  jmp code1a
a. Induces errors
code1b:
  mov (X), %eax
  clflush (X)
  mfence
  jmp code1b
b. Does not induce errors
Code 1. Assembly code executed on Intel/AMD machines
On out-of-order processors, Code 1a generates multiple DRAM read requests, all of which queue up in the memory controller before they are sent out to DRAM: (req_X, req_Y, req_X, req_Y, ...). Importantly, we chose the values of X and Y so that they map to the same bank, but to different rows within the bank.4 As we explained in Section 2.3, this forces the memory controller to open and close the two rows repeatedly: (ACT_X, READ_X, PRE_X, ACT_Y, READ_Y, PRE_Y, ...). Using the address-pair (X, Y), we then executed Code 1a for millions of iterations. Subsequently, we repeated this procedure using many different address-pairs until every row in the 2GB module was opened/closed millions of times. In the end, we observed that Code 1a caused many bits to flip. For each processor, Table 2 reports the total number of bit-flips induced by Code 1a for two different initial states of the module: all ‘0’s or all ‘1’s.5,6 Since Code 1a does not write any data into DRAM, we conclude that the bit-flips are the manifestation of disturbance errors. We will show later in Section 6.1 that this particular module, which we named A19 (Section 5), yields millions of errors under certain testing conditions.
3 Without the mfence instruction, there was a large number of hits in the processor’s fill-buffer [30] as shown by hardware performance counters [31].
4 Whereas AMD discloses which bits of the physical address are used and how they are used to compute the DRAM bank address [5], Intel does not. We partially reverse-engineered the addressing scheme for Intel processors using a technique similar to prior work [46, 60] and determined that setting Y to X+8M achieves our goal for all four processors. We ran Code 1a within a customized Memtest86+ environment [1] to bypass address translation.
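Counting bit-flips against a uniform initial state, as reported per processor, can be sketched as follows. This is only an illustration of the bookkeeping: the text does not describe how the module was scanned, so the sketch assumes the module contents have already been read back into a bytes object, and the function name is hypothetical.

```python
from typing import Tuple


def count_bit_flips(fill_byte: int, observed: bytes) -> Tuple[int, int]:
    """Return (0-to-1 flips, 1-to-0 flips) relative to a uniform fill pattern.

    fill_byte is 0x00 for an all-'0's module or 0xFF for an all-'1's module.
    """
    zero_to_one = 0
    one_to_zero = 0
    for value in observed:
        changed = value ^ fill_byte                            # bits that differ
        zero_to_one += bin(changed & value).count("1")         # now 1, was 0
        one_to_zero += bin(changed & fill_byte).count("1")     # was 1, now 0
    return zero_to_one, one_to_zero


print(count_bit_flips(0x00, bytes([0x00, 0x01, 0x80])))
print(count_bit_flips(0xFF, bytes([0xFF, 0xFE])))
```

Because the test program only reads from DRAM, any nonzero count from such a comparison points to data that changed without being written.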
Testing Platform. We programmed eight Xilinx FPGA boards [70] with a DDR3-800 DRAM memory controller [71], a PCIe 2.0 core [69], and a customized test engine. After equipping each FPGA board with a DRAM module, we connected them to two host computers using PCIe extender cables. We then enclosed the FPGA boards inside a heat chamber along with a thermocouple and a heater that are connected to an external temperature controller. Unless otherwise specified, all tests were run at 50±2.0°C (ambient).
Tests. We define a test as a sequence of DRAM accesses specifically designed to induce disturbance errors in a module. Most of our tests are derived from two snippets of pseudocode listed above (Code 2): TestBulk and TestEach. The goal of TestBulk is to quickly identify the union of all cells that were disturbed after toggling every row many times. On the other hand, TestEach identifies which specific cells are disturbed when each row is toggled many times. Both tests take three input parameters: AI (activation interval), RI (refresh interval), and DP (data patter