
Memory & Memory Mapping
Prof. J. McLeod, ECE3375, Winter 2022
This lesson provides some basic information about memory in computer systems, and introduces memory mapping. Memory mapping is a very common and useful technique used to assign the memory address space to different physical devices. Clever students will recognize that memory mapping is a subject that is highly likely to be seen on the midterm and final exams.
Introduction to Memory


For our purposes, memory can be thought of as a long series of cells, each of which can store the same fundamental quantity of data.
• As previously mentioned, memory cells are most commonly one byte in width.

Each cell is identified by an address and contains a number. An address is a number that serves as the unique identifier for a single memory cell. The memory chip or circuitry decodes the address to identify the memory cell and either read or write to that cell as requested. The actual address decoding process uses a minterm.
• Every memory cell is connected to the n bit address bus through an n-input AND gate.
• A unique combination of lines or their complements from the address bus are used for each AND gate, such that for a given address on the bus only one AND gate will be high at a time.
This is shown schematically in Figure 1.
To be useful, memory should support two operations:
Definition: Load (sometimes called read) refers to the CPU retrieving data from memory and storing it in one (or more) of the registers inside the CPU. In this case, the CPU provides the address and the memory provides the number from the appropriate memory cell.
Definition: Store (sometimes called write) refers to the CPU putting data into memory. In this case the CPU provides both the address and the number to be placed in the appropriate memory cell.
It should be clear, from the description of these operations, that the data bus has to accommodate information travelling both into and out of the CPU.
Figure 1: A simplified schematic of using minterms to load data. For simplicity, there is only a 3-bit memory space — obviously this is too small to be practical.

• The address bus only has to operate in “one direction”: information passes from the CPU to memory. It should be clear from the memory circuit in Figure 1 that nothing in memory is capable of putting information onto the address bus.
• The memory circuit in Figure 1 only allows the load operation, so it is not a circuit with full functionality. 1
• To allow the CPU to store to memory, the circuit in Figure 1 would need a second set of tri-state drivers pointing from the data bus into the memory, plus additional combinational logic to ensure that, when a memory cell is selected, only one of the two tri-state drivers is active at a time — or an additional control on the memory cell itself to select whether data is being written or read.
• Some sort of global “memory enable” bit may also be necessary if the data bus is used for other purposes — the address line will always have a valid address (as (000)2 is a valid address) regardless of whether or not the CPU wants to do anything with the memory.
Furthermore, note that there is not necessarily any correlation between the size of the address bus (the laughably small, but easy to draw, value of 3 is used in Figure 1) and the size of the data bus (just referenced as n in Figure 1).
• Most microcontrollers have byte addressable memory, where the memory cells (and consequently the data bus) are eight bits wide.
1 Because I (J. McLeod) am lazy and the figure is already cluttered enough.

• Some microcontrollers have word addressable memory, where either memory cells are larger than a byte, or (more commonly) multiple byte-sized memory cells are grouped under the same address. If you want to access a byte of memory, you have to read the entire word-sized block and then extract the particular 8 bits of interest from a register in the CPU.
The data is referred to as a number, but in this context what is a number? Fundamentally, computers understand three different kinds of numbers. These are closely related to the signals discussed in the previous lesson with regard to the various buses in a microprocessor.
Definition: Data is a number stored in memory that represents some form of information used in computation. Data is the type of number that is of primary interest to the system’s users.
Definition: Memory can also store the address of other memory cells. When this is implemented in high-level computer languages, it is called a pointer. Remember, addresses are also numbers.
Definition: Finally, memory can store instructions for the CPU. These are numbers that index a pre-defined table of operations. These are often referred to as opcodes.
It is important to recognize that the only way the CPU can tell the difference between data, addresses, and opcodes is by context. High-level programming languages typically provide a lot of protection against accidentally mixing these types of numbers; low-level programming languages provide less protection.

• At the lowest level, it is perfectly possible to take two data values and add them together, then treat the result as an address, look up the number stored at that address, add it to the address number, then execute that result as an opcode.
• This is probably a terrible idea… but a CPU doesn’t know any better. 2
Finally, we will always use hex or binary for addresses (and usually hex, since addresses can be quite large). It never makes sense to use a decimal value for an address. We will always use hex or binary for opcodes. It never makes sense to talk about an opcode as a decimal value. Furthermore, we will typically avoid talking about opcodes in numeric form whenever possible. Human-readable assembly language is difficult enough! Data, however, can be discussed as a hex, binary, or decimal number, depending on what kind of data it is.
• This distinction is important when writing code for a microcontroller. For example, the compiler will not accept an address written as a decimal number, but often it will accept (and properly convert) a decimal value stored as data.
2 The compiler for the ARM®Cortex-A9 that we will use in this course is sophisticated enough to have some protection against this sort of shenanigans, but it is still good practice to carefully check your code!

Computer Programs
You have all taken some sort of computer programming course, so you are probably familiar with the basics of high-level programming. You also took ECE2277, so you should be familiar with the construction of basic computer hardware, like registers and adders. But what happens in between?
• Software is typically written in a high-level language (Python, Java, C++, etc.; for this course even C is considered “high-level”).
• The compiler converts the program to an optimized low-level language (i.e., Assembly).
• The assembler maps the low-level code to machine language, a hardware-specific binary representation of instructions and data.
For example, consider the following snippet of code written in C. 3
```c
int i;                      // declare loop counter
int total = 0;              // declare running sum
for (i = 0; i < 10; i++) {  // loop 10 times
    total += i;             // add counter to sum
}
```

This code snippet calculates the running sum of the numbers from 0 to 9, and stores the result in the integer variable total. This is simple enough to understand in human terms, but what exactly does a computer processor do? For one thing, there is no such thing as a “named integer variable” in hardware: there is just storage for binary numbers. An equivalent snippet of assembly code (using ARMv7 assembly syntax) is given below.

3 You may not be familiar with C (I think your programming course(s) were in Java?), and we will review the code later. This example is simple enough that you can probably understand it even if you are unfamiliar with C syntax.

```
      mov  r0, #0      @ initialize r0 for the counter
      mov  r1, #0      @ initialize r1 for the sum
loop:                  @ flag for start of loop
      add  r1, r1, r0  @ increment running sum
      adds r0, r0, #1  @ increment counter
      cmp  r0, #10     @ check if counter reached 10
      blt  loop        @ branch back to loop if not
```

The syntax for assembly is probably unfamiliar, and we will discuss it in much more detail in the upcoming lessons. For now it suffices to explain that r0 and r1 are CPU registers that are explicitly used to hold the loop counter and the running sum, respectively. Values are added together appropriately (using add and adds), then the program branches back to the start of the loop if necessary (with blt). The assembly language code can be converted line-by-line into the 32-bit machine language used by the ARM®Cortex-A9, as shown in Table 1. Therefore:

• A computer program is a set of instructions in memory, usually stored sequentially.
• Each instruction consists of an opcode to tell the CPU what to do, and often an address and/or some data to tell the CPU what to work with. 4

4 Consequently, each instruction is often larger than a single memory cell.
In the machine language example given in Table 1, each 32-bit number contains the opcode and, if necessary, the data.

Memory Address   Machine Code   Description
0x0000 0000      0xE3A0 0000    Initialize loop counter
0x0000 0004      0xE3A0 1000    Initialize running sum
0x0000 0008      0xE081 1000    Increment running sum
0x0000 000C      0xE290 0001    Increment counter
0x0000 0010      0xE350 000A    Check if counter reached 10
0x0000 0014      0xBAFF FFFB    Branch back to loop if not

Table 1: Machine language for the ARM®Cortex-A9 of the example assembly code snippet.

The CPU’s job is to fetch these instructions one at a time, perform that instruction, then advance to the next instruction.
• Typically, the next instruction is literally at the next address in memory.
• Certain instructions only serve to explicitly provide the address for the next instruction, allowing the CPU to branch somewhere else.

Program Execution & Pipelining

To expand on the concepts discussed above, we can elaborate on the CPU execution cycle. As previously mentioned, a CPU contains several registers for manipulating data.

Definition: The program counter (pc) is a special register that holds the address in memory of the next instruction in the program currently running.

When the CPU is powered on, pc is initialized in hardware to some default (often 0x0000 0000). The CPU then cycles through the following steps:
1. Fetches the instruction stored in the memory address held in pc.
2. Decodes the instruction.
3. Executes the instruction.
Each of these steps is conducted by special-purpose hardware in the CPU, and is largely transparent to the programmer (and certainly transparent to the user). This hardware is fully utilized by pipelining these steps.
• The CPU is a synchronous sequential circuit, so each of the above steps takes a clock cycle.
• However, the CPU can operate faster than one complete instruction for every four clock cycles.
• Each of these steps uses separate hardware, so they can all be queued.
• If a 32-bit architecture uses a 16-bit instruction set, 5 then two instructions can be fetched at once and held for sequential execution.

5 An ARM processor can be configured to operate in “thumb mode”, which uses 16-bit instructions — but we won’t do that in this course.

This is shown schematically in Figure 2. Note that pc must also be incremented — this is not shown explicitly but it should occur basically simultaneously with the fetch step.

Figure 2: Pipelining four instructions using (left) 32-bit instructions with 32-bit architecture, and (right) 16-bit instructions with 32-bit architecture.

In reality program execution can be more complicated than the simple pipeline schematic shown in Figure 2: consider what might need to happen if the execution step for instruction 2 directs the program to conditionally branch to another part of the program. Fortunately, in this class, you just need a conceptual appreciation for the basics of CPU operation.

Types of Memory

Memory is subdivided into two basic types: random-access memory (RAM) and read-only memory (ROM). The name “random-access” is a bit archaic, since all modern memory is random-access. This is in contrast to extremely obsolete storage technologies like relays, mechanical counters, or delay lines, which can only be sequentially accessed.
• We can both read from (i.e., non-destructively access) and write to RAM. Both of these processes are “fast” in microcontroller terms (i.e., nanoseconds or less).
• RAM is usually volatile, which means it needs an active power source to preserve data. It is possible to make non-volatile RAM (often using battery back-ups) but these are expensive and only used in certain specialist applications.
There are also two common subtypes of RAM: dynamic and static RAM (DRAM and SRAM, respectively).
• In DRAM, data is usually retained as a stored charge on a capacitor. Each time the data is read the capacitor loses a bit of charge, and even when the data is not being read the capacitor will gradually “leak” charge. Consequently, DRAM must be periodically refreshed to preserve data quality. The major advantage of DRAM is that it is relatively cheap.
• The RAM listed in the specs for your laptop/tablet/phone/smart watch/poptart is a type of DRAM.
• In SRAM, data is usually retained in a flip-flop. These always preserve data quality, hence they are called “static”. SRAM is also faster than DRAM. However, each bit of SRAM is physically larger and more complex; consequently SRAM is significantly more expensive.
• The cache size listed in the specs for the processor of your laptop/tablet/phone/smart watch/doorbell is usually a type of SRAM.

The name “read-only” is a bit archaic, since most modern ROM can, technically, be rewritten.
• To further complicate matters, ROM is also “random-access”.
• Reading data from ROM is “fast” in microcontroller terms (i.e., nanoseconds or less); however, writing to ROM is typically “slow” in microcontroller terms (milliseconds or seconds), and sometimes impossible while the microcontroller is operating — the ROM can only be rewritten with an external tool used to program the microcontroller.

For our purposes, ROM only stores programs, not data: the set of instructions telling the CPU what to do is stored in ROM, and the data telling the CPU what to do it with is stored in RAM. There are a few different subtypes of ROM that are still relevant.
• Mask-programmable ROM (PROM) can be written to only once, usually at the factory. After that the programs are essentially hard-coded — no updates are possible!
• Erasable PROM (EPROM) can be erased and rewritten, but it is an invasive procedure: often the EPROM needs to be removed and bathed in UV light. EPROM is a pretty old standard, and is not used much any more.
• Electrically-erasable PROM (EEPROM) can be rewritten with only electrical signals. They are cleverly constructed from transistors with floating gates. EEPROMs allow individual memory cells to be selectively erased and rewritten. Currently EEPROMs have relatively small amounts of storage, and are relatively expensive, but have very long lifetimes.
• Most of you are familiar with flash, a form of EEPROM. Flash is cheaper than EEPROM because it compromises by only erasing and rewriting data in blocks considerably larger than one memory cell (512 or more cells), and by having a shorter lifetime than EEPROM.

Most modern microcontrollers have at least one ROM (usually flash) and at least one static RAM on the chip. Additional memory of any type can be added as a peripheral.

Philosophy of Memory Spaces

Memory is classified by the number of bits of storage, and by how these bits are organized. Typically a memory chip is described by the mnemonic NU×n, where N and n are integers, and U is one of the prefixes for memory sizes (typically K, M, or G).
• The first part, NU, describes the number of addresses on the chip.
• The second part, n, describes the width of each address in bits.
Therefore, knowing your system has 64 Kb of RAM is fine for impressing your friends, but as microcontroller engineers it is more useful to know if the RAM is organized as 8 K × 8 (8 K memory cells, each 8 bits wide) or 4 K × 16 (4 K memory cells, each 16 bits wide).

Memory organization can further be broken down by the number of address spaces available.
• In a Harvard architecture there are separate address spaces (i.e., different address buses) for program storage (typically on ROM, as discussed above) and data storage (typically on RAM, as discussed above).
• In a von Neumann architecture the same address space is used for program and data storage. Many microprocessors have von Neumann architecture.
Figure 3: Equal amounts of memory for program storage (“code”) and data storage (“data”): Harvard architecture on the left, von Neumann architecture on the right. The Harvard architecture requires fewer addresses, but twice as many buses. 6

6 In previous iterations of this course I claimed that the ARM®Cortex-A9 used in this course was von Neumann. However, upon reading the technical specifications, it is actually a modified Harvard architecture, in which instructions and data can be fetched simultaneously, but program code can be moved through memory the same as data — so it is kind of a hybrid of both.

• Programming a microprocessor with Harvard architecture requires a different read/load opcode for instructions and data.
• A small advantage of a Harvard architecture system is that it removes the ambiguity of whether a number stored in memory is an instruction or data (however, data and addresses can still both be stored in the same memory space).
• A larger advantage of a Harvard architecture system is that it can fetch instructions and data simultaneously, making it faster than a von Neumann architecture system.
• A von Neumann architecture system benefits from a simpler instruction set, and simpler (cheaper) hardware.

Many microcontrollers prioritize low cost and simplicity, and use von Neumann architecture, but several others do not. 7 Most modern personal computers use a modified Harvard architecture, which preserves separate buses for instructions and data, but has a fuzzier definition of “instruction” and “data” — information that is technically data (i.e., not an opcode) can be stored in instruction space if it is not modified by the program (for example, the constant π). But enough about that.

The relationship between memory and other peripherals — particularly input/output peripherals — is also important.
• In an I/O mapped microprocessor, I/O peripherals occupy a separate, special memory space.
This again necessitates separate read/load opcodes for working with peripherals, as they are physically connected to the CPU by different hardware (buses) than the memory.

7 Notably, the ARM®Cortex-M3 and -M4 discussed in the textbook are Harvard architecture.

• In a memory mapped microprocessor, all the peripherals occupy the same memory space as the data (and the instructions, in a von Neumann architecture). The ARM®Cortex-A9 microprocessor uses memory mapping.

A major advantage of memory mapped devices is simplicity: reading from an input device (a bank of switches, for example, or something more complicated) or writing to an output device (a set of LEDs, for example, or something more complicated) is done exactly the same way, and with exactly the same opcodes, as reading from or writing to memory.