
07 Input Output

William Stallings
Computer Organization
and Architecture
9th Edition

Lecture slides prepared for “Computer Organization and Architecture”, 9/e, by William Stallings, Chapter 7 “Input/Output”.

Chapter 7

Input/Output


In addition to the processor and a set of memory modules, the third key element
of a computer system is a set of I/O modules. Each module interfaces to the system
bus or central switch and controls one or more peripheral devices. An I/O module
is not simply a set of mechanical connectors that wire a device into the system bus.
Rather, the I/O module contains logic for performing a communication function
between the peripheral and the bus.

The reader may wonder why one does not connect peripherals directly to the
system bus. The reasons are as follows:

• There are a wide variety of peripherals with various methods of operation. It
would be impractical to incorporate the necessary logic within the processor
to control a range of devices.

• The data transfer rate of peripherals is often much slower than that of the
memory or processor. Thus, it is impractical to use the high-speed system bus
to communicate directly with a peripheral.

• On the other hand, the data transfer rate of some peripherals is faster than
that of the memory or processor. Again, the mismatch would lead to inefficiencies
if not managed properly.

• Peripherals often use different data formats and word lengths than the
computer to which they are attached.

Generic Model of an I/O Module

Thus, an I/O module is required. This module has two major functions
(Figure 7.1):

• Interface to the processor and memory via the system bus or central switch

• Interface to one or more peripheral devices by tailored data links

We begin this chapter with a brief discussion of external devices, followed by
an overview of the structure and function of an I/O module. Then we look at the
various ways in which the I/O function can be performed in cooperation with the
processor and memory: the internal I/O interface. Finally, we examine the external
I/O interface, between the I/O module and the outside world.

External Devices
Provide a means of exchanging data between the external environment and the computer
Attach to the computer by a link to an I/O module
The link is used to exchange control, status, and data between the I/O module and the external device
peripheral device
An external device connected to an I/O module
Three categories:
Human readable
Suitable for communicating with the computer user
Video display terminals (VDTs), printers
Machine readable
Suitable for communicating with equipment
Magnetic disk and tape systems, sensors and actuators
Communication
Suitable for communicating with remote devices such as a terminal, a machine readable device, or another computer

I/O operations are accomplished through a wide assortment of external devices
that provide a means of exchanging data between the external environment
and the computer. An external device attaches to the computer by a link to
an I/O module (Figure 7.1). The link is used to exchange control, status, and
data between the I/O module and the external device. An external device connected
to an I/O module is often referred to as a peripheral device or, simply, a
peripheral.

We can broadly classify external devices into three categories:

• Human readable: Suitable for communicating with the computer user

• Machine readable: Suitable for communicating with equipment

• Communication: Suitable for communicating with remote devices

Examples of human-readable devices are video display terminals (VDTs) and
printers. Examples of machine-readable devices are magnetic disk and tape systems,
and sensors and actuators, such as are used in a robotics application. Note
that we are viewing disk and tape systems as I/O devices in this chapter, whereas
in Chapter 6 we viewed them as memory devices. From a functional point of view,
these devices are part of the memory hierarchy, and their use is appropriately discussed
in Chapter 6. From a structural point of view, these devices are controlled by
I/O modules and are hence to be considered in this chapter.

Communication devices allow a computer to exchange data with a remote
device, which may be a human-readable device, such as a terminal, a machine readable
device, or even another computer.

External Device Block Diagram


In very general terms, the nature of an external device is indicated in
Figure 7.2. The interface to the I/O module is in the form of control, data, and status
signals. Control signals determine the function that the device will perform, such as
send data to the I/O module (INPUT or READ), accept data from the I/O module
(OUTPUT or WRITE), report status, or perform some control function particular
to the device (e.g., position a disk head). Data are in the form of a set of bits to
be sent to or received from the I/O module. Status signals indicate the state of the
device. Examples are READY/NOT-READY to show whether the device is ready
for data transfer.

Control logic associated with the device controls the device’s operation in
response to direction from the I/O module. The transducer converts data from electrical
to other forms of energy during output and from other forms to electrical during
input. Typically, a buffer is associated with the transducer to temporarily hold
data being transferred between the I/O module and the external environment; a
buffer size of 8 to 16 bits is common.

Keyboard/Monitor
Most common means of computer/user interaction
User provides input through the keyboard
The monitor displays data provided by the computer
Keyboard Codes
Basic unit of exchange is the character
Associated with each character is a code, the International Reference Alphabet (IRA)
Each character in this code is represented by a unique 7-bit binary code
128 different characters can be represented
Characters are of two types:
Printable
Alphabetic, numeric, and special characters that can be printed on paper or displayed on a screen
Control
Have to do with controlling the printing or displaying of characters
Example is carriage return
Other control characters are concerned with communications procedures
When the user depresses a key it generates an electronic signal that is interpreted by the transducer in the keyboard and translated into the bit pattern of the corresponding IRA code
This bit pattern is transmitted to the I/O module in the computer
On output, IRA code characters are transmitted to an external device from the I/O module
The transducer interprets the code and sends the required electronic signals to the output device either to display the indicated character or perform the requested control function

The most common means of computer/user interaction is a keyboard/monitor
arrangement. The user provides input through the keyboard. This input is then
transmitted to the computer and may also be displayed on the monitor. In addition,
the monitor displays data provided by the computer.

The basic unit of exchange is the character. Associated with each character
is a code, typically 7 or 8 bits in length. The most commonly used text code is the
International Reference Alphabet (IRA). Each character in this code is represented
by a unique 7-bit binary code; thus, 128 different characters can be represented.
Characters are of two types: printable and control. Printable characters are
the alphabetic, numeric, and special characters that can be printed on paper or displayed
on a screen. Some of the control characters have to do with controlling the
printing or displaying of characters; an example is carriage return. Other control
characters are concerned with communications procedures. See Appendix F for
details.

For keyboard input, when the user depresses a key, this generates an
electronic signal that is interpreted by the transducer in the keyboard and
translated into the bit pattern of the corresponding IRA code. This bit pattern
is then transmitted to the I/O module in the computer. At the computer, the
text can be stored in the same IRA code. On output, IRA code characters are
transmitted to an external device from the I/O module. The transducer at the
device interprets this code and sends the required electronic signals to the output
device either to display the indicated character or perform the requested
control function.
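As a small illustration of the printable/control distinction, the C sketch below classifies a 7-bit IRA code point. It assumes the usual IRA/ASCII layout, in which codes 0x00-0x1F and 0x7F (DEL) are control characters and the remainder are printable; the function names are purely illustrative.

#include <stdio.h>

/* Classify a 7-bit IRA code point as control or printable.
   Codes 0x00-0x1F and 0x7F (DEL) are control characters;
   0x20-0x7E are printable. */
static int ira_is_control(unsigned char code)
{
    code &= 0x7F;                    /* keep only the 7-bit IRA code */
    return code < 0x20 || code == 0x7F;
}

int main(void)
{
    printf("0x0D (carriage return) control? %d\n", ira_is_control(0x0D));
    printf("'A' control? %d\n", ira_is_control('A'));
    return 0;
}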

I/O Modules
Module Function

The major functions or requirements for an I/O module fall into the following
categories:

• Control and timing

• Processor communication

• Device communication

• Data buffering

• Error detection

During any period of time, the processor may communicate with one or more
external devices in unpredictable patterns, depending on the program’s need for I/O.
The internal resources, such as main memory and the system bus, must be shared
among a number of activities, including data I/O. Thus, the I/O function includes a
control and timing requirement, to coordinate the flow of traffic between internal
resources and external devices. For example, the control of the transfer of data from
an external device to the processor might involve the following sequence of steps:

1. The processor interrogates the I/O module to check the status of the attached
device.

2. The I/O module returns the device status.

3. If the device is operational and ready to transmit, the processor requests the
transfer of data, by means of a command to the I/O module.

4. The I/O module obtains a unit of data (e.g., 8 or 16 bits) from the external device.

5. The data are transferred from the I/O module to the processor.

If the system employs a bus, then each of the interactions between the processor
and the I/O module involves one or more bus arbitrations.

The preceding simplified scenario also illustrates that the I/O module must
communicate with the processor and with the external device. Processor communication
involves the following:

• Command decoding: The I/O module accepts commands from the processor,
typically sent as signals on the control bus. For example, an I/O module for a
disk drive might accept the following commands: READ SECTOR, WRITE
SECTOR, SEEK track number, and SCAN record ID. The latter two commands
each include a parameter that is sent on the data bus.

• Data: Data are exchanged between the processor and the I/O module over the
data bus.

• Status reporting: Because peripherals are so slow, it is important to know the
status of the I/O module. For example, if an I/O module is asked to send data
to the processor (read), it may not be ready to do so because it is still working
on the previous I/O command. This fact can be reported with a status signal.
Common status signals are BUSY and READY. There may also be signals to
report various error conditions.

• Address recognition: Just as each word of memory has an address, so does
each I/O device. Thus, an I/O module must recognize one unique address for
each peripheral it controls.

On the other side, the I/O module must be able to perform device communication.
This communication involves commands, status information, and data
(Figure 7.2).

An essential task of an I/O module is data buffering. The need for this function
is apparent from Figure 2.11. Whereas the transfer rate into and out of main
memory or the processor is quite high, the rate is orders of magnitude lower for
many peripheral devices and covers a wide range. Data coming from main memory
are sent to an I/O module in a rapid burst. The data are buffered in the I/O module
and then sent to the peripheral device at its data rate. In the opposite direction, data
are buffered so as not to tie up the memory in a slow transfer operation. Thus, the
I/O module must be able to operate at both device and memory speeds. Similarly, if
the I/O device operates at a rate higher than the memory access rate, then the I/O
module performs the needed buffering operation.

Finally, an I/O module is often responsible for error detection and for subsequently
reporting errors to the processor. One class of errors includes mechanical
and electrical malfunctions reported by the device (e.g., paper jam, bad disk track).
Another class consists of unintentional changes to the bit pattern as it is transmitted
from device to I/O module. Some form of error-detecting code is often used
to detect transmission errors. A simple example is the use of a parity bit on each
character of data. For example, the IRA character code occupies 7 bits of a byte.
The eighth bit is set so that the total number of 1s in the byte is even (even parity)
or odd (odd parity). When a byte is received, the I/O module checks the parity to
determine whether an error has occurred.
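The parity scheme just described can be sketched in a few lines of C. The code below places an even-parity bit in bit 7 of a byte carrying a 7-bit IRA character and checks it on receipt; it illustrates the principle rather than the logic of any particular I/O module.

#include <stdint.h>
#include <stdio.h>

/* Attach an even-parity bit (bit 7) to a 7-bit IRA character. */
static uint8_t add_even_parity(uint8_t ch)
{
    uint8_t data = ch & 0x7F;
    int ones = 0;
    for (int i = 0; i < 7; i++)
        ones += (data >> i) & 1;
    /* set bit 7 so the total number of 1s in the byte is even */
    return (uint8_t)(data | ((ones & 1) << 7));
}

/* Return 1 if the received byte still has even parity. */
static int parity_ok(uint8_t byte)
{
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (byte >> i) & 1;
    return (ones & 1) == 0;
}

int main(void)
{
    uint8_t tx = add_even_parity('G');
    printf("sent 0x%02X, parity ok on receipt: %d\n", tx, parity_ok(tx));
    tx ^= 0x04;                      /* simulate a 1-bit transmission error */
    printf("after bit flip, parity ok: %d\n", parity_ok(tx));
    return 0;
}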

The major functions for an I/O module fall into the following categories:

Control and timing

Coordinates the flow of traffic between internal resources and external devices

Processor communication

Involves command decoding, data, status reporting, address recognition

Device communication

Involves commands, status information, and data

Data buffering

Performs the needed buffering operation to balance device and memory speeds

Error detection

Detects and reports transmission errors

I/O Module Structure

I/O modules vary considerably in complexity and the number of external devices
that they control. We will attempt only a very general description here. (One specific
device, the Intel 82C55A, is described in Section 7.4.) Figure 7.3 provides a general
block diagram of an I/O module. The module connects to the rest of the computer
through a set of signal lines (e.g., system bus lines). Data transferred to and from the
module are buffered in one or more data registers. There may also be one or more
status registers that provide current status information. A status register may also
function as a control register, to accept detailed control information from the processor.
The logic within the module interacts with the processor via a set of control
lines. The processor uses the control lines to issue commands to the I/O module.
Some of the control lines may be used by the I/O module (e.g., for arbitration and
status signals). The module must also be able to recognize and generate addresses
associated with the devices it controls. Each I/O module has a unique address or, if
it controls more than one external device, a unique set of addresses. Finally, the I/O
module contains logic specific to the interface with each device that it controls.

An I/O module functions to allow the processor to view a wide range of devices
in a simple-minded way. There is a spectrum of capabilities that may be provided.
The I/O module may hide the details of timing, formats, and the electromechanics
of an external device so that the processor can function in terms of simple read and
write commands, and possibly open and close file commands. In its simplest form,
the I/O module may still leave much of the work of controlling a device (e.g., rewind
a tape) visible to the processor.

An I/O module that takes on most of the detailed processing burden, presenting
a high-level interface to the processor, is usually referred to as an I/O channel or
I/O processor. An I/O module that is quite primitive and requires detailed control
is usually referred to as an I/O controller or device controller. I/O controllers are
commonly seen on microcomputers, whereas I/O channels are used on mainframes.

In what follows, we will use the generic term I/O module when no confusion
results and will use more specific terms where necessary.

Programmed I/O
Three techniques are possible for I/O operations:
Programmed I/O
Data are exchanged between the processor and the I/O module
Processor executes a program that gives it direct control of the I/O operation
When the processor issues a command it must wait until the I/O operation is complete
If the processor is faster than the I/O module this is wasteful of processor time
Interrupt-driven I/O
Processor issues an I/O command, continues to execute other instructions, and is interrupted by the I/O module when the latter has completed its work
Direct memory access (DMA)
The I/O module and main memory exchange data directly without processor involvement

Three techniques are possible for I/O operations. With programmed I/O, data are
exchanged between the processor and the I/O module. The processor executes a program
that gives it direct control of the I/O operation, including sensing device status,
sending a read or write command, and transferring the data. When the processor
issues a command to the I/O module, it must wait until the I/O operation is complete.
If the processor is faster than the I/O module, this is wasteful of processor time.
With interrupt-driven I/O, the processor issues an I/O command, continues to execute
other instructions, and is interrupted by the I/O module when the latter has completed
its work. With both programmed and interrupt I/O, the processor is responsible for
extracting data from main memory for output and storing data in main memory for
input. The alternative is known as direct memory access (DMA). In this mode, the I/O
module and main memory exchange data directly, without processor involvement.

I/O Techniques


Table 7.1 indicates the relationship among these three techniques. In this section,
we explore programmed I/O. Interrupt I/O and DMA are explored in the following
two sections, respectively.


I/O Commands
There are four types of I/O commands that an I/O module may receive when it is addressed by a processor:
Control
– used to activate a peripheral and tell it what to do
Test
– used to test various status conditions associated with an I/O module and its peripherals
Read
– causes the I/O module to obtain an item of data from the peripheral and place it in an internal buffer
Write
– causes the I/O module to take an item of data from the data bus and subsequently transmit that data item to the peripheral

To execute an I/O-related instruction, the processor issues an address, specifying the
particular I/O module and external device, and an I/O command. There are four types
of I/O commands that an I/O module may receive when it is addressed by a processor:

• Control: Used to activate a peripheral and tell it what to do. For example, a
magnetic-tape unit may be instructed to rewind or to move forward one record.
These commands are tailored to the particular type of peripheral device.

• Test: Used to test various status conditions associated with an I/O module and
its peripherals. The processor will want to know that the peripheral of interest
is powered on and available for use. It will also want to know if the most
recent I/O operation is completed and if any errors occurred.

• Read: Causes the I/O module to obtain an item of data from the peripheral
and place it in an internal buffer (depicted as a data register in Figure 7.3). The
processor can then obtain the data item by requesting that the I/O module
place it on the data bus.

• Write: Causes the I/O module to take an item of data (byte or word) from the
data bus and subsequently transmit that data item to the peripheral.

Three Techniques for Input of a Block of Data

Figure 7.4a gives an example of the use of programmed I/O to read in a block of
data from a peripheral device (e.g., a record from tape) into memory. Data are read
in one word (e.g., 16 bits) at a time. For each word that is read in, the processor must
remain in a status-checking cycle until it determines that the word is available in the
I/O module’s data register. This flowchart highlights the main disadvantage of this
technique: it is a time-consuming process that keeps the processor busy needlessly.
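In software terms, the status-checking cycle of Figure 7.4a amounts to a busy-wait loop. The C sketch below assumes a hypothetical memory-mapped I/O module with a status register and a data register at made-up addresses, and an assumed READY bit; it is meant only to show why the processor is kept busy needlessly.

#include <stdint.h>
#include <stddef.h>

#define IO_STATUS    ((volatile uint16_t *)0xFFFF0000u)  /* hypothetical address */
#define IO_DATA      ((volatile uint16_t *)0xFFFF0002u)  /* hypothetical address */
#define STATUS_READY 0x0001u                             /* assumed ready bit    */

void programmed_io_read_block(uint16_t *buf, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        /* a read command to the module would be issued here; then poll status */
        while ((*IO_STATUS & STATUS_READY) == 0)
            ;                        /* processor spins: time wasted needlessly */
        buf[i] = *IO_DATA;           /* every word passes through the processor */
    }
}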

I/O Instructions

With programmed I/O, there is a close correspondence between the I/O-related
instructions that the processor fetches from memory and the I/O commands that the
processor issues to an I/O module to execute the instructions. That is, the instructions
are easily mapped into I/O commands, and there is often a simple one-
to-one relationship. The form of the instruction depends on the way in which external
devices are addressed.

Typically, there will be many I/O devices connected through I/O modules to
the system. Each device is given a unique identifier or address. When the processor
issues an I/O command, the command contains the address of the desired device.
Thus, each I/O module must interpret the address lines to determine if the command
is for itself.

When the processor, main memory, and I/O share a common bus, two modes
of addressing are possible: memory mapped and isolated. With memory-mapped
I/O, there is a single address space for memory locations and I/O devices. The processor
treats the status and data registers of I/O modules as memory locations and
uses the same machine instructions to access both memory and I/O devices. So, for
example, with 10 address lines, a combined total of 2^10 = 1024 memory locations
and I/O addresses can be supported, in any combination.

With memory-mapped I/O, a single read line and a single write line are needed
on the bus. Alternatively, the bus may be equipped with memory read and write
plus input and output command lines. Now, the command line specifies whether the
address refers to a memory location or an I/O device. The full range of addresses
may be available for both. Again, with 10 address lines, the system may now support
both 1024 memory locations and 1024 I/O addresses. Because the address space for
I/O is isolated from that for memory, this is referred to as isolated I/O.

With programmed I/O there is a close correspondence between the I/O-related instructions that the processor fetches from memory and the I/O commands that the processor issues to an I/O module to execute the instructions

The form of the instruction depends on the way in which external devices are addressed

Each I/O device connected through I/O modules is given a unique identifier or address

When the processor issues an I/O command, the command contains the address of the desired device

Thus each I/O module must interpret the address lines to determine if the command is for itself

Memory-mapped I/O

There is a single address space for memory locations and I/O devices

A single read line and a single write line are needed on the bus

I/O Mapping Summary
Memory mapped I/O
Devices and memory share an address space
I/O looks just like memory read/write
No special commands for I/O
Large selection of memory access commands available
Isolated I/O
Separate address spaces
Need I/O or memory select lines
Special commands for I/O
Limited set

I/O mapping summary.

Memory-Mapped I/O and Isolated I/O


Figure 7.5 contrasts these two programmed I/O techniques. Figure 7.5a shows
how the interface for a simple input device such as a terminal keyboard might appear
to a programmer using memory-mapped I/O. Assume a 10-bit address, with a
512-word memory (locations 0–511) and up to 512 I/O addresses (locations 512–1023).
Two addresses are dedicated to keyboard input from a particular terminal. Address
516 refers to the data register and address 517 refers to the status register, which
also functions as a control register for receiving processor commands. The program
shown will read 1 byte of data from the keyboard into an accumulator register in the
processor. Note that the processor loops until the data byte is available.

With isolated I/O (Figure 7.5b), the I/O ports are accessible only by special
I/O commands, which activate the I/O command lines on the bus.

For most types of processors, there is a relatively large set of different instructions
for referencing memory. If isolated I/O is used, there are only a few I/O
instructions. Thus, an advantage of memory-mapped I/O is that this large repertoire
of instructions can be used, allowing more efficient programming. A disadvantage is
that valuable memory address space is used up. Both memory-mapped and isolated
I/O are in common use.
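The two addressing schemes can be contrasted in code. The C sketch below first reads a keyboard byte with memory-mapped I/O, using the data-register and status-register addresses (516 and 517) from the example above, and then with isolated I/O, using the x86 IN instruction. The ready-bit position and the port numbers are assumptions made for illustration.

#include <stdint.h>

#define KBD_DATA_ADDR ((volatile uint8_t *)516)  /* memory-mapped data register   */
#define KBD_STAT_ADDR ((volatile uint8_t *)517)  /* memory-mapped status register */
#define KBD_READY     0x01u                      /* assumed ready bit             */

/* Memory-mapped I/O: ordinary load instructions reference the device. */
uint8_t read_keyboard_memory_mapped(void)
{
    while ((*KBD_STAT_ADDR & KBD_READY) == 0)
        ;                            /* loop until a byte is available */
    return *KBD_DATA_ADDR;           /* plain memory read fetches the data */
}

/* Isolated I/O: the device sits in a separate address space reached only by
   special I/O instructions (IN/OUT on x86). The port numbers are assumed. */
#if defined(__i386__) || defined(__x86_64__)
static uint8_t inb(uint16_t port)
{
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}

uint8_t read_keyboard_isolated(void)
{
    while ((inb(0x64) & KBD_READY) == 0)   /* assumed status port */
        ;
    return inb(0x60);                      /* assumed data port   */
}
#endif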

Interrupt-Driven I/O

The problem with programmed I/O is that the processor has to wait a long time
for the I/O module of concern to be ready for either reception or transmission of
data. The processor, while waiting, must repeatedly interrogate the status of the I/O
module. As a result, the level of the performance of the entire system is severely
degraded.

An alternative is for the processor to issue an I/O command to a module and
then go on to do some other useful work. The I/O module will then interrupt the
processor to request service when it is ready to exchange data with the processor.
The processor then executes the data transfer, as before, and then resumes its
former processing.

Let us consider how this works, first from the point of view of the I/O module.
For input, the I/O module receives a READ command from the processor. The I/O
module then proceeds to read data in from an associated peripheral. Once the data
are in the module’s data register, the module signals an interrupt to the processor
over a control line. The module then waits until its data are requested by the processor.
When the request is made, the module places its data on the data bus and is
then ready for another I/O operation.

From the processor’s point of view, the action for input is as follows. The processor
issues a READ command. It then goes off and does something else (e.g., the
processor may be working on several different programs at the same time). At the
end of each instruction cycle, the processor checks for interrupts (Figure 3.9). When
the interrupt from the I/O module occurs, the processor saves the context (e.g., program
counter and processor registers) of the current program and processes the
interrupt. In this case, the processor reads the word of data from the I/O module
and stores it in memory. It then restores the context of the program it was working
on (or some other program) and resumes execution.

Figure 7.4b shows the use of interrupt I/O for reading in a block of data.
Compare this with Figure 7.4a. Interrupt I/O is more efficient than programmed I/O
because it eliminates needless waiting. However, interrupt I/O still consumes a lot of
processor time, because every word of data that goes from memory to I/O module
or from I/O module to memory must pass through the processor.
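The per-word cost of interrupt-driven I/O can be seen in a sketch of the handler. In the hypothetical C code below, each device interrupt moves exactly one word from the module's data register to memory through the processor; the register address, the completion flag, and the way the handler is registered are all illustrative assumptions.

#include <stdint.h>
#include <stddef.h>

#define IO_DATA ((volatile uint16_t *)0xFFFF0002u)   /* hypothetical data register */

static uint16_t *dst;            /* where the next word goes       */
static size_t    words_left;     /* words remaining in the block   */
static volatile int io_done;     /* set when the whole block is in */

void start_block_read(uint16_t *buf, size_t nwords)
{
    dst = buf;
    words_left = nwords;
    io_done = 0;
    /* issue the READ command to the I/O module here, then return:
       the processor is free to run other work until the interrupt */
}

void io_module_isr(void)         /* invoked on each device interrupt */
{
    *dst++ = *IO_DATA;           /* one word per interrupt, via the processor */
    if (--words_left == 0)
        io_done = 1;             /* block complete; wake the waiting program  */
    /* acknowledge the interrupt at the device or controller here */
}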

The problem with programmed I/O is that the processor has to wait a long time for the I/O module to be ready for either reception or transmission of data

An alternative is for the processor to issue an I/O command to a module and then go on to do some other useful work

The I/O module will then interrupt the processor to request service when it is ready to exchange data with the processor

The processor executes the data transfer and resumes its former processing

Simple Interrupt Processing

Let us consider the role of the processor in interrupt-driven I/O in more detail.
The occurrence of an interrupt triggers a number of events, both in the processor
hardware and in software. Figure 7.6 shows a typical sequence. When an I/O device
completes an I/O operation, the following sequence of hardware events occurs:

1. The device issues an interrupt signal to the processor.

2. The processor finishes execution of the current instruction before responding
to the interrupt, as indicated in Figure 3.9.

3. The processor tests for an interrupt, determines that there is one, and sends an
acknowledgment signal to the device that issued the interrupt. The acknowledgment
allows the device to remove its interrupt signal.

4. The processor now needs to prepare to transfer control to the interrupt routine.
To begin, it needs to save information needed to resume the current program at
the point of interrupt. The minimum information required is (a) the status of the
processor, which is contained in a register called the program status word (PSW),
and (b) the location of the next instruction to be executed, which is contained in
the program counter. These can be pushed onto the system control stack.

5. The processor now loads the program counter with the entry location of the
interrupt-handling program that will respond to this interrupt. Depending on
the computer architecture and operating system design, there may be a single
program; one program for each type of interrupt; or one program for each
device and each type of interrupt. If there is more than one interrupt-handling
routine, the processor must determine which one to invoke. This information
may have been included in the original interrupt signal, or the processor may
have to issue a request to the device that issued the interrupt to get a response
that contains the needed information.

Changes in Memory and Registers for an Interrupt


Once the program counter has been loaded, the processor proceeds to the
next instruction cycle, which begins with an instruction fetch. Because the instruction
fetch is determined by the contents of the program counter, the result is that
control is transferred to the interrupt-handler program. The execution of this program
results in the following operations:

6. At this point, the program counter and PSW relating to the interrupted
program have been saved on the system stack. However, there is other information
that is considered part of the “state” of the executing program. In particular,
the contents of the processor registers need to be saved, because these
registers may be used by the interrupt handler. So, all of these values, plus any
other state information, need to be saved. Typically, the interrupt handler will
begin by saving the contents of all registers on the stack. Figure 7.7a shows a
simple example. In this case, a user program is interrupted after the instruction
at location N. The contents of all of the registers plus the address of the next
instruction (N + 1) are pushed onto the stack. The stack pointer is updated to
point to the new top of stack, and the program counter is updated to point to
the beginning of the interrupt service routine.

7. The interrupt handler next processes the interrupt. This includes an examination
of status information relating to the I/O operation or other event that
caused an interrupt. It may also involve sending additional commands or
acknowledgments to the I/O device.

8. When interrupt processing is complete, the saved register values are retrieved
from the stack and restored to the registers (e.g., see Figure 7.7b).

9. The final act is to restore the PSW and program counter values from the stack.
As a result, the next instruction to be executed will be from the previously
interrupted program.

Note that it is important to save all the state information about the interrupted
program for later resumption. This is because the interrupt is not a routine called
from the program. Rather, the interrupt can occur at any time and therefore at any
point in the execution of a user program. Its occurrence is unpredictable. Indeed, as
we will see in the next chapter, the two programs may not have anything in common
and may belong to two different users.


Design Issues

Two design issues arise in implementing interrupt I/O. First, because there will
almost invariably be multiple I/O modules, how does the processor determine which
device issued the interrupt? And second, if multiple interrupts have occurred, how
does the processor decide which one to process?

Two design issues arise in implementing interrupt I/O:

Because there will be multiple I/O modules how does the processor determine which device issued the interrupt?

If multiple interrupts have occurred how does the processor decide which one to process?

Device Identification
Four general categories of techniques are in common use:
Multiple interrupt lines
Between the processor and the I/O modules
Most straightforward approach to the problem
Impractical to dedicate more than a few bus lines or processor pins to interrupt lines, so even if multiple lines are used it is likely that each line will have multiple I/O modules attached to it
Software poll
When processor detects an interrupt it branches to an interrupt-service routine whose job is to poll each I/O module to determine which module caused the interrupt
Time consuming
Daisy chain (hardware poll, vectored)
The interrupt acknowledge line is daisy chained through the modules
Vector – address of the I/O module or some other unique identifier
Vectored interrupt – processor uses the vector as a pointer to the appropriate device-service routine, avoiding the need to execute a general interrupt-service routine first
Bus arbitration (vectored)
An I/O module must first gain control of the bus before it can raise the interrupt request line
When the processor detects the interrupt it responds on the interrupt acknowledge line
Then the requesting module places its vector on the data lines

Let us consider device identification first. Four general categories of techniques
are in common use:

• Multiple interrupt lines

• Software poll

• Daisy chain (hardware poll, vectored)

• Bus arbitration (vectored)

The most straightforward approach to the problem is to provide multiple interrupt
lines between the processor and the I/O modules. However, it is impractical to
dedicate more than a few bus lines or processor pins to interrupt lines. Consequently,
even if multiple lines are used, it is likely that each line will have multiple I/O modules
attached to it. Thus, one of the other three techniques must be used on each line.

One alternative is the software poll. When the processor detects an interrupt,
it branches to an interrupt-service routine whose job it is to poll each I/O module
to determine which module caused the interrupt. The poll could be in the form of a
separate command line (e.g., TEST I/O). In this case, the processor raises TEST I/O
and places the address of a particular I/O module on the address lines. The I/O module
responds positively if it set the interrupt. Alternatively, each I/O module could
contain an addressable status register. The processor then reads the status register
of each I/O module to identify the interrupting module. Once the correct module is
identified, the processor branches to a device-service routine specific to that device.
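A software poll of addressable status registers might look like the C sketch below. The module table, the register layout, and the interrupt-pending bit are hypothetical; the point is simply that the processor must read status registers one by one until it finds the interrupting module.

#include <stdint.h>
#include <stddef.h>

#define STATUS_IRQ 0x80u                  /* assumed interrupt-pending bit */

struct io_module {
    volatile uint8_t *status;             /* addressable status register     */
    void (*service)(void);                /* device-specific service routine */
};

/* Called from the general interrupt-service routine: scan each module's
   status register until the interrupting one is found, then dispatch. */
void poll_io_modules(const struct io_module *mod, size_t nmod)
{
    for (size_t i = 0; i < nmod; i++) {
        if (*mod[i].status & STATUS_IRQ) {
            mod[i].service();             /* found the interrupting module */
            return;
        }
    }
    /* no module claims the interrupt: treat as spurious */
}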

The disadvantage of the software poll is that it is time consuming. A more efficient
technique is to use a daisy chain, which provides, in effect, a hardware poll. An example
of a daisy-chain configuration is shown in Figure 3.30. For interrupts, all I/O modules
share a common interrupt request line. The interrupt acknowledge line is daisy chained
through the modules. When the processor senses an interrupt, it sends out an interrupt
acknowledge. This signal propagates through a series of I/O modules until it gets to a
requesting module. The requesting module typically responds by placing a word on
the data lines. This word is referred to as a vector and is either the address of the I/O
module or some other unique identifier. In either case, the processor uses the vector as
a pointer to the appropriate device-service routine. This avoids the need to execute a
general interrupt-service routine first. This technique is called a vectored interrupt.
There is another technique that makes use of vectored interrupts, and that is
bus arbitration. With bus arbitration, an I/O module must first gain control of the
bus before it can raise the interrupt request line. Thus, only one module can raise the
line at a time. When the processor detects the interrupt, it responds on the interrupt
acknowledge line. The requesting module then places its vector on the data lines.

Intel 82C59A Interrupt Controller

The Intel 80386 provides a single Interrupt Request (INTR) and a single Interrupt
Acknowledge (INTA) line. To allow the 80386 to handle a variety of devices and priority
structures, it is usually configured with an external interrupt arbiter, the 82C59A.
External devices are connected to the 82C59A, which in turn connects to the 80386.

Figure 7.8 shows the use of the 82C59A to connect multiple I/O modules for the
80386. A single 82C59A can handle up to eight modules. If control for more than eight
modules is required, a cascade arrangement can be used to handle up to 64 modules.

The 82C59A’s sole responsibility is the management of interrupts. It accepts
interrupt requests from attached modules, determines which interrupt has the highest
priority, and then signals the processor by raising the INTR line. The processor
acknowledges via the INTA line. This prompts the 82C59A to place the appropriate
vector information on the data bus. The processor can then proceed to process the
interrupt and to communicate directly with the I/O module to read or write data.

The 82C59A is programmable. The 80386 determines the priority scheme to
be used by setting a control word in the 82C59A. The following interrupt modes are
possible:

• Fully nested: The interrupt requests are ordered in priority from 0 (IR0)
through 7 (IR7).

• Rotating: In some applications a number of interrupting devices are of equal
priority. In this mode a device, after being serviced, receives the lowest priority
in the group.

• Special mask: This allows the processor to inhibit interrupts from certain devices.

Intel 82C55A
Programmable Peripheral Interface


As an example of an I/O module used for programmed I/O and interrupt-driven I/O,
we consider the Intel 82C55A Programmable Peripheral Interface. The 82C55A is
a single-chip, general-purpose I/O module designed for use with the Intel 80386
processor. Figure 7.9 shows a general block diagram plus the pin assignment for the
40-pin package in which it is housed.

The right side of the block diagram is the external interface of the 82C55A.
The 24 I/O lines are programmable by the 80386 by means of the control register.
The 80386 can set the value of the control register to specify a variety of operating
modes and configurations. The 24 lines are divided into three 8-bit groups (A, B, C).
Each group can function as an 8-bit I/O port. In addition, group C is subdivided into
4-bit groups (CA and CB), which may be used in conjunction with the A and B I/O
ports. Configured in this manner, group C lines carry control and status signals.

The left side of the block diagram is the internal interface to the 80386 bus. It
includes an 8-bit bidirectional data bus (D0 through D7), used to transfer data to
and from the I/O ports and to transfer control information to the control register.
The two address lines specify one of the three I/O ports or the control register.
A transfer takes place when the CHIP SELECT line is enabled together with either
the READ or WRITE line. The RESET line is used to initialize the module.

The control register is loaded by the processor to control the mode of operation
and to define signals, if any. In Mode 0 operation, the three groups of eight external
lines function as three 8-bit I/O ports. Each port can be designated as input or
output. Otherwise, groups A and B function as I/O ports, and the lines of group C
serve as control lines for A and B. The control signals serve two principal purposes:
“handshaking” and interrupt request. Handshaking is a simple timing mechanism.
One control line is used by the sender as a DATA READY line, to indicate when
the data are present on the I/O data lines. Another line is used by the receiver as an
ACKNOWLEDGE, indicating that the data have been read and the data lines may
be cleared. Another line may be designated as an INTERRUPT REQUEST line and
tied back to the system bus.
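As an illustration, the following C sketch programs the 82C55A for Mode 0 with port A as input and ports B and C as output. The control-word bit layout follows the standard 82C55A encoding, but the register addresses are hypothetical, since in a real system they depend on how the chip's CHIP SELECT and address lines are wired.

#include <stdint.h>

#define PPI_PORT_A  ((volatile uint8_t *)0xFFFF1000u)  /* hypothetical address */
#define PPI_PORT_B  ((volatile uint8_t *)0xFFFF1001u)  /* hypothetical address */
#define PPI_PORT_C  ((volatile uint8_t *)0xFFFF1002u)  /* hypothetical address */
#define PPI_CONTROL ((volatile uint8_t *)0xFFFF1003u)  /* hypothetical address */

void ppi_init_mode0(void)
{
    /* 1 00 1 0 0 0 0 : mode-set flag, group A mode 0, port A input,
       port C upper output, group B mode 0, port B output, port C lower output */
    *PPI_CONTROL = 0x90;
}

uint8_t ppi_read_keys(void)          { return *PPI_PORT_A; }  /* 8 bits of input  */
void    ppi_write_display(uint8_t v) { *PPI_PORT_B = v; }     /* 8 bits of output */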

Keyboard/Display Interfaces to 82C55A


Because the 82C55A is programmable via the control register, it can be used to
control a variety of simple peripheral devices. Figure 7.10 illustrates its use to control
a keyboard/display terminal. The keyboard provides 8 bits of input. Two of these
bits, SHIFT and CONTROL, have special meaning to the keyboard-handling program
executing in the processor. However, this interpretation is transparent to the
82C55A, which simply accepts the 8 bits of data and presents them on the system
data bus. Two handshaking control lines are provided for use with the keyboard.

The display is also linked by an 8-bit data port. Again, two of the bits have special
meanings that are transparent to the 82C55A. In addition to two handshaking
lines, two lines provide additional control functions.

Drawbacks of Programmed and Interrupt-Driven I/O
Both forms of I/O suffer from two inherent drawbacks:
The I/O transfer rate is limited by the speed with which the processor can test and service a device
The processor is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer

When large volumes of data are to be moved a more efficient technique is direct memory access (DMA)


Interrupt-driven I/O, though more efficient than simple programmed I/O, still
requires the active intervention of the processor to transfer data between memory
and an I/O module, and any data transfer must traverse a path through the processor.
Thus, both these forms of I/O suffer from two inherent drawbacks:

1. The I/O transfer rate is limited by the speed with which the processor can test
and service a device.

2. The processor is tied up in managing an I/O transfer; a number of instructions
must be executed for each I/O transfer (e.g., Figure 7.5).

There is somewhat of a trade-off between these two drawbacks. Consider the
transfer of a block of data. Using simple programmed I/O, the processor is dedicated
to the task of I/O and can move data at a rather high rate, at the cost of doing
nothing else. Interrupt I/O frees up the processor to some extent at the expense of
the I/O transfer rate. Nevertheless, both methods have an adverse impact on both
processor activity and I/O transfer rate.

When large volumes of data are to be moved, a more efficient technique is
required: direct memory access (DMA).

Typical DMA Module Diagram


DMA involves an additional module on the system bus. The DMA module
(Figure 7.11) is capable of mimicking the processor and, indeed, of taking over
control of the system from the processor. It needs to do this to transfer data to
and from memory over the system bus. For this purpose, the DMA module must
use the bus only when the processor does not need it, or it must force the processor
to suspend operation temporarily. The latter technique is more common and is
referred to as cycle stealing, because the DMA module in effect steals a bus cycle.

When the processor wishes to read or write a block of data, it issues a
command to the DMA module, by sending to the DMA module the following
information:

• Whether a read or write is requested, using the read or write control line
between the processor and the DMA module

• The address of the I/O device involved, communicated on the data lines

• The starting location in memory to read from or write to, communicated on
the data lines and stored by the DMA module in its address register

• The number of words to be read or written, again communicated via the data
lines and stored in the data count register

The processor then continues with other work. It has delegated this I/O operation
to the DMA module. The DMA module transfers the entire block of data, one
word at a time, directly to or from memory, without going through the processor.
When the transfer is complete, the DMA module sends an interrupt signal to the
processor. Thus, the processor is involved only at the beginning and end of the
transfer (Figure 7.4c).
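Putting the above into code, the C sketch below shows a processor delegating a block read to a generic DMA module of the kind in Figure 7.11 and then continuing with other work until the completion interrupt arrives. All register addresses, bit encodings, and names are illustrative assumptions rather than any particular controller's interface.

#include <stdint.h>

#define DMA_CONTROL ((volatile uint32_t *)0xFFFF2000u)  /* hypothetical */
#define DMA_DEVICE  ((volatile uint32_t *)0xFFFF2004u)  /* hypothetical */
#define DMA_ADDRESS ((volatile uint32_t *)0xFFFF2008u)  /* hypothetical */
#define DMA_COUNT   ((volatile uint32_t *)0xFFFF200Cu)  /* hypothetical */

#define DMA_READ  0x1u               /* device -> memory (assumed encoding) */
#define DMA_START 0x2u

volatile int dma_done;               /* set by the DMA-completion interrupt */

void dma_read_block(uint32_t device, uint32_t mem_addr, uint32_t nwords)
{
    *DMA_DEVICE  = device;           /* which I/O device is involved   */
    *DMA_ADDRESS = mem_addr;         /* starting location in memory    */
    *DMA_COUNT   = nwords;           /* number of words to transfer    */
    dma_done = 0;
    *DMA_CONTROL = DMA_READ | DMA_START;
    /* processor continues with other work; dma_done is set when the
       DMA module raises its completion interrupt */
}

void dma_completion_isr(void) { dma_done = 1; }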

DMA Operation
DMA
Figure 7.12 shows where in the instruction cycle the processor may be suspended.
In each case, the processor is suspended just before it needs to use the bus.
The DMA module then transfers one word and returns control to the processor.
Note that this is not an interrupt; the processor does not save a context and do
something else. Rather, the processor pauses for one bus cycle. The overall effect
is to cause the processor to execute more slowly. Nevertheless, for a multiple-word
I/O transfer, DMA is far more efficient than interrupt-driven or programmed I/O.

Alternative DMA Configurations

The DMA mechanism can be configured in a variety of ways. Some possibilities
are shown in Figure 7.13. In the first example, all modules share the same system
bus. The DMA module, acting as a surrogate processor, uses programmed I/O to
exchange data between memory and an I/O module through the DMA module. This
configuration, while it may be inexpensive, is clearly inefficient. As with processor-controlled
programmed I/O, each transfer of a word consumes two bus cycles.

The number of required bus cycles can be cut substantially by integrating the
DMA and I/O functions. As Figure 7.13b indicates, this means that there is a path
between the DMA module and one or more I/O modules that does not include the
system bus. The DMA logic may actually be a part of an I/O module, or it may be a
separate module that controls one or more I/O modules. This concept can be taken
one step further by connecting I/O modules to the DMA module using an I/O bus
(Figure 7.13c). This reduces the number of I/O interfaces in the DMA module to one
and provides for an easily expandable configuration. In both of these cases (Figures
7.13b and c), the system bus that the DMA module shares with the processor and
memory is used by the DMA module only to exchange data with memory. The
exchange of data between the DMA and I/O modules takes place off the system
bus.

8237 DMA Usage of System Bus

The Intel 8237A DMA controller interfaces to the 80x86 family of processors and
to DRAM memory to provide a DMA capability. Figure 7.14 indicates the location
of the DMA module. When the DMA module needs to use the system buses (data,
address, and control) to transfer data, it sends a signal called HOLD to the processor.
The processor responds with the HLDA (hold acknowledge) signal, indicating
that the DMA module can use the buses. For example, if the DMA module is to
transfer a block of data from memory to disk, it will do the following:

1. The peripheral device (such as the disk controller) will request the service of
DMA by pulling DREQ (DMA request) high.

2. The DMA will put a high on its HRQ (hold request), signaling the CPU
through its HOLD pin that it needs to use the buses.

3. The CPU will finish the present bus cycle (not necessarily the present instruction)
and respond to the DMA request by putting a high on its HLDA (hold
acknowledge), thus telling the 8237 DMA that it can go ahead and use the
buses to perform its task. HOLD must remain active high as long as DMA is
performing its task.

4. DMA will activate DACK (DMA acknowledge), which tells the peripheral
device that it will start to transfer the data.

5. DMA starts to transfer the data from memory to peripheral by putting the
address of the first byte of the block on the address bus and activating MEMR,
thereby reading the byte from memory into the data bus; it then activates IOW
to write it to the peripheral. Then DMA decrements the counter and increments
the address pointer and repeats this process until the count reaches zero
and the task is finished.

6. After the DMA has finished its job it will deactivate HRQ, signaling the CPU
that it can regain control over its buses.

Fly-By DMA Controller


While the DMA is using the buses to transfer data, the processor is idle.
Similarly, when the processor is using the bus, the DMA is idle. The 8237 DMA
is known as a fly-by DMA controller. This means that the data being moved from
one location to another does not pass through the DMA chip and is not stored in
the DMA chip. Therefore, the DMA can only transfer data between an I/O port
and a memory address, but not between two I/O ports or two memory locations.
However, as explained subsequently, the DMA chip can perform a memory-to-memory
transfer via a register.

The 8237 contains four DMA channels that can be programmed independently,
and any one of the channels may be active at any moment. These channels are
numbered 0, 1, 2, and 3.

Data does not pass through and is not stored in DMA chip

DMA only between I/O port and memory

Not between two I/O ports or two memory locations

Can do memory to memory via register

8237 contains four DMA channels

Programmed independently

Any one active

Numbered 0, 1, 2, and 3

Table 7.2 Intel 8237A Registers
(E/D = enable/disable; TC = terminal count)

The 8237 has a set of five control/command registers to program and control
DMA operation over one of its channels (Table 7.2):

• Command: The processor loads this register to control the operation of the
DMA. D0 enables a memory-to-memory transfer, in which channel 0 is used
to transfer a byte into an 8237 temporary register and channel 1 is used to
transfer the byte from the register to memory. When memory-to-memory is
enabled, D1 can be used to disable increment/decrement on channel 0 so that
a fixed value can be written into a block of memory. D2 enables or disables
DMA.

• Status: The processor reads this register to determine DMA status. Bits
D0–D3 are used to indicate if channels 0–3 have reached their TC (terminal
count). Bits D4–D7 are used by the processor to determine if any channel has
a DMA request pending.

• Mode: The processor sets this register to determine the mode of operation
of the DMA. Bits D0 and D1 are used to select a channel. The other bits
select various operation modes for the selected channel. Bits D2 and D3
determine if the transfer is from an I/O device to memory (write) or from
memory to I/O (read), or a verify operation. If D4 is set, then the memory
address register and the count register are reloaded with their original
values at the end of a DMA data transfer. Bits D6 and D7 determine the
way in which the 8237 is used. In single mode, a single byte of data is transferred.
Block and demand modes are used for a block transfer, with the
demand mode allowing for premature
ending of the transfer. Cascade
mode allows multiple 8237s to be cascaded to expand the number of channels
to more than 4.

• Single Mask: The processor sets this register. Bits D0 and D1 select the channel.
Bit D2 clears or sets the mask bit for that channel. It is through this register
that the DREQ input of a specific channel can be masked (disabled) or
unmasked (enabled). While the command register can be used to disable the
whole DMA chip, the single mask register allows the programmer to disable
or enable a specific channel.

• All Mask: This register is similar to the single mask register except that all four
channels can be masked or unmasked with one write operation.

In addition, the 8237A has eight data registers: one memory address register
and one count register for each channel. The processor sets these registers to indicate
the location and size of main memory to be affected by the transfers.
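To make the register usage concrete, the C sketch below programs one 8237A channel for a single-mode transfer from a device to memory, touching the single mask, mode, address, and count registers listed in Table 7.2. The port numbers follow the legacy PC/AT assignment for DMA controller 1 and the outb() wrapper is x86 specific, so treat the details as platform assumptions; on a real PC the page register (not shown) would also be needed for addresses above 64 KB.

#include <stdint.h>

static void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}

#define DMA1_CH2_ADDR    0x04   /* channel 2 memory address register (assumed PC/AT port) */
#define DMA1_CH2_COUNT   0x05   /* channel 2 count register                               */
#define DMA1_SINGLE_MASK 0x0A
#define DMA1_MODE        0x0B
#define DMA1_CLEAR_FF    0x0C   /* clears the internal byte flip-flop                     */

void dma8237_setup_ch2_write(uint16_t addr, uint16_t count)
{
    outb(DMA1_SINGLE_MASK, 0x04 | 2);          /* mask (disable) channel 2          */
    outb(DMA1_CLEAR_FF, 0);                    /* next writes are low byte first    */

    /* mode 0x46: single mode, write transfer (device -> memory), channel 2 */
    outb(DMA1_MODE, 0x46);

    outb(DMA1_CH2_ADDR, addr & 0xFF);          /* 16-bit address, low byte then high */
    outb(DMA1_CH2_ADDR, addr >> 8);
    outb(DMA1_CH2_COUNT, (count - 1) & 0xFF);  /* 8237 is programmed with count - 1  */
    outb(DMA1_CH2_COUNT, (count - 1) >> 8);

    outb(DMA1_SINGLE_MASK, 2);                 /* unmask channel 2: DREQ now honored */
}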

Evolution of the I/O Function
The CPU directly controls a peripheral device.
A controller or I/O module is added. The CPU uses programmed I/O without interrupts.
Same configuration as in step 2 is used, but now interrupts are employed. The CPU need not spend time waiting for an I/O operation to be performed, thus increasing efficiency.

The I/O module is given direct access to memory via DMA. It can now move a block of data to or from memory without involving the CPU, except at the beginning and end of the transfer.
The I/O module is enhanced to become a processor in its own right, with a specialized instruction set tailored for I/O.
The I/O module has a local memory of its own and is, in fact, a computer in its own right. With this architecture a large set of I/O devices can be controlled with minimal CPU involvement.

As computer systems have evolved, there has been a pattern of increasing complexity
and sophistication of individual components. Nowhere is this more evident than
in the I/O function. We have already seen part of that evolution. The evolutionary
steps can be summarized as follows:

1. The CPU directly controls a peripheral device. This is seen in simple microprocessor-
controlled devices.

2. A controller or I/O module is added. The CPU uses programmed I/O without
interrupts. With this step, the CPU becomes somewhat divorced from the specific
details of external device interfaces.

3. The same configuration as in step 2 is used, but now interrupts are employed.
The CPU need not spend time waiting for an I/O operation to be performed,
thus increasing efficiency.

4. The I/O module is given direct access to memory via DMA. It can now move
a block of data to or from memory without involving the CPU, except at the
beginning and end of the transfer.

5. The I/O module is enhanced to become a processor in its own right, with a
specialized instruction set tailored for I/O. The CPU directs the I/O processor
to execute an I/O program in memory. The I/O processor fetches and executes
these instructions without CPU intervention. This allows the CPU to specify a
sequence of I/O activities and to be interrupted only when the entire sequence
has been performed.

6. The I/O module has a local memory of its own and is, in fact, a computer
in its own right. With this architecture, a large set of I/O devices can be
controlled, with minimal CPU involvement. A common use for such an
architecture has been to control communication with interactive terminals.
The I/O processor takes care of most of the tasks involved in controlling the
terminals.

As one proceeds along this evolutionary path, more and more of the I/O
function is performed without CPU involvement. The CPU is increasingly
relieved of I/O-related tasks, improving performance. With the last two steps
(5–6), a major change occurs with the introduction of the concept of an I/O module
capable of executing a program. For step 5, the I/O module is often referred
to as an I/O channel. For step 6, the term I/O processor is often used. However,
both terms are on occasion applied to both situations. In what follows, we will use
the term I/O channel.

I/O Channel Architecture


The I/O channel represents an extension of the DMA concept. An I/O
channel has the ability to execute I/O instructions, which gives it complete control
over I/O operations. In a computer system with such devices, the CPU does
not execute I/O instructions. Such instructions are stored in main memory to
be executed by a special-purpose processor in the I/O channel itself. Thus, the
CPU initiates an I/O transfer by instructing the I/O channel to execute a program
in memory. The program will specify the device or devices, the area or
areas of memory for storage, priority, and actions to be taken for certain error
conditions. The I/O channel follows these instructions and controls the data
transfer.

Two types of I/O channels are common, as illustrated in Figure 7.15. A
selector channel controls multiple high-speed devices and, at any one time, is
dedicated to the transfer of data with one of those devices. Thus, the I/O channel
selects one device and effects the data transfer. Each device, or a small set of
devices, is handled by a controller, or I/O module, that is much like the I/O modules
we have been discussing. Thus, the I/O channel serves in place of the CPU in
controlling these I/O controllers. A multiplexor channel can handle I/O with multiple
devices at the same time. For low-speed devices, a byte multiplexor accepts or
transmits characters as fast as possible to multiple devices. For example, the resultant
character stream from three devices with different rates and individual streams
A1A2A3A4 …, B1B2B3B4 …, and C1C2C3C4 … might be A1B1C1A2C2A3B2C3A4,
and so on. For high-speed devices, a block multiplexor interleaves blocks of data
from several devices.
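The character interleaving performed by a byte multiplexor can be imitated with a simple round-robin loop. The C sketch below uses a purely illustrative device abstraction; with devices running at different rates it produces the kind of merged stream (A1 B1 C1 A2 C2 ...) described above.

#include <stddef.h>

struct slow_device {
    int  (*has_char)(void);      /* nonzero when a character is ready */
    char (*get_char)(void);      /* fetch the ready character         */
};

/* Round-robin over the devices, taking one character from each device
   that has data on every pass, until the output buffer is full or no
   device has anything to send. */
size_t byte_multiplex(const struct slow_device *dev, size_t ndev,
                      char *out, size_t max_out)
{
    size_t produced = 0;
    while (produced < max_out) {
        int any = 0;
        for (size_t i = 0; i < ndev && produced < max_out; i++) {
            if (dev[i].has_char()) {
                out[produced++] = dev[i].get_char();
                any = 1;
            }
        }
        if (!any)
            break;               /* no device had data this pass */
    }
    return produced;
}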

Parallel and Serial I/O


The interface to a peripheral from an I/O module must be tailored to the nature
and operation of the peripheral. One major characteristic of the interface is whether
it is serial or parallel (Figure 7.16). In a parallel interface, there are multiple lines
connecting the I/O module and the peripheral, and multiple bits are transferred
simultaneously, just as all of the bits of a word are transferred simultaneously over
the data bus. In a serial interface, there is only one line used to transmit data, and
bits must be transmitted one at a time. A parallel interface has traditionally been
used for higher-speed peripherals, such as tape and disk, while the serial interface
has traditionally been used for printers and terminals. With a new generation of
high-speed serial interfaces, parallel interfaces are becoming much less common.


Point-to-Point and Multipoint Configurations

34
The connection between an I/O module in a computer system and external devices
can be either point-to-point or multipoint. A point-to-point interface provides a
dedicated line between the I/O module and the external device. On small systems
(PCs, workstations), typical point-to-point links include those to the keyboard,
printer, and external modem. A typical example of such an interface is the EIA-232
specification (see [STAL11] for a description).

Of increasing importance are multipoint external interfaces, used to support
external mass storage devices (disk and tape drives) and multimedia devices
(CD-ROMs, video, audio). These multipoint interfaces are in effect external buses,
and they exhibit the same type of logic as the buses discussed in Chapter 3. In this
section, we look at two key examples: Thunderbolt and InfiniBand.

Connection between an I/O module in a computer system and external devices can be either:

point-to-point

multipoint

Point-to-point interface provides a dedicated line between the I/O module and the external device

On small systems (PCs, workstations) typical point-to-point links include those to the keyboard, printer, and external modem

Example is EIA-232 specification

Multipoint external interfaces are used to support external mass storage devices (disk and tape drives) and multimedia devices (CD-ROMs, video, audio)

Are in effect external buses

Thunderbolt
Provides up to 10 Gbps throughput in each direction and up to 10 Watts of power to connected peripherals
A Thunderbolt-compatible peripheral interface is considerably more complex than a simple USB device
Most recent and fastest peripheral connection technology to become available for general-purpose use
Developed by Intel with collaboration from Apple
The technology combines data, video, audio, and power into a single high-speed connection for peripherals such as hard drives, RAID arrays, video-capture boxes, and network interfaces

First generation products are primarily aimed at the professional-consumer market such as audiovisual editors who want to be able to move large volumes of data quickly between storage devices and laptops
Thunderbolt is a standard feature of Apple’s MacBook Pro laptop and iMac desktop computers

+

The most recent, and fastest, peripheral connection technology to become available for
general-purpose use is Thunderbolt, developed by Intel with collaboration from Apple.
One Thunderbolt cable can manage the work previously required of multiple cables.
The technology combines data, video, audio, and power into a single high-speed connection
for peripherals such as hard drives, RAID (Redundant Array of Independent
Disks) arrays, video-capture boxes, and network interfaces. It provides up to 10 Gbps
throughput in each direction and up to 10 Watts of power to connected peripherals.

Although the technology and its associated specifications have stabilized, Thunderbolt-equipped devices have, as of this writing, been slow to reach the marketplace. This is because a Thunderbolt-compatible peripheral interface is considerably more complex than that of a simple USB device. The first generation of Thunderbolt products is aimed primarily at the prosumer (professional-consumer) market, such as audiovisual editors who want to be able to move large volumes of data quickly between storage devices and laptops. As the technology
becomes cheaper, Thunderbolt will find mass consumer uses, such as enabling very
high-speed data backups and editing high-definition photos. Thunderbolt is already a
standard feature of Apple’s MacBook Pro laptop and iMac desktop computers.
35

Computer Configuration with Thunderbolt

+

Figure 7.17 shows a typical computer
configuration that makes use of Thunderbolt. From the point of view of I/O, the
central element in this configuration is the Thunderbolt controller, which is a
high-performance, cross-bar switch. Unlike bus-based I/O architectures, each
Thunderbolt port on a computer is capable of providing the full data transfer rate
of the link in both directions with no sharing of data transmission capacity between
ports or between upstream and downstream directions.

For communication internal to the computer, the Thunderbolt controller
includes one or more DisplayPort protocol adapter ports. DisplayPort is a digital display
interface standard now widely adopted for computer monitors, laptop displays,
and other graphics and video interfaces. The controller also includes a PCI Express
switch with up to four PCI Express protocol adapter ports for internal communication.

The Thunderbolt controller provides access to external devices through one or
more Thunderbolt connectors. Each connector can provide one or two full-
duplex channels, with each channel providing up to 10 Gbps in each direction. The same
connector can be used for electrical or optical cables. The electrical cable can extend
up to 3 meters, while the optical cable can extend into the tens of meters.

Users can connect high-performance peripherals to their PC over a cable,
daisy chaining one after another, up to a total of 7 devices, 1 or 2 of which can be
high-resolution DisplayPort displays (depending on the controller configuration in
the host PC). Because Thunderbolt technology delivers two full-bandwidth channels,
the user can realize high bandwidth not only on the first device attached but on
downstream devices as well.
36

Thunderbolt
Protocol
Layers

+

Figure 7.18 illustrates the
Thunderbolt protocol architecture. The cable and connector layer provides
transmission medium access. This layer specifies the physical and electrical
attributes of the connector port.

The Thunderbolt protocol physical layer is responsible for link maintenance, including hot-plug detection and data encoding to provide highly efficient data transfer. The physical layer has been designed to introduce minimal overhead and provides full-duplex 10 Gbps of usable capacity to the upper layers.

The common transport layer is the key to the operation of Thunderbolt and
what makes it attractive as a high-speed peripheral I/O technology. Some of the
features include:

• A high-performance, low-power, switching architecture.

• A highly efficient, low-overhead packet format with flexible quality of service
(QoS) support that allows multiplexing of bursty PCI Express transactions
with DisplayPort communication on the same link. The transport layer has the
ability to flexibly allocate link bandwidth using priority and bandwidth reservation
mechanisms.

• The use of small packet sizes to achieve low latency.

• The use of credit-based flow control to achieve small buffer sizes (see the sketch after this list).

• A symmetric architecture that supports flexible topologies (star, tree, daisy
chaining, etc.) and enables peer-to-peer communication (via software)
between devices.

• A novel time synchronization protocol that allows all the Thunderbolt products
connected in a domain to synchronize their time within 8ns of each
other.
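
The credit-based flow control mentioned in the list above can be sketched in a few lines: the sender holds a credit count equal to the receiver's free buffer space, spends one credit per packet, and stalls at zero until the receiver returns credits as it drains its buffers. The names below are illustrative and do not describe Thunderbolt's actual link-layer mechanism.

```c
#include <stdbool.h>

/* Sender-side view of one flow-controlled link. */
typedef struct {
    int credits;   /* packets the receiver can still accept */
} link_t;

/* Send is allowed only while credits remain; each packet consumes one,
 * which is what lets the receiver get away with a small buffer.        */
bool try_send_packet(link_t *link)
{
    if (link->credits == 0)
        return false;          /* stall: receiver's buffer is full */
    link->credits--;
    /* ... transmit the packet on the wire ... */
    return true;
}

/* When the receiver frees buffer space, it returns credits to the sender
 * (on a real link this rides along in link-level control traffic).       */
void credits_returned(link_t *link, int n)
{
    link->credits += n;
}
```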

The application layer contains I/O protocols that are mapped onto the transport
layer. Initially, Thunderbolt provides full support for PCIe and DisplayPort
protocols. This function is provided by a protocol adapter, which is responsible for
efficient encapsulation of the mapped protocol information into transport layer
packets. Mapped protocol packets between a source device and a destination device may be routed over a path that crosses multiple Thunderbolt controllers. At the destination device, a protocol adapter re-creates the mapped protocol in a way that is indistinguishable from what was sent by the source device. The advantage of
doing protocol mapping in this way is that Thunderbolt technology–enabled product
devices appear as PCIe or DisplayPort devices to the operating system of the
host computer, thereby enabling the use of standard drivers that are available in
many operating systems today.
37

InfiniBand
Recent I/O specification aimed at the high-end server market
First version was released in early 2001
Standard describes an architecture and specifications for data flow among processors and intelligent I/O devices
Has become a popular interface for storage area networking and other large storage configurations
Enables servers, remote storage, and other network devices to be attached in a central fabric of switches and links
The switch-based architecture can connect up to 64,000 servers, storage systems, and networking devices

+

InfiniBand is a recent I/O specification aimed at the high-end server market. The
first version of the specification was released in early 2001 and has attracted numerous
vendors. The standard describes an architecture and specifications for data flow
among processors and intelligent I/O devices. InfiniBand has become a popular
interface for storage area networking and other large storage configurations. In
essence, InfiniBand enables servers, remote storage, and other network devices to
be attached in a central fabric of switches and links. The switch-based architecture
can connect up to 64,000 servers, storage systems, and networking devices.

Although PCI is a reliable interconnect method
and continues to provide increased speeds, up to 4 Gbps, it is a limited architecture
compared to InfiniBand. With InfiniBand, it is not necessary to have the basic I/O
interface hardware inside the server chassis. With InfiniBand, remote storage,
networking, and connections between servers are accomplished by attaching all
devices to a central fabric of switches and links. Removing I/O from the server
chassis allows greater server density and allows for a more flexible and scalable data
center, as independent nodes may be added as needed.

Unlike PCI, which measures distances from a CPU motherboard in centimeters,
InfiniBand’s channel design enables I/O devices to be placed up to 17 meters
away from the server using copper, up to 300 m using multimode optical fiber, and
up to 10 km with single-mode optical fiber. Transmission rates as high as 30 Gbps can be achieved.
38

InfiniBand Switch Fabric

Figure 7.19 illustrates the InfiniBand architecture. The key elements are as
follows:

• Host channel adapter (HCA): Instead of a number of PCI slots, a typical
server needs a single interface to an HCA that links the server to an InfiniBand
switch. The HCA attaches to the server at a memory controller, which has
access to the system bus and controls traffic between the processor and memory
and between the HCA and memory. The HCA uses direct-memory access
(DMA) to read and write memory.

• Target channel adapter (TCA): A TCA is used to connect storage systems,
routers, and other peripheral devices to an InfiniBand switch.

• InfiniBand switch: A switch provides point-to-point physical connections to a
variety of devices and switches traffic from one link to another. Servers and
devices communicate through their adapters, via the switch. The switch’s
intelligence manages the linkage without interrupting the servers’ operation.

• Links: The link between a switch and a channel adapter, or between two
switches.

• Subnet: A subnet consists of one or more interconnected switches plus the links
that connect other devices to those switches. Figure 7.19 shows a subnet with
a single switch, but more complex subnets are required when a large number
of devices are to be interconnected. Subnets allow administrators to confine
broadcast and multicast transmissions within the subnet.

• Router: Connects InfiniBand subnets, or connects an InfiniBand switch to
a network, such as a local area network, wide area network, or storage area
network.

The channel adapters are intelligent devices that handle all I/O functions without
the need to interrupt the server’s processor. For example, there is a control
protocol by which a switch discovers all TCAs and HCAs in the fabric and assigns
logical addresses to each. This is done without processor involvement.

The InfiniBand switch temporarily opens up channels between the processor
and devices with which it is communicating. The devices do not have to share a
channel’s capacity, as is the case with a bus-based design such as PCI, which requires
that devices arbitrate for access to the processor. Additional devices are added to
the configuration by hooking up each device’s TCA to the switch.
39

InfiniBand Operation
The InfiniBand switch maps traffic from an incoming lane to an outgoing lane to route the data between the desired end points
Each physical link between a switch and an attached interface can support up to 16 logical channels, called virtual lanes
One lane is reserved for fabric management and the other lanes for data transport
A virtual lane is temporarily dedicated to the transfer of data from one end node to another over the InfiniBand fabric

A layered protocol architecture is used, consisting of four layers:
Physical
Link
Network
Transport

+

Each physical link between a switch and an attached interface (HCA or TCA) can support up to 16 logical channels, called virtual lanes. One lane is reserved for fabric management and the other lanes for data
transport. Data are sent in the form of a stream of packets, with each packet
containing some portion of the total data to be transferred, plus addressing and
control information. Thus, a set of communications protocols are used to manage
the transfer of data. A virtual lane is temporarily dedicated to the transfer of data
from one end node to another over the InfiniBand fabric. The InfiniBand switch
maps traffic from an incoming lane to an outgoing lane to route the data between
the desired end points.
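
The lane mapping just described can be pictured as a small forwarding table: for each destination the switch records which output port to use and which virtual lane to continue on, and every arriving packet is looked up against that table. The table layout, the use of a destination identifier, and the function below are illustrative assumptions rather than the actual InfiniBand forwarding structures.

```c
#include <stdint.h>

/* One forwarding entry: packets for this destination leave the switch on
 * 'out_port' and continue on virtual lane 'out_vl' of that link.          */
typedef struct {
    uint16_t dest_id;    /* destination address within the subnet */
    uint8_t  out_port;   /* physical output port of the switch    */
    uint8_t  out_vl;     /* virtual lane (0-15) on that port      */
} route_entry_t;

/* Map an incoming packet's destination to an (output port, virtual lane)
 * pair; returns 0 on success, -1 if no route is known.                    */
int route_packet(const route_entry_t *table, int entries,
                 uint16_t dest_id, uint8_t *out_port, uint8_t *out_vl)
{
    for (int i = 0; i < entries; i++) {
        if (table[i].dest_id == dest_id) {
            *out_port = table[i].out_port;
            *out_vl   = table[i].out_vl;
            return 0;
        }
    }
    return -1;   /* unknown destination: drop or trap to fabric management */
}
```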

Figure 7.20 also indicates that a layered protocol architecture is used, consisting
of four layers:

• Physical: The physical-layer specification defines three link speeds (1X,
4X, and 12X) giving transmission rates of 2.5, 10, and 30 Gbps, respectively
(Table 7.3). The physical layer also defines the physical media, including copper
and optical fiber.

• Link: This layer defines the basic packet structure used to exchange data,
including an addressing scheme that assigns a unique link address to every
device in a subnet. This level includes the logic for setting up virtual lanes and
for switching data through switches from source to destination within a subnet.
The packet structure includes an error-detection code to provide reliability.

• Network: The network layer routes packets between different InfiniBand subnets.

• Transport: The transport layer provides a reliability mechanism for end-to-end transfer of packets across one or more subnets.
40

Table 7.3
InfiniBand Links and Data Throughput Rates

+
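
As a quick check on the figures in Table 7.3, the link widths simply scale a 2.5-Gbps base signal rate, of which roughly 80% is usable by the upper layers; the small computation below reproduces the table's per-direction numbers.

```c
#include <stdio.h>

int main(void)
{
    /* InfiniBand link widths scale a 2.5-Gbps base signal rate; about 80%
     * of the signal rate is usable by the upper layers (Table 7.3).       */
    const double base_gbps = 2.5;
    const int widths[] = { 1, 4, 12 };

    for (int i = 0; i < 3; i++) {
        double signal = widths[i] * base_gbps;   /* 2.5, 10, 30 Gbps */
        double usable = signal * 0.80;           /* 2, 8, 24 Gbps    */
        printf("%2dX: signal %.1f Gbps, usable %.1f Gbps per direction\n",
               widths[i], signal, usable);
    }
    return 0;
}
```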

InfiniBand Communication Protocol Stack

Figure 7.20 indicates the logical structure used to support exchanges over
InfiniBand. To account for the fact that some devices can send data faster than
another destination device can receive it, a pair of queues at both ends of each link
temporarily buffers excess outbound and inbound data. The queues can be located
in the channel adapter or in the attached device’s memory. A separate pair of queues
is used for each virtual lane. The host uses these queues in the following fashion.
The host places a transaction, called a work queue entry (WQE), into either the send or receive queue of the queue pair. The two most important WQEs are SEND
and RECEIVE. For a SEND operation, the WQE specifies a block of data in the
device’s memory space for the hardware to send to the destination. A RECEIVE
WQE specifies where the hardware is to place data received from another device
when that consumer executes a SEND operation. The channel adapter processes
each posted WQE in the proper prioritized order and generates a completion queue
entry (CQE) to indicate the completion status.
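
The queue-pair mechanics just described can be sketched as a pair of simple queues plus a loop standing in for the channel adapter: the host posts SEND and RECEIVE work queue entries, and the adapter consumes them and posts a completion queue entry for each one. All of the type and function names below are invented for illustration; they are not the structures of the actual InfiniBand verbs interface.

```c
#include <stddef.h>

typedef enum { WQE_SEND, WQE_RECEIVE } wqe_op_t;

/* Work queue entry: for SEND, buf is the block of data to transmit;
 * for RECEIVE, buf is where incoming data should be placed.          */
typedef struct {
    wqe_op_t op;
    void    *buf;
    size_t   len;
} wqe_t;

/* Completion queue entry: reports the outcome of one processed WQE. */
typedef struct {
    int status;            /* 0 = success, nonzero = error code */
} cqe_t;

#define QDEPTH 16

/* A queue pair (send queue + receive queue) and its completion queue. */
typedef struct {
    wqe_t send[QDEPTH];        int n_send;
    wqe_t recv[QDEPTH];        int n_recv;
    cqe_t completions[QDEPTH]; int n_comp;
} queue_pair_t;

/* Host side: post a work request on the appropriate queue of the pair. */
int post_wqe(queue_pair_t *qp, wqe_t wqe)
{
    if (wqe.op == WQE_SEND) {
        if (qp->n_send == QDEPTH) return -1;   /* send queue full    */
        qp->send[qp->n_send++] = wqe;
    } else {
        if (qp->n_recv == QDEPTH) return -1;   /* receive queue full */
        qp->recv[qp->n_recv++] = wqe;
    }
    return 0;
}

/* Channel adapter side: drain the send queue in order, transmit each
 * block, and report a completion for every WQE it has processed.      */
void adapter_process_sends(queue_pair_t *qp)
{
    for (int i = 0; i < qp->n_send && qp->n_comp < QDEPTH; i++) {
        /* ... segment qp->send[i].buf into packets and transmit ... */
        qp->completions[qp->n_comp++] = (cqe_t){ .status = 0 };
    }
    qp->n_send = 0;
}
```
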
42

zEnterprise 196
Introduced in 2010
IBM’s latest mainframe computer offering
System is based on the use of the z196 chip
5.2 GHz multi-core chip with four cores
Can have a maximum of 24 processor chips (96 cores)
Has a dedicated I/O subsystem that manages all I/O operations
Of the 96 core processors, up to 4 of these can be dedicated for I/O use, creating 4 channel subsystems (CSS)
Each CSS is made up of the following elements:
System assist processor (SAP)
Hardware system area (HSA)
Logical partitions
Subchannels
Channel path
Channel

+

The zEnterprise 196 is IBM’s latest mainframe computer offering (at the time of
this writing), introduced in 2010. The system is based on the use of the z196 chip,
which is a 5.2-GHz multi-core chip with four cores. The z196 architecture can have a
maximum of 24 processor chips for a total of 96 cores. In this section, we look at the
I/O structure of the zEnterprise 196.

The zEnterprise 196 has a dedicated I/O subsystem that manages all I/O operations,
completely off-loading this processing and memory burden from the main processors.
Of the 96 core processors, up to 4 of these can be dedicated for I/O use, creating 4 channel subsystems
(CSS).
43

Figure 7.21 shows the logical structure of the I/O subsystem.

Each CSS is made up of the following elements:

• System assist processor (SAP): The SAP is a core processor configured for I/O
operation. Its role is to offload I/O operations and manage channels and the
I/O operations queues. It relieves the other processors of all I/O tasks, allowing
them to be dedicated to application logic.

• Hardware system area (HSA): The HSA is a reserved part of the system memory
containing the I/O configuration. It is used by SAPs. A fixed amount of
16 GB is reserved, which is not part of the customer-purchased memory. This
provides for greater configuration flexibility and higher availability by eliminating
planned and preplanned outages.

• Logical partitions: A logical partition is a form of virtual machine, which is, in essence, a logical processor defined at the operating system level. Each CSS supports up to 16 logical partitions.

• Subchannels: A subchannel appears to a program as a logical device and contains the information required to perform an I/O operation. One subchannel exists for each I/O device addressable by the CSS. A subchannel is used by the channel subsystem code running on a partition to pass an I/O request to the channel subsystem. A subchannel is assigned for each device defined to the logical partition. Up to 196k subchannels are supported per CSS.

• Channel path: A channel path is a single interface between a channel subsystem
and one or more control units, via a channel. Commands and data are sent
across a channel path to perform I/O requests. Each CSS can have up to 256
channel paths.

• Channel: Channels are small processors that communicate with the I/O control
units (CUs). They manage the data transfer between memory and the
external devices.

This elaborate structure enables the mainframe to manage a massive number
of I/O devices and communication links. All I/O processing is offloaded from the
application and server processors, enhancing performance. The channel subsystem
processors are somewhat general in configuration, enabling them to manage
a wide variety of I/O duties and to keep up with evolving requirements. The channel
processors are specifically programmed for the I/O control units to which they
interface.

44

I/O System Organization

To explain the I/O system organization, we need to first briefly explain the physical
layout of the zEnterprise 196. Figure 7.22 is a front view of the water-cooled version
of the machine (there is an air-cooled version). The system has the following
characteristics:

• Weight: 2185 kg (4817 lbs)
• Width: 1.534 m (5 ft)
• Depth: 1.375 m (4.5 ft)
• Height: 2.012 m (6.6 ft)

Not exactly a laptop.

The system consists of two large bays, called frames, that house the various
components of the zEnterprise 196. The right-hand A frame includes two large
cages, plus room for cabling and other components. The upper cage is a processor
cage, with four slots to house up to four processor books that are fully interconnected.
Each book contains a multichip module (MCM), memory cards, and I/O
cage connections. Each MCM is a board that houses six multicore chips and two
storage control chips.

The lower cage in the A frame is an I/O cage, which contains I/O hardware,
including multiplexors and channels. The I/O cage is a fixed unit installed by IBM to
the customer specifications at the factory.

The left-hand Z frame contains internal batteries and power supplies, as well as room for one or more support elements, which are used by a system manager for
platform management. The Z frame also contains slots for two or more I/O drawers.
An I/O drawer contains similar components to an I/O cage. The differences are
that the drawer is smaller and easily swapped in and out at the customer site to meet
changing requirements.

45

IBM z196 I/O System Structure

With this background, we now show a typical configuration of the zEnterprise
196 I/O system structure (Figure 7.23). The z196 processor book supports two internal
(i.e., internal to the A and Z frames) I/O infrastructures: InfiniBand for I/O
cages and I/O drawers, and PCI Express (PCIe) for I/O drawers. These channel
controllers are referred to as fanouts.

The InfiniBand connections from the processor book to the I/O cages and I/O
drawers are via a Host Channel Adapter (HCA) fanout, which has InfiniBand links
to InfiniBand multiplexors in the I/O cage or drawer. The InfiniBand multiplexors
are used to interconnect servers, communications infrastructure equipment, storage,
and embedded systems. In addition to interconnecting systems that use InfiniBand, the InfiniBand multiplexor supports other I/O technologies.
ESCON (Enterprise Systems Connection) supports connectivity to disks,
tapes, and printer devices using a proprietary fiber-based technology. Ethernet connections
provide 1-Gbps and 10-Gbps connections to a variety of devices that support
this popular local area network technology. One noteworthy use of Ethernet is
to construct large server farms, particularly to interconnect blade servers with each
other and with other mainframes.

The PCIe connections from the processor book to the I/O drawers are via a
PCIe fanout to PCIe switches. The PCIe switches can connect to a number of I/O
device controllers. Typical examples for the zEnterprise 196 are 1-Gbps and 10-Gbps Ethernet and Fibre Channel.

Each book contains a combination of up to 8 InfiniBand HCA and PCIe
fanouts. Each fanout supports up to 32 connections, for a total maximum of 256
connections per processor book, each connection controlled by a channel processor.
46

Summary
External devices
Keyboard/monitor
Disk drive
I/O modules
Module function
I/O module structure
Programmed I/O
Overview of programmed I/O
I/O commands
I/O instructions
Interrupt-driven I/O
Interrupt processing
Design issues
Intel 82C59A interrupt controller
Intel 82C55A programmable peripheral interface
Direct memory access
Drawbacks of programmed and interrupt-driven I/O
DMA function
Intel 8237A DMA controller
I/O channels and processors
The evolution of the I/O function
Characteristics of I/O channels
The external interface
Types of interfaces
Point-to-point and multipoint configurations
Thunderbolt
InfiniBand
IBM zEnterprise 196 I/O structure

Chapter 7

Input/Output

+

47
Chapter 7 summary.

Figures and tables referenced in this chapter:

Figure 7.1 Generic Model of an I/O Module
Figure 7.2 Block Diagram of an External Device
Figure 7.3 Block Diagram of an I/O Module
Figure 7.4 Three Techniques for Input of a Block of Data: (a) Programmed I/O, (b) Interrupt-driven I/O, (c) Direct memory access
Figure 7.5 Memory-Mapped and Isolated I/O: (a) Memory-mapped I/O, (b) Isolated I/O
Figure 7.6 Simple Interrupt Processing
Figure 7.7 Changes in Memory and Registers for an Interrupt
Figure 7.8 Use of the 82C59A Interrupt Controller
Figure 7.9 The Intel 82C55A Programmable Peripheral Interface: (a) Block diagram, (b) Pin layout
Figure 7.10 Keyboard/Display Interface to 82C55A
Figure 7.11 Typical DMA Block Diagram
Figure 7.12 DMA and Interrupt Breakpoints During an Instruction Cycle
Figure 7.13 Alternative DMA Configurations: (a) Single-bus, detached DMA, (b) Single-bus, integrated DMA-I/O, (c) I/O bus
Figure 7.14 8237 DMA Usage of System Bus
Intel 8237A register bit definitions (Command, Status, Mode, Single Mask, and All Mask registers)

Figure 7.15 I/O Channel Architecture: (a) Selector, (b) Multiplexor
Figure 7.16 Parallel and Serial I/O: (a) Parallel I/O, (b) Serial I/O
Figure 7.17 Example Computer Configuration with Thunderbolt
Figure 7.18 Thunderbolt Protocol Layers
Figure 7.19 InfiniBand Switch Fabric

Table 7.3 InfiniBand Links and Data Throughput Rates

Link      Signal rate (unidirectional)   Usable capacity (80% of signal rate)   Effective data throughput (send + receive)
1-wide    2.5 Gbps                       2 Gbps (250 MBps)                      (250 + 250) MBps
4-wide    10 Gbps                        8 Gbps (1 GBps)                        (1 + 1) GBps
12-wide   30 Gbps                        24 Gbps (3 GBps)                       (3 + 3) GBps

Figure 7.20 InfiniBand Communication Protocol Stack
Figure 7.21 IBM z196 I/O Channel Subsystem Structure
Figure 7.22 IBM z196 I/O Frames — Front View
Figure 7.23 IBM z196 I/O System Structure