Computer Architecture Midterm

profilemgarzona
Chapter3.docx

· The Digital Logic Level

In this Section

· Study the basic building blocks computers are made of (gates).

· Look at a special two-valued algebra (Boolean algebra) used to study these basic components.

· Examine fundamental circuits, obtained by combining gates (including circuits for doing arithmetic).

· Examine how gates can be used to store information (memories).

· Examine the CPU, CPU-peripherals interfaces.

3.1 Gates and Boolean Algebra

Gates

· Digital circuit: one of two values are present:

0-1 volt represents one value (e.g. binary 0),

2-5 volt represents the other value (e.g. binary 1).

· Gates are tiny electronic devices that can compute functions of these 2 valued signals.

· Gates are the basic building blocks of computers, and are made of tran-sistors.

· Transistors are electronic devices that act as very fast binary switches.

· A transistor has 3 connections to the outside world:

the collector,

the base,

the emitter.

· Figure 18 illustrates a transistor, and how transistors can be combined to form basic gates.

· Gates are obtain from combinations of transistors and can be seen as functions of input, as soon as the representation of the digital values is decided (e.g. “high” - Vcc) is logical 1, and “low” - ground is logical 0). In Figure 18 (b), (c) assumes the choice mentioned here.

· The basic gates and their behavior as functions of their inputs are repre-sented in Figure 19.

1

Figure 18: (a) A transistor as an inverter and two transistors combined to form (b) a NAND gate and (c) a NOR gate.

Figure 19: The basic gates and their behavior.

2

Figure 20: (a) The majority function as a truth table and (b) the corresponding circuit.

· Since the AND and OR gates each need 3 transistors, the NAND and NOR gates are used instead in practice.

· Constructive technologies:

bipolar: TTL (Transistor-Transistor Logic), ECL (Emitter-Coupled Logic) [high speed],

MOS (Metal Oxide Semiconductor) [slower but less power hungry]: PMOS, NMOS, CMOS (used in memories).

Boolean algebra

· Boolean algebra is used to describe the circuits that can be built by com-bining gates.

· It “captures the essence” of the logical operations AND, OR, NOT.

· Boolean functions are represented by truth tables.

· The majority function M = f(A; B; C), i.e. the ternary function that returns the value that appears as an input the most times, is represented in Figure 20.

3

· The truth table representation of boolean functions may become cumber-some with the increase in the number of inputs.

· To make things easier, one can use the notation for multiplication (AND) and addition (OR) and overbar for negation.

· The compact representation of the majority function:

· = ABC + ABC + ABC + ABC

· The compact representation records the configuration of inputs for output 1.

· A function of n variables can be described by giving a “sum” of at most 2n “product” terms (DNF).

· This formulation leads to a straightforward way to implement a boolean function using standard gates.

· However, many times there are more efficient ways to implement digital circuits.

Implementation of Boolean Functions

· Some representation conventions: whenever lines cross, no connection is implied unless a heavy dot marks the connection.

· Given a boolean function, its implementation is done in the following way:

1. Write down the truth table for the function.

2. Provide inverters to generate the complement of each input.

3. Draw an AND gate for each term with a 1 in the result column (from the truth table).

4. Wire the AND gates to the appropriate inputs.

5. Feed the output of all the AND gates into an OR gate.

· It is simpler to construct circuits that have fewer, simpler gates:

Replace multi-input gates with two-input gates: A + B + C + D with

(A + B) + (C + D).

Replace various types of gates with only one type (this is possible, see Figure 21, {NAND} and {NOR} are complete sets of boolean connectives).

4

Figure 21: The basic gates represented in terms of NAND, NOR: (a) NOT, (b)

AND and (c) OR.

Figure 22: Two equivalent Boolean functions.

5

Figure 23: Some laws of Boolean algebra.

Circuit equivalence

· Circuit equivalence is exploited to reduce the number of gates and simplify a circuit.

· Start with a Boolean function and apply the laws of Boolean algebra to simplify it, see Figure 22.

· The laws of boolean algebra are illustrated in Figure 23

· Figure 24 shows alternative notations for different gates.

· Figure 25 shows equivalent representations of the implementation of ex-clusive OR (XOR).

Representation conventions and digital devices

· According to the assignment of 0,1 to voltages (e.g. 0V, 5V respectively) in digital devices, distinguish:

positive logic: 0 for 0V, 1 for 5V,

negative logic: 1 for 0V, 0 for 5V.

· Figure 26 shows how the same device can implement different gates with different choices of representation.

6

Figure 24: Some alternative notations for (a) NAND, (b) NOR, (c) AND, (d)

OR.

Figure 25: (a) XOR and some of its equivalent implementations (b), (c), (d).

7

Figure 26: (a) Digital device in (b) positive logic and (c) negative logic.

3.2 Basic Digital Logic Circuits

Integrated Circuits

· Gates are packed on units called Integrated Circuits - ICs or chips.

· ICs: a number of gates on a plaque, pins.

· Classification of ICs:

(Small Scale Integration): 1 to 10 gates,

MSI (Medium Scale Integration): 10 to 100 gates,

LSI (Large Scale Integration): 100 to 100,000 gates.

VLSI (Very Large Scale Integration): > 100,000 gates.

· Figure 27 shows a SSI chip with 4 gates, 14 pins (12 + power and ground connectors) and a notch to ensure correct orientation.

· A gate delay of 1-10 nsec occurs.

· Implementation issues:

A circuit with 5,000,000 NANDs would need 15,000,002 pins x 0.1 inch, i.e. approx 18km long chip.

Solution: design circuits with a high gate/pin ratio.

Combinatorial circuits - Multiplexers

· Combinatorial circuits are circuits with multiple inputs and multiple out-puts, where the outputs are uniquely determined by the current inputs.

· A multiplexer is a circuit with 2n data inputs, n control inputs and 1 output, see Figure 28.

· Multiplexers can be used for parallel-to-serial converters.

· Demultiplexers are the inverse circuits.

· Figure 29 shows the use of a multiplexer to implement the majority func-tion.

8

Figure 27: A SSI chip.

Figure 28: A multiplexer.

9

Figure 29: (a) A MSI multiplexer, (b) used to implement the majority function.

Combinatorial circuits - Decoders

· Decoders have n inputs and 2n outputs - the number formed by the input is used to select (i.e. set to 1) exactly one of the 2n output lines. See Figure 30.

· Example of use: memory addressing.

Combinatorial circuits - Comparators

• A comparator circuit compares two words, see Figure 31.

Combinatorial circuits - Programmable Logic Arrays

· Boolean functions can be written as “sums” of “products”.

· Programmable Logic Arrays (Pleas) (see Figure 32) are circuits that can be used to implement arbitrary boolean functions by providing:

n input lines (2n internally by also providing their negations),

m AND gates each with inputs a subset of the input lines,

a n m matrix of fuses that specify which input goes into the ANDs,

p OR gates that take as input outputs of the m AND gates,

a matrix of m p fuses to specify which AND output goes into the OR input,

· The circuit is programmed by blowing some of the fuses.

10

Figure 30: A 3 to 8 decoder.

Figure 31: A comparator.

11

Figure 32: A 12 input 6 output programmable logic array.

Arithmetic circuits - Shifters

· Shifters have n inputs, n outputs and a control line (controlling the direc-tion of the shift - 0 for left and 1 for right), see Figure 33.

Arithmetic circuits - Adders

· A half adder computes the sum and carry of two bits, see Figure 34.

· A full adder also takes into account the carry bit from the right (carry in), see Figure 35.

· Full bit adders can be put together to form n bit adders.

· This type of adders is known as ripple carry adders.

· Ripple carry adders have disadvantages - delay.

· Improvement - carry select adder:

divide a 2n bit adder into an n bit lower half and an n bit upper half,

12

Figure 33: An 8 bit shifter.

Figure 34: A half adder.

13

Figure 35: A full adder (a) specification and (b) circuit.

duplicate the upper half hardware,

one of the upper halves gets 0 as a right carry, the other one gets 1 as a right carry,

the two upper halves run in parallel,

for the final result, one of the two circuits is selected (according to the result of the carry from the lower half).

100% increase in speed at the cost of 50% more hardware.

Arithmetic Circuits - Arithmetic Logic Units

· Arithmetic Logic Units (ALU) are circuit that perform basic word opera-tions: AND, OR, NOT, sum.

· Figure 36 illustrates a 1 bit ALU. The inputs F0; F1 control the operations to be performed, ENA; ENB enable inputs, INVA forces A.

· n bit ALUs are obtained by combining n 1 bit ALUs (bit slices), see Figure 37.

Clocks

· Clocks (see Figure 38) are circuits that provide synchronization of devices by emitting series of pulses with a precise width and with precise intervals between pulses.

· Clock cycle time: the time interval between the corresponding edges of two consecutive pulses.

· The clock is usually controlled by a crystal oscillator.

14

Figure 36: 1 bit ALU.

Figure 37: 8 bit ALU.

15

Figure 38: Clocks: (a) subdivisions of the clock cycle by inserting a delay, getting 4 references (b) rising edge of C1/ falling edge of C1/rising edge of C2/falling edge of C2 and (c) asymmetric clock - basic clock shifted and ANDed with the original circuit.

3.3 Memory

Latches

· Latches are circuits that “remember” previous input values.

· A SR latch, see Figure 39 has the following:

inputs: S, for setting, R for resetting,

outputs, Q, Q.

· SR latches are not uniquely determined by the current inputs:

S = R = 0 has 2 stable states, depending on the value of Q,

S = 1 has the effect Q = 1 (i.e., switches state 0 from state 1)

R = 1 has the effect Q = 0 ((i.e., switches state 1 from state 0)

In sum, when S is set to 1 momentarily, the latch ends up in state Q=1, regardless of what state it was previously in. Likewise, setting R to 1 forces the latch to state Q=0. the circuit “remembers” whether S or R was last on Using this property, we can build computer memories

Clocked latches

· Adding a clock (enable, strobe) to the SR latch, prevents it from changing the state, except at specified times, see Figure 40.

· SR latches have a problem: S = R = 1 (the latch jumps to one of the stable states at random).

16

Figure 39: A NOR SR latch.

Figure 40: A clocked SR latch.

· Clocked D latches fix the ambiguity of SR latches by avoiding the situation that leads to the ambiguity, see Figure 41.

· Only one input D because the input to the lower AND is always the complement of the input to upper one, the problem of both inputs being 1 never arises.

Flip-flops

· Flip-flops - are circuits that sample the value on a certain line at a par-ticular instance and store it.

· The state transition occurs during the clock transition – from 0 to 1 (the rising edge) or from 1 to 0 (the falling edge), and not on plateau (i.e., when the clock is 1).

Figure 41: A clocked D latch.

17

Figure 42: (a) Pulse generator and (b) pulse signal.

Figure 43: A D flip-flop.

· Flip-flops differ from latches in that they are edge triggered as opposed to level triggered.

· Flip-flops can be obtained by combining a pulse generator (Figure 42) with a latch (Figure 43).

· Figure 44 illustrates the standard representations of latches and flip-flops.

· Latches and flip-flops may have additional inputs (Set, Preset to force Q = 1, Reset, Clear, to force Q = 0)

Registers

• Flip-flops are packed together on ICs to form registers, see Figure 45.

18

Figure 44: Representations of (a), (b) latches and (c), (d) flip-flops:

(c) changes state on the rising edge of the clock pulse (0 to 1 transition)

Figure 45: An 8 bit register.

19

Memory organization

· Memory chips are more complicated than the registers.

· Figure 46 illustrates a memory chip holding 4 3 bit memory words.

· Inputs:

3 data inputs (I0 I2),

2 address inputs (A1; A2),

3 control inputs: CS (chip select), RD(read or write), OE (output enable).

• Outputs: 3 data outputs (O0 O2).

· The chip has 14 pins (as opposed to the 8 bit register before).

· CS selects the memory chip, RD with value 1 (read) or 0 (write).

· Write:

CS=1, RD & OE will be 0, so no output,

CS RD together with A1A2 will enable one of the write gates,

I0I1I2 will be written in the flip-flops corresponding to the selected word.

· Read:

CS RD OE will be 1, so the output is enabled,

CS RD will be 0, so all the write gates are disabled and none of flip-flops is modified.,

A1A2 will select the word whose content is sent to the output.

Noninverting buffers

· In practice, the same lines are used for input and output.

· Noninverting buffers, see Figure 47, allow the interruption of cable, such that the memory chip will not attempt to get input and output at the same time.

· Noninverting buffers are tri-state devices (they can output 0, 1 or nothing at all).

20

Figure 46: A 4 x 3 bit memory.

Figure 47: (a) A noninverting buffer (IN: data, control, OUT: data), (b) the effect of the control being 1, (c) the effect of the control being 0, (d) an inverting buffer.

21

Figure 48: The organization of a 4 Mbit memory chip: (a) 512K x 8, (b) 4096K x 1.

Memory Chips

· The previous design can be easily extended.

· For efficiency reasons, the number of words should be powers of 2.

· Memory chips observe Moore’s law.

· There are various ways in which to organize chips for a given memory size. In Figure 48 we have 4 Mbit chips organized in different ways: 512K x 8, 4096K x 1 (memories tent to be measured in bits, not in bytes).

· Terminology:

asserting - setting a value (positive if the line is set to 1, negative otherwise),

negating - the opposite of asserting (same as disconnecting).

· The 512K x 8 chip:

CS (Chip Select) asserted to select the chip.

WE (Write Enable) asserted to indicate that data is being written.

OE (Output Enable) asserted to drive the output signals.

· The 4096 x 1 chip:

internally, it is a 2048 x 2084 matrix of 1 bit cells, which gives 4 Mbits

22

the chip writes/reads 1 bit at a time,

RAS (Row Address Strobe),

CAS (Column Address Strobe).

· The organization of memories as matrices of n x n reduces the number of pins needed, but addressing is slower, since it has to be done twice, once for rows, once fore columns.

· In case building a memory with a 32-bit word from 4096K x 1 chips requires 32 chips in parallel.

RAMs and ROMs (revisited)

· RAMs (Random Access Memories) are memories that can be read and written.

Static RAMs:

· constructed using D flip-flops,

· keep their contents as long as there is power,

· very fast (popular as level 2 cache memories).

Dynamic RAMs:

· an array of cells, each cell containing one transistor and a tiny capacitor,

· have to be refreshed every few milliseconds,

· high density but slower,

· FPM (Fast Page Mode) - matrix of bits like x 1 chip.

· EDO (Extended Data Output) - improve memory bandwidth.

S(ynchronous)DRAM is a combination of static and dynamic RAM (used in large caches and main memory) (FPM and EDO are asyn-chronous).

· ROMs (Read Only Memories)

cheaper to build (when ordered in large numbers), but the mask may cost,

PROM (Programmable ROM) eliminates the cost of the mask, can be written (programmed) once,

EPROM (Erasable PROM) can be reused,

EEPROM - improvement over PROM - no special device needed to program it.

examples of EEPROM - flash memory.

23

3.4 CPU Chips and Buses

CPU Chips

· All modern CPUs are contained on a single chip.

· All interactions to the outside world is done through the processor’s pins.

· These pins communicate with memory chips and I/O chips through buses.

· Address pins:

the CPU puts a memory address on the address pins to load the word (instruction) from the respective address.

· Data pins:

the memory sends the instruction to the data pins,

· Control pins:

the CPU also needs arguments, so it informs the memory over the control pins that it wants to read data,

the memory tells the CPU over the control lines whether data is available.

bus control: the CPU tells the bus what it wants to read or write memory or do something else,

interrupts: input from the I/O devices to the CPU,

bus arbitration: needed to regulate traffic on the bus (CPU counts as an I/O device in this context),

coprocessor signalling: pins for making and granting requests to/from the coprocessor (such as a floating-point chip, graphics chip, and other chips)

status: provide or accept status information,

miscellaneous: various, e.g. backward compatibility.

· Other pins: power, ground, clock signal.

· Key parameters for CPU performance:

the number of address pins: a chip with m address can address 2m addresses (usual values for m are 16, 20, 32, 64).

the number of data pins: n can transfer an n bit word at once (usual values for n are 8, 16, 32, 64).

· Figure 49 illustrates a typical microprocessor.

24

Figure 49: The logical pinout of a generic microprocessor.

Computer buses

· Buses = electrical pathway between multiple devices.

· Types: Internal buses (ALU-registers), external buses (CPU-memory-I/O devices), see Figure 50.

· Bus protocols are sets of well defined rules about how the bus works, and which all devices connected to the bus must obey.

· Examples: Omnibus (PDP-8), Unibus (PDP-11), Multibus (8086), ISA bus (PC/AT), EISA (80386), PCI bus (many PCs), SCSI (PCs, work-stations), Nubus (Macintosh), USB (modern PCs), FireWire, VME bus (physics lab equipment), Camac bus (high energy physics).

· Some devices attached to the bus are active and can initiate transfers - masters, whereas some of them are passive - slaves.

· Example:

CPU - master, memory - slave: fetching instructions and data,

CPU - master, I/O device - slave: initiating data transfer,

CPU - master, coprocessor - slave: CPU handling instructions to the coprocessor,

I/O - master, memory - slave: DMA.

25

Figure 50: A computer system with multiple buses.

coprocessor - master, CPU - slave: coprocessor fetching operands from the CPU.

· Bus drivers - essentially digital amplifiers for bus masters (signals are not powerful enough to power the bus, usually),

· Bus receivers - chips that connect the slaves to the bus.

· Bus transceivers - for devices that can be both masters and slaves.

· A bus, like a CPU, has address, data and control lines, but there is not necessarily a one-to-one mapping between the CPU pins and the bus sig-nals (decoders are places between the CPU and the bus).

Bus design issues

· The more address lines a bus has, the more memory the CPU can address directly.

· However, more lines make the bus more expensive, a tradeoff may be necessary.

· Example: Figure 51 illustrates the evolution of the address buses for Intel processors, backward compatibility leading to a messy design.

· To increase the data transfer through the buses:

increase the number of data wires (same design problems mentioned above, more bits/transfer),

decrease the bus cycle time (more transfer/sec)

the above two ways has problems 1) bus skew (signals have different speeds on different lines, the faster bus, the more skew), 2) speeding up the bus causes backward compatibility (old boards designed for the slower bus will not work with the new one)

recent trend: transition to serial buses - high speed and simple design.

· Multiplexed buses use the same wires for both data and address signals (slower bus) : at the start of a bus operation, the lines are used for the address, later on they are used for data

26

Figure 51: The evolution of an address bus over time (a) enough for addressing 1 Mb, (b) extended to address 16 Mb, (c) extended further to address 1 Gb.

Bus clocking - synchronous buses

· Synchronous buses has a line driven by a crystal oscillator, all bus activities take an integral number of these bus cycles.

· Figure 52 illustrate the timing of a read operation on a synchronous bus.

Bus clocking - asynchronous buses

· Synchronous buses are geared towards the slowest device in the configuration.

· Asynchronous buses do not have a master clock, but a synchronization signal.

· A full handshake - set of interlocking signals ensure the correctness of operations.

· Figure 53 illustrates a read operation on an asynchronous bus.

· Note that despite the apparent obvious advantage, most of the buses are still synchronous (compatibility, investment, simplicity).

Bus arbitration

· Bus arbitration decides which device gets to use the bus.

· In the case of centralized arbitration, see Figure 54, access is granted by a bus arbiter.

27

*Reading from memory takes 3 bus cycles

*The start of : the rising edge of the clock.

*During the CPU puts the address on the address line

*After the address lines are set to their new value, & are asserted where

indicates that memory is being accessed, is asserted for reads and negated for writes.

*At the start of , the memory asserts the since the memory accessing is not finished yet.

So, there is extra bus cycle (wait states)

*During the first half of , memory puts the data one the data lines.

*At the falling edge of the CPU reads the data lines, storing the value in an internal register.

Figure 52: Timing of a read operation on a synchronous bus.

*When the bus master has asserted the address, it asserts .

*When the slave sees this, it asserts .

*When the master sees asserted, it knows that the data are available, so it stores them and the negates the address lines, along with ,

*When the slave sees the negation of , it know that the cycle has been completed, so it negates , and we are back in the original situation.

Figure 53: A read operation on an asynchronous bus.

28

Figure 54: Centralized bus arbitration: (a) with daisy chaining and (b) multi-level daisy chaining.

Figure 55: Decentralized bus arbitration.

· Daisy chaining

a) When the arbiter sees a bus request, it issues a grant by asserting the bus grant line

b) When the device physically closest to the arbiter sees the grant, it checks to see if it has made a request. If so, it takes over the bus but does not propagate the grant further down the line. If it has not made a request, it propagates the grant to the next device in line, which behave the same way, and so on..

· There could be several access levels of access priorities.

· Decentralized bus arbitration is possible, see Figure 55:

As an example, decentralized daisy chaining (same behavior as the centralized case, but cheaper and easier to implement).

a) Uses 3 bus lines, no matter how many devices are present. 1st line: a wired OR line for requesting the bus, 2nd line (called Busy) : asserted by the current bus master, 3rd line: used to arbitrate the bus, daisy chained through all the devices.

b) When no device wants the bus, the asserted arbitration line is propagated through all devices. To acquire the bus, a device first checks if the bus is idle and IN is asserted. If IN is negated, it does not become the bus master and it negates OUT. However, if IN is asserted, the device negates OUT, which causes its downstream neighbors to negate both IN and its OUT. So now the device becomes the bus master, asserts BUSY and OUT, and begins its transfer.

29

Figure 56: Block operations on buses.

Block operations

· Block read/writes are used in some cases, e.g. when using caching, as illustrated in Figure 56.

a) When a block read is started, the bus master tells the slave how many words are to be transferred, for example, by putting the word count on the data lines during . Instead of just returning one word, the slave outputs one word during each cycle until the count has been exhausted.

b) is asserted to indicate that a block transfer is requested.

Handling interrupts

· When the CPU commands an I/O device to do something, it expects back an interrupt. The interrupt signaling requires the bus.

· Since multiple devices may want to cause an interrupt simultaneously, the same kind of arbitration problems are present solution: assigns priorities to devices and use a centralized arbiter to give priority to the most time-critical devices.

· Interrupt controllers are chips that arbiter the interrupts.

· Figure 57 illustrates the 8259A controller.

· Up to 8 devices can be connected to the 8259A controller.

· More can be connected using multiple controlled cascaded.

· The procedures for handling interrupts are stored in hardware tables - interrupt vectors.

30

Figure 57: The 8259 interrupt controller.

3.5 Example CPUs

31