Computer Architecture Midterm

mgarzona

ReviewMidterm1.zip

Home >Computer Science homework help >Computer Architecture Midterm

Review Midterm/Appendix A & B.pptx

Appendix A & B

Outline

Binary Numbers

Adding Binary Numbers

Negative Integers

Other Operations with Binary Numbers

Floating Point Numbers

Character Representation

Image Representation

Sound Representation

Bits and Bytes

bit: Most basic unit of information in a computer. Two states:

1: on, true

0: off, false

byte: A group of eight bits.

Often abbreviated using a capital B.

word: A contiguous group of bytes.

The precise size of a word is machine-dependent.

Typically refers to the size of an address or integer.

The most common sizes are 32 bits (4 bytes) and 64 bits (8 bytes).

Number Representation

Initially, focus on non-negative integers of infinite length. Later in this unit, look at:

fixed-length integers

negative numbers

floating point numbers (fractional component)

Base-10 Numbers

Our decimal system is a base-10 system: each digit represents a power of 10.

For instance, the decimal number 947 in powers of 10 can be expressed as:

9*10^2 + 4*10^1 + 7*10^0

= 900 + 40 + 7

= 947

Converting Binary into Decimal

Integers are stored in a computer in binary notation where each digit is a bit:

Each bit represents a power of 2.

Also called the base-2 system.

2^0 = 1 2^1 = 2

2^2 = 4 2^3 = 8

2^4 = 16 2^5 = 32

2^6 = 64 2^7 = 128

2^8 = 256 2^9 = 512

2^10 = 1024; 2^11 = 2048

Converting Binary into Decimal Example

Example: Convert 1011012 into base 10.

1*2^5 + 0*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0

= 32 + 0 + 8 + 4 + 0 + 1

= 45

Converting Decimal Numbers into Binary

How do you convert decimal numbers into binary? Two techniques:

Subtraction Method

Division/Remainder Method

The subtraction method is more intuitive than the division / remainder method but requires familiarity with the powers of 2.

Subtraction Method

Idea: Iteratively subtract the largest power of two.

Algorithm to convert decimal number n into binary number b:

Set b = 0.

Find x: largest power of 2 that does not exceed (≤) n.

Mark a 1 in the position represented by x in b.

n = n – x

If n ≠ 0, repeat steps 2-4.

Subtraction Method Example

Example: Use the subtraction method to convert 90 into binary.

X = __ ; b = 0000000; N = 90

X = 2^6 = 64; b = 1000000; N = 90 – 64 = 26

X = 2^4 = 16; b = 1010000; N = 26 -16 = 10

X = 2^3 = 8; b = 1011000; N = 10 -8 = 2

X = 2^1 = 2; b = 1011010; N = 2 -2 = 0

90 = 1011010 in binary

Division / Remainder Method

Idea: Continuously divide by two and record the remainder.

Algorithm to convert decimal number n into binary number b:

Set k = 0, b = 0

Divide n = n / 2 storing the remainder (0 or 1) into r.

Bit 2k of b is set to r.

k = k + 1

If n ≠ 0, repeat steps 2 - 4.

Division / Remainder Method Example

Example: Use the division / remainder method to convert 177 into binary.

N = 177/2 = 88; r = 1; b= 1

N = 88/2 = 44; r = 0; b=01

N = 44/2 = 22; r=0; b=001

N = 22/2 = 11; r =0; b=0001

N = 11/2 = 5; r=1; b=10001

N = 5/2 = 2; r=1; b=110001

N = 2/2 = 1; r =0; b = 0110001

N = 1/2 = 0; r =1; b = 10110001

177 = 10110001 in binary

Hexadecimal Numbers

Problem: Binary numbers can be long and difficult to read.

Hexadecimal (base-16) numbers are often used to represent quantities in a computer.

Often preceded with ‘0x’ such as 0x61A2F.

Since 24 = 16, it is easy to convert a binary number into hexadecimal:

Divide the binary number into groups of four bits.

Translate each four bit group into a hexadecimal digit.

Hexadecimal Number Example

Example: Convert the binary number into hexadecimal.

11010100011011

11010100011011 = 0x351B

11010100011011 = 32433 in Octal

Class Problem

Convert the number 489 into (a) binary and (b) hexadecimal.

Binary: 111101001

Hexadecimal: 0x1E9

Fixed Length Integers

Integers in a computer have finite length.

An unsigned (nonnegative) integer of n bits can represent values of 0 to 2n – 1.

In C++, can find size of a type using sizeof operator. For instance, sizeof(char) = 1.

C++ Integer Sizes

Data type	Size (bytes)	Unsigned range (0 to …)
char	1	255
short	2	65,535
int, long	4	4,294,967,295
long long	8	~ 1.84 * 10^19

Outline

Binary Numbers

Adding Binary Numbers

Negative Integers

Other Operations with Binary Numbers

Floating Point Numbers

Character Representation

Image Representation

Sound Representation

Binary Math Facts

Works in the same way as base-10 addition. Math facts:

0 + 0 = 0 0 + 1 = 1

1 + 0 = 1 1 + 1 = 10

Binary Addition Example

Example: Find the sum of two bytes containing 139 and 46 using binary addition.

10001011

+ 00101110

10111001

139 + 46 = 185

Binary Addition Example

Example: Now find the sum of two bytes containing 214 and 93 using binary addition.

11010110

+ 01011101

100110011

214 + 93 = 307

Outside the range of 1 byte, i.e., 255; overflow

Overflow

Arithmetic overflow occurs when the result of an operation is too large to fit in the provided space.

In many languages (including C++), overflow is undetected.

Responsibility of the programmer to check for and/or avoid overflow conditions.

This is especially problematic since many specifications are given using integers (infinite).

Potential security hazard if integer is used in pointer arithmetic or array references.

Good rule of thumb: Restrict input as soon as enters the program.

Class Problem

Find the sums of these binary numbers. Assume a one-byte limit and indicate if overflow occurs.

10100110 01011100

+ 01101100 + 10001111

100010010 11101011

Overflow in the first sum

Outline

Binary Numbers

Adding Binary Numbers

Negative Integers

Other Operations with Binary Numbers

Floating Point Numbers

Character Representation

Image Representation

Sound Representation

Signed Magnitude Numbers

Simple way of representing negative numbers: Reserve the left most bit to represent the sign.

0 is positive

1 is negative

Example: Represent –43 with one byte.

10101011

Signed Magnitude Numbers

Some issues with signed magnitude numbers:

There are two representations of zero.

negative zero?

Logic for dealing with sign is complicated.

Consider how you would add the numbers 34 + -78?

Two’s Complement Numbers

Two’s complement numbers arrange negative and positive numbers in an ordered number line.

-4	1111 1100
-3	1111 1101
-2	1111 1110
-1	1111 1111
0	0000 0000
1	0000 0001
2	0000 0010
3	0000 0011
4	0000 0100

Two’s Complement Numbers

This creates new endpoints. For one byte the endpoints are:

bottom (most negative): 1000 0000 (-128)

top (most positive): 0111 1111 (127)

In general, if a number has b bits, the end points are:

bottom (most negative): - 2^(b-1)

top (most positive): 2^(b-1) - 1

Two’s Complement Numbers

Why is there one more negative value than positive value?

Zero consumes one of the positive value bit pattern

How do you determine if a value is positive or negative?

Look at the left most bit, the sign bit

1 => Negative

0 => Zero or positive

Negating Two’s Complement Numbers

To express a positive number – the representation is identical to unsigned.

Remember that the range of positive numbers that can be represented is reduced.

To express a negative value – use this algorithm:

Start with the positive representation.

Flip the bits: 01, 10. (bitwise not)

Add 1.

Two’s Complement Negation Example

Example: Express -43 in two’s complement.

Positive 43 = 0010 1011

Flip bits: 1101 0100

Add 1: 1101 0101

Negating Two’s Complement Numbers

How do you convert a negative number to its positive representation? The same way!

Start with the negative representation.

Flip the bits.

Add 1.

Two’s Complement Negation Example

Example: Express –(–43) in two's complement.

Negative 43 = 1101 0101

Flip the bits: 0010 1010

Add 1: 0010 1011

Two’s Complement Addition

Addition is carried out much the same way as unsigned numbers.

No special work for negative numbers

Only change is for overflow detection

Example: Add the numbers 75 and -39.

0100 1011 75

+ 1101 1001 + -39

0010 0100 36

No overflow

Two’s Complement Addition Overflow

Rule for detecting overflow when adding two's complement numbers: When the “carry in” and the “carry out” of the sign bit are different, overflow has occurred.

Example: Add the numbers 107 + 46.

0110 1011 107

+ 0010 1110 + 46

1001 1001 153

Carry in of sign bit = 1

Carry out of sign bit = 0

Therefore an overflow!

Two’s Complement Overflow Cases

Case 1: Adding a positive and a negative number.

The sign bits must be different (1 and 0)

The carry out bit will always be the same as carry in

Hence overflow can never occur

Two’s Complement Overflow Cases

Case 2: Adding two positive numbers.

The sign bits are both zero

The carry out will be zero

If the carry in is one

The result of 7-bit unsigned addition doesn’t fit in 7 bits

Overflow has occured

Two’s Complement Overflow Cases

Case 3: Adding two negative numbers.

The sign bits are both one

Carry out will always be one

If carry in is zero

Overflow has occurred

Class Problem

Find the sums of these two’s complement binary numbers. Assume a one-byte limit and indicate if overflow occurs.

1001 1001 1011 0100 0010 0111

+ 0110 0111 + 1101 1010 + 0101 1001

10000 0000 11000 1110 01000 0000

Overflow in the third case:

Carry in = 1

Carry out = 0

Outline

Binary Numbers

Adding Binary Numbers

Negative Integers

Other Operations with Binary Numbers

Floating Point Numbers

Character Representation

Image Representation

Sound Representation

Numbering Bits

Bits are commonly numbered from right to left starting with 0:

bit 7  0100 1011  bit 0

The rightmost bit (bit 0) is called the least significant bit.

The leftmost bit (bit n–1) is called the most significant bit.

Bits to the right are lower than bits to the left.

Sign Extension

For two’s complement numbers, to convert shorter (fewer bits) to longer numbers:

Starting with bit 0, copy the shorter number bit by bit.

When you are out of bits, replicate the sign bit (most significant bit) to the remaining bit positions.

The process of replicating the sign is called sign extension.

For unsigned numbers, simply place a zero in all new bit positions.

This is called zero extension.

Sign Extension Example

Example: Convert a byte containing -39 to a 16 bit number.

-39 in 8 bits: 1101 1001

-39 in 16 bits: 1111 1111 1101 1001

Other Arithmetic Operations

Subtraction: Simply negate the subtrahend and add.

Multiplication: Convert to positive numbers and determine sign at end.

Use grade-school (long) multiplication.

For multiplying two n-bit numbers, need 2n bits to represent the product.

Division:

Need to be careful about dividing by zero.

All operations (including addition) have faster algorithms that are beyond the scope of the course.

Outline

Binary Numbers

Adding Binary Numbers

Negative Integers

Other Operations with Binary Numbers

Floating Point Numbers

Character Representation

Image Representation

Sound Representation

Floating Point Numbers

To represent non-integer numbers, computers use floating point numbers.

In base-10 speak, such numbers have a decimal point.

Since computers represent everything in binary, we could call it a binary point.

However, the more generic term floating point is more commonly used.

Scientific Notation

Computers use a form of scientific notation for floating-point representation. Numbers written in scientific notation have three components:

-3.83 * 10^4

“-” is the sign

3.83 is the significand

1<= significand < 10 (unless the number is exactly zero)

4 is the exponent

-1.00101 * 2^6 (in binary expression)

“-” is the sign

1.00101 is the significand

1 <= significand < 2 (unless the number is exactly zero)

6 is the exponent

IEEE Floating Point Representation

The one-bit sign field is the sign of the stored value.

0 is positive, 1 is negative

The size of the exponent determines the range of values that can be represented.

The size of the fraction determines the precision of the representation.

Single Precision

Double Precision

IEEE Floating Point Representation

Characteristics of IEEE floating-point numbers.

Converting Decimal to Floating Point

To convert a decimal number into floating point requires three steps:

1. Convert the decimal number into a binary number.

2. Express the floating point number in scientific notation.

3. Fill in the various fields of the floating point number appropriately.

To illustrate this process, we will convert the number 10.625 into a IEEE floating point number.

Converting Decimal to Floating Point

Step 1. Convert the decimal number into a binary number.

Just like a base-10 number with a decimal point, the bits past the floating point represent negative powers of two:

…b3b2b1b0.b-1b-2b-3… =

… + b323 + b222 + b121 + b020 + b-12-1 + b-22-2 + b-32-3 + …

where 2-1 = ½ = 0.5, 2-2 = ¼ = 0.25, 2-3 = ⅛ = 0.125, …

Note: Some numbers such as 0.1 and 1/3 cannot be exactly represented in binary notation regardless of how many bits past the floating point are specified.

Converting Decimal to Floating Point

Example: Convert 10.625 into a binary number.

10.625 = 8 + 2 + 0.5 + 0.125 = 1010.101

Converting Fraction to Floating Point

Begin with the decimal fraction F

Multiply the fraction F by two; F = F*2

Whole number part “W” of the multiplication result in step 2 is the next binary digit to the right of the point

Update F by discarding the whole number part; F = F – W

If F > 0, repeat steps 2, 3, 4

Converting Decimal to Floating Point

Step 2. Express the floating point number in scientific notation.

Recall in base-10 scientific notation, the number to the left of the decimal point must be 1-9 (unless the number is zero). Examples:

857.63 =

0.00007634 =

In binary, the number to the left of the floating point must be a 1 (unless the number is zero). Examples:

110111.01 =

0.011101 =

8.5763 * 10^2

7.634 * 19^(-5)

1.1011101 * 2^5

1.1101 * 2^(-2)

Converting Decimal to Floating Point

Example: Express 10.625 as a binary number in scientific notation.

1010.101 = 1.010101 * 2^3

Converting Decimal to Floating Point

Step 3. Fill in the various fields of the floating point number appropriately.

Sign bit: 0 positive, 1 negative

Exponent: Holds the exponent using a biased notation (more below).

Fraction: Holds the fractional part of the significand in scientific notation.

The ‘1’ before the floating point is implied and not stored.

Cannot represent zero (more later).

Converting Decimal to Floating Point

Exponent is stored using an unusual biased representation. Think of it as a combination of two’s complement and unsigned numbers:

Two’s complement: number line including both negative and positive numbers

Unsigned: lowest number is all zeroes, highest number is all ones.

Both of these properties are true in the biased representation.

Biased Representation

For an 8 bit exponent:

-127	0000 0000	reserved for special numbers
-126	0000 0001
…	…
-2	0111 1101
-1	0111 1110
0	0111 1111
1	1000 0000
2	1000 0001
…	…
127	1111 1110
128	1111 1111	reserved for special numbers

Excess 127 Bias

For 8 bits, this form is called excess 127 bias because the numbers are 127 apart from the two’s complement equivalent.

To convert a two’s complement number to excess 127 bias: Add +127 in its two's complement form (0111 1111) and ignore overflow.

To convert a number from excess 127 bias to two's complement: Add -127 in its two's complement form (1000 0001) and ignore overflow.

Converting Decimal to Floating Point

Example: Convert 10.625 as a IEEE floating point number in both binary and hexadecimal.

10.625 = 1010.101 (binary)

1.010101 * 2^3

Sign bit = 0

Fraction = 010101…0 (23 bits)

Exponent = 3 = 011 (binary) = 1000 0010 (Excess 127)

Therefore the floating point number is

0100 0001 0010 1010 0000 0000 0000 0000

0x412A0000

Class Problem

Convert the IEEE floating point number 0xC2AC8000 into decimal.

1100 0010 1010 1100 1000 0000 0000 0000

Sign: 1

Exponent: 10000101

Fraction: 010 1100 1000 0000 0000 0000

To convert exponent from excess 127 to two’s complement add -127 (1000 0001) to it

Exponent = 6

Fraction has an implied “1” and binary point, therefore:

The number is 1.01011001 * 2^6 = 1010110.01

1010110.01 = 86.25

As sign bit is 1, the decimal number is -86.25

Special Floating Point Numbers

Some bit patterns are reserved for special numbers:

zero	infinity	NaN (not a number)
sign bit can be anything exponent is all zeroes fraction is all zeroes	sign bit: 1(-∞), 0(+∞) exponent is all ones fraction is all zeroes	sign bit can be anything exponent is all ones fraction is anything except all zeroes

Floating Point Approximations and Errors

Since the number of bits is finite, not every real number can be represented.

Many values cannot be represented exactly. This introduces error or imprecision in each floating point value and calculation.

By using a greater number of bits in the fraction, the magnitude of the error is reduced but errors can never totally be eliminated.

Floating Point Terminology

The range of a numeric format is the difference between the largest and smallest values that can express. 32-bit IEEE FP range: 1.2 x 10-38 to 3.4 x 1038

Accuracy refers to how closely a numeric representation approximates a true value.

The precision of a number indicates how much information we have about a value; the number of significant digits.

Overflow occurs when there is no room to store the high-order bits resulting from a calculation.

Underflow occurs when a value is too small to store, possibly resulting in division by zero.

Floating Point Error

It is the programmer’s job to reduce error or be aware of the magnitude of error during calculations.

When testing floating point values for equality to zero or some other number, you need to figure out how close the numbers can be considered equal.

Replace: if (a == b) …

With:

fp_error = a – b;

if (abs(fp_error) < epsilon) …

Floating Point Error

Must be aware that errors can compound through repetitive arithmetic operations:

The order of operations can affect the error.

Associative, commutative or distributive laws may no longer apply.

Best practice: use operands similar in magnitude.

Floating Point Error Example

int main()

{

int i;

float a, b;

a = 0.0;

a = a + 10000000;

for (i = 0; i < 20000000; i++) {

a = a + 0.5;

}

cout << "a = " << a << endl;

b = 0.0;

for (i = 0; i < 20000000; i++) {

b = b + 0.5;

}

b = b + 10000000;

cout << "b = " << b << endl;

return 0;

}

Example: What does this program print out?

Floating Point Addition

Check for any special values: zero, infinity, NaN.

Shift value with smaller exponent right to match larger exponent.

Add two values (or subtract if signs are different).

Normalize the value (in the form 1.bbb) and update exponent.

Check for zero and overflow.

Floating Point Multiplication

Check for any special values: zero, infinity, NaN.

Fill in sign based on sign of the two values; ignore sign for remaining steps.

Multiply fractions.

Multiply exponents (add them together).

Normalize the product (in the form 1.bbb) and update exponent.

Check for overflow and underflow.

Outline

Binary Numbers

Adding Binary Numbers

Negative Integers

Other Operations with Binary Numbers

Floating Point Numbers

Character Representation

Image Representation

Sound Representation

Character Representation

Characters (letters, digits, symbols) are represented using a code where each bit pattern represents a unique character.

Two common formats (virtually all machines use either or both of these formats):

ASCII (American Standard Code for Information Interchange) is a 7-bit code.

Unicode is a 16-bit code.

First 128 bit patterns are same as ASCII.

Includes letters and characters from non-English alphabets.

Includes more symbols (including math).

ASCII Character Set

Source: Andrew Tanenbaum

ASCII Character Set

Source: Andrew Tanenbaum

Outline

Binary Numbers

Adding Binary Numbers

Negative Integers

Other Operations with Binary Numbers

Floating Point Numbers

Character Representation

Image Representation

Sound Representation

Image Representation

Images can be thought of as a 2-dimensional array of pixels.

Each pixel is a small dot within the image that has been assigned a color.

The pixels are small enough that the human eye is unable to detect the boundaries between the different pixels.

A color is commonly represented using an RGB-value. RGB is a color model that produces colors by adding Red, Green, and Blue components.

Commonly, one byte (8 bits) is used for each of the three colors for a total of 24 bits for each pixel.

This model works best for computer monitors and televisions.

RGB Color Model

In the RGB color model, colors go from (0,0,0) to (255,255,255):

(0,0,0) is black

(255, 0, 0) is red

(0, 255, 0) is green

(0, 0, 255) is blue

(0, 255, 255) is cyan

(255, 0, 255) is magenta

(255,255,0) is yellow

(255,255,255) is white

If all three color components are the same value  shade of gray

Source: Wikipedia

Image Representation

With 24 bits – there are 16,777,216 (224) possible colors. However, computer monitors are unable to display 16 million colors.

Color printers use a different color model – CYMK (Cyan, Yellow, Magenta, blacK).

Images are typically stored in a compressed format (such as JPEG). Compression algorithms take advantage of the fact that pixels near one another are often close to the same color.

Remove redundancy

Outline

Binary Numbers

Adding Binary Numbers

Negative Integers

Other Operations with Binary Numbers

Floating Point Numbers

Character Representation

Image Representation

Sound Representation

Sounds, in the physical world, are waves of air pressure. To digitize the sound wave curve, we need to sample the wave periodically, measuring the instantaneous amplitude.

Source: Mark Guzdial, Georgia Tech

-1.1102230246251565E-16 0.3246994692046834 0.61421271268966771 0.83716647826252855 0.96940026593933037 0.99658449300666985 0.91577332665505751 0.7357239106731317 0.47594739303707373 0.16459459028073403 -0.16459459028073378 -0.47594739303707312 -0.735723910 67313125 -0.91577332665505728 -0.99658449300666985 -0.96940026593933049 -0.83716647826252877 -0.61421271268966804 -0.32469946920468379 -2.4492935982947064E-16 -1.1102230246251565E-16 0.3246994692046834 0.61421271268966771 0.83716647826252855 0.96940026593933037 0.99658449300666985 0.91577332665505751 0.7357239106731317 0.47594739303707373 0.16459459028073403 -0.16459459028073378 -0.47594739303707312 -0.73572391067313125 -0.91577332665505728 -0.99658449300666985 -0.96940026593933049 -0.83716647826252877 -0.61421271268966804 -0.32469946920468379 -2.4492935982947064E-16

Sampling

The Nyquist–Shannon sampling theorem states that if the highest frequency of a sound is N Hertz, a sampling rate of at least 2N can perfectly reconstruct the original sound.

The human ear can detect sounds up to 22,000 Hz approximately.

CD quality sound is captured at 44,100 samples per second.

Each sample is commonly encoded using 16 bits (a two's complement number).

Like images, audio formats use compression.

-we need 2N samples per seconds

Sound Representation Example

Example: How much memory is needed to store a 30-second audio file (uncompressed)?

44,100 samples per second

Each sample has 16 bits

Hence for 30 second recording we need:

44100 * 16 * 30 bits

~2.52 MB

Thank You!

Review Midterm/Chapter1_.docx

Chapter 1

1 Computer Architecture - Overview and Moti-vation

1.1 The Structured Organization of Computers

Computers, Programs

· Computer = machine that can solve problems by carrying out instructions given to it.

· Program = a sequence of instructions describing how a certain task is performed.

· Computers are built from electronic circuits. Only very basic instructions can be carried out on such machines:

Figure 1: Moving between language levels.

– add 2 numbers,

– check to see whether a number is 0,

– copy a piece of data from one location to another.

· Machine language = the set of instructions that can be carried out on the electronic circuits that form a computer.

· People find it extremely difficult to understand such a language.

· People must understand the language in order to write programs.

· To bridge the gap between machine language and human (programmer’s) language, a series of abstractions have to be employed, leading to the structured organization of computers.

Translation and interpretation

Figure 1 presents the problem of moving between “machine” and “human” languages, and its solution.

Comparison: translation and interpretation

· Similar:

– instructions from L1 are ultimately carried out by executing equiva-lent sequences in L0.

· Different:

– Translation:

· a L1 program is converted into a L0 program;

· the L1 program is thrown away;

· the L0 program is loaded into the memory and executed;

· the new L0 program has control of the execution.

– Interpretation:

· after each L1 instruction is analyzed and decoded, it is carried out immediately (no translated program);

· the interpretor is in control of the program, the initial L1 pro-gram is just data.

Virtual machines

· Instead of thinking in terms of interpretation and translation, imagine a hypothetical computer, a virtual machine M1, able to “speak” the lan-guage L1.

· If the machine M1 can be constructed easily, there is no need for L0 (and the corresponding machine M0).

· If M1 is too difficult to construct, one can still write programs in L1 (for M1), and these can either be translated or interpreted into L0 (and run on M0).

· However, in order to make the translation/interpretation from/of L1 to/into L0, these language should not be “too different”. The initial picture (“hu-man” - “machine”) was too optimistic.

· Obvious solution: introduce intermediary levels (and virtual machines), so that moving between levels becomes easier. This leads to a hierarchy of language layers.

Multilevel machines

Figure 2 presents the structure of a multilevel machine.

Languages, virtual machines

· A virtual machine defines its machine language (as the set of all instruc-tions which can be executed on that machine).

· A language defines its machine (as the machine that can execute all pro-grams written in the respective language).

· Machines based on arbitrary languages can be arbitrarily hard to build (complicated, expensive).

· Example: C++, Cobol machines can be constructed with the technology today, but would not be cost effectives.

· A computer with n levels can be seen as n different machines, each with its own language.

· The terms “level” and “virtual machine” can be used interchangeably.

Figure 2: A multilevel machine.

Modern multilevel machines.

· Contemporary multilevel machines have two or more levels. Figure 3 presents a machine with 6 levels

The digital logic level

· Gates:

– built from analog devices, can be accurately modeled as digital de-vices,

– each gate has one or more digital inputs (i.e. 0,1) and generate simple digital outputs,

– each gate can be built out of a handful of transistors,

– a small number of gates can be combined to form 1 bit memories (stores for one of two values: 0,1).

· Registers:

– combinations of typically 16, 32, 64 1 bit memories,

– each can hold a simple binary number up to some maximum.

· Gates can also be combined to form the main computing engine.

Figure 3: A multilevel machine with 6 levels. The way of moving between levels (translation/interpretation) is indicated along with the name of the program to do this.

The microarchitecture level

· A collection of (typically) 8 to 32 registers that form a local memory.

· A circuit – arithmetic logical unit (ALU) – capable of carrying out simple operations.

· Data path that connects the registers to the ALU, ensuring the data flow.

Example: select 2 registers, pass the content to the ALU where the values are added, then store the result in some other register.

· On some machines, the operations in the data path is controlled by a microprogram, in others this is done directly in hardware.

The instruction set architecture (ISA) level

· This is what is usually described in the “Machine language reference man-ual”.

· It contains the machine’s instruction set.

· These instructions are carried out interpretatively by the microprogram or by the hardware execution circuits.

The operating system machine level

· Most instructions in the language of this level can be found also on the ISA level.

· However, in addition:

– there is a new set of instructions,

– the memory is organized in a different way,

– it is possible to run two or more programs in the same time.

· The new instructions are interpreted by an interpretor at the ISA level (historically called the operating system).

· The instructions that are identical to those at the ISA level are interpreted by the microprogram or the hardware (at the microarchitecture level).

· Note that the operating system machine level is a hybrid level (instructions interpreted at 2 levels).

Some considerations regarding the levels

· The levels discussed so far:

– are intended primarily for running interpretors and translators to support higher levels;

– these are written by system programmers.

· The remainding levels are for application programmers.

· Other differences:

– method of language support:

levels 1-2-3: interpretation,

levels 4-5: (mostly) translation;

– nature of the language:

levels 1-2-3: numeric (hard to read by humans),

levels 4-5: words and abbreviations.

The assembly language level

· The language presents an abbreviation for the languages at levels 1-2-3.

· Programs in assembly languages are first translated, then interpreted by the appropriate machine for which instructions were intended.

· The program that performs the translation is called the assembler.

The problem oriented language level

· Contains the languages for application programmers (high level languages).

· There are hundreds of such languages.

Basic, C, C++, Java, LISP, Prolog, Mathematica, Python, : : : (add your favorite).

· Generally, the programs written in one of the high level languages are (can be) translated into level 3 or level 4 by translators called compilers.

Hardware, software - some considerations

· Initially:

HARDWARE: the electronic circuits (level 0) along with the memory and the input-output devices.

SOFTWARE: algorithms (detailed instructions telling how to do some-thing) and their computer representations (programs).

· However, in time, the distinction between hardware and software blurred:

“Hardware and software are logically equivalent”.

“Hardware is petrified software”.

– Any operation performed by software can be built into the hardware.

– Any hardware operation can be simulated by software.

· The above points will be illustrated by taking a look at the development of multilevel machines.

Invention of microprogramming

· In the 1940s computers had two levels: digital logic and instruction set architecture:

– they were complicated to build,

– difficult to understand,

– unreliable.

· 1951, Maurice Wilkes (University of Cambridge) had the idea to introduce an additional level (microarchitecture):

– this meant simplified hardware,

– a built-in interpreter at the microarchitecture level was interpreting programs at the ISA level.

Invention of the operating system

· Even with microprogramming, working with computers was tedious:

– programs were stored on punch cards (e.g. about 80 for a program),

– the user would have to load the interpreter/translator,

– then load their own program and data.

· Work time had to be booked in advance – computers were complicated machines that had to be operated directly by the programmer.

· Around 1960 the idea of having a program (the operating system) in the computer at all times emerged:

– it led to a new virtual machine that got more and more complicated over the years,

– some of its instructions were similar to those of the ISA level,

– others (in particular I/O) were different (operating system macros, supervisor calls).

Migration of functionality to microcode

· In the 1970s microprogramming was widespread.

· It was common practice to add more and more instructions to the set of machine instructions.

· This led to an explosion of machine instruction sets.

· Examples of instructions added:

– integer multiplication and division,

– floating-point arithmetic,

– calling and returning from procedures,

– speeding up loops,

– handling character strings.

· Examples of added features:

– computations involving arrays,

– memory relocation facilities,

– interrupt system,

– process switching.

· Adding many instructions (not always needed, not always executed) meant that microprograms grew slower, bulkier.

Elimination of microprogramming

· The solution to the problems caused by the large instruction sets devel-oped in the golden years of microprogramming was to reduce vastly the instruction set and have the instructions executed directly by hardware, rather than by a microprogram.

· There are two trends:

– CISC (Complex Instruction Set Computers),

– RISC (Reduced Instruction Set Computers),

– more details discussed later : : :

To summarize...

The evolution of multilevel machines shows how, depending on various fac-tors, the border between hardware and software moves back and forth.

1.2 Milestones in Computer Architecture

Generations of computers

· Generation zero: mechanical computers (1623(?)–1945).

· The first generation: vacuum tubes (1945–1955).

· The second generation: transistors (1955–1965).

· The third generation: integrated circuits (1965–1980).

· The forth generation: very large scale integration (1980 – ?).

· The fifth generation: Japan’s fifth generation computer project (1982 – 1992).

· The next generation?

Generation zero: mechanical computers

· Wilhelm Schichard (1623) - mechanical device able to do addition, multi-plication, division. http://en.wikipedia.org/wiki/Wilhelm_Schickard

· Blaise Pascal (1623–1662) - mechanical machine able to do addition, sub-straction. http://en.wikipedia.org/wiki/Pascal’s_calculator

· Gottfired Wilhelm “Calculemus!” von Leibniz (1646 –1716) - mechanical

machine that could multiply and divide. http://en.wikipedia.org/wiki/Stepped_Reckoner

· Charles Babbage (1792 – 1871):

– difference engine:

· designed to computer tables of numbers for navigation,

· one algorithm: the method of finite differences using polynomi-als,

· output: punched on a copper plate (CD ROM principle).

· http://en.wikipedia.org/wiki/Difference_engine

– analytical engine (1834)

· 17000 sterling pounds from the government + a large part of family fortune,

· machine consisting of a store (memory) of 1000 words, each of 50 decimals, the mill (computational unit – addition, substraction, multiplication, division), input section (punched cards), output section (punched cards, printed).

· http://en.wikipedia.org/wiki/Analytical_engine

· general purpose device,

· programmable (in a simple assembly language)

· Lady Ada Augusta Lovelace (daughter of Lord Byron) – the world’s first programmer,

· however, the analytical engine was difficult to implement, with many hardware bugs (thousands of components, entirely me-chanical).

–

· Konrad Zuse (late 1930’s, Germany)

– builds a series of automatic calculating machines using electromag-netic relays,

– was not granted government funding (other priorities),

– the machine was destroyed by Allied bombing of Berlin (1944), no influence on subsequent development of computers,

– http://en.wikipedia.org/wiki/Konrad_Zuse

· John Atanasoff - Iowa State College (late 1930’s)

– tries to build a machine for binary arithmetic,

– using capacitors for memory (similar to RAM chips),

https://en.wikipedia.org/wiki/Capacitor#/media/File:Capacitors_(7189597135).jpg

– his vision was not supported by the technology, and the machine was never completed,

– http://en.wikipedia.org/wiki/John_Vincent_Atanasoff

· George Stibbitz - Bell Labs (late 1930’s): implements a calculator simpler

than Atanasoff’s, but which worked. ( http://en.wikipedia.org/wiki/George_Stibitz )

· Howard Aiken

– was completing a PhD at Harvard,

– he discovered Babbage’s work and set out to implement his analytical engine using relays instead of toothed wheels,

– Mark I, Harvard 1944 had 72 words of 23 decimal digits each, an instruction time of 6 sec, I/O on punched paper.

– http://en.wikipedia.org/wiki/Harvard_Mark_I

Vacuum tubes (1945-1955) : https://en.wikipedia.org/wiki/Vacuum_tube

· The development of these machines was driven by World War II.

· Colossus (1943) - Alan Turing, others:

– built to perform the huge computations needed to break the German Enigma code,

– secret for 30 years, no influence on the further development of com-puters,

– http://en.wikipedia.org/wiki/Colossus_computer

· In 1943 John Mauchly is awarded a government grant to build a machine to compute artillery tables .

– together with J. Presper Eckert: ENIAC (Electronic Numerical In-tegrator and Computer), 1946.

– 18000 vacuum tubes, 5000 relays, 30 tons, 20 registers each holding 10 digit decimal numbers, programmed by 6000 switches and connecting sockets,

– presented at a summer school, leading to an explosion of interest in computers.

– spinoffs: EDSAC (University of Cambridge, Maurice Wilkes), JOH-NIAC (Rand Corporation), ILLIAC (University of Illinois), MANIAC (Los Alamos Labs), WEIZAC (Weizmann Institute, Israel).

· John von Neumann - Princeton Institute of Advanced studies:

– notices that programming with switches is tedious,

– programs can be stored along with the data in the memory,

– binary arithmetic is better than decimal.

· The design known as the von Neumann machine is the basis of nearly all digital computers (up to this day) see Figure 4

· http://en.wikipedia.org/wiki/Von_Neumann_architecture

Figure 4: The von Neumann design.

· The design was implemented as the IAS machine:

– memory of 4096 words, each of 40 bits (holding either 2 20 bit instructions or 40-bit signed integer, 8 bit instruction type, 12 bit memory address).

– ALU with a 40 bit register - the accumulator,

– typical instructions: add a word from memory to the content of the accumulator, write the content of the accumulator into the memory.

· Whirlwind I, a machine designed for real-time control (as opposed to heavy number crunching like IAS, ENIAC).

16 bit words,

– it lead to the invention of the magnetic core memory (Jay Forrester).

· IBM - initially a producer of punchers and card sorting machines,

started showing interest in computing machines.

– 1953 IBM 701, 2084 36 bit words,

– 1957 IBM 704 4k core memory, 36 bit instructions, floating point hardware,

– IBM 709, an upgrade of the 704.

Transistors (1955-1965)

• Transistors:

Figure 5: The PDP-8 omnibus.

– 1948, AT&T Bell Labs, John Bardeen, Walter Brattain, William Shockley.

– 1965 Nobel Prize for Physics,

· First transistor computers developed at MIT:

– TX-0 (Transistorized eXperimental computer 0) 1956

– TX-2, 1958

· From MIT, a number of team members go on to establish Digital Equip-ment corporation:

– 1961: PDP-1:

· 4 kb of memory, 18 bit words, $ 120,000.

· 512 x 512 points screen,

· Spacewar - one of the earliest games (MIT students).

· http://en.wikipedia.org/wiki/PDP-1

· Note that at the same time, IBM’s 7090, dedicated to scientific computing was selling for millions of $.

– 1965: PDP-8

· at $ 16,000, with 50,000 sold

· important innovation: omnibus connecting the CPU, memory, I/O devices.

· http://en.wikipedia.org/wiki/PDP-8

The schematic representation of the PDP-8 omnibus is illustrated in Figure 5.

· In this time, IBM followed 2 directions:

– Expensive models, dedicated to scientific computing: 7090, 7094, see http://en.wikipedia.org/wiki/IBM_7090

– Cheaper models, dedicated to business computing: 1401,

see http://en.wikipedia.org/wiki/IBM_1401

· Control Data Corporation:

– 1964: CDC 6600

· CPU able to perform parallel operations twice as fast as 7094,

· small computers inside used to perform I/O operations, sort of like “Snow White and the Seven Vertically Challenged People”. CPU could spend all its time crunching numbers, leaving all the details of job management and I/O to the smaller computers

· see http://en.wikipedia.org/wiki/CDC_6600

– 6600 and its successors 7600, Cray-1 (designed by Seymour Cray) were dedicated to complex computations (supercomputers).

· Burroughs B5000 - built to run Algol 60,

see http://en.wikipedia.org/wiki/Burroughs_large_systems

Integrated Circuits (1965-1980) : https://en.wikipedia.org/wiki/Integrated_circuit

· 1958 - the silicon integrated circuit (Robert Noyce,. co-founder of Intel): dozens of transistors can be put on a single chip, leading to smaller, faster computers.

· 1964 - IBM System/360 series:

– both commercial and scientific applications,

– replace the two separate strands of system design at IBM,

– first commercial computers with microprogramming (emulation of both 7094 and 1401),

– multiprogramming (many programs running in the same time on a processor),

– huge address space (224 bytes - 16 MB).

– http://en.wikipedia.org/wiki/IBM_360

· DEC PDP-11 - popular at universities, http://en.wikipedia.org/wiki/PDP_11

Very Large Scale Integration - VLSI (1980-?)

· In excess of tens of thousands of transistors on a single chip, making computers faster and cheaper (“the personal computer era”).

· Originally, personal computer chips were available. The kits consists of:

– printed circuit board,

– CPUs (including Intel 8080),

– cables, power supply,

– 8 inch floppy disk,

– no software (write your own, later CP/M on a floppy).

· Early PC’s

– IBM PC (1981 - public documentation for $ 49 leading to a PC clone industry),

– Apple, Apple II (Steve Jobs, Steve Wozniak),

– Amiga, Atari.

· Mid 1980’s - RISC processors (replacing CISC processors).

· Mid 1990’s - superscalar CPUs.

The Fifth Generation

· Japan’s fifth generation computers (1982-1992),

– http://en.wikipedia.org/wiki/Fifth_generation_computer

– The aim was to create an “epoch making” computer, with supercom-puter like performance and usable artificial intelligence capabilities,

– The machine was to be built on top of massive databases, using a logic programming language (Prolog, extended) to access data in the range of 100 M - 1 G Logical Inferences per Second (LIPS).

– The project failed, as fifth generation computers were made obsolete by the development of the CPU beyond the “obvious limitations” that motivated the project, the internet and the development of graphical user interfaces (GUI).

– Although the project failed in its goals, it lead to a strong develop-ment of research networks, relations, etc.

· Ubiquitous computing/Low power/invisible computers (alternative fifth generation): computing integrated in everyday objects and activities.

Figure 6 gives a summary of important milestones in the development of the digital computer.

The Next Generation?

· 3D circuit design (cubes of transistors instead of chips).

· Optical computing (replace electronics by optics).

· Molecular computing (chemical-biological processes for computing):

– experiments were made of 4bit computation in test tubes,

– DNA computing,

– new computing models - p-systems (IeAT Timisoara).

· Quantum computing, with special applications (quantum cryptography).

Figure 6: Milestones in the development of the digital computer.

1.3 The Computer Zoo

Technological and Economic Forces Driving Computer Development

· The primary force is the ability of chip manufacturers to pack more and more transistors per chip every year:

– Moore’s Law (Gordon Moore, co-founder and chairman of Intel, 1965):

“The complexity for minimum component costs has increased at a rate of roughly a factor of two per year ... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years.”(”Cramming more components onto integrated cir-cuits”, Electronics Magazine 19 April 1965).

– today’s formulation: the number of transistors doubles every 18 months (and it was correct so far).

– the estimation is that it will hold until around 2020 (when physical limits will be reached).

– Moore’s law drives a virtuous circle: better technology ! smaller prices ! more products ! more companies ! better technology.

– Figure 7 illustrates the fact that Moore’s law still holds today.

Figure 7: A representation of Moore’s law (source: commons.wikimedia.org).

– Nathan’s first law of software (Nathan Myhrvold, former top execu-tive, Microsoft):

“software is as gas - it expands to fill the container holding it”.

· Not all computer technologies develop as fast: storage capacity increases only about 50% per year.

· Communications, on the other hand saw spectacular development (high speed Internet).

The computer spectrum

· Several ways of using Moore’s law:

– Increase computer power at constant price.

– Build the same computer for less and less.

– Shrink the size of the hardware for constant power.

· Categories of computers:

Type	Cost ($)	Example application
Disposable computer	0.5	Greeting cards
Microcontroller	5	Watches, cars, appliances
Game console	50-200	Home/portable video games
Personal computer	500	Desktop or notebook
Server	5K	Network server
Collection of workstations	50-500K	Departmental minisupercomputer
Mainframe	5M	Batch data processing in a bank

Disposable computers

· To the low end, greeting cards, playing cheerful tunes.

· Possibly the most important development: RFID (Radio Frequency IDen-tification) chips:

– smaller than 0.5 mm on edge,

contain a tiny radio transponder and a built-in unique 128 bit number.

– Applications:

· labelling products in stores

· animal ID (pets, farm animals, etc.),

· vehicle tracking (controversial),

· airline luggage,

· cash marking (possible RFID in Euro notes),

· http://www.rfid.org

– Critical design issues: Price, energy, application-specific performance.

Microcontrollers

· Microcontrollers = computers embedded in devices that are not sold as computers.

· They manage the devices and/or user interfaces.

· Found in a large variety of devices:

– Appliances (clock radio, dishwasher, microwave, alarm).

– Communication gear (cordless phones, cell phones, pagers).

– Computer peripherals (printer, scanner, modem, CD ROM drive).

– Entertainment devices (DVD, stereo, mp3 player).

– Imaging devices (TV, digital cameras, photocopier).

– Medical devices (X ray, MRI, digital thermometer).

– Military weapon systems (missiles, etc.).

– Shopping devices (vending machines, ATM).

– Toys (talking dolls, radio-controlled cars, etc.)

· Critical design issues: Price, energy, application-specific performance.

· Example: a car could contain up to 50 microcontrollers (ABS, fuel injec-tion, radio, GPS, etc), a plane more than 200.

· Compared to RFIDs, microcontrollers are complete computers (processor, memory, I/O capabilities).

· Types:

– general purpose - complete computers,

– special purpose (architecture geared to some application, e.g. multi-media) : has instruction set tuned to a specific application.

· General purpose microcontrollers differ from ordinary computers:

– extremely cost sensitive (in large number they may cost as little as $ 0.01 per unit),

– operate in real time (get a stimulus and react to it instantly),

– physical constraints (designed with size restrictions in mind)

Game consoles

· Game consoles - normal computers with special graphics and sound capa-bilities, but limited software and little extensibility.

· Started out as low-end machines (ex. action game like ping pong on TV) but evolved into far more powerful systems (sometimes outperforming personal computers in certain dimensions).

· Examples (last generation): Microsoft XBOX 360, Sony Playstation 3, PSP, Nintendo Wii.

· Other characteristics:

– Customized hardware: special processors, I/O devices.

– Low cost - price supported by manufacturers (consoles sold at a loss, profits from games).

· See http://en.wikipedia.org/wiki/Game_consoles

· Critical system design issues: Price-performance, energy, graphics performance.

Personal computers

· Personal computer is what people think in general when hearing “com-puter”

· Include desktop and notebook models.

· Some people call “PC” for Intel CPU machines, “workstations” for high-end RISC CPU machines (see also “Apple vs. PC”) but they are essentially the same, conceptually.

· Other closely related machines: PDAs.

· Critical system design issues: Price-performance, energy, graphics performance.

Servers

· Servers are beefed up workstations, used for local networks or the Internet.

· Single processor and multiprocessor configurations.

· Architecturally not that different from personal computers, but typically much larger memories, storage, network bandwidth.

· Critical system design issues: Throughput, availability, scalability, energy.

Collections of workstations

· COW - Clusters of Workstations - multiple workstations collected to-gether.

· Special software allows them to work on a single problem.

· The clusters are called many time COTS (Commodity Off The Shelf) machines linked by high-speed networking hardware.

· These scale easily.

· One typical application: Internet Web server - server farms (hundreds, thousands of servers).

· Critical system design issues: Price-performance, throughput, energy pro-portionality.

Mainframes

· In many cases, mainframes are descendants of the IBM 360.

· They are not much faster than a powerful sever, but may have higher I/O capacity, higher storage.

· Very expensive, but many companies find it cheaper to pay once in a while for a mainframe, than update software.

· This has led to the Y2K problem (originating in the 1960-70s when COBOL programmers used only 2 digits for years).

· Andrew Tanenbaum predicts “the end of civilization as we know it at midnight on Dec. 31, 9999, when 8000 years worth of COBOL programs crash simultaneously.”

· Critical system design issues: Price-performance, application-specific performance, throughput, energy proportionality.

1.4 Computer Families

Intel x86 -*

· 1968 - Robert Noyce, Gordon Moore, Arthur Rock form Intel Corporation (and sell $ 3000 worth of chips).

· http://en.wikipedia.org/wiki/Intel_x86

· Some notable milestones in the Intel (and related) processor development:

Figure 8: Key members of the Intel family.

Chip	Date	Memory	Notable features
4004	1971	640		First microprocessor on a chip.
8008	1971	16 KB	First 8 bit microprocessor.
8080	1974	64 KB	First general-purpose CPU on a chip.
8086	1978	1	MB		First 16 bit CPU on a chip.
8088	1979	1	MB		Used in IBM PC.
80286	1982	16 MB	Memory protection.
80386, AMD am386	1985	4	GB		First 32 bit CPU.
80486	1989	4	GB		Built-in 8 KB cache memory, RISC like pipelining, in
Pentium (Pentium MMX)	1993	4	GB		Superscalar, MMX.
Cyrix 6x86	1996	4GB		Register renaming, speculative execution.
Pentium Pro, AMD K5	1995	4GB		Microoperation translation, 2 level cache (only Pentiu
Athlon	1999	4GB		Superscalar FPU design.
AMD K6-2/3, Pentium II	1997	4	BG		Extra MMX instructions.
Pentium 4	2000	4	GB		Deeply pipelined, high frequency, SSE2, hyperthreadi
Pentium M	2003	4	GB		Optimized for low power.
Athlon 64/Opteron	2003	up to	1 TB	64 bit instruction set, on-die memory controller, hype
Core 2	2006	up to	1 TB	low power, multi-core, lower clock frequencies, SSE4.
AMD Phenom	2007	up to	1 TB	Monolithic quad-core, 128 bit FPU, SSE4a, modular
Intel Atom	2008	4	GB/up to 1TB	Low power (netbook, nettop).
Core i7	2008	up to 1TB	Quad core.
AMD Bobcat	2011	as above	on-die GPU, low power.
Intel Sandy Bridge/Ivy bridge	2011-2012	as above	highly modular, on-die GPU.
Haswell/Skylake	2013-2016	as above (but more and better)	as above (but more and better)

Figure 8 gives another summary of the most important members of the Intel family.

UltraSPARC

· 1970’s Unix could only run on minicomputers such as PDP 11;

· 1981 Andy Bechtolsheim SUN 1 (Stanford University Network) a personal computer that would run Unix (Sun Microsystems 1982);

– these workstations (Sun 1, 2, 3) used Motorola CPUs;

– the Sun workstations were more powerful than the PCs of the day and came equipped with an Ethernet connection and TCP/IP software;

· 1987 Sun decides to design its own CPU, based on the revolutionary new design from the University of California at Berkeley (RISC II):

Enters SPARC (Scalable Processor ARChitecture) the basis of Sun 4;

· Various manufacturers produced their own implementation of the open architecture:

– MicroSPARC,

– HyperSPARC,

– SuperSPARC,

– TurboSPARC,

each of the above being compatible with the architecture (running the same programs);

· SPARC International Consortium manages the development of the SPARC architecture;

· The initial SPARC CPU: 32 bit machine 36 MHz, 1986 Sun/Fujitsu;

· 1995 UltraSPARC I: 64 bit architecture, 64 bit address space, 64 bit reg-isters, but still able to run SPARC 32 bit instructions:

– implementing the SPARC V9 specifications,

– instructions designed to handle images, audio – VIS (Visual Instruc-tion Set);

· 1997 UltraSPARC II;

· 2001 UltraSPARC III;

· 2003 UltraSPARC IV;

· 2006 UltraSPARC T1;

· 2007 (late) UltraSPARC T2;

· 2010 (late) SPARC T3.

· 2011 SPARC T4 (Oracle).

· 2013 SPARC T5 (Oracle).

· 2015 SPARC64 XIfx (Fujitsu), SPARCM7 (Oracle).

ARM

· Advanced RISC design, initially Acorn Computer RISC design, later Ad-vanced RISC Machine.

· Timeline:

– 1985: ARM2 (popular in schools UK, Ireland, Australia, New Zealand),

– 1993: ARM 610 (Apple Newton),

– mid 90’s: strongArm (ultrafast - 233MHz, ultra low power - 1W),

– 1994: ARM7 (still used today)

– 2011: 64 bit ARM architecture.

· ARM does not manufacture processors, but licenses them.

· ARM is the most widely used processor (over 50 billion processors by 2015, growing).

· Typical applications: hand-held devices, home appliances, etc.

· Some characteristics: RISC design, low power, hardware support for Java bytecode.

Java

· Sun Microsystems: mid 1990’s designs Java (inspired by C++), aiming at crossplatform compatibility;

· JVM (Java Virtual Machine): memory of 32 bit words, 226 instructions;

· Java programs are compiled for JVM;

· To run compiled Java code on a machine, one needs an interpreter (usually written in C i.e. available virtually everywhere, but generally slow);

· JIT (Just in Time) compilers:

– JVM to machine compilers, usually found in web browsers,

– using a JIT solution may cause delays in execution;

· picoJava II (1998)

– a chip architecture that implements JVM,

– designed for embedded devices (i.e. low cost, way under 50$),

– various implementations available.

These slides are credited to Prof. Adrian Crăciun

Review Midterm/Chapter2.docx

Figure 9: The organization of a simple computer.

· Computer Systems Organization

Computer organization

· A digital computer is an interconnected system of processors, memories and input/output devices.

· In the following we focus on each of these 3 categories.

Figure 9 illustrates the organization of a digital computer.

· CPU (Central Processing Unit) - the “brain” of the computer.

Its function is to execute the programs stored in the main memory by

– fetching instructions,

– examining them,

– executing them one after the other.

· A bus:

– connects the components of a computer,

– is a collection of wires (parallel, serial) for transmiting address, data and control signals,

– external buses connect the CPU to the memory and the I/O devices,

– internal buses are found inside the CPU.

A modern computer today

Processor	Desktop workstation	Mobile
	Core i7X 990x 3.20 GHz Quad	A5 (ARM Cortex-A9 MPCore 1GHz)
Memory	36 GB DDR3	512 MB DDR2
Secondary storage	2 x 2 TB SATA 7200 rpm HDD	64 GB SSD
Keyboard, mouse	Optimus Maximus	none
Videocard	AMD Radeon HD6870 2 GB	Integrated on-chip (PowerVR SGX543MP2)
Monitor	2 x 30” LCD	9.7” LCD multitouch
Network card	Intel(R) PRO/1000 EB	Wi-Fi, 3G
Optical drive	Blu-ray Disk Burner	-

2.1 Processors

CPU Organization

· Control Unit

– fetches instructions from the main memory and determines their type.

· ALU

– performs operations needed to carry out the instructions,

– e.g.: addition, boolean AND.

· Registers

– small high-speed memory used to store temporary results and certain control information,

– general purpose and special purpose, e.g.:

Program Counter (PC) - points to the next instruction to be exe-cuted,

Instruction Register (IR) - holds the instruction that is currently executed.

Data Path

· The registers, together with the ALU and several buses form the data path.

· Instructions:

– register-memory

· fetch words from memory,

· store words back into memory.

– register-register (data path cycle)

· fetch two operands from registers into the input registers,

· bring the content into the ALU input registers,

Figure 10: The data path of a typical von Neumann machine.

· perform operation into the ALU,

· store result back intro a register.

Example: Figure 10 illustrates an addition operation on a typical von Neumann machine.

– The data path cycle is the core of the CPU.

Instructions execution

· The CPU executes the instructions in a series of small steps:

1. Fetch the next instruction from memory into an instruction register.

2. Change the program counter to point to the next instruction.

3. Determine the type of instruction just fetched.

4. If the instruction uses a word in the memory, determine where it is.

5. Fetch the word, if needed, into a CPU register.

6. Execute the instruction.

7. Go to step 1 to begin executing the next instruction.

· The above sequence is the fetch-decode-execute cycle, central to the op-eration of all computers.

· The way the CPU works can be described into some programming lan-guage. Figure 11 illustrate such an interpreter.

Figure 11: An interpreter for a simple computer (written in Java).

Interpretation of instructions

· Instruction interpreter - a program that can imitate the functions of a CPU (fetch, examine, execute).

· Hardware (processor) – software (interpreter) equivalence!

· After having specified the machine language L, design decision for L:

– direct hardware implementation,

– implementation of a software interpreter,

– hybrid solutions.

· Interpreter: breaks the instructions of L into small steps the interpreter’s machine becomes much simpler and cheap than the target machine for L.

· Instructions were initially simple (1950’s), then more and more complex.

· Transition to complex instructions: sequence of simple instructions occurring frequently or special cases (floating point, array operations).

· Hardware implementation of complex instructions more powerful computers. However, instruction compatibility requirements and the rising cost of software development created the need to implement complex instructions even on low-end computers.

So, how to build a low-cost computer?

· Economic factors:

– hardware instructions - higher cost (Cray supercomputers).

– solution: interpretation.

RISC vs. CISC

CISC (Complex Instruction Set Computer) (term coined later, in opposi-tion to RISC.)

– since 1950’s the instructions were interpreted,

– more and more instructions (DEC VAX: several hundred instructions, more than 200 ways to specify operands in each instr.),

– interpretation provided a way to add new instructions quickly,

– interpretation provised easy bug fixing,

– interpretation ensured backward compatibility.

RISC (Reduced Instruction Set Computer)

– 1980, David Patterson and Carlo Séquin (Berkeley): VLSI CPU without interpretation,

– coined the term RISC and named the CPU RISC I (! SPARC),

– 1981 John Hennessy (Stanford): MIPS CPU (! MIPS),

– no need to be backward compatibility to clog the design,

5

– key idea: instructions can be issued quickly and executed in hard-ware (the key to good performance: designing instructions that could be issued quickly).

RISC vs. CISC ?

– RISC supports’ belief: even if a RISC machine takes four or five instr.to do what a CISC machine does in one instruction, if RISC instructions are 10x faster (they are not interpreted), RISC wins.

– However, nothing like this has happened. Why not?

– too much invested in CISC software, hardware, RISC has not taken over,

– hybrid approaches (CISC architecture with RISC core - Intel Pentium + successors):

CISC was able to contain RISC core that execute the simplest instruction, while interpreting complicated instructions in the usual CISC way

Design principles for modern computers

· All instructions directly executed by hardware.

– CISC instructions should be broken down into RISC instructions.

· Maximize the rate at which the instructions are issued.

– it is less important how long instructions will take to execute,

– parallelism could play an important role.

· Instructions should be easy to decode.

– instructions should be regular, with fixed length and a small number of fields.

· Only Loads and Stores should reference the memory.

– operands for all other operations should come from and return to registers.

· Provide plenty of registers

– access to memory is slow ! the more registers, the better.

· Attention: The above principles are relative to the current technological resources and limitations!

6

Figure 12: (a) A five stage pipeline. (b) The state of each stage as a function of time (nine clock cycles).

Improving computer performance

· Brute force: make chips faster by increasing the clock speed. (Limited by the technological development at a fixed moment in time.)

· Parallelism: doing more things at once - get more performance for a given clock speed.

– Instruction level parallelism - get more instructins per second out of the machine.

– Processor (thread) level parallelism - use more processors to share the work on the same problem.

– Data level parallelism - spread data over multiple system (e.g. “cloud”).

Instruction-level parallelism: pipelines

· Pipelining: divide instruction execution into small steps, each supported by a dedicated piece of hardware (as illustrated in Figure 12).

Instruction level parallelism: superscalar architectures

· “One pipeline good, two pipelines better.”

– pairs of instructions that do not conflict over resources or do not depend on eachother can be sent on different pipelines,

– correctness of this process is ensured by the compiler, or using extra hardware,

7

Figure 13: Dual five-stage pipeline with a common instruction fetch unit.

Figure 14: A superscalar processor with five functional units.

– Example: Intel processors (Pentium):

· the u pipeline: arbitrary instructions,

· the v pipeline: integer and floating-point instructions.

· this led to important performance gains (Pentium up to 2 times faster than 486).

– Figure 13 illustrates the use of two pipelines.

· “Two pipelines good, four pipelines better”?

Actually, no - too much hardware duplication.

· For high-end CPUs: one pipeline, but with multiple functional units, as illustrated in Figure 14.

8

Figure 15: An array processor of the ILLIAC IV type.

Processor-level parallelism: array computers

· Array processors - large number of identical processors, performing the same sequence of instructions on different sets of data.

· ILLIAC IV, 1972 (4 quadrants, each quadrant having 8 x 8 square grid of processor-memory elements), A single control unit per quadrant broadcast a single instruction to all processors,

· Architecture known as SIMD (Single Instruction-stream Multiple Data), different form from the standard von Neumann architecture.

· The design of an ILLIAC IV quadrant is illustrated in Figure 15.

· A modern incarnation is the Fermi GPU, illustrated in Figure 16.

· Vector processor -Like a SIMD it is very efficient at executing a sequence of operations on pairs of data element, with the difference that all of operations are performed in a single, heavily pipelined functional unit (Cray-1, 1974).

· No array computers are in use today, but the same principle is used in Intel x86-* processors for multimedia instructions (MMX, SSE).

Processor-level parallelism: multiprocessors/multicomputers

· Multiprocessors - systems with multiple full-blown CPUs sharing a common memory

9

Figure 16: SIMD core of the Fermi GPU.

– easy to build for small number of processors (up to 64), difficult for more.

– Figure 17 illustrates possible implementation schemes for multipro-cessor systems.

· Multicomputers - systems of interconnected computers, each with its memory, but no common memory.

– several topologies for the connections: 2D, 3D, grid, trees, rings,

– up to 10000 CPUs.

– Hybrid approaches - multicomputers that simulate a shared memory.

Processor clock

· Instruction execution is driven by a high-frequency processor clock. e.g.

– 3.5 GHz computer (3.5 billion Herz),

– the clock ticks 3.5 billion times a second,

· Execution time (clock cycles):

– instructions may take 1, 2, 3, : : : clock cycles,

– different instructions have different execution time,

10

– processor speed is not a measure of processor power (megaherz/gigaherz myth).

Examples:

· AMD vs. Intel,

· Intel vs. PowerPC,

· Intel Pentium M vs. Pentium 4.

2.2 Primary Memory / Secondary Memory / Input/Out-put

11

Primary Memory

The Primary Memory

Primary Memory

· Memory = the part of a computer where programs and data are stored.

· Primary memory (fast, small, volatile) vs. secondary memory (slow, large, persistent).

· Synonyms (British) – storage, store (more and more used to refer to disk storage).

Bits

· basic unit of memory (holding 0 or 1);

· binary arithmetic : “more efficient” (= easier to distinguish between 2 physical values rather than between 10);

· Example: BCD(*) (Binary Coded Decimal) vs. binary:

1944:

BCD:

00001 1001 0100 0100

· binary: 0000011110011000

BCD:09999

binary:065,535

(*) [use 4 bits to represent one digit, 16 combinations possible, 6 not used, the rest 0 ,..., 9]

12

Primary Memory

Memory Addresses

· memories consist of a number of cells (locations) – used to store information;

· each cell has a number, which gives its address;

· if an address is expressed on m bits, the maximum number of addressable cells is 2m.

The organization of a 96 bit memory, on 8 bits (a),

12 bits (b) and 16 bits (c), respectively.

Primary Memory

Memory Addresses (continued)

· a cell = smallest addressable unit;

· most computer manufacturers have standardized a 8bit cell, called byte;

· bytes are grouped into words;

· word = instructions operate on entire words

32 bit machines will have 32 bit registers;

64 bit machines will have 64 bit registers.

Ordering bytes:

· big endian, lefttoright (SPARC, IBM mainframes);

· little endian (b), righttoleft (Intel).

· Problems with data transfer from one format to the other!

If we store “Jim Smith, age 21, department 260 (=1x256+4)

13

Primary Memory

Units for Measuring Memory

The multiplication factor for the next unit is 1024 = 210 .

Primary Memory

Error Correcting Codes

· memories can occasionally produce errors;

· to avoid such problems, memories use errordetecting or errorcorrecting codes;

· an n bit word (codeword) usually contains m data bits and r redundant bits:

n = m + r

· Hamming distance = the minimum number of bits that are different between 2 codewords;

· Example:

10001001 and 10110001 have the Hamming distance 3;

· with Hamming distance d, d single bit errors are required to convert from one into another;

· for a n bit codeword (n = m + r), there are 2n codewords that can be formed, of which only 2m are legal; whenever the computer detects an illegal codeword, an error has occurred;

· Example: parity bits.

14

Hamming’s algorithm using parity bits

In a Hamming code,

· r parity bits are added to an m-bit word.

· m+r bits are numbered starting at 1, not 0

· all bits whose bit number is a power of 2 are parity bits; the rest are used for data.

(Example) for a 16-bit word, 5 parity bits (bit 1,2,4,8,16) are added total 21 bits

· Each parity bit checks specific bit positions; the parity bit is set so that the total number of 1s in the checked position is even (when we use even parity system).

(Example) Bit 1 checks bit 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21

Bit 2 checks bit 2, 3, 6, 7, 10, 11, 14, 15, 18, 19

Bit 4 checks bit 4, 5, 6, 7, 12, 13, 14, 15, 20, 21

Bit 8 checks bit 8, 9, 10, 11, 12, 13, 14,15

Bit 16 checks bit 16, 17, 18, 19, 20, 21

In general, bit b is checked by those parity bits ,,, such that .

· Construction of the Hamming code for the memory word 1111000010101110 by adding 5 check bits to the 16 data bits.

· What happen if bit 5 is inverted by an electrical surge on the power line?

· So, the new codeword : 001001100000101101110

(a). Parity bit 1 incorrect (1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 contain five 1s)

(b). Parity bit 2 correct (2, 3, 6, 7, 10, 11, 14, 15, 18, 19 contain six 1s)

(c). Parity bit 4 incorrect (4, 5, 6, 7, 12, 13, 14, 15, 20, 21 contain five 1s)

(d). Parity bit 8 correct (8, 9, 10, 11, 12, 13, 14,15 contain two 1s)

(e). Parity bit 16 correct (16, 17, 18, 19, 20, 21 contain four 1s)

(a) should be even. The incorrect bit must be one of them

(d) should be even. The incorrect bit must be one of them

By the way, (b) is correct. So, eliminate 7, 15

(c) is also correct. So, eliminate 13

(e) is also correct. So, eliminate 21

The only bit left is bit 5, which is the one in error. In this way, errors can be corrected

15

Primary Memory

Cache Memory

· historically, CPUs have always been much faster than memories;

· this leads to inefficiencies in execution (CPU has to wait for the data from the memory);

· technologically, it is possible to have fast enough memories, but this is not economically viable;

· solution: use a small amount of very fast – cache memory at the CPU level;

· the most heavily used words are kept in the cache;

· CPU first looks in the cache, and only then in the main memory;

· logically, the cache lies between the main memory and the CPU, in practice it can be located in several other places.

Primary Memory

Cache Memory (continued)

· programs run faster if the data they need is in the cache and not in the main memory;

HOW TO ACHIEVE THAT ?

· locality principle: memory references made in a short time interval tend to use only a small fraction of the total memory the basis for all caching systems.

· if a word is read or written k times in a short interval, then 1 slow reference to the main memory

is needed, and k1 references to cache – the bigger the k the better the performance;

· memories and cache are organized in cache lines: whenever a word is needed from the memory, the whole cache line will be loaded;

· Using the locality principle, main memory and caches are divided up into fixed-sized blocks (cache lines)

· cache design issues:

· size of cache: 16KB, ...., 2 MB, 6MB – the bigger the more expensive the CPU;

· organization of cache;

· whether instructions and data are kept in different caches:

· unified cache: simpler to build, balance between instructions and data;

· split cache (Harvard architecture): allows parallel accesses, good with pipelined CPU

· number of caches: not uncommon to have one on the CPU, one off chip but still in the same package, and a third one further away.

16

Primary Memory

Memory Packaging

· initially, memory chips were separate units (1Kb1Mb), plugged directly in sockets;

· at present, memory chips ( 8 or 16 chips) are packaged into circuit boards, which are sold as units;

· memory units:

· SIMM (Single Inline Memory Module):

· 72 connectors;

· transfer 32 bits per clock cycle.

· typical sizes: 32MB, 64MB, ...

· DIMM (Double Inline Memory Module):

· 84 goldplated connectors;

· 64 bits transfer;

· typical sizes: 64MB, ...

· variants: SODIMM (Small Outline DIMM) – laptops.

· Usually no error detecting/correcting features, as average error rate: 1/10 years.

Primary Memory

RAM Types

· SRAM (Static RAM)

· used for cache, motherboard memories, digital signal processing;

· retains its contents as long as power remains applied;

· NVRAM (NonVolatile RAM)

· userprogrammable memory chip whose data is

retained when power to the chip is turned off;

· modems, MP3 players;

· DRAM (Dynamic RAM)

· stores each bit of data in a separate capacitor;

· use refresh logic to refresh the memory;

· bigger, better, faster, more:

· Fast Page Mode DRAM;

· EDO RAM (Extended Data Out DRAM);

· SDRAM (Synchronous DRAM) ;

· DDR SDRAM (Double Data Rate Synchronous DRAM);

· RDRAM (Rambus DRAM).

17

Secondary Memory

The Secondary Memory

Secondary Memory

Memory Hierarchies

· The trouble with the registers, cache, main memory is that although more and more is available, there isn't enough to go around to store all that needs to be stored.

· Example: 50.000.000 books (U.S. Library of Congress) each with 1 MB of textand & 1 Mb pictures would fit on about 100 TB.

· Solution: use a multilevel memory hierarchy.

18

Secondary Memory

Memory Hierarchies (continued)

· On the 5 level memory hierarchy, as we move down, 3 parameters increase:

1. the access time:

· registers (nanoseconds),

· cache (small multiple of registers),

· main memory (tens of nanoseconds),

[~~~~~ huge gap ~~~~~]

· disc access (at least 10 milliseconds),

· tape, optical disc (seconds, taking into account handling).

2. the storage capacity:

· registers (~ 128 bytes),

· cache (a few MB),

· main memory (~ 1 GB),

· magnetic disks (hundreds of GB, now TB),

· etc...

3. number of bits/$ (or favorite currency)

· prices change rapidly,

· main memory: $/ MB;

· magnetic disks: cents/ MB;

· magnetic tape: $/GB.

Secondary Memory

Performance Comparison

Operation

Clock cycles

CPU instructions

13

Register access

1

Cache access

1

Main memory access

10

Disk seek time

10,000,000

19

Secondary Memory

The Hard Disk

Secondary Memory

Magnetic Disks

· magnetic disk =

· one or more aluminum platters with a magnetizable coating;

· a disk head floating above the surface on an air cushion, at the end of an arm;

· when a positive or negative current passes through the head, it magnetizes the surface of the platter (write);

· when the head passes over the platter area, a positive or negative current is induced

on the head (read);

The geometry of a disk track (the circular sequence of bits written while the disk makes a complete rotation):

20

Secondary Memory

Organization of Magnetic Disks

· tracks are divided up in sectors of fixed length;

· sectors = preamble (allows synchronization of the head) + data (512 bytes) + error correcting code (ECC);

· EEC: Hamming code (single errors) or ReedSolomon code (multiple errors);

· between each sector there is an intersection gap;

· unformatted disk capacity – total of sectors, preambles, data, EEC, intersection gaps;

· formatted disk capacity – just the data capacity;

· storage capacity is influenced by:

· radial density: number of tracks per radial centimeter (8002000);

· linear bit densities: number of bits per track centimeter (~ 100.000);

· most disks consist of several platters stacked vertically, each with its own arm and head;

· all arms are ganged together, so they move to different radial positions all at once;

· cylinder = the set of tracks at a given radial position;

Secondary Memory

Performance

· a disk with 4 platters:

· factors that influence the performance of a magnetic disk:

· seek: the time needed to move the heads at a certain radial position (515 ms);

· rotational latency: the time needed for the desired sector to be placed under the head (48 ms);

· rotation speeds: 4200, 5400, 7200 ... RPM;

· transfer rates 520 MB/sec;

· burst rate: transfer rate when the head is over the first bit in the sector;

· sustained rate: much smaller than the burst rate.

21

Secondary Memory

Disk Controllers

· disk controllers are chips (sometimes full CPUs) associated with each drive;

· their task:

· accept commands from software: READ, WRITE, FORMAT;

· control the arm motion;

· detect, correct errors;

· buffering of multiple sectors;

· caching sectors read for potential future use

· remapping bad sectors.

Floppy Disks

· miniature disks (difference: head on the surface, small capacity – up to 1.44 MB);

· cheap, unreliable !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!; [always make multiple copies]

· surprisingly not yet extinct (springsummer 2008, in fact essential to the economy!!!!)

· (retrocool? not likely).

Secondary Memory

IDE disks

· Integrated Drive Electronics – mid 1980s

· controller integrated with the driver;

· BIOS calling convention: not changed for reason of backward compatibility;

· because of backward compatibility, it could address sectors with 4 bits for the head,

6 bits for the sectors, 10 bits for the cylinder max disk:

16 heads, 63 sectors, 1024 cylinders ~ 1,032,192 sectors ~ 528 MB;

· EIDE (Extended IDE)

· support for a second addressing scheme, LBA (Logical Block Addressing);

· 228 1 sectors;

· the controller has to convert LBA addresses to head, sector, cylinder addresses;

· other improvements: ability to control up to 4 drives, higher transfer rate;

· can also control CDROM drives, tape drives, etc.

· ANSI standard: ATAPI (Advanced Technology Attachment Packet Interface);

: Increased the addressable space & transfer rate to 30MB/sec

· This technology is comparatively cheap, widely used for the garden variety PCs;

· Drawback: internal only (limitations on cable length);

· The terms (E)IDE and ATA are used interchangeably.

22

Secondary Memory

SCSI Disks

· Small Computer System Interface (“scuzzy”)– 1986

:Not different from IDE disks in terms of the organization of cylinders, tracks, and sectors, but have different interface and much higher transfer rate.

· since 1986, increasingly fast versions have been standardized:

· SCSI 1: 8 data bits, 5 MHz bus, 5 MB/sec transfer,

· Fast SCSI: 8 data bits, 10 MHz bus, 10 MB/sec transfer,

· Wide Fast SCSI: 16 data bits, 10 MHz bus, 20 MB/sec transfer,

· Ultra SCSI: 8 data bits, 20 MHz bus, 20 MB/sec transfer,

· Wide Ultra SCSI: 16 data bits, 20 MHz bus, 40 MB/sec transfer,

· Ultra2 SCSI: 8 data bits, 40 MHz bus, 40 MB/sec transfer,

· Wide Ultra2 SCSI: 16 data bits, 40 MHz bus, 80 MB/sec transfer,

· Ultra3 SCSI: 16 data bits, 40 Mhz DDR bus, 160 MB/sec transfer, CRC, error correction (1999),

· Ultra320 SCSI: 16 data bits, 80 Mhz DDR, 320 MB/sec,

· Ultra640 SCSI – pushing the hardware limits, very short cables, impractical,

· New research and development: Serial SCSI (various approaches);

· standard disk controllers for UNIX workstations (Sun, HP, SGI), Macs, highend Intels;

· SCSI: controller + several devices (harddisks, CDROMs, scanners, etc.)

· each device has a unique ID;

· each device has 2 connectors: one for input and one for output;

· each output connected to the input of the next device, last device terminated;

· SCSI controllers and peripherals act as initiators or targets:

· Usually, the controller, acting as initiator, issues commands to disks

· commands and responses occur in phases,

· arbitration allows all the devices to run at once (as opposed to EIDE, where only one

device can run at a time;

Secondary Memory

SUMMARY: Disk Performance

23

24

Secondary Memory

RAID

· Redundant Array of Inexpensive [/Independent] Disks:

· as opposed to SLED (Single Large Expensive Disk);

· specific disk organization can improve performance/reliability;

· a RAID controller + many disks (RAID SCSI obvious choice);

· visible to the operating system as a single disk.

· RAID levels:

Level 0: distribute data over multiple disks: stripping;

parallel I/O, reliability potentially worse than SLED;

Level 1: RAID 1 + backup;

distributed (fast) read / write twice bad;

backup disks (fault tolerance);

Level 2: working on word basis;

distribute bits over a large number of disks (and add Hamming code);

Level 3: simplified version of level 2: use parity bit instead;

fault tolerance in case of 1 drive failing;

Level 4: like level 0, with 1 disk used for parity;

fault tolerance, but poor performance for small updates;

Level 5: like level 0, with parity bits distributed among the disks;

good performance, but in event of a crash, reconstructing the content is difficult;

Secondary Memory

RAID Levels

Other RAID configurations

· combinations are possible (e.g. RAID 1+0/0+1/5+0);

· proprietary RAID formats: RAID 1.5, 7, S, Z, ServerRAID 1E, etc.

Secondary Memory

CD-ROMs

CD-Recordables

CD-Writables

DVDs

Input/Output Devices

Buses

Terminals (keyboards, touch screens, flat panels displays, video RAM)

Mice

Game Controllers

Printers

Telecommunication Equipment (Modems, Digital Subscriber Lines, Internet cables)

Digital Cameras

Character Codes

25

Review Midterm/Chapter3.docx

· The Digital Logic Level

In this Section

· Study the basic building blocks computers are made of (gates).

· Look at a special two-valued algebra (Boolean algebra) used to study these basic components.

· Examine fundamental circuits, obtained by combining gates (including circuits for doing arithmetic).

· Examine how gates can be used to store information (memories).

· Examine the CPU, CPU-peripherals interfaces.

3.1 Gates and Boolean Algebra

Gates

· Digital circuit: one of two values are present:

– 0-1 volt represents one value (e.g. binary 0),

– 2-5 volt represents the other value (e.g. binary 1).

· Gates are tiny electronic devices that can compute functions of these 2 valued signals.

· Gates are the basic building blocks of computers, and are made of tran-sistors.

· Transistors are electronic devices that act as very fast binary switches.

· A transistor has 3 connections to the outside world:

– the collector,

– the base,

– the emitter.

· Figure 18 illustrates a transistor, and how transistors can be combined to form basic gates.

· Gates are obtain from combinations of transistors and can be seen as functions of input, as soon as the representation of the digital values is decided (e.g. “high” - Vcc) is logical 1, and “low” - ground is logical 0). In Figure 18 (b), (c) assumes the choice mentioned here.

· The basic gates and their behavior as functions of their inputs are repre-sented in Figure 19.

1

Figure 18: (a) A transistor as an inverter and two transistors combined to form (b) a NAND gate and (c) a NOR gate.

Figure 19: The basic gates and their behavior.

2

Figure 20: (a) The majority function as a truth table and (b) the corresponding circuit.

· Since the AND and OR gates each need 3 transistors, the NAND and NOR gates are used instead in practice.

· Constructive technologies:

– bipolar: TTL (Transistor-Transistor Logic), ECL (Emitter-Coupled Logic) [high speed],

– MOS (Metal Oxide Semiconductor) [slower but less power hungry]: PMOS, NMOS, CMOS (used in memories).

Boolean algebra

· Boolean algebra is used to describe the circuits that can be built by com-bining gates.

· It “captures the essence” of the logical operations AND, OR, NOT.

· Boolean functions are represented by truth tables.

· The majority function M = f(A; B; C), i.e. the ternary function that returns the value that appears as an input the most times, is represented in Figure 20.

3

· The truth table representation of boolean functions may become cumber-some with the increase in the number of inputs.

· To make things easier, one can use the notation for multiplication (AND) and addition (OR) and overbar for negation.

· The compact representation of the majority function:

· = ABC + ABC + ABC + ABC

· The compact representation records the configuration of inputs for output 1.

· A function of n variables can be described by giving a “sum” of at most 2n “product” terms (DNF).

· This formulation leads to a straightforward way to implement a boolean function using standard gates.

· However, many times there are more efficient ways to implement digital circuits.

Implementation of Boolean Functions

· Some representation conventions: whenever lines cross, no connection is implied unless a heavy dot marks the connection.

· Given a boolean function, its implementation is done in the following way:

1. Write down the truth table for the function.

2. Provide inverters to generate the complement of each input.

3. Draw an AND gate for each term with a 1 in the result column (from the truth table).

4. Wire the AND gates to the appropriate inputs.

5. Feed the output of all the AND gates into an OR gate.

· It is simpler to construct circuits that have fewer, simpler gates:

– Replace multi-input gates with two-input gates: A + B + C + D with

(A + B) + (C + D).

– Replace various types of gates with only one type (this is possible, see Figure 21, {NAND} and {NOR} are complete sets of boolean connectives).

4

Figure 21: The basic gates represented in terms of NAND, NOR: (a) NOT, (b)

AND and (c) OR.

Figure 22: Two equivalent Boolean functions.

5

Figure 23: Some laws of Boolean algebra.

Circuit equivalence

· Circuit equivalence is exploited to reduce the number of gates and simplify a circuit.

· Start with a Boolean function and apply the laws of Boolean algebra to simplify it, see Figure 22.

· The laws of boolean algebra are illustrated in Figure 23

· Figure 24 shows alternative notations for different gates.

· Figure 25 shows equivalent representations of the implementation of ex-clusive OR (XOR).

Representation conventions and digital devices

· According to the assignment of 0,1 to voltages (e.g. 0V, 5V respectively) in digital devices, distinguish:

– positive logic: 0 for 0V, 1 for 5V,

– negative logic: 1 for 0V, 0 for 5V.

· Figure 26 shows how the same device can implement different gates with different choices of representation.

6

Figure 24: Some alternative notations for (a) NAND, (b) NOR, (c) AND, (d)

OR.

Figure 25: (a) XOR and some of its equivalent implementations (b), (c), (d).

7

Figure 26: (a) Digital device in (b) positive logic and (c) negative logic.

3.2 Basic Digital Logic Circuits

Integrated Circuits

· Gates are packed on units called Integrated Circuits - ICs or chips.

· ICs: a number of gates on a plaque, pins.

· Classification of ICs:

– (Small Scale Integration): 1 to 10 gates,

– MSI (Medium Scale Integration): 10 to 100 gates,

– LSI (Large Scale Integration): 100 to 100,000 gates.

– VLSI (Very Large Scale Integration): > 100,000 gates.

· Figure 27 shows a SSI chip with 4 gates, 14 pins (12 + power and ground connectors) and a notch to ensure correct orientation.

· A gate delay of 1-10 nsec occurs.

· Implementation issues:

– A circuit with 5,000,000 NANDs would need 15,000,002 pins x 0.1 inch, i.e. approx 18km long chip.

– Solution: design circuits with a high gate/pin ratio.

Combinatorial circuits - Multiplexers

· Combinatorial circuits are circuits with multiple inputs and multiple out-puts, where the outputs are uniquely determined by the current inputs.

· A multiplexer is a circuit with 2n data inputs, n control inputs and 1 output, see Figure 28.

· Multiplexers can be used for parallel-to-serial converters.

· Demultiplexers are the inverse circuits.

· Figure 29 shows the use of a multiplexer to implement the majority func-tion.

8

Figure 27: A SSI chip.

Figure 28: A multiplexer.

9

Figure 29: (a) A MSI multiplexer, (b) used to implement the majority function.

Combinatorial circuits - Decoders

· Decoders have n inputs and 2n outputs - the number formed by the input is used to select (i.e. set to 1) exactly one of the 2n output lines. See Figure 30.

· Example of use: memory addressing.

Combinatorial circuits - Comparators

• A comparator circuit compares two words, see Figure 31.

Combinatorial circuits - Programmable Logic Arrays

· Boolean functions can be written as “sums” of “products”.

· Programmable Logic Arrays (Pleas) (see Figure 32) are circuits that can be used to implement arbitrary boolean functions by providing:

– n input lines (2n internally by also providing their negations),

– m AND gates each with inputs a subset of the input lines,

– a n m matrix of fuses that specify which input goes into the ANDs,

– p OR gates that take as input outputs of the m AND gates,

– a matrix of m p fuses to specify which AND output goes into the OR input,

· The circuit is programmed by blowing some of the fuses.

10

Figure 30: A 3 to 8 decoder.

Figure 31: A comparator.

11

Figure 32: A 12 input 6 output programmable logic array.

Arithmetic circuits - Shifters

· Shifters have n inputs, n outputs and a control line (controlling the direc-tion of the shift - 0 for left and 1 for right), see Figure 33.

Arithmetic circuits - Adders

· A half adder computes the sum and carry of two bits, see Figure 34.

· A full adder also takes into account the carry bit from the right (carry in), see Figure 35.

· Full bit adders can be put together to form n bit adders.

· This type of adders is known as ripple carry adders.

· Ripple carry adders have disadvantages - delay.

· Improvement - carry select adder:

– divide a 2n bit adder into an n bit lower half and an n bit upper half,

12

Figure 33: An 8 bit shifter.

Figure 34: A half adder.

13

Figure 35: A full adder (a) specification and (b) circuit.

– duplicate the upper half hardware,

– one of the upper halves gets 0 as a right carry, the other one gets 1 as a right carry,

– the two upper halves run in parallel,

– for the final result, one of the two circuits is selected (according to the result of the carry from the lower half).

– 100% increase in speed at the cost of 50% more hardware.

Arithmetic Circuits - Arithmetic Logic Units

· Arithmetic Logic Units (ALU) are circuit that perform basic word opera-tions: AND, OR, NOT, sum.

· Figure 36 illustrates a 1 bit ALU. The inputs F0; F1 control the operations to be performed, ENA; ENB enable inputs, INVA forces A.

· n bit ALUs are obtained by combining n 1 bit ALUs (bit slices), see Figure 37.

Clocks

· Clocks (see Figure 38) are circuits that provide synchronization of devices by emitting series of pulses with a precise width and with precise intervals between pulses.

· Clock cycle time: the time interval between the corresponding edges of two consecutive pulses.

· The clock is usually controlled by a crystal oscillator.

14

Figure 36: 1 bit ALU.

Figure 37: 8 bit ALU.

15

Figure 38: Clocks: (a) subdivisions of the clock cycle by inserting a delay, getting 4 references (b) rising edge of C1/ falling edge of C1/rising edge of C2/falling edge of C2 and (c) asymmetric clock - basic clock shifted and ANDed with the original circuit.

3.3 Memory

Latches

· Latches are circuits that “remember” previous input values.

· A SR latch, see Figure 39 has the following:

– inputs: S, for setting, R for resetting,

– outputs, Q, Q.

· SR latches are not uniquely determined by the current inputs:

– S = R = 0 has 2 stable states, depending on the value of Q,

– S = 1 has the effect Q = 1 (i.e., switches state 0 from state 1)

– R = 1 has the effect Q = 0 ((i.e., switches state 1 from state 0)

– In sum, when S is set to 1 momentarily, the latch ends up in state Q=1, regardless of what state it was previously in. Likewise, setting R to 1 forces the latch to state Q=0. the circuit “remembers” whether S or R was last on Using this property, we can build computer memories

Clocked latches

· Adding a clock (enable, strobe) to the SR latch, prevents it from changing the state, except at specified times, see Figure 40.

· SR latches have a problem: S = R = 1 (the latch jumps to one of the stable states at random).

16

Figure 39: A NOR SR latch.

Figure 40: A clocked SR latch.

· Clocked D latches fix the ambiguity of SR latches by avoiding the situation that leads to the ambiguity, see Figure 41.

· Only one input D because the input to the lower AND is always the complement of the input to upper one, the problem of both inputs being 1 never arises.

Flip-flops

· Flip-flops - are circuits that sample the value on a certain line at a par-ticular instance and store it.

· The state transition occurs during the clock transition – from 0 to 1 (the rising edge) or from 1 to 0 (the falling edge), and not on plateau (i.e., when the clock is 1).

Figure 41: A clocked D latch.

17

Figure 42: (a) Pulse generator and (b) pulse signal.

Figure 43: A D flip-flop.

· Flip-flops differ from latches in that they are edge triggered as opposed to level triggered.

· Flip-flops can be obtained by combining a pulse generator (Figure 42) with a latch (Figure 43).

· Figure 44 illustrates the standard representations of latches and flip-flops.

· Latches and flip-flops may have additional inputs (Set, Preset to force Q = 1, Reset, Clear, to force Q = 0)

Registers

• Flip-flops are packed together on ICs to form registers, see Figure 45.

18

Figure 44: Representations of (a), (b) latches and (c), (d) flip-flops:

(c) changes state on the rising edge of the clock pulse (0 to 1 transition)

Figure 45: An 8 bit register.

19

Memory organization

· Memory chips are more complicated than the registers.

· Figure 46 illustrates a memory chip holding 4 3 bit memory words.

· Inputs:

– 3 data inputs (I0 I2),

– 2 address inputs (A1; A2),

– 3 control inputs: CS (chip select), RD(read or write), OE (output enable).

• Outputs: 3 data outputs (O0 O2).

· The chip has 14 pins (as opposed to the 8 bit register before).

· CS selects the memory chip, RD with value 1 (read) or 0 (write).

· Write:

– CS=1, RD & OE will be 0, so no output,

– CS RD together with A1A2 will enable one of the write gates,

– I0I1I2 will be written in the flip-flops corresponding to the selected word.

· Read:

– CS RD OE will be 1, so the output is enabled,

– CS RD will be 0, so all the write gates are disabled and none of flip-flops is modified.,

– A1A2 will select the word whose content is sent to the output.

Noninverting buffers

· In practice, the same lines are used for input and output.

· Noninverting buffers, see Figure 47, allow the interruption of cable, such that the memory chip will not attempt to get input and output at the same time.

· Noninverting buffers are tri-state devices (they can output 0, 1 or nothing at all).

20

Figure 46: A 4 x 3 bit memory.

Figure 47: (a) A noninverting buffer (IN: data, control, OUT: data), (b) the effect of the control being 1, (c) the effect of the control being 0, (d) an inverting buffer.

21

Figure 48: The organization of a 4 Mbit memory chip: (a) 512K x 8, (b) 4096K x 1.

Memory Chips

· The previous design can be easily extended.

· For efficiency reasons, the number of words should be powers of 2.

· Memory chips observe Moore’s law.

· There are various ways in which to organize chips for a given memory size. In Figure 48 we have 4 Mbit chips organized in different ways: 512K x 8, 4096K x 1 (memories tent to be measured in bits, not in bytes).

· Terminology:

– asserting - setting a value (positive if the line is set to 1, negative otherwise),

– negating - the opposite of asserting (same as disconnecting).

· The 512K x 8 chip:

– CS (Chip Select) asserted to select the chip.

– WE (Write Enable) asserted to indicate that data is being written.

– OE (Output Enable) asserted to drive the output signals.

· The 4096 x 1 chip:

– internally, it is a 2048 x 2084 matrix of 1 bit cells, which gives 4 Mbits

22

– the chip writes/reads 1 bit at a time,

– RAS (Row Address Strobe),

– CAS (Column Address Strobe).

· The organization of memories as matrices of n x n reduces the number of pins needed, but addressing is slower, since it has to be done twice, once for rows, once fore columns.

· In case building a memory with a 32-bit word from 4096K x 1 chips requires 32 chips in parallel.

RAMs and ROMs (revisited)

· RAMs (Random Access Memories) are memories that can be read and written.

– Static RAMs:

· constructed using D flip-flops,

· keep their contents as long as there is power,

· very fast (popular as level 2 cache memories).

– Dynamic RAMs:

· an array of cells, each cell containing one transistor and a tiny capacitor,

· have to be refreshed every few milliseconds,

· high density but slower,

· FPM (Fast Page Mode) - matrix of bits like x 1 chip.

· EDO (Extended Data Output) - improve memory bandwidth.

– S(ynchronous)DRAM is a combination of static and dynamic RAM (used in large caches and main memory) (FPM and EDO are asyn-chronous).

· ROMs (Read Only Memories)

– cheaper to build (when ordered in large numbers), but the mask may cost,

– PROM (Programmable ROM) eliminates the cost of the mask, can be written (programmed) once,

– EPROM (Erasable PROM) can be reused,

– EEPROM - improvement over PROM - no special device needed to program it.

– examples of EEPROM - flash memory.

23

3.4 CPU Chips and Buses

CPU Chips

· All modern CPUs are contained on a single chip.

· All interactions to the outside world is done through the processor’s pins.

· These pins communicate with memory chips and I/O chips through buses.

· Address pins:

– the CPU puts a memory address on the address pins to load the word (instruction) from the respective address.

· Data pins:

– the memory sends the instruction to the data pins,

· Control pins:

– the CPU also needs arguments, so it informs the memory over the control pins that it wants to read data,

– the memory tells the CPU over the control lines whether data is available.

– bus control: the CPU tells the bus what it wants to read or write memory or do something else,

– interrupts: input from the I/O devices to the CPU,

– bus arbitration: needed to regulate traffic on the bus (CPU counts as an I/O device in this context),

– coprocessor signalling: pins for making and granting requests to/from the coprocessor (such as a floating-point chip, graphics chip, and other chips)

– status: provide or accept status information,

– miscellaneous: various, e.g. backward compatibility.

· Other pins: power, ground, clock signal.

· Key parameters for CPU performance:

– the number of address pins: a chip with m address can address 2m addresses (usual values for m are 16, 20, 32, 64).

– the number of data pins: n can transfer an n bit word at once (usual values for n are 8, 16, 32, 64).

· Figure 49 illustrates a typical microprocessor.

24

Figure 49: The logical pinout of a generic microprocessor.

Computer buses

· Buses = electrical pathway between multiple devices.

· Types: Internal buses (ALU-registers), external buses (CPU-memory-I/O devices), see Figure 50.

· Bus protocols are sets of well defined rules about how the bus works, and which all devices connected to the bus must obey.

· Examples: Omnibus (PDP-8), Unibus (PDP-11), Multibus (8086), ISA bus (PC/AT), EISA (80386), PCI bus (many PCs), SCSI (PCs, work-stations), Nubus (Macintosh), USB (modern PCs), FireWire, VME bus (physics lab equipment), Camac bus (high energy physics).

· Some devices attached to the bus are active and can initiate transfers - masters, whereas some of them are passive - slaves.

· Example:

– CPU - master, memory - slave: fetching instructions and data,

– CPU - master, I/O device - slave: initiating data transfer,

– CPU - master, coprocessor - slave: CPU handling instructions to the coprocessor,

– I/O - master, memory - slave: DMA.

25

Figure 50: A computer system with multiple buses.

– coprocessor - master, CPU - slave: coprocessor fetching operands from the CPU.

· Bus drivers - essentially digital amplifiers for bus masters (signals are not powerful enough to power the bus, usually),

· Bus receivers - chips that connect the slaves to the bus.

· Bus transceivers - for devices that can be both masters and slaves.

· A bus, like a CPU, has address, data and control lines, but there is not necessarily a one-to-one mapping between the CPU pins and the bus sig-nals (decoders are places between the CPU and the bus).

Bus design issues

· The more address lines a bus has, the more memory the CPU can address directly.

· However, more lines make the bus more expensive, a tradeoff may be necessary.

· Example: Figure 51 illustrates the evolution of the address buses for Intel processors, backward compatibility leading to a messy design.

· To increase the data transfer through the buses:

– increase the number of data wires (same design problems mentioned above, more bits/transfer),

– decrease the bus cycle time (more transfer/sec)

– the above two ways has problems 1) bus skew (signals have different speeds on different lines, the faster bus, the more skew), 2) speeding up the bus causes backward compatibility (old boards designed for the slower bus will not work with the new one)

– recent trend: transition to serial buses - high speed and simple design.

· Multiplexed buses use the same wires for both data and address signals (slower bus) : at the start of a bus operation, the lines are used for the address, later on they are used for data

26

Figure 51: The evolution of an address bus over time (a) enough for addressing 1 Mb, (b) extended to address 16 Mb, (c) extended further to address 1 Gb.

Bus clocking - synchronous buses

· Synchronous buses has a line driven by a crystal oscillator, all bus activities take an integral number of these bus cycles.

· Figure 52 illustrate the timing of a read operation on a synchronous bus.

Bus clocking - asynchronous buses

· Synchronous buses are geared towards the slowest device in the configuration.

· Asynchronous buses do not have a master clock, but a synchronization signal.

· A full handshake - set of interlocking signals ensure the correctness of operations.

· Figure 53 illustrates a read operation on an asynchronous bus.

· Note that despite the apparent obvious advantage, most of the buses are still synchronous (compatibility, investment, simplicity).

Bus arbitration

· Bus arbitration decides which device gets to use the bus.

· In the case of centralized arbitration, see Figure 54, access is granted by a bus arbiter.

27

*Reading from memory takes 3 bus cycles

*The start of : the rising edge of the clock.

*During the CPU puts the address on the address line

*After the address lines are set to their new value, & are asserted where

indicates that memory is being accessed, is asserted for reads and negated for writes.

*At the start of , the memory asserts the since the memory accessing is not finished yet.

So, there is extra bus cycle (wait states)

*During the first half of , memory puts the data one the data lines.

*At the falling edge of the CPU reads the data lines, storing the value in an internal register.

Figure 52: Timing of a read operation on a synchronous bus.

*When the bus master has asserted the address, it asserts .

*When the slave sees this, it asserts .

*When the master sees asserted, it knows that the data are available, so it stores them and the negates the address lines, along with ,

*When the slave sees the negation of , it know that the cycle has been completed, so it negates , and we are back in the original situation.

Figure 53: A read operation on an asynchronous bus.

28

Figure 54: Centralized bus arbitration: (a) with daisy chaining and (b) multi-level daisy chaining.

Figure 55: Decentralized bus arbitration.

· Daisy chaining

a) When the arbiter sees a bus request, it issues a grant by asserting the bus grant line

b) When the device physically closest to the arbiter sees the grant, it checks to see if it has made a request. If so, it takes over the bus but does not propagate the grant further down the line. If it has not made a request, it propagates the grant to the next device in line, which behave the same way, and so on..

· There could be several access levels of access priorities.

· Decentralized bus arbitration is possible, see Figure 55:

– As an example, decentralized daisy chaining (same behavior as the centralized case, but cheaper and easier to implement).

a) Uses 3 bus lines, no matter how many devices are present. 1st line: a wired OR line for requesting the bus, 2nd line (called Busy) : asserted by the current bus master, 3rd line: used to arbitrate the bus, daisy chained through all the devices.

b) When no device wants the bus, the asserted arbitration line is propagated through all devices. To acquire the bus, a device first checks if the bus is idle and IN is asserted. If IN is negated, it does not become the bus master and it negates OUT. However, if IN is asserted, the device negates OUT, which causes its downstream neighbors to negate both IN and its OUT. So now the device becomes the bus master, asserts BUSY and OUT, and begins its transfer.

29

Figure 56: Block operations on buses.

Block operations

· Block read/writes are used in some cases, e.g. when using caching, as illustrated in Figure 56.

a) When a block read is started, the bus master tells the slave how many words are to be transferred, for example, by putting the word count on the data lines during . Instead of just returning one word, the slave outputs one word during each cycle until the count has been exhausted.

b) is asserted to indicate that a block transfer is requested.

Handling interrupts

· When the CPU commands an I/O device to do something, it expects back an interrupt. The interrupt signaling requires the bus.

· Since multiple devices may want to cause an interrupt simultaneously, the same kind of arbitration problems are present solution: assigns priorities to devices and use a centralized arbiter to give priority to the most time-critical devices.

· Interrupt controllers are chips that arbiter the interrupts.

· Figure 57 illustrates the 8259A controller.

· Up to 8 devices can be connected to the 8259A controller.

· More can be connected using multiple controlled cascaded.

· The procedures for handling interrupts are stored in hardware tables - interrupt vectors.

30

Figure 57: The 8259 interrupt controller.

3.5 Example CPUs

31

Operation	Clock cycles
CPU instructions	13
Register access	1
Cache access	1
Main memory access	10
Disk seek time	10,000,000