Computer Architecture Reflective Journal Report

Sanish123

CALectureWeek5-Lec.ppt

Home >Computer Science homework help >Computer Architecture Reflective Journal Report

Kent Institute Australia Pty. Ltd.

ABN 49 003 577 302 CRICOS Code: 00161E
RTO Code: 90458 TEQSA Provider Number: PRV12051

CARC103 – Computer Architecture

Prescribed Text

Bird, S. D. (2017), Systems Architecture, 7th ed, Cengage Learning

Systems Architecture,
Seventh Edition

Chapter 6

System Integration and Performance

Systems Architecture, Seventh Edition

Chapter Objectives

In this chapter, you will learn to:
Describe the system and subsidiary buses and bus protocol
Describe how the CPU and bus interact with peripheral devices
Describe the purpose and function of device controllers
Describe how interrupt processing coordinates the CPU with secondary storage and I/O devices
Describe how buffers and caches improve computer system performance

Systems Architecture, Seventh Edition

FIGURE 6.1 Topics covered in this chapter

Courtesy of Course Technology/Cengage Learning

Systems Architecture, Seventh Edition

System Bus

A bus is a shared electrical or optical channel that connects two or more devices
The system bus is the communication channel that connects the CPU, primary storage, and peripheral devices such as secondary storage, network, and video controllers

FIGURE 6.2 The system bus and attached devices

Courtesy of Course Technology/Cengage Learning

Systems Architecture, Seventh Edition

System Bus Subsets

The system bus of a typical computer system is logically or physically divided into subchannels, each with a specific purpose:
Address bus – carries bits of a memory address (which identify the source or destination of a bus transfer)
Data bus – carries bits of a data item being transferred to/from memory or the CPU
Control bus – carries bits of commands, responses, status codes, and similar signals
Power bus – routes electrical power (voltage and ground) to attached devices
Multiple voltages are usually supported and many ground lines are provided to improve electrical signal stability

Systems Architecture, Seventh Edition

Bus Clock Rate

Like the CPU a bus has a clock rate:
For a parallel bus (e.g., PCI-X), the rate is much lower than the CPU (e.g., 66 MHz)
For a serial bus (e.g., PCIe), the rate is similar to the CPU (e.g., 2.5 GHz) though only a few bits are transmitted per cycle
One clock cycle = one transfer of command information, data, or both
The send activates (sends) data/command(s) for one cycle
The receiver has one cycle to recognize the message and “read” its content (typically transferring it to some internal buffer)
The clock is a common timing reference that coordinates the activities of sending and receiving devices
As with CPU clock rates, cycle time is computed as:

Systems Architecture, Seventh Edition

Bus Data Transfer Rate

The data transfer rate of a bus is computed as:

Bus DTR = data transfer unit size × clock rate

For older an older parallel bus, slow clock rate is partly mitigated by a relatively large data transfer unit size, for example:

Bus DTR = 64 bits × 66 MHz

= 8 bytes × 66,000,000

= 528,000,000 bytes per second

= 528 MBps

For a newer serial bus, higher clock rates look good but data transfer unit size is low

Bus DTR = 1 bit × 2.5 GHz

= 1 bit × 2,500,000,000

= 2,500,000,000 bits per second

= 312.5 MBps

Data transfer rate can only be increased by:
Increasing clock rate
Increasing number of bits transmitted per clock cycle (data transfer unit size)
Using a more efficient bus protocol

Systems Architecture, Seventh Edition

Bus Protocol

The bus protocol is the “language” of the bus:
Which lines are used for what purposes?
What are the electrical or optical characteristics of each signal?
How are bit values encoded in a line?
How are commands, errors, and status information encoded in bit values on the control bus?
Which device(s) get to use the bus when? How do they “ask” for access? For how long do they have access?
Bus protocols vary in their efficiency, for example
Consider a bus in which a command is transferred in one cycle and the data is transferred in the next
Effective data transfer rate is half that computed previously because only half the cycles carry “real” data
Efficient protocols tend to be more complex:
Require a more complex command structure and thus more control bus lines
Require more lines to carry data, address, and command information all at the same time
Bus and all devices that interface with it are more expensive

Systems Architecture, Seventh Edition

Bus Masters

Two approaches to controlling bus access:
Master-slave bus
Bus master (CPU or a designated device) controls all bus transfers
All other devices are bus slaves
Simple but relatively inefficient because the bus master is a bottleneck
Peer-to-peer bus
Any device can temporarily become a bus master
Requires a protocol for “passing around” the bus master role and a prioritization scheme for competing requests to be bus master

Systems Architecture, Seventh Edition

Bus Memory Access

The data transfer rate mismatch between the system bus and the CPU is usually two orders of magnitude or more – relatively very slow comparing to CPU speed:
Direct interface would waste many CPU cycles
To improve performance:
Data transfers to/from the CPU go “through” memory – commonly called direct memory access (DMA)
CPU becomes a bus master only when absolutely necessary

Systems Architecture, Seventh Edition

Multiple Buses

In the “bad old days” there was only the system bus and all devices were connected to it
Computer system, bus, and connected devices were relatively simple
Inefficiencies resulted because system-wide performance was limited by the slowest devices
In the modern world there are multiple buses for different purposes, each optimized for a particular purpose or kind of data communication
The system bus still connects many of the devices in the system (e.g., network, USB, keyboard, …)
Separate bus connects CPU to memory (e.g., Intel’s Front-Side Bus)
High clock rate (shorter distance)
Supports two-way communication
Separate bus for video (AGP video card slot on PCs, ±2005-2010)
Connects memory to video controller
High clock rate (shorter distance)
Data transfer is one-way
Other separate buses
Secondary storage (e.g., SCSI and SATA)
External devices (USB and Firewire)

Thinking about stream lines to serve customers

in a super market – to speed up services

North and South bridges of a motherboard explained:

https://srgtech78.wordpress.com/2019/01/25/north-and-south-bridges-of-a-motherboard-explained/

North Bridge

South Bridge

Systems Architecture, Seventh Edition

Device Access

An input/output (I/O) port is:
A communication pathway between the CPU and a peripheral device
A set of memory addresses through which data is passed between the CPU and one peripheral device
A logical abstraction that enables the CPU to interact with all devices using the same protocol
CPU interface to bus devices is simpler, thus more efficient and faster
New devices can be incorporated into an existing computer
Install device into available bus slot
Assign one or more I/O ports
Add an appropriate device driver to the operating system

Systems Architecture, Seventh Edition

Logical and Physical Device Accesses

The CPU interacts with each device as though it were a storage device with sequentially organized storage locations (e.g., a magnetic tape)
The storage locations of the hypothetical device is a linear address space – a sequence of storage locations number starting with zero
A read (device to CPU) or write (CPU to device) operation is called a logical access
The device or a device controller translates logical accesses into physical accesses
The translation specifics vary depending on the device specifics

Systems Architecture, Seventh Edition

Logical and Physical Accesses - Continued

Sample logical/physical translations:
Keyboard – no write operations, read transfers a few bytes representing the last keystroke
Video
Assign addresses to pixel locations (e.g., top row is 0-1919, next row is 1920-3839, …)
Pixel update sends address followed by a few bytes describing the pixel content
Disk
Assign addresses to sectors, typically by sector within track, then recording surface within cylinder starting at the outer edge of the platters
Read/write sends sector address then 5412 bytes of data

Systems Architecture, Seventh Edition

Motherboard and Device Connectors

FIGURE 6.4 A typical desktop computer motherboard

Courtesy of Course Technology/Cengage Learning

North Bridge

South Bridge

Systems Architecture, Seventh Edition

Motherboard and Device Connectors

FIGURE 6.5 External I/O connectors (left-side view of Figure 6.4)

Courtesy of Course Technology/Cengage Learning

Systems Architecture, Seventh Edition

Logical and Physical Access (continued)

Logical access:
The device, or its controller, translates linear sector address into corresponding physical sector location on a specific track and platter

FIGURE 6.6 An example of assigning logical sector numbers to physical sectors on disk platters

Courtesy of Course Technology/Cengage Learning

Systems Architecture, Seventh Edition

Device Controllers

Peripheral devices are connected to a bus via a device controller
Device controller implements:
Bus protocol
Translation between bus and device protocols
Translation between logical and physical accesses and addresses

FIGURE 6.7 Secondary storage and I/O device connections using device controllers

Courtesy of Course Technology/Cengage Learning

Systems Architecture, Seventh Edition

Device Controllers - Continued

A device controller can connect multiple devices to a single bus slot
Reduces number of bus slots and bus length
Enables higher bus clock rates
Combines the smaller I/O capacity of multiple devices to match the larger capacity of the bus
An I/O channel is a device controller “on steroids”
Typically found only in mainframes
Implemented as a special purpose computer
Connects dozens of physical devices to a single bus port
Can support a variety of different device types

Systems Architecture, Seventh Edition

Interrupts

The CPU would incur many I/O wait states if it waited for peripheral devices to complete tasks
To prevent I/O wait states the CPU:
Issues a command to a peripheral device
Does something else until the device “signals” that it has completed the task
Returns its “attention” to the device
An interrupt is a signal to the CPU that some event has occurred that requires its attention, for example:
Storage device has completed a read command
User pressed a key or clicked a mouse
Packet arrived from the network
UPS switched to backup power
A processing error occurred (e.g., overflow)

“You wasted my time”, said the CPU

Systems Architecture, Seventh Edition

Interrupts - Continued

The CPU has a hardwired mechanism for recognizing and processing interrupts:
Control bus carries interrupt signals
Interrupt value is an unsigned integer called an interrupt code transmitted across the bus
CPU continuously monitors control bus for interrupt signals (occurs independently of other CPU activity)
When an interrupt is detected it is automatically stored in the interrupt register
CPU checks the interrupt register after every execute cycle
If interrupt register is filled, the CPU branches to the operating system supervisor

Systems Architecture, Seventh Edition

Interrupt Handling

The supervisor extracts the interrupt code from the interrupt register and uses that value as an index into an interrupt table
The interrupt table matches interrupt codes with memory addresses of programs that “handle” the interrupt
The supervisor resets the interrupt register to zero before branching to an interrupt handler
When interrupt handler completes its work, the operating system resumes execution of whatever program was executing when the interrupt was detected

Systems Architecture, Seventh Edition

Simple Interrupt Handling Example

Application program asks operating system to read from a file
Operating system suspends program and sends a read request to the disk controller
Operating system executes other program(s) while disk accesses the requested data
When read access is completed, the disk controller
Transfers data to designated memory address (I/O port)
Sends an interrupt to indicate data is ready
CPU detects interrupt and calls supervisor
Supervisor recognizes interrupt code as being from the disk controller and transfers control to appropriate interrupt handler
Interrupt handler transfers data from I/O port to application program memory area
Interrupt handler exits back to supervisor
Supervisor starts application program at next instruction after the read request

Systems Architecture, Seventh Edition

Multiple Interrupts

What happens when an interrupt occurs while another interrupt is being processed?
Interrupts are usually prioritized by interrupt code
Lower-numbered interrupts have higher priority
Error conditions normally have highest priority
I/O events normally have “middle” priority
Program service requests usually have lowest priority
A lower-priority interrupt that occurs while processing a higher-priority interrupt is held until the higher-priority processing is completed
A higher-priority interrupt that arrives while processing a lower-priority interrupt results in suspension of processing of the lower-priority interrupt.

Systems Architecture, Seventh Edition

Stack

All programs use general- and special-purpose registers
When a program is suspended (e.g., during interrupt processing) the values in the registers just before suspension represent its current state
While suspended, other programs may execute and may overwrite register values
How are a suspended program’s register values “restored” when it resumes execution?
A stack is a reserved primary storage area that:
Holds register values of suspended programs
Is accessed on a last-in first-out (LIFO) basis via two instructions:
PUSH – copy all current register values to the “top” of the stack
POP – move values from the top of the stack back to registers
Set of register values copied/moved is called the machine state
The stack pointer is special purpose register that always points to the memory address at the “top” of the stack
Incremented after a PUSH
Decremented after a POP

Systems Architecture, Seventh Edition

Interrupt Handling With PUSH/POP

FIGURE 6.8 Interrupt processing

Courtesy of Course Technology/Cengage Learning

Systems Architecture, Seventh Edition

Buffering and Caching

Buffering and caching are both techniques that use “extra” primary storage to improve performance
Buffer – a small storage area that holds data in transit from one device or location to another
Cache – a “fast” storage area used to improve the performance of read/write accesses to a slower storage device
Though both use RAM, they use it in different amounts and different ways to achieve performance improvements

Systems Architecture, Seventh Edition

Buffers

The source and destination of an I/O operation typically different in two important respects:
Speed of data transfer
Data transfer unit size
A buffer can improve the efficiency of the faster device during an I/O operation by:
Reducing the number of interruptions
Reducing processing overhead for the interruptions
A buffer is generally required when data unit transfer sizes differ
Data flows through a buffer:
The sending device adds data to the buffer, gradually filling it
The receiving device (or buffer manager) consumes data in the buffer, gradually emptying it
The content of the buffer rises and falls depending on the timing and relative speed of addition and consumption

eg. keyboard

Systems Architecture, Seventh Edition

Buffer Management and Overflow

Buffer contents must be managed to prevent:
Buffer overflow – a condition where:
All storage locations in the buffer have data and the receiving device has not yet processed the data
New data arrives and is either lost or overwrites other data in the buffer
Preventing buffer overflow requires a “hand-shaking” protocol (usually part of the bus protocol)
A device wanting to transfer data first “asks permission”
The receiving device (buffer manager) grants permission if there is empty space in the buffer
When the buffer is full (or nearly so) the receiving device sends a message asking the sending device to stop
When the receiving device has emptied enough buffer space it sends another message asking the sender to restart transmission

Systems Architecture, Seventh Edition

Cache

Cache – a “fast” storage area used to improve the performance of read/write accesses to a slower storage device
How does a cache differ from a buffer?
Reading from a cache doesn’t consume the data
Cache is used for bidirectional data transfer
Cache is used only for storage devices
A cache is usually much larger than a buffer
More “intelligence” is require to manage cache content
Cache-related performance improvements depend on:
Cache size
Cache management intelligence

Systems Architecture, Seventh Edition

Caching and Performance

Cache performance principles:
If the cache is faster than the underlying storage device then access to the cache (read or write) will be faster than access to the underlying storage device
For reading, can we guess what will be read next from the storage device and put it in the cache “ahead of time”?
For writing, can we (quickly) place the data in the cache and copy it to the storage device later?

Systems Architecture, Seventh Edition

Caching and Performance - Continued

Reading from a cache:
Data is read from the storage device into the cache in anticipation of future read requests
Data is requested from the receiving device
If the data is already in the cache is immediately transmitted

Writing to a cache:
Data is sent and written immediately to the cache
Write completion is acknowledged before data is written to the storage device
Data is copied from the cache to the storage device “later”

Systems Architecture, Seventh Edition

Cache Size

Surprising small cache sizes (relative to storage device size) can yield significant improvements
Typical ratios of storage device size to cache size range from 1,000:1 to 10,000:1, for example:
Typical PC primary storage cache
On-chip CPU cache: 2 MBytes
Installed RAM: 4 GBytes
Size ratio 2048:1
Typical server secondary storage cache
RAM dedicated to secondary storage cache: 4 GBytes
Secondary storage capacity: 20 Terabytes
Size ratio 5000:1

Systems Architecture, Seventh Edition

Primary Storage Caching

Terminology (assuming 3 cache levels exist):
L1 cache – integrated with each CPU’s control unit
L2 cache – implemented outside of but dedicated to a single CPU
L3 cache – shared among multiple CPU cores ona single chip
There’s lots of “extra space” on modern chips to implement very large SRAM L2 and L3 caches

Systems Architecture, Seventh Edition

Secondary Storage Caching

Through the 1990s secondary storage caches were typically implemented on the secondary storage controller with:
A special purpose processor as the cache controller
Installed memory chips or DIMMs
Since that time, secondary storage cache is usually implemented using:
“Extra” system RAM
Cache management by the OS using the CPU
Why the change?
Most computer systems have “extra CPU cycles”
System RAM is cheap and motherboards have been redesigned to accommodate lots of it
The operating system can best decide how to allocate RAM for performance improvement – secondary storage cache or other purposes – not possible if RAM is on device controller
The operating system is in the best position to make cache replacement decisions since it processes all secondary storage operations (via service layer calls) and thus “knows” their pattern(s)

Systems Architecture, Seventh Edition

Processing Parallelism

Many applications are too big for most/any single computer, for example:
Large-scale transaction processing
Data mining
Large-scale numerical simulations
Parallel processing:
Breaks large problems up into smaller pieces
Allocates problem pieces to multiple compute nodes
Reassembles piece-wise solutions
Common parallel processing architectures include:
Multicore microprocessors
Multi-CPU architectures
Multicomputer architectures (clusters and grids)

Systems Architecture, Seventh Edition

Multicore Microprocessors

A multicore microprocessor places:
Multiple processing cores (roughly comparable to a CPU) on a single chip
Usually includes one or more large (multi-megabyte) L3 primary storage caches
Multicore chips are a natural consequence of two current semiconductor fabrication realities:
Increasing difficulty shrinking process size and increasing clock rate
Better luck at increasing transistor count
The latest chips are 8-way, 12-way, and 15-way, but higher core counts are expected through the next decade
I/O data transfer capacity struggles to keep pace!

Systems Architecture, Seventh Edition

Multi-CPU Architecture

Multi-CPU architecture employs multiple microprocessors on a single motherboard sharing:
Primary storage
Secondary storage
System bus
I/O devices
Today, each “CPU” is usually a multicore microprocessor
This architecture is common in midrange and mainframe computers
It’s less common today in workstations as microprocessor core counts increase

Systems Architecture, Seventh Edition

Scaling Up vs. Scaling Out

Scaling up – increasing available computer power by employing powerful computers, for example
Multicore microprocessors
Multi-CPU architecture
Scaling out – increasing available computer power by employing multiple computers in parallel, for example:
Clusters
Grids
Scaling out is gradually (but not completely) supplanting scaling up:
The gap between network vs. “within computer” data transfer rates has narrowed, though it’s still several orders of magnitude
Software for managing multicomputer configurations has improved significantly – morphing into “cloud” computing
Organizational need for flexibility – need to be able to quickly redeploy computing resources

Systems Architecture, Seventh Edition

Compression

Compression - a technique that reduces the number of bits used to encode a set of data items (e.g., a file or a stream of motion video images):
Compression algorithm – computational technique for reducing data size
Decompression algorithm – computational technique for reversing compression algorithm
Compression ratio – Ratio of uncompressed to compressed data size (often an average)

Systems Architecture, Seventh Edition

Lossy and Lossless Compression

Lossless compression – Compression followed by decompression exactly reproduces original data
Generally used for data such as files containing programs or numeric data (e.g., ZIP files)
Lossy compression – Compressing then decompressing does not reproduce original data (common with audio and still/motion video)
Two common approaches to lossy compression:
Decompression algorithm isn’t a mirror of the compression algorithm (e.g., PDF)
There is no decompression algorithm (e.g., MP3) or the decompression algorithm is really just a decoding algorithm (e.g., video DVD)
With lossy compression, quality generally decreases as compression ratio increases, all other things being equal
For example, 64 Kbit/sec MP3 files sound worse than 256 Kbit/sec MP3 files
Quality loss can be somewhat offset by extra processing resources but only if a decompression algorithm is in use

Systems Architecture, Seventh Edition

Lossy Compression Example

Top picture is uncompressed
Lower picture is heavily compressed
Loss of detail in text
Blotchy color patches and jagged edge and shadow lines

FIGURE 6.16 A digital image before (top) and after (bottom) 20:1 JPEG compression

Courtesy of Course Technology/Cengage Learning

Systems Architecture, Seventh Edition

Compression - Continued

Rapid increases in processor performance relative to data communication and storage performance has made compression increasingly attractive:
MP3 and similar audio compression techniques
GIF and JPEG
Video DVDs
H.323 video-conferencing
Compressed tape formats
Automatic compression of “old” files on secondary storage

Systems Architecture, Seventh Edition

Summary

The CPU uses the system bus and device controllers to communicate with secondary storage and input/output devices
Hardware and software techniques for improving data efficiency, and thus, overall computer system performance
Bus protocols, interrupt processing, buffering, caching, and compression

kent.edu.au

Kent Institute Australia Pty. Ltd.
ABN 49 003 577 302 ● CRICOS Code: 00161E ● RTO Code: 90458 ● TEQSA Provider Number: PRV12051