Introduction to Parallel Computer Architecture Gaussian Elimination using OpenMP

profileRock1027
gaussian_elimination.pdf

Introduction to Parallel Computer Architecture

Gaussian Elimination using OpenMP

Instructor: Prof. Naga Kandasamy, ECE Department, Drexel University

January 24, 2018

The assignment is due on 9 February by 11:59 pm. You may work on this assignment in a team of

up to two people.

Consider the problem of solving a system of linear equations of the form

a0,0x0 + a0,1x1 + · · · + a0,n−1xn−1 = b0, a1,0x0 + a1,1x1 + · · · + a1,n−1xn−1 = b1,

. . . .

. . . .

an−1,0x0 + an−1,1x1 + · · · + an−1,n−1xn−1 = bn−1.

In matrix notation, the above system is written as Ax = b where A is a dense n × n matrix of coefficients such that A[i, j] = ai,j, b is an n × 1 vector [b0, b1, . . . , bn−1]

T , and x is the desired solution vector [x0, x1, . . . , xn−1]

T . From here on, we will denote the matrix elements ai,j and xi by A[i, j] and x[i], respectively. A system of equations Ax = b is usually solved in two stages. First, through a set of algebraic manipulations, the original system of equations is reduced to an

upper triangular system of the form

x0 + u0,1x1 + u0,2x2 + · · · + u0,n−1xn−1 = y0, x1 + u1,2x2 + · · · + u1,n−1xn−1 = y1,

. .

. .

xn−1 = yn−1.

We write the above system as Ux = y, where U is a unit upper-triangular matrix, that is, one where the subdiagonal entries are zero and all principal diagonal entries are equal to one. More formally,

U[i, j] = 0 if i > j, otherwise U[i, j] = ui,j, and furthermore, U[i, i] = 1 for 0 ≤ i < n. In the second stage of solving a system of linear equations, the upper-triangular system is solved for the

variables in reverse order, from x[n − 1] to x[0] using a procedure called back-substitution.

1

1: procedure GAUSS ELIMINATE(A, b, y) 2: int i, j, k; 3: for k := 0 to n − 1 do 4: for j := k + 1 to n − 1 do 5: A[k, j] := A[k, j]/A[k, k]; /* Division step. */ 6: end for

7: y[k] := b[k]/A[k, k]; 8: A[k, k] := 1; 9: for i := k + 1 to n − 1 do

10: for j := k + 1 to n − 1 do 11: A[i, j] := A[i, j] - A[i, k] × A[k, j]; /* Elimination step. */ 12: end for

13: b[i] := b[i] − A[i, k] × y[k]; 14: A[i, k] := 0; 15: end for

16: end for

A serial implementation of a simple Gaussian elimination algorithm is shown above. The algorithm

converts the system of linear equations Ax = b into a unit upper-triangular system Ux = y. We assume that the matrix U shares storage with A and overwrites the upper-triangular portion of A. So, the element A[k, j] computed in line 5 of the code is actually U[k, j]. Similarly, the element A[k, k] that is equated to 1 in line 8 is U[k, k]. Also, our program assumes that A[k, k] 6= 0 when it is used as a divisor in lines 5 and 7. So, our implementation is numerically unstable, though

it should not be a concern for this assignment. For k ranging from 0 to n − 1, the Gaussian elimination procedure systematically eliminates the variable x[k] from equations k + 1 to n − 1 so that the matrix of coefficients becomes upper-triangular. In the kth iteration of the outer loop (line 3), an appropriate multiple of the kth equation is subtracted from each of the equations k + 1 to n − 1.

This problem asks you to develop a parallel formulation of GAUSS ELIMINATE using OpenMP.

The program given to you accepts no arguments. The CPU computes the reference (or sin-

gle threaded) solution which is compared with the result provided by the multi-threaded imple-

mentation. If the solutions match within the specified tolerance, the application will print out

“Test PASSED” to the screen before exiting. Edit the gauss eliminate using openmp()

function in gauss eliminate.c to complete the functionality of Gaussian elimination using

OpenMP. Build the code as follows:

Submit all of the files needed to run your code as a single zip file via BBLearn. Also include

a brief report describing how you designed your multi-threaded code, using code or pseudocode

to help the discussion, and the amount of speedup obtained over the serial version for 2, 4, 8,

and 16 threads, for the following matrix sizes: 1024 × 1024, 2048 × 2048, 4096 × 4096, and 8192 × 8192.

2