Help
Sequence Alignment
Overview
In this project, you and your partner will create a multi-threaded version of an non-trivial sequential program using pthreads and written in C or C++.
https://computing.llnl.gov/tutorials/pthreads/
Details
Sequence alignment is an important problem in computational biology. Although there are many solutions, the solution that you are provided with using dynamic programming. You are provided with a program that already computes the dynamic programming matrix for two sequences (i.e. pairwise sequence alignment). Your task is to create a similar program that uses threads, specifically pthreads.
http://en.wikipedia.org/wiki/Sequence_alignment
Example
Assume that you want to align the two sequences DVL and DAV, where each letter represents a amino acid. The following similarity matrix provides information on how similar two amino acids are. Note that leucine and valine are similar to each other.
|
Similarity Matrix for the Two Sequences |
|||||
|
|
|
A |
D |
L |
V |
|
A |
alanine |
16 |
0 |
0 |
0 |
|
D |
aspartic acid |
0 |
20 |
0 |
0 |
|
L |
leucine |
0 |
0 |
20 |
12 |
|
V |
valine |
0 |
0 |
12 |
20 |
The alignment of the two sequences is determined using a dynamic programming matrix where each sequence in an axis. As part of the alignment, gaps can be inserted into a sequence, but at a penalty. The values for the cells are computed using the following formula:
DPM[x][y] = max(DPM[x][y-1] - gap penalty, DPM[x-1][y] - gap penalty, DPM[x-1][y-1] + similarity(x,y))
The dynamic programming matrix for this alignment looks like the following (using a -10 penalty for gaps):
|
DPM for aligning the sequences |
||||
|
|
gap |
D |
A |
V |
|
gap |
0 |
-10 |
-20 |
-30 |
|
D |
-10 |
20 |
10 |
0 |
|
V |
-20 |
10 |
20 |
30 |
|
L |
-30 |
0 |
10 |
32 |
From this DPM, the alignment can be found by backtracking through the matrix. (This last step is not part of the assignment, just the generation of the matrix).
Instructions
1. Read the following sections of the textbook to make sure you have the needed background, in case the material has not yet been covered in class.
http://www.engcomputacaopucgo.com/arquivos/Materiais/Sistemas%20Operacionais/OSEssentials.pdf
· Section 4.3.1 (Pthreads)
· Section 6.2 (The Critical Section Problem)
· Sections 6.6.1 & 6.6.2 (Bounded Buffer and Readers-Writers Problems)
· Section 6.8.4 (Synchronization in Pthreads)
· Sign up for a project group on Canvas
1. Click on People in the navigation bar.
2. Click on View User Groups on the right-hand side.
3. Scroll down until you find the Seq. Align Project Group entries.
4. Click on the group you want to join.
· Download the project files
· Extract the project files
· Examine the program. There are some general comments, but the code is intentionally not well commented to make you think about how to solve the problem in a parallel fashion. This program is more complex than it needs to be, but gives you significant help in creating a parallel solution.
· Run the command make solutions and make all. These will create the executables, including the parallel version. However, the parallel version has no implementation for the actual computation.
Tips
1. If you have any questions or get stuck for a long period of time, immediately let the instructor know. You can easily spend a lot of time spinning your wheels on this project.
2. This is not a project that you want to start the night before the due date. That will give you no time to get help if you need it.
Project Files
seqAlignProject.tar.gz
(Eclipse project)
sequenceAlignmentcpp.tar.gz
(Eclipse project)
Grading
Grading of this assignment is in two parts. One part is the C/C++ source code file that contains the multi-threaded implementation that solves the problem (seqAlign_thread.c for C,seqAlign_thread.cpp for C++). The second part is a demonstration of your code where you and your partner will have a maximum of 10 minutes to explain your solution (or as far as you got with it). The demonstrations will be scheduled for Feb 4 both in and outside of lab time. Absent partners will assume to have not contributed and recieve a 0.
Rubric
Seq Align Project