Help in Bioinformatic

profileAhmmm888
the_code_of_the_global_alignment.docx

Chapter 3 St. Clair & Visick, continued

Tuesday, October 21

A global alignment program in Ruby. The program uses essentially the same steps to fill in the dynamic programming table that we used manually above. Tracing backwards from the bottom right to the top left is done the same way we did manually, but we use a stack to allow keeping multiple optimal paths separated. Each time a backlink is followed an entry is placed on top of the todo-list stack. If there are 2 or 3 backlinks leaving a particular cell then 2 or 3 entries are placed on the todo-list stack. In this manner the alignment path is forked. All the path threads are followed until they reach a cell with no backlinks -- the cell in the top left corner.

The code of global Alignment

# global.rb

# Global Alignment.

# Scoring constants:

Match = 1

Mu = 0

Sigma = -1

# Help:

Help = <<HELP_MESSAGE

global.rb

To align 2 nucleotide sequences using the Needleman-Wunsch algorithm.

The scoring parameters are simplified to:

Match = #{Match}, Mismatch = #{Mu}, Gap = #{Sigma}.

Usage: global.rb [-options] sequence_1 sequence_2

Options:

-d Print out the scoring matrix for debugging.

-h Print out this help message.

HELP_MESSAGE

# See if help is needed.

if ARGV.grep(/^-h/).size > 0 or

ARGV.size - ARGV.grep(/^-[hd]/).size != 2

print Help

exit

end

# Decide if the scoring matrix should be printed. and get the input sequences.

debug = ARGV.delete('-d')

last_col = ARGV[0].size # 1st argument across the top.

last_row = ARGV[1].size # 2nd argument down the left side.

# Pad the sequences to make labels for row and column 0.

seq = ARGV.map {|s| (" " + s).split("")}

# Prepare the dynamic programming table as 2 matrices, one for the score and

# one for the backlinks. Initialize each matrix with each row with just

# the 1st column filled in. Backlink codes t: top, l: left, d: diagonal.

score = (0..last_row).inject([]) {|s,e| s << [e * Sigma]}

backlink = (0..last_row).inject([]) {|b,e| b << [e == 0 ? "":'t']}

(1..last_col).each do |col| # Rest of top row.

score[0][col] = col * Sigma

backlink[0][col] = 'l'

end

(1..last_row).each do |row|

(1..last_col).each do |col|

up = score[row - 1][col] + Sigma

left = score[row][col - 1] + Sigma

delta = seq[1][row] == seq[0][col] ? Match : Mu

diag = score[row - 1][col - 1] + delta

score[row][col] = [up,left,diag].max

# Mark all scores equal to the maximum.

backlink[row][col] = ""

backlink[row][col] += 't' if score[row][col] == up

backlink[row][col] += 'l' if score[row][col] == left

backlink[row][col] += 'd' if score[row][col] == diag

end

end

# Display the dynamic programming table if the debug option was requested.

# Drawing table grid with characters

if debug

puts "Dynamic programming table:"

puts ' ' * 4 + '|' + seq[0].map{|l| " #{l} |"}.join

grid_line = "----+" + ('-' * 6 + '+') * seq[0].size

puts grid_line

(0..last_row).each do |row|

puts " |" + (0..last_col).map{|col|

"#{backlink[row][col].split("").include?('d') ? "\\" : " "} " +

"#{backlink[row][col].split("").include?('t') ? "^" : " "} |"}.join

puts " #{seq[1][row]} |" + (0..last_col).map{|col|

"#{backlink[row][col].split("").include?('l') ? "<" : " "} " +

"%3d |" % score[row][col]}.join

puts grid_line

end

end

# Find the optimal alignment paths starting from the lower right corner.

todo_list = [[last_row,last_col,"",""]] # Entry (row,col,string0,string1)

done_list = [] # Entry (str0,str1)

while !todo_list.empty?

row,col,str0,str1 = todo_list.pop

if !backlink[row][col].empty? # If some back-link exists.

todo_list.push([row - 1,col - 1,

seq[0][col] + str0,seq[1][row] + str1]

) if backlink[row][col].split("").include?('d')

todo_list.push([row - 1,col,

'_' + str0,seq[1][row] + str1]

) if backlink[row][col].split("").include?('t')

todo_list.push([row,col - 1,

seq[0][col] + str0,'_' + str1]

) if backlink[row][col].split("").include?('l')

else

done_list.push([str0,str1])

end

end

# Print each of the alignments in done_list.

i = 1

while !done_list.empty?

str0,str1 = done_list.pop

puts "Alignment #{i}\n#{str0}\n#{str1}"

i += 1

end

Yields:

$ ruby global.rb

global.rb

To align 2 nucleotide sequences using the Needleman-Wunsch algorithm.

The scoring parameters are simplified to:

Match = 1, Mismatch = 0, Gap = -1.

Usage: global.rb [-options] sequence_1 sequence_2

Options:

-d Print out the scoring matrix for debugging.

-h Print out this help message.

$ ruby global.rb -d cacgtat cgca

Here is the output

Dynamic programming table:

| | c | a | c | g | t | a | t |

----+------+------+------+------+------+------+------+------+

| | | | | | | | |

| 0 |< -1 |< -2 |< -3 |< -4 |< -5 |< -6 |< -7 |

----+------+------+------+------+------+------+------+------+

| ^ |\ | |\ | | | | |

c | -1 | 1 |< 0 |< -1 |< -2 |< -3 |< -4 |< -5 |

----+------+------+------+------+------+------+------+------+

| ^ | ^ |\ |\ |\ | | | |

g | -2 | 0 | 1 |< 0 | 0 |< -1 |< -2 |< -3 |

----+------+------+------+------+------+------+------+------+

| ^ |\ ^ |\ ^ |\ | |\ |\ |\ |

c | -3 | -1 | 0 | 2 |< 1 |< 0 |< -1 |< -2 |

----+------+------+------+------+------+------+------+------+

| ^ | ^ |\ | ^ |\ |\ |\ | |

a | -4 | -2 | 0 | 1 | 2 |< 1 | 1 |< 0 |

----+------+------+------+------+------+------+------+------+

Alignment 1

cacgtat

__cgca_

Alignment 2

cacgtat

c__gca_

Alignment 3

cacgtat

cgc__a_