[Biojava-l] DNA sequence alignment - Percent Identity

Katerina Stillou kstillou at gmail.com
Tue May 18 19:07:53 UTC 2010


Hello,
I am fairly new to Biojava and I have recently run into a problem
concerning the results of the pairwiseAlignment method. It is my
impression, and please do correct me if I am wrong, that the only results I
can get from the alignment class are:
Time (ms):
Length:
Score:
Query:        query,        Length:
Target:        target,        Length:
followed by the alignment itself.

What is more, this result comes as a single String, so I have to use Java
string manipulation to extract each value. The only exception is the score,
which is the value returned by the call to pairwiseAlignment.

However, what I am really interested in is the percent identity of
the two sequences. I would therefore be grateful to anyone who could point
out a way to compute this percentage from the data returned by the
alignment. From what I have gathered by searching the internet, I need
at least one of these: the number of identical positions or the number of
aligned positions. Is it possible that the number of identical positions is
the total number of "|" characters in the result of getAlignmentString()?
Yes, I am really confused.
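
A minimal sketch of the calculation asked about above, in plain Java with no
Biojava calls. It assumes the two gapped rows of the alignment have already
been extracted as equal-length strings and that gaps are written as '-'. The
class and method names (PercentIdentity, countIdentical, percentIdentity) are
hypothetical and not part of Biojava. The denominator used here is the number
of alignment columns, which is only one of several conventions for percent
identity.

```java
// Hypothetical helper, not part of Biojava: percent identity from two gapped rows.
final class PercentIdentity {

    /** Gap symbol assumed in the aligned rows; change it if your output differs. */
    static final char GAP = '-';

    /** Counts columns in which both rows hold the same non-gap symbol. */
    static int countIdentical(String alignedQuery, String alignedTarget) {
        if (alignedQuery.length() != alignedTarget.length()) {
            throw new IllegalArgumentException("Aligned rows must have equal length");
        }
        int identical = 0;
        for (int i = 0; i < alignedQuery.length(); i++) {
            // Compare case-insensitively so "a" and "A" count as the same base.
            char q = Character.toUpperCase(alignedQuery.charAt(i));
            char t = Character.toUpperCase(alignedTarget.charAt(i));
            if (q != GAP && q == t) {
                identical++;
            }
        }
        return identical;
    }

    /** Identical columns divided by total alignment columns, as a percentage. */
    static double percentIdentity(String alignedQuery, String alignedTarget) {
        int identical = countIdentical(alignedQuery, alignedTarget);
        int columns = alignedQuery.length();
        return columns == 0 ? 0.0 : 100.0 * identical / columns;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        String query  = "ACGT-ACGT";
        String target = "ACCTTAC-T";
        System.out.println("Identical columns: " + countIdentical(query, target));
        System.out.println("Percent identity:  " + percentIdentity(query, target));
    }
}
```

If the match line of the alignment string marks exactly the identical columns
with "|" (an assumption about the output format, not something confirmed
here), counting those characters should give the same number as
countIdentical. Comparing the rows directly avoids depending on that format.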

Some more information on my code: I am using the exact code presented in the
Biojava Cookbook for global alignment with the NUC.4.4 substitution matrix.

Thanks in advance,
Katerina


