[Bioperl-l] alignIO::fasta bug

Bernd Web bernd.web at gmail.com
Mon Jan 26 10:24:41 UTC 2009


Regarding the symbols: in my script I change the variable that holds the allowed symbols:
$Bio::PrimarySeq::MATCHPATTERN = 'A-Za-z\-\.\*\?=~:';

This works fine; if I have an unusual gap symbol I can just add it to
this variable. For transparency it might be nice to have a method for
setting the allowed symbols, or better, the allowed gap symbols.
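
To illustrate what such a setter might build, here is a minimal sketch in
plain Perl. build_match_pattern is a hypothetical helper, not part of
BioPerl, and the sketch assumes the pattern ends up interpolated into a
character class such as /^[$pattern]+$/.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): extend the allowed-symbol
# pattern with extra gap symbols, escaping each so it is taken literally.
sub build_match_pattern {
    my @extra_gaps = @_;
    my $base = 'A-Za-z\-\.\*\?=~:';    # value assigned in the script above
    return $base . join '', map { quotemeta($_) } @extra_gaps;
}

# Allow '#' as an additional, non-standard gap symbol.
my $pattern = build_match_pattern('#');

# Assumption: the pattern is used inside a character class.
for my $seq ('ACGT##AC-GT', 'ACGT!!AC') {
    my $verdict = $seq =~ /^[$pattern]+$/ ? 'valid' : 'invalid';
    printf "%-12s %s\n", $seq, $verdict;
}
```

In a script the result could then be assigned to
$Bio::PrimarySeq::MATCHPATTERN in the same way as above.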

I also needed to change Bio::LocatableSeq::_ungapped_len to include
the same gap symbols. SimpleAlign (sub slice) deletes all non-word
characters from the string, but LocatableSeq does not, so the two
disagree when an alignment is sliced. This caused SimpleAlign to crash
after slicing an alignment: e.g. it looked for a sequence with end 0,
whereas end had become 17 in LocatableSeq (since I used a non-standard
gap symbol). LocatableSeq always recalculates the end (sub end), and
because it treats the allowed/gap symbols differently it returns a
different end.

SimpleAlign, slice uses:            $slice_seq =~ s/\W//g;
LocatableSeq, _ungapped_len uses:   $string =~ s/[\.\-]+//g;
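
The difference is easy to reproduce without BioPerl. The sketch below
applies the two substitutions quoted above to the same string; the
sequence and the use of '~' as a gap symbol are made up for illustration.

```perl
use strict;
use warnings;

# Made-up aligned string with '~' as a non-standard gap symbol.
my $aligned = 'ACGT~~AC-GT..A';

# As in SimpleAlign slice: delete every non-word character.
(my $slice_seq = $aligned) =~ s/\W//g;

# As in LocatableSeq _ungapped_len: delete only '.' and '-'.
(my $string = $aligned) =~ s/[\.\-]+//g;

printf "slice:         %s (length %d)\n", $slice_seq, length $slice_seq;
printf "_ungapped_len: %s (length %d)\n", $string,    length $string;
```

The first substitution leaves 9 residues, the second leaves 11 characters
because the two '~' survive, so any end coordinate derived from the two
lengths will differ.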


> I would like to hammer out the specifics on how to deal with various
> symbols, how we extract a subsequence via subseq(), etc.
