[Bioperl-l] alignIO::fasta bug
bernd.web at gmail.com
Mon Jan 26 10:24:41 UTC 2009
Regarding the symbols, I change the variable with allowed symbols in my script:
$Bio::PrimarySeq::MATCHPATTERN = 'A-Za-z\-\.\*\?=~:';
This works fine; if I have an unusual gap symbol I can just add it to
this var. For transparency it might be nice to have a method for
setting the allowed symbols, or better, the allowed gap symbols.
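
To make the idea concrete, here is a minimal sketch of what a shared gap-symbol setting could look like. It is plain Perl and does not touch BioPerl; the names gap_char_class and ungapped_length are hypothetical and only illustrate building one character class from a list of gap symbols, so that every caller strips the same characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): turn a list of gap symbols
# into the body of a regex character class, escaping each symbol.
sub gap_char_class {
    my @symbols = @_;
    return join '', map { quotemeta($_) } @symbols;
}

# Hypothetical helper: length of a sequence string once all of the
# given gap symbols have been removed.
sub ungapped_length {
    my ($string, @symbols) = @_;
    return length $string unless @symbols;   # nothing to strip
    my $class = gap_char_class(@symbols);
    (my $copy = $string) =~ s/[$class]//g;
    return length $copy;
}

# With '~' declared as a gap symbol, only the five residues are counted.
print ungapped_length('AC~~GT--A.', '-', '.', '~'), "\n";   # 5
# Without it, the two '~' characters are counted as residues.
print ungapped_length('AC~~GT--A.', '-', '.'), "\n";        # 7
```

If both the slicing code and the length calculation took their gap symbols from one such list, they could not disagree about what counts as a gap.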
I also needed to change Bio::LocatableSeq::_ungapped_len to include
the same gap symbols. SimpleAlign (sub slice) deletes all non-word
characters from the string, but LocatableSeq removes only '.' and '-'.
This caused SimpleAlign to crash after slicing an alignment: it looked
for a sequence with end 0, whereas the end had become 17 in
LocatableSeq (since I used a non-standard gap symbol). LocatableSeq
always recalculates the end (sub end), so when an alignment is sliced
it returns a different end, because the two modules treat the
allowed/gap symbols differently.
SimpleAlign slice uses: $slice_seq =~ s/\W//g;
LocatableSeq, _ungapped_len uses: $string =~ s/[\.\-]+//g;
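
The mismatch can be reproduced with the two substitutions alone, without loading BioPerl. The sketch below assumes a 17-column slice consisting only of '~' (my choice of a non-standard gap symbol for illustration) and applies each regex to a copy of it.

```perl
use strict;
use warnings;

# A 17-column slice made up entirely of a non-standard gap symbol.
my $slice = '~' x 17;

# As in SimpleAlign's slice: every non-word character is removed.
(my $simplealign_view = $slice) =~ s/\W//g;

# As in LocatableSeq's _ungapped_len: only '.' and '-' are removed.
(my $locatable_view = $slice) =~ s/[\.\-]+//g;

printf "SimpleAlign residues:  %d\n", length $simplealign_view;   # 0
printf "LocatableSeq residues: %d\n", length $locatable_view;     # 17
```

One side sees an empty sequence and the other sees 17 residues, which matches the end 0 versus end 17 disagreement described above.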
> I would like to hammer out the specifics on how to deal with various symbols, how
> we extract a subsequence via subseq(), etc.