[Bioperl-l] alignIO::fasta bug
Chris Fields
cjfields at illinois.edu
Mon Jan 26 13:37:00 UTC 2009
On Jan 26, 2009, at 4:24 AM, Bernd Web wrote:
> Hi,
>
> Regarding the symbols, I change the variable with allowed symbols in
> my script:
> $Bio::PrimarySeq::MATCHPATTERN = 'A-Za-z\-\.\*\?=~:';
>
> This works fine; if I have an unusual gap symbol I can just add it to
> this var. For transparency it might be nice to have a method for
> setting allowed symbols, or better, allowed gap symbols.
PrimarySeqs shouldn't have a way to define gaps (no start/end);
LocatableSeqs (on the other hand) have the global $GAP_SYMBOLS. But
see here for caveats:
http://bugzilla.open-bio.org/show_bug.cgi?id=2715
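
For readers following along, here is a minimal sketch of how a pattern string like the one above behaves, assuming it is interpolated as the body of a character class. It uses core Perl only; the helper name is_valid_seq_string is hypothetical and is not part of Bio::PrimarySeq.

```perl
use strict;
use warnings;

# The pattern string quoted in the message. It is the body of a
# character class, not a complete regex.
my $match_pattern = 'A-Za-z\-\.\*\?=~:';

# Hypothetical validator (not a BioPerl method): returns 1 when every
# character of $seq falls inside the allowed set, 0 otherwise.
sub is_valid_seq_string {
    my ($seq, $allowed) = @_;
    return $seq =~ /^[$allowed]+$/ ? 1 : 0;
}

# '~' and ':' are accepted because they were added to the pattern.
print is_valid_seq_string('ACGT~~AC:GT', $match_pattern), "\n";

# '#' is not in the pattern, so this string is rejected.
print is_valid_seq_string('ACGT#AC', $match_pattern), "\n";
```

Adding a symbol to the string widens what is accepted as sequence data, but it does not by itself tell any other code that the symbol is a gap.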
> I also needed to change Bio::LocatableSeq::_ungapped_len to include
> the same gap symbols. SimpleAlign (sub slice) deletes all non-word
> characters from the string, but LocatableSeq does not. This caused
> SimpleAlign to crash after slicing an alignment. E.g. it looked for a
> sequence with end 0, whereas end had become 17 in LocatableSeq (since
> I used a non-standard gap symbol). LocatableSeq always calculates the
> end (sub end) and returns a different end due to the difference in
> treating the allowed/gap symbols, when slicing an alignment.
>
> SimpleAlign slice uses: $slice_seq =~ s/\W//g;
> LocatableSeq, _ungapped_len uses: $string =~ s/[\.\-]+//g;
>
> Regards,
> Bernd
This behavior stems from various problems within both LocatableSeq and
SimpleAlign, nothing that isn't fixable per se. If anything,
SimpleAlign::slice shouldn't roll its own way of determining the
ungapped length. A bug report with the problems you are seeing would
help tremendously.
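
To make the mismatch concrete, here is a minimal sketch in core Perl that applies the two substitutions quoted above to the same string. The sample slice (17 columns of '~') is an assumption chosen to be consistent with the end 0 versus end 17 numbers reported, and ungapped_length is a hypothetical helper, not an existing BioPerl method.

```perl
use strict;
use warnings;

# Assumed example: a sliced row made up entirely of a non-standard
# gap symbol ('~'), 17 columns wide.
my $slice = '~' x 17;

# Substitution quoted for SimpleAlign::slice: drop all non-word characters.
(my $slice_view = $slice) =~ s/\W//g;

# Substitution quoted for LocatableSeq::_ungapped_len: drop only '.' and '-'.
(my $locatable_view = $slice) =~ s/[\.\-]+//g;

printf "SimpleAlign view:  %d residues\n", length $slice_view;
printf "LocatableSeq view: %d residues\n", length $locatable_view;

# Hypothetical shared helper: both callers count residues against the
# same gap character class, so they cannot disagree.
sub ungapped_length {
    my ($string, $gap_class) = @_;
    (my $copy = $string) =~ s/[$gap_class]//g;
    return length $copy;
}

my $gap_class = '\-\.~';    # assumed gap set for this example only
printf "Shared helper:     %d residues\n", ungapped_length($slice, $gap_class);
```

With one routine and one gap set, the end coordinate computed during slicing and the one recomputed later would come from the same count.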
chris