[Bioperl-l] Bio::SeqIO::gcg bug
Stefan Kirov
stefan.kirov at bms.com
Tue Nov 7 15:54:06 UTC 2006
Bio::SeqIO::gcg is checking the checksum against the GCG generated one.
There is a problem with the way this is done:
1. Bio::SeqIO::gcg removes all characters except [A-Za-z] (which, by the
way, is always wrong).
2. GCG calculates the checksum on the uppercased sequence.
I assume Hilmar removed the $_ = uc($_); line for a very good reason,
but the call to validate should be:
_validate_checksum(uc($sequence), $chksum)
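To illustrate why the case matters, here is a minimal sketch. It is not
the Bio::SeqIO::gcg code: gcg_style_checksum is a made-up helper, and it
assumes the checksum is the commonly described GCG scheme (character code
times a position weight that cycles 1..57, summed, modulo 10000). The
sample sequence is invented.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: a GCG-style checksum,
# assuming weight cycling 1..57 times character code, modulo 10000.
sub gcg_style_checksum {
    my ($seq) = @_;
    my ($index, $sum) = (0, 0);
    foreach my $char (split //, $seq) {
        $index++;
        $sum += $index * ord($char);
        $index = 0 if $index == 57;    # restart the weight cycle
    }
    return $sum % 10000;
}

my $sequence = 'acgtACGT';             # mixed case, as it might be read
my $as_read  = gcg_style_checksum($sequence);
my $upper    = gcg_style_checksum(uc $sequence);

print "as read: $as_read, uppercased: $upper\n";
```

Because lowercase and uppercase letters have different character codes,
the two values differ for any sequence containing lowercase letters, so
only the uppercased form can match a checksum computed on uppercase.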
Also, I believe the regexp for checking the alphabet should explicitly
remove only digits and whitespace. Removing everything else is not a
good idea, because gap and end-of-translation characters are removed as
well, and possible parsing errors might be silently suppressed.
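A rough sketch of the difference between the two regexps, using an
invented sequence line (the layout, the '.' gap character and the '*'
end-of-translation marker are assumptions for illustration, and the
variable names are hypothetical):

```perl
use strict;
use warnings;

# Made-up line: leading position number, '.' as gap, '*' as stop marker.
my $line = '   1  MKV..LAT*Q  ';

# Behaviour described above: keep only letters, drop everything else.
(my $letters_only = $line) =~ s/[^A-Za-z]//g;

# Proposed behaviour: drop only digits and whitespace.
(my $proposed = $line) =~ s/[\d\s]//g;

print "letters only: $letters_only\n";
print "proposed:     $proposed\n";
```

With the second form, gaps and the stop marker survive, and any truly
unexpected character stays in the string where a later check can catch it.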
Let me know if I am missing some other considerations here. If not I
will commit these changes.
Stefan