[Biojava-dev] http://bugzilla.open-bio.org/show_bug.cgi?id=2311

Richard Holland holland at ebi.ac.uk
Thu Jun 7 08:22:15 UTC 2007



The PHYLIPFileFormat code already does that. The problem was that the
exception it catches to detect non-DNA input was never raised, because
DNATools was not throwing it.
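
To illustrate the failure mode, here is a minimal sketch in plain Java. It
is not the BioJava code: ExceptionFallbackSketch, requireDna and
requireDnaLenient are hypothetical stand-ins that only mimic the "try DNA
first, fall back when an exception is caught" pattern. With a strict check
the fallback works; with a check that never throws, everything is treated
as DNA.

```java
import java.util.Locale;

// Hypothetical stand-ins, not BioJava classes: they only mimic the
// "try DNA first, fall back on an exception" pattern described above.
class ExceptionFallbackSketch {

    enum Kind { DNA, PROTEIN }

    /** Strict check: rejects any symbol outside A, C, G, T. */
    static void requireDna(String seq) {
        for (char c : seq.toUpperCase(Locale.ROOT).toCharArray()) {
            if ("ACGT".indexOf(c) < 0) {
                throw new IllegalArgumentException("Not a DNA symbol: " + c);
            }
        }
    }

    /** Lenient check: models a DNA parser that never complains. */
    static void requireDnaLenient(String seq) {
        // Intentionally empty: no symbol is ever rejected.
    }

    /** Tries DNA first; a caught exception is the only non-DNA signal. */
    static Kind detect(String seq, boolean strict) {
        try {
            if (strict) {
                requireDna(seq);
            } else {
                requireDnaLenient(seq);
            }
            return Kind.DNA;
        } catch (IllegalArgumentException e) {
            return Kind.PROTEIN;
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("MKVLAAGIV", true));
        System.out.println(detect("MKVLAAGIV", false));
    }
}
```

The second call in main is the interesting one: the input is clearly not
DNA, but because nothing is thrown the fallback branch is never reached.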

There is no easy way to determine a sequence file's type short of
reading the whole file in and scanning through it in advance to spot
symbol combinations that are unique to a particular alphabet, then
repeating the read of the file to do the actual parsing. As BioJava uses
mostly stream-based parsers which don't expect to be able to repeatedly
read the same data, they have to rely on other methods. They make good
guesses wherever they can but obviously they don't always get that guess
right.
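
As a rough sketch of what such a pre-scan could look like, assuming the
residues have already been read into memory as a String (exactly what a
stream-based parser cannot assume): AlphabetGuessSketch and guess are
hypothetical names, not BioJava API, and the rule is only a heuristic. It
relies on E, F, I, L, P and Q being amino-acid codes that are not IUPAC
nucleotide codes, and on U hinting at RNA.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava: a heuristic alphabet guess
// made by scanning the complete residue string before parsing it.
class AlphabetGuessSketch {

    enum Kind { DNA, RNA, PROTEIN }

    // Standard amino-acid letters that are not IUPAC nucleotide codes.
    private static final String PROTEIN_ONLY = "EFILPQ";

    static Kind guess(String residues) {
        boolean sawU = false;
        for (char c : residues.toUpperCase(Locale.ROOT).toCharArray()) {
            if (PROTEIN_ONLY.indexOf(c) >= 0) {
                return Kind.PROTEIN; // symbol unique to the protein alphabet
            }
            if (c == 'U') {
                sawU = true; // suggests RNA unless a protein-only letter follows
            }
        }
        return sawU ? Kind.RNA : Kind.DNA;
    }

    public static void main(String[] args) {
        System.out.println(guess("MKVLAAGIV"));
        System.out.println(guess("ACGUACGU"));
        System.out.println(guess("GATTACA"));
    }
}
```

Note that "GATTACA" is also a perfectly valid peptide, so even a full scan
can only guess; that ambiguity is why specifying the type up front is
more reliable.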

Wherever the API allows it, it is always a good idea to specify the type
of sequence in advance. PHYLIPFileFormat does not currently allow this,
although there's nothing saying it couldn't be modified to do so by some
willing volunteer! :)

cheers,
Richard

Felipe Albrecht wrote:
> Hello,
> 
> about http://bugzilla.open-bio.org/show_bug.cgi?id=2311
> 
> Isn't it better to check the sequence before creating the DNA/RNA/Protein
> sequence? That is: don't wait for an exception, but look at what the type is.
> 
> Felipe Albrecht


