[Biojava-l] beginners bug fileToBiojava problem with msf format

Keith James kdj at sanger.ac.uk
Wed Feb 4 10:14:07 EST 2004


>>>>> "Dan" == Dan Bolser <dmb at mrc-dunn.cam.ac.uk> writes:

    Dan> Hello, I am a new user to biojava (and almost new to java).

    Dan> The following code works fine reading a 'FASTA' format file,
    Dan> but causes an error reading 'MSF' format...

[...]

    Dan> --- Exception in thread "main"
    Dan> java.lang.IllegalArgumentException: No alphabet was set in
    Dan> the identifier
    Dan>   at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:801)
    Dan>   at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:787)
    Dan>   at ReadAlignMakeDistribution.main(ReadAlignMakeDistribution.java:60)

The exception message here refers to the integer identifier which
biojava has for every known combination of file format (e.g. fasta,
genbank, embl) and alphabet type (dna, rna, protein). The way these
identifiers are created and interpreted is documented in
SeqIOConstants (for the sequence formats) and AlignIOConstants (for
the alignment formats). The common combinations exist as static int
fields, so you can compare them using == or use them in switch
statements.
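
As a rough illustration of the idea of a single int that carries both a
format and an alphabet, here is a minimal self-contained sketch. It
assumes a bit-flag encoding; the class name, the numeric values and the
helper methods are all hypothetical and are not taken from
SeqIOConstants or AlignIOConstants, whose javadoc is the authority on
the real encoding.

```java
// Illustration only: hypothetical values, NOT BioJava's actual constants.
final class IdentifierSketch {
    // Assumed layout: format codes live in the low 16 bits.
    static final int UNKNOWN = 0;
    static final int FASTA = 1;
    static final int MSF = 2;

    // Assumed layout: each alphabet is a separate high bit.
    static final int DNA = 1 << 16;
    static final int RNA = 1 << 17;
    static final int AA = 1 << 18;

    static final int FORMAT_MASK = 0xFFFF;
    static final int ALPHABET_MASK = DNA | RNA | AA;

    // Pre-combined identifiers; compile-time constants, so usable in switch.
    static final int FASTA_DNA = FASTA | DNA;
    static final int MSF_AA = MSF | AA;

    static int formatOf(int id) {
        return id & FORMAT_MASK;
    }

    static int alphabetOf(int id) {
        return id & ALPHABET_MASK;
    }

    // Rejects identifiers without an alphabet, then dispatches on the rest.
    static String describe(int id) {
        if (alphabetOf(id) == 0) {
            throw new IllegalArgumentException(
                "No alphabet was set in the identifier");
        }
        switch (id) {
            case FASTA_DNA: return "fasta/dna";
            case MSF_AA: return "msf/protein";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(MSF_AA));
        try {
            describe(UNKNOWN);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions an identifier with no alphabet bits set, such as
the hypothetical UNKNOWN above, is rejected before any parsing starts.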

The format guessing code (in SeqIOTools.identifyFormat) appears to be
missing "msf" and "clustal". This is a bug - I'll fix it today. The
result is that it guesses SeqIOConstants.UNKNOWN as the format
identifier (which has no alphabet set - hence the message).
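
The failure mode described above can be sketched as follows. This is
not the code of SeqIOTools.identifyFormat; the class, the lookup tables
and the numeric values are hypothetical, and it only assumes that an
unrecognised format name yields an identifier with no alphabet attached.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a hypothetical name-to-identifier lookup,
// NOT the implementation of SeqIOTools.identifyFormat.
final class FormatLookupSketch {
    static final int UNKNOWN = 0;      // hypothetical value
    static final int FASTA = 1;        // hypothetical value
    static final int DNA = 1 << 16;    // hypothetical value
    static final int AA = 1 << 18;     // hypothetical value

    static final Map<String, Integer> FORMATS = new HashMap<>();
    static final Map<String, Integer> ALPHABETS = new HashMap<>();

    static {
        // "msf" and "clustal" are deliberately absent, mimicking the bug.
        FORMATS.put("fasta", FASTA);
        ALPHABETS.put("dna", DNA);
        ALPHABETS.put("protein", AA);
    }

    // Maps a format name and an alphabet name to one combined int.
    static int identify(String format, String alphabet) {
        Integer f = FORMATS.get(format.toLowerCase());
        if (f == null) {
            // Unrecognised format: the alphabet is never attached.
            return UNKNOWN;
        }
        Integer a = ALPHABETS.get(alphabet.toLowerCase());
        return a == null ? f : (f | a);
    }

    static boolean hasAlphabet(int id) {
        return (id & (DNA | AA)) != 0;
    }

    public static void main(String[] args) {
        System.out.println(hasAlphabet(identify("fasta", "protein")));
        System.out.println(hasAlphabet(identify("msf", "protein")));
    }
}
```

In this sketch the alphabet name supplied by the caller is lost as soon
as the format name is not recognised, which is why the later check
complains about the alphabet rather than about the format.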

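In the meantime, the public method fileToBiojava(int fileType,
BufferedReader br) should work if you pass it the value
AlignIOConstants.MSF_AA (MSF format, protein alphabet) as the
fileType, since that bypasses the format guessing.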

Keith

-- 

- Keith James <kdj at sanger.ac.uk> Microarray Facility, Team 65 -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -

