[Biojava-l] beginners bug fileToBiojava problem with msf format
Dan Bolser
dmb at mrc-dunn.cam.ac.uk
Wed Feb 4 10:37:36 EST 2004
On 4 Feb 2004, Keith James wrote:
> >>>>> "Dan" == Dan Bolser <dmb at mrc-dunn.cam.ac.uk> writes:
>
> Dan> Hello, I am a new user to biojava (and almost new to java).
>
> Dan> The following code works fine reading a 'FASTA' format file,
> Dan> but causes an error reading 'MSF' format...
>
> [...]
>
> Dan> --- Exception in thread "main"
> Dan> java.lang.IllegalArgumentException: No alphabet was set in
> Dan> the identifier at
> Dan> org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:801)
> Dan> at
> Dan> org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:787)
> Dan> at
> Dan> ReadAlignMakeDistribution.main(ReadAlignMakeDistribution.java:60)
>
> The exception message here is referring to the integer identifier
> which biojava has for every known combination of file-format (fasta,
> genbank, embl) and alphabet-type (dna, rna, protein). The way these
> are created/interpreted is documented in SeqIOConstants (for the
> sequence formats) and AlignIOConstants (for the alignment
> formats). All the common ones exist as static int fields so that you
> can compare using == or use them in switches.
>
> The format guessing code (in SeqIOTools.identifyFormat) appears to be
> missing "msf" and "clustal". This is a bug - I'll fix it today. The
> result is that it guesses SeqIOConstants.UNKNOWN as the format
> identifier (which has no alphabet set - hence the message).
>
> The public method fileToBiojava(int fileType, BufferedReader br)
> should work if you pass it the value AlignIOConstants.MSF_AA
Just for the record, I see two forms of the fileToBiojava method:

    fileToBiojava(int fileType,
                  java.io.BufferedReader br);

    fileToBiojava(java.lang.String formatName,
                  java.lang.String alphabetName,
                  java.io.BufferedReader br);

I was using the second form, which should not have to guess the format
(perhaps I misunderstand what you said above). Additionally, I explicitly
pass an alphabet name. So why are formats linked to alphabets? You wrote
"... SeqIOConstants.UNKNOWN as the format identifier (which has no
alphabet set ...".
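
For what it is worth, my reading of the description above is that one int
carries both the file format and the alphabet. Below is a minimal sketch of
that idea, assuming a bit-flag encoding. The class name, constant names and
constant values are hypothetical and are not copied from SeqIOConstants or
AlignIOConstants; it only illustrates why an identifier with no alphabet
bits could produce a "No alphabet was set" error.

```java
// Hypothetical sketch: names and values are illustrative only and are
// NOT taken from BioJava's SeqIOConstants or AlignIOConstants.
final class FormatIdSketch {
    // Low 16 bits hold the file format.
    static final int UNKNOWN = 0;
    static final int FASTA = 1;
    static final int MSF = 2;

    // Higher bits hold the alphabet flag.
    static final int DNA = 1 << 16;
    static final int AA = 1 << 17;

    static final int FORMAT_MASK = 0xFFFF;
    static final int ALPHABET_MASK = ~FORMAT_MASK;

    // Combined identifiers are compile-time constants, so they can be
    // compared with == or used as switch labels.
    static final int FASTA_DNA = FASTA | DNA;
    static final int MSF_AA = MSF | AA;

    static int formatOf(int id) {
        return id & FORMAT_MASK;
    }

    static int alphabetOf(int id) {
        return id & ALPHABET_MASK;
    }

    static String describe(int id) {
        // An identifier with no alphabet bits cannot be dispatched.
        if (alphabetOf(id) == 0) {
            throw new IllegalArgumentException(
                "No alphabet was set in the identifier");
        }
        switch (id) {
            case FASTA_DNA:
                return "fasta/dna";
            case MSF_AA:
                return "msf/protein";
            default:
                return "unrecognised";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(MSF_AA));
        try {
            describe(UNKNOWN);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under that assumption, a format guess that falls back to an "unknown"
value would have no alphabet bits set, whichever alphabet name was passed
in separately.
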
Please let me know if I am terminally confused.
Cheers,
Dan.
>
> Keith
>
>