[Biojava-l] beginners bug fileToBiojava problem with msf format

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Wed Feb 4 10:37:36 EST 2004


On 4 Feb 2004, Keith James wrote:

> >>>>> "Dan" == Dan Bolser <dmb at mrc-dunn.cam.ac.uk> writes:
> 
>     Dan> Hello, I am a new user to biojava (and almost new to java).
> 
>     Dan> The following code works fine reading a 'FASTA' format file,
>     Dan> but causes an error reading 'MSF' format...
> 
> [...]
> 
>     Dan> ---
>     Dan> Exception in thread "main" java.lang.IllegalArgumentException:
>     Dan> No alphabet was set in the identifier
>     Dan>   at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:801)
>     Dan>   at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:787)
>     Dan>   at ReadAlignMakeDistribution.main(ReadAlignMakeDistribution.java:60)
> 
> The exception message here is referring to the integer identifier
> which biojava has for every known combination of file-format (fasta,
> genbank, embl) and alphabet-type (dna, rna, protein). The way these
> are created/interpreted is documented in SeqIOConstants (for the
> sequence formats) and AlignIOConstants (for the alignment
> formats). All the common ones exist as static int fields so that you
> can compare using == or use them in switches.
> 
> The format guessing code (in SeqIOTools.identifyFormat) appears to be
> missing "msf" and "clustal". This is a bug - I'll fix it today. The
> result is that it guesses SeqIOConstants.UNKNOWN as the format
> identifier (which has no alphabet set - hence the message).
> 
> The public method fileToBiojava(int fileType, BufferedReader br)
> should work if you pass it the value AlignIOConstants.MSF_AA
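
As a rough illustration of the scheme Keith describes, here is a toy model in which a single int packs both a file format and an alphabet type. The constant names, values, and bit layout are invented for this sketch and are NOT BioJava's actual SeqIOConstants or AlignIOConstants. The point is only that an identifier with no alphabet bits set (UNKNOWN here) cannot tell a parser which alphabet to use, and that such packed constants can be compared with == or used as switch labels.

```java
// Toy model only: names, values and bit layout are hypothetical,
// NOT the real BioJava SeqIOConstants/AlignIOConstants.
public final class FormatIds {
    // Low 4 bits: alphabet type (hypothetical encoding).
    static final int ALPHA_MASK = 0x0F;
    static final int DNA = 0x01, RNA = 0x02, AA = 0x03;

    // Higher bits: file format (hypothetical encoding).
    static final int FORMAT_SHIFT = 4;
    static final int FASTA = 1 << FORMAT_SHIFT;
    static final int MSF = 2 << FORMAT_SHIFT;
    static final int CLUSTAL = 3 << FORMAT_SHIFT;

    // Combined identifiers, similar in spirit to AlignIOConstants.MSF_AA.
    static final int UNKNOWN = 0;            // no format bits and no alphabet bits
    static final int FASTA_DNA = FASTA | DNA;
    static final int MSF_AA = MSF | AA;

    static int alphabetOf(int id) { return id & ALPHA_MASK; }
    static int formatOf(int id) { return id & ~ALPHA_MASK; }

    // Packed values are compile-time constants, so they work as switch labels.
    static String describe(int id) {
        switch (id) {
            case FASTA_DNA: return "fasta/dna";
            case MSF_AA:    return "msf/protein";
            case UNKNOWN:   return "unknown";
            default:        return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(MSF_AA));
        // UNKNOWN carries no alphabet, which is what the exception complains about.
        System.out.println("UNKNOWN has alphabet: " + (alphabetOf(UNKNOWN) != 0));
    }
}
```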

Just for the record, I see two forms of the fileToBiojava function...

fileToBiojava(
  int fileType, 
  java.io.BufferedReader br
);
fileToBiojava(
  java.lang.String formatName, 
  java.lang.String alphabetName, 
  java.io.BufferedReader br
);

I was using the second form, which should not have to guess the format
(perhaps I misunderstand what you said above). Additionally, I explicitly
pass an alphabet name... Why are formats linked to alphabets? You wrote
"... SeqIOConstants.UNKNOWN as the format identifier (which has no
alphabet set...)".
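
The stack trace above shows fileToBiojava (SeqIOTools.java:787) calling fileToBiojava (SeqIOTools.java:801). One plausible reading is that the (String, String, BufferedReader) form resolves the names to an int identifier and then delegates to the (int, BufferedReader) form; if so, naming the format explicitly would still pass through a lookup that lacks "msf". The toy model below sketches that reading. Every name, table, and value in it is hypothetical and is not BioJava's actual code.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of a name-based overload delegating to an int-based one.
// All names and values are hypothetical, NOT BioJava's real implementation.
public final class ToyFormatLookup {
    static final int ALPHA_MASK = 0x0F;
    static final int UNKNOWN = 0;

    // Lookup tables; "msf" and "clustal" are deliberately missing,
    // mirroring the bug described in the quoted reply.
    static final Map<String, Integer> FORMATS = new HashMap<>();
    static final Map<String, Integer> ALPHABETS = new HashMap<>();
    static {
        FORMATS.put("fasta", 1 << 4);
        FORMATS.put("genbank", 2 << 4);
        FORMATS.put("embl", 3 << 4);
        ALPHABETS.put("dna", 1);
        ALPHABETS.put("rna", 2);
        ALPHABETS.put("protein", 3);
    }

    // An unrecognised name yields UNKNOWN, discarding the alphabet the caller gave.
    static int identifyFormat(String formatName, String alphabetName) {
        Integer fmt = FORMATS.get(formatName.toLowerCase(Locale.ROOT));
        Integer alpha = ALPHABETS.get(alphabetName.toLowerCase(Locale.ROOT));
        if (fmt == null || alpha == null) {
            return UNKNOWN;
        }
        return fmt | alpha;
    }

    // int-based form: rejects identifiers that carry no alphabet bits.
    static String fileToBiojava(int fileType, BufferedReader br) {
        if ((fileType & ALPHA_MASK) == 0) {
            throw new IllegalArgumentException("No alphabet was set in the identifier");
        }
        return "parsed with identifier " + fileType;
    }

    // Name-based form: delegates, so it fails in the same way for "msf".
    static String fileToBiojava(String formatName, String alphabetName, BufferedReader br) {
        return fileToBiojava(identifyFormat(formatName, alphabetName), br);
    }

    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(new StringReader(""));
        System.out.println(fileToBiojava("fasta", "dna", br));
        try {
            fileToBiojava("msf", "protein", br);
        } catch (IllegalArgumentException e) {
            System.out.println("msf failed: " + e.getMessage());
        }
    }
}
```

Under this reading, calling the int form directly with a constant that already includes the alphabet (such as AlignIOConstants.MSF_AA, as suggested above) would bypass the incomplete name lookup.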

Please let me know if I am terminally confused.

Cheers,
Dan.

> 
> Keith
> 
> 


