[Biojava-l] TMTOWTDI in biojava?

mark.schreiber at group.novartis.com mark.schreiber at group.novartis.com
Wed Feb 4 20:57:45 EST 2004


Hi Dan -

It pretty much is a matter of taste. Assuming we have all the tests in 
place (which we may not), either method should give the same results, if 
not the same performance. The method you were using from SeqIOTools 
allows for a dynamic choice of format. E.g. the user could supply "fasta" 
and "dna" as command line parameters, or "genbank" and "DNA", to the same 
program and it would figure out which to use.

If you hardcode a specific format, your user may not be able to select 
the format at runtime. If this is a problem, the SeqIOTools method is more 
flexible and therefore better. If it's not a problem, use whichever one 
you want. From a code documentation point of view it might be more obvious 
what you are doing if you use the hardcoded version, but it shouldn't 
matter.
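
A minimal sketch of the trade-off, using only the standard library so it
runs on its own. NameReader, FastaNames, ListNames and Readers are
hypothetical names made up for illustration; they are not BioJava classes,
and the "list" format is invented.

```java
import java.io.BufferedReader;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical reader interface; not part of BioJava.
interface NameReader {
    List<String> readNames(BufferedReader in);
}

// Hypothetical "fasta" reader: collects the text after '>' on header lines.
final class FastaNames implements NameReader {
    public List<String> readNames(BufferedReader in) {
        return in.lines()
                 .filter(line -> line.startsWith(">"))
                 .map(line -> line.substring(1).trim())
                 .collect(Collectors.toList());
    }
}

// Hypothetical "list" reader: every non-blank line is a name.
final class ListNames implements NameReader {
    public List<String> readNames(BufferedReader in) {
        return in.lines()
                 .map(String::trim)
                 .filter(line -> !line.isEmpty())
                 .collect(Collectors.toList());
    }
}

final class Readers {
    // Runtime selection: the format name can come straight from the
    // command line, so one program handles several formats.
    static NameReader forName(String format) {
        switch (format.toLowerCase(Locale.ROOT)) {
            case "fasta": return new FastaNames();
            case "list":  return new ListNames();
            default:
                throw new IllegalArgumentException("Unsupported format: " + format);
        }
    }
}
```

The dynamic version would be called as Readers.forName(args[0]).readNames(br),
the hardcoded one as new FastaNames().readNames(br). The second is easier to
read at a glance; the first lets the user pick the format at runtime.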

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
1 Science Park Road
#04-14 The Capricorn
Singapore 117528

phone +65 6722 2973
fax  +65 6722 2910





Dan Bolser <dmb at mrc-dunn.cam.ac.uk>
Sent by: biojava-l-bounces at portal.open-bio.org
02/04/2004 11:55 PM

 
        To:     Keith James <kdj at sanger.ac.uk>
        cc:     biojava-l at biojava.org
        Subject:        [Biojava-l] TMTOWTDI in biojava?



Hello, I found an alternate solution...

---
BufferedReader br = new BufferedReader(new FileReader(file));
// read() is an instance method, so an MSFAlignmentFormat object is needed
MSFAlignmentFormat msf = new MSFAlignmentFormat();
Alignment align = msf.read( br );
---

(MSFAlignmentFormat.read( br ) didn't work)


Is this just a matter of taste? Are there preferred ways to do things, or
should we just do things any which way? Does functional overlap exist for
specific reasons, or for exactly this kind of flexibility?

As a new Java programmer, I am naturally insecure about my code. Do I just
need confidence?

Ta,
Dan.

On 4 Feb 2004, Keith James wrote:

> >>>>> "Dan" == Dan Bolser <dmb at mrc-dunn.cam.ac.uk> writes:
> 
>     Dan> Hello, I am a new user to biojava (and almost new to java).
> 
>     Dan> The following code works fine reading a 'FASTA' format file,
>     Dan> but causes an error reading 'MSF' format...
> 
> [...]
> 
>     Dan> --- Exception in thread "main"
>     Dan> java.lang.IllegalArgumentException: No alphabet was set in
>     Dan> the identifier
>     Dan>   at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:801)
>     Dan>   at org.biojava.bio.seq.io.SeqIOTools.fileToBiojava(SeqIOTools.java:787)
>     Dan>   at ReadAlignMakeDistribution.main(ReadAlignMakeDistribution.java:60)
> 
> The exception message here is referring to the integer identifier
> which biojava has for every known combination of file-format (fasta,
> genbank, embl) and alphabet-type (dna, rna, protein). The way these
> are created/interpreted is documented in SeqIOConstants (for the
> sequence formats) and AlignIOConstants (for the alignment
> formats). All the common ones exist as static int fields so that you
> can compare using == or use them in switches.
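
One way such combined identifiers could be laid out is as a bit field, with
the file format in the low bits and the alphabet as a flag in the high bits.
The sketch below assumes that layout. The class name FormatIds and all
constant values are invented for illustration; they are not copied from
SeqIOConstants or AlignIOConstants.

```java
// Hypothetical identifier scheme; names and values are illustrative only.
final class FormatIds {
    // Low 16 bits: file format.
    static final int UNKNOWN = 0;
    static final int FASTA = 1;
    static final int GENBANK = 2;
    static final int MSF = 3;

    // High bits: one flag per alphabet type.
    static final int DNA = 1 << 16;
    static final int RNA = 1 << 17;
    static final int AA = 1 << 18;

    static final int FORMAT_MASK = 0xFFFF;
    static final int ALPHABET_MASK = ~FORMAT_MASK;

    // Common combinations as compile-time constants, so they can be
    // compared with == or used as switch labels.
    static final int FASTA_DNA = FASTA | DNA;
    static final int MSF_AA = MSF | AA;

    static int formatOf(int id) { return id & FORMAT_MASK; }

    static int alphabetOf(int id) { return id & ALPHABET_MASK; }

    static String describe(int id) {
        switch (id) {
            case FASTA_DNA: return "fasta/dna";
            case MSF_AA:    return "msf/protein";
            default:        return "unrecognised";
        }
    }
}
```

With this layout, an identifier such as UNKNOWN carries no alphabet bits at
all, so alphabetOf(UNKNOWN) is 0.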
> 
> The format guessing code (in SeqIOTools.identifyFormat) appears to be
> missing "msf" and "clustal". This is a bug - I'll fix it today. The
> result is that it guesses SeqIOConstants.UNKNOWN as the format
> identifier (which has no alphabet set - hence the message).
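
The failure mode described above can be reproduced in a few lines. This is a
rough sketch, not the SeqIOTools source: FormatGuesser, identify and
requireAlphabet are hypothetical names, the constant values are invented,
and "msf" is left out of the lookup on purpose to mimic the bug.

```java
import java.util.Locale;

// Hypothetical guesser; constants and method names are illustrative only.
final class FormatGuesser {
    static final int UNKNOWN = 0;
    static final int FASTA = 1;
    static final int GENBANK = 2;
    static final int DNA = 1 << 16;
    static final int AA = 1 << 18;
    static final int ALPHABET_MASK = 0xFFFF0000;

    // Maps a format name and an alphabet name to a combined identifier.
    // Any name that is not recognised yields UNKNOWN, which has no
    // alphabet bits set.
    static int identify(String format, String alphabet) {
        int formatBits;
        switch (format.toLowerCase(Locale.ROOT)) {
            case "fasta":   formatBits = FASTA;   break;
            case "genbank": formatBits = GENBANK; break;
            default:        return UNKNOWN; // "msf" ends up here
        }
        int alphabetBits;
        switch (alphabet.toLowerCase(Locale.ROOT)) {
            case "dna":     alphabetBits = DNA; break;
            case "protein": alphabetBits = AA;  break;
            default:        return UNKNOWN;
        }
        return formatBits | alphabetBits;
    }

    // Returns the alphabet bits, or fails if the identifier has none.
    static int requireAlphabet(int id) {
        if ((id & ALPHABET_MASK) == 0) {
            throw new IllegalArgumentException("No alphabet was set in the identifier");
        }
        return id & ALPHABET_MASK;
    }
}
```

Here requireAlphabet(identify("msf", "protein")) throws, while
requireAlphabet(identify("fasta", "dna")) returns the DNA flag.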
> 
> The public method fileToBiojava(int fileType, BufferedReader br)
> should work if you pass it the value AlignIOConstants.MSF_AA
> 
> Keith
> 
> 

_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l




