[Biojava-l] Multiple questions

mark.schreiber at novartis.com mark.schreiber at novartis.com
Tue Nov 29 20:15:24 EST 2005

Regarding the format guessing function: it was deprecated because it cannot
be guaranteed to work. However, deprecation might be a bit extreme,
especially if many people use it. I would propose that we undeprecate it
and just document a warning saying it may not work. Any objections?

- Mark

Kalle Näslund <kalle.naslund at genpat.uu.se>
Sent by: biojava-l-bounces at portal.open-bio.org
11/29/2005 09:34 PM

        To:     Ola Spjuth <ola.spjuth at farmbio.uu.se>
        cc:     biojava-l at biojava.org, (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-l] Multiple questions

Ola Spjuth wrote:

>I am investigating the usefulness of BioJava as a backend for sequence
>management in Bioclipse (www.bioclipse.net). As a total newbie to
>Biojava, I have read the tutorial, BIA examples, glanced at the API,
>read my first FASTA-sequence and have come up with a few questions:
>1) Is it possible to search the Biojava-l archives without having to
>manually browse by month?
>2) Is there a wrapper for SequenceIO.fileToBiojava(..) that
>automatically detects file formats or is it necessary to distinguish
>sequence formats externally, i.e. with different file-extensions? If so,
>does anyone know of a complete list of file-extensions that could be
>mapped to a format?
There is a deprecated piece of code available that quite a lot of people
actually still use in their code. Even though it might not be the greatest
thing to try to auto-guess the file format, it's the desirable thing to do
in many cases. If I just look at people in my lab, they want to open the
file; they don't want to keep track of what file format that particular
sequence was in, and so on.

So, even if file format guessing is bad, people are going to write it,
and IMHO it's better to have one centralised, good, known-to-work file
guesser than different implementations that differ in each person's own
application.

So, my suggestion is to start by using the deprecated version that's in
BioJava. If it gets removed you can easily just copy that small part of
the code into your own application, or into a little external jar file.
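
To illustrate what such a guesser does, and why it cannot be guaranteed to
work, here is a minimal sketch that looks only at the first non-blank line
of the input. It is NOT the deprecated BioJava code; the class name
FormatGuesser, the Format enum and the chosen line prefixes are assumptions
made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper, not part of BioJava. */
class FormatGuesser {

    enum Format { FASTA, GENBANK, EMBL_LIKE, UNKNOWN }

    /** Guesses the format from the first non-blank line only. */
    static Format guess(Reader in) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip leading blank lines
                }
                if (line.startsWith(">")) {
                    return Format.FASTA;
                }
                if (line.startsWith("LOCUS")) {
                    return Format.GENBANK;
                }
                if (line.startsWith("ID   ")) {
                    // EMBL and Swiss-Prot entries both start with an ID
                    // line, so one line is not enough to tell them apart.
                    return Format.EMBL_LIKE;
                }
                return Format.UNKNOWN;
            }
            return Format.UNKNOWN; // empty input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(guess(new StringReader(">seq1\nACGT\n")));
    }
}
```

The EMBL_LIKE case shows the weakness: a prefix check is only a guess, so
any caller should be prepared for UNKNOWN or a wrong answer and offer a way
to state the format explicitly.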

>3) How robust are the I/O-classes for different formats? The
>test-library provided is rather short in my opinion and my first test
>broke since there was a space in the wrong position...
>4) What are the capabilities for multiple sequence alignment in Biojava?
>Is it limited to parse results into Biojava objects (as in BIA) or does
>it contain any stable MSA-implementations? Due to BioJavas size it is
>not easy to get an overview of the current capabilities and the standard
>of different parts.

There is some support for multiple alignments in BioJava. The Alignment
interface and its implementations happily handle multiple alignments, and
you can choose how to interpret one: either as a SymbolList over a
cross-product alphabet, or as individual sequences accessible by some
label.
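
The two interpretations can be pictured with a toy class. This is only a
sketch of the idea using plain strings; TinyAlignment and its methods are
invented for the example and are not BioJava's Alignment API. A "column"
here plays the role of one symbol from the cross-product alphabet, and a
"row" is one labelled sequence.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy class, not BioJava's Alignment interface. */
class TinyAlignment {
    private final Map<String, String> rows = new LinkedHashMap<>();
    private int length = -1;

    /** Adds one gapped sequence; all rows must have the same length. */
    void addRow(String label, String gapped) {
        if (length >= 0 && gapped.length() != length) {
            throw new IllegalArgumentException("row length differs from alignment length");
        }
        length = gapped.length();
        rows.put(label, gapped);
    }

    /** Row view: the individual sequence stored under a label. */
    String row(String label) {
        return rows.get(label);
    }

    /** Column view: one character per row, in insertion order (1-based position). */
    String column(int pos) {
        StringBuilder sb = new StringBuilder();
        for (String r : rows.values()) {
            sb.append(r.charAt(pos - 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TinyAlignment a = new TinyAlignment();
        a.addRow("human", "ACG-T");
        a.addRow("mouse", "ACGGT");
        System.out.println(a.row("mouse")); // ACGGT
        System.out.println(a.column(4));    // -G
    }
}
```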

There is a basic framework for handling multiple alignment formats in the
BioJava org.biojava.bio.seq.io package. It currently only implements two
formats, FASTA and MSF. Most programs seem to be able to write multiple
alignments in either FASTA or MSF format, so you should be able to get the results

>5) As a novice, has anyone implemented BLAST or CLUSTALW in Java? Any
>public web-services running for this?
I have been told by greater deities that implementing BLAST in Java is
hard, because the BLAST algorithm makes heavy use of low-level data
structures, pointers (?) and similar things that are very hard to implement
and control in Java. So the resulting implementation would most likely
run pretty darn slow, and not do what you want.

Depending on what you want to do with BLAST, the BioJava SSAHA
implementation might be something you can use instead (it works pretty OK
on quite conserved sequences, but it's not really suited for more
divergent sequences).
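
The reason a hashing approach like SSAHA prefers conserved sequences can be
seen in a very simplified sketch of k-mer lookup. This is not the BioJava
SSAHA code and leaves out most of the real algorithm; the class KmerIndex
and its methods are made up for illustration. Matching relies on exact
k-mer hits, so a few substitutions can remove every hit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, heavily simplified k-mer hash index. */
class KmerIndex {
    private final int k;
    private final Map<String, List<Integer>> positions = new HashMap<>();

    /** Stores the 0-based start offset of every k-mer in the subject. */
    KmerIndex(String subject, int k) {
        this.k = k;
        for (int i = 0; i + k <= subject.length(); i++) {
            positions.computeIfAbsent(subject.substring(i, i + k),
                    key -> new ArrayList<>()).add(i);
        }
    }

    /** Returns the subject offsets where this exact k-mer occurs. */
    List<Integer> lookup(String kmer) {
        return positions.getOrDefault(kmer, Collections.emptyList());
    }

    /** Counts query k-mers that occur verbatim somewhere in the subject. */
    int sharedKmers(String query) {
        int hits = 0;
        for (int i = 0; i + k <= query.length(); i++) {
            if (positions.containsKey(query.substring(i, i + k))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        KmerIndex index = new KmerIndex("ACGTACGTTT", 4);
        System.out.println(index.sharedKmers("ACGTAC")); // 3
        System.out.println(index.sharedKmers("ACCTAC")); // 0, one mismatch breaks every 4-mer
    }
}
```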

When it comes to web services I just know of a few things. I have not
used any of these to any large extent, so I can't comment on how well they
work for large sequences, big jobs and so on.


Sadly they all use their own data encoding and service invocation setup,
so they're pretty darn annoying to use.

>6) Is there some example-code on how to use DAS (as a client)?
>7) How can I submit an RFE?
>Sorry for so many questions in one post; I have a lot of catching up to
>do and was hoping for some guidance. Some answers have probably already
>been answered in earlier posts but I have not been able to search the
>   .../Ola
>Biojava-l mailing list  -  Biojava-l at biojava.org
