[Biojava-l] Multiple questions

Ola Spjuth ola.spjuth at farmbio.uu.se
Tue Nov 29 08:54:51 EST 2005


Thanks for all the comments; they should get me started.
I do not intend to write a file-format guesser (it is out of my scope),
but I would like to make an RFE for one here and now. I think many people
would benefit from it.

Cheers,

   .../Ola


On Tue, 2005-11-29 at 14:34, Kalle Näslund wrote:
> Ola Spjuth wrote:
> 
> >Hi,
> >
> >I am investigating the usefulness of BioJava as a backend for sequence
> >management in Bioclipse (www.bioclipse.net). As a total newbie to
> >Biojava, I have read the tutorial, BIA examples, glanced at the API,
> >read my first FASTA-sequence and have come up with a few questions:
> >
> >1) Is it possible to search the Biojava-l archives without having to
> >manually browse by month?
> >
> >2) Is there a wrapper for SequenceIO.fileToBiojava(..) that
> >automatically detects file formats or is it necessary to distinguish
> >sequence formats externally, i.e. with different file-extensions? If so,
> >does anyone know of a complete list of file-extensions that could be
> >mapped to a format?
> >  
> >
> There is a deprecated piece of code available that quite a few people
> actually still use in their code. Even though auto-guessing the file
> format might not be the greatest idea, it is the desirable thing to do
> in many cases. If I just look at people in my lab, they want to open
> the file; they don't want to keep track of which file format that
> particular sequence was in, and so on.
> 
> So, even if file format guessing is bad, people are going to write it,
> and IMHO it is better to have one centralised, good, known-to-work file
> guesser than several different implementations, one in each person's
> own application.
> 
> So, my suggestion is to start by using the deprecated version that is in
> BioJava. If it gets removed, you can easily copy that small part of the
> code into your own application, or into a small external jar file.
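
A rough idea of what such a guesser does, as a minimal standalone sketch.
It is not the deprecated BioJava code mentioned above and uses no BioJava
classes; the class name SequenceFormatGuesser and the Format enum are
hypothetical. It assumes only the usual record openers: a FASTA header
starts with ">", a GenBank record with "LOCUS", and an EMBL-style flat
file (EMBL or Swiss-Prot) with an "ID" line.

```java
// Hypothetical helper, not part of BioJava: guesses a sequence file format
// from the first non-blank line of the file's text.
final class SequenceFormatGuesser {

    enum Format { FASTA, GENBANK, EMBL_LIKE, UNKNOWN }

    // 'head' is the beginning of the file (a few lines are enough).
    static Format guess(String head) {
        if (head == null) {
            return Format.UNKNOWN;
        }
        for (String line : head.split("\r?\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip leading blank lines
            }
            if (line.startsWith(">")) {
                return Format.FASTA;
            }
            if (line.startsWith("LOCUS")) {
                return Format.GENBANK;
            }
            if (line.startsWith("ID ")) {
                // EMBL and Swiss-Prot both open with an ID line, so this
                // sketch cannot tell them apart.
                return Format.EMBL_LIKE;
            }
            // The first non-blank line decides; anything else is unknown.
            return Format.UNKNOWN;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(guess(">seq1 example\nACGTACGT\n"));
    }
}
```

The result could then be used to pick the matching reader. A real guesser
would probably need more than one line of lookahead and a fallback to the
file extension.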
> 
> >3) How robust are the I/O-classes for different formats? The
> >test-library provided is rather short in my opinion and my first test
> >broke since there was a space in the wrong position...
> >
> >4) What are the capabilities for multiple sequence alignment in Biojava?
> >Is it limited to parsing results into Biojava objects (as in BIA) or does
> >it contain any stable MSA-implementations? Due to BioJava's size it is
> >not easy to get an overview of the current capabilities and the standard
> >of different parts.
> >  
> >
> 
> There is some support for multiple alignments in BioJava. The Alignment
> interface and its implementations happily handle multiple alignments,
> and you can choose how to interpret one: either as a SymbolList over a
> cross-product alphabet, or as individual sequences accessible by some
> label.
> 
> There is a basic framework for handling multiple alignment formats in
> the BioJava org.biojava.bio.seq.io package. It currently implements only
> two formats, FASTA and MSF. Most programs seem to be able to write
> multiple alignment output in either FASTA or MSF format, so you should
> be able to get the results into BioJava.
> 
> >5) As a novice, has anyone implemented BLAST or CLUSTALW in Java? Any
> >public web-services running for this?
> >
> >  
> >
> I have been told by greater deities that implementing BLAST in Java is
> hard, because the BLAST algorithm makes heavy use of low-level data
> structures, pointers (?) and similar things that are very hard to
> implement and control in Java. So the resulting implementation would
> most likely run pretty darn slow, and not do what you want.
> 
> Depending on what you want to do with BLAST, the BioJava SSAHA
> implementation might be something you can use instead (it works pretty
> OK on quite conserved sequences, but it is not really suited to more
> divergent sequences).
> 
> When it comes to web services I know of just a few things. I have not
> used any of these to a large extent, so I can't comment on how well
> they work for large sequences, big jobs and so on.
> 
> http://www.ebi.ac.uk/Tools/webservices/services.html
> http://xml.ddbj.nig.ac.jp/wsdl/index.jsp
> 
> Sadly, they all use their own data encoding and service invocation
> setup, so they are pretty darn annoying to use.
> 
> 
> >6) Is there some example-code on how to use DAS (as a client)?
> >
> >7) How can I submit an RFE?
> >
> >Sorry for so many questions in one post; I have a lot of catching up to
> >do and was hoping for some guidance. Some questions have probably already
> >been answered in earlier posts but I have not been able to search the
> >archives.
> >
> >Cheers,
> >
> >   .../Ola
> >
> >
> >
> >
> >_______________________________________________
> >Biojava-l mailing list  -  Biojava-l at biojava.org
> >http://biojava.org/mailman/listinfo/biojava-l
> >  
> >
-- 
Ola Spjuth <ola.spjuth at farmbio.uu.se>



