[Biojava-l] Multiple questions

mark.schreiber at novartis.com mark.schreiber at novartis.com
Mon Nov 28 20:39:25 EST 2005


>I am investigating the usefulness of BioJava as a backend for sequence
>management in Bioclipse (www.bioclipse.net). As a total newbie to
>Biojava, I have read the tutorial, BIA examples, glanced at the API,
>read my first FASTA-sequence and have come up with a few questions:
>
>1) Is it possible to search the Biojava-l archives without having to
>manually browse by month?

This page is a search index for all of the open-bio hosted websites:
http://search.open-bio.org/cgi-bin/obf-search.cgi

This page is just for searching mailing list archives:
http://search.open-bio.org/cgi-bin/mail-search.cgi

>2) Is there a wrapper for SequenceIO.fileToBiojava(..) that
>automatically detects file formats or is it necessary to distinguish
>sequence formats externally, i.e. with different file-extensions? If so,
>does anyone know of a complete list of file-extensions that could be
>mapped to a format?

There used to be one, but it is deprecated because there is no foolproof way
to guess formats. I would suggest you adopt your own conventions for
Bioclipse. There are no standard file extensions either; however, I try to
use .gb for GenBank, .fna for FASTA DNA, .faa for FASTA amino acid, .fra for
FASTA RNA, etc. Again, for Bioclipse you could define your own
expectations.
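
As a rough illustration of such a convention, here is a minimal sketch in
plain Java. It only maps the extensions listed above to a format label and
does not call any BioJava API. The class name ExtensionFormats, the method
formatFor and the label strings are all made up for this example; you
would substitute whatever Bioclipse needs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps a file extension to a project-local format label. */
final class ExtensionFormats {

    // Project-local conventions only; these are NOT standard extensions.
    private static final Map<String, String> BY_EXTENSION = new LinkedHashMap<>();
    static {
        BY_EXTENSION.put("gb", "genbank");
        BY_EXTENSION.put("fna", "fasta-dna");
        BY_EXTENSION.put("faa", "fasta-protein");
        BY_EXTENSION.put("fra", "fasta-rna");
    }

    private ExtensionFormats() {}

    /**
     * Returns the format label for a file name, or empty if the name has no
     * extension or the extension is not part of the convention.
     */
    static Optional<String> formatFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        // Compare case-insensitively so "SEQS.FNA" and "seqs.fna" match.
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }
}
```

The caller can then pick the matching reader for the returned label and
ask the user (or fail with a clear message) when the result is empty,
rather than guessing.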

>3) How robust are the I/O-classes for different formats? The
>test-library provided is rather short in my opinion and my first test
>broke since there was a space in the wrong position...

In my opinion they are poor. The newer parsers in org.biojavax (available
only in CVS at this stage) are much better and have survived some stress
testing. Robustness is an issue: sometimes a file that claims to be in one
format doesn't really follow the conventions, so technically it isn't in
that format. Where these are found, we try to allow them if we can. I have
found a few GenBank files that don't really seem to follow GenBank's own
conventions.
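
To show what "allowing" slightly malformed input can look like, here is a
small sketch of a forgiving FASTA reader in plain Java. It is not the
BioJava or org.biojavax parser and makes no claim about how those work;
the class LenientFasta and its parse method are hypothetical. It tolerates
stray spaces around the '>' and inside sequence lines, and skips blank
lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical forgiving FASTA reader; not part of BioJava. */
final class LenientFasta {

    private LenientFasta() {}

    /** Parses FASTA text into an ordered map of sequence id to residues. */
    static Map<String, String> parse(String text) {
        Map<String, String> records = new LinkedHashMap<>();
        String id = null;
        StringBuilder seq = new StringBuilder();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines anywhere
            }
            if (trimmed.startsWith(">")) {
                if (id != null) {
                    records.put(id, seq.toString());
                }
                // Tolerate "> id": the id is the first token after '>'.
                String header = trimmed.substring(1).trim();
                id = header.split("\\s+")[0];
                seq.setLength(0);
            } else if (id != null) {
                // Drop any whitespace embedded in the sequence line.
                seq.append(trimmed.replaceAll("\\s+", ""));
            } else {
                throw new IllegalArgumentException(
                        "Sequence data before first '>' header: " + trimmed);
            }
        }
        if (id != null) {
            records.put(id, seq.toString());
        }
        return records;
    }
}
```

Input that cannot be interpreted at all (sequence data before any header)
is still rejected, so being lenient does not mean silently accepting
anything.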

>4) What are the capabilities for multiple sequence alignment in Biojava?
>Is it limited to parse results into Biojava objects (as in BIA) or does
>it contain any stable MSA-implementations? Due to BioJavas size it is
>not easy to get an overview of the current capabilities and the standard
>of different parts.

BioJava cannot do anything other than pairwise alignments, although there
is no reason why you could not implement an MSA algorithm yourself.

>5) As a novice, has anyone implemented BLAST or CLUSTALW in Java? Any
>public web-services running for this?

Not to my knowledge. There are some similar programs in Java, but not those
ones. There are BLAST web services at NCBI (e-utils) and probably EBI. I'm
not sure about ClustalW.

>6) Is there some example-code on how to use DAS (as a client)?

Take a look at http://www.biojava.org/dazzle/

>7) How can I submit an RFE?

No formal procedure, just post to the list. Even better, code it yourself
and post it to the list :)


- Mark



