[Biojava-l] Multiple questions

Richard HOLLAND hollandr at gis.a-star.edu.sg
Wed Nov 30 00:48:50 EST 2005


Hi there,

Seeing as people think it'd be quite useful, I've added some format-guessing functionality to BioJavaX (although I haven't touched the old, deprecated guesser).

Here's how you would use it:

	import java.io.File;
	import org.biojavax.Namespace;
	import org.biojavax.RichObjectFactory;
	import org.biojavax.bio.seq.RichSequence;
	import org.biojavax.bio.seq.RichSequenceIterator;

	// The class name is arbitrary; any name will do.
	public class ReadAnyFormat {
		// "throws Exception" covers the checked exceptions from class loading, file I/O and parsing.
		public static void main(String[] args) throws Exception {
			// Load the classes that represent the selection of formats your program expects to receive.
			Class.forName("org.biojavax.bio.seq.io.EMBLFormat");
			Class.forName("org.biojavax.bio.seq.io.GenbankFormat");
			Class.forName("org.biojavax.bio.seq.io.FastaFormat");

			// Obtain the default BioJavaX namespace.
			Namespace ns = RichObjectFactory.getDefaultNamespace();

			// Find the file.
			File file = new File("myfile.seq");

			// Read the file (indicating that you want to load sequences into the default namespace).
			// BioJavaX will guess the format based only on the selection of format classes that have
			// previously been loaded, either using Class.forName above or by instantiating them elsewhere.
			RichSequenceIterator seqs = RichSequence.IOTools.readFile(file, ns);

			// NB. If you do know the format in advance, you don't need to load the class first; instead
			// you should just use one of the predefined methods in RichSequence.IOTools, e.g.:
			//     BufferedReader br = new BufferedReader(new FileReader(file));
			//     RichSequenceIterator seqs = RichSequence.IOTools.readFastaDNA(br, ns);

			// Iterate over the sequences.
			while (seqs.hasNext()) {
				RichSequence rs = seqs.nextRichSequence();
				// ... Do something with it here ...
			}
		}
	}
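How the guessing works internally isn't spelled out above beyond the fact that it chooses among the format classes already loaded. As a rough, stdlib-only illustration of content-based guessing (not BioJavaX's own code), the hypothetical FormatSniffer class below assumes each format can be recognised from its first non-blank line: EMBL entries start with an ID line, GenBank entries with LOCUS, and FASTA records with ">".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of BioJava/BioJavaX: guesses a flat-file
// sequence format from the first non-blank line of its contents.
class FormatSniffer {

    enum Format { EMBL, GENBANK, FASTA, UNKNOWN }

    static Format guess(BufferedReader reader) throws IOException {
        // Skip any leading blank lines.
        String line = reader.readLine();
        while (line != null && line.trim().isEmpty()) {
            line = reader.readLine();
        }
        if (line == null) {
            return Format.UNKNOWN;          // empty input
        }
        if (line.startsWith("ID ")) {
            return Format.EMBL;             // EMBL entries open with an ID line
        }
        if (line.startsWith("LOCUS")) {
            return Format.GENBANK;          // GenBank entries open with a LOCUS line
        }
        if (line.startsWith(">")) {
            return Format.FASTA;            // FASTA records open with a '>' header
        }
        return Format.UNKNOWN;
    }

    // Convenience overload for text already held in memory.
    static Format guess(String text) {
        try {
            return guess(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            // StringReader does not fail in practice; the catch satisfies the compiler.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(guess(">seq1\nACGT\n"));   // prints FASTA
    }
}
```

Reading the first line consumes it, so a real caller would either call mark() and reset() on the BufferedReader or reopen the file before handing it to the actual parser.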

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199


> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org 
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of Ola Spjuth
> Sent: Tuesday, November 29, 2005 9:55 PM
> To: Kalle Näslund
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Multiple questions
> 
> 
> Thanks for all the comments; they should get me started.
> I do not intend to write a file-format guesser (it is out of my scope),
> but I would like to make an RFE for one here and now. I think many
> people would benefit from it.
> 
> Cheers,
> 
>    .../Ola
> 
> 
> On Tue, 2005-11-29 at 14:34, Kalle Näslund wrote:
> > Ola Spjuth wrote:
> > 
> > >Hi,
> > >
> > >I am investigating the usefulness of BioJava as a backend for sequence
> > >management in Bioclipse (www.bioclipse.net). As a total newbie to
> > >BioJava, I have read the tutorial and the BIA examples, glanced at the
> > >API, read my first FASTA sequence, and have come up with a few questions:
> > >
> > >1) Is it possible to search the Biojava-l archives without having to
> > >manually browse by month?
> > >
> > >2) Is there a wrapper for SequenceIO.fileToBiojava(..) that
> > >automatically detects file formats, or is it necessary to distinguish
> > >sequence formats externally, i.e. with different file extensions? If so,
> > >does anyone know of a complete list of file extensions that could be
> > >mapped to a format?
> > >  
> > >
> > There is a deprecated piece of code available that quite a lot of people
> > actually still use in their code. Even though trying to auto-guess the
> > file format might not be the greatest thing to do, it's the desirable
> > thing in many cases. If I just look at people in my lab, they want to
> > open the file; they don't want to keep track of which file format that
> > particular sequence was in, and so on.
> > 
> > So, even if file-format guessing is bad, people are going to write it,
> > and IMHO it's better to have one centralised, good, known-to-work file
> > guesser than several different implementations that differ in each
> > person's own application.
> > 
> > So, my suggestion is to start by using the deprecated version that's in
> > BioJava; if it gets removed, you can easily copy that small part of the
> > code into your own application, or into a small external jar file.
> > 
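Ola's second question asks about mapping file extensions to formats. A minimal sketch of that approach, assuming a small hand-picked table (ExtensionGuesser and its entries are hypothetical and illustrative, not a complete or official list, and not a BioJava API), could look like this; it returns null for anything it doesn't recognise, including the myfile.seq name used in Richard's example.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of BioJava: maps common file extensions to a
// format name. The table is deliberately small and is NOT a complete list.
class ExtensionGuesser {

    private static final Map<String, String> FORMATS = new HashMap<>();
    static {
        FORMATS.put("fasta", "FASTA");
        FORMATS.put("fa", "FASTA");
        FORMATS.put("fna", "FASTA");    // often used for nucleotide FASTA
        FORMATS.put("faa", "FASTA");    // often used for protein FASTA
        FORMATS.put("gb", "GENBANK");
        FORMATS.put("gbk", "GENBANK");
        FORMATS.put("embl", "EMBL");
    }

    /** Returns the guessed format name, or null if the extension is missing or unknown. */
    static String guess(String path) {
        String name = new File(path).getName();   // ignore dots in directory names
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null;                          // no usable extension
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return FORMATS.get(ext);
    }
}
```

Because extensions are only a convention (a .seq file could hold anything), a lookup like this is best treated as a hint and backed up by inspecting the file contents.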
> > >3) How robust are the I/O classes for different formats? The
> > >test library provided is rather small in my opinion, and my first test
> > >broke because there was a space in the wrong position...
> > >
> > >4) What are the capabilities for multiple sequence alignment in BioJava?
> > >Is it limited to parsing results into BioJava objects (as in BIA), or does
> > >it contain any stable MSA implementations? Due to BioJava's size it is
> > >not easy to get an overview of the current capabilities and the standard
> > >of different parts.
> > >  
> > >
> > 
> > There is some support for multiple alignments in BioJava. The Alignment
> > interface and its implementations happily handle multiple alignments, and
> > you can choose how to interpret one: either as a SymbolList over a
> > cross-product alphabet, or as individual sequences accessible by label.
> > 
> > There is a basic framework for handling multiple alignment formats in
> > the BioJava org.biojava.bio.seq.io package. It currently implements only
> > two formats, FASTA and MSF. Most programs seem to be able to write
> > multiple alignment output in either FASTA or MSF format, so you should
> > be able to get the results into BioJava.
> > 
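To make the two interpretations of an alignment concrete, here is a small stdlib-only sketch (AlignmentViews is a hypothetical name; this is not BioJava's Alignment interface or its FASTA alignment parser). It reads aligned FASTA into a label-to-row map, the "sequences accessible by label" view, and pulls out one column across all rows, which corresponds to a single symbol of the cross-product view.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only, not BioJava's Alignment API.
class AlignmentViews {

    /** Parses aligned FASTA into label -> gapped row, preserving input order. */
    static Map<String, String> parseAlignedFasta(String text) {
        Map<String, String> rows = new LinkedHashMap<>();
        String label = null;
        StringBuilder row = new StringBuilder();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                if (label != null) {
                    rows.put(label, row.toString());   // finish the previous record
                }
                label = line.substring(1).trim();
                row.setLength(0);
            } else if (label != null) {
                row.append(line);                      // rows may span several lines
            }
        }
        if (label != null) {
            rows.put(label, row.toString());
        }
        // Every row of an alignment must have the same length.
        int width = -1;
        for (String r : rows.values()) {
            if (width >= 0 && r.length() != width) {
                throw new IllegalArgumentException("rows differ in length");
            }
            width = r.length();
        }
        return rows;
    }

    /** Column view: the symbol from each row at one alignment position. */
    static List<Character> column(Map<String, String> rows, int position) {
        List<Character> symbols = new ArrayList<>();
        for (String r : rows.values()) {
            symbols.add(r.charAt(position));
        }
        return symbols;
    }
}
```

Keying rows by label keeps per-sequence access cheap, while a column is rebuilt on demand; note that this sketch silently overwrites duplicate labels, which a real parser would want to reject.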
> > >5) I'm a novice here: has anyone implemented BLAST or CLUSTALW in Java?
> > >Are there any public web services running these?
> > >
> > >  
> > >
> > I have been told by greater deities that implementing BLAST in Java is
> > hard, because the BLAST algorithm makes heavy use of low-level data
> > structures, pointers and similar things that are very hard to implement
> > and control in Java. So the resulting implementation would most likely
> > run pretty darn slowly and not do what you want.
> > 
> > Depending on what you want to do with BLAST, the BioJava SSAHA
> > implementation might be something you can use instead (it works pretty
> > well on fairly conserved sequences, but it's not really suited to more
> > divergent ones).
> > 
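The trade-off Kalle describes follows from how SSAHA-style searching works: the subject sequence is cut into non-overlapping k-mers that go into a hash table, every overlapping k-mer of the query is looked up, and hits are grouped by their offset (shift). Because only exact k-mer matches count, divergent sequences produce few hits. The sketch below is a simplified, single-subject illustration of that idea; KmerHashSearch is a hypothetical name and this is not BioJava's SSAHA code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified SSAHA-style exact k-mer search over one subject sequence.
// Illustrative only; not BioJava's SSAHA classes.
class KmerHashSearch {

    private final int k;
    private final Map<String, List<Integer>> index = new HashMap<>();

    KmerHashSearch(String subject, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        // Index non-overlapping k-mers of the subject (step k).
        for (int i = 0; i + k <= subject.length(); i += k) {
            index.computeIfAbsent(subject.substring(i, i + k), key -> new ArrayList<>()).add(i);
        }
    }

    /**
     * Looks up every overlapping k-mer of the query (step 1) and counts hits
     * per shift (subject offset minus query offset). Returns the shift with the
     * most hits (an arbitrary one if several tie), or null if nothing matches.
     */
    Integer bestShift(String query) {
        Map<Integer, Integer> hitsPerShift = new HashMap<>();
        for (int j = 0; j + k <= query.length(); j++) {
            List<Integer> positions = index.get(query.substring(j, j + k));
            if (positions == null) {
                continue;                 // no exact match for this k-mer
            }
            for (int i : positions) {
                hitsPerShift.merge(i - j, 1, Integer::sum);
            }
        }
        Integer best = null;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : hitsPerShift.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

For example, new KmerHashSearch("TTTTGATTACAGGG", 3).bestShift("GATTACAG") gives 4, meaning the query lines up starting at subject position 4; a full tool would typically go on to verify or extend the match along that diagonal.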
> > When it comes to web services I only know of a few. I have not used any
> > of these to a large extent, so I can't comment on how well they work for
> > large sequences, big jobs and so on.
> > 
> > http://www.ebi.ac.uk/Tools/webservices/services.html
> > http://xml.ddbj.nig.ac.jp/wsdl/index.jsp
> > 
> > Sadly they all use their own data encoding and service invocation setup,
> > so they're pretty darn annoying to use.
> > 
> > 
> > >6) Is there some example code on how to use DAS (as a client)?
> > >
> > >7) How can I submit an RFE?
> > >
> > >Sorry for so many questions in one post; I have a lot of catching up to
> > >do and was hoping for some guidance. Some of these have probably already
> > >been answered in earlier posts, but I have not been able to search the
> > >archives.
> > >
> > >Cheers,
> > >
> > >   .../Ola
> > >
> > >_______________________________________________
> > >Biojava-l mailing list  -  Biojava-l at biojava.org
> > >http://biojava.org/mailman/listinfo/biojava-l
> > >
> -- 
> Ola Spjuth <ola.spjuth at farmbio.uu.se>
> 
> 


