[Biojava-l] BioJava-X parsing of RichSequences

Richard Holland richard.holland at ebi.ac.uk
Wed May 3 08:38:38 UTC 2006


Ah yes, I hadn't thought about that aspect. In which case, a Stream-
capable format-guesser is not going to be possible. But there's nothing
stopping Ola from reading/writing to Streams directly, as long as he
knows what format they're in.
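
For the case where the format is not known in advance, the workaround
Mark describes below (copy the stream somewhere it can be re-read) could
be sketched roughly like this. It uses plain java.io only.
ReplayableStream is a made-up helper name, not part of BioJava, and the
sketch assumes the whole input fits in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a BioJava class): keeps a full in-memory copy
// of a one-shot stream so it can be opened as many times as needed.
final class ReplayableStream {
    private final byte[] data;

    private ReplayableStream(byte[] data) {
        this.data = data;
    }

    /** Drains the source completely into memory. Assumes the input is small enough. */
    static ReplayableStream drain(InputStream source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = source.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new ReplayableStream(buffer.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Each call returns a fresh stream positioned at the first byte. */
    ByteArrayInputStream open() {
        return new ByteArrayInputStream(data);
    }

    /** Number of bytes held in memory. */
    int size() {
        return data.length;
    }

    public static void main(String[] args) {
        byte[] fasta = ">seq1\nACGT\n".getBytes(StandardCharsets.US_ASCII);
        ReplayableStream copy = drain(new ByteArrayInputStream(fasta));
        // First pass, e.g. for looking at the header to work out the format.
        System.out.println((char) copy.open().read());
        // Second pass starts again from the first byte, e.g. for the real parse.
        System.out.println((char) copy.open().read());
    }
}
```

For large inputs the same idea works with a temp file in place of the
byte array, which can then be handed to the file-based IOTools methods.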

It's also worth pointing out that the format guesser should not be relied
on. It will sometimes guess wrong, and some formats it won't recognise at
all. It's there for simple applications only.
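
To illustrate why guessing is fragile, here is a deliberately naive
sniffer. It is not BioJava's guesser, just a sketch: NaiveFormatSniffer
and the strings it returns are hypothetical, and the only knowledge it
has is that FASTA records start with '>', GenBank entries with "LOCUS"
and EMBL entries with "ID ". It peeks using mark/reset on a
BufferedInputStream, which only works within the mark limit.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical, deliberately simplistic sniffer (not BioJava's implementation).
final class NaiveFormatSniffer {
    // Maximum number of bytes inspected; reset() is only valid within this limit.
    private static final int PEEK_LIMIT = 64;

    /** Peeks at the start of the stream, then rewinds it so a parser can read from the top. */
    static String guess(BufferedInputStream in) {
        try {
            in.mark(PEEK_LIMIT);
            byte[] head = new byte[PEEK_LIMIT];
            int total = 0;
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n == -1) {
                    break; // stream is shorter than the peek window
                }
                total += n;
            }
            in.reset();

            String start = new String(head, 0, total, StandardCharsets.US_ASCII);
            if (start.startsWith(">")) {
                return "fasta";
            }
            if (start.startsWith("LOCUS")) {
                return "genbank";
            }
            if (start.startsWith("ID ")) {
                return "embl"; // ambiguous: UniProt flat files start the same way
            }
            return "unknown";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fasta = ">seq1\nACGT\n".getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(fasta));
        System.out.println(guess(in));
        // guess() rewound the stream, so the next read sees the first byte again.
        System.out.println((char) in.read());
    }
}
```

A file with a leading blank line already comes back as "unknown", and a
prefix check alone cannot tell an EMBL entry from a UniProt one, so
anything beyond simple cases needs the format supplied explicitly.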

cheers,
Richard


On Wed, 2006-05-03 at 09:19 +0800, mark.schreiber at novartis.com wrote:
> Ola Spjuth <ola.spjuth at farmbio.uu.se>
> Sent by: biojava-l-bounces at lists.open-bio.org
> 05/02/2006 09:15 PM
> 
>  
>         To:     biojava-l <biojava-l at biojava.org>
>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>         Subject:        [Biojava-l] BioJava-X parsing of RichSequences
> 
> 
> > 1) I'd like to use Biojava-X with Bioclipse. Are there any problems
> > running it with Java 1.5 (as is required by Bioclipse)?
> 
> Shouldn't be a problem. Biojava-X doesn't itself use any Java 1.5 
> features, but JDK 1.5 (JRE 5.0) can compile and run BioJava.
> 
> >2) I would propose the addition of a readStream(...) method in
> >RichSequence.IOTools in addition to readFile(...). For the Bioclipse
> >project it would be most useful to be able to guess the format of a
> >Stream. As IOTools is marked final it cannot be subclassed.
> 
> You cannot do this because format guessing involves reading some data 
> from the source and then either pushing it back or re-opening the source 
> once the format has been guessed. You cannot guarantee pushback on a 
> Stream, and you cannot guarantee that you can re-open it. As a hack you 
> could read the stream into a temp file and pass that to IOTools. You may 
> also be able to read it into a byte array (e.g. via a 
> ByteArrayOutputStream) and read that back as a Stream.
> 
> >3) Is HashBioEntryDB a suitable base object for storing 1-N
> >RichSequences in memory or should I use RichSequence[]? Which solution
> >has the simplest toByte() method for writing to e.g. a File?
> >
> >So, basically I am looking for the most convenient way of doing:
> >
> >i)   Read byte[] (from a File containing 1-N sequences) into a base
> >object in memory (HashBioEntryDB or RichSequence[])
> >ii) Write the (HashBioEntryDB or RichSequence[]) to byte[] (and then
> >later to File using Bioclipse-methods)
> >
> 
> The simplest way to read in and write out directly is to take the 
> RichSequenceIterator you get from the IOTools read method and pass it 
> directly to the IOTools write method of your choice. If you want to 
> manipulate data in between, a RichSequence[] is probably smaller in 
> memory but not as user-friendly as a DB object.
> 
> You should also be aware that RichSequenceIterators are lazy, i.e. they 
> only read data from the file on each call to nextRichSequence(). You can 
> therefore manipulate each sequence as it comes in without worrying about 
> running out of memory.
> 
> Hope this helps,
> 
> - Mark
> 
> 
> 
-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416



