[Biojava-l] BioJava-X parsing of RichSequences
mark.schreiber at novartis.com
Wed May 3 01:19:02 UTC 2006
Ola Spjuth <ola.spjuth at farmbio.uu.se>
Sent by: biojava-l-bounces at lists.open-bio.org
05/02/2006 09:15 PM
To: biojava-l <biojava-l at biojava.org>
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] BioJava-X parsing of RichSequences
> 1) I'd like to use Biojava-X with Bioclipse. Are there any problems
> running it with Java 1.5 (as is required by Bioclipse)?
Shouldn't be a problem. BioJava-X doesn't use any Java 1.5 features itself,
but JDK 1.5 (JRE 5.0) can compile and run BioJava.
>2) I would propose the addition of a readStream(...) method in
>RichSequence.IOTools in addition to readFile(...). For the Bioclipse
>project it would be most useful to be able to guess the format of a
>Stream. As IOTools is marked final it cannot be subclassed.
You cannot do this because format guessing involves reading some data from
the source and then either pushing it back or re-opening the source once
the format has been guessed. You cannot guarantee that a Stream supports
pushback, and you cannot guarantee that it can be re-opened. As a hack you
could read the stream into a temp file and pass that to IOTools. You may
also be able to read it into a byte array (e.g. with a
ByteArrayOutputStream) and read that back as a Stream.
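
A minimal sketch of both workarounds, using only java.io. The class name
StreamBuffering and its two helper methods are hypothetical and not part of
BioJava; the temp file it produces is the kind of thing you could then hand
to the file-based IOTools reader. It is written without try-with-resources
so that it also compiles on JDK 1.5.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of BioJava.
public class StreamBuffering {

    /** Drains the stream into memory so the bytes can be re-read any number of times. */
    public static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Copies the stream to a temporary file that a file-based reader can open and re-open. */
    public static File toTempFile(InputStream in) throws IOException {
        File tmp = File.createTempFile("sequences", ".tmp");
        tmp.deleteOnExit();
        OutputStream out = new FileOutputStream(tmp);
        try {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } finally {
            out.close();
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        InputStream source =
                new ByteArrayInputStream(">seq1\nACGT\n".getBytes("US-ASCII"));
        byte[] data = toByteArray(source);

        // First pass: peek at the first byte, as a format guesser might.
        InputStream peek = new ByteArrayInputStream(data);
        char first = (char) peek.read();

        // Second pass: a fresh stream over the same bytes starts from the beginning.
        InputStream full = new ByteArrayInputStream(data);
        System.out.println("first char: " + first
                + ", bytes available on re-read: " + full.available());
    }
}
```

The in-memory variant is only sensible for inputs that fit comfortably in
the heap; for large files the temp-file variant is the safer choice.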
>3) Is HashBioEntryDB a suitable base object for storing 1-N
>RichSequences in memory or should I use RichSequence[]? Which solution
>has the simplest toByte() method for writing to e.g. a File?
>
>So, basically I am looking for the most convenient way of doing:
>
>i) Read byte[] (from a File containing 1-N sequences) into a base
>object in memory (HashBioEntryDB or RichSequence[])
>ii) Write the (HashBioEntryDB or RichSequence[]) to byte[] (and then
>later to File using Bioclipse-methods)
>
The simplest way to read in and write out directly is to take the
RichSequenceIterator you get from the IOTools read method and pass it
directly to the IOTools write method of choice. If you want to manipulate
data in between, a RichSequence[] is probably smaller in memory but not as
user friendly as a DB object.
You should also be aware that RichSequenceIterators are lazy, i.e. they
only read data from a file on each call to nextRichSequence(), so you can
manipulate each sequence as it comes in without having to worry about
running out of memory.
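
To illustrate what "lazy" means here, the following is a rough sketch of
the pattern in plain Java. It is not BioJava's implementation: the class
LazyFastaIterator is hypothetical, it returns raw FASTA text rather than
RichSequence objects, and it assumes that every record starts with a line
beginning with '>'. The point is only that each call to next() reads one
record from the underlying reader and no more.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration of a lazy record iterator, not BioJava code.
public class LazyFastaIterator implements Iterator<String> {
    private final BufferedReader reader;
    // Header line of the next unread record, or null once input is exhausted.
    private String pendingHeader;

    public LazyFastaIterator(BufferedReader reader) throws IOException {
        this.reader = reader;
        // Skip anything before the first header line.
        String line = reader.readLine();
        while (line != null && !line.startsWith(">")) {
            line = reader.readLine();
        }
        pendingHeader = line;
    }

    public boolean hasNext() {
        return pendingHeader != null;
    }

    /** Reads exactly one record, stopping at the next header line. */
    public String next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        StringBuilder record = new StringBuilder(pendingHeader).append('\n');
        pendingHeader = null;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line; // remember it for the following call
                    break;
                }
                record.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return record.toString();
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in =
                new BufferedReader(new StringReader(">a\nAC\nGT\n>b\nTT\n"));
        Iterator<String> it = new LazyFastaIterator(in);
        while (it.hasNext()) {
            // Only the current record is held in memory at this point.
            System.out.print(it.next());
        }
    }
}
```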
Hope this helps,
- Mark