[Biojava-dev] [Biojava-l] file i/o with ArrayList

Paolo Pavan paolo.pavan at gmail.com
Fri Feb 6 15:38:04 UTC 2015


Hi Stefan,
I had a look at the GenbankWriter because I may also need it in the
future. Can you please specify which issues you are running into? I ran
a few quick tests and everything seemed to work for me.

Just in case: if you are reading and then writing a Genbank file, are you
using the latest release, biojava 4.0.0? This would explain empty genbank
files in output (if I have understood correctly what you have done).

Paolo

2015-02-06 11:03 GMT+01:00 stefan harjes <stefanharjes at yahoo.de>:

> @Andreas: Yes I understand, thanks anyhow.
>
> @Paolo: I will have another look at GenbankWriter; maybe I will find some
> time.
>
> Cheers
> Stefan
>
>
>
>   Andreas Prlic <andreas at sdsc.edu> wrote at 7:01 on Friday, 6 February
> 2015:
>
>
> Hi Stefan,
>
> thanks for your reply. You are trying to use the code base in a way that
> has not been done before. While I share your desire that this should work
> in principle, I think it is also important to point out that we never
> promised that serialization would be a supported feature. We started a
> thread to add better support on this here:
> https://github.com/biojava/biojava/issues/249 .
>
> Regarding your project: It seems it would make sense to split your array
> of sequences into two: DNA sequences and protein sequences. Dealing with
> each of those separately might be easier.
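
A rough sketch of how the split suggested above could look, partitioning a mixed list by runtime type before writing each half separately. The `Seq`, `Dna` and `Protein` types are hypothetical stand-ins so that the example compiles without BioJava; with the real library the type tokens would presumably be `DNASequence.class` and `ProteinSequence.class`, which is an assumption and untested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the BioJava sequence types (not the real API).
interface Seq { String text(); }

final class Dna implements Seq {
    private final String text;
    Dna(String text) { this.text = text; }
    public String text() { return text; }
}

final class Protein implements Seq {
    private final String text;
    Protein(String text) { this.text = text; }
    public String text() { return text; }
}

final class SplitByType {
    /** Returns the elements of mixed that are instances of type, in their original order. */
    static <T extends Seq> List<T> only(List<? extends Seq> mixed, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Seq s : mixed) {
            if (type.isInstance(s)) {
                out.add(type.cast(s));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Seq> mixed = Arrays.asList(new Dna("AGCT"), new Protein("MKV"), new Dna("TTGA"));
        List<Dna> dna = only(mixed, Dna.class);
        List<Protein> proteins = only(mixed, Protein.class);
        System.out.println(dna.size() + " DNA, " + proteins.size() + " protein");
    }
}
```

Each resulting list is homogeneous, so it can be handed to a writer that accepts only one sequence type.
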
>
> Andreas
>
>
> On Wed, Feb 4, 2015 at 3:42 PM, stefan harjes <stefanharjes at yahoo.de>
> wrote:
>
> Hi Andreas,
>
> yes I took a look at FastaWriterHelper as well as GenbankWriter, and they
> only seem to implement writing the name and sequence as fasta. They also
> do not allow reading or writing a mixed array of protein and DNA
> sequences. I asked myself what the point is of constructing a complicated
> sequence with annotations, features and links if I can only write fasta.
>
> This led me to check why one of the most basic classes of biojava,
> Sequence (i.e. AbstractSequence), is not serializable.
> (Isn't it like String for Java?)
>
> The first thing I noticed is that for some reason every sequence has a
> proxyloader. As far as I understand, the proxy is implemented so that
> a very big sequence need not be loaded in full. Sure, then you can
> load sequences which have Gigabase length. But I have never in my 25 years
> of biochemistry actually worked with a single sequence of > 1GB. While
> there are some plant chromosomes which might fit this description, I would
> argue that the vast majority of biological sequences are much smaller and
> thus do not need a proxy for a single sequence. I would therefore conclude
> that only a small subset of ChromosomeSequence instances might need a
> proxyreader implementation. Should it not then be implemented there
> rather than in the most basic class?
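
For readers unfamiliar with the idea being discussed, here is a minimal sketch of lazy loading, assuming the proxy's purpose is as described above: the residues are fetched only on first access. `LazySequence` is a hypothetical class written for illustration and is not BioJava's proxy reader API.

```java
import java.util.function.Supplier;

// Hypothetical lazy holder: NOT BioJava's proxy reader API.
final class LazySequence {
    private final Supplier<String> loader; // e.g. reads from a file or a remote service
    private String cached;                 // null until the first call to residues()

    LazySequence(Supplier<String> loader) {
        this.loader = loader;
    }

    /** Loads the residues on the first call and reuses them afterwards. */
    synchronized String residues() {
        if (cached == null) {
            cached = loader.get();
        }
        return cached;
    }

    /** True once the residues have been loaded. */
    synchronized boolean isLoaded() {
        return cached != null;
    }
}
```

The loader itself (a lambda or a file handle) is typically what makes such an object awkward to serialize, which may be part of the difficulty described in this thread; that is a guess, not something verified against the BioJava sources.
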
>
> The first class which prevents serialization is, as you mentioned,
> NucleotideCompound. I lack the biojava experience to say what is essential
> in NucleotideCompound and why it does not allow an empty constructor. But I
> saw for example in biojava 3.1 that compounds are allowed to have flexible
> name length, which I have never seen in actual sequence data, where it is
> always one or three characters. Is it not a better strategy to keep basic
> classes such as Sequence and Compound more basic in order to allow
> serialization? Implementation of more complex features could then be moved
> to classes which extend the basic classes.
>
> In my humble opinion one could instantiate a compound without a 'base'
> name, and once this compound is added to the compound set, one could check
> that it actually has a base name?
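
The proposal above, sketched under loud assumptions: the compound gets a no-argument constructor and the check for a base name moves to the point where the compound is added to a set. `SimpleCompound` and `SimpleCompoundSet` are hypothetical and do not reflect how BioJava's `NucleotideCompound` or its compound sets are actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified compound: NOT BioJava's NucleotideCompound.
final class SimpleCompound {
    private String base; // may stay null until set, which makes a no-arg constructor possible

    SimpleCompound() { }

    SimpleCompound(String base) {
        this.base = base;
    }

    String getBase() { return base; }

    void setBase(String base) { this.base = base; }
}

// Hypothetical compound set that validates on insertion rather than on construction.
final class SimpleCompoundSet {
    private final Map<String, SimpleCompound> byBase = new LinkedHashMap<>();

    /** Adds the compound, rejecting it if it has no base name yet. */
    void add(SimpleCompound c) {
        if (c.getBase() == null || c.getBase().isEmpty()) {
            throw new IllegalArgumentException("compound has no base name");
        }
        byBase.put(c.getBase(), c);
    }

    boolean contains(String base) {
        return byBase.containsKey(base);
    }
}
```

The trade-off is that the compound becomes mutable, so its base name could still be changed after it has passed the check.
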
>
> I do not want to sound like a know-it-all and am not trying to reinvent
> biojava. However, to be honest, the (unsuccessful) effort of trying to
> serialize an ArrayList<Sequence<?>>, whether to send it over TCP/IP,
> to JSON or to disk, has been so frustrating and time-consuming that I
> am actually considering changing to jython/biopython or biojavaX, or
> writing my own implementation.
>
> Cheers
> Stefan
>
>   Andreas Prlic <andreas at sdsc.edu> wrote at 4:32 on Thursday, 5 February
> 2015:
>
>
>
>
> Hi Stefan,
>
> just another quick follow up. You took a look at FastaWriterHelper and it
> was not useful, right? You need to serialize some header information as
> well, or what was the problem with it?
>
>
> http://www.biojava.org/docs/api/org/biojava/nbio/core/sequence/io/FastaWriterHelper.html
>
> Thanks,
>
> Andreas
>
>
> On Wed, Feb 4, 2015 at 7:13 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>
> Thanks for pointing this out, Stefan. The problem is that the
> NucleotideCompound class does not have a zero-args constructor. That means
> you need to tweak kryo a bit. Kryo can be configured to use an
> InstantiatorStrategy to handle creating instances of a class.
> https://github.com/EsotericSoftware/kryo/blob/master/README.md
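
To illustrate the failure mode described above without pulling in Kryo: a library that creates objects reflectively through a no-argument constructor has nothing to call when a class declares only constructors with parameters. `NeedsArg` is a hypothetical stand-in for `NucleotideCompound`, and the check is plain JDK reflection, not Kryo's own code.

```java
// Hypothetical class with only a one-argument constructor, standing in for NucleotideCompound.
final class NeedsArg {
    private final String base;

    NeedsArg(String base) {
        this.base = base;
    }

    String getBase() { return base; }
}

final class ZeroArgCheck {
    /** True if type declares a constructor that takes no arguments. */
    static boolean hasZeroArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasZeroArgConstructor(NeedsArg.class));      // prints false
        System.out.println(hasZeroArgConstructor(StringBuilder.class)); // prints true
    }
}
```

The Kryo README linked above describes replacing the default behaviour through `kryo.setInstantiatorStrategy(...)` with an Objenesis-based strategy. That call is not shown here because it has not been tested against BioJava classes.
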
>
> Having said that, we need to improve the API and make something like this
> easier.
>
> Andreas
>
>
>
> On Wed, Feb 4, 2015 at 2:54 AM, stefan harjes <stefanharjes at yahoo.de>
> wrote:
>
> I finally had some time to try the serialization/deserialization library
> (Kryo) you mentioned, but I cannot seem to get it to work. I cannot even
> save a DNASequence:
>
> void test() {
>     Kryo kryo = new Kryo();
>     DNASequence dna=null;
>     try {
>         dna = new DNASequence("AGCT");
>     } catch (CompoundNotFoundException e1) {
>         // TODO Auto-generated catch block
>         e1.printStackTrace();
>     }
>     try {
>         Output output = new Output(new FileOutputStream("test.ser"));
>         kryo.writeObject(output, dna);
>         output.close();
>     } catch (FileNotFoundException e) {
>         // TODO Auto-generated catch block
>         e.printStackTrace();
>     }
>     try {
>         Input input = new Input(new FileInputStream("test.ser"));
>         dna = kryo.readObject(input, DNASequence.class);
>         input.close();
>     } catch (FileNotFoundException e) {
>         // TODO Auto-generated catch block
>         System.out.println("file not found");
>         e.printStackTrace();
>     }
> }
> I tried several Kryo calls and also class registration, but I cannot get
> it to work. Any ideas?
>
>
> Cheers
> Stefan
>
>
>   Andreas Prlic <andreas at sdsc.edu> wrote at 3:47 on Saturday, 31 January
> 2015:
>
>
> Hi Stefan,
>
> for your use case (save and load at server start/stop) I'd recommend the
> Kryo library. It will store your data in a binary format. It should take
> only two lines of code each to persist and load the data.
> https://github.com/EsotericSoftware/kryo
>
> You are right, writing is not very well developed, but then there are so
> many utility libraries in Java that can be used for efficient
> serialization/deserialization in many ways, once you have an object in
> memory.
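
As a library-free fallback for the save/load use case, here is a minimal sketch that flattens each sequence to a type tag plus its residues and round-trips the list with `DataOutputStream`. `SeqRecord` and `SeqRecordIo` are hypothetical names. With BioJava the residues would presumably come from `getSequenceAsString()` and be rebuilt with a constructor such as `new DNASequence(String)`, and annotations and features are NOT preserved by this approach.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical flat record: a type tag ("DNA" or "PROTEIN") plus the residues as text.
final class SeqRecord {
    final String type;
    final String residues;

    SeqRecord(String type, String residues) {
        this.type = type;
        this.residues = residues;
    }
}

final class SeqRecordIo {
    /** Writes a record count followed by (type, residues) pairs. */
    static byte[] write(List<SeqRecord> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(records.size());
            for (SeqRecord r : records) {
                out.writeUTF(r.type);
                out.writeUTF(r.residues);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads back what write() produced, in the same order. */
    static List<SeqRecord> read(byte[] data) throws IOException {
        List<SeqRecord> records = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String type = in.readUTF();
                String residues = in.readUTF();
                records.add(new SeqRecord(type, residues));
            }
        }
        return records;
    }
}
```

Swapping the byte-array streams for `FileOutputStream` and `FileInputStream` gives persistence to disk. Note that `writeUTF` limits each string to 65535 encoded bytes, which suits small sequences but not whole chromosomes.
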
>
> Andreas
>
>
>
> On Fri, Jan 30, 2015 at 3:01 AM, stefan harjes <stefanharjes at yahoo.de>
> wrote:
>
> Hi biojava-l
>
>
>
> I have a huge number of small sequences in a list
> (ArrayList<Sequence<?>>) which I would like to store on disk for server
> start and stop. Unfortunately Sequence is not serializable, so I searched
> and found that GenbankWriterHelper.writeSequences(OutputStream os,
> Collection<Sequence<?>> seqs) should be able to do the job.
> However when looking at GenbankReaderHelper, there are no methods which
> correspond to the above writer method. Am I on the wrong track completely?
>
> When looking at the writer/reader helpers, I think I remember reading that
> they are rudimentary and save only the sequence (fasta)? I would expect
> that such an advanced version of biojava (4.0 is being prepared?) has a
> standard way to serialize rich sequences, or arrays of them, in order to
> send them around on streams, JSON etc.?
>
> Any help would be appreciated
>
> Cheers
> Stefan
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biojava-dev
>