[Biojava-dev] BioJava 3 code usage examples

Richard Holland holland at eaglegenomics.com
Wed Nov 19 12:00:17 UTC 2008


Hello.

Thanks for your feedback. You are right that we've continued to
provide a Symbol-based alphabet/symbol structure, but it is no longer
a central concept, nor are you required to use it.

You'll notice that when FASTA is read using the new parser, it reads
the sequence from the FASTA file as a simple String (actually, a
CharSequence). If you want to work with it as a String/CharSequence
and don't want to convert it into Symbols/Lists, you can do so. This
is the big change from the existing BioJava way of doing things, which
automatically converts everything into the BioJava object model
instead of giving the user the choice of what to do with it. This
change is consistent with the part of the design document you quote in
your email.

So users now get to choose whether to work with the sequences directly
as Strings/CharSequences or to convert them into Symbols/Lists, and
they can base that choice on the speed and memory usage they observe
locally.
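
To make the two options concrete, here is a rough sketch in plain Java.
It does not use the BioJava 3 API: the class name, the firstSequence
helper and the use of List<Character> as a stand-in for a Symbol list
are all assumptions made up for illustration. It reads the residues of
a FASTA record into a CharSequence, then searches for a motif either
directly on the text or after converting to a list of objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: none of these names come from BioJava.
final class SequenceChoiceSketch {

    // Hypothetical helper: returns the residues of the first FASTA
    // record in the given text as a CharSequence.
    public static CharSequence firstSequence(String fastaText) {
        StringBuilder residues = new StringBuilder();
        boolean inRecord = false;
        for (String line : fastaText.split("\\R")) {
            if (line.startsWith(">")) {
                if (inRecord) {
                    break; // a second record starts here, so stop
                }
                inRecord = true;
            } else if (inRecord) {
                residues.append(line.trim());
            }
        }
        return residues;
    }

    // Option 1: keep the sequence as text and search it directly.
    public static int findInText(CharSequence sequence, String motif) {
        return sequence.toString().indexOf(motif);
    }

    // Option 2: convert to a list of objects first. List<Character> is
    // only a stand-in for a richer symbol model.
    public static List<Character> toSymbolList(CharSequence sequence) {
        List<Character> symbols = new ArrayList<>(sequence.length());
        for (int i = 0; i < sequence.length(); i++) {
            symbols.add(sequence.charAt(i));
        }
        return symbols;
    }

    public static int findInSymbols(List<Character> sequence, List<Character> motif) {
        return Collections.indexOfSubList(sequence, motif);
    }

    public static void main(String[] args) {
        CharSequence seq = firstSequence(">demo\nACGTAC\nGGATCC\n");
        // Both searches report the same zero-based start position.
        System.out.println(findInText(seq, "GGATCC"));
        System.out.println(findInSymbols(toSymbolList(seq), toSymbolList("GGATCC")));
    }
}
```

Both calls find the motif at the same offset; what differs is the
memory and time each representation costs, which is the trade-off you
would measure locally before picking one.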

cheers,
Richard


2008/11/19 Hongyu Zhang <me at hongyu.org>:
> Hi Richard,
>
> Thanks for your great work! I noticed from your examples that you decided to continue to use the Symbol object-based model to represent sequences, even though the BioJava3 design page ( http://biojava.org/wiki/BioJava3_Design ) says:
> "Sequences are perfectly happy as Strings unless you want to do complex
> things like store base quality information, and only at that point
> should you want to convert them into more complex object models."
>
>
> The original BioJava tutorial ( http://biojava.org/wiki/BioJava:Tutorial:Symbols_and_SymbolLists#Doesn.27t_this_all_waste_memory.3F ) discussed the difference in memory usage between the Symbol object-based sequence representation and the String-based one, but it didn't address the speed issue. One of the advantages of the Java String library is that it was optimized using native machine code, so I think a Symbol object-based sequence representation would be slower than a String-based one for certain operations such as substring search.
>
> Let me know if I missed something. Thanks!
>
> Best,
>
> Hongyu Zhang, Ph.D.
> Ceres Inc., Thousand Oaks, CA
> Cell: 805-405-5394
> Fax: 866-447-8750
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


