[Biojava-dev] Fwd: [Biojava-l] file i/o with ArrayList

Paolo Pavan paolo.pavan at gmail.com
Fri Feb 13 15:51:12 UTC 2015


2015-02-13 16:26 GMT+01:00 Peter Cock <p.j.a.cock at googlemail.com>:

> On Fri, Feb 13, 2015 at 2:21 PM, Paolo Pavan <paolo.pavan at gmail.com>
> wrote:
> > Good point!
> > The InsdcParser class is responsible for parsing locations and nested
> > locations with splits. It reads and records the numbers in the location
> > string exactly as they appear in the GenBank file, so it is 1-based.
>
> As an aside, Biopython actually has two GenBank parsers.
> One builds Biopython objects with zero-based coordinates.
> The other (less used) is a more faithful as-is representation
> which just exposes the feature locations as strings (unmodified).
>
> > Andreas, should BioJava be considered 0-based or 1-based for sequence
> > coordinates? Or is this behaviour not codified?
>
> See below for my outsider's impression.
>
> > That said, no tests have actually failed, so I thought that sequence
> > coordinates are 1-based, as in other bio projects.
>
> Not all the other Bio* projects. Python strings/arrays/etc and
> therefore Biopython sequences and feature co-ordinates are
> zero-based.
>

Ok! Good to know.


>
> According to the following pages the BioPerl and BioRuby
> sub-sequence methods are one-based though:
> http://search.cpan.org/~cjfields/BioPerl/Bio/Seq.pm#subseq
> http://bioruby.org/rdoc/Bio/Sequence/Common.html
>
> > Also, in a last check before sending this email, I saw that
> > AbstractSequence.getSubSequence() (and ProxyView) do not apply any
> > transformation, so I can conclude that BioJava is 1-based.
>
> Since Java strings are zero based, I would have guessed
> (without checking the BioJava documentation) that the
> BioJava sequences are also zero based. I was surprised
> to read here that BioJava's sequences are one-based:
>
> http://biojava.org/wiki/BioJava:Cookbook:Sequence:SubSequence
>
> See also: https://www.biostars.org/p/49909/#130967
>

I understand your concern. However, in Java you are compelled to use
methods, and a user cannot work directly on the string representation of
the sequence. So in my opinion it is not evil to have a method that
abstracts away the machine logic and comes closer to the user/domain logic.
That is the concept of encapsulation itself. But of course it is also a
matter of taste.
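
To make the point concrete, here is a minimal sketch of what I mean by
hiding the machine logic behind a method. It is illustrative only:
OneBasedCoordinates and its subSequence method are hypothetical names, not
BioJava classes, and a plain String stands in for a real sequence object.
The only thing it relies on is that String.substring is 0-based with an
exclusive end index.

```java
// Illustrative only: OneBasedCoordinates is a hypothetical helper, not a BioJava class.
final class OneBasedCoordinates {

    // Returns residues start..end, both 1-based and inclusive (the GenBank convention).
    static String subSequence(String residues, int start, int end) {
        if (start < 1 || end > residues.length() || start > end) {
            throw new IndexOutOfBoundsException(
                "start=" + start + ", end=" + end + ", length=" + residues.length());
        }
        // String.substring is 0-based with an exclusive end index,
        // so only the start index needs shifting.
        return residues.substring(start - 1, end);
    }

    public static void main(String[] args) {
        String seq = "ACGTAC";
        System.out.println(subSequence(seq, 2, 4)); // 1-based, inclusive: CGT
        System.out.println(seq.substring(2, 4));    // raw 0-based call:   GT
    }
}
```

The caller only ever sees domain coordinates; the single start - 1 lives in
one place inside the method.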



> So, getting back to the GenBank location problem,
> it sounds like since BioJava uses 1-based sequences,
> the GenBank feature locations should be left as is,
> and the start-1 operation in the writer should be
> removed as Stefan suggested.
>
> Regards,
>
> Peter
>
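
For the record, a small sketch of why the shift breaks the round trip. This
is not the actual BioJava parser or writer code: LocationRoundTrip, parse,
write and writeWithShift are hypothetical names, and only the simplest
"start..end" location form is handled.

```java
// Illustrative only: hypothetical names, not the BioJava GenBank reader/writer.
final class LocationRoundTrip {

    // Parses a simple "start..end" location, keeping the numbers as written (1-based).
    static int[] parse(String location) {
        String[] parts = location.split("\\.\\.");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    // Writes the coordinates unchanged, which is correct when the model is 1-based.
    static String write(int start, int end) {
        return start + ".." + end;
    }

    // Mimics a writer that applies start - 1: the output no longer matches the input.
    static String writeWithShift(int start, int end) {
        return (start - 1) + ".." + end;
    }

    public static void main(String[] args) {
        int[] loc = parse("100..200");
        System.out.println(write(loc[0], loc[1]));          // 100..200
        System.out.println(writeWithShift(loc[0], loc[1])); // 99..200
    }
}
```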