[Biojava-dev] Fwd: [Biojava-l] file i/o with ArrayList

Peter Cock p.j.a.cock at googlemail.com
Fri Feb 13 15:26:45 UTC 2015


On Fri, Feb 13, 2015 at 2:21 PM, Paolo Pavan <paolo.pavan at gmail.com> wrote:
> Good point!
> The InsdcParser class is responsible for parsing locations and nested
> locations with splits. It reads and records the numbers in the location
> string as they appear in the GenBank file, so it is 1-based.

As an aside, Biopython actually has two GenBank parsers.
One builds Biopython objects with zero-based coordinates.
The other (less used) is a more faithful as-is representation
which just exposes the feature locations as strings (unmodified).
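For anyone mapping between the two conventions, here is a minimal Java sketch of turning a 1-based, inclusive GenBank span into the zero-based, half-open form that Python slicing uses. It assumes a simple "start..end" location with no join, complement or fuzzy ends. The class and method names are hypothetical and are not BioJava or Biopython API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocationConversion {
    // Matches only a simple span such as "123..456"; joins, complements and
    // fuzzy ends like "<1..>200" are deliberately not handled in this sketch.
    private static final Pattern SIMPLE_SPAN = Pattern.compile("(\\d+)\\.\\.(\\d+)");

    /** Returns {start, end} as a zero-based, half-open interval. */
    static int[] toZeroBasedHalfOpen(String location) {
        Matcher m = SIMPLE_SPAN.matcher(location.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported location: " + location);
        }
        int oneBasedStart = Integer.parseInt(m.group(1));
        int oneBasedEnd = Integer.parseInt(m.group(2)); // inclusive
        // Only the start shifts: an inclusive 1-based end has the same value
        // as an exclusive 0-based end.
        return new int[] {oneBasedStart - 1, oneBasedEnd};
    }

    public static void main(String[] args) {
        int[] span = toZeroBasedHalfOpen("123..456");
        System.out.println(span[0] + " " + span[1]); // prints "122 456"
    }
}
```

Note that only the start needs adjusting, which is why an off-by-one in just one end of a writer is easy to introduce.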

> Andreas, should BioJava be considered 0-based or 1-based for sequence
> coordinates? Or is this undocumented behaviour?

See below for my outsider's impression.

> That said, no tests have actually failed, so I thought that sequence
> coordinates are 1-based, as in other Bio* projects.

Not all the other Bio* projects. Python strings/arrays/etc and
therefore Biopython sequences and feature co-ordinates are
zero-based.

According to the following pages the BioPerl and BioRuby
sub-sequence methods are one-based though:
http://search.cpan.org/~cjfields/BioPerl/Bio/Seq.pm#subseq
http://bioruby.org/rdoc/Bio/Sequence/Common.html

> Also, in a final check before sending this email, I found that
> AbstractSequence.getSubSequence() (and ProxyView) does not apply any
> transformation, so I conclude that BioJava is 1-based.

Since Java strings are zero-based, I would have guessed
(without checking the BioJava documentation) that the
BioJava sequences are also zero based. I was surprised
to read here that BioJava's sequences are one-based:

http://biojava.org/wiki/BioJava:Cookbook:Sequence:SubSequence

See also: https://www.biostars.org/p/49909/#130967
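To illustrate the mismatch, here is a minimal sketch of a 1-based, inclusive sub-sequence helper built on Java's zero-based, end-exclusive String.substring. The helper is hypothetical and only mimics the convention the cookbook page describes; it is not BioJava's implementation.

```java
public class OneBasedSubSequence {
    /**
     * Returns residues start..end, counted from 1 with both ends inclusive,
     * using Java's zero-based, end-exclusive String.substring underneath.
     */
    static String subSequence(String seq, int start, int end) {
        if (start < 1 || end > seq.length() || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return seq.substring(start - 1, end);
    }

    public static void main(String[] args) {
        String seq = "ATGCATGC";
        System.out.println(seq.substring(0, 3));    // Java, 0-based: ATG
        System.out.println(subSequence(seq, 1, 3)); // 1-based inclusive: ATG
    }
}
```

Both calls return the same first codon, but with different arguments, which is exactly where a conversion can go wrong.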

So, getting back to the GenBank location problem,
it sounds like since BioJava uses 1-based sequences,
the GenBank feature locations should be left as is,
and the start-1 operation in the writer should be
removed as Stefan suggested.
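A minimal sketch of the round-trip argument, assuming the parser stores locations 1-based as Paolo describes. The helper names are hypothetical; this is not the actual BioJava GenBank writer code.

```java
public class LocationWriterSketch {
    // Hypothetical writer helper: coordinates are assumed to be held exactly
    // as parsed (1-based, inclusive), so no offset is applied on output.
    static String formatSpan(int oneBasedStart, int oneBasedEnd) {
        return oneBasedStart + ".." + oneBasedEnd;
    }

    // The behaviour under discussion: subtracting 1 from an already 1-based
    // start makes the written feature begin one base earlier.
    static String formatSpanWithStartMinusOne(int oneBasedStart, int oneBasedEnd) {
        return (oneBasedStart - 1) + ".." + oneBasedEnd;
    }

    public static void main(String[] args) {
        int start = 123, end = 456; // as read from "123..456"
        System.out.println(formatSpan(start, end));                  // prints 123..456
        System.out.println(formatSpanWithStartMinusOne(start, end)); // prints 122..456
    }
}
```

Only the first version lets a read/write round trip reproduce the original location.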

Regards,

Peter

