[Biojava-dev] Fwd: [Biojava-l] file i/o with ArrayList

Paolo Pavan paolo.pavan at gmail.com
Fri Feb 13 14:21:22 UTC 2015


Good point!
The InsdcParser class is responsible for parsing locations and nested
locations with splits. It reads and records the numbers in the location
string exactly as they appear in the GenBank file, so they are 1-based.

Andreas, should BioJava be considered 0-based or 1-based for sequence
coordinates? Or is this unspecified behaviour?
That said, no tests have actually failed, so I thought that sequence
coordinates are 1-based, as in other bio projects.

Also, in my last check before sending this email, I saw that
AbstractSequence.getSubSequence() (and ProxyView) do not apply any
transformation, so I can conclude that BioJava is 1-based.
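
To make the two conventions concrete, here is a minimal sketch of the
conversion being discussed. The class and method names are hypothetical
and are not part of BioJava; it only assumes that INSDC locations are
1-based with an inclusive end, and that Java's String.substring() is
0-based with an exclusive end (the same counting as a Python slice).

```java
// Hypothetical helper for illustration only; not BioJava API.
public final class CoordinateSketch {
    private CoordinateSketch() {}

    /** INSDC 1-based inclusive start -> 0-based start (slice style). */
    public static int toZeroBasedStart(int insdcStart) {
        return insdcStart - 1;
    }

    /** 0-based start -> INSDC 1-based start, as needed when writing a file. */
    public static int toInsdcStart(int zeroBasedStart) {
        return zeroBasedStart + 1;
    }

    /**
     * Extracts a subsequence using coordinates exactly as written in a
     * GenBank location such as "3..5". Only the start is shifted: a
     * 1-based inclusive end equals a 0-based exclusive end numerically.
     */
    public static String subSequence(String seq, int insdcStart, int insdcEnd) {
        return seq.substring(toZeroBasedStart(insdcStart), insdcEnd);
    }

    public static void main(String[] args) {
        // Location 3..5 on ACGTACGT covers the 3rd to 5th residues.
        System.out.println(subSequence("ACGTACGT", 3, 5));
    }
}
```

If the library stores coordinates 1-based, the shift belongs only at the
point where a 0-based API is called; if it stores them 0-based, the shift
has to be applied on parsing and undone on writing, so that a file
round-trips unchanged.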

Paolo

2015-02-13 13:42 GMT+01:00 Peter Cock <p.j.a.cock at googlemail.com>:

> On Fri, Feb 13, 2015 at 9:53 AM, stefan harjes <stefanharjes at yahoo.de>
> wrote:
> > Hi Erik,
> >
> > thanks for the offer, but the little mistake with the enumeration was
> > very quick to find. There is an increment of one added to the start
> > position of each location for some reason. If you change it from 1 to
> > 0, the sequence location start positions are no longer incremented.
> >
> > you can find it in line 284 of GenericInsdcHeaderFormat.java
> >
> > Cheers
> > Stefan
>
> Erik said the code was copied and translated from Biopython,
> in which case we deliberately adjust the start position on parsing
> *and* writing to use Python style slice counting, rather than the
> INSDC feature table one-based counting.
>
> I would first check the BioJava policy on feature coordinates,
> it may be the bug is actually in the BioJava GenBank parser?
>
> Regards,
>
> Peter
>