[Open-bio-l] GenBank and EMBL - join(complement(...)) vs complement(join(...))

Peter biopython at maubp.freeserve.co.uk
Mon Jan 11 14:42:51 UTC 2010


On Mon, Jan 11, 2010 at 10:42 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Sat, Jan 9, 2010 at 2:54 AM, Chris Fields <cjfields at illinois.edu> wrote:
>>
>> I can't recall which of the two BioPerl uses, but if it helps it standardizes
>> on one of them for output but parses both.  I think GenBank and EMBL
>> have converged on using the same format, but I'm not absolutely sure
>> on that.
>>
>> Ironic actually that I can't remember, as I'm the author of the above page
>> and started a discussion about this very subject a while back on the list
>> (in an effort to sort out some issues with BioPerl locations).
>>
>> chris
>
> Thanks Chris,
>
> I'm glad my email made sense - on re-reading I had made more typos
> than usual :(
>
> As to the BioPerl behaviour, I think I know enough to get BioPerl
> to convert GenBank files into EMBL or vice versa, and thus find
> out what it does...

After stumbling over this issue, I made some progress:
http://lists.open-bio.org/pipermail/bioperl-l/2010-January/031889.html

> I hope you are right that GenBank and EMBL have converged on
> using the same format - any confirmation of this (and which format)
> would be very welcome.

I took the Arabidopsis thaliana chloroplast complete genome as
an example. This is AP000423 in EMBL, NC_000932 in GenBank
(although there are some minor differences in the annotation).
Looking at these files (both from early 2009), they seem to use the
same feature location styles, e.g. for reverse strand joins:

complement(join(97999..98793,69611..69724))

e.g. for mixed strand features:

join(complement(69611..69724),139856..140087,140625..140650)

I'm going to assume that this is what both EMBL and GenBank
will be using in future.

I have confirmed that BioPerl 1.6.x preserves these location styles
when converting EMBL or GenBank input to EMBL or GenBank output. I now
need to find a reverse strand "join(complement(..." example to test with...

Peter



