[Open-bio-l] GenBank and EMBL - join(complement(...)) vs complement(join(...))

Chris Fields cjfields at illinois.edu
Sat Jan 9 02:54:41 UTC 2010


On Jan 8, 2010, at 11:33 AM, Peter wrote:

> Hi all,
> 
> Currently Biopython reads both GenBank and EMBL files, and writes GenBank.
> I'm looking at writing EMBL files too - and wanted to see if any of you knew
> anything definitive on join(complement(...)) vs complement(join(...)) in
> feature location strings.
> 
> References:
> http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html
> http://www.genbank.lipi.go.id/docs/FTv6_2.html
> 
> Both give this as an example, two ways of writing the same location:
> 
> complement(join(2691..4571,4918..5163))
>                          Joins regions 2691 to 4571 and 4918 to 5163, then
>                          complements the joined segments (the feature is
>                          on the strand complementary to the presented strand)
> 
> join(complement(4918..5163),complement(2691..4571))
>                          Complements regions 4918 to 5163 and 2691 to 4571,
>                          then joins the complemented segments (the feature is
>                          on the strand complementary to the presented strand)
> 
> This suggests that either form is valid in both GenBank and EMBL
> format files.
> 
> Anecdotally, I have observed GenBank uses the first form (which is
> shorter) while EMBL seems to use the second form (which to me is
> logical, if you consider how to represent mixed strand features).
> This seems to fit with this BioPerl wiki page:
> 
> http://www.bioperl.org/wiki/BioPerl_Locations
> 
> Is there any official documentation regarding this discrepancy that
> I have overlooked? Am I right to think that GenBank and EMBL do
> still use these different forms (any word on whether they might
> standardise one way or the other in future)?
> 
> What do EMBOSS, BioPerl, etc do in this situation? Do you treat
> these two examples the same on parsing, and use one layout
> when writing GenBank and the other for writing EMBL files?
> 
> Peter

I can't recall which of the two forms BioPerl writes, but if it helps, it standardizes on one of them for output and parses both.  I think GenBank and EMBL have converged on using the same form, but I'm not absolutely sure of that.

Ironic actually that I can't remember, as I'm the author of the above page and started a discussion about this very subject a while back on the list (in an effort to sort out some issues with BioPerl locations).
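
For what it's worth, here is a rough sketch in plain Python of the "parse both, write one" idea. It is not BioPerl's or Biopython's actual code; the names parse_location and format_location and the "outer"/"inner" style labels are made up for illustration, and it only handles simple start..end spans (no fuzzy positions, remote references, or order()). Both forms from the feature table documentation reduce to the same list of (start, end, strand) parts, and the writer then picks one layout:

```python
import re


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_location(text):
    """Return a list of (start, end, strand) parts in feature order."""
    text = text.strip()
    if text.startswith("complement(") and text.endswith(")"):
        inner = parse_location(text[len("complement("):-1])
        # Complementing a join reverses the part order and flips each strand.
        return [(s, e, -strand) for s, e, strand in reversed(inner)]
    if text.startswith("join(") and text.endswith(")"):
        parts = []
        for piece in split_top_level(text[len("join("):-1]):
            parts.extend(parse_location(piece))
        return parts
    match = re.fullmatch(r"(\d+)\.\.(\d+)", text)
    if not match:
        raise ValueError(f"unsupported location: {text!r}")
    return [(int(match.group(1)), int(match.group(2)), 1)]


def format_location(parts, style="outer"):
    """Write parts back out; style is a hypothetical 'outer' or 'inner' switch."""
    if style == "outer" and all(strand == -1 for _, _, strand in parts):
        # complement(join(...)): spans listed in forward-strand order.
        spans = [f"{s}..{e}" for s, e, _ in reversed(parts)]
        inner = spans[0] if len(spans) == 1 else "join(" + ",".join(spans) + ")"
        return f"complement({inner})"
    # join(complement(...),...): also the only option for mixed strands.
    pieces = [
        f"{s}..{e}" if strand == 1 else f"complement({s}..{e})"
        for s, e, strand in parts
    ]
    return pieces[0] if len(pieces) == 1 else "join(" + ",".join(pieces) + ")"


first = parse_location("complement(join(2691..4571,4918..5163))")
second = parse_location("join(complement(4918..5163),complement(2691..4571))")
print(first == second)
print(format_location(first, "outer"))
print(format_location(first, "inner"))
```

Once both inputs reduce to the same internal parts, which layout to emit becomes purely a writer-side choice, and a mixed-strand feature simply falls through to the join(complement(...),...) layout.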

chris




