[Bioperl-l] split location problems
Amir Karger
akarger at CGR.Harvard.edu
Tue Oct 17 16:53:19 UTC 2006
> From: Jason Stajich [mailto:jason.stajich at gmail.com]
>
> The whole point of split locations is to represent genes with introns,
> so that is not the "rare" case.
Absolutely.
> I have processed the GenBank fungal genomes into GFF3 and have had no
> problems, so I'm confused where you are breaking down. If I write
> them out as EMBL I also get the correct thing. This is using the CVS
> version of BioPerl from HEAD.
>
> I've added code to test this to bug 2101, including a C. glabrata
> chromosome downloaded from GenBank. Perhaps the problem is on the
> EMBL parsing side; I didn't test that.
Well, I don't know whether it's EMBL parsing or something a bit further
down the pipe, but I downloaded C. glabrata chromosome B from GenBank
(NC_005968), and its complement/join locations are written in the form
that Bioperl already handles correctly.
GenBank:
     CDS             complement(join(10347..10372,10632..11157))
                     /locus_tag="CAGL0B00242g"
EMBL:
FT   CDS             join(complement(10632..11157),complement(10347..10372))
FT                   /locus_tag="CAGL0B00242g"
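Both annotations describe the same CDS: reverse-complementing a join is the same as reverse-complementing each part and listing the parts in the opposite order. The Python sketch below (not BioPerl code) checks that on a made-up random sequence. The helper names and the toy sequence are hypothetical; coordinates are 1-based and inclusive, as in GenBank/EMBL feature tables.

```python
import random

# Complement table for DNA bases (upper and lower case).
COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMP)[::-1]


def span(seq: str, start: int, end: int) -> str:
    """Slice using 1-based, inclusive feature-table coordinates."""
    return seq[start - 1:end]


def complement_of_join(seq, parts):
    """complement(join(p1,p2,...)): join parts in listed order, then reverse-complement."""
    return revcomp("".join(span(seq, s, e) for s, e in parts))


def join_of_complements(seq, parts):
    """join(complement(p1),complement(p2),...): reverse-complement each part, keep listed order."""
    return "".join(revcomp(span(seq, s, e)) for s, e in parts)


# Hypothetical toy chromosome: random bases, long enough for the coordinates.
rng = random.Random(0)
toy = "".join(rng.choice("ACGT") for _ in range(12000))

gb = complement_of_join(toy, [(10347, 10372), (10632, 11157)])
embl = join_of_complements(toy, [(10632, 11157), (10347, 10372)])
print(gb == embl)  # True: same CDS, parts listed in opposite orders
```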
Here's the diff when I run the location-printing script I posted yesterday:
diff biogb bio
1c1,5
< complement(join(10347..10372,10632..11157))
---
> complement(1701..2651)
> complement(2635..3345)
> complement(3980..4408)
> complement(join(10632..11157,10347..10372))
> 10379..10615
209a214,217
> 498198..498890
> 499712..500062
> 499851..500702
> 500579..501364
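As you can see, the complement/join CDS is written out with its two parts
in the opposite order, which is Bad:
complement(join(10632..11157,10347..10372)) is not the same sequence as
complement(join(10347..10372,10632..11157)).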
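The bad line looks like what you would get by wrapping the EMBL parts in complement(join(...)) without reversing their order. Here is a hedged Python sketch of the rewrite a writer would need. It is not BioPerl's code, the function name is hypothetical, and it only handles the simple case where every part of the join is a complemented plain range.

```python
import re

# One complemented, non-fuzzy range such as "complement(10632..11157)".
PART = re.compile(r"complement\((\d+)\.\.(\d+)\)")


def embl_join_to_genbank(loc: str) -> str:
    """Rewrite join(complement(X),complement(Y)) as complement(join(Y,X)).

    Only handles the case where every part is a complemented plain range;
    any other location string is returned unchanged.
    """
    m = re.fullmatch(r"join\((.*)\)", loc)
    if not m:
        return loc
    ranges = []
    for part in m.group(1).split(","):
        pm = PART.fullmatch(part.strip())
        if not pm:
            return loc  # mixed strands, fuzzy ends, etc.: not handled here
        ranges.append((int(pm.group(1)), int(pm.group(2))))
    # join(rc(X), rc(Y)) is the same sequence as rc(join(Y, X)), so the parts
    # must be reversed before wrapping them in complement(join(...)).
    # Skipping this step would reproduce the out-of-order line in the diff.
    ranges.reverse()
    inner = ",".join(f"{s}..{e}" for s, e in ranges)
    return f"complement(join({inner}))"


print(embl_join_to_genbank(
    "join(complement(10632..11157),complement(10347..10372))"))
# complement(join(10347..10372,10632..11157))
```

Reversing, rather than sorting by coordinate, keeps the meaning correct even for joins whose parts are not listed in coordinate order.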
(I checked at least one of the other differences: the GenBank file calls
that feature a misc_feature, while the EMBL file calls it a CDS. Those
differences don't seem to be relevant here.)
-Amir
>
> On the technical side, I still am not sure I fully know where the
> strand information should be stored - the top level container or the
> sub-features. I'll try and stay up on the discussion if anything has
> been decided that I should know about.
>
> -jason