[Bioperl-l] split location problems
akarger at CGR.Harvard.edu
Tue Oct 17 16:53:19 UTC 2006
> From: Jason Stajich [mailto:jason.stajich at gmail.com]
> The whole point of split locations is to represent genes with
> so that is not the "rare" case.
> I have processed the genbank fungal genomes into GFF3 and have had no
> problems so I'm confused where you are breaking down. If I write
> them out as embl I also get the correct thing. This is using the CVS
> version of bioperl from the HEAD.
> I've added code to test this to bug 2101 including a C.glabrata
> chromosome downloaded from genbank. Perhaps the problem is on the
> EMBL parsing side, I didn't test that.
Well, I don't know whether it's EMBL parsing, or a bit further down the
pipe, but I downloaded C.glabrata chromosome B from GenBank (NC_005968),
and it describes the complement/joins in the way that Bioperl is
Here's the diff when I run the location-printing script I posted
diff biogb bio
As you can see, the complement/join CDS is written out in a different
order, which is Bad.
(I looked at at least one of the other differences: the GB file says
it's a "misc feature" and EMBL says it's a CDS. But they don't seem to
be relevant here.)
> On the technical side, I still am not sure I fully know where the
> strand information should be stored - the top level container or the
> sub-features. I'll try and stay up on the discussion if anything has
> been decided that I should know about.