[Bioperl-l] Re: *major* error in genbank parser or am i just insane?
Ewan Birney
birney@ebi.ac.uk
Fri, 9 Aug 2002 09:24:02 +0100 (BST)
On Fri, 9 Aug 2002, Brian King wrote:
> > But is this just random cruft from Genbank/EMBL that
> > they didn't
> > realise
> > when they designed it or something deeper?
>
> After long struggles with the join operator I finally
> concluded that it's just a way to represent
> hierarchical features in the flat feature table
> structure. The regions within the join usually
> correspond to some other contiguous feature in the
> same feature table. I'm interested to know if someone
> with more experience than me sees it the same way.
I know this does not hold up 100% across the archive - there are CDS lines
with no corresponding separate exon features...
>
> Because of the ambiguities in the join operator my
> ideal solution would be to not support the join syntax
> at all, but to match up the joined feature with its
> intended sub-features in the same table when parsing,
> or at least create generic sub-features at the
> contiguous regions on the join. I'd make a real
> hierarchical representation in the object model and
> abandon the join syntax. Unfortunately you'd have to
> hard-code some biological knowledge to judge if a
> corresponding sub-feature was really supposed to be
> part of a joined feature. I doubt that round-trip
> preservation of the GenBank/EMBL record is necessary.
> You could write out the record in a format that has
> hierarchical features and refer to the original record
> as needed. Anyway, all that would be pretty hard to
> do, but I like to have an ideal in mind anyway.
>
Ha!
This is very hard to do because you have to handle:
(a) CDS with no Exons
and, my particular favourite
(b) an mRNA join operator which is out of sync with the CDS join
operator (!)
Quite what is going on in (b) is, of course, anyone's guess. The simplest
explanation is a typo by the author but perhaps he/she was trying to say
something profound ;)
Certainly doing this automatically works for 90% but sadly not 100% of
cases.
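
To make cases (a) and (b) concrete, here is a minimal Python sketch (not
BioPerl code). The function names parse_join, unmatched_segments and
join_in_sync are made up for this illustration. It assumes forward-strand
locations of the plain form join(a..b,c..d), with no complement(), fuzzy
ends or remote references, and it treats "in sync" as meaning that every
internal CDS boundary coincides with an mRNA splice boundary.

```python
import re


def parse_join(location):
    """Return [(start, end), ...] for 'a..b' or 'join(a..b,c..d)'.

    Assumption: only these two simple forms are handled.
    """
    inner = location.strip()
    if inner.startswith("join(") and inner.endswith(")"):
        inner = inner[5:-1]
    segments = []
    for part in inner.split(","):
        m = re.fullmatch(r"\s*(\d+)\.\.(\d+)\s*", part)
        if m is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        segments.append((int(m.group(1)), int(m.group(2))))
    return segments


def unmatched_segments(cds_location, exons):
    """Case (a): CDS segments that no exon feature contains.

    `exons` is a list of (start, end) tuples taken from the same table.
    """
    missing = []
    for start, end in parse_join(cds_location):
        if not any(es <= start and end <= ee for es, ee in exons):
            missing.append((start, end))
    return missing


def join_in_sync(cds_location, mrna_location):
    """Case (b): True if the CDS join is consistent with the mRNA join."""
    cds = parse_join(cds_location)
    mrna = parse_join(mrna_location)
    mrna_starts = {s for s, _ in mrna}
    mrna_ends = {e for _, e in mrna}
    for i, (start, end) in enumerate(cds):
        # Each CDS segment must lie inside a single mRNA segment.
        if not any(ms <= start and end <= me for ms, me in mrna):
            return False
        # Internal CDS boundaries are splice sites, so they must match.
        if i > 0 and start not in mrna_starts:
            return False
        if i < len(cds) - 1 and end not in mrna_ends:
            return False
    return True
```

A CDS with no exon features at all simply returns every segment from
unmatched_segments, which is the point where a parser would have to fall
back to creating generic sub-features.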
Don't forget remote features as well (joins across entries) which have
their own can of worms ...
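
As a rough illustration of why remote features need separate handling, the
sketch below splits a join into local segments and segments that point at
another entry. The name split_remote is hypothetical, and the pattern only
covers the simple accession:start..end form (for example
J00194.1:100..202); complement() and fuzzy positions are not handled.

```python
import re

# Optional "ACCESSION:" prefix followed by start..end.
SEGMENT = re.compile(
    r"(?:(?P<acc>[A-Za-z0-9_.]+):)?(?P<start>\d+)\.\.(?P<end>\d+)"
)


def split_remote(location):
    """Return (local, remote) segment lists for a simple join location.

    local  -> [(start, end), ...]
    remote -> [(accession, start, end), ...]
    """
    inner = location.strip()
    if inner.startswith("join(") and inner.endswith(")"):
        inner = inner[5:-1]
    local, remote = [], []
    for part in inner.split(","):
        m = SEGMENT.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"unsupported location segment: {part!r}")
        seg = (int(m["start"]), int(m["end"]))
        if m["acc"]:
            # Coordinates refer to a different entry, not this one.
            remote.append((m["acc"], *seg))
        else:
            local.append(seg)
    return local, remote
```

The remote list is exactly the part that cannot be resolved from the
record being parsed, since the coordinates refer to another entry's
sequence.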
> Sorry I only have an analysis and no solution.
>
> Regards,
> Brian
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------