[Bioperl-l] CDS/exon was Re: *major* error in genbank parser or am i just insane?

Lin, Xiaoying J. Xiaoying.Lin@celera.com
Fri, 9 Aug 2002 16:24:10 -0400


Francis,

Thanks for the clarification on the GenBank model.  Otherwise I would have
been guilty of submitting several thousand genes without proper annotation ;-).

For better data handling, and to avoid out-of-sync mRNA/CDS features, I am
thinking of not storing the CDS exons as separate features at all, and
instead storing only the mRNA coordinates plus the start/stop positions for
translation.

I will need help on 2 aspects:

1. Is this model OK?  Has anyone tried this?

2. I have not found a way to translate part of an exon (feature) with
bioperl, where the remaining part of the exon is UTR.  Could someone give me
a hint on how to do this?
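
To make the two questions concrete, here is a minimal sketch of the idea in
plain Python (not BioPerl, and not anyone's actual implementation). It
assumes forward-strand features only, 1-based inclusive coordinates as in a
GenBank join, and the standard genetic code. The function names and the toy
sequence are hypothetical and only serve the illustration: the mRNA exons are
clipped to the translation start/stop, which yields the CDS join, and the
clipped pieces are spliced and translated so that the UTR part of an exon is
left out.

```python
# Sketch only: forward strand, 1-based inclusive coordinates (assumptions).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG-ordered amino acid string.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cds_segments(exons, cds_start, cds_end):
    """Clip mRNA exon ranges to the translated region [cds_start, cds_end]."""
    segments = []
    for start, end in sorted(exons):
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo <= hi:  # exons that are entirely UTR drop out here
            segments.append((lo, hi))
    return segments


def format_join(segments):
    """Render segments as a GenBank-style location string."""
    parts = [f"{s}..{e}" for s, e in segments]
    return parts[0] if len(parts) == 1 else "join(" + ",".join(parts) + ")"


def translate_segments(sequence, segments):
    """Splice the segments out of the sequence and translate codon by codon."""
    spliced = "".join(sequence[s - 1:e] for s, e in segments).upper()
    usable = len(spliced) - len(spliced) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(spliced[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Toy example: two exons, each partly UTR and partly coding.
seq = "GGGATGGCCGTAGTTTTAAGGG"
mrna_exons = [(1, 9), (14, 22)]
cds = cds_segments(mrna_exons, cds_start=4, cds_end=19)
print(format_join(cds))              # join(4..9,14..19)
print(translate_segments(seq, cds))  # MAF*
```

Because the CDS is always derived from the mRNA exons, the two joins cannot
drift out of sync. Reverse-strand genes and partial (fuzzy) ends would need
extra handling that this sketch leaves out.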

Thanks.

Xiaoying


> -----Original Message-----
> From: Francis Ouellette [mailto:francis@cmmt.ubc.ca]
> Sent: Friday, August 09, 2002 3:40 PM
> To: Lin, Xiaoying J.
> Cc: lstein@cshl.org; brian.king@animorphics.net; Brian King; Ewan
> Birney; bioperl-l@bioperl.org
> Subject: Re: [Bioperl-l] Re: *major* error in genbank parser or am i
> just insane?
> 
> 
> 
> 
> [ apologies: long reply ]
> 
> "Lin, Xiaoying J." wrote:
> 
> > But as for CDS features without exon features, I am not sure I
> > understand you correctly. There are lots of submissions in GenBank
> > which only come with CDS (join) features, but no separate exon
> > features. If that is a mistake, it is a systematic mistake then. How
> > does the current parser handle a record like
> >
> > http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=1458097&dopt=GenBank
> 
> 
> Having a CDS (with join) and no exon feature is how most (the great
> majority) of the CDSs that were submitted to the NCBI for inclusion in
> GenBank are built.
>
> The rationale for this is that there were tooooo many cases where the
> exon feature was not valid/validated and it was a bad feature, and that
> the very best place (within NCBI's data model) to check and validate
> these was to make sure the joins that make up the CDS are valid, and
> make the right protein, with valid exons. All of the information you
> need/want is in the join statement.
> 
> But "Ha" you say ... what about UTRs? Well, if you have non-coding
> exons, and you have their coordinates, you should put that information
> in a join statement in an mRNA feature.
>
> With those two features (CDS and mRNA) the exon feature becomes
> superfluous (in the NCBI data model; I know and understand this is not
> the case in the bioperl world).
> 
> Another thing: as far as I know, the current NCBI model does *not*
> check (well, it didn't a few years back when I was a humble civil
> servant) the join statement from the mRNA against the one from the
> corresponding CDS to make sure they are in accordance, and obviously
> you don't have a translation to validate that join.
> 
> Before people get bent out of shape against NCBI for not encouraging
> the exon feature, let me state the philosophy and reasoning behind that
> (very good, imho) decision: mRNA and proteins are real biological
> entities within the cell, and within the NCBI data model exons are not;
> they don't exist on their own. The NCBI data model (of which the
> GenBank flatfile is a *poor* text/report representation) tries to
> represent (read: validate, promote, allow computation on) biological
> "stuff". It doesn't care much for things which are not really
> "validatable" (an exon on its own is next to impossible to validate,
> and a CDS is much easier to validate).
> 
> Anyway, I hope this long discourse explains a little where things 
> are coming from ...
> 
> cheers,
> 
> f.
> 
> 
> -- 
> | B.F. Francis Ouellette                       francis@cmmt.ubc.ca | 
> | Director, Bioinformatics Centre              Tel: (604) 875-3815 | 
> | University of British Columbia               Fax: (604) 608-4795 | 
> | Vancouver, BC Canada            http://www.cmmt.ubc.ca/ouellette |
>