[Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?

Scott Cain scott at scottcain.net
Tue Mar 23 18:18:46 UTC 2010


Hi Leighton,

I wonder if this is a change stemming from Nathan's work on this
script.  Nathan?
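
In the meantime, here is a rough sketch of one way to check how widespread the mapping is in a given output file. It is plain Python, not part of BioPerl or the script itself. The function names are made up for illustration, and it assumes the output is tab-delimited, nine-column GFF3. It counts feature types, and separately counts gene_component_region features that carry a protein_id attribute, which were presumably CDS features in the GenBank record.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> raw value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def tally_types(gff3_lines):
    """Return (Counter of feature types, count of CDS-like
    gene_component_region features).

    "CDS-like" is an assumption made here: a gene_component_region
    feature that carries a protein_id attribute.
    """
    types = Counter()
    cds_like = 0
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip anything that is not a nine-column feature line
        types[cols[2]] += 1
        if (cols[2] == "gene_component_region"
                and "protein_id" in parse_attributes(cols[8])):
            cds_like += 1
    return types, cds_like
```

To use it, open the GFF3 file the script wrote and pass the file handle to tally_types(). If every gene_component_region feature turns out to be CDS-like, that would suggest the CDS mapping is what changed.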

Scott


On Tue, Mar 23, 2010 at 12:35 PM, Leighton Pritchard
<Leighton.Pritchard at scri.ac.uk> wrote:
> Hi,
>
> I can't find any discussion of this in the mailing list archives (if
> anyone has a link, I'll happily follow it), so I was wondering what the
> rationale was for the bp_genbank2gff3.pl script, as modified in
> bioperl-live, mapping CDS features to gene_component_region.
>
> For example, if I use the script on the E. coli sequence/annotation
> NC_000913.gbk, the gene:
>
>     gene            190..255
>                     /gene="thrL"
>                     /locus_tag="b0001"
>                     /note="synonyms: ECK0001, JW4367"
>                     /db_xref="EcoGene:EG11277"
>                     /db_xref="ECOCYC:EG11277"
>                     /db_xref="GeneID:944742"
>     CDS             190..255
>                     /gene="thrL"
>                     /locus_tag="b0001"
>                     /function="leader; Amino acid biosynthesis: Threonine"
>                     /function="1.5.1.8 metabolism; building block
>                     biosynthesis; amino acids; threonine"
>                     /note="GO_process: threonine biosynthetic process [goid
>                     0009088]"
>                     /codon_start=1
>                     /transl_table=11
>                     /product="thr operon leader peptide"
>                     /protein_id="NP_414542.1"
>                     /db_xref="ASAP:ABE-0000006"
>                     /db_xref="UniProtKB/Swiss-Prot:P0AD86"
>                     /db_xref="GI:16127995"
>                     /db_xref="EcoGene:EG11277"
>                     /db_xref="ECOCYC:EG11277"
>                     /db_xref="GeneID:944742"
>                     /translation="MKRISTTITTTITITTGNGAG"
>
> is mapped to:
>
> NC_000913       GenBank region  190     255     .       +       .       ID=GenBank:region:NC_000913:190:255
> NC_000913       GenBank exon    190     255     .       +       .       ID=GenBank:exon:NC_000913:190:255
> NC_000913       GenBank gene    190     255     .       +       .       ID=b0001;Dbxref=EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=synonyms: ECK0001%2C JW4367;gene=thrL;locus_tag=b0001
> NC_000913       GenBank gene_component_region   190     255     .       +       .       Parent=b0001;Dbxref=ASAP:ABE-0000006,UniProtKB/Swiss-Prot:P0AD86,GI:16127995,EcoGene:EG11277,ECOCYC:EG11277,GeneID:944742;Note=GO_process: threonine biosynthetic process [goid 0009088];Ontology_term=GO:0009088;codon_start=1;function=leader%3B Amino acid biosynthesis: Threonine,1.5.1.8 metabolism%3B building block biosynthesis%3B amino acids%3B threonine;gene=thrL;locus_tag=b0001;product=thr operon leader peptide;protein_id=NP_414542.1;transl_table=11;translation=MKRISTTITTTITITTGNGAG
>
> I understand the region-exon-gene part of the model, but not the
> gene_component_region, which appears to be a catch-all.  I would have
> assumed that the CDS is better mapped to a polypeptide, as described in the
> Chado documentation:
>
> http://gmod.org/wiki/Chado_Best_Practices#Canonical_Gene_Model
>
> There is no difference in script output whether --CDS or --noCDS is used.
>
> Cheers,
>
> L.
>
> --
> Dr Leighton Pritchard MRSC
> D131, Plant Pathology Programme, SCRI
> Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
> e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
> gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



