[Bioperl-l] [Gmod-schema] bp_genbank2gff3.pl in bioperl-live: why map CDS to gene_component_region?
Chris Fields
cjfields at illinois.edu
Wed Mar 24 13:06:01 UTC 2010
On Mar 24, 2010, at 7:05 AM, Leighton Pritchard wrote:
> Hi,
>
> I'm surprised that this issue hasn't come up already, as the change to the
> gene model is quite significant. For comparison, this is what the old
> bp_genbank2gff3.pl script would produce with --CDS:
> ...
> So, although the new script improves the parent-child relationships by
> identifying parents on the locus_tag field (guaranteed to be unique), rather
> than gene name (not guaranteed to be unique), the GFF3 gene model has
> apparently changed from canonical:
>
> gene <- mRNA <- {polypeptide/CDS, exon}
>
> to this:
>
> region ; exon ; gene <- gene_component_region
>
> So I guess I don't understand the region-exon-gene part of the new model,
> after all. This new model doesn't appear to be Sequence Ontology-compatible
> any more (e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1175956/) as exon
> is no longer considered part_of the transcript. In fact, there's not a
> transcript. Given that the SO cite bp_genbank2gff3.pl as a way to get
> SO-compliant GFF3
> (http://www.sequenceontology.org/resources/faq.html#convert), this might be
> an issue requiring a prompt fix or reversion.
I agree. I think this commit needs more code review so we can understand the reasoning behind it, though fixing it will be a little trickier than a simple reversion (I think there have been additional unrelated commits since then). Nathan, was this the intent, or is this a bug? I would agree with Leighton that it's the latter.
chris
> For now, due to the downstream problems this model causes with GBROWSE and
> ARTEMIS, I'm going to go back to BioPerl 1.6.1, with a modification to the
> script to use the locus_tag field rather than the gene field for the feature
> ID.
>
> Cheers,
>
> L.
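
For anyone who wants to check their own converter output against the canonical gene <- mRNA <- {polypeptide/CDS, exon} model discussed above, here is a minimal sketch in Python. It is not part of BioPerl or of bp_genbank2gff3.pl. The sample records, the names SAMPLE_GFF3, EXPECTED_PARENT, parse_gff3 and check_model, and the simplified rule that an exon/CDS/polypeptide must have an mRNA parent are all assumptions made for illustration. The Sequence Ontology allows other transcript types, and the sketch does not decode percent-escaped attribute values.

```python
# Hypothetical sample records for illustration only; this is NOT real
# output from bp_genbank2gff3.pl.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "chr1\tdemo\tgene\t1\t900\t.\t+\t.\tID=LT_0001",
    "chr1\tdemo\tmRNA\t1\t900\t.\t+\t.\tID=LT_0001.t01;Parent=LT_0001",
    "chr1\tdemo\texon\t1\t900\t.\t+\t.\tID=LT_0001.e01;Parent=LT_0001.t01",
    "chr1\tdemo\tCDS\t1\t900\t.\t+\t0\tID=LT_0001.p01;Parent=LT_0001.t01",
    # An exon with no Parent, as in the flat "region ; exon ; gene" layout.
    "chr1\tdemo\texon\t1000\t1500\t.\t+\t.\tID=LT_0002.e01",
])

# Simplified assumption: child feature type -> acceptable parent types.
EXPECTED_PARENT = {
    "mRNA": {"gene"},
    "exon": {"mRNA"},
    "CDS": {"mRNA"},
    "polypeptide": {"mRNA"},
}


def parse_gff3(text):
    """Return a list of dicts with the type, ID and Parent IDs of each feature."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        parents = attrs["Parent"].split(",") if "Parent" in attrs else []
        features.append({"type": cols[2], "id": attrs.get("ID"), "parents": parents})
    return features


def check_model(features):
    """List (ID, type) of features with no parent of an expected type."""
    type_by_id = {f["id"]: f["type"] for f in features if f["id"]}
    problems = []
    for f in features:
        allowed = EXPECTED_PARENT.get(f["type"])
        if allowed is None:
            continue  # no expectation for this type (e.g. gene, region)
        parent_types = {type_by_id.get(p) for p in f["parents"]}
        if not parent_types & allowed:
            problems.append((f["id"], f["type"]))
    return problems


if __name__ == "__main__":
    for feature_id, feature_type in check_model(parse_gff3(SAMPLE_GFF3)):
        print(f"{feature_type} {feature_id}: no parent of an expected type")
```

Under these assumptions, only the parentless exon in the sample is reported. Running the same check on GFF3 with exons that carry no Parent attribute, as in the new model described above, would flag every such exon.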