[Bioperl-l] Getting a Gene Object

Jason Stajich jason at cgt.duhs.duke.edu
Mon Dec 8 12:38:12 EST 2003


On Mon, 8 Dec 2003, James Wasmuth wrote:

> Dear All,
>
> I am trying to extract the CDS sequence from EMBL files.  I've got as
> far as pulling back the features from the EMBL file, but noticed that
> $feat->seq->seq gives me the sequence from the start to the end of the
> feature, ignoring the presence of introns.
>

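$feat->seq returns the stretch of the underlying sequence from the
feature's start to its end, introns included ($feat->entire_seq is the
whole sequence the feature is attached to).  $feat->spliced_seq gives you
the sub-locations of $feat's location spliced together, which is what you
want for a CDS.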
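
To make the difference concrete, here is a minimal sketch.  The toy
sequence, its id and the join(1..6,13..18) location are all made up for
illustration; with a real EMBL entry you would get the sequence object from
Bio::SeqIO and loop over the features whose primary_tag is 'CDS' instead of
building one by hand.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Toy sequence (made up): exon 1 = 1..6, intron = 7..12, exon 2 = 13..18
my $seq = Bio::Seq->new(-id       => 'toy',
                        -alphabet => 'dna',
                        -seq      => 'ATGGCC' . 'GTAAGT' . 'TTTTAA');

# A CDS whose location is the equivalent of join(1..6,13..18)
my $loc = Bio::Location::Split->new;
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 1,  -end => 6,  -strand => 1));
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 13, -end => 18, -strand => 1));

my $cds = Bio::SeqFeature::Generic->new(-primary_tag => 'CDS');
$cds->location($loc);
$seq->add_SeqFeature($cds);    # also attaches the sequence to the feature

# Start-to-end span of the feature, intron included
my $span = $cds->seq->seq;

# Only the sub-locations, joined in order
my $spliced = $cds->spliced_seq->seq;

print "span:    $span\n";
print "spliced: $spliced\n";
```

The span should come back with the six intron bases still in it, while the
spliced sequence should contain just the two exons.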

> SeqFeature::Gene seems to provide me with what I want with
> @exons=$gene->exons.  But how do I go from the EMBL file to a
> SeqFeature::Gene::GeneStructure object?  I can't seem to see where the
> required object is returned from...

There is currently no way in bioperl to make this happen automatically
because it is a bit of a hard problem to do it correctly all the time.

Some say it requires an ontology (enter the Sequence Ontology [SO/SOFA]) to
map the EMBL/GenBank annotations into objects which have semantics like
exon/intron from CDS.  Chris Mungall has put some effort into
Bio::SeqFeature::Tools::Unflattener to begin to achieve this.  However, the
next step is to take the unflattened objects and build SeqFeature::Gene
objects (where appropriate) from this proper hierarchy of gene, mRNA, exon.
One way is to use the Unflattener to produce GFF3-compliant features,
load these into Bio::DB::GFF and use Lincoln's aggregators to get out the
genes.  I'm not sure whether GFF3 (hierarchies more than 2 levels deep) is
completely supported there yet, though.  All of this is a bit bleeding edge,
so you have to follow up some of the past mailing list traffic to get the
whole picture and/or bug people for a project's status.
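
A rough sketch of the first step (unflattening only, no SeqFeature::Gene
objects and no GFF3 output) might look like the following.  It assumes the
Unflattener copes with your EMBL entries the same way it does with GenBank
ones, which is worth checking on your own data.  The helper name
hierarchy_lines and the file name passed on the command line are
placeholders of mine, and older bioperl releases spell get_SeqFeatures on a
feature as sub_SeqFeature.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

# Hypothetical helper: one indented line per feature, descending into
# sub-features, so the nesting produced by the Unflattener is visible.
sub hierarchy_lines {
    my ($feat, $depth) = @_;
    $depth ||= 0;
    my @lines = sprintf('%s%s %d..%d',
                        '  ' x $depth, $feat->primary_tag,
                        $feat->start, $feat->end);
    push @lines, hierarchy_lines($_, $depth + 1) for $feat->get_SeqFeatures;
    return @lines;
}

# Usage: perl unflatten.pl entry.embl   (file name is a placeholder)
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'EMBL');
    my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;

    while (my $seq = $in->next_seq) {
        # -use_magic asks the Unflattener to guess sensible settings
        $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1);

        # After unflattening, the top-level features contain the nested ones
        print "$_\n" for map { hierarchy_lines($_) } $seq->get_SeqFeatures;
    }
}
```

If the unflattening works on your entries, each gene should print with its
mRNA and exon/CDS features indented beneath it, and you can pull the exons
straight out of that hierarchy.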


Best,
-jason

>
> Many thanks
>
> james
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

