[Bioperl-l] Getting a Gene Object
Jason Stajich
jason at cgt.duhs.duke.edu
Mon Dec 8 12:38:12 EST 2003
On Mon, 8 Dec 2003, James Wasmuth wrote:
> Dear All,
>
> I am trying to extract the CDS sequence from EMBL files. I've got as
> far as pulling back the features from the EMBL file, but noticed that
> $feat->seq->seq gives me the sequence from the start to the end of the
> feature, ignoring the presence of introns.
>
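$feat->seq returns the stretch of the underlying sequence from the
feature's start to its end, introns included ($feat->entire_seq is the
whole sequence the feature is attached to). $feat->spliced_seq gives you
the sequence defined by the location in $feat spliced together.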
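
To make the difference concrete, here is a minimal sketch that builds a toy
sequence in memory rather than reading an EMBL file. The sequence, its id
('toy') and the exon coordinates are made up for illustration; the split
location stands in for what an EMBL "join(1..6,13..18)" CDS would give you.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Toy 18 bp sequence (invented): exon 1 = 1..6, intron = 7..12, exon 2 = 13..18.
my $seq = Bio::Seq->new(
    -id       => 'toy',
    -alphabet => 'dna',
    -seq      => 'ATGAAA' . 'GTAAGT' . 'CCCTAG',
);

# Stand-in for an EMBL "join(1..6,13..18)" location.
my $loc = Bio::Location::Split->new;
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 6, -strand => 1));
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 13, -end => 18, -strand => 1));

my $cds = Bio::SeqFeature::Generic->new(
    -primary_tag => 'CDS',
    -location    => $loc,
);
$seq->add_SeqFeature($cds);    # also attaches the sequence to the feature

# seq() covers the feature from start to end, intron included;
# spliced_seq() joins only the sub-locations (the exons).
my $span    = $cds->seq->seq;
my $spliced = $cds->spliced_seq->seq;

print "span:    $span\n";
print "spliced: $spliced\n";
```

With a real entry you would instead pull the features from a Bio::SeqIO
stream opened with -format => 'EMBL' and keep those whose primary_tag is
'CDS'; the two calls on the feature stay the same.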
> SeqFeature::Gene seems to provide me with what I want with
> @exons=$gene->exons. But how do I go from the EMBL file to a
> SeqFeature::Gene::GeneStructure object? I can't seem to see where the
> required object is returned from...
There is currently no way in bioperl to make this happen automatically
because it is a hard problem to get right every time.
Some say it requires an ontology (enter Sequence Ontology [SO/SOFA]) to
map the EMBL/GenBank annotations into objects which have semantics like
exon/intron from CDS. Chris Mungall has put some effort into
Bio::SeqFeature::Tools::Unflattener to begin to achieve this. However, the
next step is to take the unflattened objects and build SeqFeature::Gene
objects (where appropriate) from this proper hierarchy of gene, mRNA, exon.
A simple way is to use the Unflattener to produce GFF3-compliant features,
load these into Bio::DB::GFF, and use Lincoln's aggregators to get out the
genes. I'm not sure whether GFF3 (>2 level hierarchies) is completely
supported there yet, though. All of this is a bit bleeding edge, so you
have to follow up on some of the past mailing list traffic to get the whole
picture and/or bug people for a project's status.
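
As a rough sketch of the first step only (unflattening, not building
SeqFeature::Gene objects), something like the following should print the
gene hierarchy the Unflattener infers. It assumes a BioPerl release that
ships Bio::SeqFeature::Tools::Unflattener with unflatten_seq(); the sub
names print_feature_tree and print_gene_hierarchy are made up for this
example, and the EMBL file name comes from the command line.

```perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::Tools::Unflattener;

# Print a feature and, indented below it, all of its sub-features.
sub print_feature_tree {
    my ($feat, $depth) = @_;
    printf "%s%s %d..%d\n",
        '  ' x $depth, $feat->primary_tag, $feat->start, $feat->end;
    print_feature_tree($_, $depth + 1) for $feat->get_SeqFeatures;
}

# Read an EMBL file, unflatten each entry, and print every gene subtree.
sub print_gene_hierarchy {
    my ($file) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'EMBL');
    my $unflattener = Bio::SeqFeature::Tools::Unflattener->new;

    while (my $seq = $in->next_seq) {
        # Regroups the flat feature table into nested features
        # (typically gene -> mRNA -> exon/CDS); -use_magic lets the
        # Unflattener guess settings suited to the entry.
        $unflattener->unflatten_seq(-seq => $seq, -use_magic => 1);

        print_feature_tree($_, 0)
            for grep { $_->primary_tag eq 'gene' } $seq->get_SeqFeatures;
    }
}

# Usage: perl script.pl entry.embl   (file name is up to you)
print_gene_hierarchy($ARGV[0]) if @ARGV;
```

How well the grouping works depends on how the entry was annotated, so
check the printed tree against the feature table before trusting it.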
Best,
-jason
>
> Many thanks
>
> james
>
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu