[Bioperl-l] Request for direction...
Malcolm Cook
mcook@dna.com
Wed, 10 Oct 2001 11:29:03 -0700
I have implemented Ewan's approach and it works like a charm.
Unfortunately, I cannot share the source code.
I implemented it for the express purpose of 'canonicalizing' a set of seq
features for the purpose of 'reasoning' about 'variation' features (somewhat
a la Heikki's Bio::Variation::* modules).
I recommend creating, say, Bio::SemanticFeatureInterpreter::Interface as a
class of objects which programmatically 'edit' Bio::SeqI or
Bio::Seq::RichSeq (or Bio::LiveSeq?) objects. The nature of the edit could
be arbitrary. Objects which conform to the interface would implement a
method named, say, 'interpret', which would take the seq and implement some
systematic change to it.
Then, implement Bio::SemanticFeatureInterpreter::BuildGeneObjects (say)
which @ISA(Bio::SemanticFeatureInterpreter::Interface) and whose 'interpret'
method would 'Do the right thing'.
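
A minimal sketch of that interface, using core Perl only. The package names follow the proposal above; the hash-based stand-in for a sequence object and the feature keys (primary_tag, gene, start, end) are assumptions made for illustration, not the BioPerl API, and the grouping logic is only a placeholder for 'the right thing'.

```perl
use strict;
use warnings;

# Hypothetical interface: every interpreter must provide interpret($seq).
package Bio::SemanticFeatureInterpreter::Interface;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub interpret {
    my ($self) = @_;
    die ref($self) . " must implement interpret()\n";
}

# Hypothetical concrete interpreter: groups flat features by gene name.
package Bio::SemanticFeatureInterpreter::BuildGeneObjects;
our @ISA = ('Bio::SemanticFeatureInterpreter::Interface');

sub interpret {
    my ($self, $seq) = @_;
    my %genes;
    for my $feat (@{ $seq->{features} }) {
        next unless defined $feat->{gene};    # skip features with no gene name
        push @{ $genes{ $feat->{gene} } }, $feat;
    }
    # Edit the sequence in place and return it so calls can be chained.
    $seq->{genes} = \%genes;
    return $seq;
}

package main;

# Stand-in for a Bio::SeqI object: a plain hash with a flat feature list.
my $seq = {
    id       => 'demo',
    features => [
        { primary_tag => 'mRNA',   gene => 'abc1', start => 10, end => 90 },
        { primary_tag => 'CDS',    gene => 'abc1', start => 20, end => 80 },
        { primary_tag => 'source', start => 1,     end => 100 },
    ],
};

Bio::SemanticFeatureInterpreter::BuildGeneObjects->new->interpret($seq);
print join(',', map { $_->{primary_tag} } @{ $seq->{genes}{abc1} }), "\n";
```

Because interpret edits the object it is given and returns it, several interpreters can be applied one after another to the same sequence.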
Then, change the Bio::SeqIO module to take another ->new parameter, one that
specifies an ordered list of Bio::SemanticFeatureInterpreter objects to
apply to each sequence object as it is read, allowing:
$in = Bio::SeqIO->new(-file => "inputfilename", -format => 'Genbank',
                      -SemanticFeatureInterpreter => ['BuildGeneObjects']);
<aside>
As a departure from the usual BioPerl way of dynamically requiring modules,
the implementation of Bio::SeqIO could
use UNIVERSAL::require;
and then, in Bio::SeqIO->new, do
$self->{_SemanticFeatureInterpreterObjects} = [ map {
    my $class = "Bio::SemanticFeatureInterpreter::$_";
    $class->require or die $@;   # load the interpreter class by name
    $class->new;
} @{ $self->SemanticFeatureInterpreter } ];
Later, in the next_seq method, before returning the seq just read, loop over
@{ $self->{_SemanticFeatureInterpreterObjects} } and apply each in turn to
the seq.
This works (I've done it) and is independent of the data source, which Ewan
values.
</aside>
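
The next_seq step described in the aside can be sketched as follows. This is a toy model, not Bio::SeqIO: the Demo::* packages, the -records and -interpreters parameters, and the hash-based sequence records are all hypothetical, and the interpreter classes are defined inline so that no dynamic loading is needed.

```perl
use strict;
use warnings;

# Two toy interpreters; each edits the sequence hash and returns it.
package Demo::Interpreter::Uppercase;
sub new       { return bless {}, shift }
sub interpret { my ($self, $seq) = @_; $seq->{seq} = uc $seq->{seq}; return $seq }

package Demo::Interpreter::TagLength;
sub new       { return bless {}, shift }
sub interpret { my ($self, $seq) = @_; $seq->{length} = length $seq->{seq}; return $seq }

# Toy reader standing in for a Bio::SeqIO subclass.
package Demo::SeqReader;

sub new {
    my ($class, %args) = @_;
    my $self = bless { records => [ @{ $args{-records} || [] } ] }, $class;
    # Instantiate interpreters once, preserving the order given by the caller.
    $self->{interpreters} =
        [ map { "Demo::Interpreter::$_"->new } @{ $args{-interpreters} || [] } ];
    return $self;
}

sub next_seq {
    my ($self) = @_;
    my $seq = shift @{ $self->{records} };
    return unless defined $seq;
    # Apply every interpreter in turn before handing the sequence back.
    $_->interpret($seq) for @{ $self->{interpreters} };
    return $seq;
}

package main;

my $in = Demo::SeqReader->new(
    -records      => [ { id => 's1', seq => 'acgt' }, { id => 's2', seq => 'ggatcc' } ],
    -interpreters => [ 'Uppercase', 'TagLength' ],
);

my @seen;
while (my $s = $in->next_seq) {
    push @seen, "$s->{id}:$s->{seq}:$s->{length}";
}
print "$_\n" for @seen;
```

Passing an empty interpreter list leaves the records untouched, which gives callers a way to switch the processing off.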
Regarding exons v. mRNA/CDS....
I defer to Francis Ouellette's knowledge and experience. I now think that
if your code or your GUI likes to compute/display in terms of exons, then
build them on the fly if needed, based on the location underlying the CDS,
mRNA, and/or gene feature. But heed that they may still need to be distinct
per transcript, since a given location may be exon2 in one transcript and
exon1 in another (or not appear at all). However, I really do now think you
ought rather to take Francis' advice, and build your CDS, 5'UTR, and 3'UTR
features on the fly if needed (i.e. when they are not present but
exon/intron boundaries are), and then implement BuildGeneObjects over their
underlying locations.
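
To illustrate why on-the-fly exons have to be kept per transcript, here is a small sketch in core Perl. The transcript names and coordinates are made up, the spans stand in for the sublocations of an mRNA feature's split location, and only the forward strand is considered.

```perl
use strict;
use warnings;

# Hypothetical input: each transcript lists its exon spans in 5'->3' order,
# as might be read from the sublocations of an mRNA feature's location.
my %transcripts = (
    tx1 => [ [100, 200], [300, 400], [500, 600] ],
    tx2 => [ [300, 400], [500, 650] ],
);

# Build exon records per transcript; numbering restarts for each transcript.
sub build_exons {
    my ($tx) = @_;
    my @exons;
    for my $name (sort keys %$tx) {
        my $n = 0;
        for my $span (@{ $tx->{$name} }) {
            push @exons, {
                transcript => $name,
                number     => ++$n,
                start      => $span->[0],
                end        => $span->[1],
            };
        }
    }
    return @exons;
}

my @exons = build_exons(\%transcripts);

# Index by location to see how one span is labelled in each transcript.
my %by_location;
for my $e (@exons) {
    push @{ $by_location{"$e->{start}..$e->{end}"} },
        "$e->{transcript}:exon$e->{number}";
}

print "$_ => @{ $by_location{$_} }\n" for sort keys %by_location;
```

In this made-up data the span 300..400 comes out as exon2 of tx1 and exon1 of tx2, and 100..200 does not appear in tx2 at all, so an exon object cannot be shared blindly between transcripts.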
What think you?
-Malcolm
>-----Original Message-----
>From: Ewan Birney [mailto:birney@ebi.ac.uk]
>Sent: Wednesday, October 10, 2001 8:58 AM
>To: Mark Wilkinson
>Cc: bioperl-l@bioperl.org
>Subject: Re: [Bioperl-l] Request for direction...
>
>
>
>Mark ...
>
>I'm coming into this discussion late, and I haven't been following it.
>
>Is the basic loop of code this
>
> Run SeqIO ... Get SeqFeature::Generics out ... process
>SeqFeature::Generics (set of) into richer SeqFeature objects
>
>splitting it this way
>
>
> (a) allows people to switch the processing on or off (on by default
>in my view)
>
> (b) means that code can be reused, in particular inside EMBL
>
>
>Mucho respect for getting stuck in there, irregardless...
>
>
>-----------------------------------------------------------------
>Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
><birney@ebi.ac.uk>.
>-----------------------------------------------------------------