[Bioperl-l] Request for direction...

Malcolm Cook mcook@dna.com
Wed, 10 Oct 2001 11:29:03 -0700

I have implemented Ewan's approach and it works like a charm.  

Unfortunately I cannot share the source code.

I implemented it for the express purpose of 'canonicalizing' a set of seq
features so as to 'reason' about 'variation' features (somewhat
à la Heikki's Bio::Variation::* modules).

I recommend creating, say, Bio::SemanticFeatureInterpreter::Interface as a
class of objects which programmatically 'edit' Bio::SeqI or
Bio::Seq::RichSeq (or Bio::LiveSeq?) objects.  The edit itself could be
arbitrary.  Objects which conform to the interface would implement a
method named, say, 'interpret', which would take the seq and apply some
systematic change to it.
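Purely as an illustration (this is not the unshared implementation mentioned above), here is one way such an interface could look in plain Perl. Only the package name Bio::SemanticFeatureInterpreter::Interface comes from the proposal; the rest is assumed: the sequence is a plain hash with a `features` array standing in for a real Bio::SeqI object, `interpret` edits the sequence in place and returns it, and DropUntypedFeatures is a hypothetical example interpreter.

```perl
use strict;
use warnings;

# Hypothetical abstract interface: concrete interpreters override interpret().
package Bio::SemanticFeatureInterpreter::Interface;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Assumed contract: take a sequence, edit it in place, return it.
sub interpret {
    my ($self, $seq) = @_;
    die ref($self) . " must implement interpret()\n";
}

# Hypothetical example interpreter: drops features that have no primary tag.
package DropUntypedFeatures;
our @ISA = ('Bio::SemanticFeatureInterpreter::Interface');

sub interpret {
    my ($self, $seq) = @_;
    $seq->{features} = [
        grep { defined $_->{primary_tag} && length $_->{primary_tag} }
        @{ $seq->{features} }
    ];
    return $seq;
}

package main;

# A plain hash stands in for a Bio::SeqI object here.
my $seq = { features => [ { primary_tag => 'CDS' }, { primary_tag => '' }, {} ] };
DropUntypedFeatures->new->interpret($seq);
print scalar(@{ $seq->{features} }), "\n";    # prints 1
```

Keeping the base class's `interpret` as a method that dies makes a forgotten override fail loudly instead of silently leaving the sequence unchanged.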

Then, implement Bio::SemanticFeatureInterpreter::BuildGeneObjects (say),
which ISA Bio::SemanticFeatureInterpreter::Interface (i.e. lists it in its
@ISA) and whose 'interpret' method would 'Do the right thing'.
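What 'the right thing' means is left open above; the sketch below shows just one guess, under stated assumptions. Features are plain hashes with a `primary_tag` and a `tags` hash (a stand-in for Bio::SeqFeature::Generic tag values), and gene, mRNA and CDS features that share a /gene qualifier are folded into one nested record. The `gene_model` tag and the record layout are made up for this example; real GenBank entries would need more care (several mRNAs per gene, missing /gene qualifiers, and so on).

```perl
use strict;
use warnings;

# Hypothetical interpreter: groups flat gene/mRNA/CDS features by /gene.
package Bio::SemanticFeatureInterpreter::BuildGeneObjects;
our @ISA = ('Bio::SemanticFeatureInterpreter::Interface');    # name from the proposal

sub new { return bless {}, shift }

sub interpret {
    my ($self, $seq) = @_;
    my (%genes, @other);
    for my $f (@{ $seq->{features} }) {
        my $name = $f->{tags}{gene};
        my $tag  = $f->{primary_tag} // '';
        if (!defined $name || $tag !~ /^(?:gene|mRNA|CDS)$/) {
            push @other, $f;    # leave unrelated features untouched
            next;
        }
        # 'gene_model' is an invented tag for the grouped record.
        my $g = $genes{$name} ||= { primary_tag => 'gene_model', name => $name, parts => {} };
        push @{ $g->{parts}{$tag} }, $f;
    }
    # Replace the flat features with one grouped record per gene (sorted for stable order).
    $seq->{features} = [ @other, map { $genes{$_} } sort keys %genes ];
    return $seq;
}

package main;

my $seq = { features => [
    { primary_tag => 'gene',   tags => { gene => 'abc' } },
    { primary_tag => 'mRNA',   tags => { gene => 'abc' } },
    { primary_tag => 'CDS',    tags => { gene => 'abc' } },
    { primary_tag => 'source', tags => {} },
] };
Bio::SemanticFeatureInterpreter::BuildGeneObjects->new->interpret($seq);
for my $f (@{ $seq->{features} }) {
    print "$f->{primary_tag}\n";    # prints "source", then "gene_model"
}
```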

Then, change the Bio::SeqIO module so that ->new takes another parameter,
one which specifies an ordered list of Bio::SemanticFeatureInterpreter
objects to apply to each sequence object as it is read.


my $in = Bio::SeqIO->new(-file   => "inputfilename",
                         -format => 'Genbank',
                         -SemanticFeatureInterpreter => ['BuildGeneObjects']);

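As a departure from the usual BioPerl way of dynamically requiring modules,
the modified Bio::SeqIO could
   use UNIVERSAL::require;
and then, in Bio::SeqIO->new, load and instantiate each named interpreter
($interpreters being the array ref passed as -SemanticFeatureInterpreter):
   $self->{_SemanticFeatureInterpreterObjects} = [ map {
       my $class = "Bio::SemanticFeatureInterpreter::$_";
       $class->require or $self->throw("Cannot load $class: $@");
       $class->new } @$interpreters ];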

Later, in the next_seq method, before returning the sequence just read, loop
over the stored _SemanticFeatureInterpreterObjects and apply each of them in
turn to the seq.
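The next_seq loop can be tried out without touching Bio::SeqIO. ToySeqIO below is a hypothetical stand-in that 'reads' from an in-memory list; it accepts the proposed -SemanticFeatureInterpreter parameter as a list of already constructed interpreter objects (the class-name loading step is skipped) and applies them in the order given before returning each sequence. Tagger is a hypothetical interpreter that only records its own name, so the ordering is visible.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a modified Bio::SeqIO reading from memory.
package ToySeqIO;

sub new {
    my ($class, %args) = @_;
    return bless {
        _records => [ @{ $args{-records} || [] } ],
        _SemanticFeatureInterpreterObjects => [ @{ $args{-SemanticFeatureInterpreter} || [] } ],
    }, $class;
}

sub next_seq {
    my ($self) = @_;
    my $seq = shift @{ $self->{_records} };
    return unless defined $seq;    # end of input
    # Apply every interpreter, in order, before handing the seq back.
    $_->interpret($seq) for @{ $self->{_SemanticFeatureInterpreterObjects} };
    return $seq;
}

# Hypothetical interpreter that just logs its name on the seq.
package Tagger;
sub new       { my ($class, $name) = @_; return bless { name => $name }, $class }
sub interpret { my ($self, $seq) = @_; push @{ $seq->{log} }, $self->{name}; return $seq }

package main;

my $in = ToySeqIO->new(
    -records                    => [ { id => 'seq1' }, { id => 'seq2' } ],
    -SemanticFeatureInterpreter => [ Tagger->new('first'), Tagger->new('second') ],
);
while (my $seq = $in->next_seq) {
    print "$seq->{id}: @{ $seq->{log} }\n";    # prints "seq1: first second", then "seq2: first second"
}
```

An ordered list (rather than a hash of interpreters) presumably matters in practice, since a later interpreter may rely on features built by an earlier one.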

This works (I've done it) and is independent of the data source, which Ewan values.


Regarding exons vs. mRNA/CDS...

I defer to Francis Ouellette's knowledge and experience.  I now think that
if your code or your GUI likes to compute/display in terms of exons, then
build them on the fly, if needed, from the locations underlying the CDS,
mRNA, and/or gene feature.  But note that they may still need to be kept
distinct per transcript since, for instance, a given location may be exon 2
in one transcript and exon 1 in another (or not appear at all).  However, I
really do now think you ought rather to take Francis' advice: build your
CDS, 5'UTR and 3'UTR features on the fly if needed (i.e. when they are not
present but exon/intron boundaries are), and then implement
BuildGeneObjects over their underlying locations.
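The on-the-fly derivation can be sketched with plain coordinate arithmetic. Assumptions: locations are lists of [start, end] pairs on the plus strand with 1-based inclusive coordinates, and the CDS start and end are already known (e.g. from a CDS feature or start/stop codon annotation). The helper names exons_for_transcript and utrs_from_exons are hypothetical, not BioPerl methods. The example numbers the same segment as exon 2 of one transcript and exon 1 of another.

```perl
use strict;
use warnings;

# Hypothetical helper: number the exons of one transcript from its split
# location. Plus strand assumed; minus strand would need reversed numbering.
sub exons_for_transcript {
    my ($transcript_id, @segments) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } @segments;
    my $n = 0;
    return map { +{ transcript => $transcript_id, number => ++$n,
                    start => $_->[0], end => $_->[1] } } @sorted;
}

# Hypothetical helper: the parts of each exon lying before the CDS start form
# the 5'UTR, the parts after the CDS end form the 3'UTR (plus strand assumed).
sub utrs_from_exons {
    my ($cds_start, $cds_end, @exons) = @_;
    my (@utr5, @utr3);
    for my $e (@exons) {
        my ($s, $t) = ($e->{start}, $e->{end});
        push @utr5, [ $s, ($t < $cds_start ? $t : $cds_start - 1) ] if $s < $cds_start;
        push @utr3, [ ($s > $cds_end ? $s : $cds_end + 1), $t ]     if $t > $cds_end;
    }
    return (\@utr5, \@utr3);
}

# The segment 300..400 is exon 2 of tx1 but exon 1 of tx2.
my @tx1 = exons_for_transcript('tx1', [300, 400], [100, 200]);
my @tx2 = exons_for_transcript('tx2', [300, 400], [500, 600]);
print "$_->{transcript} exon $_->{number}: $_->{start}..$_->{end}\n" for @tx1, @tx2;

# Assume tx1's CDS runs from 150 to 350.
my ($utr5, $utr3) = utrs_from_exons(150, 350, @tx1);
print "5'UTR: ", join(', ', map { "$_->[0]..$_->[1]" } @$utr5), "\n";    # prints "5'UTR: 100..149"
print "3'UTR: ", join(', ', map { "$_->[0]..$_->[1]" } @$utr3), "\n";    # prints "3'UTR: 351..400"
```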

What think you?


>-----Original Message-----
>From: Ewan Birney [mailto:birney@ebi.ac.uk]
>Sent: Wednesday, October 10, 2001 8:58 AM
>To: Mark Wilkinson
>Cc: bioperl-l@bioperl.org
>Subject: Re: [Bioperl-l] Request for direction... 
>Mark ...
>I'm coming into this discussion late, and I haven't been following it.
>Is the basic loop of code this:
>  Run SeqIO ... Get SeqFeature::Generics out ... process the set of
>  SeqFeature::Generics into richer SeqFeature objects?
>Splitting it this way
>   (a) allows people to switch the processing on or off (on by default in
>       my view)
>   (b) means that code can be reused, in particular inside EMBL
>Mucho respect for getting stuck in there, irregardless...
>Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420