[Bioperl-l] [SPECS] Sequence meta data
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Tue Apr 1 10:09:42 EST 2003
Peter,
This is great! I hadn't thought of that.
"The Perl motto is "There's more than one way to do it." Divining how
many more is left as an exercise to the reader." 'man perl'
Assuming meta information in features is needed, what we need is a
standard way of storing residue based meta data in Bio::SeqFeatureI
(some subclass of Bio::SeqFeature::Generic?) objects and a way of
transferring them to the Bio::Seq::Meta level if needed. Does it make
sense to do it the other way too, seq->ft?
...When I think about it, I am not quite sure what kind of sequence meta
information you'd want to store in features... Could you give me
examples of what you do, please?
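To make the two directions (ft->seq and seq->ft) concrete, here is a minimal sketch, written in Python rather than Perl. It is not BioPerl code: the names Feature, MetaSeq, feature_to_seq and seq_to_feature are hypothetical, and it assumes one meta character per residue, 1-based inclusive coordinates, and a space as the placeholder for residues without meta data.

```python
from dataclasses import dataclass, field

GAP = " "  # assumed placeholder for residues that carry no meta data


@dataclass
class Feature:
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    meta: str = ""  # one meta character per residue covered by the feature


@dataclass
class MetaSeq:
    seq: str
    meta: str = ""  # sequence-level meta track, padded to len(seq) on use
    features: list = field(default_factory=list)


def feature_to_seq(s: MetaSeq, ft: Feature) -> None:
    """ft->seq: copy a feature's meta string onto the sequence-level track."""
    if len(ft.meta) != ft.end - ft.start + 1:
        raise ValueError("meta length must match feature length")
    track = list(s.meta.ljust(len(s.seq), GAP))
    track[ft.start - 1:ft.end] = ft.meta
    s.meta = "".join(track)


def seq_to_feature(s: MetaSeq, start: int, end: int) -> Feature:
    """seq->ft: slice the sequence-level track into a new feature."""
    track = s.meta.ljust(len(s.seq), GAP)
    ft = Feature(start, end, track[start - 1:end])
    s.features.append(ft)
    return ft


s = MetaSeq("ATGCCGTAA")
feature_to_seq(s, Feature(4, 6, "123"))  # s.meta is now "   123   "
sub = seq_to_feature(s, 5, 7)            # sub.meta is "23 "
```

The point of the sketch is only that both conversions are cheap if the feature and the sequence agree on a per-residue representation; where the data should live by default is still the open question.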
The seq<->ft question is a pretty deep one. Lincoln wants to think
everything is a feature, and although I caused him grey hairs a while
ago by removing start() and end() from standard BioPerl sequence objects,
I do agree with him. It can be confusing, but it should be possible for
anything to be a sequence feature, including sequences. I think that one
way of limiting the amount of confusion is to make explicit what is and
what is not a feature at any one time.
In other words, a gene can be modelled as a feature to a sequence, and
most of the time you want that feature to be as lightweight as
possible. On the other hand, a really useful model of a gene can be
really heavy, and include numerous sequences.
BioPerl has started from the former model, but it is definitely going to
support the latter as well.
In practice, we need to keep these two approaches in mind, and make sure
it is as easy as possible to switch between them, and any other approach
that turns up...
Does this make sense?
-Heikki
On Tue, 2003-04-01 at 05:02, Peter Schattner wrote:
> First of all, thanks for taking this on, Heikki. I think this will be
> useful to a lot of people.
>
> But...
>
> Heikki Lehvaslaiho wrote:
>
> >The idea is that meta data makes sense only in the context of the
> >sequence and should be stored as an integral part of the sequence
> >object.
>
> This points up something in Bioperl that has been confusing me for a while:
> what belongs in a Seq vs what belongs in a SeqFeature on that Seq?
> Generally when I need to use "meta" sequence information it is associated
> with a gene, a transcript or some other "feature" rather than an entire
> sequence. Consequently I have associated metasequence information with
> SeqFeature objects rather than Seq objects. This also has the benefit that
> I am able to write out these annotations using Bio::Tools::GFF.
>
> Quality data is probably most appropriately associated with entire
> sequences. But encodings and other metasequence info seem to me to be more
> often associated with a feature rather than the entire underlying
> sequence. (I realize that Seq::Encoded associates metasequences with
> sequences rather than features, but I’m not convinced this is desirable).
>
> Well, I’m not adamant about this, but I think this will eventually affect
> others and is worth a bit of thought before jumping totally into the idea
> that metasequence information should always be connected to the Seq rather
> than the SeqFeature.
>
> My $0.02 worth.
>
> Peter
>
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________