[Bioperl-l] [SPECS] Sequence meta data

Lincoln Stein lstein at cshl.org
Thu Apr 3 12:46:06 EST 2003


At the hackathon we mulled over a new sort of design in which all the various 
bits of information about a sequence go into a series of "annotation" objects 
that are not directly associated with either a sequence or a location on a 
sequence.  The "feature" object is used to assign the annotation to a 
position on a particular sequence.  The nice thing about this is that the 
same annotation can be placed on different sequences.  It also allows us to 
move the relationships between annotations (e.g. transcript to exons) into 
the annotation, thereby making the relationship between annotations 
independent of the coordinate system of their physical positions.
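
A minimal sketch of this arrangement, written in Python rather than Perl purely
for illustration. Every class, field, and identifier below (Annotation, Feature,
kind, parts, "chrI", "cloneA", the coordinates) is hypothetical and does not
correspond to an existing Bio:: module; it only shows an annotation that knows
nothing about coordinates, and features that place it on two different
sequences.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """Describes a biological entity; carries no sequence or coordinates."""
    kind: str
    name: str
    # Relationships (e.g. transcript -> exons) live in the annotation itself.
    parts: List["Annotation"] = field(default_factory=list)


@dataclass
class Feature:
    """Assigns one annotation to a position on one particular sequence."""
    annotation: Annotation
    seq_id: str
    start: int
    end: int


exon1 = Annotation("exon", "exon1")
exon2 = Annotation("exon", "exon2")
transcript = Annotation("transcript", "tx1", parts=[exon1, exon2])

# The same annotation placed on two sequences with different coordinates
# (made-up sequence names and positions).
on_chromosome = Feature(transcript, "chrI", 10500, 12900)
on_clone = Feature(transcript, "cloneA", 500, 2900)
```

Because the transcript-to-exon relationship is stored in the annotation, it
stays the same whichever coordinate system a given feature uses.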

This works great for traditional features, such as genes, and even for 
Heikki's quality scores, but is not so great for other meta-information such 
as Author and Reference.  The way I proposed to work around this is to create 
annotations of the "Submission" type, which then get attached to the sequence 
via a Feature that starts at position 1 and ends at position==sequence length 
(see diagram).

      Feature
         | --------------- Annotation (of the "BioEntry" or "Submission" type)
         | --------------- Location, possibly defined in terms of itself
         | --------------- PrimarySequence
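
The whole-sequence workaround in the diagram could look roughly like the
following Python sketch. This is an illustration under assumptions, not BioPerl
code: the class names, the attach_to_whole_sequence helper, and the author and
reference values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Sequence-independent information, e.g. of the "Submission" type."""
    kind: str
    data: dict


@dataclass
class Feature:
    """Ties an annotation to a 1-based, inclusive range on a sequence."""
    annotation: Annotation
    seq_id: str
    start: int
    end: int


def attach_to_whole_sequence(annotation, seq_id, sequence):
    """Wrap a sequence-wide annotation in a feature spanning 1..length."""
    return Feature(annotation, seq_id, 1, len(sequence))


# Placeholder metadata, invented for this example only.
submission = Annotation(
    "Submission",
    {"author": "A. Author", "reference": "placeholder citation"},
)
feature = attach_to_whole_sequence(submission, "seq1", "ATGGCCTAA")
```

Here the Author and Reference data never touch the sequence object directly;
they reach it only through the feature that covers positions 1 to the sequence
length.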

Some people have responded positively to this, but others have run screaming 
in the opposite direction.  I'm interested in other responses.

Lincoln

On Tuesday 01 April 2003 04:09 am, Heikki Lehvaslaiho wrote:
> Peter,
>
> This is great! I haven't thought of that.
>
> "The Perl motto is "There's more than one way to do it."  Divining how
> many more is left as an exercise to the reader." 'man perl'
>
> Assuming meta information in features is needed, what we need is a
> standard way of storing residue based meta data in Bio::SeqFeatureI
> (some subclass of Bio::SeqFeature::Generic?) objects and a way of
> transferring them into Bio::Seq::Meta level if needed. Does it make
> sense to do it the other way too, seq->ft?
>
> ..When I think about it, I am not quite sure what kind of sequence meta
> information you'd want to store in features... Could you give me
> examples of what you do, please?
>
>
> The seq<->ft question is a pretty deep one. Lincoln wants to think
> everything is a feature, and although I caused him grey hairs a while
> ago by removing start() and end() for standard BioPerl sequence objects,
> I do agree with him. It can be confusing, but everything should be
> possible to be a sequence feature, including sequences. I think that one
> way of limiting the amount of confusion is to make explicit what is and
> what is not a feature at any one time.
>
> In other words, a gene can be modelled as a feature to a sequence, and
> most of the time you want that feature to be as light weight as
> possible. On the other hand, a really useful model of a gene can be
> really heavy, and include numerous sequences.
>
> BioPerl has started from the former model, but it is definitely going to
> support the latter as well.
>
> In practice, we need to keep these two approaches in mind, and make sure
> it is as easy as possible to switch between them, and any other approach
> that turns up...
>
> Does this make sense?
>
> 	-Heikki
>
> On Tue, 2003-04-01 at 05:02, Peter Schattner wrote:
> > First of all, thanks for taking this on, Heikki.  I think this will be
> > useful to a lot of people.
> >
> > But...
> >
> > Heikki Lehvaslaiho wrote:
> > >The idea is that meta data makes sense only in the context of the
> > >sequence and should be stored as an integral part of the sequence
> > >object.
> >
> > This points up something in Bioperl that has been confusing me for a
> > while: what belongs in a Seq vs what belongs in a SeqFeature on that
> > Seq? Generally when I need to use "meta" sequence information it is
> > associated with a gene, a transcript or some other "feature" rather than
> > an entire sequence.  Consequently I have associated metasequence
> > information with SeqFeature objects rather than Seq objects.  This also
> > has the benefit that I am able to write out these annotations using
> > Bio::Tools::GFF.
> >
> > Quality data is probably most appropriately associated with entire
> > sequences.  But encodings and other metasequence info seem to me to be more
> > often associated with a feature rather than the entire underlying
> > sequence. (I realize that Seq::Encoded associates metasequences with
> > sequences rather than features, but I’m not convinced this is desirable).
> >
> > Well, I’m not adamant about this, but I think this will eventually affect
> > others and is worth a bit of thought before jumping totally into the idea
> > that metasequence information should always be connected to the Seq
> > rather than the SeqFeature.
> >
> > My $0.02 worth.
> >
> > Peter

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org			                  Cold Spring Harbor, NY
========================================================================



