[Bioperl-l] Bio::LocatableSeq and Annotation vs Feature

Chris Fields cjfields at illinois.edu
Thu Jun 25 17:02:48 UTC 2009


On Jun 25, 2009, at 9:46 AM, Chase Miller wrote:

> Hi all,
>
> Quick question I came across while writing the Bio::Nexml module.
>
> I'm trying to link taxon data to a Bio::LocatableSeq object inside a
> Bio::SimpleAlign object.  Bio::SimpleAlign has the ability to add
> SeqFeatures, but according to this HowTo
> (http://www.bioperl.org/wiki/HOWTO:Feature-Annotation) a feature is
> considered to refer to a portion of a sequence, whereas something like
> taxon data would refer to the entire sequence and should be handled as
> an annotation. However, as far as I can tell Bio::LocatableSeq does
> not support annotation objects.
> What would be the best way to relate taxon data to a single sequence
> inside an alignment?
>
> Thanks,
> Chase

From working with feature/annotation-rich alignment formats such as
Stockholm, I found this is one of the areas of Align that needs some
rethinking. One way to work around this without major refactoring is to
add a full-length SeqFeature (pointing to the proper LocatableSeq)
that stores the Bio::Annotation objects.  I don't like that approach as
a long-term solution, as it's a little hacky and indirect, but it might
get you started (just mark it as TODO so we can catch it at some
point).
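
A rough sketch of that workaround, in case it helps. The sequence ids,
residues, the 'taxon' tag and the taxon value are all made up for
illustration, and I'm assuming a simple Bio::Annotation::SimpleValue is
enough to carry the taxon; swap in whatever annotation class fits.

```perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::SeqFeature::Generic;
use Bio::Annotation::SimpleValue;

# Two aligned sequences (ids and residues are illustrative only).
my $seq1 = Bio::LocatableSeq->new(-id => 'seq1', -seq => 'ACGT-ACGT',
                                  -start => 1, -end => 8);
my $seq2 = Bio::LocatableSeq->new(-id => 'seq2', -seq => 'ACGTTACGT',
                                  -start => 1, -end => 9);

my $aln = Bio::SimpleAlign->new();
$aln->add_seq($_) for ($seq1, $seq2);

# TODO: workaround until sequences in an alignment can carry annotation.
# A feature spanning all of seq1 (in seq1's own coordinates), tied back
# to it both by seq_id and by attaching the sequence object.
my $feat = Bio::SeqFeature::Generic->new(
    -start       => $seq1->start,
    -end         => $seq1->end,
    -primary_tag => 'taxon',      # hypothetical tag name
    -seq_id      => $seq1->id,
);
$feat->attach_seq($seq1);

# The whole-sequence annotation rides on the feature's collection.
my $taxon = Bio::Annotation::SimpleValue->new(
    -tagname => 'taxon',
    -value   => 'Homo sapiens',   # made-up value
);
$feat->annotation->add_Annotation('taxon', $taxon);
$aln->add_SeqFeature($feat);

# Later: recover the taxon for each sequence from the alignment.
for my $f ($aln->get_SeqFeatures) {
    next unless $f->primary_tag eq 'taxon';
    my ($value) = $f->annotation->get_Annotations('taxon');
    printf "%s: %s\n", $f->seq_id, $value->value;
}
```

The indirection is the hacky part: anything reading the alignment has
to know that a 'taxon' feature really means "annotation on the whole
sequence".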

For a long-term solution I don't think the answer is as simple as
making LocatableSeq implement Bio::AnnotatableI; that would not be
congruent with the PrimarySeq implementation (which is not
AnnotatableI).  LocatableSeq is supposed to represent a simple
PrimarySeq that can be mapped to other sequences via start/end/strand,
and thus inherits from both Bio::PrimarySeq (note lack of 'I') and
Bio::RangeI.

Three options:
1) Bio::Seq could be refactored to handle both Bio::PrimarySeq and  
Bio::LocatableSeq, and SimpleAlign reworked to allow any simple RangeI.
2) Bio::PrimarySeq can be AnnotatableI (Bio::Seq would delegate to the  
PrimarySeq AnnotationCollection).
3) All AnnotationI objects would need to be linked back to the
PrimarySeqI somehow, e.g. via features.

I personally think option #2 is easiest, as this means anything that  
is-a PrimarySeq is also AnnotatableI, and it might not break past  
scripts.  Not sure how this would affect overall performance though.
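
To make option #2 concrete, here is a toy model in plain Perl. The
Sketch::* packages are hypothetical stand-ins, not the real BioPerl
classes, and the annotation store is just a hash rather than a
Bio::Annotation::Collection. It only shows the shape of the idea: the
PrimarySeq owns the annotation, LocatableSeq inherits it, and Seq
delegates to its PrimarySeq.

```perl
use strict;
use warnings;

# Stand-in for a PrimarySeq that owns its own annotation store.
package Sketch::PrimarySeq;
sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, seq => $args{seq},
                   annotation => {} }, $class;
}
sub id { $_[0]{id} }
sub add_annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{annotation}{$key} }, $value;
    return $self;
}
sub get_annotations {
    my ($self, $key) = @_;
    return @{ $self->{annotation}{$key} || [] };
}

# Stand-in for LocatableSeq: is-a PrimarySeq plus start/end/strand,
# so it gets the annotation methods for free.
package Sketch::LocatableSeq;
our @ISA = ('Sketch::PrimarySeq');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    @{$self}{qw(start end strand)} = @args{qw(start end strand)};
    return $self;
}

# Stand-in for Seq: has-a PrimarySeq and delegates annotation to it.
package Sketch::Seq;
sub new {
    my ($class, %args) = @_;
    return bless { primary_seq => $args{primary_seq} }, $class;
}
sub add_annotation {
    my $self = shift;
    $self->{primary_seq}->add_annotation(@_);
    return $self;
}
sub get_annotations {
    my $self = shift;
    return $self->{primary_seq}->get_annotations(@_);
}

package main;

my $loc = Sketch::LocatableSeq->new(id => 'seq1', seq => 'ACGT-ACGT',
                                    start => 1, end => 8, strand => 1);
$loc->add_annotation(taxon => 'Homo sapiens');    # made-up value

my $rich = Sketch::Seq->new(primary_seq => $loc);
$rich->add_annotation(comment => 'added via Seq');

# Both views see the same annotation, whichever object it was added to.
print join(', ', $rich->get_annotations('taxon')), "\n";
print join(', ', $loc->get_annotations('comment')), "\n";
```

Existing code that only talks to the Seq-level annotation methods would
keep working in this arrangement, which is why I'd guess the change is
the least disruptive; the extra hop on every annotation call is where
any performance cost would show up.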

chris


