[Bioperl-l] Changes to Bio::SeqI broke Bio::Graphics

Matthew Pocock matthew_pocock@yahoo.co.uk
Tue, 12 Nov 2002 21:01:50 +0000

Lincoln Stein wrote:
> This allows an 
> entry to be a feature in a larger virtual sequence, such as a genome 
> assembly.  I don't see why we persist in thinking in this flat-file EMBL 
> entry way.
> Lincoln

In BioJava we have a feature interface, ComponentFeature, with two
important methods: getComponentLocation() and getComponentSequence().
A ComponentFeature marks where, in an assembled sequence, you glue in
pieces of other sequences. The ComponentFeature's own location is where
it sits in the assembly, and getComponentLocation() is the region of
getComponentSequence() to insert there. This has many advantages over
making sequences themselves features. First, and most importantly, it
allows us to project a single sequence (or multiple portions of a single
sequence) into multiple assemblies (or into different places in the same
assembly). For example, we can take the same clones and project them
into multiple versions of the human golden path. Translocations can
trivially be represented by building one assembled sequence for the
normal chromosome and one for the translocated chromosome, using the
same underlying sequences for both. This saves memory and preserves the
integrity of the annotations. It has many other benefits from the object
modelling point of view, allowing us to implement things via lazy proxies
(portions of an assembly don't need to be loaded until you try to fetch
data from them).
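A minimal Java sketch of the idea, assuming simplified types of our own. The names Seq, Range, Component, LazySeq, residueAt and assemble are hypothetical and are not the BioJava API; only the split between a feature's location in the assembly and the region of a component sequence it pulls in comes from the description above. It shows one underlying clone shared by two assemblies, and a lazy proxy that is only loaded when a residue it covers is requested.

```java
import java.util.List;
import java.util.function.Supplier;

public class ComponentAssemblyDemo {

    /** Hypothetical minimal sequence type: just a string of residues. */
    interface Seq {
        String residues();
    }

    /** Inclusive, 1-based range (an assumption chosen to resemble BioJava locations). */
    record Range(int min, int max) {
        Range {
            if (min < 1 || max < min) {
                throw new IllegalArgumentException("bad range " + min + ".." + max);
            }
        }
        int length() { return max - min + 1; }
        boolean contains(int pos) { return pos >= min && pos <= max; }
    }

    /**
     * Hypothetical analogue of a component feature.
     * location:          where the piece sits in the assembly.
     * componentSequence: the underlying sequence glued in (shared, not copied).
     * componentLocation: which region of componentSequence is inserted.
     */
    record Component(Range location, Seq componentSequence, Range componentLocation) {
        Component {
            if (location.length() != componentLocation.length()) {
                throw new IllegalArgumentException("location and componentLocation differ in length");
            }
        }
    }

    /** Lazy proxy: the loader runs only the first time residues are requested. */
    static final class LazySeq implements Seq {
        private final Supplier<String> loader;
        private String cached;

        LazySeq(Supplier<String> loader) { this.loader = loader; }

        @Override
        public String residues() {
            if (cached == null) {
                cached = loader.get();
            }
            return cached;
        }

        boolean isLoaded() { return cached != null; }
    }

    /** Fetch one residue, touching only the component that covers the position. */
    static char residueAt(int pos, List<Component> components) {
        for (Component c : components) {
            if (c.location().contains(pos)) {
                int srcPos = c.componentLocation().min() + (pos - c.location().min());
                return c.componentSequence().residues().charAt(srcPos - 1);
            }
        }
        return 'n'; // gap: no component covers this position
    }

    /** Materialise the whole assembly; uncovered positions come back as 'n'. */
    static String assemble(int length, List<Component> components) {
        StringBuilder out = new StringBuilder(length);
        for (int pos = 1; pos <= length; pos++) {
            out.append(residueAt(pos, components));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One underlying clone, projected into two versions of an assembly.
        LazySeq clone = new LazySeq(() -> "acgtacgtgg");
        LazySeq other = new LazySeq(() -> "tttt");

        List<Component> v1 = List.of(
                new Component(new Range(1, 4), clone, new Range(1, 4)),
                new Component(new Range(5, 8), other, new Range(1, 4)));
        List<Component> v2 = List.of(
                new Component(new Range(3, 8), clone, new Range(5, 10)));

        System.out.println(residueAt(5, v1));                          // t
        System.out.println(clone.isLoaded() + " " + other.isLoaded()); // false true
        System.out.println(assemble(8, v1));                           // acgttttt
        System.out.println(assemble(10, v2));                          // nnacgtggnn
    }
}
```

Because both assemblies hold a reference to the same clone object rather than a copy, anything attached to that clone is seen consistently from either assembly; fetching position 5 of the first assembly loads only the component that covers it.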

I would caution against rolling the location of a sequence in a larger
assembly into the sequence itself. At the very least, you are going to
need to publish a start and stop coordinate within the sequence that
gets projected. In our experience, this whole line of modelling causes
problems rather than providing solutions.
