[Bioperl-l] Understanding LocatableSeq

Heikki Lehvaslaiho heikki at nildram.co.uk
Sat Nov 8 04:28:30 EST 2003


On Fri, 2003-11-07 at 04:31, Wes Barris wrote:
> > You still need to prefix/postfix with the requisite number of gaps.
> > 
> > The start/end describe where the sequence participating in the alignment
> > COMES FROM not where they are in the alignment, so you have to
> > explicitly code their alignment by placing the right number of gaps.
> 
> Ok.  I understand that I have to add the gap characters on either end
> of each aligned sequence.  Sorry for being so dense but I still don't
> understand the use of the "start" and "end" attributes.  They don't
> appear to do anything.  If I have two sequences:
> 
> 	GATCGATC
> and
> 	 ATCGAT
> 
> what would be the start and end for each sequence or doesn't it matter?
> When you say that they represent where the sequence COMES FROM, what does
> that mean?

Wes,

Sequence alignments are quite often local rather than global. Local
means that only a portion of each compared sequence takes part in the
match. This is the approach used by, for example, the Blast and Fasta
programs. When you see a high-scoring alignment from a Fasta run, you
want to know which part of your query sequence matches which part of
the database sequence, so that you can, e.g., check the feature table.
The start and end hold those positions in the original sequences:

  300  ATGCGA  305
    3  ATGC--    6
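
To make the coordinate bookkeeping concrete, here is a minimal Python
sketch of the idea. It is not BioPerl code and does not use the
LocatableSeq API; the function names residue_position and end_coordinate
are hypothetical, and treating '-' and '.' as gap characters is an
assumption. It only shows how a start value plus a gapped string is
enough to recover where each aligned residue sits in its source sequence.

```python
GAPS = "-."  # assumed gap characters for this sketch


def residue_position(gapped, start, column):
    """Map a 1-based alignment column to a source-sequence coordinate.

    Returns None when the column holds a gap in this row.
    """
    if not 1 <= column <= len(gapped):
        raise ValueError("column outside the alignment")
    if gapped[column - 1] in GAPS:
        return None
    # Count the real residues to the left of this column.
    residues_before = sum(ch not in GAPS for ch in gapped[:column - 1])
    return start + residues_before


def end_coordinate(gapped, start):
    """End coordinate implied by the start and the number of residues."""
    residues = sum(ch not in GAPS for ch in gapped)
    return start + residues - 1


# The two rows from the example above.
query_row, query_start = "ATGCGA", 300
hit_row, hit_start = "ATGC--", 3

for col in range(1, len(query_row) + 1):
    print(col,
          residue_position(query_row, query_start, col),
          residue_position(hit_row, hit_start, col))
```

Under these assumptions, column 4 maps to position 303 in the query and
position 6 in the database sequence, which is the kind of lookup you
need before checking a feature table.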

If you are building your own alignment from scratch and you do not know
or care where the sequences came from, assign 1 to the start and the
length of the sequence (not counting gaps) to the end. If you later
manipulate your alignment, e.g. take a slice, the new object knows where
in your original alignment that slice came from (i.e., what the original
start and end were).
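
As a rough illustration of the slicing behaviour described above, the
following Python sketch recomputes start and end for one row when a
range of columns is cut out. It is not how BioPerl implements slicing;
slice_row is a hypothetical helper, and returning None for an all-gap
slice is an assumption made for this sketch.

```python
GAPS = "-."  # assumed gap characters for this sketch


def slice_row(gapped, start, first_col, last_col):
    """Cut 1-based columns first_col..last_col out of one aligned row.

    Returns (sub_row, new_start, new_end). The new coordinates still
    refer to the original, ungapped sequence. If the slice contains
    only gaps, new_start and new_end are None.
    """
    before = gapped[:first_col - 1]
    piece = gapped[first_col - 1:last_col]
    n_before = sum(ch not in GAPS for ch in before)
    n_inside = sum(ch not in GAPS for ch in piece)
    if n_inside == 0:
        return piece, None, None
    new_start = start + n_before
    return piece, new_start, new_start + n_inside - 1


# Wes's two sequences, padded with gaps and numbered from 1.
print(slice_row("GATCGATC", 1, 3, 6))
print(slice_row("-ATCGAT-", 1, 3, 6))
```

Both rows yield the same four letters, but the first row reports
positions 3 to 6 and the second reports 2 to 5, because the second
sequence has one residue fewer to the left of the slice.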

I hope this helped. 

Yours,
	-Heikki



