[Bioperl-l] EST Alignment questions

Jamie Hatfield jamie@genome.arizona.edu
Thu, 31 Oct 2002 14:36:31 -0700


I've been on the list for a few days now, and have done quite a bit of
searching through the message archives, and I have actually answered one
question (wheeeee).

So anyway, I'd like to pose two questions about EST alignments now.
1) Is there a Graphics::?? interface suitable for displaying a SimpleAlign
(or EST alignments in general)?
2) Is SimpleAlign the best choice for EST alignments?

  First, I would like to use the Graphics::Panel interface to display
EST alignments.  But the interface requires that all sequence features
(I guess an EST would be a 'feature' of the consensus sequence???) start
within the range of the sequence.  This makes sense.  But an EST can
start before the consensus and therefore have a negative offset.  I'll
explain this more in the next question (with an example).

  Second, is SimpleAlign the best module for storing alignment
information?  The alignments I'm working with (EST) are not the
'typical' kind that I believe SimpleAlign is built to handle.  That is,
we don't expect the sequences to all begin at the same location.
Instead, we have a consensus that these ESTs are aligned to.  So in the
following example:

CONSENSUS:       AGGCCTGAGGCCCCTTTT
EST1     :    CGCAGGCCCGAGGCC
EST2     :        GGCCTGAGGCCCCTT
EST3     :           CTGAGGCCACTTTTTCGC

The consensus starts at 0 (or 1 or whatever), EST1 starts at -3, EST2
starts at 1, and EST3 starts at 4.  Note that this is the WHOLE sequence
for each EST, not just part of it.  I thought the start and end of a
LocatableSeq would work, but those refer to the section of the EST (or
sequence) that is being aligned.

Sequence    StartPos
========    ========
  EST1         -3
  EST2          1
  EST3          4

The consensus does not cover the entire span of all the ESTs because the
leading and trailing sequences may be low quality, and cap chooses not
to include those low-quality bases in the consensus.  However, we like
to show them on each EST, just so the data isn't lost.

To get this data into SimpleAlign, I have to pad every sequence with '.'
to make them line up.  This doesn't seem very efficient, not that I
mind.  I would just think that if anybody else used SimpleAlign for EST
alignments (or should I be using something else for ESTs?), the
capability to handle this type of alignment would be built in.
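
To make the padding step concrete, here is a minimal sketch of what I
mean.  It is written in plain Python for illustration only, not BioPerl;
pad_alignment and its arguments are hypothetical names, and the start
offsets are the ones from the table above, with the consensus assumed to
start at 0.

```python
def pad_alignment(seqs, starts, pad="."):
    """Pad sequences with `pad` so they all share one coordinate frame.

    seqs   -- dict mapping name -> unpadded sequence string
    starts -- dict mapping name -> start offset relative to the
              consensus (may be negative)
    """
    # Leftmost start and rightmost end over all sequences.
    left = min(starts.values())
    right = max(starts[name] + len(seq) for name, seq in seqs.items())

    padded = {}
    for name, seq in seqs.items():
        lead = starts[name] - left                  # padding before the sequence
        trail = right - (starts[name] + len(seq))   # padding after the sequence
        padded[name] = pad * lead + seq + pad * trail
    return padded


# Data from the example above; the consensus is assumed to start at 0.
seqs = {
    "CONSENSUS": "AGGCCTGAGGCCCCTTTT",
    "EST1": "CGCAGGCCCGAGGCC",
    "EST2": "GGCCTGAGGCCCCTT",
    "EST3": "CTGAGGCCACTTTTTCGC",
}
starts = {"CONSENSUS": 0, "EST1": -3, "EST2": 1, "EST3": 4}

padded = pad_alignment(seqs, starts)
for name in seqs:
    print(f"{name:<10}{padded[name]}")
```

Every padded string ends up the same length, which is what lets the
rows line up column by column, at the cost of storing the extra '.'
characters for each sequence.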

Thanks for all help.

-------------------------------------------------------------------------
Jamie Hatfield                                Room 541H, Marley Building
Systems Programmer                            University of Arizona
Arizona Genomics Computational                Tucson, AZ  85721
  Laboratory (AGCoL)                          (520) 626-9598