[Bioperl-l] EST Alignment questions
Lincoln Stein
lstein@cshl.org
Fri, 1 Nov 2002 09:46:31 -0500
There's a "dna" glyph that will allow you to position the text representation
of the DNA string on the panel. You can use this to display multiple
alignments.
But there are problems:
1) the display will be a bitmapped graphic, so users can't cut and paste
2) only one line will be displayed, so no line wrapping
3) you can't introduce spaces every ten characters or add numbers
4) you can't (easily) place bp numbers to the left of the alignment lines
What we need is a text-based alignment pretty-printer that will accept
SimpleAlign alignments and output them with the pads inserted. The BioPerl
alignment modules have been written to do the inverse task: parsing
pretty-printed alignments produced by external programs. I actually need
this functionality fairly badly for my own work, so if someone has code
written already let me know and I'll incorporate it into an official bioperl
module.
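
As a rough illustration of the kind of text pretty-printer described above, here is a minimal sketch in Python (not Perl, and not part of BioPerl). The function names `pad_to_common_frame` and `format_alignment`, the `ESTS` sample data, and the choice to number each line by its alignment column in the consensus frame (rather than by per-sequence bp position) are all assumptions made for the example. It pads each sequence with '.' from its start offset, wraps the lines, and inserts a space every ten characters.

```python
# Hypothetical sketch: names and layout choices are illustrative only.
# Each entry maps a sequence name to (start offset relative to the
# consensus, full sequence). Data taken from the quoted example.
ESTS = {
    "CONSENSUS": (0, "AGGCCTGAGGCCCCTTTT"),
    "EST1": (-3, "CGCAGGCCCGAGGCC"),
    "EST2": (1, "GGCCTGAGGCCCCTT"),
    "EST3": (4, "CTGAGGCCACTTTTTCGC"),
}


def pad_to_common_frame(seqs, pad="."):
    """Pad every sequence so all rows share one coordinate frame.

    Returns (leftmost, padded), where leftmost is the smallest start
    offset and padded maps each name to an equal-length padded string.
    """
    leftmost = min(start for start, _ in seqs.values())
    rightmost = max(start + len(seq) for start, seq in seqs.values())
    padded = {}
    for name, (start, seq) in seqs.items():
        left = pad * (start - leftmost)
        right = pad * (rightmost - start - len(seq))
        padded[name] = left + seq + right
    return leftmost, padded


def format_alignment(seqs, width=60, group=10, pad="."):
    """Return the alignment as wrapped text, `width` columns per block.

    Each line carries the name, the coordinate of the block's first
    column, and the residues split into groups of `group` characters.
    """
    leftmost, padded = pad_to_common_frame(seqs, pad)
    total = len(next(iter(padded.values())))
    name_w = max(len(name) for name in padded)
    num_w = max(len(str(leftmost)), len(str(leftmost + total - 1)))
    lines = []
    for off in range(0, total, width):
        for name, row in padded.items():
            chunk = row[off:off + width]
            grouped = " ".join(
                chunk[i:i + group] for i in range(0, len(chunk), group)
            )
            lines.append(f"{name:<{name_w}} {leftmost + off:>{num_w}} {grouped}")
        lines.append("")  # blank line between blocks
    return "\n".join(lines).rstrip("\n")


if __name__ == "__main__":
    print(format_alignment(ESTS, width=20))
```

Because the output is plain text, it can be cut and pasted, which addresses the first problem listed above. A real module would presumably take a SimpleAlign object rather than a dictionary of offsets.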
Lincoln
On Thursday 31 October 2002 04:36 pm, Jamie Hatfield wrote:
> I've been on the list for a few days now, and have done quite a bit of
> searching through the message archives, and I have actually answered one
> question (wheeeee).
>
> So anyway, I'd like to pose two questions about EST Alignments now.
> 1) Is there a Graphics::?? interface suitable for displaying SimpleAlign
> (or EST alignments in general)?
> 2) Is SimpleAlign best for EST Alignments?
>
> First, I would like to use the Graphics::Panel interface to display
> EST alignments. But, the interface requires that all sequence features
> (I guess an EST would be a 'feature' of the consensus sequence???) start
> within the range of the sequence. This makes sense. But, the EST can
> start before the consensus and therefore have a negative offset. I'll
> explain this more in the next question (with an example).
>
> Second, is SimpleAlign the best module to store information relating
> to alignments? The type of alignments I'm working with (EST) are not
> 'typical' as I believe SimpleAlign is built to handle. That is, we
> don't expect them to all begin at the same location. Instead, we have a
> consensus that these EST's are aligned to. So in the following example:
>
> CONSENSUS:    AGGCCTGAGGCCCCTTTT
> EST1     : CGCAGGCCCGAGGCC
> EST2     :     GGCCTGAGGCCCCTT
> EST3     :        CTGAGGCCACTTTTTCGC
>
> The consensus starts at 0 (or 1 or whatever), and EST1 starts at -3,
> EST2 starts at 1, and EST3 starts at 4. Now this is the WHOLE sequence
> for each EST, not just part of it. I thought the start and end of
> locatable sequence would work, but that refers to the section of the EST
> (or sequence) you are aligning.
>
> Sequence  StartPos
> ========  ========
> EST1      -3
> EST2       1
> EST3       4
>
> The consensus does not cover the entire span of all ESTs because the
> leading and trailing sequences may be low quality, and cap chooses not
> to include those low-quality bases as part of the consensus. However,
> we like to show them on each EST, just so the data isn't lost.
>
> To get this data into SimpleAlign, I have to pad all sequences with '.'
> to make them line up. This doesn't seem very efficient, not that I
> mind. I would just think that if anybody else used this for EST
> alignments (or should I be using something else for ESTs?) then the
> capability to handle this type of alignment would be built in.
>
> Thanks for all help.
>
> ------------------------------------------------------------------------
> Jamie Hatfield Room 541H, Marley Building
> Systems Programmer University of Arizona
> Arizona Genomics Computational Tucson, AZ 85721
> Laboratory (AGCoL) (520) 626-9598
>
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================