[Bioperl-l] EST Alignment questions

Robson Francisco de Souza rfsouza@citri.iq.usp.br
Thu, 31 Oct 2002 21:40:33 -0200 (BRST)


	Hi,

On Thu, 31 Oct 2002, Jamie Hatfield wrote:
[snip]
> 2) Is SimpleAlign best for EST Alignments?
[snip]
>   Second, is SimpleAlign the best module to store information relating
> to alignments?  The type of alignments I'm working with (EST) are not
> 'typical' as I believe SimpleAlign is built to handle.  That is, we
> don't expect them to all begin at the same location.  Instead, we have a
> consensus that these EST's are aligned to.  So in the following example:
> 
> CONSENSUS:       AGGCCTGAGGCCCCTTTT
> EST1     :    CGCAGGCCCGAGGCC
> EST2     :        GGCCTGAGGCCCCTT
> EST3     :           CTGAGGCCACTTTTTCGC
[snip]

	I also noticed this problem. I believe the appropriate modules for
such data would be specially designed assembly modules. I'm writing a
few bioperl-compliant modules for this task. One of them is called
Bio::Align::Contig and, based on this module, I also wrote Bio::Assembly,
Bio::AssemblyIO and Bio::AssemblyIO::ace to load phredPhrap assemblies.
Not all methods described in my POD docs are implemented, just the ones I
need now. Maybe these modules will fit your needs, although they may be
quite slow with big assemblies (speed was not an issue for me when I wrote
them).
	Core guys, can/should I send you the modules so that you
may tell me if they fit into bioperl? To whom should I send them?

> The consensus starts at 0 (or 1 or whatever), and EST1 starts at -3,
> EST2 starts at 1, and EST3 starts at 4.  Now this is the WHOLE sequence
> for each EST, not just part of it.  I thought the start and end of
> locatable sequence would work, but that refers to the section of the EST
> (or sequence) you are aligning.

	Yes. While building my own module, I adopted the documented
interface for Bio::Align::AlignI objects, which requires that alignment
sequences be Bio::LocatableSeq objects. This way Bio::Align::Contig stayed
compliant with Bio::Align::AlignI, but I ended up not using the start() and
end() methods from Bio::LocatableSeq. Both the requirement that alignment
sequences be locatable sequences and the strict meaning given to
start and end in Bio::LocatableSeq seem too restrictive to me, although
I don't know if this is imposed by the Bio::LocationI object inside
locatable sequences.
	Bio::Seq objects are Bio::RangeI compliant themselves, so why
can't I use them in AlignI objects?
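
	To make the coordinate issue concrete, here is a minimal sketch of
the layout from the example above. It is plain Python, not bioperl, and it
is not how Bio::Align::Contig is implemented. The names PlacedRead, layout
and base_at are made up for illustration. The only assumption is that the
consensus starts at column 0, so that each whole EST is placed by a single
signed offset (-3, 1 and 4 in the example).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PlacedRead:
    """A whole read placed against a consensus (hypothetical helper type)."""
    name: str
    seq: str
    offset: int  # consensus column of the read's first base; may be negative


def layout(consensus: str, reads: List[PlacedRead]) -> List[Tuple[str, str]]:
    """Return (name, padded sequence) rows sharing one set of columns."""
    # The leftmost column is either the consensus start (0) or the
    # smallest read offset, whichever comes first.
    left = min([0] + [r.offset for r in reads])
    rows = [("CONSENSUS", " " * (0 - left) + consensus)]
    for r in reads:
        rows.append((r.name, " " * (r.offset - left) + r.seq))
    return rows


def base_at(read: PlacedRead, cons_col: int) -> Optional[str]:
    """Base of `read` under consensus column `cons_col`, or None if absent."""
    i = cons_col - read.offset  # index into the read's own sequence
    if 0 <= i < len(read.seq):
        return read.seq[i]
    return None


CONSENSUS = "AGGCCTGAGGCCCCTTTT"
READS = [
    PlacedRead("EST1", "CGCAGGCCCGAGGCC", -3),
    PlacedRead("EST2", "GGCCTGAGGCCCCTT", 1),
    PlacedRead("EST3", "CTGAGGCCACTTTTTCGC", 4),
]

if __name__ == "__main__":
    for name, row in layout(CONSENSUS, READS):
        print(f"{name:<9}: {row}")
```

	The point of the sketch is that the offset describes where the whole
read sits relative to the consensus, which is a different thing from a
start/end pair that marks the aligned section within the read.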