[Bioperl-l] EST Alignment questions
Robson Francisco de Souza
rfsouza@citri.iq.usp.br
Thu, 31 Oct 2002 21:40:33 -0200 (BRST)
Hi,
On Thu, 31 Oct 2002, Jamie Hatfield wrote:
[snip]
> 2) Is SimpleAlign best for EST Alignments?
[snip]
> Second, is SimpleAlign the best module to store information relating
> to alignments? The type of alignments I'm working with (EST) are not
> 'typical' as I believe SimpleAlign is built to handle. That is, we
> don't expect them to all begin at the same location. Instead, we have a
> consensus that these EST's are aligned to. So in the following example:
>
> CONSENSUS:    AGGCCTGAGGCCCCTTTT
> EST1     : CGCAGGCCCGAGGCC
> EST2     :     GGCCTGAGGCCCCTT
> EST3     :        CTGAGGCCACTTTTTCGC
[snip]
I also noticed this problem. I believe the appropriate modules for
such data would be specially designed assembly modules. I'm writing a
few bioperl-compliant modules for this task. The first is called
Bio::Align::Contig and, based on this module, I also wrote Bio::Assembly,
Bio::AssemblyIO and Bio::AssemblyIO::ace to load phredPhrap assemblies.
Not all methods described in my POD docs are implemented, just the ones I
need now. Maybe these modules will fit your needs, although they may be
quite slow with big assemblies (speed was not an issue for me when I wrote
them).
Core guys, can/should I send the modules to you so that you
can tell me if they fit into bioperl? To whom should I send them?
> The consensus starts at 0 (or 1 or whatever), and EST1 starts at -3,
> EST2 starts at 1, and EST3 starts at 4. Now this is the WHOLE sequence
> for each EST, not just part of it. I thought the start and end of
> locatable sequence would work, but that refers to the section of the EST
> (or sequence) you are aligning.
Yes. While building my own module, I adopted the documented
interface for Bio::Align::AlignI objects, which requires that alignment
sequences be Bio::LocatableSeq objects. This way Bio::Align::Contig stays
compliant with Bio::Align::AlignI, but I ended up not using the start()
and end() methods from Bio::LocatableSeq. Both the requirement that
alignment sequences be locatable sequences and the strict meaning given to
start and end in Bio::LocatableSeq seem too restrictive to me, although
I don't know if this is imposed by the Bio::LocationI object inside
locatable sequences.
Bio::Seq objects are Bio::RangeI compliant themselves, so why
can't I use them in AlignI objects?
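
To make the distinction concrete, here is a minimal sketch of the kind of
layout discussed above: each read is stored whole, and its placement is a
separate offset in consensus coordinates instead of a start/end pair on the
read itself. It is plain Python, not Perl, and it is not the code of
Bio::Align::Contig or any other bioperl module. The names PlacedRead,
render and mismatches are hypothetical, and the consensus is assumed to
start at coordinate 0, so a read that overhangs its left end gets a
negative offset. The data are the sequences from Jamie's example.

```python
from dataclasses import dataclass


@dataclass
class PlacedRead:
    """A whole read plus its placement on the consensus (hypothetical type)."""
    name: str
    seq: str     # the complete read, not a sub-range of it
    offset: int  # 0-based consensus coordinate of the read's first base;
                 # negative when the read overhangs the consensus start


def render(consensus, reads):
    """Return one text row per sequence, padded into a shared column frame."""
    # The leftmost column is the smallest offset, or 0 for the consensus.
    leftmost = min([0] + [r.offset for r in reads])
    rows = [("CONSENSUS", consensus, 0)]
    rows += [(r.name, r.seq, r.offset) for r in reads]
    width = max(len(name) for name, _, _ in rows)
    return [f"{name:<{width}}: {' ' * (off - leftmost)}{seq}"
            for name, seq, off in rows]


def mismatches(consensus, read):
    """List (consensus_pos, consensus_base, read_base) where the two differ.

    Only bases that overlap the consensus are compared; overhanging bases
    on either side are skipped.
    """
    found = []
    for i, base in enumerate(read.seq):
        pos = read.offset + i
        if 0 <= pos < len(consensus) and consensus[pos] != base:
            found.append((pos, consensus[pos], base))
    return found


CONSENSUS = "AGGCCTGAGGCCCCTTTT"
READS = [
    PlacedRead("EST1", "CGCAGGCCCGAGGCC", -3),
    PlacedRead("EST2", "GGCCTGAGGCCCCTT", 1),
    PlacedRead("EST3", "CTGAGGCCACTTTTTCGC", 4),
]

if __name__ == "__main__":
    for row in render(CONSENSUS, READS):
        print(row)
    for read in READS:
        print(read.name, mismatches(CONSENSUS, read))
```

Keeping the offset outside the sequence object is the design choice that
matters here: start and end stay free to describe a sub-range of the read,
as Bio::LocatableSeq intends, while the placement on the contig lives in
the container.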