[Bioperl-l] EST Alignment questions

Heikki Lehvaslaiho heikki@ebi.ac.uk
01 Nov 2002 11:43:21 +0000


On Thu, 2002-10-31 at 23:40, Robson Francisco de Souza wrote:
> 
> 	Hi,
> 
> On Thu, 31 Oct 2002, Jamie Hatfield wrote:

Jamie, thanks for your help on the reverse translate problem!

> > 2) Is SimpleAlign best for EST Alignments?

It is the only one we have that can handle sequence alignments.

> >   Second, is SimpleAlign the best module to store information relating
> > to alignments?  The type of alignments I'm working with (EST) are not
> > 'typical' as I believe SimpleAlign is built to handle.  That is, we
> > don't expect them to all begin at the same location.

The "typical" use of SimpleAlign is to store and manipulate results
from an MSA program like ClustalW. I hope this helps explain where
SimpleAlign comes from.

There is a group of modules in Bio::Coordinate that might be of use.
They deal with coordinate systems rather than sequences. You could use
Bio::Coordinate::Collection to store the start and end of ESTs (or
subranges of them) and how they map to the consensus.

The key function is map(), which can tell you where in the consensus
sequence any range in any of the ESTs lies. Going the other way, you
can get back all the overlapping ESTs and their ranges in EST
coordinates. If you change the consensus, you can use the first
collection to create a new one.
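
As a rough illustration of the kind of two-way mapping described above,
here is a minimal sketch in plain Perl. It does not use the
Bio::Coordinate API: the %layout table and the helpers
est_to_consensus() and consensus_to_ests() are hypothetical names made
up for this example. The offsets and lengths are taken from the EST
example quoted in this thread, with EST coordinates assumed to be
1-based.

```perl
use strict;
use warnings;

# Hypothetical layout: consensus coordinate of the first base of each
# EST, plus the EST length (values from the example in this thread).
my %layout = (
    EST1 => { offset => -3, length => 15 },
    EST2 => { offset =>  1, length => 15 },
    EST3 => { offset =>  4, length => 18 },
);

# Map a 1-based range on one EST to consensus coordinates.
sub est_to_consensus {
    my ($name, $start, $end) = @_;
    my $est = $layout{$name} or die "unknown EST $name";
    die "range outside $name"
        if $start < 1 || $end > $est->{length} || $start > $end;
    my $shift = $est->{offset} - 1;
    return ($start + $shift, $end + $shift);
}

# Map a consensus range back to every overlapping EST, returning the
# overlapping part of each EST in that EST's own coordinates.
sub consensus_to_ests {
    my ($start, $end) = @_;
    my %hits;
    for my $name (sort keys %layout) {
        my $est   = $layout{$name};
        my $shift = $est->{offset} - 1;
        my $s     = $start - $shift;
        my $e     = $end - $shift;
        $s = 1              if $s < 1;               # clip to the EST
        $e = $est->{length} if $e > $est->{length};
        next if $s > $e;                             # no overlap
        $hits{$name} = [ $s, $e ];
    }
    return \%hits;
}
```

With this layout, est_to_consensus('EST1', 1, 15) gives (-3, 11), and
consensus_to_ests(0, 2) reports bases 4-6 of EST1 and bases 1-2 of
EST2. A simple offset is enough here only because the sketch ignores
gaps and strand.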

> > Instead, we have a
> > consensus that these EST's are aligned to.  So in the following example:
> > 
> > CONSENSUS:       AGGCCTGAGGCCCCTTTT
> > EST1     :    CGCAGGCCCGAGGCC
> > EST2     :        GGCCTGAGGCCCCTT
> > EST3     :           CTGAGGCCACTTTTTCGC
> [snip]
> 
> 	I also noticed this problem. I believe the appropriate modules for
> such data would be specially designed assembly modules. I'm writing a
> few bioperl-compliant modules for such a task. I've called them
> Bio::Align::Contig and, based on this module, I also wrote Bio::Assembly,
> Bio::AssemblyIO and Bio::AssemblyIO::ace to load phredPhrap assemblies. 
> Not all methods described in my POD docs are implemented, just the ones I
> need now. Maybe these modules will fit your needs, although they may be
> quite slow with big assemblies (speed was not an issue for me when I wrote
> them). 
> 	Core guys, can/should I send the modules to you so that you
> may tell me if they fit into bioperl? To whom should I send them? 

There are no assembly modules in Bioperl, so yours are the best fit for
the problem and therefore have the right to be in Bioperl! ;-) Send them
to the list, or directly to me and I'll add them to the CVS.

> > The consensus starts at 0 (or 1 or whatever), and EST1 starts at -3,
> > EST2 starts at 1, and EST3 starts at 4.  Now this is the WHOLE sequence
> > for each EST, not just part of it.  I thought the start and end of
> > locatable sequence would work, but that refers to the section of the EST
> > (or sequence) you are aligning.
> 
> 	Yes. While building my own module, I adopted the documented
> interface for Bio::Align::AlignI objects, which requests that alignment
> sequences be Bio::LocatableSeq. This way Bio::Align::Contig stayed
> compliant with Bio::Align::AlignI, but I ended up not using the start()
> and end() methods from Bio::LocatableSeq. Both the requirement that
> alignment sequences are locatable sequences and the strict meaning given
> to start and end in Bio::LocatableSeq seem too restrictive to me, although
> I don't know if this is imposed by the Bio::LocationI object inside
> locatable sequences.
> 	Bio::Seq objects have Bio::RangeI compliance themselves, so why
> can't I use them in AlignI objects?

Are they? ... They are! Bio::Seq is actually inheriting from
Bio::RangeI! This is better dealt with in a separate thread.
Bio::Seq is such an important class in Bioperl.

The raison d'etre of Bio::LocatableSeq objects is that they can be
treated as ranges (with strand) from the actual sequences. They are
PrimarySeqs with that extra capability because, when dealing with large
alignments, you really do want to keep the objects as lean as possible.

If you want to use your existing PrimarySeq objects in SimpleAlign, you
can simply bless them into the new class (e.g. bless $seq,
'Bio::LocatableSeq';).
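
To show what reblessing does in general, here is a small self-contained
sketch. It uses two hypothetical stand-in classes, My::PrimarySeq and
My::LocatableSeq, rather than the real Bioperl ones, so it only
illustrates the Perl mechanism and says nothing about the internals of
Bio::LocatableSeq.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a PrimarySeq-like class.
package My::PrimarySeq;
sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq} }, $class;
}
sub seq { return $_[0]{seq} }

# Hypothetical subclass that adds start/end accessors.
package My::LocatableSeq;
our @ISA = ('My::PrimarySeq');
sub start { my $self = shift; $self->{start} = shift if @_; return $self->{start} }
sub end   { my $self = shift; $self->{end}   = shift if @_; return $self->{end} }

package main;

my $seq = My::PrimarySeq->new(seq => 'GGCCTGAGGCCCCTT');

# Rebless in place: the underlying hash is untouched, only the class
# used for method lookup changes.
bless $seq, 'My::LocatableSeq';

# The subclass constructor never ran, so set the new attributes by hand.
$seq->start(1);
$seq->end(length($seq->seq));
```

Because bless does not call any constructor, anything the new class
would normally initialise (here start and end) has to be set explicitly
after reblessing.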

Yours,

	-Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________