[Bioperl-l] working with large alignments

Mon Feb 2 06:41:23 EST 2004

Albert Vilella, who is visiting me here at EBI, works with really big genomic 
sequence alignments. I've committed several of his modules into CVS for that 
purpose. The most important additions are:

* Bio::Seq::LargeLocatableSeq
    Bio::RangeI-compliant Bio::Seq::LargePrimarySeq 
    uses File::Temp for sequence storage
* Bio::Seq::LargeSeqI
    Interface class for LargeSeq implementations
* Bio::AlignIO::largemultifasta
    IO class creating Bio::Seq::LargeLocatableSeq and SimpleAlign objects
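
To illustrate the idea behind the file-backed sequence classes listed above, 
here is a minimal, language-neutral sketch in Python. It is not the BioPerl 
code: the class name FileBackedSeq and its methods append, length and subseq 
are hypothetical names chosen for this example. The assumption is that 
residues live in a temporary file and only the requested slice is read back, 
using 1-based inclusive coordinates in the style of Bio::RangeI.

```python
import tempfile


class FileBackedSeq:
    """Hypothetical stand-in for a file-backed sequence (not the BioPerl API)."""

    def __init__(self, display_id):
        self.display_id = display_id
        # Residues are kept in an anonymous temporary file, not in memory.
        self._fh = tempfile.TemporaryFile(mode="w+b")
        self._length = 0

    def append(self, chunk):
        """Append a chunk of residues (or gap characters) to the stored sequence."""
        self._fh.seek(0, 2)  # always write at the end of the file
        self._fh.write(chunk.encode("ascii"))
        self._length += len(chunk)

    def length(self):
        return self._length

    def subseq(self, start, end):
        """Return residues start..end, 1-based and inclusive."""
        if not (1 <= start <= end <= self._length):
            raise ValueError("coordinates out of range: %d..%d" % (start, end))
        self._fh.seek(start - 1)
        return self._fh.read(end - start + 1).decode("ascii")


if __name__ == "__main__":
    seq = FileBackedSeq("seq1")
    seq.append("ACGT")
    seq.append("--TTGA")
    print(seq.length(), seq.subseq(3, 7))
```

With this layout the memory cost of a sequence object is independent of the 
sequence length; only the slice returned by subseq is ever held in memory.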

The LargeLocatableSeq is based on code from Bio::Seq::LargePrimarySeq. 
Everything seems to work, but when we run the tests added to the end of the 
t/AlignIO.t file with larger files, the process still uses a large amount 
of memory. We'd be interested in hearing from anyone who can suggest 
improvements.
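
For comparison, here is a rough Python sketch of how a multi-FASTA alignment 
can be read so that only one input line is in memory at a time. It does not 
diagnose the memory use in our modules; it only shows the behaviour we are 
aiming for. The function names stream_multifasta and is_flush and the record 
layout (a dict with "id", "fh" and "length") are assumptions made up for this 
example.

```python
import tempfile


def stream_multifasta(lines):
    """Spool each sequence of a multi-FASTA stream into its own temp file.

    `lines` is any iterable of text lines, for example an open file handle.
    Returns a list of records; residues stay on disk, not in memory.
    """
    records = []
    current = None
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the first word is the id, the rest is ignored here.
            fields = line[1:].split(None, 1)
            current = {
                "id": fields[0] if fields else "",
                "fh": tempfile.TemporaryFile(),
                "length": 0,
            }
            records.append(current)
        elif current is not None:
            # Sequence line: write it straight to disk and keep only a count.
            data = line.encode("ascii")
            current["fh"].write(data)
            current["length"] += len(data)
    return records


def is_flush(records):
    """True if all sequences have the same length, as an alignment requires."""
    return len({rec["length"] for rec in records}) <= 1


if __name__ == "__main__":
    demo = [">a first\n", "ACGT\n", "--AC\n", ">b\n", "ACGTAC-T\n"]
    recs = stream_multifasta(demo)
    print([(r["id"], r["length"]) for r in recs], is_flush(recs))
```

If memory still grows with input size in a reader built along these lines, 
the likely places to look are any steps that join the lines of a sequence 
into one string before writing it out, or that copy it again when building 
the alignment object.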

If you are willing to test the code with larger data sets, I've put two files 
here:

http://www.ebi.ac.uk/~lehvasla/bioperl/medium.largemultifasta (1.3M)
http://www.ebi.ac.uk/~lehvasla/bioperl/large.largemultifasta (31M)

Thanks,

	-Heikki  and Albert
-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________