[Bioperl-l] working with large alignments
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Mon Feb 2 06:41:23 EST 2004
Albert Vilella, who is visiting me here at EBI, works with really big genomic
sequence alignments. I've committed several of his modules into CVS for that
purpose. The most important additions are:
* Bio::Seq::LargeLocatableSeq
  Bio::RangeI-compliant Bio::Seq::LargePrimarySeq
  uses File::Temp for storing the sequence
* Bio::Seq::LargeSeqI
  Interface class for LargeSeq implementations
* Bio::AlignIO::largemultifasta
  IO class creating Bio::Seq::LargeLocatableSeq and SimpleAlign objects
The LargeLocatableSeq is based on code from Bio::Seq::LargePrimarySeq.
Everything seems to work, but if we run the tests added to the end of the
t/AlignIO.t file with larger files, the process still uses a large amount
of memory. We'd be interested in hearing from anyone who can suggest
improvements.
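For anyone unfamiliar with the approach, here is a rough sketch of the
file-backed storage idea behind the LargeSeq modules. It is not the code
committed to CVS: the package name TinyLargeSeq and its methods append,
seq_length and subseq are hypothetical and exist only for this example. The
sequence is written to a File::Temp temporary file, and only the requested
slice is read back into memory.

```perl
package TinyLargeSeq;    # hypothetical name, not a BioPerl module
use strict;
use warnings;
use File::Temp qw(tempfile);

sub new {
    my ($class) = @_;
    # Temporary file opened read/write; removed when the program exits.
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;
    return bless { fh => $fh, length => 0 }, $class;
}

# Append a chunk of sequence to the end of the temporary file.
sub append {
    my ($self, $chunk) = @_;
    my $fh = $self->{fh};
    seek($fh, 0, 2) or die "seek failed: $!";    # 2 = SEEK_END
    print {$fh} $chunk or die "write failed: $!";
    $self->{length} += length $chunk;
    return $self->{length};
}

sub seq_length { return $_[0]{length} }

# Return residues $start..$end (1-based, inclusive) without loading
# the rest of the sequence into memory.
sub subseq {
    my ($self, $start, $end) = @_;
    die "bad range $start..$end"
        if $start < 1 || $end > $self->{length} || $start > $end;
    my $fh = $self->{fh};
    seek($fh, $start - 1, 0) or die "seek failed: $!";    # 0 = SEEK_SET
    my $want = $end - $start + 1;
    my $got  = read($fh, my $buf, $want);
    die "short read" unless defined $got && $got == $want;
    return $buf;
}

package main;

my $seq = TinyLargeSeq->new;
$seq->append('ACGT--AC');    # gaps are stored like any other character
$seq->append('GGTTAA');
print $seq->subseq(3, 10), "\n";    # prints GT--ACGG
```

With a design like this, memory use should follow the size of the slice
requested rather than the full sequence length, so the memory growth
described above presumably comes from somewhere else in the pipeline.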
If you are willing to test the code with larger data sets, I've put two files
here:
http://www.ebi.ac.uk/~lehvasla/bioperl/medium.largemultifasta (1.3M)
http://www.ebi.ac.uk/~lehvasla/bioperl/large.largemultifasta (31M)
Thanks,
-Heikki and Albert
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________