[Bioperl-l] added Bio::SeqIO::largefasta

Jason Stajich jason@chg.mc.duke.edu
Mon, 4 Dec 2000 16:03:23 -0500 (EST)


I have added support for reading a large fasta file into a
Bio::Seq::LargePrimarySeq.  More testing and debugging is needed to
ensure that all the weird fasta cases are handled.  I cannot use the same
patterns as the fasta.pm module, because I can only read one line at a
time in order to meet our requirement of not holding the sequence in
memory.
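
To illustrate the line-at-a-time constraint, here is a minimal sketch in
core Perl.  It is not the largefasta code itself: the sub name
stream_first_record and its header pattern are assumptions made for the
example.  It reads one record and appends residues to a temp file as it
goes, so only a single line is ever held in memory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of BioPerl): read the first FASTA record
# from the filehandle $in one line at a time, writing residues to a temp
# file instead of building a string in memory.
sub stream_first_record {
    my ($in, $tmpdir) = @_;
    $tmpdir = File::Spec->tmpdir unless defined $tmpdir;
    my ($out, $tmpfile) = tempfile(DIR => $tmpdir, UNLINK => 1);
    my ($id, $desc, $length) = (undef, '', 0);

    while (defined(my $line = <$in>)) {
        $line =~ s/\r?\n\z//;              # strip Unix or DOS line endings
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>\s*(\S*)\s*(.*)$/) {
            last if defined $id;           # a second header ends the record
            ($id, $desc) = ($1, $2);       # (that header line is consumed)
            next;
        }
        next unless defined $id;           # ignore text before first header
        $line =~ s/\s+//g;                 # drop whitespace inside the line
        print {$out} $line;
        $length += length $line;
    }
    close $out or die "cannot close $tmpfile: $!";
    return ($id, $desc, $length, $tmpfile);
}
```

Because each line is seen in isolation, anything that a multi-line pattern
would normally catch (stray whitespace, blank lines, text before the first
header) has to be handled case by case, as above.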

Please note that next_seq currently returns a PrimarySeq, until I decide
whether we need (or can have) a LargeSeq class, or just a wrapper.  Also,
the Bio::Seq::LargePrimarySeq implementation makes a copy of the fasta
file in your tmpdir (as defined by File::Spec->tmpdir).  If the file is
very large this could make your machine very unhappy, as it could run out
of swap space.  You can override the location of the tmp file by setting
$Bio::Seq::LargePrimarySeq::DEFAULT_TEMP_DIR = 'somedir'
BEFORE you instantiate a new LargePrimarySeq object.
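
A short sketch of that ordering, assuming BioPerl is installed and that
your version honours the variable as described here.  The environment
variable LARGESEQ_TMP and the sub first_large_seq are made up for the
example.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical: take the scratch directory from an environment variable,
# falling back to the default tmpdir.
my $scratch = $ENV{LARGESEQ_TMP} || File::Spec->tmpdir;

{
    no warnings 'once';    # the variable is only read inside BioPerl
    # Set this BEFORE any LargePrimarySeq object is created.
    $Bio::Seq::LargePrimarySeq::DEFAULT_TEMP_DIR = $scratch;
}

# Hypothetical wrapper: return the first sequence object from a fasta file.
sub first_large_seq {
    my ($file) = @_;
    require Bio::SeqIO;    # loaded on demand; needs BioPerl
    my $in = Bio::SeqIO->new(-file => $file, -format => 'largefasta');
    return $in->next_seq;
}
```

Setting the variable once at the top of the script keeps the ordering
obvious: every sequence read afterwards uses the chosen directory.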

The test, largefasta.t, has been added as well, and some additional
routines were added to LargePrimarySeq to bring it up to the PrimarySeqI
spec.

One likely use, at least from my perspective, is reading in a large
sequence file and chopping it into smaller, manageable chunks for
specific tasks.
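
As a rough sketch of the chunking idea: the helper below (chunk_ranges, a
name invented for this example) only computes 1-based, inclusive window
coordinates.  The commented usage assumes a sequence object that provides
length and subseq, as the PrimarySeqI interface does.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, end] pairs (1-based, inclusive)
# covering a sequence of $length residues in pieces of at most $size.
sub chunk_ranges {
    my ($length, $size) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @ranges;
    for (my $start = 1; $start <= $length; $start += $size) {
        my $end = $start + $size - 1;
        $end = $length if $end > $length;   # last window may be shorter
        push @ranges, [$start, $end];
    }
    return @ranges;
}

# Usage with a sequence object $seq from next_seq (requires BioPerl):
#   for my $r (chunk_ranges($seq->length, 100_000)) {
#       my $piece = $seq->subseq(@$r);   # only this window as a string
#       # ... work on $piece ...
#   }
```

Only one window's worth of sequence is held as a string at a time, which
is the point of keeping the rest on disk.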

This is new code, so it will likely not go on the 0.7 branch.

Suggestions and Comments are always appreciated.

-Jason

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.mc.duke.edu/