[Bioperl-l] added Bio::SeqIO::largefasta
Ewan Birney
birney@ebi.ac.uk
Tue, 5 Dec 2000 10:22:37 +0000 (GMT)
On Mon, 4 Dec 2000, Jason Stajich wrote:
> I have added support for reading in a large fasta file and making it a
> Bio::Seq::LargePrimarySeq. Some more testing and debugging will
> need to be done to ensure all the weird fasta cases are handled. I
> cannot use the same patterns as in the fasta.pm module, because I can
> only read in one line at a time in order to meet our requirement of
> not holding the sequence in memory.
Right.
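
To make the one-line-at-a-time constraint concrete, here is a minimal sketch in core Perl only. It is not the largefasta code itself: spool_first_record is a hypothetical helper, and the assumption that only the first record is wanted is mine. It copies residues to a temp file as each line arrives, so the whole sequence is never held in memory.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Spec;

# Hypothetical helper (not part of BioPerl): copy the residues of the first
# record in a FASTA file into a temp file, reading one line at a time.
# Returns the record id, the residue count, and the temp file path.
sub spool_first_record {
    my ($fasta_path, $tmpdir) = @_;
    $tmpdir = File::Spec->tmpdir unless defined $tmpdir;

    open my $in, '<', $fasta_path or die "Cannot open $fasta_path: $!";
    my ($out, $spool_path) = tempfile(DIR => $tmpdir, UNLINK => 1);

    my ($id, $length) = (undef, 0);
    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;        # strip Unix or DOS line endings
        if ($line =~ /^>\s*(\S*)/) {
            last if defined $id;     # a second header ends the first record
            $id = $1;
            next;
        }
        next unless defined $id;     # ignore anything before the first header
        $line =~ s/\s+//g;           # drop stray whitespace inside the sequence
        print {$out} $line;
        $length += length $line;
    }
    close $in;
    close $out or die "Cannot close $spool_path: $!";
    return ($id, $length, $spool_path);
}

# Small demonstration on a throwaway two-record file.
my ($demo_fh, $demo_fasta) = tempfile(UNLINK => 1);
print {$demo_fh} ">chr_test demo record\nACGTAC\nGTTT\n\n>other\nAAAA\n";
close $demo_fh;

my ($id, $len, $spool) = spool_first_record($demo_fasta);
print "$id: $len residues spooled to $spool\n";
```

Because each line is seen in isolation, a multi-line pattern match over the whole record is not available, which is why the odd cases (blank lines, DOS line endings, text before the first header) have to be handled one by one.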
>
> Please note that currently next_seq will return a PrimarySeq
> until I decide whether we can have (or need) a LargeSeq class, or just
> a wrapper. Also, the Bio::Seq::LargePrimarySeq implementation means that
> it will make a copy of the fasta file in your tmpdir (as defined by
> File::Spec->tmpdir), which, if overly large, could make your machine very
> unhappy as it could run out of swap space. You can override the location
> of the tmp file by setting
> $Bio::Seq::LargePrimarySeq::DEFAULT_TEMP_DIR = 'somedir'
> BEFORE you instantiate a new LargePrimarySeq object.
I am with Hilmar that this should return a Seq object which has-a
Bio::Seq::LargePrimarySeq.
>
> The test, largefasta.t, has been added as well, and some additional routines
> were added to LargePrimarySeq to bring it up to PrimarySeqI spec.
>
> One likely use, at least from my perspective, is the ability to read in
> a large sequence file and chop it into smaller manageable chunks for some
> specific tasks.
>
Also for adding features placed on a massive coordinate scale (perhaps
produced by some database group somewhere...) and then dumping out the
sequence associated with them efficiently.
BTW - so that people know, LargePrimarySeq relies on people using the
$seq->subseq(1000,1100);
method to get out regions, not
substr($seq->seq,1000,100);
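
A rough sketch of why the distinction matters, in core Perl only. The subseq_from_file helper is hypothetical and is not the LargePrimarySeq implementation. It assumes the bare sequence sits in a file with no header and no newlines, and it reads only the requested region, whereas going through ->seq first would pull the entire sequence into memory.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for a disk-backed sequence (illustration only).
# Returns residues $start..$end, 1-based and inclusive, from a file that
# holds the bare sequence with no header and no newlines.
sub subseq_from_file {
    my ($path, $start, $end) = @_;
    die "Bad range $start..$end" if $start < 1 || $end < $start;

    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek($fh, $start - 1, 0) or die "Cannot seek in $path: $!";
    my $want = $end - $start + 1;
    my $got  = read($fh, my $buf, $want);   # only $want bytes are read
    close $fh;
    die "Range $start..$end runs past the end of the sequence"
        unless defined $got && $got == $want;
    return $buf;
}

# Build a 2000-residue throwaway sequence on disk.
my @bases = qw(A C G T);
my ($seq_fh, $seq_path) = tempfile(UNLINK => 1);
print {$seq_fh} join '', map { $bases[$_ % 4] } 0 .. 1999;
close $seq_fh;

# Region fetched straight from disk.
my $region = subseq_from_file($seq_path, 1000, 1100);

# The same region via substr needs the whole sequence in memory first,
# plus a 0-based offset and a length rather than start and end.
open my $all_fh, '<', $seq_path or die "Cannot open $seq_path: $!";
my $whole = do { local $/; <$all_fh> };
close $all_fh;
my $same = substr($whole, 999, 101);

print length($region), " residues; regions match: ",
      ($region eq $same ? "yes" : "no"), "\n";
```

Note the coordinate conventions: subseq(1000,1100) is 1-based and inclusive, so it covers 101 residues, while substr takes a 0-based offset and a length. The substr call quoted above is a loose illustration and does not pick out exactly the same span.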
> This will likely not be on the 0.7 branch as it is new code so we'll have
> to omit it from the branch.
>
I, personally, think this is fine on the branch, but Hilmar is branch
king, so he has the final say ...
I don't think this is going to break anything.
> Suggestions and Comments are always appreciated.
>
> -Jason
>
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.mc.duke.edu/
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------