[Bioperl-l] Memory-mapped sequence object

Lincoln Stein lstein@cshl.org
Tue, 13 Aug 2002 08:42:45 -0400


The problem with Bio::DB::Fasta is that it doesn't fit in with the other 
Bio::Index modules because of the optimizations it makes for Fasta files.  So 
it's hanging in limbo a bit.  I should take a look at the introductory 
documentation and try to rationalize Bio::DB::Fasta with respect to the rest 
of the toolkit.

I expect that on most systems, Bio::DB::Fasta will be slightly slower than the 
mmap solution, but not by much.  This is because Unix does a great job of 
performing lookahead reads and caching recently used data.  Most of the 
overhead will come from the index lookup.
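
For anyone following the thread, here is a rough sketch of the offset 
arithmetic behind the mmap idea.  It is written in Python purely for 
illustration and is not how Bio::DB::Fasta or any BioPerl module is 
implemented.  It assumes the restrictions Jeremy mentions below: a file 
with a single sequence and a fixed line width.  The names residue_offset, 
fetch_subseq and demo are hypothetical.

```python
import mmap
import os
import tempfile


def residue_offset(pos, header_len, line_width, newline_len=1):
    """Byte offset of 0-based residue `pos` in a single-record FASTA file
    whose sequence lines all hold exactly `line_width` residues
    (the last line may be shorter)."""
    full_lines, col = divmod(pos, line_width)
    return header_len + full_lines * (line_width + newline_len) + col


def fetch_subseq(buf, start, end, header_len, line_width, newline_len=1):
    """Return residues [start, end) (0-based, half-open) from `buf`,
    which may be an mmap object or a bytes object."""
    if start >= end:
        return b""
    first = residue_offset(start, header_len, line_width, newline_len)
    last = residue_offset(end - 1, header_len, line_width, newline_len) + 1
    # Only the bytes between `first` and `last` are touched; the line
    # terminators inside that window are stripped afterwards.
    return buf[first:last].replace(b"\n", b"").replace(b"\r", b"")


def demo():
    """Write a tiny fixed-width FASTA file, map it, and pull out a slice."""
    header = b">demo\n"
    seq = b"AAAACCCCGGGGTT"
    width = 4
    with tempfile.NamedTemporaryFile(delete=False) as fh:
        fh.write(header)
        for i in range(0, len(seq), width):
            fh.write(seq[i:i + width] + b"\n")
        path = fh.name
    try:
        with open(path, "rb") as fh, \
                mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return fetch_subseq(mm, 2, 10, len(header), width)
    finally:
        os.remove(path)
```

Calling demo() returns b"AACCCCGG", i.e. residues 2 through 9 of the 
sequence.  With this layout no index is needed, since the offset follows 
directly from the header length and the line width.  An indexed approach 
needs one extra lookup per sequence to find where its record starts.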

Lincoln

On Tuesday 13 August 2002 02:06 am, Jeremy Semeiks wrote:
> On Mon, Aug 12, 2002 at 08:49:35AM -0400, Jason Stajich wrote:
> > Re: Bio::DB::Fasta
> >
> > In that it does a good job doing the lookups on large sequences, but will
> > suffer just as much as the current implementation when you try to bring
> > a large sequence all the way into memory -
>
> You're absolutely right -- this module fits the bill for me. Thanks!
>
> > But a memory mapped seq object would still be a good thing.  If this
> > could replace the slow Bio::Seq::LargePrimarySeq implementation I'd love
> > to see it in the toolkit.
>
> I agree that it would still be useful to have a fast, lightweight
> large sequence object that doesn't need indexing. However, one big argument
> against re-implementing Bio::Seq::LargePrimarySeq using memory-mapping
> is that it would limit the types of files that could become
> LargePrimarySeqs (i.e., to files with only one sequence and fixed line
> width). On the other hand, implementing a whole new Bio::Seq-derived
> object based on an implementation idea might overcomplicate things and
> confuse people.
>
> Maybe Bio::DB::Fasta should be referenced in the documentation for
> Bio::Seq::Large[Primary]Seq.
>
> - Jeremy