[Bioperl-l] Memory-mapped sequence object

Jeremy Semeiks jrs@farviolet.com
Mon, 12 Aug 2002 23:06:59 -0700


On Mon, Aug 12, 2002 at 08:49:35AM -0400, Jason Stajich wrote:
> Re: Bio::DB::Fasta
> 
> In that it does a good job doing the lookups on large sequences, but will
> suffer just as much as the current implementation when you try and bring a
> large sequence all the way into memory -

You're absolutely right -- this module fits the bill for me. Thanks!

> But a memory mapped seq object would still be a good thing.  If this could
> replace the slow Bio::Seq::LargePrimarySeq implementation I'd love to see
> it in the toolkit.

I agree that it would still be useful to have a fast, lightweight
large sequence object that doesn't need indexing. However, one big argument
against re-implementing Bio::Seq::LargePrimarySeq using memory-mapping
is that it would limit the types of files that could become
LargePrimarySeqs (i.e., to files with only one sequence and fixed line
width). On the other hand, implementing a whole new Bio::Seq-derived
object just to expose one implementation technique might overcomplicate
the toolkit and confuse people.
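
The single-sequence, fixed-line-width restriction follows from how a
memory-mapped reader would locate a residue: with one record and a
constant line width, a sequence position maps to a byte offset by
arithmetic alone, with no index needed. Here is a minimal sketch of that
idea, in Python rather than Perl to keep it short. The class name
MappedFastaSeq and its methods are hypothetical and not part of BioPerl.
The sketch assumes a single record, "\n" line endings, and that every
sequence line except the last has the same width.

```python
import mmap
import os
import tempfile


class MappedFastaSeq:
    """Hypothetical sketch: random access into a single-record FASTA file
    with a fixed line width, backed by a read-only memory map."""

    def __init__(self, path):
        self._fh = open(path, "rb")
        self._map = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
        # Sequence data starts right after the header line.
        self._start = self._map.find(b"\n") + 1
        # Residues per line, measured from the first sequence line.
        first_eol = self._map.find(b"\n", self._start)
        if first_eol == -1:
            first_eol = len(self._map)
        self._width = first_eol - self._start
        if self._width <= 0:
            raise ValueError("no sequence data found after the header line")
        # Bytes per line on disk, including the trailing newline.
        self._stride = self._width + 1

    def _offset(self, pos):
        # Map a 0-based residue index to a byte offset in the file.
        return self._start + (pos // self._width) * self._stride + pos % self._width

    def subseq(self, start, end):
        """Return residues start..end (1-based, inclusive)."""
        lo = self._offset(start - 1)
        hi = self._offset(end - 1) + 1
        # Only the requested bytes are touched; embedded newlines are dropped.
        return self._map[lo:hi].replace(b"\n", b"").decode("ascii")

    def close(self):
        self._map.close()
        self._fh.close()


def demo():
    # Write a tiny fixed-width FASTA file (4 residues per line) and slice it.
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".fa")
    tmp.write(b">demo\nACGT\nACGT\nAC\n")
    tmp.close()
    seq = MappedFastaSeq(tmp.name)
    try:
        return seq.subseq(3, 6)  # residues 3..6 of ACGTACGTAC
    finally:
        seq.close()
        os.unlink(tmp.name)
```

Calling demo() returns "GTAC". If the line width varied, or the file held
more than one record, the offset arithmetic in _offset would no longer
hold, and some kind of index would be needed instead.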

Maybe Bio::DB::Fasta should be referenced in the documentation for
Bio::Seq::Large[Primary]Seq.

- Jeremy