[Bioperl-l] using LargeSeq objects

Jason Stajich jason at cgt.duhs.duke.edu
Mon Jun 23 18:28:26 EDT 2003

There are still some shortcomings to Bio::DB::Flat I've found that make
Bio::Index::Fasta still too useful for me to throw out.  This includes the
ease of specifying your own ID parser for fasta.  I'm still not clear if
this can be done to supply multiple IDs for a sequence as well.  I know
that it has the capability to do so.  I'm sure it is also confusing to
people as to which indexing schemes (bdb, binaryindex) are appropriate,
etc., so it would be good to review these.

Bio::DB::Fasta also does things differently because it won't suck an
entire sequence into memory while Bio::Index::Fasta will - I don't think
Bio::DB::Flat will do subsequence offsets as part of its implementation,
but that needs to be investigated as well (and documented) if it is
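
For the original question (pulling a BLAST hit plus some flanking bases
without loading whole sequences), a minimal sketch with Bio::DB::Fasta
might look like the following. The helper names flank_coords and
fetch_flanked, the file name, the sequence ID and the coordinates are all
made up for illustration; only new(), length() and seq() are
Bio::DB::Fasta calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Hypothetical helper: widen a 1-based, inclusive hit range by $flank bases
# on each side, clamped to the sequence bounds [1, $len].
sub flank_coords {
    my ($start, $end, $flank, $len) = @_;
    # Normalise minus-strand hits so the result is always forward-strand
    # (Bio::DB::Fasta reverse-complements when start > stop).
    ($start, $end) = ($end, $start) if $start > $end;
    my $from = $start - $flank;
    $from = 1 if $from < 1;
    my $to = $end + $flank;
    $to = $len if $to > $len;
    return ($from, $to);
}

# Hypothetical helper: fetch the hit region plus flanks from an indexed file.
sub fetch_flanked {
    my ($db, $id, $start, $end, $flank) = @_;
    my $len = $db->length($id);
    die "No sequence '$id' in the index\n" unless defined $len;
    my ($from, $to) = flank_coords($start, $end, $flank, $len);
    return $db->seq($id, $from, $to);
}

# Usage (file name, ID and coordinates are placeholders):
# my $db  = Bio::DB::Fasta->new('blastdb.fa');   # indexes the file if needed
# my $str = fetch_flanked($db, 'chr1', 5000, 5200, 100);
```

The clamping is kept in its own function so it can be checked without a
FASTA file at hand; whether 100 bases of flank is sensible depends on the
BLAST settings, as Morten notes below.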


On Mon, 23 Jun 2003, Brian Osborne wrote:

> Morten,
> >Go: perldoc Bio::DB::Fasta. I haven't been able to locate the
> >documentation on the website.
> There's some documentation in the bptutorial
> (http://bioperl.org/Core/Latest/bptutorial.html), one question in the FAQ
> (http://bioperl.org/Core/Latest/FAQ.html), and some discussion in
> biodatabases.pod (http://bioperl.org/Core/Latest/biodatabases.html). There's
> also doc/howto/FLAT-DATABASES-HOWTO.txt, which addresses Bio::DB::Flat in
> the context of OBDA. Even bpindex.PLS and bpfetch.PLS have useful
> documentation! Certainly scattered, that's not good. Perhaps I should
> write a HOWTO...
> One thing we may consider, since we now have 3 systems for local file
> indexing, is removing at least one of the 3. Does anyone have any opinions
> on this? In the words of Lincoln, this is particularly "weedy". I'd be happy
> to write this HOWTO if we take steps towards consolidation. The first step
> would be marking something for future deprecation.
> Brian O.
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Morten Lindow
> Sent: Monday, June 23, 2003 11:05 AM
> To: Michael R Olson
> Cc: bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] using LargeSeq objects
> Michael R Olson wrote:
> >I'm currently writing a program that runs BLAST, then gets the start and
> >stop base pairs of the alignment for a hit and goes to the database
> >BLAST was run against, and gets base pairs before and after the start
> >and stop.  Right now I use LargeSeq objects to read the entire database
> >into memory (or a chunk, if it's divided up) and then say
> >$str = $seq->subseq($start,$end);
> >
> Remember that the exact start and stop depend on the parameter settings
> for blast - falloff etc.
> >
> >Is this significantly less efficient than going into the database myself
> >and using seek, tell and read, because using SeqIO objects is much
> >easier, but right now it's very slow.
> >
> >
> Use Bio::DB::Fasta - it is very fast and just as convenient.
> Go: perldoc Bio::DB::Fasta. I haven't been able to locate the
> documentation on the website.
>  - Morten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
Duke University
jason at cgt.mc.duke.edu
