[Bioperl-l] using LargeSeq objects

Brian Osborne brian_osborne at cognia.com
Wed Jun 25 08:24:04 EDT 2003


Jason,

Both Bio::Index::Fasta and Bio::DB::Fasta allow the user to specify their
own id parser, and both allow the user to specify multiple ids per fasta
header. Bio::DB::Flat doesn't allow the user to specify substrings as ids;
that's a shortcoming, but it could be by design, since such a capability would
complicate the format of the seqdatabase.ini file. Recall that to set up
OBDA all one has to do is create this file and put it in a known
location; the indexing is automatic after that.
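For readers unsure what "multiple ids per fasta header" looks like in practice, here is a minimal plain-Perl sketch of such a parser. The sub name `multi_id_parser` and the NCBI-style `gi|...|gb|...|` header layout are assumptions for illustration only. According to their documentation, Bio::Index::Fasta accepts a code reference like this through `id_parser()` and Bio::DB::Fasta through the `-makeid` constructor option, each passing in the header line and accepting a list of ids back; check the POD of your installed version before relying on that.

```perl
use strict;
use warnings;

# Hypothetical id parser: given a FASTA header line such as
#   ">gi|12345|gb|AAB12345.1| some description"
# return every id the record should be indexed under (here the gi number
# and the accession). Headers without the gi|...| layout fall back to the
# first word after '>'. An empty list means "not a header".
sub multi_id_parser {
    my ($header) = @_;
    my ($first_word) = $header =~ /^>?\s*(\S+)/ or return;
    my @fields = split /\|/, $first_word;    # trailing empty field is dropped
    my @ids;
    if (@fields >= 4 && $fields[0] eq 'gi') {
        push @ids, $fields[1], $fields[3];   # gi number and accession
    } else {
        push @ids, $first_word;
    }
    return @ids;
}

# Possible wiring (assumes BioPerl is installed; not exercised here):
#   $inx->id_parser(\&multi_id_parser);                              # Bio::Index::Fasta
#   my $db = Bio::DB::Fasta->new($dir, -makeid => \&multi_id_parser); # Bio::DB::Fasta

for my $h ('>gi|12345|gb|AAB12345.1| some protein', '>contig_7 length=5000') {
    print join(',', multi_id_parser($h)), "\n";
}
```

The loop prints `12345,AAB12345.1` and then `contig_7`, so the first record would be retrievable under either id.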

To me it comes down to a comparison of Bio::Index::Fasta and Bio::DB::Fasta.
For example, if BioPAN materializes, what would go into the "core"? One or
both? If one, which? In my opinion there should only be one, and this would
enable us to focus the documentation, if nothing else. Keeping something
outside of the core doesn't mean that thing is bad; we deprecated
Tools::Blast* at a time when it was, in many functional respects, better
than SearchIO. We chose SearchIO because it was the better platform to
support and extend.

Brian O.

-----Original Message-----
From: bioperl-l-bounces at portal.open-bio.org
[mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Jason Stajich
Sent: Monday, June 23, 2003 5:28 PM
To: Brian Osborne
Cc: Bioperl
Subject: RE: [Bioperl-l] using LargeSeq objects

There are still some shortcomings to Bio::DB::Flat I've found that make
Bio::Index::Fasta still too useful for me to throw out.  This includes the
ease of specifying your own ID parser for fasta.  I'm still not clear whether
this can be done to supply multiple IDs for a sequence as well.  I know
that it has the capability to do so.  I'm sure it is also confusing to
people which indexing schemes (bdb, binaryindex) are appropriate,
etc., so it would be good to review these.

Bio::DB::Fasta also does things differently because it won't suck an
entire sequence into memory, while Bio::Index::Fasta will.  I don't think
Bio::DB::Flat does subsequence offsets as part of its implementation,
but whether that is possible needs to be investigated (and documented)
as well.
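To make the seek/read approach concrete, here is a minimal plain-Perl sketch (no BioPerl required) of fetching a subsequence plus flanking bases without reading the whole record into memory. It assumes every sequence line except the last has the same width and ends in a single "\n"; Bio::DB::Fasta imposes a similar fixed-line-length requirement, which is what lets an index turn a base position into a byte offset. The subs `fetch_subseq` and `flanked_range` are hypothetical names, and in a real script `$seq_off` and `$line_len` would come from a one-time scan of the file rather than being hard-coded.

```perl
use strict;
use warnings;

# Hypothetical helper: read bases $start..$end (1-based, inclusive) of one
# FASTA record by seeking, not slurping. Assumes every sequence line except
# the last holds exactly $line_len bases followed by a single "\n".
#   $fh       - filehandle open on the FASTA data
#   $seq_off  - byte offset of the first base (just after the header line)
#   $line_len - bases per full sequence line
sub fetch_subseq {
    my ($fh, $seq_off, $line_len, $start, $end) = @_;
    # Each full line occupies $line_len + 1 bytes on disk (bases plus newline).
    my $to_byte = sub {
        my $i = shift() - 1;    # 0-based base index
        return $seq_off + int($i / $line_len) * ($line_len + 1) + $i % $line_len;
    };
    my ($from, $to) = ($to_byte->($start), $to_byte->($end));
    seek($fh, $from, 0) or die "seek failed: $!";
    my $got = read($fh, my $buf, $to - $from + 1);
    die "read failed: $!" unless defined $got;
    $buf =~ s/\n//g;    # drop newlines that fall inside the requested range
    return $buf;
}

# Hypothetical helper: widen a hit's coordinates by $flank bases on each
# side, clamped to 1..$len. Swaps reversed (minus-strand style) coordinates.
sub flanked_range {
    my ($hit_start, $hit_end, $flank, $len) = @_;
    ($hit_start, $hit_end) = ($hit_end, $hit_start) if $hit_start > $hit_end;
    my $s = $hit_start - $flank;
    my $e = $hit_end + $flank;
    return ($s < 1 ? 1 : $s, $e > $len ? $len : $e);
}

# Demo on an in-memory file: a 25-base record wrapped at 10 bases per line.
my $fasta = ">chr_demo\nACGTACGTAA\nCCCCCGGGGG\nTTTTT\n";
open(my $fh, '<', \$fasta) or die $!;
my $seq_off = index($fasta, "\n") + 1;    # first byte after the header line
my ($s, $e) = flanked_range(12, 14, 3, 25);
print "$s-$e: ", fetch_subseq($fh, $seq_off, 10, $s, $e), "\n";
```

The demo prints `9-17: AACCCCCGG`: only the ten bytes spanning bases 9 to 17 (including one embedded newline) are read from the file.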

-jason

On Mon, 23 Jun 2003, Brian Osborne wrote:

> Morten,
>
> >Go: perldoc Bio::DB::Fasta. I haven't been able to locate the
> >documentation on the website.
>
> There's some documentation in the bptutorial
> (http://bioperl.org/Core/Latest/bptutorial.html), one question in the FAQ
> (http://bioperl.org/Core/Latest/FAQ.html), and some discussion in
> biodatabases.pod (http://bioperl.org/Core/Latest/biodatabases.html). There's
> also doc/howto/FLAT-DATABASES-HOWTO.txt, which addresses Bio::DB::Flat in
> the context of OBDA. Even bpindex.PLS and bpfetch.PLS have useful
> documentation! It's certainly scattered, and that's not good. Perhaps I
> should write a HOWTO...
>
> One thing we may consider, since we now have 3 systems for local file
> indexing, is removing at least one of the 3. Does anyone have any opinions
> on this? In the words of Lincoln, this is particularly "weedy". I'd be happy
> to write this HOWTO if we take steps towards consolidation. The first step
> would be marking something for future deprecation.
>
> Brian O.
>
> -----Original Message-----
> From: bioperl-l-bounces at portal.open-bio.org
> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of Morten Lindow
> Sent: Monday, June 23, 2003 11:05 AM
> To: Michael R Olson
> Cc: bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] using LargeSeq objects
>
> Michael R Olson wrote:
>
> >I'm currently writing a program that runs BLAST, then gets the start and
> >stop base pairs of the alignment for a hit and goes to the database
> >BLAST was run against, and gets base pairs before and after the start
> >and stop.  Right now I use LargeSeq objects to read the entire database
> >into memory (or a chunk, if it's divided up) and then say
> >$str = $seq->subseq($start,$end);
> >
> Remember that the exact start and stop depend on the parameter settings
> for BLAST (falloff etc.).
>
> >
> >Is this significantly less efficient than going into the database myself
> >and using seek, tell and read? Using SeqIO objects is much
> >easier, but right now it's very slow.
> >
> >
> Use Bio::DB::Fasta - it is very fast and just as convenient.
>
> Go: perldoc Bio::DB::Fasta. I haven't been able to locate the
> documentation on the website.
>
>  - Morten
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu