[Bioperl-l] Bio::Index::Fasta vs Bio::DB::Fasta
Ewan Birney
birney@ebi.ac.uk
Sat, 12 Jan 2002 10:18:49 +0000 (GMT)
On Fri, 11 Jan 2002, Lincoln Stein wrote:
> Hi Folks,
>
> I've just recently become aware that Bio::Index::Fasta has very heavy
> overlapping functionality with Bio::DB::Fasta, and this is likely to lead to
> some user confusion down the road.
>
> I would remove Bio::DB::Fasta in favor of the Bio::Index version, except that
> I don't think that Bio::Index::Fasta does the thing that first motivated
> Bio::DB::Fasta, which was the ability to retrieve subsequences efficiently.
> I have big (tens of megabyte) fasta files that contain
> whole C. elegans chromosomes, and want to fetch a few base pairs from the
> middle of them without reading the whole record into memory. Can
> Bio::Index::Fasta do this?
I am pretty sure it can't do this (which is why, I believe, you checked in
DB::Fasta in the first place). Does DB::Fasta make assumptions about line
length so it can SEEK to the right place?
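
To make the line-length question concrete: if every sequence line in a record
holds the same number of bases, the byte offset of any base follows from
arithmetic alone, so a seek can land on it directly. A minimal sketch in
Python (not BioPerl code; the function name and its parameters are
hypothetical, and whether Bio::DB::Fasta actually works this way is exactly
what is being asked above):

```python
def base_offset(pos, data_start, line_len, newline_len=1):
    """Byte offset of the 1-based base `pos` in a fixed-width FASTA record.

    data_start  -- byte offset of the first base (just after the header line)
    line_len    -- bases per sequence line (assumed constant for the record)
    newline_len -- bytes per line terminator (1 for "\n", 2 for "\r\n")
    """
    zero_based = pos - 1
    # Each complete line before the target adds one line terminator to skip.
    full_lines_before = zero_based // line_len
    return data_start + zero_based + full_lines_before * newline_len
```

With a 10-byte header and 60 bases per line, base 60 sits at byte 69 and
base 61 at byte 71, because one newline lies between them.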
Clearly merging the two pieces would be great. It is not something I am
overly worried about but it would be nice.
Two routes:
(I am assuming that we are still calling it Bio::Index::Fasta...)
(a)
Bio::Index::Fasta gives back a Bio::SeqI compliant object which is
actually a new thing called Bio::Seq::LargeFastaFixedLineLength (silly
name...). This object does not load the sequence into memory but executes
$seq->subseq(100000,1000020);
with a SEEK.
(b) Bio::Index::Fasta itself will accept gets on slices (subsequences)
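
Route (a) can be sketched as an object that keeps only a file path and a few
offsets, and reads from disk when subseq is called. This is an illustrative
Python sketch, not BioPerl code: the class name, its constructor arguments,
and the demo helper are all hypothetical, and it assumes fixed-width sequence
lines with a known offset for the first base.

```python
import os
import tempfile


class FixedWidthFastaRecord:
    """Hypothetical lazy record: holds offsets, never the whole sequence."""

    def __init__(self, path, data_start, line_len, newline_len=1):
        self.path = path
        self.data_start = data_start    # byte offset of the first base
        self.line_len = line_len        # bases per line (assumed constant)
        self.newline_len = newline_len  # bytes per line terminator

    def _offset(self, pos):
        # 1-based base position -> byte offset, skipping earlier newlines.
        zero_based = pos - 1
        return (self.data_start + zero_based
                + (zero_based // self.line_len) * self.newline_len)

    def subseq(self, start, end):
        """Return bases start..end (1-based, inclusive) using one seek."""
        first = self._offset(start)
        last = self._offset(end)
        with open(self.path, "rb") as fh:
            fh.seek(first)
            raw = fh.read(last - first + 1)
        # Drop any line terminators that fall inside the requested span.
        return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


def demo():
    """Write a tiny 10-bases-per-line FASTA file and fetch bases 8..13."""
    header = b">demo\n"
    lines = [b"ACGTACGTAC", b"GGGGGCCCCC", b"TTTTT"]
    with tempfile.NamedTemporaryFile(delete=False, suffix=".fa") as fh:
        fh.write(header + b"\n".join(lines) + b"\n")
        path = fh.name
    try:
        rec = FixedWidthFastaRecord(path, data_start=len(header), line_len=10)
        return rec.subseq(8, 13)
    finally:
        os.remove(path)
```

Only the bytes covering the requested span are read, so memory use depends on
the slice length rather than the record length. Route (b) would put the same
offset arithmetic behind the index's own fetch call instead of a separate
sequence object.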
Reading the documentation of Bio::DB::Fasta I notice that you have put
nearly every access in (!) ---- I am always *so* impressed by your modules
Lincoln, they nearly always have every route into them first off.
So --- you have carte blanche to rearrange this area. As long as you are
convinced that you won't be affecting existing FASTA indexes you can do
what you like with Bio::Index::Fasta before 1.0 ---- it should work
however with existing indexes - (ie, don't change the hash key
representations etc).
If you want to do a more serious reorganisation then it has got to be post
1.0.
Your choice of options and code.
>
> Lincoln
>
>