[Bioperl-l] Bio::Index::Fasta vs Bio::DB::Fasta
Lincoln Stein
lstein@cshl.org
Mon, 14 Jan 2002 18:47:42 -0400
Hi Tony,
I wasn't proposing that Index::Fasta go away, but that DB::Fasta (my
own hack) exit. So relax!
Lincoln
Tony Cox writes:
> On Sat, 12 Jan 2002, Ewan Birney wrote:
>
> Just a note that I have a _lot_ of code and time invested internally in the
> Bio::Index::Fasta modules. It forms a fairly major plank of our internal
> sequence fetching architecture here in Sanger (along with the more complex
> functionality of SRS). Most of the time it is used for "normal" sequence
> fetching (EMBL clones etc) and not for chr-sized DNA chunks where the DB::Fasta
> really wins. It also complements the Fastq modules that can be used to get
> matching quality data if it exists.
>
> In short, does Index::Fasta _have_ to go?
>
> Tony
>
>
> +>On Fri, 11 Jan 2002, Lincoln Stein wrote:
> +>
> +>> Hi Folks,
> +>>
> +>> I've just recently become aware that Bio::Index::Fasta has very heavy
> +>> overlapping functionality with Bio::DB::Fasta, and this is likely to lead to
> +>> some user confusion down the road.
> +>>
> +>> I would remove Bio::DB::Fasta in favor of the Bio::Index version, except that
> +>> I don't think that Bio::Index::Fasta does the thing that first motivated
> +>> Bio::DB::Fasta, which was the ability to retrieve subsequences efficiently.
> +>> I have big (tens of megabyte) fasta files that contain
> +>> whole C. elegans chromosomes, and want to fetch a few base pairs from the
> +>> middle of them without reading the whole record into memory. Can
> +>> Bio::Index::Fasta do this?
> +>
> +>
> +>I am pretty sure it can't do this (which is why I believe you checked in
> +>DB::Fasta in the first place). Does DB::Fasta make assumptions about line
> +>length so it can SEEK to the right place?
> +>
> +>
> +>Clearly merging the two pieces would be great. It is not something I am
> +>overly worried about but it would be nice.
> +>
> +>
> +>Two routes:
> +>
> +>(I am assuming that we are still calling it Bio::Index::Fasta...)
> +>
> +> (a)
> +>
> +> Bio::Index::Fasta gives back a Bio::SeqI compliant object which is
> +>actually a new thing called Bio::Seq::LargeFastaFixedLineLength (silly
> +>name...). This object does not load the sequence into memory but executes
> +>
> +> $seq->subseq(100000,1000020);
> +>
> +> with a SEEK.
> +>
> +>
> +> (b) Bio::Index::Fasta will accept gets on slices
> +>
> +>
> +>Reading the documentation of Bio::DB::Fasta I notice that you have put
> +>nearly every access in (!) ---- I am always *so* impressed by your modules
> +>Lincoln, they nearly always have every route into them first off.
> +>
> +>
> +>
> +>So --- you have carte blanche to rearrange this area. As long as you are
> +>convinced that you won't be affecting existing FASTA indexes you can do
> +>what you like with Bio::Index::Fasta before 1.0 ---- it should work
> +>however with existing indexes - (ie, don't change the hash key
> +>representations etc).
> +>
> +>
> +>If you want to do a more serious reorganisation then it has got to be post
> +>1.0.
> +>
> +>
> +>
> +>Your choice of options and code.
> +>
> +>
> +>>
> +>> Lincoln
> +>>
> +>>
> +>
> +>
>
> ******************************************************
> Tony Cox Email:avc@sanger.ac.uk
> Sanger Institute WWW:www.sanger.ac.uk
> Wellcome Trust Genome Campus Webmaster
> Hinxton Tel: +44 1223 834244
> Cambs. CB10 1SA Fax: +44 1223 494919
> ******************************************************
>
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
Positions available at my lab: see http://stein.cshl.org/#hire
========================================================================
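
Note on the SEEK question raised in the thread: fetching a few bases from the middle of a chromosome-sized record without loading it only works cheaply if every sequence line in the record (except possibly the last) has the same length, so that a base position maps to a byte offset by arithmetic. As far as I understand its documentation, Bio::DB::Fasta does rely on that assumption. The sketch below illustrates the arithmetic in plain Perl; it is not the Bio::DB::Fasta implementation. The helper name fetch_subseq, its argument order, and the demo file name are all hypothetical, and the caller is assumed to know the byte offset of the first base, the bases per line, and the line-terminator length (in practice these would come from an index).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return bases $start..$stop
# (1-based, inclusive) of one FASTA record without reading the whole record.
# Assumption: every sequence line except the last holds exactly
# $bases_per_line bases followed by a $newline_len-byte line terminator.
sub fetch_subseq {
    my ($fh, $seq_offset, $bases_per_line, $newline_len, $start, $stop) = @_;
    die "need 1 <= start <= stop\n" if $start < 1 || $stop < $start;

    my $bytes_per_line = $bases_per_line + $newline_len;

    # Map a 1-based base position to an absolute byte offset in the file.
    my $to_byte = sub {
        my $pos0 = $_[0] - 1;    # 0-based base index
        return $seq_offset
             + int($pos0 / $bases_per_line) * $bytes_per_line
             + $pos0 % $bases_per_line;
    };

    my $first = $to_byte->($start);
    my $last  = $to_byte->($stop);

    # Jump straight to the first wanted base and read only the needed bytes.
    seek($fh, $first, 0) or die "seek failed: $!\n";
    my $n = read($fh, my $buf, $last - $first + 1);
    die "read failed: $!\n" unless defined $n;

    $buf =~ tr/\r\n//d;          # drop line terminators inside the span
    return $buf;
}

# Usage: write a tiny fixed-width FASTA file, then pull out bases 8..13.
my $file   = "fetch_subseq_demo_$$.fa";    # hypothetical scratch file
my $header = ">chrDemo\n";
open(my $out, '>', $file) or die "cannot write $file: $!\n";
binmode($out);
print {$out} $header, "ACGTACGTAC\n", "GTTTGGCCAA\n", "TTGA\n";
close($out);

open(my $in, '<', $file) or die "cannot read $file: $!\n";
binmode($in);
my $sub = fetch_subseq($in, length($header), 10, 1, 8, 13);
close($in);
unlink $file;

print "$sub\n";    # prints TACGTT
```

The span crosses a line boundary, so the read picks up one newline that is then stripped. With a real BioPerl install one would call the module rather than hand-roll this; the sketch is only meant to show why a fixed line length makes the seek possible.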