[Bioperl-l] Placement of LargePrimarySeq

James Gilbert jgrg@sanger.ac.uk
Mon, 18 Sep 2000 13:29:35 +0100 (BST)


Ewan,

This reminds me that I should put in a fix I've
thought of for SeqIO::fasta to stop memory use
exploding on very large sequences.

	James

On Sun, 17 Sep 2000, Ewan Birney wrote:

> 
> Tomorrow I have to do some comparisons of very large sequence files
> (around chromosome 1 size, if people are interested...). Although I could
> potentially use bioperl sequences on a machine with a huge amount of real
> memory, I decided to make a quick module that stores a sequence as a
> file in /tmp/ and then executes the subseq command by using seek and read
> commands.
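
The file-backed idea described above can be sketched roughly as follows.
This is a minimal Python illustration, not the Bio::LargePrimarySeq code:
the class name FileBackedSeq and the methods add_sequence, length and
subseq are hypothetical, and 1-based inclusive coordinates are assumed.

```python
import os
import tempfile


class FileBackedSeq:
    """Hypothetical sketch: keep residues in a temp file, not in memory."""

    def __init__(self):
        # Anonymous binary temp file; removed automatically when closed.
        self._fh = tempfile.TemporaryFile(mode="w+b")
        self._length = 0

    def add_sequence(self, chunk: str) -> None:
        # Append residues at the end of the file, one chunk at a time.
        self._fh.seek(0, os.SEEK_END)
        self._fh.write(chunk.encode("ascii"))
        self._length += len(chunk)

    def length(self) -> int:
        return self._length

    def subseq(self, start: int, end: int) -> str:
        # Assumed convention: 1-based, inclusive coordinates.
        if not (1 <= start <= end <= self._length):
            raise ValueError(f"bad range {start}..{end}")
        # Seek to the byte offset and read only the requested span.
        self._fh.seek(start - 1)
        return self._fh.read(end - start + 1).decode("ascii")
```

Only the requested span is ever read back, so memory use depends on the
size of the subseq call rather than on the length of the sequence.
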
> 
> I have this object as Bio::LargePrimarySeq. Does anyone have any
> objections to having this object in the Bio:: area directly, or should
> I put it somewhere else? (basically, what do people feel about cluttering
> up the top level Bio:: area, or should I make a Bio::Seq:: directory?
> NB - there might be some other extensions, like Bio::CachePrimarySeq which
> can cache subseq calls to improve performance for LargePrimarySeq and
> the Ensembl database equivalents...)
> 
> 
> I need to write a SeqIO system for making these objects and also for
> writing out very large fasta files. (It should step through the sequence
> one MB at a time using the subseq method, rather than getting the whole
> thing out as a seq.) Options:
> 
> 	(a) make a new Bio::SeqIO::bigfasta module, and ->next_seq would
> make sequences with LargePrimarySeq and ->write_seq would write with
> this subseq method
> 
> 	(b) parameterise Bio::SeqIO::fasta for both of these. (I would have
> to handle the boring business of not using $/, as the reader can't pull
> everything between '>' markers in as one string; the whole point is not
> to have the entire sequence as a string in memory)
> 
> I prefer (a) to (b).
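
Either option comes down to the same two loops, sketched here in Python
under stated assumptions. The function names write_large_fasta and
read_large_fasta are hypothetical (this is not Bio::SeqIO code), subseq
is assumed to take 1-based inclusive coordinates, and the reader handles
a single-record file only.

```python
import io
from typing import Callable, TextIO


def write_large_fasta(out: TextIO, display_id: str, length: int,
                      subseq: Callable[[int, int], str],
                      chunk_size: int = 1_000_000, width: int = 60) -> None:
    """Write one record, fetching roughly chunk_size residues per call."""
    # Round the step down to a multiple of the line width so that
    # chunk boundaries never produce a short line mid-record.
    step = max(width, chunk_size - chunk_size % width)
    out.write(f">{display_id}\n")
    for start in range(1, length + 1, step):
        end = min(start + step - 1, length)
        chunk = subseq(start, end)
        for i in range(0, len(chunk), width):
            out.write(chunk[i:i + width] + "\n")


def read_large_fasta(handle: TextIO,
                     add_sequence: Callable[[str], None]) -> str:
    """Read a single-record file line by line; return the record id."""
    display_id = ""
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            display_id = parts[0] if parts else ""
        elif line:
            # Hand each line to the store; never build the full string.
            add_sequence(line)
    return display_id
```

Reading line by line avoids slurping everything between '>' markers,
and the add_sequence callback could be the append method of a
file-backed sequence object.
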
> 
> 
> 
> I've got to do this tomorrow, so if people have a view, make sure that view
> gets back to me soon...
> 
> 
> 
> 
> Of course this is all main trunk stuff, not on the branch.
> 
> 
> 
> 
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>. 
> -----------------------------------------------------------------

James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge                        Tel: 01223 494906
CB10 1SA                         Fax: 01223 494919