[Bioperl-l] Placement of LargePrimarySeq
James Gilbert
jgrg@sanger.ac.uk
Mon, 18 Sep 2000 18:02:02 +0100 (BST)
Ewan,
I've looked at the problem, and it isn't where I
thought it was in the code.
I made a test sequence 40Mbp long. I can read it
into a string, but when I try to copy the string,
I get the "Out of memory!" error, and this is on
a machine with 1 GB of RAM.
Perhaps Perl's memory allocator is requesting an
unreasonably large block. It might be possible to write a
PrimarySeqI object as a C extension, with a more
conservative memory allocation scheme.
James
On Mon, 18 Sep 2000, James Gilbert wrote:
> Ewan,
>
> This reminds me that I should put in a fix I've
> thought of in SeqIO::fasta to stop the memory
> exploding on very large sequences.
>
> James
>
> On Sun, 17 Sep 2000, Ewan Birney wrote:
>
> >
> > Tomorrow I have to do some comparisons of very large sequence files
> > (around chromosome 1 size, if people are interested...). Although I could
> > potentially use bioperl sequences on a machine with a huge amount of real
> > memory, I decided to make a quick module that stores a sequence in a
> > file in /tmp/ and then implements the subseq method using seek and read
> > calls.
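The seek-and-read approach described above can be sketched as follows, in Python purely for illustration (the module under discussion is Perl). The class name `FileBackedSeq` and its `append` method are hypothetical. The sketch assumes the temp file holds bare residues with no header or newlines, one byte per residue, so residue i sits at byte offset i - 1, and it uses the 1-based inclusive coordinates of BioPerl's `subseq`.

```python
import os
import tempfile


class FileBackedSeq:
    """Hypothetical file-backed sequence: residues live on disk, not in RAM.

    Assumes the temp file holds only residue characters (no header, no
    newlines), one byte per residue, so residue i is at byte offset i - 1.
    """

    def __init__(self, directory=None):
        # mkstemp returns an open descriptor plus a path we delete in close().
        fd, self.path = tempfile.mkstemp(suffix=".seq", dir=directory)
        self._fh = os.fdopen(fd, "w+b")
        self._length = 0

    def append(self, chunk):
        # Hypothetical helper: add residues piecewise, so the full sequence
        # never has to exist as a single in-memory string.
        data = chunk.encode("ascii")
        self._fh.seek(0, os.SEEK_END)
        self._fh.write(data)
        self._length += len(data)

    def length(self):
        return self._length

    def subseq(self, start, end):
        # 1-based, inclusive coordinates, as in BioPerl's subseq().
        if not 1 <= start <= end <= self._length:
            raise ValueError(f"bad range {start}..{end} (length {self._length})")
        self._fh.seek(start - 1)  # jump straight to the first residue
        return self._fh.read(end - start + 1).decode("ascii")

    def close(self):
        # Remove the temp file so large sequences don't pile up in /tmp/.
        self._fh.close()
        os.remove(self.path)
```

For example, after `s.append("ACGT" * 3)` and `s.append("NNNN")`, `s.subseq(3, 6)` returns `"GTAC"` while reading only four bytes from disk; the cost of a subseq call depends on the slice length, not the sequence length.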
> >
> > I have this object as Bio::LargePrimarySeq. Does anyone have any
> > objections to having this object directly in the Bio:: area, or should
> > I put it somewhere else? (Basically, how do people feel about cluttering
> > up the top-level Bio:: area; should I make a Bio::Seq:: directory instead?
> > NB: there might be some other extensions, like Bio::CachePrimarySeq, which
> > could cache subseq calls to improve performance for LargePrimarySeq and
> > the Ensembl database equivalents.)
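The email gives no details of how Bio::CachePrimarySeq would work, so the following is only one possible strategy, sketched in Python for illustration: round each subseq request out to fixed-size blocks and keep recently used blocks in an LRU cache, so overlapping or repeated requests avoid the slow backend. `CachedSeq`, `block_size` and `max_blocks` are hypothetical names, and the wrapped object is assumed to expose `length()` and a 1-based inclusive `subseq()`.

```python
from collections import OrderedDict


class CachedSeq:
    """Hypothetical subseq cache in front of a slow (file or DB) sequence.

    `inner` is assumed to provide length() and 1-based inclusive
    subseq(start, end). Requests are served from fixed-size blocks that
    are kept in a least-recently-used cache.
    """

    def __init__(self, inner, block_size=100_000, max_blocks=32):
        self.inner = inner
        self.block_size = block_size
        self.max_blocks = max_blocks
        self._blocks = OrderedDict()  # block index -> residue string
        self.misses = 0  # number of calls that reached the backend

    def _block(self, index):
        if index in self._blocks:
            self._blocks.move_to_end(index)  # mark as most recently used
            return self._blocks[index]
        self.misses += 1
        start = index * self.block_size + 1
        end = min(start + self.block_size - 1, self.inner.length())
        data = self.inner.subseq(start, end)
        self._blocks[index] = data
        if len(self._blocks) > self.max_blocks:
            self._blocks.popitem(last=False)  # evict least recently used
        return data

    def length(self):
        return self.inner.length()

    def subseq(self, start, end):
        if not 1 <= start <= end <= self.length():
            raise ValueError(f"bad range {start}..{end}")
        first = (start - 1) // self.block_size
        last = (end - 1) // self.block_size
        joined = "".join(self._block(i) for i in range(first, last + 1))
        offset = (start - 1) - first * self.block_size
        return joined[offset : offset + (end - start + 1)]
```

Block-aligned caching trades some over-reading at the edges for a high hit rate when callers walk along a sequence in small, overlapping windows.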
> >
> > I need to write a SeqIO module for creating these objects and also for
> > writing out very large fasta files (it should step through the sequence
> > one MB at a time using the subseq method, rather than getting the whole
> > thing out with seq). Options:
> >
> > (a) make a new Bio::SeqIO::bigfasta module, where ->next_seq would
> > make LargePrimarySeq objects and ->write_seq would write using
> > this subseq method.
> >
> > (b) parameterise Bio::SeqIO::fasta for both of these (this means handling
> > the tedious case of not using $/, because reading can't slurp everything
> > between '>' markers into one string; the whole point is not to have the
> > entire sequence in memory as a string).
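Under either option, next_seq has to avoid reading a whole record at once via $/. A line-by-line alternative can be sketched as follows, in Python for illustration only: residues are streamed into one temp file per record, so memory use is bounded by the longest line rather than the longest sequence. `stream_fasta_to_files` is a hypothetical name, and taking the first word of the header as the display id is an assumption.

```python
import os
import tempfile


def stream_fasta_to_files(handle, directory=None):
    """Hypothetical streaming FASTA reader that never holds a full record.

    Each record's residues are appended to their own temp file. Yields
    (display_id, path, length) once a record is complete; the caller owns
    (and should delete) each file. Lines before the first '>' are ignored.
    """
    out = None
    display_id = path = None
    length = 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if out is not None:
                out.close()  # finish the previous record before yielding it
                yield display_id, path, length
            header = line[1:].split()
            display_id = header[0] if header else ""
            fd, path = tempfile.mkstemp(suffix=".seq", dir=directory)
            out = os.fdopen(fd, "w")
            length = 0
        elif out is not None:
            residues = "".join(line.split())  # drop stray internal whitespace
            out.write(residues)
            length += len(residues)
    if out is not None:
        out.close()
        yield display_id, path, length
```

Each yielded file could then back a file-based sequence object, which is roughly what option (a)'s next_seq would hand back as a LargePrimarySeq.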
> >
> > I prefer (a) to (b).
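The write side of option (a), stepping through the sequence one MB at a time with subseq, might look like the following Python sketch. `write_large_fasta`, `chunk` and `width` are hypothetical names, 60 residues per line is just a common default, and the sequence object is assumed to provide `length()` and a 1-based inclusive `subseq()`. Only one chunk of residues is in memory at any moment.

```python
def write_large_fasta(out, display_id, seq, chunk=1_000_000, width=60):
    """Hypothetical chunked FASTA writer built on subseq().

    The chunk size is rounded down to a multiple of `width`, so every
    output line except the last is exactly `width` residues long.
    """
    step = max(width, (chunk // width) * width)
    out.write(f">{display_id}\n")
    total = seq.length()
    start = 1
    while start <= total:
        end = min(start + step - 1, total)
        block = seq.subseq(start, end)  # at most `step` residues in memory
        for i in range(0, len(block), width):
            out.write(block[i : i + width] + "\n")
        start = end + 1
```

Rounding the chunk to a whole number of lines avoids having to carry a partial line over from one subseq call to the next.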
> >
> > I've got to do this tomorrow, so if people have a view, make sure that view
> > gets back to me soon.
> >
> > Of course this is all main trunk stuff, not on the branch.
> >
> > -----------------------------------------------------------------
> > Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> > <birney@ebi.ac.uk>.
> > -----------------------------------------------------------------
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
> James G.R. Gilbert
> The Sanger Centre
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge Tel: 01223 494906
> CB10 1SA Fax: 01223 494919