[Bioperl-l] Placement of LargePrimarySeq
Ewan Birney
birney@ebi.ac.uk
Sun, 17 Sep 2000 23:07:51 +0100 (BST)
Tomorrow I have to do some comparisons of very large sequence files
(around chromosome 1 size, if people are interested...). Although I could
potentially use bioperl sequences on a machine with a huge amount of real
memory, I decided to make a quick module that stores a sequence in a
file in /tmp/ and then implements the subseq method by using seek and
read commands.
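The idea above (keep the residues on disk and answer subseq with a seek plus a read) can be sketched as follows. This is a minimal illustration in Python rather than the actual Perl module. The class name FileBackedSeq and its append/length/close methods are hypothetical. The 1-based, inclusive coordinates are assumed to match BioPerl's subseq convention.

```python
import tempfile


class FileBackedSeq:
    """Hypothetical sketch (not the BioPerl API): residues live in a
    temporary file on disk instead of in one in-memory string.

    subseq() uses 1-based, inclusive coordinates, as assumed for BioPerl.
    """

    def __init__(self, directory=None):
        # Anonymous temp file; directory=None means tempfile.gettempdir(),
        # which is usually /tmp on Unix-like systems.
        self._fh = tempfile.TemporaryFile(mode="w+b", dir=directory)
        self._length = 0

    def append(self, residues):
        # Hypothetical loader method: add residues to the end of the file.
        data = residues.encode("ascii")
        self._fh.seek(0, 2)  # whence=2: seek relative to end of file
        self._fh.write(data)
        self._length += len(data)

    def length(self):
        return self._length

    def subseq(self, start, end):
        if not 1 <= start <= end <= self._length:
            raise ValueError(f"invalid range {start}..{end} (length {self._length})")
        self._fh.seek(start - 1)  # jump straight to the first residue
        # Only the requested slice is ever read into memory.
        return self._fh.read(end - start + 1).decode("ascii")

    def close(self):
        self._fh.close()


if __name__ == "__main__":
    seq = FileBackedSeq()
    seq.append("ACGTACGT")
    seq.append("NNNN")
    print(seq.subseq(3, 6))  # GTAC
    seq.close()
```

Memory use stays proportional to the slice requested, not to the chromosome. The cost is a disk seek per call.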
I have called this object Bio::LargePrimarySeq. Does anyone have any
objections to having it directly in the Bio:: area, or should I put it
somewhere else? (Basically, how do people feel about cluttering up the
top-level Bio:: area? Or should I make a Bio::Seq:: directory?
NB: there might be some other extensions, like Bio::CachePrimarySeq, which
could cache subseq calls to improve performance for LargePrimarySeq and
the Ensembl database equivalents.)
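A subseq cache of the kind floated for Bio::CachePrimarySeq could look roughly like this. It is a hedged Python sketch, not a proposed API. CachedSubseq and the demo CountingBackend are hypothetical names. LRU eviction via OrderedDict is my own choice of policy; the message does not specify one.

```python
from collections import OrderedDict


class CachedSubseq:
    """Hypothetical wrapper in the spirit of the proposed
    Bio::CachePrimarySeq: memoise subseq(start, end) results from a
    slow backend (file- or database-backed) with LRU eviction."""

    def __init__(self, backend, max_entries=128):
        self._backend = backend        # any object with subseq(start, end)
        self._cache = OrderedDict()    # (start, end) -> residues, oldest first
        self._max = max_entries

    def subseq(self, start, end):
        key = (start, end)
        if key in self._cache:
            self._cache.move_to_end(key)      # mark as most recently used
            return self._cache[key]
        residues = self._backend.subseq(start, end)
        self._cache[key] = residues
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)   # evict least recently used
        return residues


class CountingBackend:
    """Demo stand-in backend: counts how often it is actually hit."""

    def __init__(self, residues):
        self.residues = residues
        self.calls = 0

    def subseq(self, start, end):
        self.calls += 1
        return self.residues[start - 1:end]


if __name__ == "__main__":
    backend = CountingBackend("ACGTACGTNNNN")
    cached = CachedSubseq(backend, max_entries=2)
    cached.subseq(1, 4)
    cached.subseq(1, 4)
    print(backend.calls)  # 1: the second call was served from the cache
```

This only reuses exact (start, end) repeats. Caching fixed-size, block-aligned chunks instead would also serve overlapping requests, at the price of reading slightly more from the backend.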
I need to write a SeqIO system for creating these objects and also for
writing out very large FASTA files (it should step through the sequence
one MB at a time using the subseq method, rather than getting the whole
sequence out as a single string). Options:
(a) make a new Bio::SeqIO::bigfasta module, where ->next_seq would
build sequences as LargePrimarySeq objects and ->write_seq would write
them out using this subseq method
(b) parameterise Bio::SeqIO::fasta to handle both of these. (This means
dealing with the boring "don't use $/" stuff, as reading can't slurp
everything between '>' markers into a string: the whole point is not to
have the entire sequence as a string in memory.)
I prefer (a) to (b).
I've got to do this tomorrow, so if people have a view, please make sure
it gets back to me soon.
Of course this is all main trunk stuff, not on the branch.
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------