[Biopython-dev] SeqIO.InterlacedSequenceIterator

Peter Cock p.j.a.cock at googlemail.com
Fri Dec 14 10:58:49 UTC 2012


On Fri, Dec 14, 2012 at 10:07 AM, Lucas Sinclair <lucas.sinclair at me.com> wrote:
> Hello,
>
> Thanks for your response. Yes I looked at Bio.SeqIO.index, it makes
> an index, but it is held in memory. So it must be recomputed every
> time the interpreter is reloaded.

Yes, that is right.

> This step is wasting enough time for me that I would like to compute
> the index on my 50GB file once, and then be done with it. SQLite
> really is the technology of choice for such a problem...

Yes, which is why Bio.SeqIO.index_db() stores the index in SQLite.
The SeqIO chapter in the Tutorial does try to explain this and the
advantages compared to Bio.SeqIO.index(). Have you tried this yet?
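
To illustrate the idea (this is only a rough sketch using the standard
library, not Biopython's code or its actual SQLite schema), a persistent
index amounts to scanning the file once, saving each record's byte
offset in an SQLite file, and reusing that file in later sessions. The
function names, the table layout, and the FASTA-only parsing below are
all assumptions made for the example. With Biopython itself the
equivalent is a single call along the lines of
Bio.SeqIO.index_db(index_filename, filenames, format).

```python
import os
import sqlite3
import tempfile


def build_offset_index(fasta_path, db_path):
    """Scan the FASTA file once; store each record's byte offset in SQLite.

    Hypothetical helper for illustration, not part of Biopython.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS offsets (key TEXT PRIMARY KEY, offset INTEGER)"
    )
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()  # position of the line about to be read
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                parts = line[1:].split(None, 1)
                if not parts:
                    continue  # header with no identifier, skip it
                con.execute(
                    "INSERT OR REPLACE INTO offsets VALUES (?, ?)",
                    (parts[0].decode(), offset),
                )
    con.commit()
    con.close()


def fetch_record(fasta_path, db_path, key):
    """Look up the stored offset, seek to it, and read back one record."""
    con = sqlite3.connect(db_path)
    row = con.execute(
        "SELECT offset FROM offsets WHERE key = ?", (key,)
    ).fetchone()
    con.close()
    if row is None:
        raise KeyError(key)
    with open(fasta_path, "rb") as handle:
        handle.seek(row[0])
        header = handle.readline().decode().strip()
        chunks = []
        for line in handle:
            if line.startswith(b">"):
                break  # start of the next record
            chunks.append(line.decode().strip())
    return header, "".join(chunks)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    fasta = os.path.join(workdir, "example.fasta")
    index = os.path.join(workdir, "example.idx")
    with open(fasta, "w") as out:
        out.write(">seq1 demo\nACGT\nACGT\n>seq2\nTTTT\n")
    build_offset_index(fasta, index)       # done once
    print(fetch_record(fasta, index, "seq2"))  # later sessions reuse the index file
```

The index file is tiny compared to the sequence file, and only the
requested record is read from disk, so a new interpreter session can
start doing lookups without rescanning the whole file.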

> I suppose you agree storing all this sequence information in flat
> ASCII files is not practical.

It may not be optimal, but it is very practical (although less so at
the scale of next-generation sequencing data).

Peter


