[Biopython-dev] Merging Bio.SeqIO SFF support?

Kevin Jacobs <jacobs@bioinformed.com> bioinformed at gmail.com
Tue Mar 2 14:36:32 UTC 2010


On Tue, Mar 2, 2010 at 8:01 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> Kevin wrote:
> > My own bias is to encode the quality scores and flowgrams in numpy
> > arrays rather than lists, however I understand that the goal is to keep
> > the external dependencies to a minimum (although NumPy is required
> > elsewhere).
>
> Yes, I did wonder about using NumPy here but wanted to ensure that
> the core of Biopython remains without an external dependency here.
>

In addition to not creating many little objects, my leanings toward using
NumPy are also due to the generality of tricks like the following to recode
quality scores to Sanger ASCII-33 format:

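    import numpy as np

    # rec is a SeqRecord parsed from an SFF file
    sffqual  = np.array(rec.letter_annotations['phred_quality'], dtype=np.uint8)
    sffqual += 33                  # shift PHRED scores to the Sanger ASCII-33 offset
    sffqual  = sffqual.tostring()  # raw bytes; newer NumPy spells this tobytes()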

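That said, the pure-Python alternatives aren't that slow, and CPython shares
small integers from a pre-allocated pool, so a list of quality scores does not
allocate a new object per value. The many-little-objects cost is therefore not
as big a concern.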
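
For comparison, here is a minimal sketch of one such pure-Python alternative,
with no NumPy dependency. The helper name phred_to_sanger and the 0..93 range
check are my own illustration and are not part of Biopython; the input is
assumed to be a plain list of integer PHRED scores such as the one stored in
rec.letter_annotations['phred_quality'].

```python
def phred_to_sanger(quals):
    """Encode integer PHRED scores as a Sanger FASTQ quality string.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    # Sanger FASTQ uses printable ASCII 33..126, i.e. PHRED scores 0..93.
    if any(q < 0 or q > 93 for q in quals):
        raise ValueError("PHRED scores must be in the range 0..93")
    # bytes() accepts an iterable of small integers, so the offset can be
    # applied in a generator without building an intermediate list.
    return bytes(q + 33 for q in quals).decode("ascii")


example_quals = [0, 10, 20, 30, 40]
print(phred_to_sanger(example_quals))  # prints !+5?I
```

Unlike the in-place uint8 addition above, this version rejects out-of-range
scores instead of silently wrapping around.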

-Kevin


