[Biopython-dev] Creating a NCBIFastaIterator

Fri Oct 7 16:00:52 UTC 2011

On Fri, Oct 7, 2011 at 4:38 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
> Adding my unsolicited opinion here, what do y'all think of this NCBIFasta
> parser being a more general "callback" parser, where a function passed to
> read() or write() translates some arbitrary delimited-text into ...
>
> This would be similar to key_function in SeqIO.to_dict() and would shift the
> responsibility of handling variation in formats to the user. Alternatively,
> a few functions to parse different styles of description lines could be
> included in the module.
>
> Andrew

Hi Andrew,

Interesting idea. Although it doesn't fit that well with the current
(deliberately) simple high-level Bio.SeqIO.parse/read API,
that doesn't mean we can't do it (see Bio.Phylo.parse).

In this case I fail to see what benefit this gives over the current
situation, where users can already do this themselves with the
current FASTA parser,

e.g. with a function and a generator expression:

records = (do_ncbi_my_way(record) for record in SeqIO.parse(filename, "fasta"))

or more simply within a loop:

for record in SeqIO.parse(filename, "fasta"):
    do_ncbi_my_way(record)
    # Do stuff with record

etc.
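
To make that concrete, here is a rough sketch of what such a user-side
function might look like. It is only an illustration: it assumes the
old pipe-delimited NCBI style of identifier, e.g.
gi|6273291|gb|AF191665.1|AF191665, and the choice to keep the accession
as the record id and file the GI number under dbxrefs is just one
possible convention, not something the parser does for you.

```python
def do_ncbi_my_way(record):
    """Tidy a pipe-delimited NCBI-style identifier on a SeqRecord-like object.

    Assumption: identifiers look like gi|<number>|<db>|<accession>|...
    Anything that does not match is returned unchanged.
    """
    fields = record.id.split("|")
    if len(fields) >= 4 and fields[0] == "gi":
        # Keep the GI number as a cross reference (illustrative convention).
        record.dbxrefs.append("GI:" + fields[1])
        # Use the accession (with version) as the record identifier.
        record.id = fields[3]
    return record
```

The function modifies the record in place and also returns it, so it
works both in the generator expression and in the plain loop.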

Maybe it is down to personal preference of coding style?

I would much prefer a new "fasta-ncbi" parser in SeqIO
that handled all the documented NCBI FASTA identifiers.

I'm being negative here, but please don't let that deter you
from posting ideas. This is a public list and we/I welcome
constructive criticism and alternative ideas.

Regards,

Peter