[Biopython-dev] Creating a NCBIFastaIterator

Keith Hughitt keith.hughitt at gmail.com
Tue Oct 4 11:31:51 UTC 2011


Hi all,

I was thinking recently that it would be nice if the FASTA file reader were
able to check for known formats (e.g. NCBI) and then use that information to
choose better values for name, id, etc.

After some discussion with Peter Cock on GitHub, however, he convinced me
that this would be problematic in terms of backwards compatibility, and that
instead a better approach might be to add a new sub-format ("fasta-ncbi") to
the list of supported format readers.

This could go something like:

1. Create a new function in SeqIO.FastaIO for parsing NCBI-formatted FASTA
files. Add it to the mapping of iterators.
2. FastaIO.NCBIFastaIterator will simply call FastaIterator and then modify
the result by assigning a new id, name, etc. (other suggestions?)
3. FastaIO.NCBIFastaWriter (modify and subclass FastaIO.FastaWriter?)
4. Modify code that interacts with NCBI services which return FASTA files
and have it return an NCBIFastaIterator (First use a deprecation/warning to
let users know of the pending change?)
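
To make steps 1 and 2 concrete, here is a rough, standalone sketch of the
title handling I have in mind. It does not use Biopython at all, so it is
not the proposed FastaIO code; the names parse_ncbi_title and
ncbi_fasta_records are hypothetical, and it assumes the pipe-delimited
"gi|number|db|accession|" style of title line. Anything else falls back to
using the first word as both id and name.

```python
from io import StringIO


def parse_ncbi_title(title):
    """Split a FASTA title line (without '>') into (id, name, description).

    Assumes the pipe-delimited NCBI form, e.g.
    'gi|12345|ref|NM_000001.1| example gene'.
    Any other title falls back to the first whitespace-delimited word.
    """
    first_word, _, rest = title.partition(" ")
    fields = first_word.split("|")
    if len(fields) >= 4 and fields[0] == "gi":
        accession = fields[3]            # accession with version
        name = accession.split(".")[0]   # accession without version
        return accession, name, rest.strip()
    # Fallback: id and name are the first word, description is the full title
    return first_word, first_word, title


def ncbi_fasta_records(handle):
    """Yield (id, name, description, sequence) tuples from a FASTA handle."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new title line closes the previous record, if any
            if title is not None:
                yield parse_ncbi_title(title) + ("".join(chunks),)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted
    if title is not None:
        yield parse_ncbi_title(title) + ("".join(chunks),)


if __name__ == "__main__":
    example = StringIO(">gi|12345|ref|NM_000001.1| example gene\nACGT\nTTGA\n")
    for record in ncbi_fasta_records(example):
        print(record)
```

In the real thing the second function would presumably wrap the records
coming out of the existing FASTA parser rather than re-reading the file,
and only the title-splitting part would be new.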

Does this sound like it would be a useful feature? What about the basic
approach outlined above? Any suggestions?

Keith


