[Biopython] Querying NCBI

Fri Oct 23 14:33:35 UTC 2009

On Fri, Oct 23, 2009 at 2:11 PM, Michael S. Koeris
<michael.koeris at gmail.com> wrote:
>
> I am submitting 80 single queries - alternatively I can batch them but then
> when I try to parse them out from the records object I get:
>
>>>> records
> <addinfourl at 24146048 whose fp = <socket._fileobject object at 0x12bc8f0>>

That looks like records is a URL handle object rather than parsed
records - probably you've mixed up your variable names.

> I don't know if this is a different object because it's batched
>
>>>> parser = GenBank.RecordParser()
>>>> recordGenBank = parser.parse(records)
> Traceback (most recent call last):
> ...
> line 762, in parse_footer
>    raise ValueError("Premature end of file in sequence data")
> ValueError: Premature end of file in sequence data

That suggests either a parser bug or, more simply, a network error
that truncated the download.
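
One rough way to tell those two cases apart is to check whether the
downloaded text is complete before parsing it. The sketch below assumes
the text is in GenBank flat-file format, where each record starts with a
LOCUS line and ends with a line containing only "//". The function names
are made up for illustration and are not part of Biopython:

```python
def count_records(text):
    """Return (started, finished) record counts for GenBank flat-file text."""
    started = finished = 0
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            started += 1  # a LOCUS line opens a record
        elif line.rstrip() == "//":
            finished += 1  # a bare "//" line closes a record
    return started, finished


def looks_truncated(text):
    """Heuristic only: True if no record was found or one was left unterminated."""
    started, finished = count_records(text)
    return started == 0 or started != finished
```

If looks_truncated() returns True for the text you downloaded, the
problem is more likely the download than the parser.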

As you are trying to download 80 queries, I would strongly recommend
you download them directly to files, and then parse the files. This also
means you'll only need to do the downloading once as you work on
the rest of the script (whatever you are trying to do with the data).
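
A minimal sketch of that download-once approach, not a definitive
implementation. The assumptions here are mine: that the queries are
nucleotide identifiers fetched with Entrez.efetch in GenBank text format,
the placeholder email address, the ".gbk" file extension, and the
function names (fetch_genbank and download_once are hypothetical):

```python
import os


def fetch_genbank(identifier):
    """Return an open handle for one record (needs Biopython and a network)."""
    from Bio import Entrez  # imported lazily so download_once has no hard dependency

    Entrez.email = "you@example.com"  # placeholder; use your own address
    # db="nucleotide" is an assumption; the original post does not name the database
    return Entrez.efetch(db="nucleotide", id=identifier, rettype="gb", retmode="text")


def download_once(identifiers, out_dir, fetch=fetch_genbank):
    """Save each record to out_dir/<identifier>.gbk, skipping files already present."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for identifier in identifiers:
        path = os.path.join(out_dir, identifier + ".gbk")
        if not os.path.exists(path):  # only hit the network for missing files
            handle = fetch(identifier)
            try:
                data = handle.read()
            finally:
                handle.close()
            with open(path, "w") as out:
                out.write(data)
        paths.append(path)
    return paths
```

Each saved file can then be parsed as often as you like without going
back to NCBI, for example with SeqIO.read(path, "genbank"). If a
download failed part way, delete that file and run the function again.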

Peter