[BioPython] Entrez.efetch
Peter
biopython at maubp.freeserve.co.uk
Wed Oct 8 14:02:54 UTC 2008
On Wed, Oct 8, 2008 at 2:48 PM, Stephan <stephan80 at mac.com> wrote:
>
> Hi guys,
>
> OK, there are two different problems here that Brad and Peter independently
> pointed out to me. Peter, you are right that not closing the file actually
> caused the error. Your hint fixes that, thanks.
Great.
> But that doesn't fix the fact that part of line 3 goes missing in the download,
> and although I actually updated to the newest CVS version of Biopython as
> Brad suggested (sorry for accidentally sending my answer off the mailing list),
> that does not fix that line...
This is the issue where you get different GenBank files using
Bio.Entrez.efetch and a "manual download"? First of all, what did you
mean by "manual download": for example FTP (from what URL), or from a
browser? Secondly, does this difference in the ACCESSION line (line
3) actually have any ill effects?
To be clear, using Bio.Entrez.efetch as in your script, I get this:
LOCUS NC_004353 1351857 bp DNA linear INV 14-MAY-2008
DEFINITION Drosophila melanogaster chromosome 4, complete sequence.
ACCESSION NC_004353
VERSION NC_004353.3 GI:116010290
PROJECT GenomeProject:164
KEYWORDS .
SOURCE Drosophila melanogaster (fruit fly)
ORGANISM Drosophila melanogaster
...
Using FTP from ftp://ftp.ncbi.nih.gov/genomes/Drosophila_melanogaster/CHR_4/NC_004353.gbk
I get something similar but not identical:
LOCUS NC_004353 1351857 bp DNA linear INV 14-MAY-2008
DEFINITION Drosophila melanogaster chromosome 4, complete sequence.
ACCESSION NC_004353
VERSION NC_004353.3 GI:116010290
KEYWORDS .
SOURCE Drosophila melanogaster (fruit fly)
ORGANISM Drosophila melanogaster
...
Notice the FTP file lacks the PROJECT line, and also differs slightly
in its feature table.
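If you want to check this kind of difference yourself, here is a minimal
sketch using Python's standard difflib module on the header lines quoted
above. The variable and function names are just for illustration, and the
lines are hard-coded as they appear in this message; for real files you
would read them from the two downloads instead.

```python
import difflib

# Header lines as quoted in this message (spacing as shown here,
# not the fixed-column layout of the real GenBank files).
efetch_header = [
    "LOCUS NC_004353 1351857 bp DNA linear INV 14-MAY-2008",
    "DEFINITION Drosophila melanogaster chromosome 4, complete sequence.",
    "ACCESSION NC_004353",
    "VERSION NC_004353.3 GI:116010290",
    "PROJECT GenomeProject:164",
    "KEYWORDS .",
    "SOURCE Drosophila melanogaster (fruit fly)",
    "ORGANISM Drosophila melanogaster",
]
ftp_header = [
    "LOCUS NC_004353 1351857 bp DNA linear INV 14-MAY-2008",
    "DEFINITION Drosophila melanogaster chromosome 4, complete sequence.",
    "ACCESSION NC_004353",
    "VERSION NC_004353.3 GI:116010290",
    "KEYWORDS .",
    "SOURCE Drosophila melanogaster (fruit fly)",
    "ORGANISM Drosophila melanogaster",
]


def header_differences(a, b):
    """Return only the lines that ndiff marks as removed ("- ") or added ("+ ")."""
    return [d for d in difflib.ndiff(a, b) if d.startswith(("- ", "+ "))]


differences = header_differences(efetch_header, ftp_header)
for d in differences:
    print(d)
```

A "- " prefix marks a line present only in the first list (the efetch
output here), and "+ " marks one present only in the second (the FTP file).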
Using the NCBI website I suspect you can get other slight variations
(like the different ACCESSION line you reported).
Peter