[Biopython] gbwithparts not working on NCBI RefSeq?
Ivan Erill
ivan.erill at gmail.com
Thu Sep 22 16:04:22 UTC 2016
Hi all,
I am trying to download a full genome record from NCBI Entrez, using
'gbwithparts' to get the full record. However, when I run my code, I get
only the 'header' portion of the record, without either the features or the
sequence at the bottom (even though simple browser access to the record,
without requesting GenBank (full), will at least provide the annotation).
If I try the same with the equivalent GenBank accession for the record, I
get the full record (features and sequence).
This is reproducible at least for several other bacterial genomes.
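The two outcomes described above (header plus CONTIG line only, versus features plus sequence) can be told apart without reading the whole record by eye. The following is a minimal sketch, assuming the record is available as a plain GenBank flat-file string; summarize_genbank_text is a hypothetical helper, not part of Biopython, and the two short records are made-up toy strings used only for illustration.

```python
def summarize_genbank_text(record_text):
    """Report which top-level sections a GenBank flat-file string contains."""
    # Top-level GenBank keywords (LOCUS, FEATURES, ORIGIN, CONTIG, ...) start
    # in the first column; continuation and sequence lines are indented.
    keywords = {
        line.split()[0]
        for line in record_text.splitlines()
        if line and not line[0].isspace()
    }
    return {
        "has_features": "FEATURES" in keywords,
        "has_sequence": "ORIGIN" in keywords,
        "has_contig": "CONTIG" in keywords,
    }


# Toy records (assumptions for illustration, not real NCBI output).
header_only = (
    "LOCUS       NC_016894\n"
    "CONTIG      join(CP002987.1:1..4044777)\n"
    "//\n"
)
full_record = (
    "LOCUS       CP002987\n"
    "FEATURES             Location/Qualifiers\n"
    "ORIGIN\n"
    "        1 acgtacgtac\n"
    "//\n"
)

print(summarize_genbank_text(header_only))
print(summarize_genbank_text(full_record))
```

Run against the string returned by ncbi_handle.read(), a check like this would flag a truncated record by the absence of both FEATURES and ORIGIN.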
I had previously downloaded RefSeq records using the same type of call, so
I was wondering whether this might be related to NCBI transitioning to
HTTPS, the phasing-out of GI numbers, or both. Before pestering the NCBI
staff, however, I thought I would ask whether there have been any changes
to the Biopython parser that might explain the effect.
Here is the code:
#******************************************************************************
# -*- coding: utf-8 -*-
from Bio import Entrez
Entrez.email = "ivan.erill at gmail.com"
# RefSeq accession for Acetobacterium woodii DSM 1030, complete genome
# NC_016894 / 379009891
ncbi_handle = Entrez.efetch(db='nuccore', id='379009891',
                            retmode='gbwithparts', rettype='gb')
ncbi_record = ncbi_handle.read()
print('End of RefSeq retrieved record: ')
print(ncbi_record[-44:])
# this gives me:
# --> End of RefSeq retrieved record:
# --> CONTIG join(CP002987.1:1..4044777)
# --> //
# showing that the record ends with a contig join statement
# using NC_016894 as 'id' gives same behavior
# GenBank accession for Acetobacterium woodii DSM 1030, complete genome
# CP002987 / 375300680
ncbi_handle = Entrez.efetch(db='nuccore', id='375300680',
                            retmode='gbwithparts', rettype='gb')
ncbi_record = ncbi_handle.read()
print('End of GenBank retrieved record: ')
print(ncbi_record[-77:])
# this gives me:
# --> End of GenBank retrieved record:
# --> 4044721 ttttacctgg taatgttttt ttatattatc aacatttatt cttataaatt acttgat
# --> //
# showing that the record ends with the complete sequence
# using CP002987 as 'id' gives same behavior
#******************************************************************************
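The calls above pass 'gbwithparts' as retmode and 'gb' as rettype. The NCBI E-utilities documentation lists 'gbwithparts' as a rettype value, paired with retmode 'text', so one variation worth comparing is the swapped assignment. The sketch below only builds the request URL so that it can be inspected or pasted into a browser; it does not contact NCBI. build_efetch_url is a hypothetical helper, not part of Biopython, and whether the swapped parameters change the result for RefSeq records is not verified here.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint (HTTPS).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, rettype="gbwithparts", retmode="text"):
    """Build an efetch URL for a nuccore record without sending a request."""
    # A list of pairs keeps the query parameters in a fixed, readable order.
    params = [
        ("db", "nuccore"),
        ("id", accession),
        ("rettype", rettype),
        ("retmode", retmode),
    ]
    return EFETCH_URL + "?" + urlencode(params)


# Accession-based request for the RefSeq record discussed above.
refseq_url = build_efetch_url("NC_016894")
print(refseq_url)
```

The equivalent Biopython call would pass the same values as keyword arguments, i.e. rettype='gbwithparts' and retmode='text', to Entrez.efetch.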
Any insights will be greatly appreciated. Thanks,
Ivan