[Biopython] gbwithparts not working on NCBI RefSeq?

Thu Sep 22 16:47:09 UTC 2016

Hi Ivan,

I think you need to be using:

..., rettype="gbwithparts", retmode="text", ...

not:

..., retmode="gbwithparts", rettype="gb", ...

See also https://www.biostars.org/p/79436/
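
For reference, here is a minimal sketch of the corrected call written out
in full. It assumes Python 3 and an installed Biopython. The helper names
efetch_params and fetch_full_genbank are made up for this example and are
not part of Biopython.

```python
def efetch_params(accession):
    """Build efetch keyword arguments for a full GenBank record.

    rettype selects the record format, retmode selects the encoding.
    """
    return {
        "db": "nuccore",
        "id": accession,
        "rettype": "gbwithparts",  # GenBank format including features and sequence
        "retmode": "text",
    }


def fetch_full_genbank(accession, email):
    """Download one record as text (needs network access and Biopython)."""
    # Imported here so efetch_params() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(**efetch_params(accession))
    try:
        return handle.read()
    finally:
        handle.close()
```

With these arguments, something like fetch_full_genbank("NC_016894",
"you@example.org") should return the record with its features and
sequence rather than stopping at the CONTIG line.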

Sadly the NCBI Entrez API does not always give helpful
error messages for things like this :(
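
Since there is no error to catch, one option is to check the returned
text yourself. The sketch below relies on the GenBank flat-file convention
that the sequence block starts with a line beginning "ORIGIN". The
function name has_sequence is hypothetical, not a Biopython function.

```python
def has_sequence(genbank_text):
    """Return True if a GenBank flat-file record contains an ORIGIN block."""
    return any(line.startswith("ORIGIN") for line in genbank_text.splitlines())


# A record that stops at the CONTIG line, like the RefSeq result in this thread.
truncated = "LOCUS       NC_016894\nCONTIG      join(CP002987.1:1..4044777)\n//\n"

# A record tail that includes sequence data.
complete = "ORIGIN\n  4044721 ttttacctgg taatgttttt\n//\n"

print(has_sequence(truncated))
print(has_sequence(complete))
```

A check like this only tells you that sequence lines are present, not
that the whole record is valid.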

Peter

On Thu, Sep 22, 2016 at 5:04 PM, Ivan Erill <ivan.erill at gmail.com> wrote:
> Hi all,
>
> I am trying to download a full genome record from NCBI Entrez, using
> 'gbwithparts' to get the full record. However, when I run my code, I get
> only the 'header' portion of the record, without either the features or the
> sequence at the bottom (even though a simple browser access to the record,
> without requesting GenBank (full), will at least provide the annotation).
>
> If I try the same with the equivalent GenBank accession for the record, I
> get the full record (features and sequence).
>
> This is reproducible at least for several other bacterial genomes.
>
> I had previously downloaded RefSeq records using the same type of call, so I
> was wondering whether this might be related to NCBI transitioning to HTTPS,
> the phasing-out of GI numbers, or both. Before pestering the NCBI staff,
> however, I thought I would ask whether there have been any changes to the
> BioPython parser that might explain the effect.
>
> Here is the code:
>
> #******************************************************************************
> # -*- coding: utf-8 -*-
> from Bio import Entrez
> Entrez.email ="ivan.erill at gmail.com"
>
> #RefSeq accession for Acetobacterium woodii DSM 1030, complete genome
> #NC_016894 / 379009891
> ncbi_handle = Entrez.efetch(db='nuccore', id='379009891',
>                             retmode='gbwithparts', rettype='gb')
> ncbi_record = ncbi_handle.read()
> print 'End of RefSeq retrieved record: '
> print ncbi_record[-44:]
> #this gives me:
> #--> End of RefSeq retrieved record:
> #--> CONTIG      join(CP002987.1:1..4044777)
> #--> //
> #showing that the record ends with a contig join statement
> #using NC_016894 as 'id' gives same behavior
>
> #GenBank accession for Acetobacterium woodii DSM 1030, complete genome
> #CP002987 / 375300680
> ncbi_handle = Entrez.efetch(db='nuccore', id='375300680',
>                             retmode='gbwithparts', rettype='gb')
> ncbi_record = ncbi_handle.read()
> print 'End of RefSeq retrieved record: '
> print ncbi_record[-77:]
> #this gives me:
> #--> End of RefSeq retrieved record:
> #-->   4044721 ttttacctgg taatgttttt ttatattatc aacatttatt cttataaatt acttgat
> #--> //
> #showing that the record ends with the complete sequence
> #using CP002987 as 'id' gives same behavior
> #******************************************************************************
>
>
> Any insights will be greatly appreciated. Thanks,
>
> Ivan