[Biopython] gbwithparts not working on NCBI RefSeq?
Ivan Erill
ivan.erill at gmail.com
Thu Sep 22 17:21:35 UTC 2016
Hi Peter,
Indeed... Works perfectly. I had tried <<retmode="gbwithparts",
rettype="text">> and was wondering why I got unparsed results... Typing
fast makes for bad coding! ;-)
Thanks a ton.
Ivan
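
For anyone landing on this thread later, the fix Peter describes below can be sketched roughly as follows. This is only a minimal sketch, not code from the original messages. The helper names efetch_params, fetch_full_genbank and looks_complete are hypothetical, the e-mail address in the usage note is a placeholder, and the ORIGIN check is just a heuristic based on the GenBank flat-file layout. It assumes Biopython is installed and relies only on Bio.Entrez.efetch.

```python
def efetch_params(identifier, rettype="gbwithparts", retmode="text", db="nuccore"):
    """Build keyword arguments for Bio.Entrez.efetch (hypothetical helper).

    rettype names the record format (e.g. 'gb' or 'gbwithparts');
    retmode names how the response is encoded (e.g. 'text' or 'xml').
    """
    # Guard against the swap discussed in this thread: a record format
    # passed as retmode, which NCBI does not report as an error.
    if retmode in ("gb", "gbwithparts"):
        raise ValueError(
            "retmode=%r looks like a rettype; did you swap the two?" % retmode
        )
    return {"db": db, "id": identifier, "rettype": rettype, "retmode": retmode}


def looks_complete(genbank_text):
    """Heuristic: a flat file that carries sequence data has an ORIGIN line."""
    return any(line.startswith("ORIGIN") for line in genbank_text.splitlines())


def fetch_full_genbank(identifier, email):
    """Fetch a GenBank record as text, including features and sequence."""
    # Imported here so the helpers above can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(**efetch_params(identifier))
    try:
        return handle.read()
    finally:
        handle.close()
```

Calling fetch_full_genbank("NC_016894", "you@example.org") and passing the result to looks_complete should then show whether the sequence was returned. The header-only output reported in the original post ends in a CONTIG line instead, so the same check would return False for it.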
On Thu, Sep 22, 2016 at 12:47 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi Ivan,
>
> I think you need to be using:
>
> ..., rettype="gbwithparts", retmode="text", ...
>
> not:
>
> ..., retmode="gbwithparts", rettype="gb", ...
>
> See also https://www.biostars.org/p/79436/
>
> Sadly the NCBI Entrez API does not always give helpful
> error messages for things like this :(
>
> Peter
>
> On Thu, Sep 22, 2016 at 5:04 PM, Ivan Erill <ivan.erill at gmail.com> wrote:
> > Hi all,
> >
> > I am trying to download a full genome record from NCBI Entrez, using
> > 'gbwithparts' to get the full record. However, when I run my code, I get
> > only the 'header' portion of the record, without either the features or
> > the sequence at the bottom, even though simple browser access to the
> > record (without requesting GenBank (full)) will at least provide the
> > annotation.
> >
> > If I try the same with the equivalent GenBank accession for the record, I
> > get the full record (features and sequence).
> >
> > This is reproducible at least for several other bacterial genomes.
> >
> > I had previously downloaded RefSeq records using the same type of call,
> > so I was wondering whether this might be related to NCBI transitioning
> > to HTTPS, the phasing-out of GI numbers, or both. Before pestering the
> > NCBI staff, however, I thought I would ask whether there have been any
> > changes to the Biopython parser that might explain the effect.
> >
> > Here is the code:
> >
> > #******************************************************************************
> > # -*- coding: utf-8 -*-
> > from Bio import Entrez
> > Entrez.email ="ivan.erill at gmail.com"
> >
> > #RefSeq accession for Acetobacterium woodii DSM 1030, complete genome
> > #NC_016894 / 379009891
> > ncbi_handle = Entrez.efetch(db='nuccore', id='379009891',
> >                             retmode='gbwithparts', rettype='gb')
> > ncbi_record = ncbi_handle.read()
> > print 'End of RefSeq retrieved record: '
> > print ncbi_record[-44:]
> > #this gives me:
> > #--> End of RefSeq retrieved record:
> > #--> CONTIG join(CP002987.1:1..4044777)
> > #--> //
> > #showing that the record ends with a contig join statement
> > #using NC_016894 as 'id' gives same behavior
> >
> > #GenBank accession for Acetobacterium woodii DSM 1030, complete genome
> > #CP002987 / 375300680
> > ncbi_handle = Entrez.efetch(db='nuccore', id='375300680',
> >                             retmode='gbwithparts', rettype='gb')
> > ncbi_record = ncbi_handle.read()
> > print 'End of RefSeq retrieved record: '
> > print ncbi_record[-77:]
> > #this gives me:
> > #--> End of RefSeq retrieved record:
> > #--> 4044721 ttttacctgg taatgttttt ttatattatc aacatttatt cttataaatt acttgat
> > #--> //
> > #showing that the record ends with the complete sequence
> > #using CP002987 as 'id' gives same behavior
> > #******************************************************************************
> >
> >
> > Any insights will be greatly appreciated. Thanks,
> >
> > Ivan