[Biopython] Iterate Entrez.esearch in Biopython
Jarrod Scott
jjscott at uwalumni.com
Tue Dec 8 13:33:38 UTC 2015
Hello Peter,
Fantastic! Here is the modified code based on your recommendation with some
example genes/taxa. Seems to work exactly as it should. The nested for loop
was the key. Thanks so much for the assistance.
Jarrod
import sys
import time
from Bio import Entrez

Entrez.email = "jjscott at uwalumni.com"

list_of_species = ["Acrochaetium daviesii", "Anadyomene stellata",
                   "Codium decorticatum"]
list_of_genes = ["rbcL", "rps3", "tufA", "28S rRNA"]

for species in list_of_species:
    for gene in list_of_genes:
        # Quote the full species name so it is searched as one phrase
        terms = '"{0}"[orgn] AND {1}[Gene]'.format(species, gene)
        result = Entrez.esearch(db='nucleotide', term=terms)
        record = Entrez.read(result)
        result.close()
        # Print the results so they are visible when run as a script
        print(terms)
        print(record["Count"])
        print(record["IdList"])
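
A minimal sketch of Peter's suggestion (quoted below) to record the counts in a dictionary and write them to a file. The helper names `build_term`, `entrez_count`, `collect_counts` and `write_counts`, the `search` parameter and the tab-separated layout are all hypothetical choices for illustration, not part of Biopython. The only Biopython calls assumed are `Entrez.esearch` and `Entrez.read`, used the same way as in the code above.

```python
def build_term(species, gene):
    """Build the query string, quoting the full species name."""
    return '"{0}"[orgn] AND {1}[Gene]'.format(species, gene)


def entrez_count(term):
    """Run one esearch and return the hit count (needs network access).

    Assumes Entrez.email has already been set by the caller.
    """
    from Bio import Entrez  # imported here so the other helpers work offline
    handle = Entrez.esearch(db="nucleotide", term=term)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])


def collect_counts(species_list, gene_list, search=entrez_count):
    """Return {(species, gene): count} using two nested for loops."""
    counts = {}
    for species in species_list:
        for gene in gene_list:
            counts[(species, gene)] = search(build_term(species, gene))
    return counts


def write_counts(counts, path):
    """Write one tab-separated 'species, gene, count' row per pair."""
    with open(path, "w") as out:
        out.write("species\tgene\tcount\n")
        for (species, gene), count in sorted(counts.items()):
            out.write("{0}\t{1}\t{2}\n".format(species, gene, count))
```

With network access and `Entrez.email` set, `counts = collect_counts(list_of_species, list_of_genes)` followed by `write_counts(counts, "counts.tsv")` would produce one row per species/gene pair. Passing a different `search` callable makes it possible to try the loop without contacting NCBI. Note that `record["IdList"]` holds at most `retmax` IDs (20 by default), so `Count` can be larger than the length of the ID list.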
On Tue, Dec 8, 2015 at 4:36 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi Jarrod
>
> The simplest solution is two for loops, nested. i.e.
>
> # do imports
> # load lists, or set them like this
> list_of_species = ["E. coli", "H. sapiens", "M. tardes"]
> list_of_genes = ["yyy", "zzz", "aaa"]
> for species in list_of_species:
>     for gene in list_of_genes:
>         # do the search for this species, gene combination
>
> Depending on what you want to do with the results, you
> might record the counts in a dictionary in memory, or
> maybe write them to a file.
>
> Is that enough to make progress or do you need a bit more
> guidance?
>
> Also, when you build the Entrez query string, the species
> name should (I think) be quoted as the full name rather
> than with an abbreviated genus, e.g.
>
> "Homo sapiens"[ORGN] AND yyy[GENE]
>
> To do that in python, the easiest way to get the double
> quotes is to use single quotes for the Python string,
>
> '"{0}"[ORGN] AND {1}[GENE]'.format(species, gene)
>
> That is: single quote, double quote, open braces, zero, ...
>
> Peter
>
>
> On Tue, Dec 8, 2015 at 3:04 AM, Jarrod Scott <jjscott at uwalumni.com> wrote:
> > Greetings all.
> >
> > I have 1) a list of species and 2) a list of genes. I am trying to use
> > Entrez.esearch within Biopython to get a list of accession numbers from
> > NCBI for each gene from each species. I wrote a short script that can do
> > it for one gene and one species but have been unsuccessful at writing
> > one that iterates through the lists. Here is an example of the code that
> > works. This returns '11' hits, which matches a simple GenBank search.
> > Any help on how to iterate through two lists would be most appreciated.
> >
> > Jarrod
> >
> > import sys
> > import time
> > from Bio import Entrez
> > Entrez.email = "jjscott at uwalumni.com"
> > gene = 'tufA'
> > species = 'Codium decorticatum'
> > terms = "{0}[orgn] AND {1}[Gene]".format(species, gene)
> > handle = Entrez.esearch(db = "nucleotide", term = terms)
> > record = Entrez.read(handle)
> > record["Count"]
> > record["IdList"]
> >
> >
> >
> > Example files:
> >
> > Species:
> >
> > E. coli
> > H. Sapien
> > M. tardes
> >
> > Genes:
> > yyy
> > zzz
> > aaa
> >
> >
> > biopython-1.66
> > Python 2.7.9 :: Anaconda 2.2.0 (x86_64)
> > OS X Yosemite 10.10.2
> >
> > _______________________________________________
> > Biopython mailing list - Biopython at mailman.open-bio.org
> > http://mailman.open-bio.org/mailman/listinfo/biopython
>
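
The quoting advice in the thread above can be checked offline by printing the query strings before sending them. The sketch below is illustrative only: `quote_if_needed` and `make_term` are hypothetical helpers, not Biopython functions. Applying the same quoting to multi-word gene names such as "28S rRNA" is an assumption that goes beyond what Peter's message states.

```python
def quote_if_needed(value):
    """Wrap a value in double quotes when it contains more than one word."""
    value = value.strip()
    already_quoted = value.startswith('"') and value.endswith('"')
    if len(value.split()) > 1 and not already_quoted:
        return '"{0}"'.format(value)
    return value


def make_term(species, gene):
    """Build a query such as "Homo sapiens"[ORGN] AND yyy[GENE]."""
    return "{0}[ORGN] AND {1}[GENE]".format(
        quote_if_needed(species), quote_if_needed(gene)
    )


print(make_term("Homo sapiens", "yyy"))
# "Homo sapiens"[ORGN] AND yyy[GENE]
print(make_term("Codium decorticatum", "28S rRNA"))
# "Codium decorticatum"[ORGN] AND "28S rRNA"[GENE]
```

Printing the terms first makes it easy to paste one into the NCBI web search and compare the hit count with what `Entrez.esearch` returns.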