[BioPython] Querying Entrez Gene
Palle Villesen
palle at birc.au.dk
Thu Oct 12 06:59:26 UTC 2006
Luca Beltrame wrote:
> Hello.
> I'm currently in need of querying the Entrez Gene database using a list of IDs
> I have. After searching in the Biopython documentation, I have found no
> indication of whether that is possible or not.
> Is there a way to query NCBI's Entrez Gene database?
> Thanks in advance.
EUtils is also part of Biopython; the Biopython tutorial explains how
to use it. Below is my own small "mass downloader" utility in Python
(it runs on a non-administrator install of both Python and Biopython).
The basic module you need is HistoryClient, which can search for and
retrieve large sets in batches instead of looping through all your IDs
one at a time. Anyway, check the tutorial; it's quite good (at least
for a person with the same very basic Python knowledge as me).
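For a plain list of IDs, the same idea of fetching in groups rather than one at a time can be sketched in a few lines of modern Python. This is only an illustration and not part of gb_search.py: `batch_ids` and the sample IDs are made-up names, and the assumption is that each comma-separated group would be handed to a single request.

```python
def batch_ids(ids, size=20):
    """Yield comma-separated groups of at most `size` IDs (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    ids = [str(i) for i in ids]  # accept ints or strings
    for start in range(0, len(ids), size):
        yield ",".join(ids[start:start + size])


if __name__ == "__main__":
    # Sample IDs are placeholders; substitute your own list.
    for group in batch_ids([672, 675, 7157, 1956, 3845], size=2):
        print(group)
```

Each yielded string covers one batch, so five IDs with a batch size of two need three requests instead of five.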
Sincerely,
Palle Villesen, BiRC, DK
Program: gb_search.py
-------------------
#!/web/biopv/usr/local/bin/python
import sys
import time
biopython_path='/web/biopv/usr/local/lib/python'
sys.path.insert(0,biopython_path)
def help():
    # Print usage, including the database names EUtils knows about, then exit.
    from Bio.EUtils import Config
    dbs = " ".join(Config.databases.keys())
    usage = """
GenBank retrieve tool.
Usage:
gb_search.py QUERY [RECS] [DB] [FORMAT] [SLEEP]
QUERY  : the Entrez query enclosed in " "
RECS   : Number of records/sequences to get at a time (default=20)
DB     : Database (default='nucleotide')
         (%s)
FORMAT : Record format (default='fasta', but 'docsum', 'brief', 'gi'
         and many others are available)
SLEEP  : Seconds to wait between batches (default=3)
""" % dbs
    sys.exit(usage)
# Default values
step=20
database="nucleotide"
format="fasta"
time2sleep=3
if len(sys.argv) == 1:
    help()
search_term=sys.argv[1]
if len(sys.argv)>2 : step=int(sys.argv[2])
if len(sys.argv)>3 : database=sys.argv[3]
if len(sys.argv)>4 : format=sys.argv[4]
if len(sys.argv)>5 : time2sleep=int(sys.argv[5])
from Bio.EUtils import HistoryClient
s = HistoryClient.HistoryClient().search(search_term,db=database)
print >>sys.stderr, "Getting %s seqs, %s sequences at a time" % (len(s), step)
i=0
while i < len(s):
    print >>sys.stderr, "Getting sequences from ", i, "to", min(i+step, len(s)),
    print s[i:i+step].efetch(retmode = "text", rettype = format).read()
    if i+step >= len(s):
        # Last batch fetched: stop without sleeping.
        print >>sys.stderr, "..done"
        break
    print >>sys.stderr, "...done (sleeping %s seconds)" % time2sleep
    i += step
    time.sleep(time2sleep)
-------------------------------------
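The download loop above boils down to walking over the result set in windows of `step` records and pausing between requests. A minimal sketch of just that control flow in modern Python follows; `fetch_in_batches` and its `fetch` and `sleep` parameters are hypothetical names for illustration, and the actual download is left to whatever callable is passed in.

```python
import time


def fetch_in_batches(total, step, fetch, pause=3, sleep=time.sleep):
    """Call fetch(start, stop) for each window of `step` records.

    `fetch` is a hypothetical callable standing in for the real download;
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    results = []
    start = 0
    while start < total:
        stop = min(start + step, total)
        results.append(fetch(start, stop))
        if stop < total:
            sleep(pause)  # pause only between batches, not after the last one
        start = stop
    return results


if __name__ == "__main__":
    # A stand-in fetch that just reports the window it was asked for.
    windows = fetch_in_batches(45, 20, lambda a, b: (a, b), sleep=lambda s: None)
    print(windows)
```

Passing the sleep function in as a parameter keeps the pause between requests while letting the loop be checked quickly with a no-op.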
--
Palle Villesen, Ph.D.
BiRC, Build. 090, University of Aarhus
DK - 8000 Aarhus C, Denmark
palle.retrosearch.dk - +45 61708600