[Biopython] error with entrez id code
Dilara Ally
dilara.ally at gmail.com
Wed Oct 5 23:21:29 UTC 2011
Hi All
I've written a program to look up Entrez gene IDs from the output of a
blastall run that I performed. The code is as follows:
from Bio import SeqIO
from Bio import Entrez
from Bio import SeqFeature
import os
import os.path
import re
import csv

dirname1="/Users/dally/Desktop/BlastFiles/annotate_me/"
dirname2="/Users/dally/Desktop/BlastFiles/annotated/"
allfiles=os.listdir(dirname1)
fanddir=[os.path.join(dirname1,fname) for fname in allfiles]
OutFileName="Contig_annotation.csv"
c=csv.writer(open(os.path.join(dirname2,OutFileName),"wb"))
Entrez.email = "dally at projects.sdsu.edu"

for f in fanddir:
    print f
    InFile=open(f,'rU')
    LineNumber=0
    for Line in InFile:
        print LineNumber#, ':', Line
        # Column 2 of each tab-separated BLAST line holds the subject id
        ElementList=Line.split('\t')
        geneid=ElementList[1]
        #print geneid
        # The fourth '|'-separated field is the versioned accession
        Sections=geneid.split('|')
        NewID=Sections[3]
        # rettype="gb" requests GenBank format; retmode="xml" would request XML
        handle=Entrez.efetch(db="nucleotide", id=NewID,rettype="gb")
        record=SeqIO.read(handle,"genbank")
        handle.close()
        #print record.id
        lineage=record.annotations["taxonomy"]
        c.writerow([ElementList[0],ElementList[1],ElementList[2],
                    ElementList[3],ElementList[4],ElementList[5],
                    ElementList[6],ElementList[7],ElementList[8],
                    ElementList[9],ElementList[10], NewID, record.id,
                    record.description, record.annotations["source"],
                    lineage[0], lineage[1],lineage[2],
                    record.annotations["keywords"], ])
        LineNumber=LineNumber+1
    InFile.close()
The gene identifier looks like this: gi|2252639|gb|AC002292.1|AC002292.
But I'm only interested in the fourth component (AC002292.1). The script
runs through a file with approximately 8000-10000 identifiers and
extracts information from the GenBank record associated with each one.
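
As a minimal sketch of that parsing step in isolation: the helper name
extract_accession is hypothetical (it is not part of the script above), and
it assumes the second tab-separated column always follows the
gi|number|db|accession|locus layout. The contig name and score in the sample
line are made up for illustration.

```python
def extract_accession(blast_line):
    """Return the versioned accession from one tab-separated BLAST hit line."""
    columns = blast_line.rstrip("\n").split("\t")
    subject_id = columns[1]  # e.g. gi|2252639|gb|AC002292.1|AC002292
    sections = subject_id.split("|")
    if len(sections) < 4:
        # Fail loudly rather than query Entrez with a malformed id
        raise ValueError("unexpected subject id: %r" % subject_id)
    return sections[3]


# Made-up sample line; only the identifier comes from the text above
example = "Contig1\tgi|2252639|gb|AC002292.1|AC002292\t98.5"
print(extract_accession(example))  # prints AC002292.1
```

Checking the number of fields before indexing means a malformed line is
reported with its own message instead of a bare IndexError.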
The code seemed to run fine on my first file for the first 1287 lines,
but then I got this error:
> Traceback (most recent call last):
> File "Ally_EntrezID_Search_Final_Script.py", line 38, in <module>
> record=SeqIO.read(handle,"genbank")
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.py",
> line 604, in read
> first = iterator.next()
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.py",
> line 532, in parse
> for r in i:
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/GenBank/Scanner.py",
> line 440, in parse_records
> record = self.parse(handle, do_features)
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/GenBank/Scanner.py",
> line 423, in parse
> if self.feed(handle, consumer, do_features):
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/GenBank/Scanner.py",
> line 400, in feed
> misc_lines, sequence_string = self.parse_footer()
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/GenBank/Scanner.py",
> line 921, in parse_footer
> line = self.handle.readline()
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py",
> line 447, in readline
> data = self._sock.recv(self._rbufsize)
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py",
> line 533, in read
> return self._read_chunked(amt)
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py",
> line 586, in _read_chunked
> value.append(self._safe_read(amt))
> File
> "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py",
> line 637, in _safe_read
> raise IncompleteRead(''.join(s), amt)
> httplib.IncompleteRead: IncompleteRead(707 bytes read, 3147 more expected)
I'm new to Python and Biopython programming, so any advice would be
greatly appreciated.
Thanks.
Dilara