[Biopython] Problems with reading Swiss format records (swissprot specific date fields)
Jan T Kim
jttkim at googlemail.com
Mon Mar 4 15:40:07 UTC 2013
Dear All,
trying to parse the attached Swissprot record gives me a stack trace:
Traceback (most recent call last):
  File "./swisstest", line 7, in <module>
    e = Bio.SeqIO.read(sys.argv[1], 'swiss')
  File "/usr/lib/pymodules/python2.7/Bio/SeqIO/__init__.py", line 599, in read
    first = iterator.next()
  File "/usr/lib/pymodules/python2.7/Bio/SeqIO/__init__.py", line 537, in parse
    for r in i:
  File "/usr/lib/pymodules/python2.7/Bio/SeqIO/SwissIO.py", line 97, in SwissIterator
    annotations['date'] = swiss_record.created[0]
TypeError: 'NoneType' object has no attribute '__getitem__'
The problem is at line 99 (rather than 97) of
https://github.com/biopython/biopython/blob/master/Bio/SeqIO/SwissIO.py :
annotations['date'] = swiss_record.created[0]
without an "if swiss_record.created is not None" test or something
similar. The parse function of Bio.SwissProt initialises the created
instance variable to None, and only if a "DT" record containing the
string "INTEGRATED" (case insensitive) is found, created is set to that
date.
The same kind of problem occurs with the sequence_update variable in the
next statement:
annotations['date_last_sequence_update'] = swiss_record.sequence_update[0]
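To illustrate the behaviour described above, here is a simplified sketch of the DT handling. It is not the actual Bio.SwissProt code: the function name scan_created is hypothetical, the tuple layout (date first) is only inferred from the created[0] indexing in SwissIO.py, and the second DT line is a made-up example. With the DT line from the attached record, no "INTEGRATED" string is seen, so the result stays None.

```python
def scan_created(lines):
    """Return (date, remainder) for a DT line mentioning INTEGRATED, else None.

    Simplified illustration only; not the parser used by Bio.SwissProt.
    """
    for line in lines:
        if not line.startswith("DT"):
            continue
        value = line[2:].strip()
        # Case-insensitive check, as described for the real parser.
        if "INTEGRATED" in value.upper():
            date, _, remainder = value.partition(",")
            return (date.strip(), remainder.strip())
    # No matching DT line: the value keeps its initial None.
    return None


# DT line taken from the attached record (generated by EMBOSS seqret).
emboss_dt = ["DT   27-JUN-2012, entry version 1."]
# Made-up DT line in the style of a genuine SwissProt entry.
uniprot_style_dt = ["DT   01-JAN-2000, integrated into UniProtKB/Swiss-Prot."]

print(scan_created(emboss_dt))
print(scan_created(uniprot_style_dt))
```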
Would it be sensible to set the 'date' and 'date_last_sequence_update'
entries of the annotations dictionary only if the values are actually
found in the swiss_record? I understand that with a genuine SwissProt
record, they should always be there, but this happened to me when working
on files generated from the refseq protein database using the EMBOSS
seqret program with -osformat=swiss, which doesn't seem like an entirely
exotic use case to me.
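
A minimal sketch of the kind of guard I have in mind, written against a stand-in object rather than a real Bio.SwissProt record so that it runs without Biopython. The helper name date_annotations and the SimpleNamespace stand-ins are assumptions for illustration only; the attribute names and dictionary keys are the ones used in SwissIO.py.

```python
from types import SimpleNamespace


def date_annotations(swiss_record):
    """Build the date annotations, skipping fields that were never set."""
    annotations = {}
    # created stays None unless a suitable DT line was found.
    if swiss_record.created is not None:
        annotations['date'] = swiss_record.created[0]
    # sequence_update has the same problem, so it gets the same guard.
    if swiss_record.sequence_update is not None:
        annotations['date_last_sequence_update'] = swiss_record.sequence_update[0]
    return annotations


# Stand-in for a record parsed from the EMBOSS-generated file: no dates found.
incomplete = SimpleNamespace(created=None, sequence_update=None)
# Stand-in for a record with both dates present (values are illustrative).
complete = SimpleNamespace(created=('27-JUN-2012', 1),
                           sequence_update=('27-JUN-2012', 1))

print(date_annotations(incomplete))
print(date_annotations(complete))
```

With this approach the missing dates are simply absent from the annotations dictionary instead of raising a TypeError.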
Best regards, Jan
--
+- Jan T. Kim -------------------------------------------------------+
| email: jttkim at gmail.com |
| WWW: http://www.jtkim.dreamhosters.com/ |
*-----=< hierarchical systems are for files, not for humans >=-----*
-------------- next part --------------
ID ZP_10312765 Reviewed; 498 AA.
AC ZP_10312765;
DT 27-JUN-2012, entry version 1.
DE hypothetical protein FraQA3DRAFT_6339 [Frankia sp. QA3].
OS Frankia sp. QA3.
RN [1]
RP 1-498
RN [2]
RP 1-498
KW .
FT REGION 1 498 Frankia sp. QA3. QA3. taxon:710111.
FT REGION 1 498 hypothetical protein. 53620.
FT REGION 1 498 FraQA3DRAFT_6339.
FT complement(NZ_CM001489.1:7362098..7363594
FT ). 11.
SQ SEQUENCE 498 AA; 53751 MW; 39E328894991F8AC CRC64;
mhphrvhpsr vhpspehpsp ehlsrehqsr prhataaara arsrpprphr agrrarrddr
crqrsqraac lpggcpttcr dgrraptdrg hgshapgrgp taavpdlavp agcagpgrgg
vgarhrrpaa artapgsqpt aaarrstags rvprgpgrrr sattrrgrrr prdalaarpa
pvrvsvhgps grgpgrarrr pcrirgrchh dapggratap avggaprlvh rcggrrwqra
rpgrggrdgp amptprssvp epgppgprhp rgpsrrpahp hwnptlggrr wpgvhrrdgr
hgahrrrtip rpagrptrgr sgphrpapvr paagrhagng rcrpdhgrir rqppdagpas
rsahthrgsr rlrrpggrps grrsdartgl arrsaagadq twpaprrwrh rrtnhrgrgs
apgrhrsaap ptvpvphpar srpphdhgsg hprthrpgpt ghhaggrrpa rapghaagag
rrrtapmrra rslclpsp
//