[Biopython-dev] Problem with SeqIO uniprot-xml on older XML files?
Peter Cock
p.j.a.cock at googlemail.com
Fri Sep 27 15:47:11 UTC 2013

Hi all,
There seems to be a problem parsing older UniProt XML files;
see http://seqanswers.com/forums/showthread.php?t=33921
Could anyone have a look at this? Somehow the start/end
of each record does not seem to be recognised here:
>>> from Bio import SeqIO
>>> r = next(SeqIO.parse("uniref90.xml", "uniprot-xml"))
(takes ages, presumably scanning whole file)
Note the indexing code also breaks:
>>> from Bio import SeqIO
>>> d = SeqIO.index("uniref90.xml", "uniprot-xml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pc40583/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 808, in index
    key_function, repr, "SeqRecord")
  File "/home/pc40583/lib/python2.7/site-packages/Bio/File.py", line 250, in __init__
    for key, offset, length in offset_iter:
  File "/home/pc40583/lib/python2.7/site-packages/Bio/SeqIO/_index.py", line 401, in __iter__
    % (start_offset, end_offset))
ValueError: Did not find <accession> line in bytes 283 to 38649
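For anyone picking this up, one quick check might be to list the first few element tags in the file and compare them with a file that parses correctly. This is only a rough sketch using the standard library: the helper name first_tags and the in-memory sample are made up for illustration, and it assumes (without confirmation) that the failure is related to the element names or namespace differing from what the parser expects.

```python
import io
import xml.etree.ElementTree as ET


def first_tags(handle, limit=5):
    """Return up to `limit` element tags, in document order, from an XML stream.

    Hypothetical helper for illustration only. It stops early, so it
    should not need to parse the whole of a large file.
    """
    tags = []
    for _event, elem in ET.iterparse(handle, events=("start",)):
        # ElementTree reports namespaced tags as "{namespace}localname"
        tags.append(elem.tag)
        if len(tags) >= limit:
            break
    return tags


# Tiny made-up stand-in for a real file; for a real check, open the
# XML file in binary mode and pass the handle instead.
sample = io.BytesIO(
    b'<uniprot xmlns="http://uniprot.org/uniprot">'
    b"<entry><accession>P12345</accession></entry>"
    b"</uniprot>"
)
print(first_tags(sample))
```

If the root element or namespace printed for uniref90.xml differs from that of a file which parses fine, that would at least narrow down where the record boundaries are being missed.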
Thanks,
Peter
    
    