[Biopython-dev] SwissProt fails to parse the current uniprot_sprot data file?

Peter Cock p.j.a.cock at googlemail.com
Tue Oct 21 06:16:29 UTC 2014


Hi Jinghua,

Yes, this problem has already been reported but not fixed yet:
https://github.com/biopython/biopython/issues/369

It shouldn't be too complicated to modify the code to
cope with both the old and new style lines - do you want
to try?
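
As a rough illustration of what "coping with both styles" could look like,
here is a minimal sketch. It is not the actual Biopython code or fix: the
stand-in Reference class, the evidence attribute and the read_rn_tolerant
name are assumptions made up for this example. It accepts both the old
"[1]" form and the new "[1] {ECO:...}" form seen in the traceback below.

```python
import re


class Reference(object):
    """Stand-in for Bio.SwissProt.Reference, used only in this sketch."""

    def __init__(self):
        self.number = None
        # Hypothetical attribute for the new evidence tags; an assumption,
        # not something Biopython 1.64 provides.
        self.evidence = []


# "[1]" optionally followed by whitespace and a "{...}" evidence block.
_RN_PATTERN = re.compile(r"^\[(\d+)\](?:\s+\{(.*)\})?$")


def read_rn_tolerant(reference, rn):
    """Parse an RN value in either the old or the new style (sketch)."""
    match = _RN_PATTERN.match(rn.strip())
    if match is None:
        raise ValueError("Unexpected RN line: %r" % rn)
    reference.number = int(match.group(1))
    evidence = match.group(2)
    if evidence:
        # Evidence tags are comma separated in the line from the report.
        reference.evidence = [tag.strip() for tag in evidence.split(",")]
    else:
        reference.evidence = []


old_style = Reference()
read_rn_tolerant(old_style, "[1]")

new_style = Reference()
read_rn_tolerant(new_style, "[1] {ECO:0000305, ECO:0000312|EMBL:AAK11482.1}")
```

Raising ValueError rather than using assert keeps the check active when
Python runs with -O, but that is a design choice of the sketch and not
something the existing parser does.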

Thanks for reporting this,

Peter

On Tue, Oct 21, 2014 at 4:45 AM, Jinghua (Frank) Feng
<Jinghua.Feng at adelaide.edu.au> wrote:
> Hello,
>
> It looks like SwissProt can parse older versions of the uniprot_sprot data
> file, but fails on the current version. The steps below reproduce the
> error (Biopython version is '1.64').
>
> Regards,
>
> Jinghua
> ----------------------
>
> First, download the current uniprot_sprot human data file (~72 MB) from
> ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/uniprot_sprot_human.dat.gz
>
> Then, in IPython, use SwissProt to parse the downloaded data file:
>
> In [1]: from Bio import SwissProt
>
> In [2]: import gzip
>
> In [3]: inhandle = gzip.open('./uniprot_sprot_human.dat.gz')
>
> In [4]: reader = SwissProt.parse(inhandle)
>
> In [5]: for r in reader:
>    ...:     pass
>    ...:
> ---------------------------------------------------------------------------
> AssertionError                            Traceback (most recent call last)
> <ipython-input-5-c04351d992d2> in <module>()
> ----> 1 for r in reader:
>       2     pass
>       3
>
> /usr/local/lib/python2.7/dist-packages/Bio/SwissProt/__init__.pyc in parse(handle)
>     115 def parse(handle):
>     116     while True:
> --> 117         record = _read(handle)
>     118         if not record:
>     119             return
>
> /usr/local/lib/python2.7/dist-packages/Bio/SwissProt/__init__.pyc in _read(handle)
>     182         elif key == 'RN':
>     183             reference = Reference()
> --> 184             _read_rn(reference, value)
>     185             record.references.append(reference)
>     186         elif key == 'RP':
>
> /usr/local/lib/python2.7/dist-packages/Bio/SwissProt/__init__.pyc in _read_rn(reference, rn)
>     407
>     408 def _read_rn(reference, rn):
> --> 409     assert rn[0] == '[' and rn[-1] == ']', "Missing brackets %s" % rn
>     410     reference.number = int(rn[1:-1])
>     411
>
> AssertionError: Missing brackets [1] {ECO:0000305, ECO:0000312|EMBL:AAK11482.1}

