[Biopython-dev] [Bug 1942] GenBank RecordParser fails on particular qualifier structure

bugzilla-daemon at portal.open-bio.org
Thu Feb 9 13:52:27 EST 2006


http://bugzilla.open-bio.org/show_bug.cgi?id=1942

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2006-02-09 13:52 -------
This does seem to work for me using a freshly downloaded NC_007633.gbk that
starts:

LOCUS       NC_007633            1010023 bp    DNA     circular BCT 18-JAN-2006

It has the blank line at line 7114 that you reported, in locus MCAP_0327.
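
To check whether a given copy of the file really has a blank (or whitespace-only) line inside the feature table, a short script can list them. This is a minimal sketch in modern Python 3, assuming the usual GenBank layout where the feature table runs from the FEATURES line until a BASE COUNT, ORIGIN or CONTIG line; the function name blank_lines_in_features is hypothetical and not part of Biopython:

```python
def blank_lines_in_features(lines):
    """Return 1-based line numbers of blank lines inside the FEATURES table.

    Hypothetical helper. Assumes the feature table starts at a line
    beginning with "FEATURES" and ends at "BASE COUNT", "ORIGIN" or "CONTIG".
    """
    in_features = False
    blanks = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("FEATURES"):
            in_features = True
        elif line.startswith(("BASE COUNT", "ORIGIN", "CONTIG")):
            in_features = False
        elif in_features and not line.strip():
            # Empty or whitespace-only line inside the feature table.
            blanks.append(number)
    return blanks
```

Any iterable of lines works, so an open file handle for NC_007633.gbk can be passed directly; if your copy contains the reported blank line, 7114 should appear in the result.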

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Bio.GenBank import RecordParser
>>> parser = RecordParser()
>>> record = parser.parse(file('NC_007633.gbk'))
WARNING - Ignoring an unknown line type, PROJECT found:
PROJECT     GenomeProject:16208

>>> print record.features[644]
     CDS             391217..391771
                     /locus_tag="MCAP_0327"
                     /note="Similar non-mycoplasma proteins have and additional
                     120 amino acids at the COOH end; identified by similarity
                     to SP:P54575; match to protein family HMM PF06574"
                     /codon_start=1
                     /transl_table=4
                     /product="riboflavin kinase (flavokinase) domain protein"
                     /protein_id="YP_424312.1"
                     /db_xref="GI:83319941"
                     /db_xref="GeneID:3828958"
                     /translation="MIYINESFNKLKKLNIKKAIITIGNFDGFHIYHQKIINKVIQIA
                     KQENLTSIVMSFDKKIKDNITYTNLATKKQKLDFINNNLSDLDYFFDIKVDDSLIKTT
                     KDQFIDVLINKLNVIKIVEGQDFKFGYLSQGNIDDLIKAFSKKNVIIFKRDNDISSTK
                     IKKLLDENLVDKAQELLGIDLKLK"

The warning about the PROJECT line is a recent change; see bug 1946.

I am using the latest version of Bio/GenBank/__init__.py, which is revision 1.57,
checked in on 6 Feb 2006.  This should be the same as yours if you downloaded it
on 8 Feb...

Assuming you have the same GenBank file (same date in the LOCUS line) and the
same Bio/GenBank/__init__.py as me, then perhaps something else differs
between our machines, possibly in another part of Biopython.

Or it could be a Windows/Unix line-ending problem?  Or, worse, an LF vs CR vs
CRLF problem.  Did you download the file by FTP or via the website?  This might
make a difference if the original file contained a mixture of CR and CRLF.

So far I have only tried this on Windows (and I downloaded the file via the NCBI
website), and Biopython copes with the GenBank file in either Windows or Unix
format.
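
For comparison, Python's universal-newlines text mode treats all three conventions the same. The sketch below is Python 3 and is not Biopython's own code; it only illustrates that with newline=None, LF, CRLF and lone CR input give identical lines. Note that the file('NC_007633.gbk') call in the session above uses plain text mode rather than Python 2.3's 'U' universal-newline mode, so a lone CR may not be treated as a line break there. The helper name read_lines_universal is hypothetical:

```python
import io


def read_lines_universal(data: bytes) -> list:
    """Decode bytes and split into lines, treating LF, CRLF and CR alike."""
    # newline=None enables universal newlines: "\n", "\r\n" and "\r"
    # are all translated to "\n" when reading.
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="latin-1", newline=None)
    return [line.rstrip("\n") for line in wrapper]
```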

I have not (yet) tried it on Linux...

Could you check what happens if you run dos2unix and/or unix2dos on your
GenBank file?
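
If dos2unix is not to hand, a similar experiment can be run from Python. A minimal sketch in Python 3 with a hypothetical helper name; it rewrites both CRLF and lone CR terminators as LF:

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF and lone CR terminators to LF (Unix style)."""
    # Replace CRLF first so its CR is not turned into an extra LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Read the original in binary mode, pass the bytes through this function, and write the result to a new file (also in binary mode), then re-run the parser on that copy.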

Thanks



