[Biopython-dev] [Bug 2838] New: If a SeqRecord containing Genbank information is read from BioSQL, it cannot be written to another BioSQL database

Fri May 22 21:16:07 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2838

           Summary: If a SeqRecord containing Genbank information is read
                    from BioSQL, it cannot be written to another BioSQL
                    database
           Product: Biopython
           Version: 1.49
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: david.wyllie at ndm.ox.ac.uk

I've been trying to annotate some microbial sequences, some of which come from
GenBank. The intended series of steps was:
1) get sequences from GenBank
2) store them in a BioSQL database called One
3) recover them from BioSQL
4) annotate the recovered SeqRecords [this works, but isn't necessary to
reproduce the problem; here, I'm making no changes at all to the SeqRecord]
5) store the annotated SeqRecords in a different BioSQL database called Two.

The problem is that step 5 fails when the record originally came from GenBank
and has been stored in, and recovered from, database One.

The traceback (below) points to a problem in the BioSQL loader, in
_load_bioentry_date.

Here is the screen output, including the traceback.
The program (attached) first loads a record from GenBank, writes it to One,
then recovers it from One. At this point the record has changed, in
particular in the way date fields are represented.

 The record loaded via Entrez has a /date annotation which is not a list:
 /date=26-MAY-2005
 while the reloaded version has two date annotations, both lists:
 /dates=['26-MAY-2005']
 /date=['26-MAY-2005']

Whether this is relevant I'm not sure. 
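
To make the differences between the two versions easier to see, here is a
minimal sketch that compares two annotation dictionaries key by key. It uses
plain dicts rather than real SeqRecord objects, and diff_annotations is a
hypothetical helper, not part of Biopython. The values are copied from the
output further down; the Python types of the unquoted values (such as
sequence_version and ncbi_taxid) are assumptions.

```python
def diff_annotations(before, after):
    """Return {key: (before_value, after_value)} for every key that differs.

    A key present on only one side is reported with the string "<missing>"
    on the other side.
    """
    missing = "<missing>"
    changed = {}
    for key in sorted(set(before) | set(after)):
        old = before.get(key, missing)
        new = after.get(key, missing)
        if old != new:
            changed[key] = (old, new)
    return changed


# Subset of the annotations printed in this report.
# Types of unquoted values are guesses, not taken from Biopython.
from_entrez = {
    "date": "26-MAY-2005",
    "sequence_version": 1,
    "source": "chloroplast Ceratodon purpureus",
    "gi": "28804743",
}
from_biosql = {
    "date": ["26-MAY-2005"],
    "dates": ["26-MAY-2005"],
    "sequence_version": ["1"],
    "source": ["chloroplast Ceratodon purpureus"],
    "gi": "28804743",
    "ncbi_taxid": 3225,
}

for key, (old, new) in diff_annotations(from_entrez, from_biosql).items():
    print("%-18s %r -> %r" % (key, old, new))
```

With a real record, the same helper could be called on the .annotations
dictionaries of the original and the recovered SeqRecord.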

The subsequent write of the recovered version to Two fails.
As a control, I've checked that the original version can be written to Two
successfully.
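
A possible workaround, assuming (this is not confirmed) that the list-valued
/date annotation is what the loader cannot handle: turn the one-element list
back into a plain string before writing the record to Two. The sketch below
works on a plain dict; flatten_single_value_lists is a hypothetical helper and
has not been tested against Biopython 1.49.

```python
def flatten_single_value_lists(annotations, keys=("date",)):
    """Return a copy of annotations with one-element lists unwrapped.

    Only the keys named in `keys` are touched; everything else is copied
    unchanged, and the input dictionary is not modified.
    """
    cleaned = dict(annotations)
    for key in keys:
        value = cleaned.get(key)
        if isinstance(value, list) and len(value) == 1:
            cleaned[key] = value[0]
    return cleaned


# Annotation values as printed for the record recovered from database One.
reloaded = {
    "date": ["26-MAY-2005"],
    "dates": ["26-MAY-2005"],
    "keywords": [""],
}
fixed = flatten_single_value_lists(reloaded)
print(fixed["date"])
```

For a real SeqRecord this would be applied as
record.annotations = flatten_single_value_lists(record.annotations)
before calling db2.load(...). Whether the extra /dates annotation also needs
attention is not known.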

I'm a novice with Python and Biopython so please accept my apologies if there
is something obvious and very stupid responsible for this.

---------------------------------------------------------------------------
dwyllie at dwyllie:~/programs/Project/src$ python dbtestcase.py
OK, going to recover record 28804743  from genbank....
Record loaded looks like this:
ID: AB098727.1
Name: AB098727
Description: Ceratodon purpureus chloroplast rps11, petD genes for ribosomal
protein S11, cytochromoe b/f complex subunit IV, partial cds.
Number of features: 5
/sequence_version=1
/source=chloroplast Ceratodon purpureus
/taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta',
'Bryophyta', 'Moss Superclass V', 'Bryopsida', 'Dicranidae', 'Dicranales',
'Ditrichaceae', 'Ceratodon']
/keywords=['']
/references=[<Bio.SeqFeature.Reference instance at 0x2190b90>,
<Bio.SeqFeature.Reference instance at 0x219a5a8>, <Bio.SeqFeature.Reference
instance at 0x219a5f0>, <Bio.SeqFeature.Reference instance at 0x219a6c8>]
/accessions=['AB098727']
/data_file_division=PLN
/date=26-MAY-2005
/organism=Ceratodon purpureus
/gi=28804743
Seq('AATTCGATTTTTTGTTCGTGATGTAACTCCTATGCCTCATAATGGGTGTAGACC...ATA',
IUPACAmbiguousDNA())
========================================================================
Load from Entrez completed, records= 1
Here is the loaded record:
========================================================================
ID: AB098727.1
Name: AB098727
Description: Ceratodon purpureus chloroplast rps11, petD genes for ribosomal
protein S11, cytochromoe b/f complex subunit IV, partial cds.
Number of features: 5
/sequence_version=1
/source=chloroplast Ceratodon purpureus
/taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta',
'Bryophyta', 'Moss Superclass V', 'Bryopsida', 'Dicranidae', 'Dicranales',
'Ditrichaceae', 'Ceratodon']
/keywords=['']
/references=[<Bio.SeqFeature.Reference instance at 0x2190b90>,
<Bio.SeqFeature.Reference instance at 0x219a5a8>, <Bio.SeqFeature.Reference
instance at 0x219a5f0>, <Bio.SeqFeature.Reference instance at 0x219a6c8>]
/accessions=['AB098727']
/data_file_division=PLN
/date=26-MAY-2005
/organism=Ceratodon purpureus
/gi=28804743
Seq('AATTCGATTTTTTGTTCGTGATGTAACTCCTATGCCTCATAATGGGTGTAGACC...ATA',
IUPACAmbiguousDNA())
========================================================================
Now loading these records into a BioSQL database One.
/var/lib/python-support/python2.6/MySQLdb/__init__.py:34: DeprecationWarning:
the sets module is deprecated
  from sets import ImmutableSet
Creating a new database  One
========================================================================
Load from database One completed, records= 1
========================================================================
Here is the record recovered from database One:
ID: AB098727.1
Name: AB098727
Description: Ceratodon purpureus chloroplast rps11, petD genes for ribosomal
protein S11, cytochromoe b/f complex subunit IV, partial cds.
Number of features: 5
/dates=['26-MAY-2005']
/ncbi_taxid=3225
/date=['26-MAY-2005']
/taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Bryopsida',
'Dicranidae', 'Dicranales', 'Ditrichaceae', 'Ceratodon', 'Ceratodon purpureus']
/source=['chloroplast Ceratodon purpureus']
/references=[<Bio.SeqFeature.Reference instance at 0x235d9e0>,
<Bio.SeqFeature.Reference instance at 0x235db90>, <Bio.SeqFeature.Reference
instance at 0x235dcf8>, <Bio.SeqFeature.Reference instance at 0x235de60>]
/gi=28804743
/data_file_division=PLN
/keywords=['']
/organism=Ceratodon purpureus
/sequence_version=['1']
/accessions=['AB098727']
DBSeq('AATTCGATTTTTTGTTCGTGATGTAACTCCTATGCCTCATAATGGGTGTAGACC...ATA',
DNAAlphabet())
========================================================================
Creating a new database  Two
Traceback (most recent call last):
  File "dbtestcase.py", line 206, in <module>
    from dbtestcase import AuthDetails
  File "/home/dwyllie/programs/CheckleyProject/src/dbtestcase.py", line 225, in
<module>
    DemonstrateProblem(problemgi,ad)
  File "/home/dwyllie/programs/CheckleyProject/src/dbtestcase.py", line 199, in
DemonstrateProblem
    db2.load(listtoload)
  File "/var/lib/python-support/python2.6/BioSQL/BioSeqDatabase.py", line 430,
in load
    db_loader.load_seqrecord(cur_record)
  File "/var/lib/python-support/python2.6/BioSQL/Loader.py", line 50, in
load_seqrecord
    self._load_bioentry_date(record, bioentry_id)
  File "/var/lib/python-support/python2.6/BioSQL/Loader.py", line 577, in
_load_bioentry_date
    self.adaptor.execute(sql, (bioentry_id, date_id, date))
  File "/var/lib/python-support/python2.6/BioSQL/BioSeqDatabase.py", line 289,
in execute
    self.cursor.execute(sql, args or ())
  File "/var/lib/python-support/python2.6/MySQLdb/cursors.py", line 166, in
execute
    self.errorhandler(self, exc, value)
  File "/var/lib/python-support/python2.6/MySQLdb/connections.py", line 35, in
defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (1064, "You have an error in your SQL
syntax; check the manual that corresponds to your MySQL server version for the
right syntax to use near '), 1)' at line 1")
dwyllie at dwyllie:~/programs/Project/src$
