[Biopython-dev] [Bug 2833] Features insertion on previous bioentry_id

Mon Jun 1 22:42:05 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2833





------- Comment #15 from cymon.cox at gmail.com  2009-06-01 18:42 EST -------
Here's an attempt to work around the RULES in the BioSQL schema on PostgreSQL:
it checks for the presence of the RULES and, if they are present, ensures
that the record is actually inserted into the bioentry table, or else raises
an IntegrityError.
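
To make the idea concrete, here is a minimal sketch of the count-before /
count-after check. It is not the patch itself: it uses SQLite from the Python
standard library as a stand-in, with INSERT OR IGNORE playing the role of the
PostgreSQL RULE that silently drops a duplicate. The one-table schema, the
function name load_with_duplicate_check and the sample rows are all
hypothetical.

```python
import sqlite3


def load_with_duplicate_check(conn, rows):
    """Insert rows one at a time; raise if an insert is silently ignored."""
    loaded = 0
    for accession, name in rows:
        before = conn.execute("SELECT count(*) FROM bioentry").fetchone()[0]
        # INSERT OR IGNORE stands in for the PostgreSQL RULE: a duplicate
        # row is dropped without any error being raised.
        conn.execute(
            "INSERT OR IGNORE INTO bioentry (accession, name) VALUES (?, ?)",
            (accession, name),
        )
        after = conn.execute("SELECT count(*) FROM bioentry").fetchone()[0]
        if before == after:
            # The row count did not change, so nothing was inserted.
            raise sqlite3.IntegrityError(
                "Duplicate record detected: record has not been inserted"
            )
        loaded += 1
    return loaded


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical, heavily simplified table (not the real BioSQL schema).
    conn.execute("CREATE TABLE bioentry (accession TEXT PRIMARY KEY, name TEXT)")
    print(load_with_duplicate_check(conn, [("ACC0001", "demo")]))
    try:
        load_with_duplicate_check(conn, [("ACC0001", "demo")])
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

Note that comparing row counts is only reliable while no other session is
writing to the table at the same time.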

A further problem arose with one of the unit tests:
======================================================================
FAIL: Make sure can't reimport existing records.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_BioSQL.py", line 474, in test_reload
    err.__class__.__name__ + "\n" + str(err))
AssertionError: OperationalError
currval of sequence "bioentry_pk_seq" is not yet defined in this session


----------------------------------------------------------------------

This was seemingly unrelated to the RULES issue and is a consequence of how
PostgreSQL handles sequences in sessions. A new session had been started (i.e.
a new suite in the unit test), and the record failed to insert: the RULES
returned INSERT 0 0, so bioentry_pk_seq was not incremented. When
adaptor.last_id was called (it actually looks up the currval of the sequence),
it raised an OperationalError because nextval() had not yet been called in the
session. Adding a manual call to nextval() in the unit test before trying the
load ensures that the unit test fails where expected. (At least I think that
is what is happening.)
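
The session behaviour described above can be illustrated with a toy model in
plain Python. This is only a sketch of the semantics, not how PostgreSQL
implements sequences; the ToySession class and its methods are hypothetical
names made up for the illustration.

```python
class ToySession:
    """Toy model of one database session using shared sequences."""

    def __init__(self, sequences):
        self.sequences = sequences  # shared dict: sequence name -> last value
        self.last = {}              # session-local: values this session drew

    def nextval(self, name):
        # Advance the shared sequence and remember the value in this session.
        self.sequences[name] = self.sequences.get(name, 0) + 1
        self.last[name] = self.sequences[name]
        return self.last[name]

    def currval(self, name):
        # Only defined once this session has called nextval() for the sequence.
        if name not in self.last:
            raise RuntimeError(
                'currval of sequence "%s" is not yet defined in this session'
                % name
            )
        return self.last[name]


if __name__ == "__main__":
    shared = {}
    first = ToySession(shared)
    first.nextval("bioentry_pk_seq")

    second = ToySession(shared)  # a fresh session, like a new test suite
    try:
        second.currval("bioentry_pk_seq")
    except RuntimeError as err:
        print(err)

    second.nextval("bioentry_pk_seq")  # the manual call added to the test
    print(second.currval("bioentry_pk_seq"))
```

In this model the fresh session fails on currval even though another session
has already advanced the sequence, which matches the error in the traceback.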

diff --git a/BioSQL/BioSeqDatabase.py b/BioSQL/BioSeqDatabase.py
index 3f58e9c..89d0d99 100644
--- a/BioSQL/BioSeqDatabase.py
+++ b/BioSQL/BioSeqDatabase.py
@@ -330,6 +332,14 @@ class BioSeqDatabase:
         self.adaptor = adaptor
         self.name = name
         self.dbid = self.adaptor.fetch_dbid_by_dbname(name)
+
+        ##Test for presence of RULES in schema
+        self.postgres_rules_present= False
+        if "psycopg" in self.adaptor.conn.__class__.__module__:
+            sql = r"SELECT ev_class FROM pg_rewrite WHERE rulename='rule_bioentry_i1'"
+            if self.adaptor.execute_and_fetchall(sql):
+                self.postgres_rules_present = True
+
     def __repr__(self):
         return "BioSeqDatabase(%r, %r)" % (self.adaptor, self.name)

@@ -439,5 +449,14 @@ class BioSeqDatabase:
         num_records = 0
         for cur_record in record_iterator :
             num_records += 1
+            if self.postgres_rules_present:
+                self.adaptor.execute("SELECT count(bioentry_id) FROM bioentry")
+                curr_val = self.adaptor.cursor.fetchone()[0]
             db_loader.load_seqrecord(cur_record)
+            if self.postgres_rules_present:
+                self.adaptor.execute("SELECT count(bioentry_id) FROM bioentry")
+                after_val = self.adaptor.cursor.fetchone()[0]
+                if curr_val == after_val:
+                    raise self.adaptor.conn.IntegrityError("Duplicate record " 
+                        "detected: record has not been inserted")
         return num_records
diff --git a/Tests/test_BioSQL.py b/Tests/test_BioSQL.py
index 334fe52..bf17ba7
--- a/Tests/test_BioSQL.py
+++ b/Tests/test_BioSQL.py
@@ -581,6 +581,13 @@ class InDepthLoadTest(unittest.TestCase):
         self.assertEqual(db_record.name, record.name)
         self.assertEqual(db_record.description, record.description)
         self.assertEqual(str(db_record.seq), str(record.seq))
+        
+        #We have to manually advance the sequence: when the repeat load of
+        #the record fails and returns INSERT 0 0 because of the RULES, the
+        #call to get the last_id raises an OperationalError because currval
+        #hasn't been defined for the session, i.e. nextval() hasn't been called
+        self.db.adaptor.execute(r"select nextval('bioentry_pk_seq')")
+
         #Good... now try reloading it!
         try :
             count = self.db.load([record])

Yeah, it's nasty, but I thought I'd put it out there for consideration...

cymon at gyra:~/git/github-master/Tests$ python ./test_BioSQL.py
GenBank file to BioSQL and back to a GenBank file, NC_000932. ... FAIL
GenBank file to BioSQL and back to a GenBank file, NC_005816. ... FAIL
GenBank file to BioSQL and back to a GenBank file, NT_019265. ... ok
GenBank file to BioSQL and back to a GenBank file, arab1. ... ok
GenBank file to BioSQL and back to a GenBank file, cor6_6. ... ok
GenBank file to BioSQL and back to a GenBank file, noref. ... ok
GenBank file to BioSQL and back to a GenBank file, one_of. ... FAIL
GenBank file to BioSQL and back to a GenBank file, protein_refseq2. ... ok
Make sure can't import records with same ID (in one go). ... ok
Make sure can't import a single record twice (in one go). ... ok
Make sure can't import a single record twice (in steps). ... ok
Make sure all records are correctly loaded. ... ok
Make sure can't reimport existing records. ... ok
Indepth check that SeqFeatures are transmitted through the db. ... ok
Make sure can load record into another namespace. ... ok
Load SeqRecord objects into a BioSQL database. ... ok
Get a list of all items in the database. ... ok
Test retrieval of items using various ids. ... ok
Check can add DBSeq objects together. ... ok
Check can turn a DBSeq object into a Seq or MutableSeq. ... ok
Make sure Seqs from BioSQL implement the right interface. ... ok
Check SeqFeatures of a sequence. ... ok
Make sure SeqRecords from BioSQL implement the right interface. ... ok
Check that slices of sequences are retrieved properly. ... ok
GenBank file to BioSQL, then again to a new namespace, NC_000932. ... FAIL
GenBank file to BioSQL, then again to a new namespace, NC_005816. ... ok
GenBank file to BioSQL, then again to a new namespace, NT_019265. ... ok
GenBank file to BioSQL, then again to a new namespace, arab1. ... ok
GenBank file to BioSQL, then again to a new namespace, cor6_6. ... ok
GenBank file to BioSQL, then again to a new namespace, noref. ... ok
GenBank file to BioSQL, then again to a new namespace, one_of. ... ok
GenBank file to BioSQL, then again to a new namespace, protein_refseq2. ... ok

Cheers, C.

