[BioSQL-l] Query Problem.

Hilmar Lapp hlapp at gnf.org
Tue Feb 17 18:09:14 EST 2004


Hm, interesting.

Basically, the trouble maker is that you cannot apply NVL() to LOB 
values. Biosequence.seq is of type CLOB. Normally, for every updated 
column the new value is wrapped in a call to NVL() to prevent 
overriding of values by incomplete objects (which would have certain 
attributes undefined instead of the value that's in the database). So, 
in order to protect column foo, you write the update statement as

	UPDATE blah SET
	...,
	foo = NVL(?, foo),
	...
	WHERE oid = ?

If you bind undef to column foo's parameter, you get a no-change result 
without ever having to cross-check every attribute.

This will work for the sequence so long as it is less than or equal to 
4000 chars long, as then it will simply be treated as a VARCHAR2 and 
Oracle will cast it internally to CLOB when it gets assigned to the seq 
column. If the sequence length is longer than 4000 chars, you can't use 
NVL() on it, so there is special case code for Oracle that will edit 
the generated sql and replace the NVL() call with a direct assignment 
iff any of the object's attribute is defined and longer than 4000 chars 
in length. The get_persistent_slot_values() will return the sequence as 
having value undef if it hasn't changed in order to prevent unnecessary 
retrieval and update of a column of 10s/100s/1000s of kb if it hasn't 
changed. An unchanged sequence should therefore not lead to the NVL() 
clause to be edited away, because its value would be pretended as undef 
to the database update function.

The unknown in this equation is what happens if in the NVL(?, foo) call 
the bound parameter is undef (NULL), but foo has more than 4000 chars. 
It shouldn't actually matter up to 32k chars length, as that is 
Oracle's upper limit for VARCHAR2 in PL/SQL (i.e., while DBI/DBD 
doesn't have its fingers in the game), and the NVL call is executed on 
the server, not the DBI client.

Are you sure your threshold of misbehavior is 4000 and not 32k in 
length?

If you turn on verbosity (call $db->verbose(1) after creating it), what 
parameters do you see get bound for the sequence that gets its sequence 
set to NULL after an update?

	-hilmar

On Tuesday, February 17, 2004, at 08:23  AM, Marc Logghe wrote:

> Hi,
> I was not able to figure it out completely, but found a fix.
> In Bio::DB::BioSQL::BiosequenceAdaptor the marked (#ml) line was added:
>
> sub get_persistent_slot_values {
>     my ($self,$obj,$fkobjs) = @_;
>     my @vals;
>     if($obj->isa("Bio::Seq::RichSeqI")) {
>         @vals = ($obj->seq_version());
>     } else {
>         @vals = (undef);
>     }
> #ml
>     $obj->seq_has_changed(1) if $obj->length > 4000;
>
>     push(@vals,
>          $obj->length(),
>          $obj->alphabet(),
>          $obj->seq_has_changed() ? $obj->seq() : undef);
>     return \@vals;
> }
>
> This forces an 'update' of the sequence if it is larger than 4000 bp.
>
>> -----Original Message-----
>> From: Marc Logghe
>> Sent: dinsdag 17 februari 2004 11:01
>> To: biosql-l at open-bio.org
>> Subject: RE: [BioSQL-l] Query Problem.
>>
>>
>> Hi,
>> I have been trying to track down a nasty bug, but failed :-(
>> Sorry, Hilmar.
>> A biosequence containing more than 4000 nt is fine. No
>> problem in loading that.
>> A colleague of mine (FredericP) found out that as soon as you
>> add a feature to the corresponding bioentry later, and you do
>> an update (store+commit) your biosequence ends up truncated (4000).
>> Hoped that
>> Bio::DB::BioSQL::Oracle::BiosequenceAdaptorDriver::get_sth
>> had to do something with it.
>> It appears that a different sth is chosen depending on e.g.
>> the field size (_upd_sth2 if length > 4000). I tried to force
>> this by changing the line "grep { $_ && (length($_) > 4000);"
>> to "grep { $_ && (length($_) > 4000 || $_ > 4000);" but that
>> was even worse: seq field in biosequence was null.
>> Anyhow, the sequence should not have been updated after all,
>> cos it was not changed.
>> Any clues ?
>> Thanks,
>> Marc
>>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------




More information about the BioSQL-l mailing list