[BioSQL-l] Help with load_seqdatabase.pl

Hilmar Lapp hlapp at gnf.org
Tue Jan 28 12:50:10 EST 2003


On Tuesday, January 28, 2003, at 11:28  AM, Jansen E Lim wrote:

> Hilmar Lapp wrote:
>
>> This is what versioning is for I believe. Biosql supports version for
>> bioentry as well as separately for biosequence (even though I'm not
>> aware that bioperl supports that, but I may be mistaken). I.e., you are
>> not required to update a sequence upon version change; you may also
>> leave the old one in place and add the new one.
>
> Does load_seqdatabase.pl do this?  I thought it could only replace the 
> entry?
>

Sure it does. Unless a command-line switch tells it to do otherwise, it 
will attempt an insert. If this results in a unique key (UK) violation 
(an existing primary_id(), or an existing combination of 
accession_number(), version(), and namespace()), it will trigger a 
lookup but not an update, so the new record is ignored (because it is 
considered identical to the existing one).

> I guess I'm too new at this.  Are there loaders that support the
> versioning ability of biosql?  It doesn't seem that load_seqdatabase.pl
> can do this.
>
> From a very simplistic point of view, I consider history/old version
> of a record to be another entry in the database differing only in
> what changed in the new record.
> So (please bear with my silly example) if I try to add an existing
> genbank record in the database and the only difference is a period
> (.) in the description line, my assumption is that a new record
> is created with the old one getting tagged as history.  The only
> difference between the two records is the period (.) appearing in the
> description line of the current record.  Is this possible with biosql?

It is possible with biosql through what I suggested before (basically, 
custom annotation, and custom processing of looked-up records). 
Identity is defined by unique (or alternative) keys - i.e., two records 
are considered identical by the system if they have identical natural 
UKs (natural as opposed to the database-assigned primary key). 
Description is not a part of the UK.
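
To make the identity rule concrete, here is a toy sketch in plain Perl. It 
does not use bioperl-db or a database: the %store hash, the helper names, 
and the accession are all made up for illustration, and the key is 
assumed to consist of accession, version, and namespace only. It shows 
why a record that differs only in its description is treated as already 
present.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the database: rows keyed by the natural unique key.
my %store;

# Build the natural key from accession, version and namespace only;
# the description deliberately plays no part in it.
sub natural_key {
    my ($rec) = @_;
    return join("\t", $rec->{accession}, $rec->{version}, $rec->{namespace});
}

# Try to insert; on a key clash keep the existing row and ignore the new one.
sub load_record {
    my ($rec) = @_;
    my $key = natural_key($rec);
    return 'ignored' if exists $store{$key};
    $store{$key} = $rec;
    return 'inserted';
}

# Made-up records; the second differs from the first only by a period.
my %first  = (accession => 'ACC0001', version => 1, namespace => 'genbank',
              description => 'some protein');
my %second = (%first, description => 'some protein.');

print load_record(\%first),  "\n";   # inserted
print load_record(\%second), "\n";   # ignored
```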

If what you describe is what you want, run the script in --lookup 
mode and provide a closure to --mergeobjs that compares all values of 
$old and $new that you want to compare for identity. If you find any 
discrepancy, undefine the primary id if necessary and increment the 
version number:

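# Placeholder test: compare every attribute that matters to you; desc()
# is used only because the example above is about the description line.
if($new->desc ne $old->desc) {
	# a primary id shared with the old entry would violate the UK again
	$new->primary_id(undef) if $new->primary_id eq $old->primary_id;
	$new->version($old->version ? $old->version + 1 : 1);
}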
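
As a rough sketch, the fragment above might look like this when written 
out as a whole closure. Several things here are assumptions, so please 
check the script's documentation before relying on them: that the 
closure receives the looked-up object and the new object as its first 
two arguments, and that it returns the object to be stored. MockSeq is a 
hypothetical stand-in class so the logic can be tried without a 
database, and desc() is the only attribute compared.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object, only so this runs without
# bioperl-db; each accessor is a combined getter/setter.
package MockSeq;
sub new { my ($class, %args) = @_; return bless {%args}, $class; }
for my $field (qw(primary_id version desc)) {
    no strict 'refs';
    *{"MockSeq::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;
        return $self->{$field};
    };
}

package main;

# Assumed contract: called with the looked-up ($old) and incoming ($new)
# objects, returns the object to be stored.
my $merge = sub {
    my ($old, $new) = @_;
    if (($new->desc || '') ne ($old->desc || '')) {
        my ($new_id, $old_id) = ($new->primary_id, $old->primary_id);
        # drop a clashing primary id so it does not violate the UK again
        $new->primary_id(undef)
            if defined $new_id && defined $old_id && $new_id eq $old_id;
        # store as the next version instead of replacing the old entry
        $new->version($old->version ? $old->version + 1 : 1);
    }
    return $new;
};

# Made-up objects: same ids, descriptions differ only by a period.
my $old = MockSeq->new(primary_id => '42', version => 1, desc => 'a protein');
my $new = MockSeq->new(primary_id => '42', version => 1, desc => 'a protein.');
my $result = $merge->($old, $new);
printf "version=%d primary_id=%s\n",
    $result->version,
    defined $result->primary_id ? $result->primary_id : 'undef';
# prints: version=2 primary_id=undef
```

If the descriptions are equal, the sketch returns $new untouched; what 
the script then does with it is not covered here.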

>>
>> The problem with this is that distinguishing between current and
>> superseded versions is not as easy a matter as checking for a flag.
>
> Given the above scenario, why wouldn't a simple flag work?  Wouldn't a 
> new entry create new FKs as well?  That should allow me to query for 
> the current bioentry and all of its related info.
>

Sure, if you know accession and version, but how do you tell the 
current entry from a superseded one if you don't know a priori which 
version the current one is going to have? I.e., what if all you have is 
the accession?
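
To illustrate what that question implies: without a flag, and with only 
the accession in hand, you would have to compare versions across all 
entries sharing that accession. A minimal sketch in plain Perl, assuming 
versions are integers that only ever increase; the rows and the 
current_entry() helper are made up for illustration and do not reflect 
the actual schema:

```perl
use strict;
use warnings;

# Made-up rows standing in for entries that share accessions; not real data.
my @entries = (
    { accession => 'ACC0001', version => 1 },
    { accession => 'ACC0001', version => 3 },
    { accession => 'ACC0001', version => 2 },
    { accession => 'ACC0002', version => 1 },
);

# Return the entry with the highest version for an accession, or undef
# if no entry has that accession.
sub current_entry {
    my ($accession, @rows) = @_;
    my $current;
    for my $row (grep { $_->{accession} eq $accession } @rows) {
        $current = $row
            if !defined $current || $row->{version} > $current->{version};
    }
    return $current;
}

my $cur = current_entry('ACC0001', @entries);
print "current version of ACC0001: $cur->{version}\n";   # 3
```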


	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


