[BioSQL-l] Help with load_seqdatabase.pl
Hilmar Lapp
hlapp at gnf.org
Tue Jan 28 12:50:10 EST 2003
On Tuesday, January 28, 2003, at 11:28 AM, Jansen E Lim wrote:
> Hilmar Lapp wrote:
>
>> This is what versioning is for I believe. Biosql supports version for
>> bioentry as well as separately for biosequence (even though I'm not
>> aware that bioperl supports that, but I may be mistaken). I.e., you are
>> not required to update a sequence upon version change; you may also
>> leave the old one in place and add the new one.
>
> Does load_seqdatabase.pl do this? I thought it could only replace the
> entry?
>
Sure it does. Without any command-line switches telling it to do
otherwise, though, it will attempt an insert. If this results in a UK
(unique key) violation, i.e., an existing primary_id() or an existing
(accession_number(), version(), namespace()) combination, it will
trigger a lookup but not an update, so the new record is ignored
(because it is considered identical to the existing one).
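
A toy model of that default behaviour may help. This is plain Perl over in-memory hashes, not the loader's actual code; load_record() and the two hashes are made up for illustration and simply stand in for the two unique keys named above.

```perl
use strict;
use warnings;

# Toy stand-ins for the two unique keys: primary_id, and
# (accession, version, namespace). Not the real biosql tables.
my (%by_pid, %by_acc);

# Try to insert a record (a plain hash here, not a bioperl object).
# Returns 'inserted', or 'ignored' if either unique key already exists.
sub load_record {
    my ($rec) = @_;
    my $acc_key = join "\t", @{$rec}{qw(accession version namespace)};
    if (   (defined $rec->{primary_id} && exists $by_pid{ $rec->{primary_id} })
        || exists $by_acc{$acc_key}) {
        return 'ignored';    # key clash: the old row is found, nothing is written
    }
    $by_pid{ $rec->{primary_id} } = $rec if defined $rec->{primary_id};
    $by_acc{$acc_key} = $rec;
    return 'inserted';
}

my %rec = (primary_id => '12345', accession => 'X00001', version => 1,
           namespace => 'genbank', desc => 'some gene');
print load_record({ %rec }), "\n";                          # first load
print load_record({ %rec, desc => 'some gene.' }), "\n";    # same keys, new description
print load_record({ %rec, primary_id => undef, version => 2 }), "\n";
```

The second call is ignored because only the description differs, and the description is not part of either key. The third goes in as a separate row because its version differs and its primary id is undefined.
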
> I guess I'm too new at this. Are there loaders that support the
> versioning ability of biosql? It doesn't seem that load_seqdatabase.pl
> can do this.
>
> From a very simplistic point of view, I consider history/old version
> of a record to be another entry in the database differing only in
> what changed in the new record.
> So (please bear with my silly example) if I try to add an existing
> genbank record in the database and the only difference is a period
> (.) in the description line, my assumption is that a new record
> is created with the old one getting tagged as history. The only
> difference between the two records is the period (.) appearing in the
> description line of the current record. Is this possible with biosql?

It is possible with biosql through what I suggested before (basically,
custom annotation, and custom processing of looked-up records).

Identity is defined by unique (or alternative) keys - i.e., two records
are considered identical by the system if they have identical natural
UKs (natural as opposed to the database-assigned primary key).
Description is not part of the UK.

If what you describe is what you want, run the script in
--lookup mode and provide a closure to --mergeobjs that compares all
values of $old and $new that you want to compare for identity. If
you find any discrepancy, undefine the primary id if necessary and
increment the version number:

    # has_changed() is a placeholder for your own field-by-field comparison
    if(has_changed($old, $new)) {
        $new->primary_id(undef) if $new->primary_id eq $old->primary_id;
        $new->version($old->version ? $old->version + 1 : 1);
    }
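
Spelled out a little further, the merge step could look roughly like the sketch below. ToySeq is a made-up stand-in class with just the accessors mentioned in this thread, so that the snippet runs without bioperl or a database. Comparing only desc, and returning $new from the closure, are assumptions for the example; check the script's own documentation for the exact arguments and return value that --mergeobjs expects.

```perl
use strict;
use warnings;

# Made-up stand-in for a sequence object; only the accessors used here.
package ToySeq;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
for my $field (qw(primary_id accession_number version namespace desc)) {
    no strict 'refs';
    *{"ToySeq::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;    # setter when an argument is given
        return $self->{$field};
    };
}

package main;

# Compare the fields that matter for identity (only desc in this example),
# and if any differ, turn $new into a new version of the record.
my $merge = sub {
    my ($old, $new) = @_;
    my $changed = grep { ($old->$_ // '') ne ($new->$_ // '') } qw(desc);
    if ($changed) {
        $new->primary_id(undef)
            if ($new->primary_id // '') eq ($old->primary_id // '');
        $new->version($old->version ? $old->version + 1 : 1);
    }
    return $new;    # assumption: the caller stores whatever is returned
};

my $old = ToySeq->new(primary_id => '12345', accession_number => 'X00001',
                      version => 1, namespace => 'genbank', desc => 'some gene');
my $new = ToySeq->new(primary_id => '12345', accession_number => 'X00001',
                      version => 1, namespace => 'genbank', desc => 'some gene.');
$merge->($old, $new);
printf "version %d, primary_id %s\n",
    $new->version, defined($new->primary_id) ? $new->primary_id : 'undef';
# prints "version 2, primary_id undef"
```

With the primary id undefined and the version bumped, the new record no longer collides with either unique key of the old one, so it can be stored alongside it.
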
>>
>> The problem with this is that distinguishing between current and
>> superseded versions is not as easy a matter as checking for a flag.
>
> Given the above scenario, why wouldn't a simple flag work? Wouldn't a
> new entry create new FKs as well? That should allow me to query for
> the current bioentry and all of its related info.
>
Sure, if you know accession and version. But how do you tell the
current entry from a superseded one if you don't know a priori which
version the current one is going to have, i.e., when all you have is
the accession?
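
For illustration, with only the accession in hand and no flag, finding the current entry means comparing versions across all matching records. A rough sketch in plain Perl over made-up rows (not the biosql schema; current_entry() is a hypothetical helper):

```perl
use strict;
use warnings;

# Made-up rows; in practice these would come from the database.
my @entries = (
    { accession => 'X00001', version => 1 },
    { accession => 'X00001', version => 3 },
    { accession => 'X00001', version => 2 },
    { accession => 'Y00002', version => 1 },
);

# Return the row with the highest version for the given accession,
# or undef if the accession is not present at all.
sub current_entry {
    my ($acc, @rows) = @_;
    my $current;
    for my $row (grep { $_->{accession} eq $acc } @rows) {
        $current = $row if !$current || $row->{version} > $current->{version};
    }
    return $current;
}

print current_entry('X00001', @entries)->{version}, "\n";    # prints 3
```
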
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------