[BioSQL-l] Help with load_seqdatabase.pl

Jansen E Lim Jansen.Lim at bms.com
Tue Jan 28 14:28:17 EST 2003


Hi Hilmar,

I wasn't sure that the schema supported versioning. Looking at the bioentry and
biosequence tables seem to
suggest that it is supported.  I'm looking in the t\ folder for some examples.

Hilmar Lapp wrote:

> This is what versioning is for I believe. Biosql supports version for
> bioentry as well as separately for biosequence (even though I'm not
> aware that bioperl supports that, but I may be mistaken). I.e., you are
> not required to update a sequence upon version change; you may also
> leave the old one in place and add the new one.

Does load_seqdatabase.pl do this?  I thought it could only replace the entry?

I guess I'm too new at this.  Are there loaders that support the versioning
ability of
biosql?  It doesn't seem that load_seqdatabase.pl can do this.

>From a very simplistic point of view, I consider history/old version of a record
to be another entry in the database differing only in what changed in the new
record.
So (please bear with my silly example) if I try to add an existing genbank record
in the database
and the only difference is a period (.) in the description line, my assumption is
that a new record
is created with the old one getting tagged as history.  The only difference
between the two records
is the period (.) appearing in the description line of the current record.  Is
this possible with biosql?


>
>
> The problem with this is that distinguishing between current and
> superseded versions is not as easy a matter as checking for a flag.

Given the above scenario, why wouldn't a simple flag work?  Wouldn't a new entry
create new FKs as
well?  That should allow me to query for the current bioentry and all of its
related info.

Thanks for your help.

-Jansen

> So,
> queries across many bioentries get kind of involved in order to
> disregard say superseded versions (but it's still possible). Biosql,
> however, is infinitely extensible through its qualifier/value system.
> Using that system, you can assign any flag you choose. E.g.,
>
>         $annval = Bio::Annotation::SimpleValue->new(-tagname => "is_current",
>                                                  -value => "TRUE");
>      $seq->annotation->add_Annotation($annval);
>      # then store in the database
>
> To later retrieve:
>
>         SELECT e.accession, e.entry_version -- or whatever
>         FROM   bioentry e, bioentry_qualifier_value boa, ontology_term o
>         WHERE  boa.bioentry_id = e.bioentry_id
>         AND    boa.ontology_term_id = o.ontology_term_id
>         AND    o.term_name = 'is_current'
>         AND    boa.qualifier_value = 'TRUE';
>
> You can also contrain for this in object-level queries. Let me know if
> you'd like to see an example for how to set up such a query (generally
> speaking, look at t/query.t to get an idea what the procedure looks
> like).
>
>         -hilmar
>
> On Monday, January 27, 2003, at 11:03  AM, Jansen E Lim wrote:
>
> > Hilmar,
> >
> > Thanks for the clarification.  I agree that performing a true update
> > is not
> > the most efficient method.
> > One question regarding the BioSQL schema: is the interest in having it
> > support history?  Thus, when loading
> > revised entries, the previous records are flagged as such (e.g., a
> > status
> > attribute having a value of [current|history].)
> > Historical records are very important with regard to IP issues or just
> > trying to figure out what was known about a
> > particular sequence when we last ran blast, fasta, genscan and other
> > analyses.
> >
> > Thanks for any info.
> >
> > Regards,
> > -Jansen
> >
> > Hilmar Lapp wrote:
> >
> >> Jansen, sorry for the late response. The problem is due to PostgreSQL
> >> handling failures within a transaction differently (than MySQL/InnoDB
> >> and Oracle). The way the adaptor layer works is that those entities
> >> which are practically infinite in number are not looked up before
> >> insert, but instead their presence is detected by an insert failing
> >> the
> >> UK constraint. Comment is such an entity. PostgreSQL, however, aborts
> >> the entire transaction upon such a (handled or not) failure. I have
> >> yet
> >> to write certain functions in PL/PgSQL that will get around that
> >> problem.
> >>
> >> Generally speaking though, updating bioentries through --update is not
> >> very robust, because 1-n and n-n connected relations require more than
> >> a simple update (e.g., the new version of a sequence may have less
> >> features or features with a different key than the old version; a
> >> simple update would leave you with stale features attached to the
> >> bioentry).
> >>
> >> I have found it much more robust to simply delete associations and
> >> FK-connected relations, and re-inserting the new set. So, all that is
> >> really UPDATEd in this case is the bioentry (and biosequence) table.
> >> For an example of how to do this, have a look at
> >> scripts/update-on-new-version.pl, which is a closure you can pass to
> >> the --mergeobjs option of load_seqdatabase.pl. I wrote this to update
> >> RefSeq, and it works well for me.
> >>
> >>         -hilmar
> >>
> >> On Thursday, January 23, 2003, at 11:24  AM, Jansen E Lim wrote:
> >>
> >>> Hello,
> >>>
> >>> I seem to be having trouble using the -lookup option of
> >>> load_seqdatabase.pl script.  In particular, I wanted to see what
> >>> the option
> >>> would
> >>> do as documented as follows:
> >>>             --lookup
> >>>             flag to look-up by unique key first, converting the
> >>> insert
> >>>             into an update if the object is found
> >>>
> >>> I also tried using --lookup 1 without success.  I have no trouble
> >>> using  -noupdate and -remove option with -lookup.
> >>>
> >>> Here's how I invoke the script:  load_seqdatabase.pl  -dbname
> >>> refseq -driver Pg -lookup -format genbank dup.dat
> >>> Here's the error message I get:
> >>>
> >>> DBD::Pg::st execute failed: ERROR:  Cannot insert a duplicate key
> >>> into unique index comment_bioentry_id_key at
> >>> /libpath/Bio/DB/BioSQL/BaseDriver.pm line 564, <GEN0> line 116.
> >>>
> >>> -------------------- WARNING ---------------------
> >>> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed,
> >>> values were ("PROVISIONAL REFSEQ: This record
> >>> has not yet been subject to final NCBI review. The reference
> >>> sequence was derived from J04733.1. ","1") FKs (3)
> >>> ERROR:  Cannot insert a duplicate key into unique index
> >>> comment_bioentry_id_key
> >>> ---------------------------------------------------
> >>> NOTICE:  current transaction is aborted, queries ignored until
> >>> end of transaction block
> >>> DBD::Pg::st fetchall_arrayref failed: no statement executing at
> >>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line
> >>> 801, <GEN0> line 116.
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: Could not store NM_012500:
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: create: object (Bio::Annotation::Comment) failed to insert
> >>> or to be found by unique key
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw
> >>> /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
> >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> >>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:197
> >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> >>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
> >>> STACK: Bio::DB::Persistent::PersistentObject::store
> >>> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> >>> STACK:
> >>> Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children
> >>> /stf/biocgi/limje/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:220
> >>>
> >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> >>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> >>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
> >>> STACK: Bio::DB::Persistent::PersistentObject::store
> >>> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> >>> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children
> >>> /stf/biocgi/limje/Bio/DB/BioSQL/SeqAdaptor.pm:179
> >>> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> >>> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:260
> >>> STACK: Bio::DB::Persistent::PersistentObject::store
> >>> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> >>> STACK: ../load_seqdatabase.pl:400
> >>> -----------------------------------------------------------
> >>>
> >>>
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw
> >>> /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
> >>> STACK: ../load_seqdatabase.pl:409
> >>> -----------------------------------------------------------
> >>>
> >>> Thanks for helping out.
> >>>
> >>> -Jansen
> >>>
> >>> _______________________________________________
> >>> BioSQL-l mailing list
> >>> BioSQL-l at open-bio.org
> >>> http://open-bio.org/mailman/listinfo/biosql-l
> >>>
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >
> >
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------



More information about the BioSQL-l mailing list