[Biojava-dev] BioSQL postgre BioEntryRelationship

Andreas Prlic andreas at sdsc.edu
Tue Oct 28 04:40:35 UTC 2008


Hi Richard,

I updated the 1.6 release with your fixes :
http://www.biojava.org/download/bj16/all/biojava-1.6.1-all.jar
Can you please verify and if it is correct update the download page on the wiki?

Andreas

On Thu, Oct 23, 2008 at 6:24 AM, Richard Holland
<dicknetherlands at gmail.com> wrote:
> Andreas - is it possible to rebuild biojava-1.6-all.jar with the following
> fix made to it?
>
> cheers,
> Richard
>
> ---------- Forwarded message ----------
> From: Christian Köberle <ch.koeberle at googlemail.com>
> Date: 2008/10/23
> Subject: Re: [Biojava-dev] BioSQL postgre BioEntryRelationship
> To: Richard Holland <dicknetherlands at gmail.com>
>
>
> Hi Richard,
>
> I found the error in the current download of biojava 6.1
> (http://www.biojava.org/download/bj16/all/biojava-1.6-all.jar) in the file
> src/org/biojavax/bio/db/biosql/pg/BioEntryRelationship.hbm.xml
>
> <hibernate-mapping>
>     <class name="org.biojavax.bio.SimpleBioEntryRelationship"
> table="bioentry_relationship" node="sequenceRelation"
> entity-name="BioEntryRelationship">
>     <id name="id" type="integer" unsaved-value="null"
> column="bioentry_relationship_id" node="@id">
>     <generator class="sequence">
> <param name="sequence">bioentry_relationship_pk_seq</param>
> </generator>
> </id>
> <many-to-one name="term" class="Term" column="term_id" not-null="true"
> cascade="persist,merge,save-update" node="@termId" embed-xml="false"/>
> <many-to-one name="object" class="Feature" column="object_bioentry_id"
> not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId"
> embed-xml="false"/>
> <many-to-one name="subject" class="BioEntry" column="subject_bioentry_id"
> not-null="true" cascade="persist,merge,save-update"
> node="@subjectBioEntryId" embed-xml="false"/>
> <property name="rank" node="@rank"/>
> </class>
> </hibernate-mapping>
>
> cheers,
> Christian
>
>
> 2008/10/23 Richard Holland <dicknetherlands at gmail.com>
>>
>> Christian,
>>
>> Thanks for your comments.
>>
>> I'm not sure which file you're referring to, or what version of BioJava
>> you have, as the line you quote does not appear in any of the current
>> hbm.xml files in the trunk of SubVersion.
>>
>> Also, the BioEntryRelationship interface and it's implementations do
>> already have getSubject() and getObject() methods which return the parent
>> and child BioEntry instances.
>>
>> The BioEntry interface itself has a getBioEntryRelationships() method
>> which returns all relationships in which it is the object BioEntry. You
>> could use HQL to obtain those for which it is the subject, but you are right
>> that it would be good to have a method that returns the latter. Could you
>> raise a BugZilla request for this?
>>
>> It would be good if you could do some thorough testing of your lazy
>> loading suggestions on some other use cases before we decide whether or not
>> to adopt that approach in future developments. Use cases would include:
>>
>> 1. have a very large database with thousands of related records in it
>> (e.g. load the whole of GenBank). Iterate over all the records in the
>> database and perform a simple read operation on each that hits the modified
>> methods. See if you run out of memory.
>>
>> 2. like 1, but perform a series of repeated read/write operations using
>> the modified methods, with a final commit to attempt to write the results
>> back to see if they still persist correctly.
>>
>> The reason is that the modified methods might cause problems with those
>> people who are processing large volumes of data in their databases. If all
>> related records are loaded at once, even only on demand, instead of one at a
>> time, it will cause memory issues. The trade off is therefore memory vs.
>> speed. We opted for the memory option because it makes life easier for most
>> novice coders to not have to trace out-of-memory exceptions (although they
>> will still occur using the existing methods, but it happens less often).
>>
>> Also, your method reruns the query every time it is called. It probably
>> should cache the results after the first call, to prevent objects being
>> reloaded unnecessarily, and to prevent problems with objects from a previous
>> call being modified then attempted to be overwritten by a subsequent call?
>> Also if Hibernate does not receive the same set back that it auto-loaded as
>> a property via the default get() method when it comes to save the object, it
>> will throw a wobbly and refuse to commit.
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/23 Christian Köberle <ch.koeberle at googlemail.com>
>>>
>>> Hi,
>>> I found a bug in the postgre mapping file for BioEntryRelationship.
>>> line:
>>> <many-to-one name="object" class="Feature" column="object_bioentry_id"
>>> not-null="true" cascade="persist,merge,save-update"
>>> node="@objectFeatureId"
>>> embed-xml="false"/>
>>> The value for the attribute class has to be "BioEntry"
>>>
>>> For the BioEntry I miss methodes to have access to subject_bioentry
>>> BioEntryRelationship. I think the BioEntryRelationship. is a parent child
>>> relationship. So it will be nice to have access to both.
>>>
>>> Furthermore the hibernate mapping strategies for the BioSQL is quite slow
>>> and produces a lot of queries to the database. Because for all lists and
>>> set
>>> the lazy fetch mode is disable. In this mode hibernate will execute one
>>> query for each element in a list or set. The faster way is to enable the
>>> lazy fetch mode an use methods to load the list. Each of these methods
>>> executes only one query.
>>> For excample:
>>>
>>> public List<BioEntry> getParents(BioEntry bioEntry){
>>>
>>> String stmt = "SLECT r.object FROM BioEntryEelationship r WHERE r.object
>>> =:subject";
>>> Query query = session.createQuery(stmt);
>>> query.setParameter("subject", bioEntry);
>>> return query.list();
>>>
>>> }
>>>
>>>
>>> This is factor 2 to 4 faster than the methode
>>> BioEntry..getRelationships()
>>> In case of all dependences of an BioEntry-Object an select with lazy
>>> fetching can be 500 times faster than a select with eager fetching (in
>>> case
>>> of unigene cluster Hs.4 for example).
>>> Here a example for the relationship unigene cluster Hs.2 and the gene
>>> BC067218 (we use BioSQL to store Unigene)
>>>
>>> getParents():
>>> runtime: 14 msec
>>> SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_,
>>> bioentry1_.name as name89_, bioentry1_.identifier as identifier89_,
>>> bioentry1_.accession as accession89_, bioentry1_.description as
>>> descript5_89_, bioentry1_.version as version89_, bioentry1_.division as
>>> division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id
>>> as
>>> biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as
>>> length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as
>>> seq93_,
>>> case when bioentry1_1_.bioentry_id is not null then 2 when
>>> bioentry1_.bioentry_id is not null then 0 end as clazz_ from
>>> unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry
>>> bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id
>>> left
>>> outer join unigene.biosequence bioentry1_1_ on
>>> bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer
>>> joinunigene.biosequence bioentry1_2_ on
>>> bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where
>>> bioentryre0_.object_bioentry_id=?
>>>
>>>
>>> bioEntry.getRelationships():
>>> runtime: 36 msec
>>> SQL:Hibernate: select bioentry0_.bioentry_id as bioentry1_89_,
>>> bioentry0_.name as name89_, bioentry0_.identifier as identifier89_,
>>> bioentry0_.accession as accession89_, bioentry0_.description as
>>> descript5_89_, bioentry0_.version as version89_, bioentry0_.division as
>>> division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id
>>> as
>>> biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as
>>> length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as
>>> seq93_,
>>> case when bioentry0_1_.bioentry_id is not null then 2 when
>>> bioentry0_.bioentry_id is not null then 0 end as clazz_ from
>>> unigene.bioentry bioentry0_ left outer join unigene.biosequence
>>> bioentry0_1_
>>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
>>> unigene.biosequence bioentry0_2_ on
>>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=?
>>> Hibernate: select relationsh0_.object_bioentry_id as object3_1_,
>>> relationsh0_.bioentry_relationship_id as bioentry1_1_,
>>> relationsh0_.bioentry_relationship_id as bioentry1_95_0_,
>>> relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as
>>> object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_,
>>> relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship
>>> relationsh0_ where relationsh0_.object_bioentry_id=?
>>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
>>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
>>> namespace0_.description as descript4_80_0_ from unigene.biodatabase
>>> namespace0_ where namespace0_.biodatabase_id=?
>>> Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_,
>>> bioentry0_.name
>>> as name89_0_, bioentry0_.identifier as identifier89_0_,
>>> bioentry0_.accession
>>> as accession89_0_, bioentry0_.description as descript5_89_0_,
>>> bioentry0_.version as version89_0_, bioentry0_.division as division89_0_,
>>> bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as
>>> biodatab9_89_0_, bioentry0_1_.version as version93_0_,
>>> bioentry0_1_.length
>>> as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq
>>> as
>>> seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when
>>> bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from
>>> unigene.bioentry bioentry0_ left outer join unigene.biosequence
>>> bioentry0_1_
>>> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
>>> unigene.biosequence bioentry0_2_ on
>>> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where
>>> bioentry0_.bioentry_id=?
>>> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
>>> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
>>> namespace0_.description as descript4_80_0_ from unigene.biodatabase
>>> namespace0_ where namespace0_.biodatabase_id=?
>>> Hibernate: select term0_.term_id as term1_84_0_, term0_.name as
>>> name84_0_,
>>> term0_.identifier as identifier84_0_, term0_.definition as
>>> definition84_0_,
>>> term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_
>>> from
>>> unigene.term term0_ where term0_.term_id=?
>>> Hibernate: select ontology0_.ontology_id as ontology1_83_0_,
>>> ontology0_.name
>>> as name83_0_, ontology0_.definition as definition83_0_ from
>>> unigene.ontology
>>> ontology0_ where ontology0_.ontology_id=?
>>> Hibernate: select termset0_.ontology_id as ontology6_1_,
>>> termset0_.term_id
>>> as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as
>>> name84_0_,
>>> termset0_.identifier as identifier84_0_, termset0_.definition as
>>> definition84_0_, termset0_.is_obsolete as is5_84_0_,
>>> termset0_.ontology_id
>>> as ontology6_84_0_ from unigene.term termset0_ where
>>> termset0_.ontology_id=?
>>> Hibernate: select tripleset0_.ontology_id as ontology5_1_,
>>> tripleset0_.term_relationship_id as term1_1_,
>>> tripleset0_.term_relationship_id as term1_87_0_,
>>> tripleset0_.subject_term_id
>>> as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_,
>>> tripleset0_.predicate_term_id as predicate4_87_0_,
>>> tripleset0_.ontology_id
>>> as ontology5_87_0_ from unigene.term_relationship tripleset0_ where
>>> tripleset0_.ontology_id=?
>>> Hibernate: select rankedcros0_.term_id as term1_0_,
>>> rankedcros0_.dbxref_id
>>> as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref
>>> rankedcros0_ where rankedcros0_.term_id=?
>>> Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym
>>> as
>>> synonym0_ from unigene.term_synonym synonymset0_ where
>>> synonymset0_.term_id=?
>>>
>>> --
>>> Christian Köberle
>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>>
>>
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>
>
>
> --
> Christian Köberle
> Schönholzerstr. 5
> 10115 Berlin
> Mobil: 0179 79 35 345
>
>
>
> --
> Richard Holland, BSc MBCS
> Finance Director, Eagle Genomics Ltd
> M: +44 7500 438846 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>




More information about the biojava-dev mailing list