[Biojava-dev] BioSQL postgre BioEntryRelationship

Richard Holland dicknetherlands at gmail.com
Thu Oct 23 09:45:53 UTC 2008


Christian,

Thanks for your comments.

I'm not sure which file you're referring to, or what version of BioJava you
have, as the line you quote does not appear in any of the current hbm.xml
files in the trunk of SubVersion.

Also, the BioEntryRelationship interface and it's implementations do already
have getSubject() and getObject() methods which return the parent and child
BioEntry instances.

The BioEntry interface itself has a getBioEntryRelationships() method which
returns all relationships in which it is the object BioEntry. You could use
HQL to obtain those for which it is the subject, but you are right that it
would be good to have a method that returns the latter. Could you raise a
BugZilla request for this?

It would be good if you could do some thorough testing of your lazy loading
suggestions on some other use cases before we decide whether or not to adopt
that approach in future developments. Use cases would include:

1. have a very large database with thousands of related records in it (e.g.
load the whole of GenBank). Iterate over all the records in the database and
perform a simple read operation on each that hits the modified methods. See
if you run out of memory.

2. like 1, but perform a series of repeated read/write operations using the
modified methods, with a final commit to attempt to write the results back
to see if they still persist correctly.

The reason is that the modified methods might cause problems with those
people who are processing large volumes of data in their databases. If all
related records are loaded at once, even only on demand, instead of one at a
time, it will cause memory issues. The trade off is therefore memory vs.
speed. We opted for the memory option because it makes life easier for most
novice coders to not have to trace out-of-memory exceptions (although they
will still occur using the existing methods, but it happens less often).

Also, your method reruns the query every time it is called. It probably
should cache the results after the first call, to prevent objects being
reloaded unnecessarily, and to prevent problems with objects from a previous
call being modified then attempted to be overwritten by a subsequent call?
Also if Hibernate does not receive the same set back that it auto-loaded as
a property via the default get() method when it comes to save the object, it
will throw a wobbly and refuse to commit.

cheers,
Richard



2008/10/23 Christian Köberle <ch.koeberle at googlemail.com>

> Hi,
> I found a bug in the postgre mapping file for BioEntryRelationship.
> line:
> <many-to-one name="object" class="Feature" column="object_bioentry_id"
> not-null="true" cascade="persist,merge,save-update" node="@objectFeatureId"
> embed-xml="false"/>
> The value for the attribute class has to be "BioEntry"
>
> For the BioEntry I miss methodes to have access to subject_bioentry
> BioEntryRelationship. I think the BioEntryRelationship. is a parent child
> relationship. So it will be nice to have access to both.
>
> Furthermore the hibernate mapping strategies for the BioSQL is quite slow
> and produces a lot of queries to the database. Because for all lists and
> set
> the lazy fetch mode is disable. In this mode hibernate will execute one
> query for each element in a list or set. The faster way is to enable the
> lazy fetch mode an use methods to load the list. Each of these methods
> executes only one query.
> For excample:
>
> public List<BioEntry> getParents(BioEntry bioEntry){
>
> String stmt = "SLECT r.object FROM BioEntryEelationship r WHERE r.object
> =:subject";
> Query query = session.createQuery(stmt);
> query.setParameter("subject", bioEntry);
> return query.list();
>
> }
>
>
> This is factor 2 to 4 faster than the methode BioEntry..getRelationships()
> In case of all dependences of an BioEntry-Object an select with lazy
> fetching can be 500 times faster than a select with eager fetching (in case
> of unigene cluster Hs.4 for example).
> Here a example for the relationship unigene cluster Hs.2 and the gene
> BC067218 (we use BioSQL to store Unigene)
>
> getParents():
> runtime: 14 msec
> SQL: Hibernate: select bioentry1_.bioentry_id as bioentry1_89_,
> bioentry1_.name as name89_, bioentry1_.identifier as identifier89_,
> bioentry1_.accession as accession89_, bioentry1_.description as
> descript5_89_, bioentry1_.version as version89_, bioentry1_.division as
> division89_, bioentry1_.taxon_id as taxon8_89_, bioentry1_.biodatabase_id
> as
> biodatab9_89_, bioentry1_1_.version as version93_, bioentry1_1_.length as
> length93_, bioentry1_1_.alphabet as alphabet93_, bioentry1_1_.seq as
> seq93_,
> case when bioentry1_1_.bioentry_id is not null then 2 when
> bioentry1_.bioentry_id is not null then 0 end as clazz_ from
> unigene.bioentry_relationship bioentryre0_ inner join unigene.bioentry
> bioentry1_ on bioentryre0_.subject_bioentry_id=bioentry1_.bioentry_id left
> outer join unigene.biosequence bioentry1_1_ on
> bioentry1_.bioentry_id=bioentry1_1_.bioentry_id left outer
> joinunigene.biosequence bioentry1_2_ on
> bioentry1_.bioentry_id=bioentry1_2_.bioentry_id where
> bioentryre0_.object_bioentry_id=?
>
>
> bioEntry.getRelationships():
> runtime: 36 msec
> SQL:Hibernate: select bioentry0_.bioentry_id as bioentry1_89_,
> bioentry0_.name as name89_, bioentry0_.identifier as identifier89_,
> bioentry0_.accession as accession89_, bioentry0_.description as
> descript5_89_, bioentry0_.version as version89_, bioentry0_.division as
> division89_, bioentry0_.taxon_id as taxon8_89_, bioentry0_.biodatabase_id
> as
> biodatab9_89_, bioentry0_1_.version as version93_, bioentry0_1_.length as
> length93_, bioentry0_1_.alphabet as alphabet93_, bioentry0_1_.seq as
> seq93_,
> case when bioentry0_1_.bioentry_id is not null then 2 when
> bioentry0_.bioentry_id is not null then 0 end as clazz_ from
> unigene.bioentry bioentry0_ left outer join unigene.biosequence
> bioentry0_1_
> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
> unigene.biosequence bioentry0_2_ on
> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where bioentry0_.name=?
> Hibernate: select relationsh0_.object_bioentry_id as object3_1_,
> relationsh0_.bioentry_relationship_id as bioentry1_1_,
> relationsh0_.bioentry_relationship_id as bioentry1_95_0_,
> relationsh0_.term_id as term2_95_0_, relationsh0_.object_bioentry_id as
> object3_95_0_, relationsh0_.subject_bioentry_id as subject4_95_0_,
> relationsh0_.rank as rank95_0_ from unigene.bioentry_relationship
> relationsh0_ where relationsh0_.object_bioentry_id=?
> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
> namespace0_.description as descript4_80_0_ from unigene.biodatabase
> namespace0_ where namespace0_.biodatabase_id=?
> Hibernate: select bioentry0_.bioentry_id as bioentry1_89_0_,
> bioentry0_.name
> as name89_0_, bioentry0_.identifier as identifier89_0_,
> bioentry0_.accession
> as accession89_0_, bioentry0_.description as descript5_89_0_,
> bioentry0_.version as version89_0_, bioentry0_.division as division89_0_,
> bioentry0_.taxon_id as taxon8_89_0_, bioentry0_.biodatabase_id as
> biodatab9_89_0_, bioentry0_1_.version as version93_0_, bioentry0_1_.length
> as length93_0_, bioentry0_1_.alphabet as alphabet93_0_, bioentry0_1_.seq as
> seq93_0_, case when bioentry0_1_.bioentry_id is not null then 2 when
> bioentry0_.bioentry_id is not null then 0 end as clazz_0_ from
> unigene.bioentry bioentry0_ left outer join unigene.biosequence
> bioentry0_1_
> on bioentry0_.bioentry_id=bioentry0_1_.bioentry_id left outer join
> unigene.biosequence bioentry0_2_ on
> bioentry0_.bioentry_id=bioentry0_2_.bioentry_id where
> bioentry0_.bioentry_id=?
> Hibernate: select namespace0_.biodatabase_id as biodatab1_80_0_,
> namespace0_.name as name80_0_, namespace0_.authority as authority80_0_,
> namespace0_.description as descript4_80_0_ from unigene.biodatabase
> namespace0_ where namespace0_.biodatabase_id=?
> Hibernate: select term0_.term_id as term1_84_0_, term0_.name as name84_0_,
> term0_.identifier as identifier84_0_, term0_.definition as definition84_0_,
> term0_.is_obsolete as is5_84_0_, term0_.ontology_id as ontology6_84_0_ from
> unigene.term term0_ where term0_.term_id=?
> Hibernate: select ontology0_.ontology_id as ontology1_83_0_,
> ontology0_.name
> as name83_0_, ontology0_.definition as definition83_0_ from
> unigene.ontology
> ontology0_ where ontology0_.ontology_id=?
> Hibernate: select termset0_.ontology_id as ontology6_1_, termset0_.term_id
> as term1_1_, termset0_.term_id as term1_84_0_, termset0_.name as name84_0_,
> termset0_.identifier as identifier84_0_, termset0_.definition as
> definition84_0_, termset0_.is_obsolete as is5_84_0_, termset0_.ontology_id
> as ontology6_84_0_ from unigene.term termset0_ where
> termset0_.ontology_id=?
> Hibernate: select tripleset0_.ontology_id as ontology5_1_,
> tripleset0_.term_relationship_id as term1_1_,
> tripleset0_.term_relationship_id as term1_87_0_,
> tripleset0_.subject_term_id
> as subject2_87_0_, tripleset0_.object_term_id as object3_87_0_,
> tripleset0_.predicate_term_id as predicate4_87_0_, tripleset0_.ontology_id
> as ontology5_87_0_ from unigene.term_relationship tripleset0_ where
> tripleset0_.ontology_id=?
> Hibernate: select rankedcros0_.term_id as term1_0_, rankedcros0_.dbxref_id
> as dbxref2_0_, rankedcros0_.rank as rank0_ from unigene.term_dbxref
> rankedcros0_ where rankedcros0_.term_id=?
> Hibernate: select synonymset0_.term_id as term1_0_, synonymset0_.synonym as
> synonym0_ from unigene.term_synonym synonymset0_ where
> synonymset0_.term_id=?
>
> --
> Christian Köberle
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/




More information about the biojava-dev mailing list