[Biojava-l] Bug fix for Biojava in regard to email with subject :( Hibernate Exception and suggestion for change in BioSqlSchema)
Richard Holland
holland at eaglegenomics.com
Thu Mar 25 16:27:17 UTC 2010
Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.
On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
> I am writing again because I did not get any response about whether these bugs have been patched or whether my email was lost somewhere on the mailing list. I don't know how to apply this patch myself, so I am counting on you to apply these patches and reply so that I know it is fixed.
>
>
>
> Thanks
> Deepak Sheoran
>
>
> Hi
> In response to the bug fix suggested by Richard, I have created some patches. We need to apply them to stop BioJava from processing the references of a GenBank record in the wrong manner, which causes more Hibernate exceptions. After applying the patch, the reference resolution code will test the PubMed or MEDLINE ID, then if there is no match test author/title/location, and if there is still no match create a new reference. I tested it with GenBank release 175 and gained almost 3159 more records in my database.
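
A minimal sketch of the three-step lookup described above, for readers following the thread. It is not the patched BioJava code: RefRecord and ThreeStepResolverSketch are hypothetical names, and an in-memory list stands in for the BioSQL reference table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a stored reference; this is not BioJava's DocRef.
final class RefRecord {
    final String externalId; // PubMed or MEDLINE id, null when the record has none
    final String authors;
    final String title;      // may be null
    final String location;

    RefRecord(String externalId, String authors, String title, String location) {
        this.externalId = externalId;
        this.authors = authors;
        this.title = title;
        this.location = location;
    }
}

// An in-memory list plays the role of the database table in this sketch.
final class ThreeStepResolverSketch {
    private final List<RefRecord> stored = new ArrayList<>();

    RefRecord resolve(String externalId, String authors, String title, String location) {
        // Step 1: test the id, when the incoming reference carries one.
        if (externalId != null) {
            for (RefRecord r : stored) {
                if (externalId.equals(r.externalId)) {
                    return r;
                }
            }
        }
        // Step 2: no id match, so test authors/title/location by exact comparison.
        for (RefRecord r : stored) {
            if (Objects.equals(authors, r.authors)
                    && Objects.equals(title, r.title)
                    && Objects.equals(location, r.location)) {
                return r;
            }
        }
        // Step 3: still no match, so create (and store) a new reference.
        RefRecord created = new RefRecord(externalId, authors, title, location);
        stored.add(created);
        return created;
    }

    int size() {
        return stored.size();
    }

    public static void main(String[] args) {
        ThreeStepResolverSketch db = new ThreeStepResolverSketch();
        // Authors and title are shortened placeholders; the two locations are
        // the two spellings quoted later in this thread for PubMed ID 18554304.
        RefRecord first = db.resolve("18554304", "Sato,T.", "A title",
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)");
        RefRecord second = db.resolve("18554304", "Sato,T.", "A title",
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)");
        System.out.println("same record: " + (first == second));
        System.out.println("stored records: " + db.size());
    }
}
```

With this order, two spellings of the same paper that share an ID resolve to one stored record, so the second spelling never triggers a new insert against the unique dbxref_id constraint.
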
>
> Can somebody please have a look at the second issue and fix it:
> "
> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> "
>
> I am also planning to build a bridge between BioSQL databases loaded using BioPerl and BioJava. Here is some of my investigation; can you suggest some direction?
> Have a look at the attached files:
> 1) Biojava_BioPerl_Diff.xls ==> a view of the tables where a GenBank record is stored in a BioSQL instance by BioPerl and by BioJava
> 2) GenbankRecord.doc ==> a Word document with a GenBank record showing where its information goes in BioSQL using BioPerl and BioJava
> 3) BioSqlRichobjectBuilder.patch ==> patch needed for the BioSqlRichObjectBuilder.java class
> 4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class
>
>
> Thanks
> Deepak Sheoran
>
>
>
> -------- Original Message --------
> Subject: Re: Hibernate Exception and suggestion for change in BioSqlSchema
> Date: Tue, 9 Feb 2010 20:34:32 +1300
> From: Richard Holland <holland at eaglegenomics.com>
> To: Deepak Sheoran <sheoran143 at gmail.com>
> CC: biojava-l at biojava.org
>
> Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.
>
> However, in answer to your two questions:
>
> 1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
>
> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
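
One way to narrow down point 2 is to check, before calling save, which locations have lost their feature. The sketch below is only an illustration: Loc and LocationCheckSketch are hypothetical, simplified classes (not BioJava's RichLocation types), and the thread does not establish that the cause is what the example simulates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified location model; these are NOT the BioJava classes.
class Loc {
    final String label;
    final List<Loc> members; // empty for a simple location
    String feature;          // name of the owning feature; null means "not attached"

    Loc(String label, Loc... members) {
        this.label = label;
        this.members = Arrays.asList(members);
    }

    // Attach the feature to this location and to every member it is made up from.
    void setFeature(String feature) {
        this.feature = feature;
        for (Loc m : members) {
            m.setFeature(feature);
        }
    }
}

final class LocationCheckSketch {
    // Collect the labels of all locations, at any depth, whose feature is null.
    static List<String> findDetached(Loc loc) {
        List<String> out = new ArrayList<>();
        collect(loc, out);
        return out;
    }

    private static void collect(Loc loc, List<String> out) {
        if (loc.feature == null) {
            out.add(loc.label);
        }
        for (Loc m : loc.members) {
            collect(m, out);
        }
    }

    public static void main(String[] args) {
        Loc join = new Loc("join(1..10,20..30)", new Loc("1..10"), new Loc("20..30"));
        join.feature = "CDS"; // simulate: only the compound itself is attached
        System.out.println("detached before: " + findDetached(join));
        join.setFeature("CDS"); // propagate to the member locations as well
        System.out.println("detached after: " + findDetached(join));
    }
}
```

Running an equivalent check over every feature of a sequence just before persisting would show whether the null feature is already present in the object graph or only appears during the save.
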
>
> cheers,
> Richard
>
> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
>
> >
> > Hi Richard
> >
> > Below is the email I sent to the Biojava-l mailing list. It was never posted on the mailing list server and I did not get any response, so please have a look at it and tell me what the solution to the problem described in the message could be.
> >
> >
> > Thanks
> > Deepak Sheoran
> > -------- Original Message --------
> > Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> > Date: Wed, 03 Feb 2010 08:07:35 -0600
> > From: Deepak Sheoran <sheoran143 at gmail.com>
> > To: biojava-l at lists.open-bio.org
> >
> > Hi guys,
> >
> > A couple of days ago I had a problem with a Hibernate exception. That exception was resolved; the earlier email is here:
> > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
>
> > Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another Hibernate error. I decided to investigate, and below are the facts I found, which I think will affect all of us.
> > • The "Reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> > This works well except when there is a little variation in the values of the "location", "title" and "authors" columns and all these variations refer to the same PUBMED_ID. In that case we can't persist or create a RichSequence object.
> > Now, when you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up the object in the database. If it finds one it returns that object; otherwise it tries to persist the new object to the database.
> > The problem is with the following part of that method:
> > ….. LineNumber: 114
> >     else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> >         queryType = "DocRef";
> >         // convert List constructor to String representation for query
> >         ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> >         if (ourParamsList.size()<3) {
> >             queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> >         } else {
> >             queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> >         }
> >     }
> > ….. LineNumber: 123
> > Now when Hibernate searches the database, it won't find the existing record in the "reference" table, because the two records differ under string comparison. So it returns a new object to "GenbankFormat", to the following piece of code:
> > ….. LineNumber: 447
> >     else {
> >         try {
> >             CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> >             RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> >             rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> >         } catch (ChangeVetoException e) {
> >             throw new ParseException(e+", accession:"+accession);
> >         }
> >     }
> > ….. LineNumber: 455
> > Then we add that object to rlistener and move on to the next part of the GenBank record. When BioJava then searches the database for a new crossref, it tries to persist the old one and gets a Hibernate exception about a violation of the unique constraint on the "dbxref_id" column.
> >
> > The only ways to get these records into the database are:
> > • The easy solution, and the way I tested my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can be cited under many naming variations, and the change lets us store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
> > • The second solution, which is slightly more difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the creation of a RichSequence object when RichObjectFactory is linked to an active Hibernate session.
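
To give a feel for the second option, here is a minimal sketch of comparing reference text after normalization. RefTextNormalizerSketch is a hypothetical helper, not part of BioJava, and it only handles case and whitespace differences. The strings in main are taken from the sample rows quoted later in this message.

```java
import java.util.Locale;

// Hypothetical helper; not part of BioJava or BioSQL.
final class RefTextNormalizerSketch {
    // Trim, collapse runs of whitespace, and lower-case, so that trivial
    // variants of the same text compare equal.
    static String normalize(String s) {
        if (s == null) {
            return null;
        }
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    static boolean sameText(String a, String b) {
        String na = normalize(a);
        String nb = normalize(b);
        return na == null ? nb == null : na.equals(nb);
    }

    public static void main(String[] args) {
        // Titles of reference_id 415 and 969: they differ only in capitalization.
        System.out.println(sameText(
                "Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences",
                "Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences"));
        // Locations of reference_id 216 and 230: the text itself differs,
        // so normalization alone does not make them equal.
        System.out.println(sameText(
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)",
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)"));
    }
}
```

The second comparison shows the limit of this approach: variations in wording or in author initials are not caught by simple normalization, which is one of the complications mentioned above.
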
> >
> > Example: below is a sample of what I have in my local BioSQL schema, which has the modification I suggested. (The dbxref_id column shows the PubMed ID: I replaced the local dbxref_id present in this table in my database with the pubmed_id stored in the "dbxref" table, for easy reference with the outside world in this email.)
> > Reference_id | Dbxref_id | Location | Title | Authors | crc
> > 216 | 18554304 | FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | 9E940E01F4BE3CD0
> > 230 | 18554304 | FEMS Microbiol. Ecol. 66 (3), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | D3BC0C17F3F786C9
> > 415 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 60AEDFA0CEEACC38
> > 969 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 4B1232999F6E8130
> > 929 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 3E79B40DD2AAA2B7
> > 932 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 094EB3384F8D6DE8
> > 1426 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M. | 357648D8FD8C6C8A
> > 1481 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. | 115411EB2DEE5654
> > 1497 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D5D376EECCD186B
> > 1501 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D57954EECDED66B
> > 1556 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | 698688FB6DB95247
> > 1559 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | E25E1BA99DB18F3D
> >
> > • The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > • This means that in the RichSequence object some feature has a location object whose feature is set to null.
> > • My observations:
> > • It usually occurs when you try to persist a RichSequence object to the database, and it occurs for features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS regions of a GenBank record.
> > • After catching the Hibernate exception I went through all the features, and either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
> > • Below are screenshots from one of my tests.
> > • State before trying to persist the RichSequence object to the database:
> >
> > <Mail Attachment.png>
> >
> > • State after trying to persist the RichSequence object to the database, inside the Hibernate exception catch block:
> >
> > <Mail Attachment.png>
> >
> > • So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
> > • Some extra information to make things clearer:
> > • Below are the LOCUS lines of some GenBank records for which I know the location of the error, that is, the CDS region causing the error and its array index in the richSequence.feature ArrayList object.
> > • LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
> > • richSequence.feature Index : 2540 and line number in the genbank record : 22115
> > • LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
> > • richSequence.feature Index : 127 and line number in the genbank record : 2137
> > • LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
> > • richSequence.feature Index : 389 and line number in the genbank record : 3632
> > • LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
> > • richSequence.feature Index : 47 and line number in the genbank record : 4841
> > • LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
> > • richSequence.feature Index : 45 and line number in the genbank record : 442
> > • The complete exception msg :
> > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> > at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> > at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> > at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> > at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> > at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> > at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> > at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> > at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> > at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> > at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> >
> >
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
>
> <Biojava_BioPerl_diff.xls><BioSqlRichObjectBuilder.patch><GenbankFormat.patch><GenbankRecord.doc>
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/