[Biojava-l] Bug fix for Biojava in regard to email with subject :( Hibernate Exception and suggestion for change in BioSqlSchema)

Deepak Sheoran sheoran143 at gmail.com
Thu Mar 25 01:19:29 UTC 2010


I am sending this email again because I did not get any response on 
whether these bugs have been patched or whether the message got lost 
somewhere on the mailing list. I don't know how to apply the patches 
myself, so I am counting on you to apply them and reply so that I know 
the problem is fixed.



Thanks
Deepak Sheoran


Hi
In response to the bug fix suggested by Richard I have created some 
patches. They need to be applied to stop BioJava from processing the 
references of a GenBank record incorrectly, which causes further 
Hibernate exceptions. After applying the patches, the reference 
resolution code first tests the PubMed or MEDLINE ID; if there is no 
match it tests author/title/location; if there is still no match it 
creates a new reference. I tested it with GenBank release 175 and 
gained almost 3159 more records in my database.
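
The three-step lookup described above can be sketched roughly as follows. This is only an illustration with an in-memory list standing in for the "reference" table; the class ReferenceResolver, its DocRef record and the "PUBMED:..." ID format are hypothetical names made up for the example and are not the code in the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical in-memory stand-in for the "reference" table; not BioJava code.
final class ReferenceResolver {

    // Minimal reference record; field names are illustrative only.
    static final class DocRef {
        final String crossRefId; // e.g. "PUBMED:18554304", may be null
        final String authors;
        final String title;      // may be null
        final String location;

        DocRef(String crossRefId, String authors, String title, String location) {
            this.crossRefId = crossRefId;
            this.authors = authors;
            this.title = title;
            this.location = location;
        }
    }

    private final List<DocRef> stored = new ArrayList<>();

    DocRef resolve(String crossRefId, String authors, String title, String location) {
        // Step 1: match on the PubMed/MEDLINE ID when one is available.
        if (crossRefId != null) {
            for (DocRef d : stored) {
                if (crossRefId.equals(d.crossRefId)) {
                    return d;
                }
            }
        }
        // Step 2: fall back to an exact authors/title/location match.
        for (DocRef d : stored) {
            if (Objects.equals(authors, d.authors)
                    && Objects.equals(title, d.title)
                    && Objects.equals(location, d.location)) {
                return d;
            }
        }
        // Step 3: nothing matched, so create and store a new reference.
        DocRef created = new DocRef(crossRefId, authors, title, location);
        stored.add(created);
        return created;
    }

    int size() {
        return stored.size();
    }

    public static void main(String[] args) {
        ReferenceResolver resolver = new ReferenceResolver();
        // Strings abbreviated from two variants of the same paper.
        DocRef first = resolver.resolve("PUBMED:18554304", "Sato,T.", "Isolation of ...",
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)");
        DocRef second = resolver.resolve("PUBMED:18554304", "Sato,T.", "Isolation of ...",
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)");
        System.out.println("same reference reused: " + (first == second));
        System.out.println("stored references: " + resolver.size());
    }
}
```

Because the ID is tested first, two spellings of the same location end up on one stored reference instead of two rows competing for the same dbxref_id.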

Can somebody please have a look at the second issue Richard raised and fix it:
"

2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).

"

Also, I am planning to build a bridge between BioSQL databases loaded 
with BioPerl and with BioJava. Some of my investigation so far is 
attached; can you suggest a direction for it?
Please have a look at the attached files:
1) Biojava_BioPerl_diff.xls  ==> a view of the tables in which a GenBank 
record is stored in a BioSQL instance by BioPerl and by BioJava
2) GenbankRecord.doc  ==> a Word document with a GenBank record, showing 
where its information goes in BioSQL when loaded with BioPerl and with BioJava
3) BioSqlRichObjectBuilder.patch ==> patch for the 
BioSqlRichObjectBuilder.java class
4) GenbankFormat.patch ==> patch for the GenbankFormat.java class


Thanks
Deepak Sheoran



-------- Original Message --------
Subject: 	Re: Hibernate Exception and suggestion for change in BioSqlSchema
Date: 	Tue, 9 Feb 2010 20:34:32 +1300
From: 	Richard Holland <holland at eaglegenomics.com>
To: 	Deepak Sheoran <sheoran143 at gmail.com>
CC: 	biojava-l at biojava.org



Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.

However, in answer to your two questions:

   1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).

   2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
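
As a starting point for that investigation, a check along these lines could be run on a sequence just before saving it, to report which locations have lost their feature. This is only a sketch on a simplified, hypothetical object model (the Feature and Location classes here are made up for the example and are much smaller than the real BioJava classes); it shows the idea of walking a compound location and its members, not the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified model; the real BioJava classes are richer.
final class LocationFeatureCheck {

    static final class Feature {
        final String name;
        Location location;

        Feature(String name) {
            this.name = name;
        }
    }

    // A location with no members is "simple"; one with members is "compound".
    static final class Location {
        Feature feature; // back-reference that must not be null when saved
        final List<Location> members = new ArrayList<>();
    }

    // Counts this location and all nested members whose feature is null.
    static int countMissingFeature(Location loc) {
        int missing = (loc.feature == null) ? 1 : 0;
        for (Location member : loc.members) {
            missing += countMissingFeature(member);
        }
        return missing;
    }

    // Attaches a location to a feature and sets the back-reference
    // on the location and on every nested member.
    static void attach(Feature feature, Location loc) {
        feature.location = loc;
        propagate(feature, loc);
    }

    private static void propagate(Feature feature, Location loc) {
        loc.feature = feature;
        for (Location member : loc.members) {
            propagate(feature, member);
        }
    }

    public static void main(String[] args) {
        Feature cds = new Feature("CDS");
        Location join = new Location();
        join.members.add(new Location());
        join.members.add(new Location());
        join.feature = cds; // only the compound location knows its feature

        System.out.println("missing before: " + countMissingFeature(join));
        attach(cds, join);
        System.out.println("missing after: " + countMissingFeature(join));
    }
}
```

If such a check reports missing features only after the save attempt and not before, that would point at the persistence step rather than the parser as the place where the reference is lost.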

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
>  Hi Richard
>
>  Below is the email which I sent to the biojava-l mailing list. It was never posted on the mailing list server and I did not get any response, so please have a look at it and tell me what the solution to the problem described in the message could be.
>
>
>  Thanks
>  Deepak Sheoran
>  -------- Original Message --------
>  Subject:	Hibernate Exception and suggestion for change in BioSqlSchema
>  Date:	Wed, 03 Feb 2010 08:07:35 -0600
>  From:	Deepak Sheoran<sheoran143 at gmail.com>
>  To:	biojava-l at lists.open-bio.org
>
>  Hi guys,
>
>  A couple of days ago I had a problem with a Hibernate exception, which has since been resolved. The earlier thread is at: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
>  Following Richard's suggestion in that thread I was able to resolve some of the issues, but then I got stuck on another Hibernate error and decided to investigate. Below are the facts I found; I think they are going to affect all of us.
>  	• The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
>  This works well, except when there are small variations in the values of the "location", "title" and "authors" columns and all the variants refer to the same PUBMED_ID. In that case we cannot create or persist the RichSequence object.
>  When you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method "buildObject(Class clazz, List paramsList)" which is responsible for looking up the details of an object in the database. If it finds one it returns that object; otherwise it tries to persist the new object to the database.
>  The problem is in the following part of that method:
>  ….. LineNumber: 114
>  else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
>      queryType = "DocRef";
>      // convert List constructor to String representation for query
>      ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
>      if (ourParamsList.size()<3) {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>      } else {
>          queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>      }
>  }
>  ….. LineNumber: 123
>  When Hibernate searches the database it does not find the existing record in the "reference" table, because the two records differ in a string comparison, so a new object is returned to the following piece of code in "GenbankFormat":
>  ….. LineNumber: 447
>  else {
>      try {
>          CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
>          RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>          rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>      } catch (ChangeVetoException e) {
>          throw new ParseException(e+", accession:"+accession);
>      }
>  }
>  ….. LineNumber: 455
>  That object is then added to rlistener, and parsing moves on to the next part of the GenBank record. When BioJava then searches the database for a new CrossRef, it tries to persist the earlier object and gets a Hibernate exception for violating the unique constraint on the "dbxref_id" column.
>
>  The only ways to get these records into the database are:
>  		• The easy solution, and the one I used to test my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This makes sense because one paper can be cited with many naming variations, and the change lets us store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
>  		• The second solution is slightly more difficult to implement: change the way "BioSqlRichObjectBuilder.buildObject(Class clazz, List paramsList)" decides whether a particular DocRef already exists in the database. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the creation of a RichSequence object when RichObjectFactory is linked to an active Hibernate session.
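
One cheap form of the second solution would be to compare normalized strings instead of raw ones. The sketch below is only an assumption about how that could look: the DocRefKey class and its methods are hypothetical, and it handles only differences in letter case and whitespace. It would not catch dropped author initials or an extra issue label in the location, which is part of why this approach gets complicated.

```java
import java.util.Locale;

// Hypothetical helper, not part of BioJava: builds a comparison key that
// ignores letter case and runs of whitespace.
final class DocRefKey {

    // Trims, collapses whitespace runs to one space, and lower-cases.
    static String normalize(String s) {
        if (s == null) {
            return "";
        }
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Joins the three normalized fields into a single lookup key.
    static String key(String authors, String title, String location) {
        return normalize(authors) + "|" + normalize(title) + "|" + normalize(location);
    }

    public static void main(String[] args) {
        // The two title spellings recorded for PubMed ID 18060065.
        String t1 = "Analysis of the Neurotoxin Complex Genes in Clostridium botulinum "
                + "A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids";
        String t2 = "Analysis of the neurotoxin complex genes in Clostridium botulinum "
                + "A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids";
        System.out.println("titles match after normalizing: "
                + normalize(t1).equals(normalize(t2)));
    }
}
```

A key like this could be stored or computed at lookup time, but matching on the PubMed or MEDLINE ID first remains the more reliable test whenever an ID is present.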
>
>  Example: below is a sample of what I have in my local BioSQL schema, which includes the modification I suggested. (For easy reference from outside, in this email the Dbxref_id column shows the PubMed ID stored in the "dbxref" table instead of the local dbxref_id present in my database.)
>  Reference_id: 216
>  Dbxref_id:    18554304
>  Location:     FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
>  Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>  Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>  crc:          9E940E01F4BE3CD0
>
>  Reference_id: 230
>  Dbxref_id:    18554304
>  Location:     FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
>  Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
>  Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
>  crc:          D3BC0C17F3F786C9
>
>  Reference_id: 415
>  Dbxref_id:    16790744
>  Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
>  Title:        Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
>  Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>  crc:          60AEDFA0CEEACC38
>
>  Reference_id: 969
>  Dbxref_id:    16790744
>  Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
>  Title:        Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
>  Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
>  crc:          4B1232999F6E8130
>
>  Reference_id: 929
>  Dbxref_id:    8688087
>  Location:     Science 273 (5278), 1058-1073 (1996)
>  Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>  Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>  crc:          3E79B40DD2AAA2B7
>
>  Reference_id: 932
>  Dbxref_id:    8688087
>  Location:     Science 273 (5278), 1058-1073 (1996)
>  Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
>  Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
>  crc:          094EB3384F8D6DE8
>
>  Reference_id: 1426
>  Dbxref_id:    10684935
>  Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>  Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>  Authors:      Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
>  crc:          357648D8FD8C6C8A
>
>  Reference_id: 1481
>  Dbxref_id:    10684935
>  Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
>  Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
>  Authors:      Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
>  crc:          115411EB2DEE5654
>
>  Reference_id: 1497
>  Dbxref_id:    14689165
>  Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
>  Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>  Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>  crc:          4D5D376EECCD186B
>
>  Reference_id: 1501
>  Dbxref_id:    14689165
>  Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
>  Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
>  Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
>  crc:          4D57954EECDED66B
>
>  Reference_id: 1556
>  Dbxref_id:    18060065
>  Location:     PLoS ONE 2 (12), E1271 (2007)
>  Title:        Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
>  Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>  crc:          698688FB6DB95247
>
>  Reference_id: 1559
>  Dbxref_id:    18060065
>  Location:     PLoS ONE 2 (12), E1271 (2007)
>  Title:        Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
>  Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
>  crc:          E25E1BA99DB18F3D
>
>  	• The second kind of error I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>  		• This means that in the RichSequence object some feature has a location object whose feature is set to null.
>  		• My observations:
>  			• It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS region of a GenBank record.
>  			• After catching the Hibernate exception I went through all the features and found that either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set its feature variable to null.
>  			• Below are screenshots from one of my tests.
>  				• State before trying to persist the RichSequence object to the database:
>
>  <Mail Attachment.png>
>
>  				• State after trying to persist the RichSequence object to the database, inside the Hibernate exception catch block:
>
>  <Mail Attachment.png>
>
>  		• So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
>  		• Some extra information to make things clearer:
>  			• Below are some LOCUS lines from GenBank records for which I know where the location error is, that is, the CDS region causing the error, together with its array index in the richSequence.feature ArrayList.
>  				• LOCUS       AE001439             1643831 bp    DNA     circular BCT 19-JAN-2006
>  					• richSequence.feature Index : 2540 and line number in the genbank record : 22115
>  				• LOCUS       CP001189             3887492 bp    DNA     circular BCT 16-OCT-2008
>  					• richSequence.feature Index : 127 and line number in the genbank record : 2137
>  				• LOCUS       CP001292              328635 bp    DNA     circular BCT 17-DEC-2008
>  					• richSequence.feature Index : 389 and line number in the genbank record : 3632
>  				• LOCUS       AM279694              238517 bp    DNA     linear   BCT 23-OCT-2008
>  					• richSequence.feature Index : 47 and line number in the genbank record : 4841
>  				• LOCUS       CR931663               18517 bp    DNA     linear   BCT 18-SEP-2008
>  					• richSequence.feature Index : 45 and line number in the genbank record : 442
>  		• The complete exception msg :
>  org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>          at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>          at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>          at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>          at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>          at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>          at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>          at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>          at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>          at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>          at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>          at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>          at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>          at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>          at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>          at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>          at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>          at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com
http://www.eaglegenomics.com/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Biojava_BioPerl_diff.xls
Type: application/vnd.ms-excel
Size: 346624 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0001.xls>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BioSqlRichObjectBuilder.patch
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0002.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankFormat.patch
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0003.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GenbankRecord.doc
Type: application/msword
Size: 59392 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20100324/7ecffa4a/attachment-0001.doc>

