[Biojava-l] [Biojava-dev] [Fwd: large genbank data]

Rey Vincent Babilonia rvincent at asti.dost.gov.ph
Mon Jul 21 02:35:04 UTC 2008


Dear all,

Here's the complete stack trace:

10:26:14,796  INFO Loader:296 - D:\AE000521.gbk is readable.
10:26:16,046  INFO Loader:340 - Alphabet of AE000521 is Empty Alphabet. Skipping...
10:26:16,250  INFO Loader:296 - D:\AE004438.gbk is readable.
10:26:20,750 FATAL Loader:334 - Sequence AE004438 already exists.
10:26:20,921  INFO Loader:296 - D:\AE005174.gbk is readable.
10:26:28,328  INFO Loader:326 - Loading sequence AE005174 with identifier 56384585, length 5528445 and alphabet DNA...
org.hibernate.PropertyAccessException: Exception occurred inside getter of org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
         at org.hibernate.property.BasicPropertyAccessor$BasicGetter.get(BasicPropertyAccessor.java:148)
         at org.hibernate.tuple.entity.AbstractEntityTuplizer.getPropertyValues(AbstractEntityTuplizer.java:256)
         at org.hibernate.tuple.entity.PojoEntityTuplizer.getPropertyValues(PojoEntityTuplizer.java:209)
         at org.hibernate.persister.entity.AbstractEntityPersister.getPropertyValues(AbstractEntityPersister.java:3581)
         at org.hibernate.event.def.DefaultMergeEventListener.copyValues(DefaultMergeEventListener.java:377)
         at org.hibernate.event.def.DefaultMergeEventListener.entityIsTransient(DefaultMergeEventListener.java:179)
         at org.hibernate.event.def.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:123)
         at org.hibernate.event.def.DefaultMergeEventListener.onMerge(DefaultMergeEventListener.java:53)
         at org.hibernate.impl.SessionImpl.fireMerge(SessionImpl.java:677)
         at org.hibernate.impl.SessionImpl.merge(SessionImpl.java:661)
         at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:328)
         at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
         at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Caused by: java.lang.reflect.InvocationTargetException
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
         at java.lang.reflect.Method.invoke(Unknown Source)
         at org.hibernate.property.BasicPropertyAccessor$BasicGetter.get(BasicPropertyAccessor.java:145)
         ... 12 more
Caused by: java.lang.NullPointerException
         at org.biojavax.bio.seq.SimpleRichSequence.length(SimpleRichSequence.java:91)
         at org.biojavax.bio.seq.SimpleRichSequence.getSequenceLength(SimpleRichSequence.java:97)
         ... 17 more
10:26:28,937 ERROR AbstractBatcher:51 - Exception executing batch:
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
         at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
         at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
         at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
         at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
         at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
         at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
         at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
         at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
         at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
         at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
         at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
         at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
         at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
10:26:28,937 ERROR AbstractFlushingEventListener:301 - Could not synchronize database state with session
org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
         at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
         at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
         at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
         at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
         at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
         at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
         at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
         at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
         at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
         at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
         at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
         at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
         at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
Exception in thread "main" org.hibernate.StaleStateException: Batch update returned unexpected row count from update [0]; actual row count: 0; expected: 1
         at org.hibernate.jdbc.Expectations$BasicExpectation.checkBatched(Expectations.java:61)
         at org.hibernate.jdbc.Expectations$BasicExpectation.verifyOutcome(Expectations.java:46)
         at org.hibernate.jdbc.BatchingBatcher.checkRowCounts(BatchingBatcher.java:68)
         at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
         at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:246)
         at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:266)
         at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:168)
         at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
         at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
         at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
         at ph.gov.dost.asti.genbankers.Loader.load(Loader.java:351)
         at ph.gov.dost.asti.genbankers.Loader.<init>(Loader.java:137)
         at ph.gov.dost.asti.genbankers.Loader.main(Loader.java:416)
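
A note on reading the trace above: the PropertyAccessException wraps an InvocationTargetException, which in turn wraps the NullPointerException thrown in SimpleRichSequence.length(), so the innermost "Caused by" is the one to look at. Below is a minimal sketch of how a loader could log that innermost cause directly. It uses only the JDK; the class and method names (RootCauseDemo, rootCause) are made up for illustration and are not part of BioJava, Hibernate, or my Loader, and the exceptions built in main are stand-ins for the real ones.

```java
import java.lang.reflect.InvocationTargetException;

final class RootCauseDemo {

    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain, or if a cause points back at itself.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the three-level chain in the trace:
        // wrapper exception -> InvocationTargetException -> NullPointerException
        Throwable npe = new NullPointerException("stand-in for the NPE in length()");
        Throwable reflective = new InvocationTargetException(npe);
        Throwable wrapper = new RuntimeException("Exception occurred inside getter", reflective);

        Throwable root = rootCause(wrapper);
        System.out.println("Root cause: " + root.getClass().getName());
    }
}
```

In a real loader the call would sit in the catch block around session.merge(...), so the log line names the root exception instead of only the outer wrapper.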

Richard Holland wrote:
> Hmm in that case it must be something else.
> 
> Your original mail only posted the first couple of lines of the stack
> trace. Could you post the whole thing so we can take a closer look?
> 
> 2008/7/18 Mark Schreiber <markjschreiber at gmail.com>:
>> Was looking on the internet ...
>>
>> So the Java spec says nothing about an upper limit however the sun JDK
>> implements String as a char[] (behind the scenes). Therefore I think
>> that on the Sun JDK with the right amount of RAM you could go to 2^32
>> (except for string literals as mentioned above) which is 4,294,967,296
>> characters. So a string of a sequence should be able to get to about 4
>> billion bases.
>>
>> Of course if you don't assign enough memory to the JVM ( -Xmx4G) you
>> won't be able to get close. Of course even if you can assign that much
>> that doesn't account for all the other Java overhead and all the stuff
>> Hibernate is doing with proxy classes etc.  Also BioSQL usually
>> defines sequence as a CLOB so depending on your DB implementation
>> there may be a limit on that. On a 32 bit machine 4GB is all you can
>> get per CPU so you would have issues trying to do anything bigger.
>>
>> Anyhow I know I have stored human chromosome 1 (approx 1 billion bases
>> in memory).
>>
>>
>>
>> - Mark
>>
>> On Fri, Jul 18, 2008 at 6:45 PM, James Carman
>> <james at carmanconsulting.com> wrote:
>>> That is a limitation for string literals, not any string.  Correct?
>>>
>>> On Fri, Jul 18, 2008 at 4:47 AM, Richard Holland
>>> <dicknetherlands at gmail.com> wrote:
>>>> In order to persist to BioSQL, BioJava has to convert the symbol list
>>>> into a string so that it can pass it to JDBC via Hibernate. Therefore
>>>> the maximum length of a sequence you wish to persist to BioSQL is the
>>>> maximum length of a string in Java, which is 65536 (2^16) if you are
>>>> working in a UTF-8 environment.
>>>>
>>>> 2008/7/18 Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>:
>>>>> Hi Mark,
>>>>>
>>>>> What is the maximum sequence length that a RichSequence can handle?
>>>>>
>>>>> java -Xms1024m -Xmx1256m -jar loader.jar
>>>>> .
>>>>> 16:09:00,173  INFO Loader:296 - D:\AE005174.gbk is readable.
>>>>> 16:09:06,704  INFO Loader:326 - Loading sequence AE005174 with identifier
>>>>> 56384585, length 5528445 and alphabet DNA...
>>>>> org.hibernate.PropertyAccessException: Exception occurred inside getter of
>>>>> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
>>>>>
>>>>> Rey Vincent Babilonia wrote:
>>>>>> Hi Mark,
>>>>>>
>>>>>> At first it throws an out of memory exception. My workaround is to
>>>>>> subdivide the sequence file into individual GenBank files.
>>>>>>
>>>>>> The error now is that if a GenBank sequence has an 'empty alphabet', it
>>>>>> does not get loaded to BioSQL. My workaround is to check if
>>>>>> sequence.getAlphabet().getName() is DNA.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Mark Schreiber wrote:
>>>>>>> Hi -
>>>>>>>
>>>>>>> Is the code throwing an exception or running out of memory??
>>>>>>>
>>>>>>> Can you send an example program and the problem you encounter to the
>>>>>>> list.
>>>>>>> - Mark
>>>>>>>
>>>>>>> On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia
>>>>>>> <rvincent at asti.dost.gov.ph> wrote:
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: large genbank data
>>>>>>>> Date: Wed, 28 May 2008 18:02:48 +0800
>>>>>>>> From: Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>
>>>>>>>> To: biojava-l at biojava.org
>>>>>>>>
>>>>>>>> hi,
>>>>>>>>
>>>>>>>> anybody tried uploading a large genbank data (e.g.
>>>>>>>> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to biosql?
>>>>>>>> load_seqdatabase.pl of bioperl can do this. i'm switching to biojava and
>>>>>>>> it can't read the sequence (maybe because it has 30000+ sequences).
>>>>>>>>
>>>>>>>> thanks.
>>>>>>>>
>>>>>>>> --
>>>>>>>> /**
>>>>>>>>  * @author   Rey Vincent P. Babilonia
>>>>>>>>  * @number   +63 2 426 9760 local 1302
>>>>>>>>  * @pgp      0x383454CF <at> pgp.mit.edu
>>>>>>>>  * @project  Philippine Bioinformatics Solutions
>>>>>>>>  * @program  Philippine e-Science Grid
>>>>>>>>  * @division Research and Development Division
>>>>>>>>  * @agency   Advanced Science and Technology Institute
>>>>>>>>  * @url      http://www.psigrid.gov.ph
>>>>>>>>  */
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> biojava-dev mailing list
>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>> _______________________________________________
>>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
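
On the string-length question in the quoted thread: as far as I can tell, the 65536 figure corresponds to the class-file limit on string constants (65535 bytes), not to strings built at run time. String.length() returns an int and Java arrays are indexed by int, so the ceiling for a run-time string is at most Integer.MAX_VALUE (2^31 - 1) rather than 2^32, and in practice heap size decides. A minimal sketch using only the JDK follows; the class and method names (StringLengthDemo, repeatBase) are hypothetical, and it only shows that a string of the length reported for AE005174 can be built, not that this is how BioJava builds it.

```java
import java.util.Arrays;

final class StringLengthDemo {

    /** Builds a string of n 'a' characters, standing in for a DNA sequence. */
    static String repeatBase(int n) {
        char[] bases = new char[n];
        Arrays.fill(bases, 'a');
        return new String(bases);
    }

    public static void main(String[] args) {
        // 5528445 is the length the loader reported for AE005174.
        String sequence = repeatBase(5_528_445);
        System.out.println("Built a string of length " + sequence.length());

        // Upper bound on any array or String length in Java.
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
    }
}
```

If this runs without an error, a sequence of this size is well within what a run-time String can hold, which is consistent with the failure above being the NullPointerException and not a length limit.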

-- 
/**
  * @author   Rey Vincent P. Babilonia
  * @number   +63 2 426 9760 local 1302
  * @pgp      0x383454CF <at> pgp.mit.edu
  * @project  Philippine Bioinformatics Solutions
  * @program  Philippine e-Science Grid
  * @division Research and Development Division
  * @agency   Advanced Science and Technology Institute
  * @url      http://www.psigrid.gov.ph
  */



