[Biojava-dev] [Biojava-l] [Fwd: large genbank data]

Richard Holland dicknetherlands at gmail.com
Fri Jul 18 08:47:08 UTC 2008


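In order to persist to BioSQL, BioJava has to convert the symbol list
into a string so that it can pass it to JDBC via Hibernate. Therefore
the maximum length of a sequence you can persist to BioSQL is bounded
by the maximum length of a string in Java. That limit is
Integer.MAX_VALUE (2^31 - 1) characters, so a sequence of 5528445
bases is within it as long as the heap can hold the whole sequence as
a single String. The 65535 (2^16 - 1) byte limit on UTF-8 encoded
strings applies only to string constants in class files and to
DataOutputStream.writeUTF, not to strings built at run time.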
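
For reference, here is a minimal sketch (standard library only, not
BioJava code) for checking whether string length alone could explain a
failure on a 5528445-base sequence. The class and method names are made
up for illustration. It builds a String of that length and, for
comparison, probes DataOutputStream.writeUTF, which is one place where
a 65535-byte limit on UTF-8 encoded strings does apply.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class StringLimitCheck {

    // Build a String of n copies of one character, standing in for the
    // string form of a sequence.
    static String repeat(char base, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, base);
        return new String(chars);
    }

    // True if DataOutputStream.writeUTF accepts the string, i.e. its
    // modified UTF-8 encoding is at most 65535 bytes long.
    static boolean fitsInWriteUTF(String s) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 5528445 is the length reported for AE005174 in the log below.
        String seq = repeat('a', 5528445);
        System.out.println("String length: " + seq.length());
        System.out.println("65535 chars fit writeUTF: "
                + fitsInWriteUTF(repeat('a', 65535)));
        System.out.println("65536 chars fit writeUTF: "
                + fitsInWriteUTF(repeat('a', 65536)));
    }
}
```

If the first line prints the full length, the String itself is not the
bottleneck, and the PropertyAccessException quoted below needs its
nested cause inspected instead.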

2008/7/18 Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>:
> Hi Mark,
>
> What is the maximum sequence length that a RichSequence can handle?
>
> java -Xms1024m -Xmx1256m -jar loader.jar
> .
> 16:09:00,173  INFO Loader:296 - D:\AE005174.gbk is readable.
> 16:09:06,704  INFO Loader:326 - Loading sequence AE005174 with identifier
> 56384585, length 5528445 and alphabet DNA...
> org.hibernate.PropertyAccessException: Exception occurred inside getter of
> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
>
> Rey Vincent Babilonia wrote:
>>
>> Hi Mark,
>>
>> At first it throws an out of memory exception. My workaround is to
>> subdivide the sequence file into individual GenBank files.
>>
>> The error now is that if a GenBank sequence has an 'empty alphabet', it
>> does not get loaded to BioSQL. My workaround is to check if
>> sequence.getAlphabet().getName() is DNA.
>>
>> Thanks.
>>
>> Mark Schreiber wrote:
>>>
>>> Hi -
>>>
>>> Is the code throwing an exception or running out of memory?
>>>
>>> Can you send an example program and the problem you encounter to the
>>> list?
>>> - Mark
>>>
>>> On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia
>>> <rvincent at asti.dost.gov.ph> wrote:
>>>>
>>>> -------- Original Message --------
>>>> Subject: large genbank data
>>>> Date: Wed, 28 May 2008 18:02:48 +0800
>>>> From: Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>
>>>> To: biojava-l at biojava.org
>>>>
>>>> Hi,
>>>>
>>>> Has anybody tried uploading a large GenBank file (e.g.
>>>> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to BioSQL?
>>>> load_seqdatabase.pl from BioPerl can do this. I'm switching to BioJava
>>>> and it can't read the sequence (maybe because it has 30000+ sequences).
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> /**
>>>>  * @author   Rey Vincent P. Babilonia
>>>>  * @number   +63 2 426 9760 local 1302
>>>>  * @pgp      0x383454CF <at> pgp.mit.edu
>>>>  * @project  Philippine Bioinformatics Solutions
>>>>  * @program  Philippine e-Science Grid
>>>>  * @division Research and Development Division
>>>>  * @agency   Advanced Science and Technology Institute
>>>>  * @url      http://www.psigrid.gov.ph
>>>>  */
>>>>
>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>
>>
>


