[Biojava-l] [Biojava-dev] [Fwd: large genbank data]

James Carman james at carmanconsulting.com
Fri Jul 18 10:45:50 UTC 2008


That is a limitation for string literals, not any string.  Correct?
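
A quick way to check, as a sketch that assumes nothing beyond the JDK: the 65535-byte cap comes from the class file format's constant pool entries, so it applies to string literals compiled into a class. A String built at runtime is bounded by its int-indexed length and the available heap. The class name LongStringCheck and the helper buildSequence are only illustrative; the length is the one reported in the log quoted below.

```java
public class LongStringCheck {

    // Builds a DNA-like string of the requested length at runtime,
    // cycling through the four bases. No string literal here is long.
    static String buildSequence(int length) {
        StringBuilder sb = new StringBuilder(length);
        String bases = "acgt";
        for (int i = 0; i < length; i++) {
            sb.append(bases.charAt(i % 4));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same length as the AE005174 sequence in the loader log.
        String seq = buildSequence(5528445);
        System.out.println(seq.length()); // prints 5528445
    }
}
```

If this runs cleanly, the String length itself is not what stops a 5.5 Mbp sequence, and the PropertyAccessException would have to come from somewhere else in the getter.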

On Fri, Jul 18, 2008 at 4:47 AM, Richard Holland
<dicknetherlands at gmail.com> wrote:
> In order to persist to BioSQL, BioJava has to convert the symbol list
> into a string so that it can pass it to JDBC via Hibernate. Therefore
> the maximum length of a sequence you wish to persist to BioSQL is the
> maximum length of a string in Java, which is 65536 (2^16) if you are
> working in a UTF-8 environment.
>
> 2008/7/18 Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>:
>> Hi Mark,
>>
>> What is the maximum sequence length that a RichSequence can handle?
>>
>> java -Xms1024m -Xmx1256m -jar loader.jar
>> .
>> 16:09:00,173  INFO Loader:296 - D:\AE005174.gbk is readable.
>> 16:09:06,704  INFO Loader:326 - Loading sequence AE005174 with identifier
>> 56384585, length 5528445 and alphabet DNA...
>> org.hibernate.PropertyAccessException: Exception occurred inside getter of
>> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
>>
>> Rey Vincent Babilonia wrote:
>>>
>>> Hi Mark,
>>>
>>> At first it throws an out of memory exception. My workaround is to
>>> subdivide the sequence file into individual GenBank files.
>>>
>>> The error now is that if a GenBank sequence has an 'empty alphabet', it
>>> does not get loaded to BioSQL. My workaround is to check if
>>> sequence.getAlphabet().getName() is DNA.
>>>
>>> Thanks.
>>>
>>> Mark Schreiber wrote:
>>>>
>>>> Hi -
>>>>
>>>> Is the code throwing an exception or running out of memory?
>>>>
>>>> Can you send an example program and the problem you encounter to the
>>>> list?
>>>> - Mark
>>>>
>>>> On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia
>>>> <rvincent at asti.dost.gov.ph> wrote:
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject: large genbank data
>>>>> Date: Wed, 28 May 2008 18:02:48 +0800
>>>>> From: Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>
>>>>> To: biojava-l at biojava.org
>>>>>
>>>>> hi,
>>>>>
>>>>> anybody tried uploading a large genbank data (e.g.
>>>>> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to biosql?
>>>>> load_seqdatabase.pl of bioperl can do this. i'm switching to biojava and
>>>>> it can't read the sequence (maybe because it has 30000+ sequences).
>>>>>
>>>>> thanks.
>>>>>
>>>>> --
>>>>> /**
>>>>>  * @author   Rey Vincent P. Babilonia
>>>>>  * @number   +63 2 426 9760 local 1302
>>>>>  * @pgp      0x383454CF <at> pgp.mit.edu
>>>>>  * @project  Philippine Bioinformatics Solutions
>>>>>  * @program  Philippine e-Science Grid
>>>>>  * @division Research and Development Division
>>>>>  * @agency   Advanced Science and Technology Institute
>>>>>  * @url      http://www.psigrid.gov.ph
>>>>>  */
>>>>>
>>>>> _______________________________________________
>>>>> biojava-dev mailing list
>>>>> biojava-dev at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>
>>>>
>>>
>>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>