[Biojava-l] [Biojava-dev] [Fwd: large genbank data]
Richard Holland
dicknetherlands at gmail.com
Fri Jul 18 08:47:08 UTC 2008
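In order to persist to BioSQL, BioJava has to convert the symbol list
into a string so that it can pass it to JDBC via Hibernate. Therefore
the maximum length of a sequence you wish to persist to BioSQL is
bounded by the maximum length of a string in Java, which is
Integer.MAX_VALUE (2^31 - 1) characters, and in practice the whole
string must also fit in the heap. The 65535 (2^16 - 1) limit applies
to the modified UTF-8 encoding used by DataOutputStream.writeUTF, not
to String itself.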
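A quick way to see where a 65,535 limit really applies in Java is the plain-Java sketch below. It uses no BioJava or Hibernate classes, and StringLimitDemo and its methods are hypothetical names. It builds a String as long as AE005174 in the log (5,528,445 characters) and passes strings of various lengths to DataOutputStream.writeUTF, which rejects anything whose modified UTF-8 encoding exceeds 65,535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

public class StringLimitDemo {

    /** Builds a filler string of the given length, standing in for a sequence string. */
    static String fakeSequence(int length) {
        char[] residues = new char[length];
        Arrays.fill(residues, 'a'); // ASCII, so each char is one byte in modified UTF-8
        return new String(residues);
    }

    /** True if writeUTF accepts s, i.e. its modified UTF-8 form is at most 65535 bytes. */
    static boolean fitsInWriteUtf(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false; // "encoded string too long"
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
    }

    public static void main(String[] args) {
        String genome = fakeSequence(5_528_445); // length of AE005174 in the log
        System.out.println("String length:            " + genome.length());
        System.out.println("writeUTF, 65535 chars:    " + fitsInWriteUtf(fakeSequence(65_535)));
        System.out.println("writeUTF, 65536 chars:    " + fitsInWriteUtf(fakeSequence(65_536)));
        System.out.println("writeUTF, whole sequence: " + fitsInWriteUtf(genome));
    }
}
```

It prints 5528445, then true, false, false: the String itself is fine, and only the writeUTF path hits the 16-bit limit. Whether the JDBC driver or the BioSQL column type imposes its own limit is not tested here.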
2008/7/18 Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>:
> Hi Mark,
>
> What is the maximum sequence length that a RichSequence can handle?
>
> java -Xms1024m -Xmx1256m -jar loader.jar
> .
> 16:09:00,173 INFO Loader:296 - D:\AE005174.gbk is readable.
> 16:09:06,704 INFO Loader:326 - Loading sequence AE005174 with identifier
> 56384585, length 5528445 and alphabet DNA...
> org.hibernate.PropertyAccessException: Exception occurred inside getter of
> org.biojavax.bio.seq.SimpleRichSequence.sequenceLength
>
> Rey Vincent Babilonia wrote:
>>
>> Hi Mark,
>>
>> At first it threw an OutOfMemoryError. My workaround is to
>> split the sequence file into individual GenBank files.
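Splitting a multi-record file like this can be done in plain Java by copying the lines from each LOCUS line through the closing // line into a new file, so only one line is held in memory at a time. The sketch below assumes standard GenBank flat-file records; GenBankSplitter and the record_NNNNN.gbk naming are hypothetical, not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class GenBankSplitter {

    // ISO-8859-1 maps every byte to a char, so unexpected bytes never abort the copy.
    private static final Charset CS = StandardCharsets.ISO_8859_1;

    /**
     * Copies each record of a multi-record GenBank flat file into its own file
     * under outDir. A record starts at a "LOCUS" line and ends at a "//" line;
     * anything outside a record (e.g. a release-file header) is skipped.
     * Returns the number of records written.
     */
    static int split(Path input, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        int count = 0;
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(input, CS)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (out == null) {
                    if (!line.startsWith("LOCUS")) {
                        continue; // not inside a record
                    }
                    count++;
                    String name = String.format(Locale.ROOT, "record_%05d.gbk", count);
                    out = Files.newBufferedWriter(outDir.resolve(name), CS);
                }
                out.write(line);
                out.newLine();
                if (line.trim().equals("//")) { // end of the current record
                    out.close();
                    out = null;
                }
            }
        } finally {
            if (out != null) {
                out.close(); // input ended without a final "//"
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // usage: java GenBankSplitter big.gbk outDir
        int n = split(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Wrote " + n + " record files");
    }
}
```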
>>
>> The problem now is that if a GenBank sequence has an 'empty alphabet', it
>> does not get loaded into BioSQL. My workaround is to check whether
>> sequence.getAlphabet().getName() is "DNA".
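If the aim is to skip such records before BioJava ever parses them, one hypothetical alternative is a text-level pre-filter on the LOCUS line, whose molecule-type field follows the length in bp. This is only a sketch under that assumption: it does not reproduce how BioJava decides a record's alphabet, and LocusFilter is an illustrative name.

```java
public class LocusFilter {

    /**
     * Returns true if the token after "bp" on a GenBank LOCUS line ends in
     * "DNA" (covers "DNA" and stranded forms such as "ss-DNA"). If the molecule
     * type is missing, the token after "bp" is the topology or division, so
     * the result is false.
     */
    static boolean isDnaLocus(String locusLine) {
        if (!locusLine.startsWith("LOCUS")) {
            return false;
        }
        String[] tokens = locusLine.trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].equals("bp")) {
                return tokens[i + 1].endsWith("DNA");
            }
        }
        return false; // no "bp" token, e.g. a protein record measured in aa
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java LocusFilter \"<LOCUS line>\"");
            return;
        }
        System.out.println(isDnaLocus(args[0]));
    }
}
```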
>>
>> Thanks.
>>
>> Mark Schreiber wrote:
>>>
>>> Hi -
>>>
>>> Is the code throwing an exception or running out of memory?
>>>
>>> Can you send an example program and the problem you encounter to the
>>> list.
>>> - Mark
>>>
>>> On Thu, May 29, 2008 at 9:53 AM, Rey Vincent Babilonia
>>> <rvincent at asti.dost.gov.ph> wrote:
>>>>
>>>> -------- Original Message --------
>>>> Subject: large genbank data
>>>> Date: Wed, 28 May 2008 18:02:48 +0800
>>>> From: Rey Vincent Babilonia <rvincent at asti.dost.gov.ph>
>>>> To: biojava-l at biojava.org
>>>>
>>>> Hi,
>>>>
>>>> Has anybody tried uploading large GenBank data (e.g.
>>>> ftp://bio-mirror.net/biomirror/genbank/gbbct1.seq.gz) to BioSQL?
>>>> BioPerl's load_seqdatabase.pl can do this. I'm switching to BioJava and
>>>> it can't read the file (maybe because it has 30000+ sequences).
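One way to check whether record count alone is the problem is to stream the compressed release file line by line and count records, which keeps memory use roughly constant however many entries the file holds. The sketch below uses only java.util.zip.GZIPInputStream and assumes each record starts with a LOCUS line; GenBankRecordCounter is a hypothetical name, and a real loader would hand each record to a parser instead of just counting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class GenBankRecordCounter {

    /** Counts records (lines starting with "LOCUS"), reading one line at a time. */
    static long countRecords(BufferedReader reader) throws IOException {
        long records = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith("LOCUS")) {
                records++;
            }
        }
        return records;
    }

    /** Decompresses a .gz GenBank file on the fly; the file is never loaded whole. */
    static long countRecordsInGzip(Path gz) throws IOException {
        try (InputStream raw = Files.newInputStream(gz);
             BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(raw), StandardCharsets.ISO_8859_1))) {
            return countRecords(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java GenBankRecordCounter gbbct1.seq.gz
        System.out.println(countRecordsInGzip(Paths.get(args[0])) + " records");
    }
}
```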
>>>>
>>>> thanks.
>>>>
>>>> --
>>>> /**
>>>> * @author Rey Vincent P. Babilonia
>>>> * @number +63 2 426 9760 local 1302
>>>> * @pgp 0x383454CF <at> pgp.mit.edu
>>>> * @project Philippine Bioinformatics Solutions
>>>> * @program Philippine e-Science Grid
>>>> * @division Research and Development Division
>>>> * @agency Advanced Science and Technology Institute
>>>> * @url http://www.psigrid.gov.ph
>>>> */
>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>
>>
>