[Open-bio-l] eyeballs needed -- my biosql install diary

Chris Mungall cjm@fruitfly.org
Mon, 10 Jun 2002 18:53:40 -0700 (PDT)


I've never been able to convince MySQL to store a whole Drosophila
chromosome arm (~20 Mb), no matter how much I play with the parameters.

I wonder how easily PostgreSQL deals with this?

We need a way to deal with chromosome arms that is MySQL-friendly. One plan
was to "shred" the sequence into chunks and have the adapters reassemble
them. The plan kind of stalled because it's tied in with the assembly
business. We really need this, since loading whole genome assemblies from
GenBank will be one of the big uses of BioSQL. Sorry, I need to dedicate
some time to moving this forward, but I don't know if I will. As usual,
whoever codes this wins...
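
For what it's worth, here is a minimal sketch of the shredding idea, just to
make the plan concrete. Everything specific in it is an assumption for
illustration: the biosequence_chunk table, its column names, and the chunk
size are hypothetical and not part of the BioSQL schema, and Python's built-in
sqlite3 stands in for MySQL only so that the example runs on its own. The real
adapters would do the equivalent in Perl.

```python
import sqlite3

# Assumed chunk length in bases; chosen to stay far below a 16MB packet limit.
CHUNK_SIZE = 1_000_000


def shred(seq, chunk_size=CHUNK_SIZE):
    """Yield (chunk_index, chunk_str) pairs that cover seq in order."""
    for index, start in enumerate(range(0, len(seq), chunk_size)):
        yield index, seq[start:start + chunk_size]


def store(conn, bioentry_id, seq, chunk_size=CHUNK_SIZE):
    """Insert one row per chunk, so no single statement carries the whole sequence."""
    conn.executemany(
        "INSERT INTO biosequence_chunk (bioentry_id, chunk_index, chunk_str)"
        " VALUES (?, ?, ?)",
        ((bioentry_id, i, chunk) for i, chunk in shred(seq, chunk_size)),
    )


def reassemble(conn, bioentry_id):
    """Concatenate the chunks of one entry in chunk_index order."""
    rows = conn.execute(
        "SELECT chunk_str FROM biosequence_chunk"
        " WHERE bioentry_id = ? ORDER BY chunk_index",
        (bioentry_id,),
    )
    return "".join(row[0] for row in rows)


def demo():
    """Round-trip a small sequence through a hypothetical chunk table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE biosequence_chunk ("
        " bioentry_id INTEGER NOT NULL,"
        " chunk_index INTEGER NOT NULL,"
        " chunk_str TEXT NOT NULL,"
        " PRIMARY KEY (bioentry_id, chunk_index))"
    )
    seq = "ACGT" * 2500 + "N"  # 10001 bases, so the last chunk is partial
    store(conn, 1, seq, chunk_size=1000)
    count = conn.execute("SELECT COUNT(*) FROM biosequence_chunk").fetchone()[0]
    return count, reassemble(conn, 1) == seq
```

Calling demo() stores the sequence as several rows and checks that reading
them back in chunk_index order gives the original string. The point of the
design is that each INSERT stays small regardless of how long the chromosome
is, so the packet size setting stops mattering.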

On Thu, 30 May 2002, KATAYAMA Toshiaki wrote:

> Hello,
>
> Chris, thank you for your information!
>
> I have changed max_allowed_packet (and sort_buffer, record_buffer) size
> to 16MB, and almost all RefSeq entries could be successfully loaded
> into BioSQL, except for some entries containing >16MB of sequence
> (e.g. NC_003282 - C. elegans chromosome IV, 17484798 bp).
>
> Then I increased these parameters up to 20MB, but this entry
> still could not be loaded.
>
> -----X8-----X8-----
> % perl ./load_seqdatabase.pl -host localhost -sqldb biosql -dbuser root -format genbank rs NC_003282
> Reading NC_003282
> DBD::mysql::st execute failed: MySQL server has gone away at /usr/local/lib/perl5/site_perl/5.6.1/Bio/DB/SQL/PrimarySeqAdaptor.pm line 130, <GEN0> line 322491.
> DBD::mysql::st execute failed: MySQL server has gone away at /usr/local/lib/perl5/site_perl/5.6.1/Bio/DB/SQL/PrimarySeqAdaptor.pm line 130, <GEN0> line 322491.
> -----X8-----X8-----
>
> I also tried changing biosequence_str from mediumtext to longtext,
> but the same error still occurred.  Hmm..
>
> Regards,
> Toshiaki Katayama
> --
> Kanehisa laboratory (Bioinformatics Center)
> Institute for Chemical Research, Kyoto Univ.
> Gokasho, Uji, Kyoto 611-0011, Japan
> TEL +81 774 38 3272, FAX +81 774 38 3269
> http://web.kuicr.kyoto-u.ac.jp/~katayama/
> http://bioruby.org/ (k@bioruby.org)
>
>
> At Wed, 29 May 2002 16:45:04 -0400,
> Chris Dagdigian wrote:
> >
> >
> > Hello,
> >
> > Just in case you have not solved this yet, it seems that you may need to
> > alter the MySQL configuration value "max_allowed_packet" to be a fairly
> > large number in order to handle very large sequence objects.
> >
> > Keith Allen reported this on bioperl-l; the specific message is online
> > at http://bioperl.org/pipermail/bioperl-l/2002-May/007987.html
> >
> > Hope this helps!
> >
> > I've incorporated several people's comments into my BioSQL diary and
> > will be putting an updated version online shortly.
> >
> > Regards,
> > Chris
> >
> >
> > KATAYAMA Toshiaki wrote:
> > > Hi,
> > >
> > > Is there any size limitation on biosequence?
> > >
> > > I tried to load the recent RefSeq into BioSQL and got the following errors:
> > >
> > >
> > > -----X8-----X8-----
> > > Reading ../refseq/rscu.gbff
> > > (..snip..)
> > > DBD::mysql::st execute failed: MySQL server has gone away at /usr/local/lib/perl5/site_perl/5.6.1/Bio/DB/SQL/PrimarySeqAdaptor.pm line 130, <GEN0> line 1662456.
> > > DBD::mysql::st execute failed: MySQL server has gone away at /usr/local/lib/perl5/site_perl/5.6.1/Bio/DB/SQL/PrimarySeqAdaptor.pm line 130, <GEN0> line 1662456.
> > > -----X8-----X8-----
> > >
> > > Line 1662456 of my RefSeq file was the entry NC_000918, the
> > > A. aeolicus genome, with a length of 1551335 bp.  I had already loaded
> > > part of GenBank (gbvrl*) and Swissprot (thanks to Chris's doc :-) on
> > > my BioSQL server; however, the maximum length in biosql at that time
> > > was around 368k bp.
> > >
> > >
> > > I also want to know how BioSQL stores sequence entries over 16MB
> > > (e.g. the Arabidopsis chromosomes in RefSeq) in the biosequence table
> > > with MySQL's mediumtext (L < 2^24).
> > >
> > > My silly approach, other than BioSQL, to storing GenBank/RefSeq in MySQL was
> > >   http://bioruby.org/cgi-bin/cvs/reviz/bioruby/sample/
> > > gb2tab.rb and gbtab2mysql.rb (used for http://gb.bioruby.org/);
> > > in this case, I split long sequences into numbered pieces.
> > >
> > > Furthermore, MySQL's longtext seems long enough; however, it didn't
> > > work well when I tried. (I forget the details, but a packet size limitation
> > > error or something similar occurred. My configuration problem?)
> > >
> > >
> > > At Wed, 15 May 2002 19:11:08 -0400,
> > > Chris Dagdigian wrote:
> > >
> > >>http://bioteam.net/dag/BioTeam-HOWTO-1-BIOSQL.html
> > >
> > >
> > > Your document seems very useful.  From this doc:
> > >
> > >
> > >>>Step 10 - What next?
> > >>>
> > >>>Figure out how to export/dump the database and see how quickly we
> > >>>can recreate the database with these raw files instead of
> > >>>laboriously using BioPerl to parse and load objects one at a
> > >>>time. Loading the database is slow and it may be cool to package up
> > >>>tab-delimited biosql exports so that others can load their own
> > >>>databases much faster.
> > >>
> > >
> > > Cool. It would be nice if there were a repository in this format.
> > >
> > >
> > > Regards,
> > > Toshiaki Katayama
> > > _______________________________________________
> > > Open-Bio-l mailing list
> > > Open-Bio-l@open-bio.org
> > > http://open-bio.org/mailman/listinfo/open-bio-l
> >
> >
> > --
> > Chris Dagdigian, <dag@sonsorol.org>
> > Life Science IT & Research Computing Freelancer
> > Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
> > Yahoo IM: craffi
> >