[Bioperl-l] Trying to get a mysql DB from genbank flat files
Raphael LaFrance
rafe@scinq.org
Wed, 14 Nov 2001 15:20:56 -0500
Hi all,
I just stumbled onto this group a few days back after reinventing some
of this stuff. Oh well, better late than never.
I was trying to use the GenBank ripper (v0.7.1 & v0.7.2) via:
shell> load_seqdatabase_test.pl test gbbct1.seq
The only difference between this & the standard script
(load_seqdatabase.pl) is that I filled in the $host, $sqlname, $dbuser,
$dbpass, & $format variables.
It inserts about 91 records & then dies on the 92nd with:
-------------------------------------
load_seqdatabase_test.pl test gbbct1.seq
Reading gbbct1.seq
DBD::mysql::st execute failed: You have an error in your SQL syntax near
')' at line 1 at
/usr/local/lib/perl5/site_perl/5.6.1/Bio/DB/SQL/SeqLocationAdaptor.pm
line 214, <GEN0> line 6711.
DBD::mysql::st execute failed: You have an error in your SQL syntax near
')' at line 1 at
/usr/local/lib/perl5/site_perl/5.6.1/Bio/DB/SQL/SeqLocationAdaptor.pm
line 214, <GEN0> line 6711.
-------------------------------------
I was trying to track down the offending code myself but I'm new to
Linux, Perl, bioperl, & bioinformatics so it's slow debugging just now.
BTW: The wheel that I reinvented was a Perl proggie that ripped GenBank
flat files & put them into a PostgreSQL DB. Very straightforward &
primitive, BUT I noticed that there is a bug in the GenBank-to-MySQL
parse similar to the one that I encountered when writing my own:
If, while in mysql, I do a:
select * from bioentry;
I get the 91 records OK but some of the "division" fields contain the
value "cir".
This indicates to me that the LOCUS record is getting parsed with something
like:
split(/\s+/, $ln);
rather than using
substr($ln, ??, ??);
This record has a bunch of junk in unexpected places, so I found that using
the substr function yields better results.
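A minimal sketch of the difference, under stated assumptions: the
fixed-width layout, the column offsets, the record names, and the helper
names (make_locus, parse_by_split, parse_by_substr) are all made up for
illustration. They are NOT the real GenBank LOCUS columns and NOT the
bioperl code. The point is only that an optional field such as "circular"
shifts every later field when splitting on whitespace, while fixed
offsets stay put.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL fixed-width layout (not the real GenBank columns):
#   0-11 'LOCUS', 12-21 name, 22-28 length, 29-32 ' bp ',
#   33-39 molecule, 40-49 topology (may be blank), 50-52 division,
#   53-59 spaces, 60-70 date
my $LOCUS_FMT = "%-12s%-10s%7s bp %-7s%-10s%-3s       %-11s";

# Build a sample line so the columns are consistent by construction.
sub make_locus {
    my ($name, $len, $mol, $topology, $division, $date) = @_;
    return sprintf $LOCUS_FMT, 'LOCUS', $name, $len, $mol,
                               $topology, $division, $date;
}

# Whitespace split: assumes every line has the same number of fields,
# which is false as soon as the optional topology field is present.
sub parse_by_split {
    my ($ln) = @_;
    my @f = split /\s+/, $ln;
    return { name => $f[1], length => $f[2], division => $f[5] };
}

# Fixed offsets: each field is read from its own columns, so a blank
# or filled topology field does not move the division field.
sub parse_by_substr {
    my ($ln) = @_;
    my %rec = (
        name     => substr($ln, 12, 10),
        length   => substr($ln, 22, 7),
        topology => substr($ln, 40, 10),
        division => substr($ln, 50, 3),
    );
    # Trim the padding left over from the fixed-width columns.
    for my $k (keys %rec) {
        $rec{$k} =~ s/^\s+|\s+$//g;
    }
    return \%rec;
}

# Made-up sample records: one without and one with a topology field.
my $linear   = make_locus('EXAMPLE1', 1200, 'DNA', '',         'BCT', '14-NOV-2001');
my $circular = make_locus('EXAMPLE2', 4800, 'DNA', 'circular', 'BCT', '14-NOV-2001');

for my $ln ($linear, $circular) {
    printf "split: %-8s  substr: %s\n",
        parse_by_split($ln)->{division},
        parse_by_substr($ln)->{division};
}
```

On the second sample line the split version reports "circular" as the
division while the substr version still reports "BCT". If the division
column were only three characters wide and the value were cut down on
insert (an assumption, I haven't checked the schema), that would show up
as "cir".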
I haven't tracked the module down yet so it's just a guess. I'll keep
looking because I'm learning a lot going thru your code.
PS: I'm really glad I found this group.
Thanks
Raphael (Rafe) LaFrance