[Bioperl-l] Trying to get a mysql DB from genbank flat files

Raphael LaFrance rafe@scinq.org
Wed, 14 Nov 2001 15:20:56 -0500


Hi all,

I just stumbled onto this group a few days back after reinventing some
of this stuff. Oh well, better late than never.

I was trying to use the GenBank ripper (v0.7.1 & v0.7.2) via:
shell> load_seqdatabase_test.pl test gbbct1.seq

The only difference between this & the standard script
(load_seqdatabase.pl) is that I filled in the $host, $sqlname, $dbuser,
$dbpass, & $format variables.

It inserts about 91 records & then dies on the 92nd with:
-------------------------------------
load_seqdatabase_test.pl test gbbct1.seq
Reading gbbct1.seq
DBD::mysql::st execute failed: You have an error in your SQL syntax near
')' at line 1 at
/usr/local/lib/perl5/site_perl/5.6.1/Bio/DB/SQL/SeqLocationAdaptor.pm
line 214, <GEN0> line 6711.
DBD::mysql::st execute failed: You have an error in your SQL syntax near
')' at line 1 at
/usr/local/lib/perl5/site_perl/5.6.1/Bio/DB/SQL/SeqLocationAdaptor.pm
line 214, <GEN0> line 6711.
-------------------------------------

I was trying to track down the offending code myself but I'm new to
Linux, Perl, bioperl, & bioinformatics so it's slow debugging just now.

BTW: The wheel that I reinvented was a Perl proggie that ripped GenBank
flat files & put them into a PostgreSQL DB. Very straightforward &
primitive, BUT I noticed that there is a bug in the GenBank-to-MySQL
parse similar to one that I hit when writing my own.
If, while in mysql, I do a:
select * from bioentry;
I get the 91 records OK, but some of the "division" fields contain the
value "cir".

This indicates to me that the LOCUS record is getting parsed with
something like:
split(/\s+/, $ln); 
rather than using 
substr($ln, ??, ??); 
This record has a bunch of junk in unexpected places, so I found that
using the substr function yields better results.
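
A minimal sketch of the difference, under stated assumptions: the column
layout in $FMT and the offsets passed to substr are illustrative only
(they are not taken from the bioperl source and should be checked against
the GenBank release notes), and the helper names make_locus,
division_by_split, and division_by_substr are hypothetical. It builds one
linear and one circular LOCUS line and reads the division both ways.

```perl
use strict;
use warnings;

# ASSUMED fixed-width layout, for this sketch only:
# keyword(12) name(10) length(7) " bp " molecule(7) 2 blanks
# topology(10) division(3) 7 blanks date(11)
my $FMT = "%-12s%-10s%7s bp %-7s  %-10s%-3s       %-11s";

# Build a LOCUS line in the assumed layout (hypothetical helper).
sub make_locus {
    my ($name, $len, $mol, $topology, $div, $date) = @_;
    return sprintf $FMT, 'LOCUS', $name, $len, $mol, $topology, $div, $date;
}

# Naive approach: whitespace tokens, division assumed to be the 6th token.
sub division_by_split {
    my ($ln) = @_;
    my @f = split /\s+/, $ln;
    return $f[5];
}

# Fixed-width approach: read the division from its (assumed) column.
sub division_by_substr {
    my ($ln) = @_;
    return substr($ln, 52, 3);
}

# Made-up records; the topology field is blank for the linear one.
my $linear   = make_locus('TESTLIN01', 1234, 'DNA', '',         'BCT', '01-JAN-2000');
my $circular = make_locus('TESTCIR01', 5678, 'DNA', 'circular', 'BCT', '01-JAN-2000');

for my $ln ($linear, $circular) {
    printf "split: %-8s  substr: %s\n",
        division_by_split($ln), division_by_substr($ln);
}
```

For the circular record the split version returns "circular" where the
division should be, because the optional topology field shifts every later
token by one. If the division column only holds three characters, that
would show up as "cir", which is my guess at what is happening here. The
substr version reads the same columns regardless of which optional fields
are present.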

I haven't tracked the module down yet so it's just a guess. I'll keep
looking because I'm learning a lot going thru your code.

PS: I'm really glad I found this group. 

Thanks

Raphael (Rafe) LaFrance