[Bioperl-l] tables missing in mysql biosql instance
Hilmar Lapp
hlapp at gmx.net
Thu Feb 20 09:03:22 EST 2003
On Thursday, February 20, 2003, at 06:09 AM, David Guzman wrote:
> Hi:
>
> I executed DROP on the swissprot - biosql db created yesterday. And
> today I have just repeated the process including the --safe flag, with
> the following command:
>
> [david at mandrake scripts]$ perl load_seqdatabase.pl --host localhost
> --dbname swbiosql --dbuser root --dbpass XXXXXX --driver mysql
> --namespace bioperl --safe --format swiss
> /opt/protdb/swissprot/sprot40.dat
>
> I checked the size of the folder containing the db (331M), which is better
> than yesterday (24M), but it should be larger (399M) according to the
> HOWTO (my GBank with MySQL).
I would not judge the load by comparing the database size to that of the
original flat file. Nothing guarantees a direct correlation, let alone
equality, except that the database obviously can't be 10x smaller.
Instead, count the number of bioentries and compare it to the number of
entries in the swissprot file.
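A minimal sketch of that comparison in shell. It assumes the standard SwissProt flat-file layout, in which every entry ends with a `//` line, and a mysql command-line client that can reach the swbiosql database from the load command above; `bioentry` is the BioSQL table with one row per loaded entry. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for comparing a SwissProt flat file with a BioSQL load.

# Count entries in a SwissProt flat file: each record ends with a line "//".
count_flatfile_entries() {
    grep -c '^//[[:space:]]*$' "$1"
}

# Count rows in the BioSQL bioentry table.
# Assumes the mysql command-line client; -p makes it prompt for the password.
count_bioentries() {
    local user="$1" db="$2"
    mysql --user="$user" -p --batch --skip-column-names \
        --execute='SELECT COUNT(*) FROM bioentry' "$db"
}
```

For example, `count_flatfile_entries /opt/protdb/swissprot/sprot40.dat` and `count_bioentries root swbiosql`. If only this one namespace was loaded, the two numbers should differ by roughly the number of entries that could not be stored.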
> On screen I got similar error
> messages, such as:
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SpeciesAdaptor (driver) failed, values
> were ("IAP-IL3","a-particle:Mouse intracisternal:Intracisternal
> A-particles:Retroviridae:Retroid viruses:Viruses","11754","Mouse
> intracisternal a-particle","-") FKs ()
> Duplicate entry 'Mouse intracisternal a-particle--' for key 3
> ---------------------------------------------------
>
Yeah, I know about these. The bioperl swissprot parser has trouble
getting this esoteric 'species' right. Do you need these entries?
> -------------------- WARNING ---------------------
> MSG: Could not store P12894:
>
This is what you should watch out for. Capture the output in a log and
then count:
$ grep "MSG: Could not store" my.log | wc -l
The count should have no more than two digits. The accession numbers in
those lines will not be in your BioSQL instance, while all other entries
should be.
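To get the actual list of missing accessions rather than just a count, a sed filter along these lines could be used. It is a sketch that assumes each failure appears on its own line in the form `MSG: Could not store ACCESSION:`, as in the warning quoted above, and that both stdout and stderr of the load were captured (for example by appending `> my.log 2>&1` to the load command). The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the accessions that failed to load, one per line.
# Assumes failure lines look like "MSG: Could not store P12894:" (possibly indented).
failed_accessions() {
    sed -n 's/^[[:space:]]*MSG: Could not store \([^: ]*\):.*$/\1/p' "$1" | sort -u
}
```

Running `failed_accessions my.log > failed.txt` gives a file of accessions that could, for instance, be checked against the flat file or reloaded separately.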
> and ...
>
> DBD::mysql::st execute failed: Duplicate entry '101583-178464' for key 1
> at /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm
> line 402, <GEN0> line 6360694.
>
> Would the --lookup flag help with the "Duplicate entry" complaints?
>
No, it wouldn't. As I said, ignore these messages; they are handled by
the loader. Unfortunately, you are going to see quite a few of them
(many entries, for example, reference the same dbxref twice). The only
really important message is "MSG: Could not store XXXX".
With --safe, the script only dies if an exception is raised outside of
the bioperl-db adaptor code, e.g., if the parser dies. If that happens,
the last stack trace or error message in your log will show it.
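One way to pull out that last error message, sketched under the assumption that the complete output was captured in a log as suggested earlier and that Bioperl warnings and exceptions each carry a `MSG:` line. The function name and the default of 20 trailing lines are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last "MSG:" line of a load log followed by
# up to N further lines (default 20), which usually hold the stack trace.
last_error() {
    local log="$1" context="${2:-20}"
    local lineno
    # Line number of the final "MSG:" line in the log.
    lineno=$(grep -n 'MSG:' "$log" | tail -n 1 | cut -d: -f1)
    if [ -z "$lineno" ]; then
        echo "no MSG: lines found in $log" >&2
        return 1
    fi
    # Print from that line through the following $context lines.
    sed -n "${lineno},$((lineno + context))p" "$log"
}
```

If the output is one of the harmless warnings discussed above, the run most likely went through; an exception from outside the adaptor code suggests the load stopped there.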
> Then I checked everything step by step and discovered that two tables
> are missing: remote_seqfeature_name and ontology_relationship. How can
> I correct this problem with biosql-schema?
With the latest pre-Singapore versions you shouldn't need the
remote_seqfeature_name table. Before the Singapore changes, the
ontology_relationship table is defined in
sql/ontology/biosqldb-ontology-mysql.sql, but the bioperl API doesn't
use it yet. (After the Singapore changes, the table will be in the main
DDL.)
Ewan, it might be a good idea to add instructions on interpreting the
load log to the INSTALL document.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------