[BioSQL-l] Sequence loading in biosql MySQL sooooo long
Eric Pelletier
ericp at genoscope.cns.fr
Thu Oct 23 12:22:01 EDT 2003
Hi all,
I just deployed biosql from cvs few days ago, populated it with
taxonomy and ontology data in a really decent time, but I
experience a very long loading time fro loading sequence data.
The command used is :
perl bioperl/db/scripts/biosql/load_seqdatabase.pl --host masaya7
--dbname biosql --namespace bioperl --format swiss --dbuser ericp
--dbpass xxxxxx --lookup --noupdate Tmp/sprot.dat
The loading is of about 1 complete entry each 25 seconds...
I logged the queries on the MySQL server side, and observed that
only a single query blocks the system for each entry :
SELECT name.name, node.node_rank, node.left_value FROM taxon
node, taxon taxon, taxon_name name WHERE name.taxon_id =
node.taxon_id AND taxon.left_value > node.left_value and
taxon.left_value < node.right_value AND taxon.taxon_id = '1056'
AND name.name_class = 'scientific name' ORDER by node.left_value;
However, the same query, without the ORDER BY statement is quite
fast.
And the indexes seems to be OK for this table :
mysql> show index from taxon;
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name
| Collation | Cardinality | Sub_part | Packed | Null | Index_type
| Comment |
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| taxon | 0 | PRIMARY | 1 | taxon_id
| A | 193316 | NULL | NULL | | BTREE
| |
| taxon | 0 | ncbi_taxon_id | 1 |
ncbi_taxon_id | A | 193316 | NULL | NULL |
YES | BTREE | |
| taxon | 0 | left_value | 1 | left_value
| A | 193316 | NULL | NULL | YES | BTREE
| |
| taxon | 0 | right_value | 1 | right_value
| A | 193316 | NULL | NULL | YES | BTREE
| |
| taxon | 1 | taxparent | 1 |
parent_taxon_id | A | 96658 | NULL | NULL |
YES | BTREE | |
| taxon | 1 | test | 1 | taxon_id
| A | 193316 | NULL | NULL | | BTREE
| |
| taxon | 1 | test | 2 | left_value
| A | 193316 | NULL | NULL | YES | BTREE
| |
| taxon | 1 | test | 3 | right_value
| A | 193316 | NULL | NULL | YES | BTREE
| |
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
If I understand well the benchmarks, I would expect a speed
of about 50 entries per second.
So,
. Does someone experiments the same effect ?
. Does someone has any idea of what may happen here ?
Thanks a lot.
-- Eric Pelletier
More information about the BioSQL-l
mailing list