[BioSQL-l] Sequence loading in biosql MySQL sooooo long

Eric Pelletier ericp at genoscope.cns.fr
Thu Oct 23 12:22:01 EDT 2003


Hi all,


I just deployed biosql from cvs few days ago, populated it with
taxonomy and ontology data in a really decent time, but I
experience a very long loading time fro loading sequence data.


The command used is :

perl bioperl/db/scripts/biosql/load_seqdatabase.pl --host masaya7
--dbname biosql --namespace bioperl --format swiss --dbuser ericp
--dbpass xxxxxx --lookup --noupdate Tmp/sprot.dat


The loading is of about 1 complete entry each 25 seconds...

I logged the queries on the MySQL server side, and observed that
only a single query blocks the system for each entry :

SELECT name.name, node.node_rank, node.left_value FROM taxon
node, taxon taxon, taxon_name name WHERE name.taxon_id =
node.taxon_id AND taxon.left_value > node.left_value and
taxon.left_value < node.right_value AND taxon.taxon_id = '1056'
AND name.name_class = 'scientific name' ORDER by node.left_value;


However, the same query, without the ORDER BY statement is quite
fast.
And the indexes seems to be OK for this table :

mysql> show index from taxon;
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name      | Seq_in_index | Column_name
| Collation | Cardinality | Sub_part | Packed | Null | Index_type
| Comment |
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| taxon |          0 | PRIMARY       |            1 | taxon_id
| A         |      193316 |     NULL | NULL   |      | BTREE
|         |
| taxon |          0 | ncbi_taxon_id |            1 |
ncbi_taxon_id   | A         |      193316 |     NULL | NULL   |
YES  | BTREE      |         |
| taxon |          0 | left_value    |            1 | left_value
| A         |      193316 |     NULL | NULL   | YES  | BTREE
|         |
| taxon |          0 | right_value   |            1 | right_value
| A         |      193316 |     NULL | NULL   | YES  | BTREE
|         |
| taxon |          1 | taxparent     |            1 |
parent_taxon_id | A         |       96658 |     NULL | NULL   |
YES  | BTREE      |         |
| taxon |          1 | test          |            1 | taxon_id
| A         |      193316 |     NULL | NULL   |      | BTREE
|         |
| taxon |          1 | test          |            2 | left_value
| A         |      193316 |     NULL | NULL   | YES  | BTREE
|         |
| taxon |          1 | test          |            3 | right_value
| A         |      193316 |     NULL | NULL   | YES  | BTREE
|         |
+-------+------------+---------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+




If I understand well the benchmarks, I would expect a speed
of about 50 entries per second.




So,

. Does someone experiments the same effect ?
. Does someone has any idea of what may happen here ?




Thanks a lot.


-- Eric Pelletier


More information about the BioSQL-l mailing list