[BioSQL-l] Indexing of (seqfeature) locations?

Peter biopython at maubp.freeserve.co.uk
Wed Aug 26 12:29:56 UTC 2009


On Wed, Aug 26, 2009 at 1:07 PM, Hilmar Lapp<hlapp at gmx.net> wrote:
>
>
> On Aug 26, 2009, at 6:53 AM, Peter wrote:
>
>> The BioSQL schema includes a few indexes on the location table
>> (e.g. quoting the MySQL schema, but it looks the same on pg too):
>>
>> CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
>> [...]
>> Will these facilitate searches like this?:
>>
>> "SELECT ... WHERE 2000 <= location.start_pos
>> AND location.end_pos <= 5000 AND ..."
>>
>> Or, for this would it help to include:
>>
>> CREATE INDEX seqfeatureloc_start ON location(start_pos);
>> CREATE INDEX seqfeatureloc_start ON location(end_pos);
>
> With a decent RDBMS, having two indexes instead of a compound one will slow
> this query down. What the compound one won't help you with is if your query
> doesn't constrain the leading columns. For example, a compound index on
> (start_pos,end_pos) won't be used if you only constrain end_pos. If you want
> to do that, you need on index on (end_pos) too.

Thanks for your reply Hilmar. Just to make sure I understood, the current
BioSQL indexes are fine for this:

 "SELECT ... WHERE 2000 <= location.start_pos
AND location.end_pos <= 5000 AND ..."

but not so great for:

 "SELECT ... WHERE 2000 <= location.start_pos AND ..."

or,

 "SELECT ... WHERE location.end_pos <= 5000 AND ..."

Nevertheless, that should cover most usage.

Having just two separated indexes on start_pos and end_pos would
speed up queries on just start or end, but would slow down queries
using both.

Presumably having three indexes as follows would cover all these
examples efficiently, but at the cost of two more indexes?:

CREATE INDEX seqfeatureloc_start ON location(start_pos, end_pos);
CREATE INDEX seqfeatureloc_start ON location(start_pos);
CREATE INDEX seqfeatureloc_start ON location(end_pos);

If that is all accurate, the status quo is fine :)

Regards,

Peter



More information about the BioSQL-l mailing list