keithplayer at hotmail.com
Fri Oct 20 02:13:52 UTC 2006
I know that there may be some changes resulting from new GFF3 implementations,
but thought I would see if the following is useful anyway.
I implemented the R-tree binning scheme as used by Bio::DB::GFF::Util::Binning
and as mentioned in this article:
I tested the following query on a normal table (no binning). It assumes
that you know the length of the longest range in the table; for example, in a
table of human genes, the longest gene we know of is around 2.4Mb.
SELECT COUNT(*) AS count FROM groups g WHERE g.start > max(0, [start] - 2400000) AND
g.start < [end] AND g.end > [start] AND g.chromosome = '1'
So for the region 100Mb:101Mb:
SELECT COUNT(*) AS count FROM groups g WHERE g.start > 97600000 AND g.start <
101000000 AND g.end > 100000000 AND g.chromosome = '1'
Here [start] and [end] define the region of interest. This query outperforms
the R-tree implementation on all tests that I have performed (for lengths of
200bp to 10Mb across a whole chromosome). Could this be of some practical use?
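
For anyone who wants to try the idea, here is a minimal sketch of the bounded-start query, checked against a plain overlap query. It is not the code used for the tests above. It assumes SQLite through Python's sqlite3 module, a synthetic table of random 1-based features no longer than 2.4Mb, and an index on (chromosome, start). The function names, the index and the data are all made up for illustration. It checks only that the two queries return the same count, not how fast they run.

```python
import random
import sqlite3

MAX_LEN = 2_400_000  # assumed length of the longest feature, as in the post


def build_db(n=5000, seed=0):
    """Create an in-memory table of synthetic features (hypothetical data)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # "groups" and "end" are quoted because they are SQL keywords in SQLite.
    conn.execute(
        'CREATE TABLE "groups" (chromosome TEXT, start INTEGER, "end" INTEGER)'
    )
    rows = []
    for _ in range(n):
        start = rng.randrange(1, 200_000_000)    # 1-based start coordinate
        length = rng.randrange(1, MAX_LEN + 1)   # never longer than MAX_LEN
        rows.append(("1", start, start + length))
    conn.executemany('INSERT INTO "groups" VALUES (?, ?, ?)', rows)
    # Assumed index; the lower bound on start lets it be used as a range scan.
    conn.execute('CREATE INDEX idx_chr_start ON "groups" (chromosome, start)')
    return conn


def count_bounded(conn, chrom, start, end, max_len=MAX_LEN):
    """Overlap count using the extra lower bound on start from the post."""
    sql = (
        'SELECT COUNT(*) FROM "groups" g '
        "WHERE g.chromosome = ? AND g.start > ? AND g.start < ? "
        'AND g."end" > ?'
    )
    lower = max(0, start - max_len)
    return conn.execute(sql, (chrom, lower, end, start)).fetchone()[0]


def count_naive(conn, chrom, start, end):
    """Plain overlap count with no lower bound on start, for comparison."""
    sql = (
        'SELECT COUNT(*) FROM "groups" g '
        'WHERE g.chromosome = ? AND g.start < ? AND g."end" > ?'
    )
    return conn.execute(sql, (chrom, end, start)).fetchone()[0]


if __name__ == "__main__":
    conn = build_db()
    region = (100_000_000, 101_000_000)
    print(count_bounded(conn, "1", *region), count_naive(conn, "1", *region))
```

The two counts agree only while no feature in the table is longer than max_len. If a longer feature is loaded later, the bounded query will silently miss it, so the maximum would need to be re-checked whenever the table changes.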
More information about the Bioperl-l