[Bioperl-l] Bio::DB::SeqFeature::Store::Pg index on "name": why varchar_pattern_ops?

Scott Cain scott at scottcain.net
Thu Apr 8 19:40:47 UTC 2010


>From what I've read, if you have both vanilla and varch_pattern_ops,
the query planner will decide (presumably correctly) which index to
use.

Scott


On Thu, Apr 8, 2010 at 3:05 PM, Robert Buels <rmb32 at cornell.edu> wrote:
> Hmm, well, the vanilla index seems necessary for my pg 8.3.9/ubuntu
> hardy/utf8 locales/UTF8 encoding installation.  If both a
> varchar_pattern_ops and a vanilla index are present, will installations that
> need the varchar_pattern_ops one use it?  Randall, do you have an
> installation you can test that on?
>
> Rob
>
>
> Scott Cain wrote:
>>
>> Hi Rob,
>>
>> I was a little concerned that this might happen.  This change was
>> suggested by Randall on the GBrowse mailing list a few weeks ago, and
>> after some testing on my machine, it seemed to work well.  The problem
>> was that wild card searches on the name column were seqscanning, and
>> the suggested index fixed the problem.  I wonder if the query planner
>> in postgres would be smart enough to use the appropriate index if we
>> created both one with and one without varchar_pattern_ops.
>>
>> The reason for using varchar_pattern_ops specifically was to allow for
>> searching in a non-C locale.  Another option would be to make the
>> creation of the index conditional when the database is created.  It
>> could use the standard C locale setting and if a flag is set, create
>> the index to allow non-C locale searches (and presumably, the default
>> C locale would be the right thing for most users).
>>
>> Scott
>>
>>
>> On Wed, Apr 7, 2010 at 8:23 PM, Robert Buels <rmb32 at cornell.edu> wrote:
>>>
>>> What's the thinking behind the BDBSFS::Pg (ha) index on the "name" table
>>> at
>>> Pg.pm line 282:
>>>
>>>   CREATE INDEX name_name_varchar_patt_ops_idx ON name USING BTREE
>>>   (lower(name) varchar_pattern_ops);
>>>
>>> Why the varchar_pattern_ops operator class?
>>>
>>> I'm testing an installation (backing gbrowse, of course) on Pg 8.3, and
>>> performance was really stinking on the SFS query:
>>>
>>>   SELECT f.id,f.object,f.typeid,f.seqid,f.start,f.end,f.strand
>>>   FROM feature as f, name as n
>>>   WHERE (n.id=f.id AND lower(n.name) = lower($1) AND n.display_name>0)
>>>
>>> so I explained it, giving
>>>
>>>                                     QUERY PLAN
>>>
>>> --------------------------------------------------------------------------------------
>>>  Nested Loop  (cost=0.00..130159.60 rows=11059 width=556)
>>>  ->  Seq Scan on name n  (cost=0.00..53489.41 rows=11059 width=4)
>>>        Filter: ((display_name > 0) AND (lower((name)::text) =
>>> 'foo'::text))
>>>  ->  Index Scan using feature_pkey on feature f  (cost=0.00..6.92 rows=1
>>> width=556)
>>>        Index Cond: (f.id = n.id)
>>>
>>>
>>> Seq scan, not good.  It's not using the name_name_varchar_patt_ops_idx.
>>>  So
>>> I added an index on just lower(name) without the varchar_pattern_ops
>>> business (create index lowername on name ( lower(name) )) and performance
>>> started getting fine again, with explain:
>>>
>>>
>>>                                     QUERY PLAN
>>>
>>> ------------------------------------------------------------------------------------
>>>  Nested Loop  (cost=640.81..175775.44 rows=20521 width=501)
>>>  ->  Bitmap Heap Scan on name n  (cost=640.81..31557.42 rows=20521
>>> width=4)
>>>        Recheck Cond: (lower((name)::text) = 'foo'::text)
>>>        Filter: (display_name > 0)
>>>        ->  Bitmap Index Scan on lowername  (cost=0.00..635.68 rows=20597
>>> width=0)
>>>              Index Cond: (lower((name)::text) = 'foo'::text)
>>>  ->  Index Scan using feature_pkey on feature f  (cost=0.00..7.02 rows=1
>>> width=501
>>>        Index Cond: (f.id = n.id)
>>> (8 rows)
>>>
>>>
>>> This has much better performance in practice, since it's not sequentially
>>> scanning the 4.1M row "name" table.
>>>
>>> This suggests to me that the varchar_pattern_ops class should be removed,
>>> but could one of the original BDBSFS::Pg authors (like Lincoln or Scott)
>>> weigh in on this?
>>>
>>> Rob
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research




More information about the Bioperl-l mailing list