I am running PostgreSQL 9.3. The database is of all of the medical NPIs (national provider numbers). I have created the following indexes:
CREATE INDEX indiv_provider_business_mailing_address_state_name_string_index
ON individuals (provider_business_mailing_address_state_name);
CREATE INDEX indiv_provider_business_practice_location_address_state_name_string_index
ON individuals (provider_business_practice_location_address_state_name);
CREATE INDEX indiv_upper_provider_business_mailing_address_state_name_string_index
ON individuals (UPPER(provider_business_mailing_address_state_name));
CREATE INDEX indiv_upper_provider_business_practice_location_address_state_name_string_index
ON individuals (UPPER(provider_business_practice_location_address_state_name));
but the query still table scans:
Postgres Running
SELECT npi, provider_first_name,
provider_last_name_legal_name,
provider_organization_name_legal_business_name,
provider_first_line_business_mailing_address,
provider_second_line_business_mailing_address,
provider_business_mailing_address_city_name,
provider_business_mailing_address_state_name,
substr(provider_business_mailing_address_postal_code, 1, 5)
AS provider_business_mailing_address_postal_code,
provider_business_mailing_address_telephone_number,
provider_business_mailing_address_fax_number,
provider_first_line_business_practice_location_address,
provider_second_line_business_practice_location_address,
provider_business_practice_location_address_city_name,
provider_business_practice_location_address_state_name,
substr(provider_business_practice_location_address_postal_code, 1, 5)
AS provider_business_practice_location_address_postal_code,
provider_business_practice_location_address_telephone_number,
provider_business_practice_location_address_fax_number
FROM individuals
WHERE ((((provider_business_mailing_address_state_name = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))
OR ((provider_business_practice_location_address_state_name = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))))
LIMIT 100
Explain output:
Postgres Limit (cost=0.00..400.58 rows=100 width=141)
-> Seq Scan on individuals (cost=0.00..1210010.61 rows=302063 width=141)
Filter: ((provider_business_mailing_address_state_name = 'PA'::text) OR (provider_business_practice_location_address_state_name = 'PA'::text))
Any idea why?
UPDATE:
I changed the query as suggested by Craig Ringer (https://dba.stackexchange.com/a/118850/7924), but it is still table scanning:
Postgres Running
SELECT npi,
provider_first_name,
provider_last_name_legal_name,
provider_organization_name_legal_business_name,
provider_first_line_business_mailing_address,
provider_second_line_business_mailing_address,
provider_business_mailing_address_city_name,
provider_business_mailing_address_state_name,
substr(provider_business_mailing_address_postal_code, 1, 5)
AS provider_business_mailing_address_postal_code,
provider_business_mailing_address_telephone_number,
provider_business_mailing_address_fax_number,
provider_first_line_business_practice_location_address,
provider_second_line_business_practice_location_address,
provider_business_practice_location_address_city_name,
provider_business_practice_location_address_state_name,
substr(provider_business_practice_location_address_postal_code, 1, 5)
AS provider_business_practice_location_address_postal_code,
provider_business_practice_location_address_telephone_number,
provider_business_practice_location_address_fax_number
FROM individuals
WHERE ((((provider_business_mailing_address_state_name = ?))
OR ((provider_business_practice_location_address_state_name = ?))))
LIMIT 100
Explain output:
Postgres Limit (cost=0.00..400.58 rows=100 width=141)
-> Seq Scan on individuals (cost=0.00..1210010.61 rows=302063 width=141)
Filter: ((provider_business_mailing_address_state_name = 'PA'::text) OR (provider_business_practice_location_address_state_name = 'PA'::text))
-
1Because you use OR, even if you find the right values in one index, any other row might match on hte other column. Try to use UNION instead, using each part of the WHERE clause in a separate query.jkavalik– jkavalik2015年10月21日 19:15:18 +00:00Commented Oct 21, 2015 at 19:15
1 Answer 1
Your expression indexes aren't used, or usable.
You're indexing an expression that doesn't match what you're querying. The expression is on the parameter side, e.g.:
UPPER(regexp_replace(?, '\s+', ' ', 'g'))
PostgreSQL is constant-folding that based on the substituted parameter into, in the case you've supplied, the literal 'PA'
. Then looking it up against the column provider_business_mailing_address_state_name
.
If you intended to search against the uppercased, trimmed version of the provider_business_mailing_address_state_name
column, like your expression index implies, you'd have to use the expression in the SQL too, e.g.
OR ((UPPER(regexp_replace(provider_business_practice_location_address_state_name, '\s+', ' ', 'g')) = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))))
but frankly, in this case I suggest normalizing the data in the table in-place instead of relying on an expression index.
-
I'd give +2 for the normalization hint.András Váczi– András Váczi2015年10月22日 06:23:19 +00:00Commented Oct 22, 2015 at 6:23