0

I am running PostgreSQL 9.3. The database is of all of the medical NPIs (national provider numbers). I have created the following indexes:

CREATE INDEX indiv_provider_business_mailing_address_state_name_string_index
 ON individuals (provider_business_mailing_address_state_name);
CREATE INDEX indiv_provider_business_practice_location_address_state_name_string_index
 ON individuals (provider_business_practice_location_address_state_name);
CREATE INDEX indiv_upper_provider_business_mailing_address_state_name_string_index
 ON individuals (UPPER(provider_business_mailing_address_state_name));
CREATE INDEX indiv_upper_provider_business_practice_location_address_state_name_string_index
 ON individuals (UPPER(provider_business_practice_location_address_state_name));

but the query still table scans:

Postgres Running
 SELECT npi, provider_first_name,
 provider_last_name_legal_name,
 provider_organization_name_legal_business_name,
 provider_first_line_business_mailing_address,
 provider_second_line_business_mailing_address,
 provider_business_mailing_address_city_name,
 provider_business_mailing_address_state_name,
 substr(provider_business_mailing_address_postal_code, 1, 5)
 AS provider_business_mailing_address_postal_code,
 provider_business_mailing_address_telephone_number,
 provider_business_mailing_address_fax_number,
 provider_first_line_business_practice_location_address,
 provider_second_line_business_practice_location_address,
 provider_business_practice_location_address_city_name,
 provider_business_practice_location_address_state_name,
 substr(provider_business_practice_location_address_postal_code, 1, 5)
 AS provider_business_practice_location_address_postal_code,
 provider_business_practice_location_address_telephone_number,
 provider_business_practice_location_address_fax_number
 FROM individuals
 WHERE ((((provider_business_mailing_address_state_name = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))
 OR ((provider_business_practice_location_address_state_name = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))))
 LIMIT 100

Explain output:

Postgres Limit (cost=0.00..400.58 rows=100 width=141)
 -> Seq Scan on individuals (cost=0.00..1210010.61 rows=302063 width=141)
 Filter: ((provider_business_mailing_address_state_name = 'PA'::text) OR (provider_business_practice_location_address_state_name = 'PA'::text))

Any idea why?

UPDATE:

I changed the query as suggested by Craig Ringer (https://dba.stackexchange.com/a/118850/7924), but it is still table scanning:

Postgres Running
 SELECT npi,
 provider_first_name,
 provider_last_name_legal_name,
 provider_organization_name_legal_business_name,
 provider_first_line_business_mailing_address,
 provider_second_line_business_mailing_address,
 provider_business_mailing_address_city_name,
 provider_business_mailing_address_state_name,
 substr(provider_business_mailing_address_postal_code, 1, 5)
 AS provider_business_mailing_address_postal_code,
 provider_business_mailing_address_telephone_number,
 provider_business_mailing_address_fax_number,
 provider_first_line_business_practice_location_address,
 provider_second_line_business_practice_location_address,
 provider_business_practice_location_address_city_name,
 provider_business_practice_location_address_state_name,
 substr(provider_business_practice_location_address_postal_code, 1, 5)
 AS provider_business_practice_location_address_postal_code,
 provider_business_practice_location_address_telephone_number,
 provider_business_practice_location_address_fax_number
 FROM individuals
 WHERE ((((provider_business_mailing_address_state_name = ?))
 OR ((provider_business_practice_location_address_state_name = ?))))
 LIMIT 100

Explain output:

Postgres Limit (cost=0.00..400.58 rows=100 width=141)
 -> Seq Scan on individuals (cost=0.00..1210010.61 rows=302063 width=141)
 Filter: ((provider_business_mailing_address_state_name = 'PA'::text) OR (provider_business_practice_location_address_state_name = 'PA'::text))
asked Oct 21, 2015 at 18:23
1
  • 1
    Because you use OR, even if you find the right values in one index, any other row might match on hte other column. Try to use UNION instead, using each part of the WHERE clause in a separate query. Commented Oct 21, 2015 at 19:15

1 Answer 1

3

Your expression indexes aren't used, or usable.

You're indexing an expression that doesn't match what you're querying. The expression is on the parameter side, e.g.:

UPPER(regexp_replace(?, '\s+', ' ', 'g'))

PostgreSQL is constant-folding that based on the substituted parameter into, in the case you've supplied, the literal 'PA'. Then looking it up against the column provider_business_mailing_address_state_name.

If you intended to search against the uppercased, trimmed version of the provider_business_mailing_address_state_name column, like your expression index implies, you'd have to use the expression in the SQL too, e.g.

OR ((UPPER(regexp_replace(provider_business_practice_location_address_state_name, '\s+', ' ', 'g')) = UPPER(regexp_replace(?, '\s+', ' ', 'g'))))))

but frankly, in this case I suggest normalizing the data in the table in-place instead of relying on an expression index.

answered Oct 22, 2015 at 2:44
1
  • I'd give +2 for the normalization hint. Commented Oct 22, 2015 at 6:23

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.