2

I'm using PostgreSQL 9.3. I want to understand if I have an option to make a constraint unique across the entire table vs. unique across a subset of the table (i.e. by using 2 columns in the unique constraint, I restrict the uniqueness), which one is better for lookups?

Consider this table where a unique alphanumeric code is allotted to each student of the class.

CREATE TABLE sc_table (
 name text NOT NULL,
 code text NOT NULL,
 class_id integer NOT NULL,
 CONSTRAINT class_fk FOREIGN KEY (class_id) REFERENCES class (id),
 CONSTRAINT sc_uniq UNIQUE (code)
);

Currently the code is unique across the entire table. However the specification says that it is sufficient for the code to be unique across the class only. For my design requirements there's no restriction either way.

However if I change the constraint to be unique for a given class only, how would it affect lookup by code?

Or, in other words, which of the following combination of constraint & lookup is the best speed wise:

-- 1. unique across entire table, lookup by value
CONSTRAINT sc_uniq UNIQUE (code) 
SELECT * FROM sc_table WHERE code='alpha-2-beta'
-- 2. unique across entire table, lookup by value & class
CONSTRAINT sc_uniq UNIQUE (code) 
SELECT * FROM sc_table WHERE class_id=1 AND code='alpha-2-beta' 
-- 3. unique per class, lookup by value
CONSTRAINT sc_uniq UNIQUE (code, class_id) 
SELECT * FROM sc_table WHERE code='alpha-2-beta'
-- 4. unique per class, lookup by value & class
CONSTRAINT sc_uniq UNIQUE (code, class_id) 
SELECT * FROM sc_table WHERE class_id=1 AND code='alpha-2-beta'

Question: My understanding is that 2 is better than 1 & 4 is better than 3. But which one's better between 1-vs-3 & 2-vs-4?

Update : Adding output of explain analyze. 3 is bad because there's no index for the lookup. 2 seems to be the best but the table is too small to conclude that.

-- 1
"Index Scan using sc_uniq on sc_table (cost=0.15..8.17 rows=1 width=72) (actual time=0.041..0.044 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
"Total runtime: 0.096 ms"
-- 2
"Index Scan using sc_uniq on sc_table (cost=0.15..8.17 rows=1 width=72) (actual time=0.024..0.026 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
" Filter: (class_id = 1)"
"Total runtime: 0.056 ms"
-- 3
"Bitmap Heap Scan on sc_table2 (cost=4.18..12.64 rows=4 width=72) (actual time=0.052..0.053 rows=1 loops=1)"
" Recheck Cond: (code = 'code1'::text)"
" -> Bitmap Index Scan on sc_uniq2 (cost=0.00..4.18 rows=4 width=0) (actual time=0.039..0.039 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
"Total runtime: 0.121 ms"
-- 4
"Index Scan using sc_uniq2 on sc_table2 (cost=0.15..8.17 rows=1 width=72) (actual time=0.036..0.039 rows=1 loops=1)"
" Index Cond: ((code = 'code1'::text) AND (class_id = 1))"
"Total runtime: 0.093 ms"
asked Dec 20, 2014 at 18:24

3 Answers 3

1

Your combinations in order of typical performance:

1. > 2. > 4. ( > 3.)

3. is invalid. If rows are only unique per (code, class_id), the lookup by code alone can return multiple rows and is different from the rest.

2. is pointless. If code is unique, there is no point in adding another predicate on class_id - except to verify that a given code actually belongs to a given class_id (and get no row otherwise).

Only 1. and 4. make sense and I would go with 1., of course. Unless you have additional requirements for the values of code, it's much more efficient to have one unique column. You could also make it the PK. Queries are simpler (one predicate instead of two), the (automatically created) unique index is potentially smaller (the most important factor here!), the lookup is typically slightly faster.

UPDATEs are also potentially more expensive for 2., where more columns trigger index updates. An UPDATE changing only code_id is cheaper for 1..

Your test result for 1. is counter-intuitive, maybe an artifact of your specific setup. Maybe you didn't prewarm the cache? Or some other random factor. It's pretty obvious from the EXPLAIN output: the only difference between 1. and 2. is the additional Filter: (class_id = 1) step. Nothing to gain here, you can only lose (even if very little in this case). 2. is typically a bit slower than 1. And 4. is also typically a bit slower than 1.

answered Dec 20, 2014 at 23:26
2
  • For the case 2 above, won't an additional filter on class_id reduce the search space, and hence be better than 1 ? Commented Jan 14, 2015 at 13:16
  • @user4150760: No. Nothing is faster than a single-column unique index. The relevant effect of including another column will be the growing index size, which makes everything a bit slower. Of course, if speed is essential, use a simple integer type PK instead of the slower string you display for code. Commented Jan 16, 2015 at 4:30
1

Since you are on Postgres 9.3, you are missing asking one of the most important questions, which is which columns do you really need the query to return? Do you really need *? If not, you may consider designing an index that can perform an Index Only Scan and avoid visiting the table at all. This would be the primary reason in your case to go with a composite key. For instance, if you wanted to look up name by code, you may try out this index and query:

CREATE UNIQUE INDEX sc_lookup ON sc_table (code, name);
SELECT name FROM sc_table WHERE code = 'alpha-2-beta';

This would normally give you an Index Only Scan. There are more factors to consider as Erwin points out well, but the potential for an Index Only Scan can be one of the most significant performance benefits if your lookup scenario fits.

answered Oct 1, 2015 at 13:26
0

According to the postgresql documentation here, single column indexes are preferred for performance reasons:

Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves space and time.

So if you can get away with indexing one column, then do so. But I don't think it's that much more to index a combination of two columns, so if a situation calls for it don't shy away from it.

answered Jun 26, 2024 at 22:43
1
  • 1
    As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. Commented Jun 27, 2024 at 10:51

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.