Performance of single vs multiple column unique constraint

Question 1

I'm using PostgreSQL 9.3. I want to understand if I have an option to make a constraint unique across the entire table vs. unique across a subset of the table (i.e. by using 2 columns in the unique constraint, I restrict the uniqueness), which one is better for lookups?

Consider this table where a unique alphanumeric code is allotted to each student of the class.

CREATE TABLE sc_table (
 name text NOT NULL,
 code text NOT NULL,
 class_id integer NOT NULL,
 CONSTRAINT class_fk FOREIGN KEY (class_id) REFERENCES class (id),
 CONSTRAINT sc_uniq UNIQUE (code)
);

Currently the code is unique across the entire table. However the specification says that it is sufficient for the code to be unique across the class only. For my design requirements there's no restriction either way.

However if I change the constraint to be unique for a given class only, how would it affect lookup by code?

Or, in other words, which of the following combination of constraint & lookup is the best speed wise:

-- 1. unique across entire table, lookup by value
CONSTRAINT sc_uniq UNIQUE (code) 
SELECT * FROM sc_table WHERE code='alpha-2-beta'
-- 2. unique across entire table, lookup by value & class
CONSTRAINT sc_uniq UNIQUE (code) 
SELECT * FROM sc_table WHERE class_id=1 AND code='alpha-2-beta' 
-- 3. unique per class, lookup by value
CONSTRAINT sc_uniq UNIQUE (code, class_id) 
SELECT * FROM sc_table WHERE code='alpha-2-beta'
-- 4. unique per class, lookup by value & class
CONSTRAINT sc_uniq UNIQUE (code, class_id) 
SELECT * FROM sc_table WHERE class_id=1 AND code='alpha-2-beta'

Question: My understanding is that 2 is better than 1 & 4 is better than 3. But which one's better between 1-vs-3 & 2-vs-4?

Update : Adding output of explain analyze. 3 is bad because there's no index for the lookup. 2 seems to be the best but the table is too small to conclude that.

-- 1
"Index Scan using sc_uniq on sc_table (cost=0.15..8.17 rows=1 width=72) (actual time=0.041..0.044 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
"Total runtime: 0.096 ms"
-- 2
"Index Scan using sc_uniq on sc_table (cost=0.15..8.17 rows=1 width=72) (actual time=0.024..0.026 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
" Filter: (class_id = 1)"
"Total runtime: 0.056 ms"
-- 3
"Bitmap Heap Scan on sc_table2 (cost=4.18..12.64 rows=4 width=72) (actual time=0.052..0.053 rows=1 loops=1)"
" Recheck Cond: (code = 'code1'::text)"
" -> Bitmap Index Scan on sc_uniq2 (cost=0.00..4.18 rows=4 width=0) (actual time=0.039..0.039 rows=1 loops=1)"
" Index Cond: (code = 'code1'::text)"
"Total runtime: 0.121 ms"
-- 4
"Index Scan using sc_uniq2 on sc_table2 (cost=0.15..8.17 rows=1 width=72) (actual time=0.036..0.039 rows=1 loops=1)"
" Index Cond: ((code = 'code1'::text) AND (class_id = 1))"
"Total runtime: 0.093 ms"

Question 2

Your combinations in order of typical performance:

1. > 2. > 4. ( > 3.)

3. is invalid. If rows are only unique per (code, class_id), the lookup by code alone can return multiple rows and is different from the rest.

2. is pointless. If code is unique, there is no point in adding another predicate on class_id - except to verify that a given code actually belongs to a given class_id (and get no row otherwise).

Only 1. and 4. make sense and I would go with 1., of course. Unless you have additional requirements for the values of code, it's much more efficient to have one unique column. You could also make it the PK. Queries are simpler (one predicate instead of two), the (automatically created) unique index is potentially smaller (the most important factor here!), the lookup is typically slightly faster.

UPDATEs are also potentially more expensive for 2., where more columns trigger index updates. An UPDATE changing only code_id is cheaper for 1..

Your test result for 1. is counter-intuitive, maybe an artifact of your specific setup. Maybe you didn't prewarm the cache? Or some other random factor. It's pretty obvious from the EXPLAIN output: the only difference between 1. and 2. is the additional Filter: (class_id = 1) step. Nothing to gain here, you can only lose (even if very little in this case). 2. is typically a bit slower than 1. And 4. is also typically a bit slower than 1.

Question 3

For the case 2 above, won't an additional filter on class_id reduce the search space, and hence be better than 1 ?

Question 4

@user4150760: No. Nothing is faster than a single-column unique index. The relevant effect of including another column will be the growing index size, which makes everything a bit slower. Of course, if speed is essential, use a simple integer type PK instead of the slower string you display for code.

Question 5

Since you are on Postgres 9.3, you are missing asking one of the most important questions, which is which columns do you really need the query to return? Do you really need *? If not, you may consider designing an index that can perform an Index Only Scan and avoid visiting the table at all. This would be the primary reason in your case to go with a composite key. For instance, if you wanted to look up name by code, you may try out this index and query:

CREATE UNIQUE INDEX sc_lookup ON sc_table (code, name);
SELECT name FROM sc_table WHERE code = 'alpha-2-beta';

This would normally give you an Index Only Scan. There are more factors to consider as Erwin points out well, but the potential for an Index Only Scan can be one of the most significant performance benefits if your lookup scenario fits.

Question 6

According to the postgresql documentation here, single column indexes are preferred for performance reasons:

Multicolumn indexes should be used sparingly. In most situations, an index on a single column is sufficient and saves space and time.

So if you can get away with indexing one column, then do so. But I don't think it's that much more to index a combination of two columns, so if a situation calls for it don't shy away from it.

Question 7

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

score 1 · Accepted Answer · 2014-12-20 23:26:16Z

Your combinations in order of typical performance:

1. > 2. > 4. ( > 3.)

3. is invalid. If rows are only unique per (code, class_id), the lookup by code alone can return multiple rows and is different from the rest.

2. is pointless. If code is unique, there is no point in adding another predicate on class_id - except to verify that a given code actually belongs to a given class_id (and get no row otherwise).

Only 1. and 4. make sense and I would go with 1., of course. Unless you have additional requirements for the values of code, it's much more efficient to have one unique column. You could also make it the PK. Queries are simpler (one predicate instead of two), the (automatically created) unique index is potentially smaller (the most important factor here!), the lookup is typically slightly faster.

UPDATEs are also potentially more expensive for 2., where more columns trigger index updates. An UPDATE changing only code_id is cheaper for 1..

Your test result for 1. is counter-intuitive, maybe an artifact of your specific setup. Maybe you didn't prewarm the cache? Or some other random factor. It's pretty obvious from the EXPLAIN output: the only difference between 1. and 2. is the additional Filter: (class_id = 1) step. Nothing to gain here, you can only lose (even if very little in this case). 2. is typically a bit slower than 1. And 4. is also typically a bit slower than 1.

For the case 2 above, won't an additional filter on class_id reduce the search space, and hence be better than 1 ?
@user4150760: No. Nothing is faster than a single-column unique index. The relevant effect of including another column will be the growing index size, which makes everything a bit slower. Of course, if speed is essential, use a simple integer type PK instead of the slower string you display for code.

Stack Exchange Network

Performance of single vs multiple column unique constraint

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Performance of single vs multiple column unique constraint

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions