
Are non-unique indexes / indices covered under the Consistency clause in ACID? (The same goes for other attributes of an index that do not place constraints on the data.) I am seeing certain performance issues (benefits, actually) in Postgres that make me wonder whether they are.

Given that indexes / indices are not first-class objects (i.e. you can't access them directly in Postgres, nor can you request their use), I see no reason at all why Postgres would be REQUIRED to support this. I can find no definition of ACID that says "indexes absolutely have to be complete, and not hacked up, before the transaction can finish".

Under certain conditions that do not place restrictions on the insert (such as the index not being unique), the index could essentially be "invalidated" (i.e. "don't use it until I'm finished reindexing"), or flags could be set that say "the index does not cover the following ranges".

If Postgres played this trick, the COPY FROM command could be made exceedingly swift (which is what I am seeing), and similarly for massive insert counts in a single transaction.
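For comparison, the trick can be done by hand today (a hedged sketch; the table, index and file names are invented):

    -- Drop the non-unique index, bulk-load with COPY, then rebuild it,
    -- so the index is built once at the end instead of row by row.
    BEGIN;
    DROP INDEX IF EXISTS measurements_ts_idx;
    COPY measurements FROM '/tmp/measurements.csv' WITH (FORMAT csv);
    CREATE INDEX measurements_ts_idx ON measurements (ts);
    COMMIT;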

I'm not just making this up...

While Redshift is a bad example, Amazon weasels out of Consistency by playing tricks with how it stores the (one and only) sort key (essentially a primary-index-ish construct in Redshift). Until one runs a VACUUM command, the sort key just keeps degrading and your database starts becoming a black hole: queries go in, but no results come out.

Clearly, an internalized vacuum regimen would prevent the Redshift silliness that often occurs during mass imports.
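(A minimal sketch of such a regimen, with an invented table name; this is what Redshift expects you to run after big loads:)

    -- Re-sort the rows according to the sort key and refresh planner
    -- statistics, which Redshift currently leaves to the user.
    VACUUM FULL events;
    ANALYZE events;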

asked May 22, 2017 at 22:37
  • The name is "PostgreSQL" or "Postgres" for short. Not "PostGres" and not "PSQL". And I do not understand what you are asking exactly. Commented May 22, 2017 at 23:53
  • @ErwinBrandstetter Thanks for the correction! The question was whether indices are guaranteed to be up-to-date after a massive insert in Postgres, or whether they're effectively neutered until they have finished reindexing. Consider a case where an insert changes a portion of the index: the transaction could be held off until reindexing completes, or the index could be invalidated for at least a portion of its range. Aside from queries (hopefully) using that index, the only thing seen should be a slowdown in queries; the results would still be consistent. Commented May 23, 2017 at 1:57
  • Note that there is no technical difference between the primary key index and any other index on a table. The term "secondary index" doesn't really apply (or make sense) in Postgres. Commented May 23, 2017 at 6:17

1 Answer


Are non-unique indexes / indices covered under the Consistency clause in ACID?

Yes. Any violation of that would be considered a bug in PostgreSQL.

The docs you quoted describe cases where Postgres might have to temporarily scan the heap instead of doing an index-only scan, or otherwise do extra work to get a consistent result.
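You can see that extra work directly (a small, hedged demo with invented table data): an index-only scan reports "Heap Fetches" for pages the visibility map doesn't yet mark all-visible, and that number drops after a VACUUM.

    -- Heap fetches during an index-only scan, before and after VACUUM.
    CREATE TABLE t (id int, payload text);
    INSERT INTO t SELECT g, 'x' FROM generate_series(1, 100000) g;
    CREATE INDEX t_id_idx ON t (id);
    EXPLAIN (ANALYZE) SELECT id FROM t WHERE id < 100;  -- may show Heap Fetches > 0
    VACUUM t;                                           -- sets the visibility map
    EXPLAIN (ANALYZE) SELECT id FROM t WHERE id < 100;  -- Heap Fetches should drop to 0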

For example, both BRIN and GIN indexes accumulate batches of pending changes and then apply them in bulk. When such an index is used in a query, this pending queue is also scanned to make sure an up-to-date, consistent view is seen.
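For GIN in particular, the pending list is configurable per index and can be flushed by hand (a hedged sketch; the table, column and index names are invented):

    -- fastupdate enables the pending list; gin_pending_list_limit caps it (in kB).
    CREATE INDEX docs_body_gin ON docs
        USING gin (to_tsvector('english', body))
        WITH (fastupdate = on, gin_pending_list_limit = 4096);

    -- Move queued entries into the main index structure explicitly.
    SELECT gin_clean_pending_list('docs_body_gin'::regclass);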

If an index is currently invalid, it'll be skipped by the planner and won't be used by queries.
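Invalid indexes show up in the catalogs, so they're easy to spot (a small sketch against pg_index; nothing here is table-specific):

    -- List indexes the planner is currently ignoring, e.g. after a
    -- failed CREATE INDEX CONCURRENTLY.
    SELECT c.relname AS index_name
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    WHERE NOT i.indisvalid;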

Redshift isn't really PostgreSQL, it just happens to share the same front-end and protocol. Drawing comparisons based on Redshift will typically just create confusion.

answered May 23, 2017 at 2:37
  • Cool. The obvious followup would be: with the same level of performance? Commented May 23, 2017 at 2:43
  • Ugh, hit return too soon. Same level of performance, as opposed to, say, bifurcating the index until the next vacuum? Commented May 23, 2017 at 2:44
  • Postgres won't ever duplicate an index like that for normal operations. About the only things that do that are operations that perform full-table rewrites or full index rewrites, like vacuum full, cluster, some forms of alter table, etc. Indexes are MVCC like the heap, so they can accumulate dead tuples. This is also true for unique indexes, though, since Postgres always uses heaps; it doesn't have index-organised ("clustered") tables. Commented May 23, 2017 at 2:46
  • @MarkGerolimatos Redshift isn't PostgreSQL. Its backend is completely different. Commented May 23, 2017 at 2:53
  • @MarkGerolimatos Bulk inserts will grow an index (more I/O) and cause I/O contention. They'll also tend to contend for cache memory. Bulk updates will bloat the index, creating dead pages that may still need to be read, and dead pages on the heap too. So slowdowns are expected, but they are not going to be related to locking. Look up the MVCC chapter in the PostgreSQL docs. And yes, invalid indexes may be skipped over, but indexes do not become invalid under normal conditions. Commented May 23, 2017 at 3:17
