Fix and clarify comments on replacement selection.

author Heikki Linnakangas <heikki.linnakangas@iki.fi>
Thu, 15 Sep 2016 08:51:43 +0000 (11:51 +0300)
committer Heikki Linnakangas <heikki.linnakangas@iki.fi>
Thu, 15 Sep 2016 08:51:43 +0000 (11:51 +0300)

These were modified by the patch to only use replacement selection for the
first run in an external sort.

src/backend/utils/sort/tuplesort.c

diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index d600670d26d368b24bb4652c47a04a0b0a1ba44e..16ceb30b2737f97e417890ec976905f9302ac31a 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -13,26 +13,26 @@
  * See Knuth, volume 3, for more than you want to know about the external
  * sorting algorithm. Historically, we divided the input into sorted runs
  * using replacement selection, in the form of a priority tree implemented
- * as a heap (essentially his Algorithm 5.2.3H -- although that strategy is
- * often avoided altogether), but that can now only happen first the first
- * run. We merge the runs using polyphase merge, Knuth's Algorithm
+ * as a heap (essentially his Algorithm 5.2.3H), but now we only do that
+ * for the first run, and only if the run would otherwise end up being very
+ * short. We merge the runs using polyphase merge, Knuth's Algorithm
  * 5.4.2D. The logical "tapes" used by Algorithm D are implemented by
  * logtape.c, which avoids space wastage by recycling disk space as soon
  * as each block is read from its "tape".
  *
- * We never form the initial runs using Knuth's recommended replacement
- * selection data structure (Algorithm 5.4.1R), because it uses a fixed
- * number of records in memory at all times. Since we are dealing with
- * tuples that may vary considerably in size, we want to be able to vary
- * the number of records kept in memory to ensure full utilization of the
- * allowed sort memory space. So, we keep the tuples in a variable-size
- * heap, with the next record to go out at the top of the heap. Like
- * Algorithm 5.4.1R, each record is stored with the run number that it
- * must go into, and we use (run number, key) as the ordering key for the
- * heap. When the run number at the top of the heap changes, we know that
- * no more records of the prior run are left in the heap. Note that there
- * are in practice only ever two distinct run numbers, due to the greatly
- * reduced use of replacement selection in PostgreSQL 9.6.
+ * We do not use Knuth's recommended data structure (Algorithm 5.4.1R) for
+ * the replacement selection, because it uses a fixed number of records
+ * in memory at all times. Since we are dealing with tuples that may vary
+ * considerably in size, we want to be able to vary the number of records
+ * kept in memory to ensure full utilization of the allowed sort memory
+ * space. So, we keep the tuples in a variable-size heap, with the next
+ * record to go out at the top of the heap. Like Algorithm 5.4.1R, each
+ * record is stored with the run number that it must go into, and we use
+ * (run number, key) as the ordering key for the heap. When the run number
+ * at the top of the heap changes, we know that no more records of the prior
+ * run are left in the heap. Note that there are in practice only ever two
+ * distinct run numbers, because since PostgreSQL 9.6, we only use
+ * replacement selection to form the first run.
  *
  * In PostgreSQL 9.6, a heap (based on Knuth's Algorithm H, with some small
  * customizations) is only used with the aim of producing just one run,
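
The (run number, key) heap ordering described in the comment above can be illustrated with a small standalone sketch. This is not code from tuplesort.c: the names HeapEntry, HEAP_MAX and replacement_selection are hypothetical, the keys are plain ints, and memory is bounded by a fixed entry count rather than by bytes, so it shows textbook replacement selection across several runs rather than the restricted first-run-only behaviour of PostgreSQL 9.6.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_MAX 64             /* hypothetical bound, for this sketch only */

typedef struct
{
    int run;                    /* run this entry must be written to */
    int key;                    /* sort key */
} HeapEntry;

/* Order by (run number, key): all of run N sorts before any of run N+1. */
static int
entry_less(HeapEntry a, HeapEntry b)
{
    if (a.run != b.run)
        return a.run < b.run;
    return a.key < b.key;
}

/* Insert an entry, sifting it up from the new leaf position. */
static void
heap_push(HeapEntry *heap, size_t *n, HeapEntry e)
{
    size_t i = (*n)++;

    while (i > 0)
    {
        size_t parent = (i - 1) / 2;

        if (!entry_less(e, heap[parent]))
            break;
        heap[i] = heap[parent];
        i = parent;
    }
    heap[i] = e;
}

/* Remove and return the smallest entry, sifting the last leaf down. */
static HeapEntry
heap_pop(HeapEntry *heap, size_t *n)
{
    HeapEntry top = heap[0];
    HeapEntry last = heap[--(*n)];
    size_t i = 0;

    for (;;)
    {
        size_t child = 2 * i + 1;

        if (child >= *n)
            break;
        if (child + 1 < *n && entry_less(heap[child + 1], heap[child]))
            child++;
        if (!entry_less(heap[child], last))
            break;
        heap[i] = heap[child];
        i = child;
    }
    if (*n > 0)
        heap[i] = last;
    return top;
}

/*
 * Split input[0..n) into sorted runs while holding at most `capacity`
 * entries in memory. Keys are written to out[] in emission order and the
 * run number of each key to out_run[]. Returns the number of runs.
 */
static int
replacement_selection(const int *input, size_t n, size_t capacity,
                      int *out, int *out_run)
{
    HeapEntry heap[HEAP_MAX];
    size_t heap_size = 0, next = 0, emitted = 0;
    int current_run = 0;

    assert(capacity > 0 && capacity <= HEAP_MAX);

    /* Fill memory; everything read before the first output fits run 0. */
    while (next < n && heap_size < capacity)
    {
        HeapEntry e = {0, input[next++]};

        heap_push(heap, &heap_size, e);
    }

    while (heap_size > 0)
    {
        HeapEntry top = heap_pop(heap, &heap_size);

        /* Run number at the top changed: the prior run is exhausted. */
        if (top.run != current_run)
            current_run = top.run;

        out[emitted] = top.key;
        out_run[emitted] = top.run;
        emitted++;

        if (next < n)
        {
            HeapEntry e;

            e.key = input[next++];
            /* A key below the one just written cannot extend this run. */
            e.run = (e.key < top.key) ? current_run + 1 : current_run;
            heap_push(heap, &heap_size, e);
        }
    }
    return n > 0 ? current_run + 1 : 0;
}
```

With the input 5, 1, 9, 2, 8, 3, 7 and a capacity of 3, this sketch emits 1 2 5 8 9 as run 0 and 3 7 as run 1. At any moment the heap holds entries from at most two runs, the current one and the next, which is why a change of run number at the top is enough to detect a run boundary.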

This is the main PostgreSQL git repository.
