I created the following table and inserted some values into it as follows:
CREATE TABLE query_all_lexeme (
payload text,
normalized tsvector GENERATED ALWAYS AS (to_tsvector('english', payload)) STORED
);
INSERT INTO query_all_lexeme (payload)
VALUES ('fat cats ate rats');
INSERT INTO query_all_lexeme (payload)
VALUES ('summarize the functions and operators that are provided for full text searching');
INSERT INTO query_all_lexeme (payload)
VALUES ('Constructs a phrase query');
INSERT INTO query_all_lexeme (payload)
SELECT
'Constructs a phrase query this is a test'
FROM
generate_series(1, 10000);
INSERT INTO query_all_lexeme (payload)
SELECT
'Constructs a phrase query this is a test'
FROM
generate_series(1, 100);
Then I created a gin index:
CREATE INDEX query_all_lexeme_vector ON query_all_lexeme USING gin (normalized);
And then I run the query below to get gin index info:
SELECT * FROM
gin_metapage_info (get_raw_page ('query_all_lexeme_vector', 0)) \gx
Result:
+-[ RECORD 1 ]-----+------------+
| pending_head | 4294967295 |
| pending_tail | 4294967295 |
| tail_free_size | 0 |
| n_pending_pages | 0 |
| n_pending_tuples | 0 |
| n_total_pages | 14 |
| n_entry_pages | 1 |
| n_data_pages | 12 |
| n_entries | 15 |
| version | 2 |
+------------------+------------+
WITH cte AS (
SELECT
flags,
p
FROM
generate_series(1, 13) AS p,
gin_page_opaque_info (get_raw_page ('query_all_lexeme_vector', p)))
SELECT
array_agg(p)
FROM
cte
WHERE
flags::text = '{data,leaf,compressed}';
return
+----------------------+
| array_agg |
+----------------------+
| {3,4,6,7,9,10,12,13} |
+----------------------+
In the following query, I should expect at least one row for the column gin_tid_vs_table_tid
value that is true. However, the column values are all false.
WITH cte AS (
SELECT
(unnest(normalized)).lexeme AS elements
, array_agg(ctid) AS ctids
FROM
query_all_lexeme
GROUP BY
1
)
SELECT
elements
, pg_typeof(ctids)
, ctids = (
SELECT
tids
FROM
gin_leafpage_items (get_raw_page ('query_all_lexeme_vector' , 3))
ORDER BY
1
LIMIT 1) AS gin_tid_vs_table_tid
FROM
cte;
I have already run vacuum analyze. Now the data is very stable (only SELECT
), and the tid
value is stabilized. So why the last query's column gin_tid_vs_table_tid
has false values?
My logic is like gin
index stored lexemes
and lexemes
' corresponding tid
(physical tuple location). So the gin
index tid
should be equal as array_agg(ctid)
with the same lexeme
.
1 Answer 1
Your logic would only apply if one index entry were guaranteed to contain all the ctids for a given lexeme. There is no such guarantee, and can't be because index tuples have a strictly bounded size which is far less than enough to contain all possible ctids.
Maybe you could switch from array equality to overlaps or contains (&&
, @>
).
-
but the flags is compressed? also in the menu: A GIN index stores a set of (key, posting list) pairs, where a posting list is a set of row IDs in which the key occurs. The same row ID can appear in multiple posting lists, since an item can contain more than one key. Each key value is stored only once, so a GIN index is very compact for cases where the same key appears many times.jian– jian2022年06月27日 04:36:39 +00:00Commented Jun 27, 2022 at 4:36