Decrease size of database (with an expression index)?

Question 1

This is my current table definition in a Postgres 10.1-1 database:

CREATE TYPE CUSTOMER_TYPE AS ENUM
('enum1', 'enum2', 'enum3', '...', 'enum15'); -- max length of enum names ~15
CREATE TABLE CUSTOMER(
 CUSTOMER_ONE TEXT PRIMARY KEY NOT NULL, -- max 35 char String
 ATTRIBUTE_ONE TEXT UNIQUE, -- max 35 char String
 ATTRIBUTE_TWO TEXT, -- 1-80 char String
 PRIVATEKEYTYPE CUSTOMER_TYPE -- see enum
);

The database ends up about 4.2 times the size of the inserted data: 50 MB of data in 700,000 rows produces a 210 MB database.

Attribute_One is computed as hash(Customer_One).

Requirements: fast searches (using algorithms) for columns CUSTOMER_ONE and ATTRIBUTE_ONE. (That's why I think I need an index.)

Typical search query:

select * from customer
where Customer_One='XXX' OR Attribute_One='XXX';

Each SELECT matches at most one row among millions of rows.

Is it possible to decrease the DB size further? I have been told to use an expression index but don't fully understand how it works. A short explanation with an example index, or another solution, would be great.

Is insert speed affected by those indexes? The faster the better. (To be clear: search speed is more important than insert speed.)

Comment 1

You omitted important details. In a comment on your previous question you mentioned that Attribute_One is calculated from Customer_One. How exactly? And what "algorithms" do you use in your "fast searches"? Show a typical example query. Also: your version of Postgres, please.

Comment 2

Attribute_One = Hash(Customer_One). I don't use any specific search algorithms; I want Postgres to use them (and be able to do so). Each SELECT finds at most one matching row among millions of rows. Postgres version: 10.1-1. Typical search: select * from customer where Customer_One='XXX' OR Attribute_One='XXX'. Thank you :)

Comment 3

All defining information should go into your question, not comments.

Comment 4

Done! Sorry, I wasn't aware of that.

Answer

If hash() is an IMMUTABLE function (which should be the case for a function called "hash"!) you can omit storing the functionally dependent attribute_one in the table altogether and add an expression index to support queries on the expression hash(customer_one):

CREATE TABLE customer (
 privatekeytype customer_type -- move the enum to 1st pos to save some more 
 , customer_one text PRIMARY KEY
 , attribute_two text
);

Expression index:

CREATE INDEX customer_attribute_one_idx ON customer (hash(customer_one));

This index is exactly as big as the one supporting your original UNIQUE constraint on the redundant column attribute_one, since it stores the same values.
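One difference from the original design is worth noting: the plain index above does not enforce uniqueness, while the original attribute_one column carried a UNIQUE constraint. If that guarantee is still wanted, a unique expression index is one way to keep it. This is a sketch that assumes the customer table and the hash() function from this answer already exist; the index name is made up for illustration.

```sql
-- Assumption: customer and hash() exist as defined in this answer.
-- The index name is hypothetical; use it instead of (not in addition to)
-- the plain expression index, since both would store the same values.
CREATE UNIQUE INDEX customer_attribute_one_uni
    ON customer (hash(customer_one));
```

With this variant, an INSERT fails if two different customer_one values hash to the same result, which matches the behavior of the original constraint.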

Query:

SELECT *
FROM customer 
WHERE 'XXX' IN (customer_one, hash(customer_one));

Testing with EXPLAIN you'll see index or bitmap index scans like:

->  BitmapOr  (cost=5.34..5.34 rows=5 width=0)
      ->  Bitmap Index Scan on customer_pkey  (cost=0.00..2.66 rows=1 width=0)
            Index Cond: ('XXX'::text = customer.customer_one)
      ->  Bitmap Index Scan on customer_attribute_one_idx  (cost=0.00..2.68 rows=4 width=0)
            Index Cond: ('XXX'::text = hash(customer.customer_one))
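To reproduce a plan like this, prefix the query with EXPLAIN. The sketch below assumes the table, the index, and the hash() function from this answer; the costs and row estimates you see will differ with your data and statistics.

```sql
-- Show the planner's chosen plan without running the query.
EXPLAIN
SELECT *
FROM   customer
WHERE  'XXX' IN (customer_one, hash(customer_one));

-- Optional: actually execute the query and report real timings
-- and buffer usage alongside the plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   customer
WHERE  'XXX' IN (customer_one, hash(customer_one));
```

The expression in the WHERE clause has to match the indexed expression, here hash(customer_one), for the planner to consider the expression index.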

Performance is about the same as with the redundant table column, or faster, since the table is now smaller, which helps overall performance in various ways.
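To check how much space the change actually saves, you could compare sizes before and after with the built-in size functions. A minimal sketch, assuming the table and index names used in this answer:

```sql
-- Table data only, all indexes together, and the grand total
-- (table + indexes + TOAST) for the customer table.
SELECT pg_size_pretty(pg_relation_size('customer'))       AS table_size,
       pg_size_pretty(pg_indexes_size('customer'))        AS indexes_size,
       pg_size_pretty(pg_total_relation_size('customer')) AS total_size;

-- Size of the expression index alone.
SELECT pg_size_pretty(pg_relation_size('customer_attribute_one_idx')) AS expr_index_size;
```

Run it on a freshly loaded copy of the data for a fair comparison, since dead rows from earlier updates or deletes also count toward these numbers.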

Moving the enum to first position saves a few bytes of alignment padding per row as explained in my previous answer:

Why is my database 12 times bigger than inserted data?

Why does the function have to be IMMUTABLE? See:
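In short, an index stores the function result at write time, so the same input must always produce the same output; Postgres only accepts functions marked IMMUTABLE in index expressions. The question does not show how hash() is defined, so the following is only a hypothetical stand-in built on the built-in md5() function, to illustrate what such a declaration can look like.

```sql
-- Hypothetical stand-in: the real hash() from the question is not shown.
-- md5(text) is a built-in immutable function, so wrapping it and
-- declaring the wrapper IMMUTABLE is consistent.
CREATE FUNCTION hash(text)
  RETURNS text
  LANGUAGE sql
  IMMUTABLE
AS $$ SELECT md5($1) $$;

-- Check how an existing function is declared:
-- provolatile is 'i' (immutable), 's' (stable) or 'v' (volatile).
SELECT proname, provolatile
FROM   pg_proc
WHERE  proname = 'hash';
```

Declaring a function IMMUTABLE when its result can actually change would leave stale values in the index, so the label should only be used when it is true.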

score 1 · Accepted Answer · 2017-12-15 13:59:18Z
