I have a table with one billion rows and more than 50 columns. I need to reduce its size and speed up queries, backups, exports, etc. Some columns contain, for example, only hundreds of distinct values, which are long URLs (text data type), names of the applications used, and similar duplicated information.
Is there a tool or script for PostgreSQL 9.3+ that can easily create dictionaries of distinct values for selected columns in other tables, and then replace the original values with a smallint identifier from that dictionary? Or do I have to write the SQL for that manually?
TableOriginal
1;VeryLongURLText
2;VeryLongURLText
3;LoooongURLText
4;LoooongURLText
5;LoooongURLText
TableDictionary
1;VeryLongURLText
2;LoooongURLText
TableUpdated
1;1
2;1
3;2
4;2
5;2
Thank you.
-
postgresql.org/docs/9.5/static/sql-createdomain.html Is CREATE DOMAIN the function that should be used? – dkocich, May 19, 2016 at 7:30
-
A domain is a kind of data type. It's a shorthand for commonly used column types where you can, e.g., enforce check constraints that should be applied to all columns that store the same type of information. – user1822, May 19, 2016 at 8:11
-
For a billion rows it is better to create all of your dictionaries first, and then make a copy of your original table using all of the dictionaries. Also, consider using smallint (i.e. int2) rather than int4 for dictionaries with few expected values. – Ezequiel Tolnay, May 19, 2016 at 8:55
1 Answer
Do I have to write SQL for that manually?
Yes, but it's not that hard:
create table original (id integer, url text);
insert into original
values
(1,'VeryLongURLText'),
(2,'VeryLongURLText'),
(3,'LoooongURLText'),
(4,'LoooongURLText'),
(5,'LoooongURLText');
Create the dictionary:
create table dictionary (id serial, url text);
insert into dictionary (url)
select distinct url
from original;
This creates the table with the following content (which id each URL gets may differ, because select distinct does not guarantee any order):
id | url
---+----------------
1 | LoooongURLText
2 | VeryLongURLText
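As suggested in the comments, a smallint key is enough when a column has only hundreds of distinct values. A minimal sketch of that variant follows; the table name dictionary_small is made up for illustration, and the unique constraint assumes every URL is short enough for a btree index entry (very long values can be rejected by the index):

```sql
-- Hypothetical variant of the dictionary table with a 2-byte key.
-- smallserial is a smallint column backed by a sequence; it tops out at 32767.
create table dictionary_small (
    id  smallserial primary key,
    url text not null unique   -- assumption: each URL fits in a btree index entry
);

-- Fill it with the distinct values; NULL urls are skipped here and
-- would need separate handling when the new table is built.
insert into dictionary_small (url)
select distinct url
from original
where url is not null;
```

The unique constraint also gives the later join on url an index to use, which may or may not matter compared to a hash join over a few hundred rows.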
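Now create a new table based on the dictionary (some_column and other_column stand for the remaining columns of your real table; list every column except url):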
create table compressed
as
select o.id, o.some_column, o.other_column, d.id as dictionary_id
from original o
join dictionary d on o.url = d.url;
As your goal is to reduce the space overhead, it's better to create a new table with the dictionary id rather than altering the existing one. This will also be a lot faster than updating all rows in the existing table (with a billion rows it will still take some time, however).
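To check whether the copy actually saved space, and to let existing queries keep reading the URL text, something along these lines could be used. This is only a sketch: the view name original_view is an assumption, and it selects only the id and dictionary_id columns from the example above.

```sql
-- Compare the on-disk size (table, indexes and TOAST) of both tables.
select pg_size_pretty(pg_total_relation_size('original'))   as original_size,
       pg_size_pretty(pg_total_relation_size('compressed')) as compressed_size;

-- Hypothetical compatibility view: presents the data in its old shape
-- by looking the URL text up in the dictionary again.
create view original_view as
select c.id, d.url
from compressed c
join dictionary d on d.id = c.dictionary_id;
```

Note that the inner join used to build compressed drops rows whose url is NULL; if the real column can be NULL, a left join would keep those rows with a NULL dictionary_id.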
-
In your compressed table you still have the URL. You should not use o.* but name all fields except the url field. @DavidK Do not forget to change your applications to use this new structure. – Marco, May 19, 2016 at 11:29
-
@marco: you are right, corrected ;) – user1822, May 19, 2016 at 11:35