We have a PostgreSQL database that has grown significantly, from about 340GB to 571GB over the past couple of months, with no corresponding change in user behavior over that time. Our primary DBA's chief recommendation is to export the entire database and re-import it; from his tests on a second server cloned from our primary, that requires about 3 hours of downtime and gets the size down to about 300GB.
My two main concerns are finding out where this growth is coming from (using du -h I can at least see it's in the /data directory, with no significant growth in the tablespace or pg_wal directories) and understanding how exporting and re-importing the database can recover almost 300GB of space without losing any production data.
2 Answers
The first thing I'd do is change into the data directory and run
du -sk *
This will show you which subdirectories are using the disk space. You can drill down by descending deeper and repeating the command.
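If most of the space turns out to be under base/, the numeric subdirectory names there are database OIDs. A query like the following (a sketch using only the pg_database catalog, nothing specific to your setup) maps them back to database names and sizes:
-- Map the numeric directory names under base/ back to databases
SELECT oid, datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;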
Typically, the increase in disk usage comes from one of two causes:
1. WAL in pg_wal cannot be removed. This could be because the archiver has a problem (look at pg_stat_archiver) or because you have a stale replication slot (look at pg_replication_slots); see the queries after this list.
2. Some tables or indexes are bloated.
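Both conditions can be checked from SQL; the following queries (a sketch using only the standard monitoring views) show whether archiving is failing and whether a slot is pinning old WAL:
-- A rising failed_count means archive_command keeps failing
SELECT archived_count, failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;
-- An inactive slot with an old restart_lsn prevents WAL from being recycled
SELECT slot_name, active, restart_lsn
FROM pg_replication_slots;
A stale slot can be dropped with SELECT pg_drop_replication_slot('slot_name'); once you are certain nothing still consumes it.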
If you created a copy of the database with pg_dump/restore, you are halfway to the solution. Run something like this on both databases:
SELECT oid::regclass AS object, relkind, pg_relation_size(oid) AS size
FROM pg_class
ORDER BY size DESC;
Compare the output on both sides and watch for tables and indexes that are considerably larger on the original database.
Track down the cause of the bloat and address it. Once you have done that, get rid of the bloat with VACUUM (FULL) (attention: this requires down time).
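For example (a sketch; big_table stands in for whatever relation the size comparison flags):
-- Rewrites the table and its indexes into fresh files and returns the
-- freed space to the operating system; holds an ACCESS EXCLUSIVE lock
-- on the table for the whole run
VACUUM (FULL, VERBOSE) big_table;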
Comments:
- Doesn't VACUUM FULL only restore space from the tablespace directory? We did find one old archive schema with significant size and ran VACUUM FULL on that schema only; that did restore 30GB, but only from the tablespace directory, and it didn't reduce any space in the data directory. – awestover89, Jul 5, 2022 at 13:26
- Use du to figure out where the disk space is used. – Laurenz Albe, Jul 5, 2022 at 13:35
- @awestover89 The tablespace directory is usually just a subdirectory of the data directory. You can do things differently, but if you have, you should explain what you did and why, and where the various space measurements you reported came from. – jjanes, Jul 5, 2022 at 19:58
Ultimately, we used the following to determine the issue:
We did an export and import onto a test database server, so we had a copy of the database at its full size and a copy at the smaller, post-import size.
We then ran the following query to identify the largest tables:
SELECT schemaname as table_schema, relname as table_name,
pg_size_pretty(pg_relation_size(relid)) as data_size
FROM pg_catalog.pg_stat_all_tables
ORDER BY pg_relation_size(relid) DESC;
This clearly showed that on the main system the table pg_catalog.pg_largeobject was just over 200GB, while on the test system, following the export and import, it was 0 bytes.
We are now working on a plan to better manage the growth of pg_largeobject.
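As a starting point (a sketch, not a finished plan), orphaned large objects can be found and removed with the contrib tool vacuumlo, or directly in SQL if you know every column that references them; app_files.content_oid below is a hypothetical referencing column:
-- Current size of the large-object store
SELECT pg_size_pretty(pg_total_relation_size('pg_catalog.pg_largeobject'));
-- Hypothetical cleanup: unlink large objects no application row references
SELECT lo_unlink(m.oid)
FROM pg_largeobject_metadata m
WHERE NOT EXISTS (SELECT 1 FROM app_files f WHERE f.content_oid = m.oid);
After unlinking, the space is only returned to the operating system by a VACUUM (FULL) on pg_largeobject, which, as the other answer notes, requires downtime.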