I have a large PostgreSQL database, over 500 GB in size, which is larger than I'd like. Is there any way to compress the database down to a more manageable size? I attempted this with SquashFS, and the database compressed down to 177 GB, but PostgreSQL requires write access to the database and SquashFS filesystems are read-only. Do more experienced database users have any suggestions for accomplishing this goal?
The database holds GIS data for the planet and will be used locally on a deployed system. Currently it sits on a 1 TB SSD; however, I am trying to avoid slapping in an additional hard drive simply to accommodate a large database. The database performs as desired with no issues; I would simply like to compress it down to a more manageable size and avoid placing it on a separate drive.
- Why is it too large? If you are running out of disk space, then you need to add more disk to the system. If there's data that's no longer needed, you should find a way to archive it. Otherwise, you presumably have a performance problem that may or may not be helped by a smaller DB. – RDFozz, Feb 5, 2018
- This is a pretty reasonable question. It shouldn't be closed. – Evan Carroll, Feb 6, 2018
- Worth mentioning that PostgreSQL 14 supports LZ4 in core: blog.dbi-services.com/postgresql-14-lz4-compression-for-toast – Nemo, Oct 19, 2021
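For reference, a minimal sketch of the PostgreSQL 14 LZ4 support mentioned in the last comment (requires a server built with LZ4 support; the table and column names are made-up examples):

```sql
-- Use LZ4 for TOAST compression of all newly created columns.
SET default_toast_compression = 'lz4';

-- Or switch a single column; only newly written values are affected,
-- existing rows keep whatever compression they already have.
ALTER TABLE planet_osm_polygon
    ALTER COLUMN way SET COMPRESSION lz4;
```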
2 Answers
File system
A very popular method of doing this is at the file system level: Btrfs and ZFS work under the database, compressing transparently. Both can be used on a loopback device, so you can provide a compressed tablespace without adding another partition. There are caveats: if the tablespace fails, it may take your cluster with it.
ZFS
ZFS is the big one here. It's what I would go for.
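A minimal sketch of the loopback setup, assuming a file-backed pool; the paths, size, and table name (an osm2pgsql-style name) are illustrative:

```sh
# Create a file-backed (loopback) ZFS pool with LZ4 compression.
truncate -s 300G /var/lib/pg_zfs.img
zpool create pgpool /var/lib/pg_zfs.img
zfs create -o compression=lz4 -o recordsize=8k pgpool/pgdata
chown postgres:postgres /pgpool/pgdata

# Expose it to PostgreSQL as a tablespace and move a large table onto it.
psql -c "CREATE TABLESPACE compressed LOCATION '/pgpool/pgdata'"
psql -d gis -c "ALTER TABLE planet_osm_polygon SET TABLESPACE compressed"
```

The recordsize=8k matches PostgreSQL's page size; a larger recordsize usually compresses better at some cost in read amplification. You can check the result with zfs get compressratio pgpool.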
See also
- https://blog.2ndquadrant.com/pg-phriday-postgres-zfs/
- https://blogs.oracle.com/observatory/zfs-compression-a-win-win (popular on Oracle too)
Btrfs
Btrfs is a strong contender, but it has been in active development for a very long time, and the lack of major distros adopting it as a default has many people questioning whether it's ready for prime time.
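If you do pick Btrfs, transparent compression is just a mount option. A sketch, with the device and mount point as assumptions (zstd compression levels need a reasonably recent kernel):

```sh
# Mount with transparent zstd compression at level 3.
mount -o compress=zstd:3 /dev/sdb1 /srv/pgdata

# Recompress files written before the option was in effect.
btrfs filesystem defragment -r -czstd /srv/pgdata
```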
PostgreSQL
GIS Methods (PostGIS)
- For polygons, one method is to simplify geometries by dropping vertices with ST_Simplify.
- For points, one method is spatial clustering.
Both of these result in a loss of information. PostGIS, like most parts of the database, doesn't have a transparent "magic compression" option.
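As an illustration of the polygon approach, a lossy in-place simplification might look like the following; the table and column names and the tolerance are assumptions, and ST_SimplifyPreserveTopology is the safer variant if invalid output geometries are a concern:

```sql
-- Drop vertices within a 0.0001-degree tolerance from complex polygons.
UPDATE planet_osm_polygon
SET    way = ST_Simplify(way, 0.0001)
WHERE  ST_NPoints(way) > 100;

-- Rewrite the table so the freed space is actually reclaimed on disk.
VACUUM FULL planet_osm_polygon;
```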
cstore_fdw
There is also cstore_fdw, a columnar store that offers compression. It has a different performance profile, so your mileage may vary.
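A minimal sketch, with made-up table and column names:

```sql
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- A columnar, compressed copy of an existing row-store table.
CREATE FOREIGN TABLE readings_cstore (
    id          bigint,
    recorded_at timestamptz,
    value       double precision
) SERVER cstore_server
OPTIONS (compression 'pglz');

INSERT INTO readings_cstore SELECT * FROM readings;
```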
- This is exactly what I was looking for. I am currently re-deploying my database on top of ZFS. Thank you. – MH21209, Feb 6, 2018
- In my random experience importing one PostgreSQL dump (about 1.2 billion rows) into a database with LZ4 compression and wal_compression enabled, and otherwise default settings for PostgreSQL 14, using Btrfs with zstd level 3 or 7 compression on an HDD was about as fast as ext4 on an SSD. (I only tested on OVH.) – Nemo, Oct 21, 2021
- @Nemo "With LZ4 compression" – at which layer do you do this compression? Are the columns LZ4-compressed? And is it still possible to have fast prefix-search queries? (My question about this: stackoverflow.com/questions/74199484/…) – Basj, Oct 30, 2022
- @Basj I used LZ4 compression in the PostgreSQL settings (postgresql.org/docs/current/…), then zstd compression at the filesystem level for whatever PostgreSQL didn't compress. I'm afraid I didn't test query performance extensively; the queries I tried were faster than before, because they were no longer capped by I/O. – Nemo, Nov 3, 2022
- @Nemo If I start with a fresh PostgreSQL install, what are the steps to enable compression for a particular database/table? – Basj, Nov 5, 2022
The answer by Evan lists most of the methods. But since your data is GIS data, it's probably sensor-like data that arrives over time, i.e. time-series data. In that case, you may want to consider TimescaleDB, because time-based compression will get you the best compression ratio and probably the best query speed as well.
You can find a more technical description here
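A minimal sketch of native compression on a hypothetical GPS-track hypertable; the names and intervals are assumptions:

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

SELECT create_hypertable('gps_points', 'recorded_at');

-- Segment compressed data by device so per-device queries stay fast.
ALTER TABLE gps_points SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'recorded_at DESC'
);

-- Automatically compress chunks once they are older than a week.
SELECT add_compression_policy('gps_points', INTERVAL '7 days');
```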
Disclaimer: I work for Timescale and helped develop this feature.
- Please read my comment here - does it apply to you? In your first two answers on this forum, you have proposed TimescaleDB as a solution - nothing wrong with that, but you may want to be clear about your relationship with the company/product! – Vérace, Feb 29, 2024
- That's quite a logical leap from GIS to time-series data. – mustaccio, Feb 29, 2024
- @mustaccio I've rarely seen GIS used for static locations or fact tables; rather, it's often moving objects that change location over time (maybe that's just my bias). If things move and you are tracking the history, that's a time-series dataset, IMHO. – cevian, Feb 29, 2024