
I have a large PostgreSQL database, over 500 GB in size, which is too large for my needs. Is there any way to compress the database down to a more manageable size? I attempted this with SquashFS, and the database compressed down to 177 GB, but PostgreSQL requires write access to its data directory and SquashFS file systems are read-only. Do more experienced database users have any suggestions for accomplishing this goal?

The database holds GIS data for the planet and will be used locally on a deployed system. Currently it sits on a 1 TB SSD; however, I am trying to avoid slapping in an additional hard drive simply to accommodate a large database. The database performs as desired with no issues; I would simply like to compress it down to a more manageable size and avoid placing it on a separate drive.

asked Feb 5, 2018 at 20:40
  • Why is it too large? If you are running out of disk space, then you need to add more disk to the system. If there's data that's no longer needed, you should find a way to archive it. Otherwise, you presumably have a performance problem that may or may not be helped by a smaller DB. Commented Feb 5, 2018 at 21:23
  • This is a pretty reasonable question. It shouldn't be closed. Commented Feb 6, 2018 at 0:45
  • Worth mentioning that PostgreSQL 14 supports LZ4 in core. blog.dbi-services.com/postgresql-14-lz4-compression-for-toast Commented Oct 19, 2021 at 22:31

2 Answers


File system

A very popular method of doing this is at the file system level: Btrfs and ZFS work underneath the database. Both can be used on a loopback device, so you can provide a compressed tablespace without adding another partition. There are caveats, though: if the tablespace fails, it may take your cluster down with it.

ZFS

ZFS is the big one here. It's what I would go for.
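Here's a minimal sketch of the loopback approach with ZFS. The pool name, backing file, mount point, database name, and the planet_osm_polygon table are all assumptions for illustration; adapt paths and sizes to your setup. File-backed pools are mainly intended for testing, so a dedicated partition or disk is more robust in production.

    # Create a file-backed ZFS pool with transparent LZ4 compression
    # (all names and paths below are hypothetical).
    truncate -s 600G /var/lib/pgzpool.img        # sparse backing file acts as the vdev
    zpool create pgzpool /var/lib/pgzpool.img
    zfs set compression=lz4 pgzpool
    zfs create -o recordsize=8k -o mountpoint=/srv/pg_compressed pgzpool/pgdata
    chown postgres:postgres /srv/pg_compressed

    # Point a tablespace at the compressed file system and move a table onto it.
    sudo -u postgres psql -c "CREATE TABLESPACE compressed LOCATION '/srv/pg_compressed';"
    sudo -u postgres psql -d gis -c "ALTER TABLE planet_osm_polygon SET TABLESPACE compressed;"

Setting recordsize=8k matches PostgreSQL's default block size, which tends to help both compression granularity and random I/O. You can check how well your data compresses afterwards with zfs get compressratio pgzpool.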


Btrfs

Btrfs is a strong contender, but it has been in active development for a very long time, and the lack of major distros adopting it as a default has many people questioning whether or not it's ready for prime time.
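For comparison, a minimal Btrfs sketch under the same hypothetical paths as above (compress=zstd:3 needs a reasonably recent kernel):

    truncate -s 600G /var/lib/pg_btrfs.img
    mkfs.btrfs /var/lib/pg_btrfs.img
    mkdir -p /srv/pg_btrfs
    mount -o loop,compress=zstd:3 /var/lib/pg_btrfs.img /srv/pg_btrfs
    chown postgres:postgres /srv/pg_btrfs
    # Create a tablespace on /srv/pg_btrfs exactly as in the ZFS example above.

Note that the common advice to mount database volumes with nodatacow would disable Btrfs compression, so here you keep copy-on-write and accept its write-amplification tradeoff.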

PostgreSQL

GIS Methods (PostGIS)

Lossy GIS methods, such as reducing coordinate precision or simplifying geometries, result in a loss of information. PostGIS, like most of the features of the database, doesn't have a transparent "magic compression" option.
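If lossy reduction is acceptable, here is a sketch of the idea. The table, column, and tolerances are hypothetical; both rewrites permanently discard precision, and the freed space is only reclaimed after a table rewrite:

    # Drop vertices that contribute less than the given tolerance.
    sudo -u postgres psql -d gis -c \
      "UPDATE planet_osm_line SET way = ST_SimplifyPreserveTopology(way, 1.0);"
    # Snap coordinates to a coarser grid.
    sudo -u postgres psql -d gis -c \
      "UPDATE planet_osm_line SET way = ST_SnapToGrid(way, 0.01);"
    # Reclaim the space released by the rewritten rows.
    sudo -u postgres psql -d gis -c "VACUUM FULL planet_osm_line;"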

cstore_fdw

There is also cstore_fdw, a columnar store that offers compression. It has a different performance profile than ordinary row-oriented heap tables, so your mileage may vary.
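A minimal sketch of setting it up; the table definition is purely illustrative, and data would be loaded with COPY or INSERT ... SELECT:

    sudo -u postgres psql -d gis -c "CREATE EXTENSION cstore_fdw;"
    sudo -u postgres psql -d gis -c "CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;"
    # Columnar, compressed foreign table; the schema is hypothetical.
    sudo -u postgres psql -d gis -c "
      CREATE FOREIGN TABLE readings_cstore (
          id       bigint,
          taken_at timestamptz,
          value    double precision
      ) SERVER cstore_server
        OPTIONS (compression 'pglz');"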

answered Feb 6, 2018 at 0:44
  • This is exactly what I was looking for. I am currently re-deploying my database on top of ZFS. Thank you. Commented Feb 6, 2018 at 21:50
  • In my random experience with importing one PostgreSQL dump (about 1.2 billion rows) into a database with LZ4 compression and wal_compression enabled, and otherwise default settings for PostgreSQL 14, using btrfs with zstd level 3 or 7 compression on some HDD was about as fast as ext4 on SSD. (I only tested on OVH.) Commented Oct 21, 2021 at 10:10
  • @Nemo "With LZ4 compression", at which layer do you do this compression? Are the columns lz4-compressed? Then, is it still possible to have fast prefix search queries? (My question about this: stackoverflow.com/questions/74199484/…) Commented Oct 30, 2022 at 15:50
  • @Basj I used lz4 compression in psql settings postgresql.org/docs/current/… , then zstd compression at the fs level for whatever psql didn't compress. I'm afraid I didn't test query performance extensively. The queries I tried were faster than before trying compression, because they weren't capped by I/O. Commented Nov 3, 2022 at 22:08
  • @Nemo If I start with a fresh new PostgreSQL install, what are the steps to enable compression for a particular database/table? Commented Nov 5, 2022 at 11:28
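To answer the question in that last comment with a minimal sketch: on PostgreSQL 14 or later, TOAST compression can be switched to LZ4 cluster-wide and per column. The database, table, and column names here are hypothetical, and existing rows keep their old compression until they are rewritten (e.g., by a dump/reload):

    sudo -u postgres psql -c "ALTER SYSTEM SET default_toast_compression = 'lz4';"
    sudo -u postgres psql -c "ALTER SYSTEM SET wal_compression = 'on';"
    sudo -u postgres psql -c "SELECT pg_reload_conf();"
    # Per-column override; only newly written values are affected.
    sudo -u postgres psql -d gis -c \
      "ALTER TABLE planet_osm_polygon ALTER COLUMN way SET COMPRESSION lz4;"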

The answer by Evan lists most of the methods. But since your data is GIS, it's probably sensor-like data that comes in over time, i.e. time-series data. In that case, you may want to consider TimescaleDB, because time-based compression will get you the best compression ratio and probably also the best query speed.
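If the data really is time-series shaped, here is a minimal sketch of TimescaleDB's native compression, assuming a hypothetical gps_points hypertable (the PostGIS geometry column is also just for illustration; timescaledb must already be in shared_preload_libraries):

    sudo -u postgres psql -d gis -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
    sudo -u postgres psql -d gis -c "
      CREATE TABLE gps_points (
          time      timestamptz NOT NULL,
          device_id int         NOT NULL,
          geom      geometry(Point, 4326)  -- requires PostGIS
      );"
    sudo -u postgres psql -d gis -c "SELECT create_hypertable('gps_points', 'time');"
    # Enable columnar compression, segmented by device for better ratios.
    sudo -u postgres psql -d gis -c "
      ALTER TABLE gps_points SET (
          timescaledb.compress,
          timescaledb.compress_segmentby = 'device_id',
          timescaledb.compress_orderby   = 'time DESC'
      );"
    # Automatically compress chunks older than seven days.
    sudo -u postgres psql -d gis -c \
      "SELECT add_compression_policy('gps_points', INTERVAL '7 days');"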

You can find a more technical description here.

Disclaimer: I work for Timescale and helped develop this feature.

answered Feb 29, 2024 at 13:37
  • Please read my comment here - does it apply to you? In your first two answers on this forum, you have proposed TimescaleDB as a solution - nothing wrong with that, but you may want to be clear about your relationship with the company/product! Commented Feb 29, 2024 at 13:57
  • That's quite a logical leap from GIS to time-series data. Commented Feb 29, 2024 at 14:30
  • @mustaccio I've rarely seen GIS used for static locations or fact tables, rather it's often moving objects that change locations over time (maybe that's just my bias). If things move, and you are tracking the history, that's a time-series dataset imho. Commented Feb 29, 2024 at 15:08
