
I have finally gotten a purpose-built database machine for a project I'm working on.

The server is a 4-core Xeon with 64GB of RAM and a RAID 10 array of 10k RPM drives.

I have just moved the database over to the new machine, and performance with the same code is worse than when it was running on a VM.

I'm looking for suggestions on what settings to adjust to what values.

Currently, I've upped shared_buffers to 60GB and adjusted the kernel settings needed to make that change.

temp_buffers is 32MB

work_mem is 5MB

I'm working on some data I'd like to get loaded in quickly, so I have synchronous_commit set to off.
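For reference, here is a minimal sketch of those settings as they would sit in postgresql.conf, plus the System V shared-memory kernel parameters that PostgreSQL 9.2 needs before it will start with a shared_buffers this large (the sysctl values below are illustrative sizes for a 64GB box, not copied from my actual config):

# postgresql.conf (values described above)
shared_buffers = 60GB
temp_buffers = 32MB
work_mem = 5MB
synchronous_commit = off

# /etc/sysctl.conf (illustrative; 9.2 uses System V shared memory)
kernel.shmmax = 68719476736    # segment size in bytes; must cover shared_buffers plus overhead
kernel.shmall = 16777216       # total pages allowed: shmmax / 4096-byte page size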

Can anyone point me in the right direction on how to improve the speed? I had it running quite a bit faster on a slower machine with much less memory, whose drives were shared with the machines making calls to it, so I'm not really sure what the issue is.

Update (2013-03-06): Performance is falling off a cliff shortly after a run starts, and I'm not sure what to do. Settings:

shared_buffers = 12GB
temp_buffers = 32MB
work_mem = 5MB
maintenance_work_mem = 256MB
fsync = off
synchronous_commit = off
wal_buffers = 32MB
checkpoint_segments = 256
checkpoint_completion_target = 0.9
effective_cache_size = 50GB
autovacuum = on
autovacuum_naptime = 1min

The task is a long script that takes data from a copied-in table and normalizes it into the database: occasional big reads to pick up 1,000 rows or more, then lots of little reads to de-duplicate each record and find IDs, some inserts along the way as needed, and finally lots of inserts at the end. Then repeat.
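To illustrate the shape of that loop (this is a made-up sketch; the name_part table and its columns are hypothetical stand-ins, not my real schema):

-- Look up an existing ID for an incoming value (one of the "little reads").
SELECT name_part_id FROM name_part WHERE name = 'incoming value';

-- If nothing came back, insert it and capture the new ID.
INSERT INTO name_part (name) VALUES ('incoming value') RETURNING name_part_id;

-- Attach the resolved ID to the normalized table (the inserts at the end).
INSERT INTO list_alias (provider_id, name_part_id) VALUES (1, 42);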

Any suggestions, or ideas about what's falling over? This is one of my slower queries; I'd love ideas on how to speed it up.

EXPLAIN (ANALYZE, BUFFERS)
SELECT provider_id, count(list_alias.name_part_id)
FROM list_alias
WHERE provider_id IN (1,4,5,6,7,8)
AND name_part_id IN (5,7,8,3,111)
GROUP BY provider_id
ORDER BY count(list_alias.name_part_id) DESC
LIMIT 1;

The output:

"Limit (cost=31.62..31.62 rows=1 width=8) (actual time=0.157..0.157 rows=0 loops=1)"
" Buffers: shared hit=17 read=1"
" -> Sort (cost=31.62..31.62 rows=1 width=8) (actual time=0.153..0.153 rows=0 loops=1)"
" Sort Key: (count(name_part_id))"
" Sort Method: quicksort Memory: 25kB"
" Buffers: shared hit=17 read=1"
" -> GroupAggregate (cost=0.00..31.61 rows=1 width=8) (actual time=0.147..0.147 rows=0 loops=1)"
" Buffers: shared hit=17 read=1"
" -> Index Scan using "list_alias provider_id" on list_alias (cost=0.00..31.59 rows=1 width=8) (actual time=0.146..0.146 rows=0 loops=1)"
" Index Cond: (provider_id = ANY ('{1,4,5,6,7,8}'::integer[]))"
" Filter: (name_part_id = ANY ('{5,7,8,3,111}'::integer[]))"
" Buffers: shared hit=17 read=1"
"Total runtime: 0.238 ms"

Edit 2: More info. Here is the settings dump (name, value, source):

"application_name";"pgAdmin III - Query Tool";"client"
"autovacuum";"on";"configuration file"
"autovacuum_naptime";"1min";"configuration file"
"checkpoint_completion_target";"0.9";"configuration file"
"checkpoint_segments";"256";"configuration file"
"DateStyle";"ISO, MDY";"configuration file"
"default_text_search_config";"pg_catalog.english";"configuration file"
"effective_cache_size";"50GB";"configuration file"
"external_pid_file";"/var/run/postgresql/9.2-main.pid";"configuration file"
"fsync";"off";"configuration file"
"lc_messages";"en_US.UTF-8";"configuration file"
"lc_monetary";"en_US.UTF-8";"configuration file"
"lc_numeric";"en_US.UTF-8";"configuration file"
"lc_time";"en_US.UTF-8";"configuration file"
"listen_addresses";"*";"configuration file"
"log_line_prefix";"%t ";"configuration file"
"maintenance_work_mem";"256MB";"configuration file"
"max_connections";"100";"configuration file"
"max_stack_depth";"2MB";"environment variable"
"port";"5432";"configuration file"
"shared_buffers";"12GB";"configuration file"
"ssl";"off";"configuration file"
"ssl_cert_file";"/etc/ssl/certs/ssl-cert-snakeoil.pem";"configuration file"
"ssl_key_file";"/etc/ssl/private/ssl-cert-snakeoil.key";"configuration file"
"synchronous_commit";"off";"configuration file"
"temp_buffers";"32MB";"configuration file"
"unix_socket_directory";"/var/run/postgresql";"configuration file"
"wal_buffers";"32MB";"configuration file"
"work_mem";"5MB";"configuration file"

Any ideas on why the database would be fine for a very short period after a process starts, then fall off a cliff very quickly? It's almost like it's spilling to disk, but that doesn't show up when these queries are run through EXPLAIN (ANALYZE, BUFFERS).
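One generic check I can run (included for completeness; this is standard PostgreSQL instrumentation, not anything specific to my setup) is to snapshot pg_stat_bgwriter before and during a run, to see whether forced checkpoints or backend buffer writes line up with the slowdown:

-- Run once at the start and again when performance drops, then compare.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;

Setting log_checkpoints = on in postgresql.conf would show the same thing in the server log.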

asked Mar 4, 2013 at 22:28
  • Shared buffers need not be set that high, IMHO. Increasing effective_cache_size will make you rely on the OS's disk cache instead. Also keep some memory in reserve for the backend processes (50-100MB per client, depending on your work_mem settings and expected number of clients). Commented Mar 4, 2013 at 22:34
  • Is there any chance you enabled unsafe, data-eating options like fsync=off on the old host? Or was the old host a slower CPU with less RAM but with an SSD, perhaps? Commented Mar 4, 2013 at 23:56
  • I find pgtune quite helpful for an initial setup: pgfoundry.org/projects/pgtune Commented Mar 5, 2013 at 11:07
  • The old system had a similar CPU (slightly slower, but both quad-core Xeons) and shared 7200 RPM drives, hence the upgrade. fsync is off for this initial load because I'm loading about 60GB of data into the database, doing tons of reads and writes in the process, not just a COPY. Commented Mar 5, 2013 at 17:17
  • What kind of RAID controllers are on the new and old machines? And do they have a battery backup unit? Commented Mar 7, 2013 at 6:56

2 Answers


If this is a single-purpose machine, I would drop shared_buffers to a small fraction of what you have and increase effective_cache_size to 60GB. My reasoning is that PostgreSQL's caching is very featureful, but it is also slower than the OS cache. On a multi-purpose machine, the Pg cache can be seen as memory reserved for Pg caching, which is quite helpful; but on a single-purpose machine there are significant tradeoffs between the Pg cache and the OS cache, and they don't all go one way. The best use of the PostgreSQL cache is to keep the most commonly used data so that it never expires from the OS disk cache. Keep in mind that maintaining a buffer pool is comparatively expensive, so getting the Pg buffer pool to the right size (not necessarily the largest size you can!) is critical.
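As a rough sketch of that change in postgresql.conf (the 2GB figure is just an illustrative "small fraction", not a prescribed value):

shared_buffers = 2GB           # small fraction of the previous 60GB setting
effective_cache_size = 60GB    # planner hint telling Pg the OS cache is large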

A good place to start is actually with your old VM's PostgreSQL settings, upping effective_cache_size appropriately. This way you avoid unexpected tradeoffs and can more easily compare your existing use cases. From there, look at your use cases and determine which values to tweak (this applies to work_mem too: if work_mem was sufficient on the old instance, there is very little to be gained from increasing it here).

Finally, if it is still not up to speed, it would be helpful if you could run EXPLAIN (ANALYZE, BUFFERS) [query] on both systems so we can compare plans and offer better recommendations.

answered Mar 5, 2013 at 2:31

This answer may be a bit late, but the timing you are getting looks good to me already (based on the posted query plan). You may want to remove the LIMIT clause to see the plan in the worst case.

There is no obvious bottleneck here, unless the disk holding the data files is an HDD rather than a good SSD. Given the query and your server specification, the only improvements I can think of are a materialized view or an application-level cache (e.g., an LRU cache).

One caveat: I don't think you should disable synchronous_commit and fsync for performance; they exist for data integrity and safety.
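If those settings are only off for the initial bulk load, a sensible follow-up (my suggestion, not something the original poster asked about) is to restore them once the load finishes and reload the server:

# postgresql.conf, after the bulk load completes
fsync = on
synchronous_commit = on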

But assuming you want better performance without replacing the disk or upgrading the RAM, there are three things you can try:

  • Materialized view: just remember to refresh it when necessary (see the sketch after this list).
  • Column re-alignment to reduce wasted padding; this can save tons of disk space, especially if you hold long VARCHAR, TEXT, or JSON values.
  • pg_repack or VACUUM FULL: this may help if your table is sparse and has not been vacuumed to reclaim storage.

You can try the first option alone, or combine the last two, to reduce the small paging reads. Another option is range partitioning, though I don't know whether PostgreSQL 9.2 has that feature.
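As a sketch of the first option (the view name here is made up, and note that CREATE MATERIALIZED VIEW only arrived in PostgreSQL 9.3; on 9.2 you would have to maintain an ordinary summary table by hand):

-- Pre-aggregated counts per (provider_id, name_part_id); requires 9.3+.
CREATE MATERIALIZED VIEW provider_name_part_counts AS
SELECT provider_id, name_part_id, count(*) AS n
FROM list_alias
GROUP BY provider_id, name_part_id;

-- Refresh whenever the underlying table changes enough to matter.
REFRESH MATERIALIZED VIEW provider_name_part_counts;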

answered Dec 25, 2024 at 11:02
  • Just to explain why I prefer column re-alignment: you can fit more tuples within one page cached in shared_buffers. For what it's worth, it improves storage efficiency and SELECT performance, but it also means you can lock more tuples within that page. Commented Dec 25, 2024 at 15:45
