I started getting out-of-memory errors on a PostgreSQL 9.1 server running on x64 Debian with 4 GB of RAM, dedicated to PostgreSQL.
top returns something like this, which suggests that the problem is not a lack of available memory on the machine itself.
Mem: 4113160k total, 3235968k used, 877192k free, 71028k buffers
Swap: 379896k total, 20k used, 379876k free, 2865136k cached
And the config:
max_connections = 100 # (change requires restart)
ssl = true # (change requires restart)
shared_buffers = 1500MB # min 128kB
work_mem = 2MB
maintenance_work_mem = 128MB
effective_cache_size = 3GB
log_min_duration_statement = 2000
log_lock_waits = on
track_counts = on
autovacuum = on
The query logged here is made by OpenFire, but the interesting part is that the OpenFire database is almost empty. Still, maybe this is part of a bigger transaction that takes a lot of memory (like trying to populate the list of users from LDAP). Right now no table has more than 50 rows (a size check that also counts TOAST data follows the log).
2013年07月19日 13:29:46 UTC ERROR: out of memory
2013年07月19日 13:29:46 UTC DETAIL: Failed on request of size 536870912.
2013年07月19日 13:29:46 UTC STATEMENT: SELECT bytes from ofRRDs where id = 1ドル
TopMemoryContext: 68688 total in 10 blocks; 2184 free (7 chunks); 66504 used
TopTransactionContext: 8192 total in 1 blocks; 7328 free (0 chunks); 864 used
TableSpace cache: 8192 total in 1 blocks; 3216 free (0 chunks); 4976 used
Operator lookup cache: 24576 total in 2 blocks; 11888 free (5 chunks); 12688 used
MessageContext: 8192 total in 1 blocks; 6976 free (0 chunks); 1216 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
smgr relation table: 24576 total in 2 blocks; 9808 free (4 chunks); 14768 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 15360 total in 4 blocks; 6304 free (0 chunks); 9056 used
ExecutorState: 452800624 total in 4 blocks; 18040 free (9 chunks); 452782584 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
Relcache by OID: 24576 total in 2 blocks; 13872 free (3 chunks); 10704 used
CacheMemoryContext: 817840 total in 20 blocks; 53136 free (1 chunks); 764704 used
unnamed prepared statement: 8192 total in 1 blocks; 1088 free (0 chunks); 7104 used
ofpresence_pk: 2048 total in 1 blocks; 776 free (0 chunks); 1272 used
pg_toast_470250828_index: 2048 total in 1 blocks; 664 free (0 chunks); 1384 used
ofrrds_pk: 2048 total in 1 blocks; 776 free (0 chunks); 1272 used
ofuserflag_etime_idx: 2048 total in 1 blocks; 776 free (0 chunks); 1272 used
ofuserflag_stime_idx: 2048 total in 1 blocks; 776 free (0 chunks); 1272 used
ofuserflag_pk: 2048 total in 1 blocks; 712 free (0 chunks); 1336 used
pg_index_indrelid_index: 2048 total in 1 blocks; 728 free (0 chunks); 1320 used
pg_db_role_setting_databaseid_rol_index: 2048 total in 1 blocks; 712 free (0 chunks); 1336 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1544 free (2 chunks); 1528 used
pg_foreign_data_wrapper_name_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_enum_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_class_relname_nsp_index: 3072 total in 2 blocks; 1656 free (3 chunks); 1416 used
pg_foreign_server_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_statistic_relid_att_inh_index: 3072 total in 2 blocks; 1496 free (2 chunks); 1576 used
pg_cast_source_target_index: 3072 total in 2 blocks; 1656 free (3 chunks); 1416 used
pg_language_name_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_collation_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1496 free (2 chunks); 1576 used
pg_index_indexrelid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1544 free (2 chunks); 1528 used
pg_opclass_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_foreign_data_wrapper_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_ts_dict_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_conversion_default_index: 3072 total in 2 blocks; 1544 free (2 chunks); 1528 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1496 free (2 chunks); 1576 used
pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_enum_typid_label_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_ts_config_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_user_mapping_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1544 free (2 chunks); 1528 used
pg_foreign_table_relid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_type_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_constraint_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_ts_parser_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_operator_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_namespace_nspname_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_ts_template_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1496 free (2 chunks); 1576 used
pg_default_acl_role_nsp_obj_index: 3072 total in 2 blocks; 1544 free (2 chunks); 1528 used
pg_collation_name_enc_nsp_index: 3072 total in 2 blocks; 1544 free (2 chunks); 1528 used
pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_type_typname_nsp_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_opfamily_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_class_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1544 free (2 chunks); 1528 used
pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1656 free (3 chunks); 1416 used
pg_proc_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_language_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_namespace_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1496 free (2 chunks); 1576 used
pg_foreign_server_name_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_conversion_oid_index: 3072 total in 2 blocks; 1768 free (4 chunks); 1304 used
pg_user_mapping_user_server_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_authid_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_auth_members_member_role_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_tablespace_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_database_datname_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_auth_members_role_member_index: 3072 total in 2 blocks; 1704 free (4 chunks); 1368 used
pg_database_oid_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
pg_authid_rolname_index: 3072 total in 2 blocks; 1720 free (3 chunks); 1352 used
MdSmgr: 8192 total in 1 blocks; 7872 free (0 chunks); 320 used
LOCALLOCK hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
Timezones: 83472 total in 2 blocks; 3744 free (0 chunks); 79728 used
ErrorContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
2013年07月19日 13:30:45 UTC ERROR: out of memory
2013年07月19日 13:30:45 UTC DETAIL: Failed on request of size 536870912.
2013年07月19日 13:30:45 UTC STATEMENT: SELECT bytes from ofRRDs where id = 1ドル
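Since row counts alone can hide large values stored out of line in TOAST tables, here is a quick way to check how much space the table really occupies (a small check added for reference; pg_total_relation_size includes TOAST data and indexes):
SELECT pg_size_pretty(pg_total_relation_size('ofrrds'));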
2 Answers
A possible explanation would be that the requested ofRRDs.bytes column is very big and the system fails to find a contiguous piece of about 512 MB of RAM to generate its textual representation.
A similar problem is explained in this message from pgsql-general. Excerpt:
The textual representation of arbitrary bytea data is normally several times the size of the raw bits (worst case is 5x bigger, typical case perhaps half that). In addition to that you have to consider that there are likely to be several copies of the string floating around in your process' memory space.
This was before Postgres 9.x and its default hex format for bytea textual representation, but the size multiplier is still 2x in PG 9.1.
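To see the 2x multiplier concretely, here is a small illustration (not from the original thread; it assumes the 9.x default bytea_output = 'hex', where the textual form is two characters per raw byte plus a \x prefix):
SELECT octet_length('\xdeadbeef'::bytea) AS raw_bytes, -- 4 bytes
 length('\xdeadbeef'::bytea::text) AS text_chars; -- 10 chars: '\x' plus 8 hex digits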
With only 4 GB of RAM and 1.5 GB of it taken by shared_buffers, it's plausible to end up in a legitimate out-of-memory condition.
But see the result of:
SELECT id,octet_length(bytes) FROM ofRRDs ORDER BY 2;
to check the actual sizes of the contents of this column.
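If the values really are that big, one workaround is to fetch them in slices so the server never has to build the full textual representation at once (a sketch, not part of the original answer: the 1 MB chunk size is arbitrary, 1ドル is the byte offset and 2ドル the row id, and the client loops, advancing the offset until a short chunk comes back):
-- total raw size, to know how many chunks to expect:
SELECT octet_length(bytes) FROM ofRRDs WHERE id = 2ドル;
-- then fetch 1 MB at a time:
SELECT substring(bytes FROM 1ドル FOR 1048576) FROM ofRRDs WHERE id = 2ドル;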
The query returned ~10 records with lengths between 76 MB and 150 MB. So I guess this may be caused by the poor implementation of OpenFire, and adding more memory to the server may solve the problem. The question is how much should I add? It's a VM, so it should be easy to add more. Any other changes to make? Thanks – sorin, Jul 24, 2013
It's not easy to predict how much more memory is needed, because of the fragmentation issue. I think I would try +2 GB of RAM if possible, set shared_buffers to 500 MB only, and have the swap space as big as the total RAM. – Daniel Vérité, Jul 25, 2013
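In postgresql.conf terms, that suggestion would look like this (value taken from the comment above; the extra RAM and the larger swap are VM/OS-level changes, not PostgreSQL settings):
shared_buffers = 500MB # reduced from 1500MB to leave room for per-backend allocations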
I am not sure if it helps here, but I have had problems when work_mem was too big. With 16 MB or more I ran into such trouble. I reduced it to 8 MB but then hit an out-of-memory condition again, so I reduced it further to 4 MB.
The issue with work_mem is that, while much more could be useful for sorting, many of these work_mem areas can be allocated for different plan steps, so it all adds up. Tweaking work_mem upward is risky because the total is really hard to predict, and it is extremely painful and wasteful to hit an out-of-memory condition when an expensive analytical transaction that has been running for hours dies because of it.
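One way to keep the server-wide default conservative and still give a known-heavy job more room is to raise it per transaction (a sketch with illustrative values, not a recommendation from the original post):
-- postgresql.conf keeps the safe default, e.g. work_mem = 4MB
BEGIN;
SET LOCAL work_mem = '64MB'; -- applies to this transaction only
-- ... run the expensive analytical query here ...
COMMIT;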
In my case the total memory used by the job was 1.3 GB on an 8 GB server, with very little activity going on at the same time.
top says there are 2.8 GB cached + 0.877 GB of free memory. Add 1.5 GB of shared_buffers, and, not even counting the rest of the system, that already amounts to more than the physical RAM. Isn't that weird?