
Magento Enterprise 1.14.1.0, nginx, PHP-FPM, Solr, Redis (two instances: one for the full page cache and one for everything else)

We're completely at our wit's end - we have tweaked every configuration setting that seems plausible... and the issue persists every single day. Any ideas would be greatly appreciated.

Our site experiences Redis locking up and dumping errors like the one below for several minutes, then things clear up and return to normal.

 a:4:{i:0;s:24:"read error on connection";i:1;s:1196:"#0 /var/www/magento/lib/Credis/Client.php(440): Credis_Client->__call('select', Array)
#1 /var/www/magento/lib/Cm/Cache/Backend/Redis.php(117): Credis_Client->select(0)
#2 /var/www/magento/lib/Zend/Cache.php(153): Cm_Cache_Backend_Redis->__construct(Array)
#3 /var/www/magento/lib/Zend/Cache.php(94): Zend_Cache::_makeBackend('Cm_Cache_Backen...', Array, true, true)
#4 /var/www/magento/app/code/local/Extendware/EWCacheBackend/Model/Override/Mage/Core/Cache.php(25): Zend_Cache::factory()
#5 /var/www/magento/app/code/core/Mage/Core/Model/Config.php(1348): ExtendwareInternal_EWCacheBackend_Model_Override_Mage_Core_Cache->__construct(Array)
#6 /var/www/magento/app/Mage.php(463): Mage_Core_Model_Config->getModelInstance('core/cache', Array)
#7 /var/www/magento/app/code/core/Mage/Core/Model/App.php(401): Mage::getModel('core/cache', Array)
#8 /var/www/magento/app/code/core/Mage/Core/Model/App.php(295): Mage_Core_Model_App->_initCache(Array)
#9 /var/www/magento/app/code/core/Mage/Core/Model/App.php(337): Mage_Core_Model_App->baseInit(Array)
#10 /var/www/magento/app/Mage.php(684): Mage_Core_Model_App->run(Array)
#11 /var/www/magento/index.php(87): Mage::run('default', 'store')
#12 {main}";s:3:"url";s:48:"/";s:11:"script_name";s:10:"/index.php";}
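
The trace shows the failure happening while Credis connects and issues SELECT, i.e. before any cache data is even read, so it looks like the connection itself is what stalls. A minimal sketch of checks that can be run from the web server while the errors are happening (standard redis-cli invocations only; ports 6379/6380 match the config further down):

redis-cli -p 6379 ping                  # is the cache instance answering at all?
redis-cli -p 6380 ping                  # is the FPC instance answering?
redis-cli -p 6379 --latency-history     # sample round-trip latency continuously (Ctrl-C to stop)
redis-cli -p 6380 --latency-history
redis-cli -p 6379 info clients          # connected/blocked client counts at that moment
redis-cli -p 6380 info clients
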
  • Magento Support has confirmed proper configuration from the Magento side (they logged in, investigated, ran tests, and claim it's a "server issue", not a Magento issue)
  • Our host (Peer1) has some Magento specialists who have spent many hours working with us to diagnose the problem... and can't yet find any specific issues with RAM sizing, Redis config, etc.
  • Our RAM has been doubled to 64 GB, and we currently have about 20 GB free.
  • PHP and PHP-FPM are consuming a LOT of CPU (see the sketch after this list)
  • This is all on the web server; the DB server is separate and running fine
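
To see what a pegged worker is actually doing, a relatively low-risk sketch is to attach strace to one of the busy PIDs from top for a few seconds (the PID below is just the one from the snapshot further down; substitute a currently-busy worker):

sudo timeout 10 strace -c -p 8638           # syscall summary for ~10 seconds, then detach
sudo strace -f -p 8638 -e trace=network     # live socket calls; lots of repeated connect()/recvfrom()
                                            # against 127.0.0.1:6379/6380 would point back at Redis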

Here's a snapshot of top during normal operations (i.e. when the errors are not occurring):

top - 12:13:47 up 2 days, 16:05, 2 users, load average: 6.51, 7.33, 7.26
Tasks: 639 total, 13 running, 625 sleeping, 0 stopped, 1 zombie
Cpu(s): 47.7%us, 1.1%sy, 0.0%ni, 50.9%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 65937660k total, 41186844k used, 24750816k free, 551968k buffers
Swap: 8388604k total, 0k used, 8388604k free, 13040128k cached
 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 
29720 nginx 20 0 1927m 553m 12m R 99.6 0.9 615:37.52 php 
 8638 nginx 20 0 1532m 146m 9236 R 98.3 0.2 4:01.43 php-fpm 
 8618 nginx 20 0 1582m 195m 8268 R 96.3 0.3 4:06.96 php-fpm 
 8664 nginx 20 0 1521m 144m 6932 S 96.0 0.2 4:12.06 php-fpm 
 8699 nginx 20 0 1582m 196m 9508 R 94.0 0.3 3:33.90 php-fpm 
 8740 nginx 20 0 1581m 194m 8188 R 94.0 0.3 4:05.63 php-fpm 
 8735 nginx 20 0 1583m 196m 9204 R 90.4 0.3 4:27.89 php-fpm 
 8634 nginx 20 0 1582m 195m 8080 R 81.4 0.3 4:07.25 php-fpm 
 8727 nginx 20 0 1529m 141m 9076 S 73.8 0.2 4:54.04 php-fpm 
 8597 nginx 20 0 1529m 143m 8020 S 62.6 0.2 3:56.57 php-fpm 
 8743 nginx 20 0 1518m 132m 9204 R 51.0 0.2 4:10.01 php-fpm 
 8601 nginx 20 0 1517m 132m 9284 S 48.7 0.2 4:04.32 php-fpm 
 8611 nginx 20 0 1530m 143m 8104 S 38.7 0.2 4:12.82 php-fpm 
 8614 nginx 20 0 1511m 124m 8216 R 28.1 0.2 3:26.96 php-fpm 
 8714 nginx 20 0 1532m 145m 8260 S 27.5 0.2 4:15.64 php-fpm 
 8684 nginx 20 0 1528m 142m 8232 R 22.8 0.2 4:01.39 php-fpm 
 8700 nginx 20 0 1519m 142m 5216 R 22.8 0.2 4:44.06 php-fpm 
 8691 nginx 20 0 1532m 146m 8208 R 19.5 0.2 4:32.76 php-fpm 
 8622 nginx 20 0 1533m 146m 8216 S 16.2 0.2 4:57.07 php-fpm 
 9134 redis 20 0 880m 175m 888 S 4.6 0.3 36:39.15 redis-server 
 9112 redis 20 0 1192m 354m 880 S 2.6 0.6 57:50.85 redis-server 
 8566 nginx 20 0 93340 5084 1784 S 1.0 0.0 0:02.75 nginx 

Here's our Magento config for Redis:

<!-- redis cache -->
<cache>
    <backend>Cm_Cache_Backend_Redis</backend>
    <backend_options>
        <server>127.0.0.1</server>
        <port>6379</port>
        <database>0</database>
        <password></password>
        <force_standalone>0</force_standalone>
        <connect_retries>1</connect_retries>
        <automatic_cleaning_factor>0</automatic_cleaning_factor>
        <compress_data>1</compress_data>
        <compress_tags>1</compress_tags>
        <compress_threshold>20480</compress_threshold>
        <compression_lib>gzip</compression_lib>
        <persistent></persistent>
        <read_timeout>10</read_timeout>
    </backend_options>
</cache>
<!-- redis full page cache -->
<full_page_cache>
    <backend>Cm_Cache_Backend_Redis</backend>
    <backend_options>
        <server>127.0.0.1</server>
        <port>6380</port>
        <database>1</database>
        <password></password>
        <force_standalone>0</force_standalone>
        <connect_retries>1</connect_retries>
        <automatic_cleaning_factor>0</automatic_cleaning_factor>
        <compress_data>0</compress_data>
        <lifetimelimit>57600</lifetimelimit>
        <persistent></persistent>
        <read_timeout>10</read_timeout>
    </backend_options>
</full_page_cache>
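
For completeness, here is a sketch of how the settings above can be cross-checked against what each instance is actually running with (read-only redis-cli calls; nothing here changes any state). The memory ceiling and the RDB save schedule are worth a look because a long background-save fork can stall Redis briefly and show up on the PHP side as read errors:

redis-cli -p 6379 config get 'maxmemory*'   # cache instance: memory ceiling and eviction policy
redis-cli -p 6380 config get 'maxmemory*'   # FPC instance
redis-cli -p 6379 config get timeout        # idle-client timeout (0 = never drop idle clients)
redis-cli -p 6380 config get timeout
redis-cli -p 6380 config get save           # RDB snapshot schedule on the FPC instance
redis-cli -p 6380 info persistence          # last bgsave status / whether one is in progress
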

I've reviewed the Redis slow log and found quite a few (seemingly) massive multi-key DELs that take a while. Note that ONLY the full-page-cache instance has any entries in the slow log. Are these normal? Here are two of the 250 entries in that instance's slow log:

 7) 1) (integer) 356
 2) (integer) 1429287105
 3) (integer) 40685
 4) 1) "DEL"
 2) "zc:k:e84_REQEST_91D0FF548C77B6F58148AF6A7F586C58"
 3) "zc:k:e84_REQEST_8212780F50033B79092F556EE14C1E6E"
 4) "zc:k:e84_REQEST_86DCFE6E580707552F38B507272FD2FA"
 5) "zc:k:e84_REQEST_FE69215CAA18AA2D33DE6DE2E31F7491"
 6) "zc:k:e84_1ED6001E13E4EEA9AB6E19EEC6742A89_1_3_745"
 7) "zc:k:e84_REQEST_45ACA3435B0016847D4B060C6436B366"
 8) "zc:k:e84_REQEST_985ED12A38AFD74402B159AF8B1B5741"
 9) "zc:k:e84_REQEST_11022B655050F20ED54B4AE51EF12CE6"
 10) "zc:k:e84_REQEST_26CA3F7EA8612325E845B3B236E0CCFE"
 11) "zc:k:e84_REQEST_646FBF4016B630643F3D06C2162721F6_METADATA"
 12) "zc:k:e84_REQEST_29C5D2EC342997382A7DB7DF7DDD2F6D"
 13) "zc:k:e84_REQEST_F8A6D1AF029997A51F3D1A4E2F3CB725"
 14) "zc:k:e84_REQEST_8DBB0D20DDFC426B089F52432757C47E"
 15) "zc:k:e84_REQEST_5EA7B23B0B309045EB6A81EF4B9E6A66_METADATA"
 16) "zc:k:e84_REQEST_EA531324268EF209504816504B114FAC"
 17) "zc:k:e84_REQEST_7C659A3F9178A62AED9B0499AEED0130"
 18) "zc:k:e84_REQEST_6807CA64C0FF1C29B4069BF3B472252B"
 19) "zc:k:e84_8C636CBFA6DB338DD11F600771E54FAC_1_3_43_288"
 20) "zc:k:e84_REQEST_E062EAC26446FACE3025FC6AE1E632A9"
 21) "zc:k:e84_7D5DE581F27790730C0E000F9D69F2E3_1_3_53_53"
 22) "zc:k:e84_REQEST_C4F854BA432A2884E49A1BA245D756A5"
 23) "zc:k:e84_REQEST_922C03E387E66DF9EBBCE0186BD67C3C"
 24) "zc:k:e84_REQEST_503B90ED3CBE43438A0C740A68B1E576_METADATA"
 25) "zc:k:e84_REQEST_7B9B9FAAE5FA78DB6934DC0658BC5D83"
 26) "zc:k:e84_REQEST_E8FE8236DDA57C9830B7843623310E86"
 27) "zc:k:e84_REQEST_5826188B159CBCD0E7E9BFBE3BB1D599"
 28) "zc:k:e84_REQEST_37F7CBE45FEF49E4D7F38ECE4B760996"
 29) "zc:k:e84_REQEST_AACD280C3AB5E27760E80A4D1888AD1F"
 30) "zc:k:e84_REQEST_47CE15DD5D5391B1413F5ECF150533DB"
 31) "zc:k:e84_REQEST_E261C23C996BC8D45111F2A23DDED93A"
 32) "... (1633 more arguments)"
10) 1) (integer) 353
 2) (integer) 1429285945
 3) (integer) 37193
 4) 1) "DEL"
 2) "zc:k:e84_REQEST_92BAB7E07D28861B459120B0CDCF0AD7"
 3) "zc:k:e84_REQEST_32FC6F43EABA88A2E665630C189A751C"
 4) "zc:k:e84_REQEST_55E596F3198540E5E249CB2A4C2690D1"
 5) "zc:k:e84_REQEST_B5F8BD16ED9D77245DC1FE0201FE1B1B"
 6) "zc:k:e84_REQEST_CBBFFAEAABE47237EF465B6D00BE39F7"
 7) "zc:k:e84_REQEST_D5BD191B4E0389F1C60271CCD33163BE"
 8) "zc:k:e84_1ED6001E13E4EEA9AB6E19EEC6742A89_1_3_745"
 9) "zc:k:e84_REQEST_E498601DC7D6D799B358A8B5A86FC783"
 10) "zc:k:e84_REQEST_6A6D07696354D06F1414F3F7A5A3967A"
 11) "zc:k:e84_REQEST_EB6E820EA5D8598C008BA4A94C2AC904"
 12) "zc:k:e84_REQEST_EEACD0C782C80044325B4E04D32E0F34"
 13) "zc:k:e84_REQEST_8CD8334B055C58AD92208DB9BE0F9D1D_METADATA"
 14) "zc:k:e84_REQEST_70EFDB6D70BE88D367B0C48629E561FE"
 15) "zc:k:e84_REQEST_B9FF6FA7A3CFE8B1F1521A73DF71D308"
 16) "zc:k:e84_REQEST_14310D5749525D4828E6B6D2F090A366"
 17) "zc:k:e84_REQEST_0CE5E095669A57D17FD780A3BA72A84F_METADATA"
 18) "zc:k:e84_REQEST_28B2C39A828640D0B1A49E77A56F6A09"
 19) "zc:k:e84_REQEST_17EF76BC89E5EE5DDDB0D752514BBCC4"
 20) "zc:k:e84_REQEST_76B91839CA4F29C45E4628C67A8FEFDE"
 21) "zc:k:e84_REQEST_F4D7B7D455BF29C8A281D6CC387A86D9"
 22) "zc:k:e84_7878AC51A7ECCF1F57E2E74B6FF3CD20_1_3_10_750"
 23) "zc:k:e84_REQEST_90B19F2BBC6EA90C7AC1F91242C316A8"
 24) "zc:k:e84_8C636CBFA6DB338DD11F600771E54FAC_1_3_43_288"
 25) "zc:k:e84_REQEST_0C95AC5E87D3F74F62A1FBE5B822A9B2"
 26) "zc:k:e84_7D5DE581F27790730C0E000F9D69F2E3_1_3_53_53"
 27) "zc:k:e84_REQEST_3B852462504D3226F5F73D387C0C250F"
 28) "zc:k:e84_REQEST_5A5403A3603FD208332F9CBD631C8B4B_METADATA"
 29) "zc:k:e84_REQEST_B8152798A9D1F3C4C45C876FC9F2942A"
 30) "zc:k:e84_REQEST_2E12C5B2DD8B55BB5D4343DF9241EA29"
 31) "zc:k:e84_REQEST_B241A180FBB7E511A74C2129667C2BAD"
 32) "... (3272 more arguments)"
  • Did you have a look at github.com/colinmollenhour/Cm_Cache_Backend_Redis/issues/18? Commented Apr 18, 2015 at 15:36
  • First of all, upgrade Redis and phpredis, then change the compression lib to lzf or snappy. PHP-FPM is consuming a LOT of CPU because of bad code, or you have some evil loops running somewhere; run a profiler or database monitoring. And FYI, at Peer1 there are no specialists; unfortunately, all they can do is force customers to buy huge, expensive servers in the hope that a huge amount of memory will help cope with the problems... Commented Apr 20, 2015 at 18:42
  • Could I see your /etc/php-fpm.d/www.conf? Commented Jan 22, 2016 at 11:49
  • That is why, when running in production, people usually have more than one server: usually one WEB server with a lot of CPUs, one DB server with a lot of RAM, and one CACHE server (Redis, Varnish...) with a lot of CPUs and quite a lot of RAM. Commented Apr 20, 2016 at 1:56
  • Did you try raising <connect_retries>1</connect_retries> to a higher number like 3? Commented Feb 3, 2017 at 23:43

1 Answer


You might try checking for network connection limits. We've seen this happen before: something opens a TON of connections to Redis, which blocks further connections, and then the problem goes away on its own. The error messages are deceptive because they won't flat out say that, and it takes some poking around to actually figure it out.

I.e., Redis can only have so many open connections, as may a given service on your particular setup, so if you hit that limit it won't show up in any Magento/Redis logs directly.
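
A sketch of read-only checks that would confirm or rule out a connection ceiling, assuming the ports from the question's config and the redis-server PID from its top output (adjust as needed):

redis-cli -p 6380 config get maxclients                   # Redis-side connection ceiling
redis-cli -p 6380 info clients                             # clients connected right now
redis-cli -p 6380 info stats | grep rejected_connections   # non-zero means the ceiling has been hit
cat /proc/9134/limits | grep 'open files'                  # fd limit of the redis-server process
ss -tan | grep -c ':6380'                                  # rough count of sockets touching the FPC port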

answered May 29, 2018 at 15:14
