Poll error 110 (Connection timed out) - Error 503 Backend fetch failed

Question 1

These are my settings:

cat /etc/default/varnish

# Configuration file for varnish SysV init script
#
# /etc/init.d/varnish expects the variables $DAEMON_OPTS, $NFILES and $MEMLOCK
# to be set from this shell script fragment.
#
# Note: If systemd is installed, this file is obsolete and ignored. Please see 
# /usr/share/doc/varnish/examples/varnish.systemd-drop-in.conf
# Maximum number of open files (for ulimit -n)
NFILES=131072
# Maximum locked memory size (for ulimit -l)
# Used for locking the shared memory log in memory. If you increase log size,
# you need to increase this number as well
MEMLOCK=82000
# Default varnish instance name is the local nodename. Can be overridden with
# the -n switch, to have more instances on a single server.
# You may need to uncomment this variable for alternatives 1 and 3 below.
# INSTANCE=$(uname -n)
# This file contains 4 alternatives, please use only one.
## Alternative 1, Minimal configuration, no VCL
#
# Listen on port 6081, administration on localhost:6082, and forward to
# content server on localhost:8080. Use a 1GB fixed-size cache file.
#
# This example uses the INSTANCE variable above, which you need to uncomment.
#
# DAEMON_OPTS="-a :6081 \
# -T localhost:6082 \
# -b localhost:8080 \
# -u varnish -g varnish \
# -S /etc/varnish/secret \
# -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"
## Alternative 2, Configuration with VCL
#
# Listen on port 6081, administration on localhost:6082, and forward to
# one content server selected by the vcl file, based on the request.
#
DAEMON_OPTS="-a :6081 \
 -T localhost:6082 \
 -f /etc/varnish/default.vcl \
 -S /etc/varnish/secret \
 -s malloc,1G \
 -p http_resp_hdr_len=700000 \
 -p http_resp_size=1000000"
## Alternative 3, Advanced configuration
#
# This example uses the INSTANCE variable above, which you need to uncomment.
#
# See varnishd(1) for more information.
#
# # Main configuration file. You probably want to change it :)
# VARNISH_VCL_CONF=/etc/varnish/default.vcl
#
# # Default address and port to bind to
# # Blank address means all IPv4 and IPv6 interfaces, otherwise specify
# # a host name, an IPv4 dotted quad, or an IPv6 address in brackets.
# VARNISH_LISTEN_ADDRESS=
# VARNISH_LISTEN_PORT=6081
#
# # Telnet admin interface listen address and port
# VARNISH_ADMIN_LISTEN_ADDRESS=127.0.0.1
# VARNISH_ADMIN_LISTEN_PORT=6082
#
# # Cache file location
# VARNISH_STORAGE_FILE=/var/lib/varnish/$INSTANCE/varnish_storage.bin
#
# # Cache file size: in bytes, optionally using k / M / G / T suffix,
# # or in percentage of available disk space using the % suffix.
# VARNISH_STORAGE_SIZE=1G
#
# # File containing administration secret
# VARNISH_SECRET_FILE=/etc/varnish/secret
# 
# # Backend storage specification
# VARNISH_STORAGE="file,${VARNISH_STORAGE_FILE},${VARNISH_STORAGE_SIZE}"
#
# # Default TTL used when the backend does not specify one
# VARNISH_TTL=120
#
# # DAEMON_OPTS is used by the init script. If you add or remove options, make
# # sure you update this section, too.
# DAEMON_OPTS="-a ${VARNISH_LISTEN_ADDRESS}:${VARNISH_LISTEN_PORT} \
# -f ${VARNISH_VCL_CONF} \
# -T ${VARNISH_ADMIN_LISTEN_ADDRESS}:${VARNISH_ADMIN_LISTEN_PORT} \
# -t ${VARNISH_TTL} \
# -S ${VARNISH_SECRET_FILE} \
# -s ${VARNISH_STORAGE}"
#
## Alternative 4, Do It Yourself
#
# DAEMON_OPTS=""

cat /etc/varnish/default.vcl

vcl 4.1;
import std;
probe healthcheck {
 .url = "/health_check.php";
 .timeout = 5s; # Time to wait for a response
 .interval = 5s; # How often the health probe runs
 .window = 10; # Number of recent probes to evaluate
 .threshold = 5; # Number of successes needed to consider the backend healthy
}
backend default {
 .host = "127.0.0.1";
 .port = "8080";
 .first_byte_timeout = 20s; # Increased from 15s
 .connect_timeout = 10s;
 .between_bytes_timeout = 10s; # Increased from 5s
 .probe = healthcheck;
}
acl unwanted {
 "114.119.0.0"/16;
}
sub vcl_recv {
 if (client.ip ~ unwanted) {
 return(synth(403, "Access denied"));
 }
}
acl purge {
 "localhost";
 "127.0.0.1";
 "::1";
}
sub vcl_recv {
 if (req.url ~ "\?$") {
 set req.url = regsub(req.url, "\?$", "");
 }
 set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");
 
 set req.url = std.querysort(req.url);
 
 unset req.http.proxy;
 if (!req.http.X-Forwarded-Proto && (std.port(server.ip) == 443)) {
 set req.http.X-Forwarded-Proto = "https";
 }
 
 if (std.healthy(req.backend_hint)) {
 set req.grace = 300s;
 }
 
 if (req.method == "PURGE") {
 if (client.ip !~ purge) {
 return (synth(405, "Method not allowed"));
 }
 if (!req.http.X-Magento-Tags-Pattern && !req.http.X-Pool) {
 return (synth(400, "X-Magento-Tags-Pattern or X-Pool header required"));
 }
 if (req.http.X-Magento-Tags-Pattern) {
 ban("obj.http.X-Magento-Tags ~ " + req.http.X-Magento-Tags-Pattern);
 }
 if (req.http.X-Pool) {
 ban("obj.http.X-Pool ~ " + req.http.X-Pool);
 }
 return (synth(200, "Purged"));
 }
 
 if (req.method != "GET" &&
 req.method != "HEAD" &&
 req.method != "PUT" &&
 req.method != "POST" &&
 req.method != "PATCH" &&
 req.method != "TRACE" &&
 req.method != "OPTIONS" &&
 req.method != "DELETE") {
 return (pipe);
 }
 
 if (req.method != "GET" && req.method != "HEAD") {
 return (pass);
 }
 if (req.url ~ "/checkout") {
 return (pass);
 }
 if (req.url ~ "^/(pub/)?(health_check.php)$") {
 return (pass);
 }
 std.collect(req.http.Cookie);
 if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|fbclid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=") {
 set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|fbclid|mc_[a-z]+|utm_[a-z]+|_bta_[a-z]+)=[-_A-z0-9+()%.]+&?", "");
 set req.url = regsub(req.url, "[?|&]+$", "");
 }
 if (req.url ~ "/graphql" && req.http.Authorization ~ "^Bearer") {
 return (pass);
 }
 return (hash);
}
sub vcl_hash {
 if (req.http.cookie ~ "X-Magento-Vary=") {
 hash_data(regsub(req.http.cookie, "^.*?X-Magento-Vary=([^;]+);*.*$", "\1"));
 } else {
 hash_data("");
 }
 
 hash_data(req.http.X-Forwarded-Proto);
 if (req.url ~ "/graphql") {
 hash_data(req.http.Store);
 hash_data(req.http.Content-Currency);
 }
}
sub vcl_backend_response {
 set beresp.grace = 3d;
 if (beresp.http.content-type ~ "text") {
 set beresp.do_esi = true;
 }
 if (bereq.url ~ "\.js$" || beresp.http.content-type ~ "text") {
 set beresp.do_gzip = true;
 }
 
 if (beresp.http.X-Magento-Debug) {
 set beresp.http.X-Magento-Cache-Control = beresp.http.Cache-Control;
 }
 if (beresp.status != 200 && beresp.status != 404) {
 set beresp.ttl = 120s;
 set beresp.uncacheable = true;
 return (deliver);
 } elseif (beresp.http.Cache-Control ~ "private") {
 set beresp.uncacheable = true;
 set beresp.ttl = 86400s;
 return (deliver);
 }
 if (beresp.ttl > 0s && (bereq.method == "GET" || bereq.method == "HEAD")) {
 unset beresp.http.set-cookie;
 }
 
 if (beresp.ttl <= 0s ||
 beresp.http.Surrogate-control ~ "no-store" ||
 (!beresp.http.Surrogate-Control &&
 beresp.http.Cache-Control ~ "no-cache|no-store") ||
 beresp.http.Vary == "*") {
 set beresp.ttl = 120s;
 set beresp.uncacheable = true;
 }
 return (deliver);
}
sub vcl_deliver {
 if (resp.http.X-Magento-Debug) {
 if (obj.uncacheable) {
 set resp.http.X-Magento-Cache-Debug = "UNCACHEABLE";
 } else if (obj.hits) {
 set resp.http.X-Magento-Cache-Debug = "HIT";
 set resp.http.Grace = req.http.grace;
 } else {
 set resp.http.X-Magento-Cache-Debug = "MISS";
 }
 } else {
 unset resp.http.Age;
 }
 if (resp.http.Cache-Control !~ "private" && req.url !~ "^/(pub/)?(media|static)/") {
 set resp.http.Pragma = "no-cache";
 set resp.http.Expires = "-1";
 set resp.http.Cache-Control = "no-store, no-cache, must-revalidate, max-age=0";
 }
 
 unset resp.http.X-Magento-Debug;
 unset resp.http.X-Magento-Tags;
 unset resp.http.X-Powered-By;
 unset resp.http.Server;
 unset resp.http.X-Varnish;
 unset resp.http.Via;
 unset resp.http.Link;
}

cat /etc/apache2/apache2.conf

# This is the main Apache server configuration file. It contains the
# configuration directives that give the server its instructions.
# See http://httpd.apache.org/docs/2.4/ for detailed information about
# the directives and /usr/share/doc/apache2/README.Debian about Debian specific
# hints.
#
#
# Summary of how the Apache 2 configuration works in Debian:
# The Apache 2 web server configuration in Debian is quite different to
# upstream's suggested way to configure the web server. This is because Debian's
# default Apache2 installation attempts to make adding and removing modules,
# virtual hosts, and extra configuration directives as flexible as possible, in
# order to make automating the changes and administering the server as easy as
# possible.
# It is split into several files forming the configuration hierarchy outlined
# below, all located in the /etc/apache2/ directory:
#
# /etc/apache2/
# |-- apache2.conf
# |   `-- ports.conf
# |-- mods-enabled
# |   |-- *.load
# |   `-- *.conf
# |-- conf-enabled
# |   `-- *.conf
# `-- sites-enabled
#     `-- *.conf
#
#
# * apache2.conf is the main configuration file (this file). It puts the pieces
# together by including all remaining configuration files when starting up the
# web server.
#
# * ports.conf is always included from the main configuration file. It is
# supposed to determine listening ports for incoming connections which can be
# customized anytime.
#
# * Configuration files in the mods-enabled/, conf-enabled/ and sites-enabled/
# directories contain particular configuration snippets which manage modules,
# global configuration fragments, or virtual host configurations,
# respectively.
#
# They are activated by symlinking available configuration files from their
# respective *-available/ counterparts. These should be managed by using our
# helpers a2enmod/a2dismod, a2ensite/a2dissite and a2enconf/a2disconf. See
# their respective man pages for detailed information.
#
# * The binary is called apache2. Due to the use of environment variables, in
# the default configuration, apache2 needs to be started/stopped with
# /etc/init.d/apache2 or apache2ctl. Calling /usr/bin/apache2 directly will not
# work with the default configuration.
# Global configuration
#
#
# ServerRoot: The top of the directory tree under which the server's
# configuration, error, and log files are kept.
#
# NOTE! If you intend to place this on an NFS (or otherwise network)
# mounted filesystem then please read the Mutex documentation (available
# at <URL:http://httpd.apache.org/docs/2.4/mod/core.html#mutex>);
# you will save yourself a lot of trouble.
#
# Do NOT add a slash at the end of the directory path.
#
#ServerRoot "/etc/apache2"
#
# The accept serialization lock file MUST BE STORED ON A LOCAL DISK.
#
#Mutex file:${APACHE_LOCK_DIR} default
#
# The directory where shm and other runtime files will be stored.
#
DefaultRuntimeDir ${APACHE_RUN_DIR}
#
# PidFile: The file in which the server should record its process
# identification number when it starts.
# This needs to be set in /etc/apache2/envvars
#
PidFile ${APACHE_PID_FILE}
#
# Timeout: The number of seconds before receives and sends time out.
#
Timeout 300
#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
#
KeepAlive On
#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 100
#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
#
KeepAliveTimeout 5
# These need to be set in /etc/apache2/envvars
User ${APACHE_RUN_USER}
Group ${APACHE_RUN_GROUP}
#
# HostnameLookups: Log the names of clients or just their IP addresses
# e.g., www.apache.org (on) or 204.62.129.132 (off).
# The default is off because it'd be overall better for the net if people
# had to knowingly turn this feature on, since enabling it means that
# each client request will result in AT LEAST one lookup request to the
# nameserver.
#
HostnameLookups Off
# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
# container, error messages relating to that virtual host will be
# logged here. If you *do* define an error logfile for a <VirtualHost>
# container, that host's errors will be logged there and not here.
#
ErrorLog ${APACHE_LOG_DIR}/error.log
#
# LogLevel: Control the severity of messages logged to the error_log.
# Available values: trace8, ..., trace1, debug, info, notice, warn,
# error, crit, alert, emerg.
# It is also possible to configure the log level for particular modules, e.g.
# "LogLevel info ssl:warn"
#
LogLevel warn
# Include module configuration:
IncludeOptional mods-enabled/*.load
IncludeOptional mods-enabled/*.conf
# Include list of ports to listen on
Include ports.conf
# Sets the default security model of the Apache2 HTTPD server. It does
# not allow access to the root filesystem outside of /usr/share and /var/www.
# The former is used by web applications packaged in Debian,
# the latter may be used for local directories served by the web server. If
# your system is serving content from a sub-directory in /srv you must allow
# access here, or in any related virtual host.
<Directory />
 Options FollowSymLinks
 AllowOverride None
 Require all denied
</Directory>
<Directory /usr/share>
 AllowOverride None
 Require all granted
</Directory>
<Directory /var/www/>
 Options Indexes FollowSymLinks
 AllowOverride None
 Require all granted
</Directory>
#<Directory /srv/>
# Options Indexes FollowSymLinks
# AllowOverride None
# Require all granted
#</Directory>
# AccessFileName: The name of the file to look for in each directory
# for additional configuration directives. See also the AllowOverride
# directive.
#
AccessFileName .htaccess
#
# The following lines prevent .htaccess and .htpasswd files from being
# viewed by Web clients.
#
<FilesMatch "^\.ht">
 Require all denied
</FilesMatch>
#
# The following directives define some format nicknames for use with
# a CustomLog directive.
#
# These deviate from the Common Log Format definitions in that they use %O
# (the actual bytes sent including headers) instead of %b (the size of the
# requested file), because the latter makes it impossible to detect partial
# requests.
#
# Note that the use of %{X-Forwarded-For}i instead of %h is not recommended.
# Use mod_remoteip instead.
#
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
# Include of directories ignores editors' and dpkg's backup files,
# see README.Debian for details.
# Include generic snippets of statements
IncludeOptional conf-enabled/*.conf
# Include the virtual host configurations:
IncludeOptional sites-enabled/*.conf
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
<IfModule mpm_prefork_module>
 StartServers 5
 MinSpareServers 5
 MaxSpareServers 10
 MaxRequestWorkers 150
 MaxConnectionsPerChild 0
</IfModule>
<IfModule mpm_worker_module>
 StartServers 2
 MinSpareThreads 25
 MaxSpareThreads 75
 ThreadLimit 64
 ThreadsPerChild 25
 MaxRequestWorkers 150
 MaxConnectionsPerChild 0
</IfModule>

cat mods-available/mpm_prefork.conf

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxRequestWorkers: maximum number of server processes allowed to start
# MaxConnectionsPerChild: maximum number of requests a server process serves
<IfModule mpm_prefork_module>
 StartServers 5
 MinSpareServers 5
 MaxSpareServers 10
 MaxRequestWorkers 150
 MaxConnectionsPerChild 0
</IfModule>
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet

Nevertheless, every now and then I have

varnishlog -g raw -i Backend_health
0 Backend_health - default Still healthy 4---X-RH 10 5 10 1.311902 1.455547 "HTTP/1.1 200 OK"
0 Backend_health - default Still healthy 4---X-RH 10 5 10 2.522449 1.722273 "HTTP/1.1 200 OK"
0 Backend_health - default Still healthy 4---Xr-- 9 5 10 0.000000 1.722273 "Poll error 110 (Connection timed out)"
0 Backend_health - default Still healthy 4---Xr-- 8 5 10 0.000000 1.722273 "Poll error 110 (Connection timed out)"
0 Backend_health - default Still healthy 4---X-RH 8 5 10 2.742180 1.977249 "HTTP/1.1 200 OK"
0 Backend_health - default Still healthy 4---X-RH 8 5 10 0.087281 1.504757 "HTTP/1.1 200 OK"
0 Backend_health - default Still healthy 4---X-RH 8 5 10 0.074608 1.147220 "HTTP/1.1 200 OK"
0 Backend_health - default Still healthy 4---X-RH 8 5 10 0.079414 0.880269 "HTTP/1.1 200 OK"
0 Backend_health - default Still healthy 4---X-RH 8 5 10 0.067216 0.677005 "HTTP/1.1 200 OK"

In the browser:

Error 503 Backend fetch failed
Backend fetch failed
Guru Meditation:
XID: 1966465
Varnish cache server

varnishlog

* << BeReq >> 1966465 
- Begin bereq 1736720 fetch
- VCL_use boot
- Timestamp Start: 1719331681.322400 0.000000 0.000000
- BereqMethod GET
- BereqURL /ideus-03028-late-led-5m-300-ww-ip65-p-59027.html
- BereqProtocol HTTP/1.1
- BereqHeader CF-RAY: 899636a37abdbbd5-WAW
- BereqHeader X-Forwarded-Proto: https
- BereqHeader CF-Visitor: {"scheme":"https"}
- BereqHeader user-agent: Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0
- BereqHeader accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8
- BereqHeader accept-language: pl,en-US;q=0.7,en;q=0.3
- BereqHeader dnt: 1
- BereqHeader referer: https://xxx.xx/site.html
- BereqHeader upgrade-insecure-requests: 1
- BereqHeader sec-fetch-dest: document
- BereqHeader sec-fetch-mode: navigate
- BereqHeader sec-fetch-site: same-origin
- BereqHeader sec-fetch-user: ?1
- BereqHeader priority: u=0, i
- BereqHeader cookie: private_content_version=cece6d9b21e6527e5d1aeaae9f00f81d; PHPSESSID=nhkbsgol7f7mb86ee9ljve01n4; form_key=wEHBw9X442DhwoBd; X-Magento-Vary=6319496fa1054416624a77cd57a575af3e22da67; mage-cache-sessid=true; section_data_ids={%22messages%22:171932861
- BereqHeader CF-Connecting-IP: 83.24.15.24
- BereqHeader CDN-Loop: cloudflare
- BereqHeader CF-IPCountry: PL
- BereqHeader X-Forwarded-For: 83.24.15.24, 162.158.172.138
- BereqHeader Host: xxx.xx
- BereqHeader Accept-Encoding: gzip
- BereqHeader X-Varnish: 1966465
- VCL_call BACKEND_FETCH
- VCL_return fetch
- BackendOpen 39 default 127.0.0.1 8080 127.0.0.1 39580 connect
- Timestamp Bereq: 1719331681.322694 0.000293 0.000293
- FetchError first byte timeout
- BackendClose 39 default close
- Timestamp Beresp: 1719331701.341864 20.019463 20.019170
- Timestamp Error: 1719331701.341869 20.019469 0.000005
- BerespProtocol HTTP/1.1
- BerespStatus 503
- BerespReason Backend fetch failed
- BerespHeader Date: Tue, 25 Jun 2024 16:08:21 GMT
- BerespHeader Server: Varnish
- VCL_call BACKEND_ERROR
- BerespHeader content-type: text/html; charset=utf-8
- BerespHeader Retry-After: 5
- VCL_return deliver
- Storage malloc Transient
- Length 284
- BereqAcct 1627 0 1627 0 0 0
- End

I am looking for the cause of the errors:

Error 503 Backend fetch failed

Question 2

The current situation: timeouts

The connection timeout that is set in your backend (.connect_timeout = 10s;) is 10 seconds, and the health probe has its own timeout (.timeout = 5s;). The logs show that some of the health checks did not get a response within the probe timeout:

0 Backend_health - default Still healthy 4---Xr-- 8 5 10 0.000000 1.722273 "Poll error 110 (Connection timed out)"
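To make the probe line above easier to read, here is a minimal Python sketch that splits a Backend_health record into its fields. The field order (backend, status, flags, good probes, threshold, window, response time, average, response) and the flag meanings follow the Backend_health entry in the Varnish VSL reference; the function name, the class name and the regular expression are my own and only assume the raw format shown in this question.

```python
import re
from typing import NamedTuple

# Meaning of the flag characters, per the Backend_health entry in vsl(7).
FLAGS = {
    "4": "good IPv4", "6": "good IPv6", "U": "good UNIX socket",
    "x": "error xmit", "X": "good xmit",
    "r": "error recv", "R": "good recv", "H": "happy",
}

class Probe(NamedTuple):
    backend: str
    status: str
    flags: list
    good: int
    threshold: int
    window: int
    response_time: float
    average: float
    response: str

# Hypothetical pattern, written for the lines pasted in this thread.
LINE = re.compile(
    r'Backend_health\s+-\s+(\S+)\s+(.+?)\s+([46UxXrRH-]+)\s+'
    r'(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+"?(.*?)"?\s*$'
)

def parse(line: str) -> Probe:
    """Split one raw Backend_health record into named fields."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("not a Backend_health record")
    name, status, flags, good, thr, win, rt, avg, resp = m.groups()
    return Probe(name, status, [FLAGS[c] for c in flags if c != "-"],
                 int(good), int(thr), int(win), float(rt), float(avg), resp)

line = ('0 Backend_health - default Still healthy 4---Xr-- 8 5 10 '
        '0.000000 1.722273 "Poll error 110 (Connection timed out)"')
p = parse(line)
print(p.flags)                # which probe stages succeeded or failed
print(p.good >= p.threshold)  # True while the backend still counts as healthy
```

For this record the request was sent (good xmit) but no response arrived in time (error recv). Because 8 of the last 10 probes were good, which is above the threshold of 5, the backend keeps its "Still healthy" status.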

The varnishlog output you presented contains the following lines:

- BackendOpen 39 default 127.0.0.1 8080 127.0.0.1 39580 connect
- Timestamp Bereq: 1719331681.322694 0.000293 0.000293
- FetchError first byte timeout
- BackendClose 39 default close
- Timestamp Beresp: 1719331701.341864 20.019463 20.019170
- Timestamp Error: 1719331701.341869 20.019469 0.000005

It shows a backend fetch where the connection to the backend was opened and the request was sent almost immediately (0.000293 seconds), but no response arrived within the next 20 seconds. This triggers the first byte timeout, because your backend definition sets .first_byte_timeout = 20s;.

This means that your Magento setup requires more than 20 seconds to load that page. You'll agree that this is not normal.
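Each Timestamp record carries three numbers: the absolute epoch time, the seconds since the start of the transaction, and the seconds since the previous timestamp. As a rough sketch of how to read them, the Python below extracts those values from the excerpt quoted above and compares the wait for the first byte with the configured timeout. The function name and the FIRST_BYTE_TIMEOUT constant are illustrative names of my own.

```python
from typing import Dict, Tuple

def parse_timestamps(log: str) -> Dict[str, Tuple[float, ...]]:
    """Map each Timestamp label to (absolute, since_start, since_last)."""
    out = {}
    for raw in log.splitlines():
        parts = raw.lstrip("- ").split()
        if len(parts) == 5 and parts[0] == "Timestamp":
            label = parts[1].rstrip(":")
            out[label] = tuple(float(x) for x in parts[2:5])
    return out

# Timestamp lines copied from the varnishlog output in the question.
LOG = """\
- Timestamp Start: 1719331681.322400 0.000000 0.000000
- Timestamp Bereq: 1719331681.322694 0.000293 0.000293
- Timestamp Beresp: 1719331701.341864 20.019463 20.019170
- Timestamp Error: 1719331701.341869 20.019469 0.000005
"""

FIRST_BYTE_TIMEOUT = 20.0  # .first_byte_timeout from the backend definition

ts = parse_timestamps(LOG)
send_time = ts["Bereq"][2]   # time to connect and send the request
wait_time = ts["Beresp"][2]  # time spent waiting for the first byte
print(f"sent in {send_time:.6f}s, waited {wait_time:.3f}s")
print("hit first_byte_timeout:", wait_time >= FIRST_BYTE_TIMEOUT)
```

The interesting value is the third number on the Beresp line: nearly all of the 20 seconds were spent waiting on the backend, not connecting to it.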

Performance or scalability issue?

To learn why the Magento backend is so slow, we need to figure out whether this is a performance issue or a scalability issue.

A performance issue means that Magento is always slow, even when it's receiving hardly any traffic.
A scalability issue means that Magento is currently overloaded by requests, which causes the slowdown and the timeouts.

You can run the following command on the server to figure out some timing information:

curl -s -I -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} \n" http://localhost:8080/ideus-03028-late-led-5m-300-ww-ip65-p-59027.html

If your Magento setup cannot handle the localhost hostname, you can add a -H"Host: xxx" host header to the curl request.
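If you collect the output of the curl command above at quiet and at busy moments, the performance-versus-scalability distinction can be sketched in a few lines of Python. This is only an illustration: the 1-second threshold, the function names and the sample numbers are assumptions of mine, not measurements from your system.

```python
import re
from statistics import median
from typing import Dict, List

# Matches the line produced by the -w format string used above.
TIMING = re.compile(
    r"Connect:\s*([\d.]+)\s+TTFB:\s*([\d.]+)\s+Total time:\s*([\d.]+)"
)

def parse_curl_timing(line: str) -> Dict[str, float]:
    """Turn one line of curl -w output into named timings in seconds."""
    m = TIMING.search(line)
    if m is None:
        raise ValueError("unexpected curl output")
    connect, ttfb, total = (float(x) for x in m.groups())
    return {"connect": connect, "ttfb": ttfb, "total": total}

def classify(quiet_ttfb: List[float], busy_ttfb: List[float],
             slow: float = 1.0) -> str:
    """Rough rule of thumb: slow even when quiet points at performance,
    slow only under load points at scalability."""
    if median(quiet_ttfb) >= slow:
        return "performance"
    if median(busy_ttfb) >= slow:
        return "scalability"
    return "ok"

# Made-up sample values, for illustration only.
sample = parse_curl_timing("Connect: 0.000112 TTFB: 0.084310 Total time: 0.084391")
print(sample["ttfb"])
print(classify(quiet_ttfb=[0.08, 0.09, 0.07], busy_ttfb=[12.4, 20.0, 18.1]))
```

Using the median instead of the mean keeps a single outlier from deciding the outcome. Pick a threshold that matches what you consider acceptable for an uncached Magento page.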

Checking the hit rate through varnishstat

Looking at https://varnish-cache.org/docs/trunk/reference/varnish-counters.html#main-main-counters, you'll see some of the main counters that Varnish offers in its varnishstat program.

Please run varnishstat on the system and check the following metrics:

cache_hit
cache_hit_grace
cache_hitpass
cache_hitmiss
cache_miss
n_lru_nuked
s_pipe
s_pass
s_fetch

The absolute values of these counters don't matter that much, but how fast they increase exposes certain trends on your system.

The s_fetch counter displays the number of backend fetches that are taking place. If this counter increases quickly, you know that Varnish isn't serving much from the cache.

You can correlate this to the increase of counters like cache_miss, cache_hitmiss, cache_hitpass, s_pass. You can also use your web server logs to confirm these trends.

These counters should allow you to conclude whether your Magento is just receiving too many backend requests instead of serving the content from the cache.

If the n_lru_nuked counter goes up too much, it means your cache is full and Varnish needs to evict content from the cache to make space for new objects. In that case increasing the size of the cache can help.

The g_space counter that is part of the SMA counters will tell you how much space is left in the cache.
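Because the increase matters more than the absolute value, one way to look at these counters is to take two samples and compare them. The Python sketch below parses text in the layout printed by `varnishstat -1` (counter name followed by its value) and computes the growth and a simple hit ratio for the interval. The helper names and the sample numbers are made up for illustration, and the ratio deliberately ignores passes (s_pass, cache_hitpass).

```python
from typing import Dict

def parse_varnishstat(text: str) -> Dict[str, int]:
    """Parse `varnishstat -1` style output: counter name, then current value."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters

def delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Increase of every counter that is present in both samples."""
    return {k: after[k] - before[k] for k in after if k in before}

def hit_ratio(d: Dict[str, int]) -> float:
    """Share of cache lookups answered from the cache during the interval."""
    hits = d.get("MAIN.cache_hit", 0)
    misses = d.get("MAIN.cache_miss", 0)
    lookups = hits + misses
    return hits / lookups if lookups else 0.0

# Made-up samples, a few minutes apart, in the `varnishstat -1` layout.
BEFORE = """\
MAIN.cache_hit          1000         1.10 Cache hits
MAIN.cache_miss          400         0.44 Cache misses
MAIN.s_fetch             900         0.99 Total backend fetches initiated
MAIN.n_lru_nuked           0         0.00 Number of LRU nuked objects
"""
AFTER = """\
MAIN.cache_hit          1030         1.05 Cache hits
MAIN.cache_miss          670         0.68 Cache misses
MAIN.s_fetch            1400         1.43 Total backend fetches initiated
MAIN.n_lru_nuked         120         0.12 Number of LRU nuked objects
"""

d = delta(parse_varnishstat(BEFORE), parse_varnishstat(AFTER))
print(d)
print(f"hit ratio over interval: {hit_ratio(d):.0%}")
```

In this invented example the misses and backend fetches grow much faster than the hits and n_lru_nuked is rising, which is the kind of trend that would point at a low hit rate combined with a cache that is too small.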

Digging deeper with Varnishlog

The fact that you shared varnishlog output means you're probably comfortable operating the tool.

If you have a feeling that certain requests should be served from the cache but are not, you can use varnishlog to uncover them and figure out why they aren't cached.

See https://www.varnish-software.com/developers/tutorials/troubleshooting-varnish/#varnish-is-not-caching for a tutorial on how to query varnishlog for these scenarios.

How to solve your problem

There is no turnkey solution here. What I'm trying to do is give you the tools you need to investigate.

You may conclude that the hit rate is too low, in which case you'll have to optimize accordingly. This could mean making changes in Magento to make that content cacheable. It could mean making changes in the VCL. It could also mean increasing the size of the cache.

If the issue is a performance issue and not a scalability issue, you'll have to debug your Magento and figure out why it is taking so long. Unfortunately, I can only help you with the Varnish side of things.

Thijs Feryn · Answer 1 · 2024-06-27 08:39:50Z

