swift

2010-08-20 00:42:38 +00:00

[DEFAULT]

2010-07-12 17:03:45 -05:00

# bind_ip = 0.0.0.0

change default ports for servers Changing the recommended ports for Swift services from ports 6000-6002 to unused ports 6200-6202, so they do not conflict with X-Windows or other services. Updated SAIO docs. DocImpact Closes-Bug: #1521339 Change-Id: Ie1c778b159792c8e259e2a54cb86051686ac9d18

2016-02-01 18:06:54 +00:00

bind_port = 6200

Adding keep_idle config value to socket User can configure KEEPIDLE time for sockets in TCP connection. The default value is the old value which is 600. Change-Id: Ib7fb166deb8a87ae4e97ba0671048b1ec079a2ef Closes-Bug:1759606

2018-09-14 23:18:22 +02:00

# keep_idle = 600
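A minimal sketch of what `keep_idle` controls: TCP keepalive is switched on and the kernel waits `keep_idle` seconds of idleness before sending probes. The helper name `apply_keep_idle` is hypothetical, not Swift's API, and `socket.TCP_KEEPIDLE` is platform-specific (present on Linux), so it is only set when the `socket` module exposes it.

```python
import socket


def apply_keep_idle(sock, keep_idle=600):
    """Enable TCP keepalive and set the idle time (seconds) before probes.

    Hypothetical helper for illustration; TCP_KEEPIDLE only exists on some
    platforms, so it is applied when available and skipped otherwise.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, keep_idle)
    return sock
```

Usage: `apply_keep_idle(socket.socket(socket.AF_INET, socket.SOCK_STREAM), 600)` before binding a listen socket.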

Add config of server start timeouts for probetests Currently the timeout for a wsgi server successfully binding to a port and for a probetest background service to finish starting are hard coded to 30 seconds. While a reasonable default for most configurations, a small virtualized environment may need a little more time in order for probe tests to complete successfully. This patch adds a 'bind_timeout' option to the DEFAULT section of the main wsgi servers' config. Also a new [probe_test] section and 'check_server_timeout' option to test.conf DocImpact Change-Id: Ibcaff153c7633bbf32e460fd9dbf04932eddb56f

2012-11-26 12:39:46 -08:00

# bind_timeout = 30

Moved backlog setting into the [Default] section of the sample-conf files

2010-10-13 21:24:30 +00:00

# backlog = 4096
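The `bind_ip`, `bind_port`, `bind_timeout` and `backlog` options fit together roughly as below: keep retrying the bind while the port is still in use, give up once `bind_timeout` seconds have passed, then listen with the configured backlog. This is a sketch under stated assumptions (IPv4 only, retry only on `EADDRINUSE`, 0.1 s between attempts); `get_listen_socket` is a hypothetical name, not Swift's own `get_socket()`.

```python
import errno
import socket
import time


def get_listen_socket(bind_ip="0.0.0.0", bind_port=6200, bind_timeout=30,
                      backlog=4096):
    """Bind and listen, retrying while the port is busy.

    Sketch only: IPv4, retries on EADDRINUSE until bind_timeout seconds
    have elapsed, then re-raises the last error.
    """
    deadline = time.time() + bind_timeout
    while True:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((bind_ip, bind_port))
            sock.listen(backlog)  # queue length for pending connections
            return sock
        except OSError as err:
            sock.close()
            if err.errno != errno.EADDRINUSE or time.time() >= deadline:
                raise
            time.sleep(0.1)  # port still held, e.g. by a dying server
```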

Initial commit of Swift code

2010-07-12 17:03:45 -05:00

# user = swift

Initial commit of middleware refactor

2010-08-20 00:42:38 +00:00

# swift_dir = /etc/swift

# devices = /srv/node

# mount_check = true

Added ability to disable fallocate Change-Id: Id8872c581ed23378a8e14cbf3bf049b5c0d21577

2012-08-29 19:57:26 +00:00

# disable_fallocate = false

Expiring Objects Support Please see the doc/source/overview_expiring_objects.rst for more detail. Change-Id: I4ab49e731248cf62ce10001016e0c819cc531738

2011-10-26 21:42:24 +00:00

# expiring_objects_container_divisor = 86400

Make .expiring_objects account name configurable The account which tracks objects scheduled for deletion had its name hard-coded to 'expiring_objects'. This is made configurable via expiring_objects_account_name option. Backend file-systems integration efforts may want to treat these "special" accounts in a different way. This would still go undocumented, hence 'pseudo-hidden'. UpgradeImpact: None as the default value would continue to be the same which is '.expiring_objects'. Change-Id: I1a093b0d0e2bdd0c3d723090af03fc0adf2ad7e3 Signed-off-by: Prashanth Pai <ppai@redhat.com>

2014-02-04 16:31:47 +05:30

# expiring_objects_account_name = expiring_objects
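A sketch of the idea behind `expiring_objects_container_divisor`: each object's X-Delete-At timestamp is rounded down to a multiple of the divisor, so all objects expiring in the same window (one day with the default 86400) are tracked in the same container under the expiring-objects account. The function name is hypothetical, and this ignores any extra sharding a particular Swift release may apply.

```python
def delete_at_container(delete_at, divisor=86400):
    """Bucket an X-Delete-At timestamp into a container name.

    Hypothetical helper: integer division then multiplication rounds the
    timestamp down to the start of its divisor-sized window.
    """
    return str(int(delete_at) // divisor * divisor)
```

For example, `delete_at_container(1700000123)` yields `"1699920000"`, the start of that UTC day.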

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013-06-06 15:35:19 -07:00

#

Set default wsgi workers to cpu_count Change the default value of wsgi workers from 1 to auto. The new default value for workers in the proxy, container, account & object wsgi servers will spawn as many workers per process as you have cpu cores. This will not be ideal for some configurations, but it's much more likely to produce a successful out of the box deployment. Inspect the number of cpu_cores using python's multiprocessing when available. Multiprocessing was added in python 2.6, but I know I've compiled python without it before on accident. The cpu_count method seems to be pretty system agnostic, but it says it can raise NotImplementedError or sometimes return 0. Add a new utility method 'config_auto_int_value' to pull an integer out of the config which has a dynamic default. * drive by s/container/proxy/ in proxy-server.conf.5 * fix misplaced max_clients in *-server.conf-sample * update doc/development_saio to force workers = 1 DocImpact Change-Id: Ifa563d22952c902ab8cbe1d339ba385413c54e95

2013-07-11 17:00:57 -07:00

# Use an integer to override the number of pre-forked processes that will

Allow 1+ object-servers-per-disk deployment Enabled by a new > 0 integer config value, "servers_per_port" in the [DEFAULT] config section for object-server and/or replication server configs. The setting's integer value determines how many different object-server workers handle requests for any single unique local port in the ring. In this mode, the parent swift-object-server process continues to run as the original user (i.e. root if low-port binding is required), binds to all ports as defined in the ring, and forks off the specified number of workers per listen socket. The child, per-port servers drop privileges and behave pretty much how object-server workers always have, except that because the ring has unique ports per disk, the object-servers will only be handling requests for a single disk. The parent process detects dead servers and restarts them (with the correct listen socket), starts missing servers when an updated ring file is found with a device on the server with a new port, and kills extraneous servers when their port is found to no longer be in the ring. The ring files are stat'ed at most every "ring_check_interval" seconds, as configured in the object-server config (same default of 15s). Immediately stopping all swift-object-worker processes still works by sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process still causes the parent process to close all listen sockets and exit, allowing existing children to finish serving their existing requests. The drop_privileges helper function now has an optional param to suppress the setsid() call, which otherwise screws up the child workers' process management. The class method RingData.load() can be told to only load the ring metadata (i.e. everything except replica2part2dev_id) with the optional kwarg, header_only=True. This is used to keep the parent and all forked off workers from unnecessarily having full copies of all storage policy rings in memory. 
A new helper class, swift.common.storage_policy.BindPortsCache, provides a method to return a set of all device ports in all rings for the server on which it is instantiated (identified by its set of IP addresses). The BindPortsCache instance will track mtimes of ring files, so they are not opened more frequently than necessary. This patch includes enhancements to the probe tests and object-replicator/object-reconstructor config plumbing to allow the probe tests to work correctly both in the "normal" config (same IP but unique ports for each SAIO "server") and a server-per-port setup where each SAIO "server" must have a unique IP address and unique port per disk within each "server". The main probe tests only work with 4 servers and 4 disks, but you can see the difference in the rings for the EC probe tests where there are 2 disks per server for a total of 8 disks. Specifically, swift.common.ring.utils.is_local_device() will ignore the ports when the "my_port" argument is None. Then, object-replicator and object-reconstructor both set self.bind_port to None if servers_per_port is enabled. Bonus improvement for IPv6 addresses in is_local_device(). This PR for vagrant-swift-all-in-one will aid in testing this patch: https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/ Also allow SAIO to answer is_local_device() better; common SAIO setups have multiple "servers" all on the same host with different ports for the different "servers" (which happen to match the IPs specified in the rings for the devices on each of those "servers"). However, you can configure the SAIO to have different localhost IP addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the servers' config files' bind_ip setting. This new whataremyips() implementation combined with a little plumbing allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as well as an explicit "bind to everything" like '0.0.0.0' or '::', whataremyips() behaves as it always has, returning all IP addresses for the server. Also updated probe tests to handle each "server" in the SAIO having a unique IP address. For some (noisy) benchmarks that show servers_per_port=X is at least as good as the same number of "normal" workers: https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md Benchmarks showing the benefits of I/O isolation with a small number of slow disks: https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md If you were wondering what the overhead of threads_per_disk looks like: https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md DocImpact Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6

2015-05-14 22:14:15 -07:00

# accept connections. NOTE: if servers_per_port is set, this setting is

# ignored.

Set default wsgi workers to cpu_count Change the default value of wsgi workers from 1 to auto. The new default value for workers in the proxy, container, account & object wsgi servers will spawn as many workers per process as you have cpu cores. This will not be ideal for some configurations, but it's much more likely to produce a successful out of the box deployment. Inspect the number of cpu_cores using python's multiprocessing when available. Multiprocessing was added in python 2.6, but I know I've compiled python without it before on accident. The cpu_count method seems to be pretty system agnostic, but it says it can raise NotImplementedError or sometimes return 0. Add a new utility method 'config_auto_int_value' to pull an integer out of the config which has a dynamic default. * drive by s/container/proxy/ in proxy-server.conf.5 * fix misplaced max_clients in *-server.conf-sample * update doc/development_saio to force workers = 1 DocImpact Change-Id: Ifa563d22952c902ab8cbe1d339ba385413c54e95

2013-07-11 17:00:57 -07:00

# workers = auto
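The commit message above describes `workers = auto` resolving to the CPU count, with a fallback because `cpu_count()` can raise `NotImplementedError` or return 0, and a helper named `config_auto_int_value` for integer options with a dynamic default. The following is a sketch modeled on that description, not Swift's actual code; the real helper's error handling may differ.

```python
import multiprocessing


def cpu_count_or_one():
    """CPU count, falling back to 1 if it is unknown or reported as 0."""
    try:
        return multiprocessing.cpu_count() or 1
    except NotImplementedError:
        return 1


def config_auto_int_value(value, default):
    """Return `default` if value is unset or 'auto', else int(value).

    Sketch modeled on the commit message; not Swift's implementation.
    """
    if value is None or (isinstance(value, str)
                         and value.strip().lower() == "auto"):
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValueError(
            "Config option must be an integer or 'auto', not %r" % (value,))
```

A server would then compute something like `config_auto_int_value(conf.get("workers", "auto"), cpu_count_or_one())`.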

#

Remove threads_per_disk setting This patch removes the threads_per_disk setting. It was already a deprecated setting and by default set to 0, which effectively meant to not use a per-disk thread pool at all. Users are encouraged to use servers_per_port instead. DocImpact Change-Id: Ie76be5c8a74d60a1330627caace19e06d1b9383c

2016-04-28 12:06:24 -05:00

# Make object-server run this many worker processes per unique port of "local"

# ring devices across all storage policies. The default value of 0 disables this

# feature.

Allow 1+ object-servers-per-disk deployment Enabled by a new > 0 integer config value, "servers_per_port" in the [DEFAULT] config section for object-server and/or replication server configs. The setting's integer value determines how many different object-server workers handle requests for any single unique local port in the ring. In this mode, the parent swift-object-server process continues to run as the original user (i.e. root if low-port binding is required), binds to all ports as defined in the ring, and forks off the specified number of workers per listen socket. The child, per-port servers drop privileges and behave pretty much how object-server workers always have, except that because the ring has unique ports per disk, the object-servers will only be handling requests for a single disk. The parent process detects dead servers and restarts them (with the correct listen socket), starts missing servers when an updated ring file is found with a device on the server with a new port, and kills extraneous servers when their port is found to no longer be in the ring. The ring files are stat'ed at most every "ring_check_interval" seconds, as configured in the object-server config (same default of 15s). Immediately stopping all swift-object-worker processes still works by sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process still causes the parent process to close all listen sockets and exit, allowing existing children to finish serving their existing requests. The drop_privileges helper function now has an optional param to suppress the setsid() call, which otherwise screws up the child workers' process management. The class method RingData.load() can be told to only load the ring metadata (i.e. everything except replica2part2dev_id) with the optional kwarg, header_only=True. This is used to keep the parent and all forked off workers from unnecessarily having full copies of all storage policy rings in memory. 
A new helper class, swift.common.storage_policy.BindPortsCache, provides a method to return a set of all device ports in all rings for the server on which it is instantiated (identified by its set of IP addresses). The BindPortsCache instance will track mtimes of ring files, so they are not opened more frequently than necessary. This patch includes enhancements to the probe tests and object-replicator/object-reconstructor config plumbing to allow the probe tests to work correctly both in the "normal" config (same IP but unique ports for each SAIO "server") and a server-per-port setup where each SAIO "server" must have a unique IP address and unique port per disk within each "server". The main probe tests only work with 4 servers and 4 disks, but you can see the difference in the rings for the EC probe tests where there are 2 disks per server for a total of 8 disks. Specifically, swift.common.ring.utils.is_local_device() will ignore the ports when the "my_port" argument is None. Then, object-replicator and object-reconstructor both set self.bind_port to None if servers_per_port is enabled. Bonus improvement for IPv6 addresses in is_local_device(). This PR for vagrant-swift-all-in-one will aid in testing this patch: https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/ Also allow SAIO to answer is_local_device() better; common SAIO setups have multiple "servers" all on the same host with different ports for the different "servers" (which happen to match the IPs specified in the rings for the devices on each of those "servers"). However, you can configure the SAIO to have different localhost IP addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the servers' config files' bind_ip setting. This new whataremyips() implementation combined with a little plumbing allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as well as an explicit "bind to everything" like '0.0.0.0' or '::', whataremyips() behaves as it always has, returning all IP addresses for the server. Also updated probe tests to handle each "server" in the SAIO having a unique IP address. For some (noisy) benchmarks that show servers_per_port=X is at least as good as the same number of "normal" workers: https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md Benchmarks showing the benefits of I/O isolation with a small number of slow disks: https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md If you were wondering what the overhead of threads_per_disk looks like: https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md DocImpact Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6

2015-05-14 22:14:15 -07:00

# servers_per_port = 0
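With `servers_per_port` > 0, the parent process binds every unique port used by this server's devices in the rings and forks that many workers per port. The sketch below only computes that plan; the simplified ring format (a list of dicts with `ip` and `port`, with `None` for removed devices) and the function names are hypothetical, and this is not Swift's `BindPortsCache`.

```python
def local_bind_ports(devices, my_ips):
    """Unique ports of ring devices whose IP belongs to this server.

    `devices` is a hypothetical simplified ring: a list of dicts with
    'ip' and 'port' keys; None entries stand for removed devices.
    """
    return {dev["port"] for dev in devices
            if dev is not None and dev["ip"] in my_ips}


def plan_workers(devices, my_ips, servers_per_port):
    """Map each local port to the number of worker processes to fork."""
    if servers_per_port <= 0:
        return {}  # 0 disables this mode; the `workers` setting applies
    return {port: servers_per_port
            for port in sorted(local_bind_ports(devices, my_ips))}
```

With two local disks on ports 6200 and 6201 and `servers_per_port = 2`, the plan is `{6200: 2, 6201: 2}`, i.e. four object-server workers, each serving a single disk.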

#

Set default wsgi workers to cpu_count Change the default value of wsgi workers from 1 to auto. The new default value for workers in the proxy, container, account & object wsgi servers will spawn as many workers per process as you have cpu cores. This will not be ideal for some configurations, but it's much more likely to produce a successful out of the box deployment. Inspect the number of cpu_cores using python's multiprocessing when available. Multiprocessing was added in python 2.6, but I know I've compiled python without it before on accident. The cpu_count method seems to be pretty system agnostic, but it says it can raise NotImplementedError or sometimes return 0. Add a new utility method 'config_auto_int_value' to pull an integer out of the config which has a dynamic default. * drive by s/container/proxy/ in proxy-server.conf.5 * fix misplaced max_clients in *-server.conf-sample * update doc/development_saio to force workers = 1 DocImpact Change-Id: Ifa563d22952c902ab8cbe1d339ba385413c54e95

2013-07-11 17:00:57 -07:00

# Maximum concurrent requests per worker

# max_clients = 1024

#

More doc updates for logger stuff

2011-01-23 13:18:28 -08:00

# You can specify default log routing here if you want:

# log_name = swift

# log_facility = LOG_LOCAL0

# log_level = INFO

Patch for Swift Solaris (Illumos) compatibility. * Add new configuration option log_address. Change-Id: I636bd4116687629c997b70a0d804b7ed4bc46032

2012-05-17 15:46:38 -07:00

# log_address = /dev/log

New log_max_line_length option. Log lines can get quite large, as we previously noticed with rsync error log lines. We added a setting to cap those, but it really looks like we should have just done this overall limit. We noticed the issue when we switched to UDP syslogging and it would occasionally blow past the 16436 lo MTU! This causes Python's logging code to get an error and hilarity ensues. Change-Id: I44bdbe68babd58da58c14360379e8fef8a6b75f7

2014-05-22 19:37:53 +00:00

# The following caps the length of log lines to the value given; no limit if

# set to 0, the default.

# log_max_line_length = 0
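A simplified sketch of the cap described above, which keeps a single log line (for example a UDP syslog datagram) from growing without bound. It uses plain truncation and treats 0 as "no limit"; the exact truncation style Swift's formatter uses is not reproduced here, and `cap_log_line` is a hypothetical name.

```python
def cap_log_line(line, max_line_length=0):
    """Truncate a log line to max_line_length characters; 0 means no limit.

    Hypothetical helper using plain truncation for illustration.
    """
    if max_line_length <= 0 or len(line) <= max_line_length:
        return line
    return line[:max_line_length]
```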

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013-06-06 15:35:19 -07:00

#

Make log format for requests configurable Add the log_msg_template option in proxy-server.conf and log_format in a/c/o-server.conf. It is a string parsable by Python's format() function. Some fields containing user data might be anonymized by using log_anonymization_method and log_anonymization_salt. Change-Id: I29e30ef45fe3f8a026e7897127ffae08a6a80cd9

2018-03-01 11:31:12 +01:00

# Hashing algorithm for log anonymization. Must be one of the algorithms supported

# by Python's hashlib.

# log_anonymization_method = MD5

#

# Salt added during log anonymization

# log_anonymization_salt =
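A sketch of salted anonymization with any algorithm `hashlib.new()` accepts, so a field such as a client address can still be correlated across log lines without being stored in clear text. The helper name `anonymize` is hypothetical, and Swift's actual output format (for example any prefix on the digest) is not reproduced here.

```python
import hashlib


def anonymize(value, method="md5", salt=""):
    """Salted hex digest of a log field using a hashlib algorithm name.

    Hypothetical helper; the algorithm name is lower-cased so the sample
    value 'MD5' works too.
    """
    hasher = hashlib.new(method.lower())
    hasher.update(salt.encode("utf-8"))
    hasher.update(value.encode("utf-8"))
    return hasher.hexdigest()
```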

#

# Template used to format logs. All words surrounded by curly brackets

# will be substituted with the appropriate values

# log_format = {remote_addr} - - [{time.d}/{time.b}/{time.Y}:{time.H}:{time.M}:{time.S} +0000] "{method} {path}" {status} {content_length} "{referer}" "{txn_id}" "{user_agent}" {trans_time:.4f} "{additional_info}" {pid} {policy_index}
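Since the template is a string for Python's `format()`, a field like `{time.d}` is an attribute lookup on whatever object is passed as `time`. The sketch below uses a hypothetical `LogTime` class that maps single-letter attributes to `strftime` directives, together with a shortened version of the sample template; Swift ships its own equivalent helper, which this does not claim to reproduce.

```python
import time


class LogTime(object):
    """Hypothetical helper exposing strftime fields as attributes.

    '{time.d}' in a template calls getattr(obj, 'd'), which is mapped
    here to time.strftime('%d') on a UTC struct_time.
    """

    def __init__(self, ts):
        self._tm = time.gmtime(ts)

    def __getattr__(self, name):
        # Only single-letter strftime directives are accepted.
        if len(name) != 1 or not name.isalpha():
            raise AttributeError(name)
        return time.strftime("%" + name, self._tm)


# Shortened version of the sample log_format above.
TEMPLATE = ('{remote_addr} - - [{time.d}/{time.b}/{time.Y}:{time.H}:'
            '{time.M}:{time.S} +0000] "{method} {path}" {status} '
            '{trans_time:.4f}')

line = TEMPLATE.format(remote_addr="127.0.0.1", time=LogTime(0),
                       method="GET", path="/v1/a/c/o", status=200,
                       trans_time=0.01234)
print(line)
```

In the default C locale this prints `127.0.0.1 - - [01/Jan/1970:00:00:00 +0000] "GET /v1/a/c/o" 200 0.0123`.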

#

add support for custom log handlers Add a hook to get_logger to run custom functions to add custom log handlers or the like. Change-Id: Ib04b12939dcac7e4ad6453dea9795682044c6ae0

2012-10-05 15:56:34 -05:00

# Comma-separated list of functions to call to set up custom log handlers.

# functions get passed: conf, name, log_to_console, log_route, fmt, logger,

# adapted_logger

# log_custom_handlers =
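A sketch of the calling convention described in the comment above: each comma-separated entry is a dotted `module.function` path, and each function is called with `conf, name, log_to_console, log_route, fmt, logger, adapted_logger` in that order. `run_custom_handlers` and `add_stderr_handler` are hypothetical names used for illustration, not Swift's code.

```python
import importlib
import logging


def run_custom_handlers(hooks, conf, name, log_to_console, log_route, fmt,
                        logger, adapted_logger):
    """Import and call each 'module.function' in a comma-separated string.

    Hypothetical loader; argument order follows the sample config comment.
    """
    for hook in (h.strip() for h in hooks.split(",")):
        if not hook:
            continue  # tolerate trailing commas and empty entries
        mod_name, func_name = hook.rsplit(".", 1)
        func = getattr(importlib.import_module(mod_name), func_name)
        func(conf, name, log_to_console, log_route, fmt, logger,
             adapted_logger)


def add_stderr_handler(conf, name, log_to_console, log_route, fmt, logger,
                       adapted_logger):
    """Example hook (hypothetical): send this logger's records to stderr."""
    logger.addHandler(logging.StreamHandler())
```

A config value such as `log_custom_handlers = mypackage.hooks.add_stderr_handler` would then name a hook like the one above (the module path is hypothetical).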

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013-06-06 15:35:19 -07:00

#

Updating proxy-server StatsD logging. Removed many StatsD logging calls in proxy-server and added swift-informant-style catch-all logging in the proxy-logger middleware. Many errors previously rolled into the "proxy-server.<type>.errors" counter will now appear broken down by response code and with timing data at: "proxy-server.<type>.<verb>.<status>.timing". Also, bytes transferred (sum of in + out) will be at: "proxy-server.<type>.<verb>.<status>.xfer". The proxy-logging middleware can get its StatsD config from standard vars in [DEFAULT] or from access_log_statsd_* config vars in its config section. Similarly to Swift Informant, request methods ("verbs") are filtered using the new proxy-logging config var, "log_statsd_valid_http_methods" which defaults to GET, HEAD, POST, PUT, DELETE, and COPY. Requests with methods not in this list use "BAD_METHOD" for <verb> in the metric name. To avoid user error, access_log_statsd_valid_http_methods is also accepted. Previously, proxy-server metrics used "Account", "Container", and "Object" for the <type>, but these are now all lowercase. Updated the admin guide's StatsD docs to reflect the above changes and also include the "proxy-server.<type>.handoff_count" and "proxy-server.<type>.handoff_all_count" metrics. The proxy server now saves off the original req.method and proxy_logging will use this if it can (both for request logging and as the "<verb>" in the statsd timing metric). This fixes bug 1025433. Removed some stale access_log_* related code in proxy/server.py. Also removed the BaseApplication/Application distinction as it's no longer necessary. Fixed up the sample config files a bit (logging lines, mostly). Fixed typo in SAIO development guide. Got proxy_logging.py test coverage to 100%. Fixed proxy_logging.py for PEP8 v1.3.2. Enhanced test.unit.FakeLogger to track more calls to enable testing StatsD metric calls. Change-Id: I45d94cb76450be96d66fcfab56359bdfdc3a2576

2012-08-19 17:44:43 -07:00

# If set, log_udp_host will override log_address

# log_udp_host =

# log_udp_port = 514
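The precedence stated above (a set `log_udp_host` overrides `log_address`) can be sketched as a function that picks the syslog target: a `(host, port)` tuple for UDP, otherwise the local socket path. The result has the shape `logging.handlers.SysLogHandler` accepts for its `address` argument; the function name is hypothetical, and the port is converted to `int`, as the drive-by fix mentioned in a later commit message in this file notes.

```python
def syslog_address(conf):
    """Choose the syslog target from a config dict.

    Hypothetical helper: UDP (host, port) if log_udp_host is set,
    otherwise the local socket path from log_address.
    """
    udp_host = conf.get("log_udp_host", "").strip()
    if udp_host:
        return (udp_host, int(conf.get("log_udp_port", 514)))
    return conf.get("log_address", "/dev/log")
```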

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013-06-06 15:35:19 -07:00

#

Updating proxy-server StatsD logging. Removed many StatsD logging calls in proxy-server and added swift-informant-style catch-all logging in the proxy-logger middleware. Many errors previously rolled into the "proxy-server.<type>.errors" counter will now appear broken down by response code and with timing data at: "proxy-server.<type>.<verb>.<status>.timing". Also, bytes transferred (sum of in + out) will be at: "proxy-server.<type>.<verb>.<status>.xfer". The proxy-logging middleware can get its StatsD config from standard vars in [DEFAULT] or from access_log_statsd_* config vars in its config section. Similarly to Swift Informant, request methods ("verbs") are filtered using the new proxy-logging config var, "log_statsd_valid_http_methods" which defaults to GET, HEAD, POST, PUT, DELETE, and COPY. Requests with methods not in this list use "BAD_METHOD" for <verb> in the metric name. To avoid user error, access_log_statsd_valid_http_methods is also accepted. Previously, proxy-server metrics used "Account", "Container", and "Object" for the <type>, but these are now all lowercase. Updated the admin guide's StatsD docs to reflect the above changes and also include the "proxy-server.<type>.handoff_count" and "proxy-server.<type>.handoff_all_count" metrics. The proxy server now saves off the original req.method and proxy_logging will use this if it can (both for request logging and as the "<verb>" in the statsd timing metric). This fixes bug 1025433. Removed some stale access_log_* related code in proxy/server.py. Also removed the BaseApplication/Application distinction as it's no longer necessary. Fixed up the sample config files a bit (logging lines, mostly). Fixed typo in SAIO development guide. Got proxy_logging.py test coverage to 100%. Fixed proxy_logging.py for PEP8 v1.3.2. Enhanced test.unit.FakeLogger to track more calls to enable testing StatsD metric calls. Change-Id: I45d94cb76450be96d66fcfab56359bdfdc3a2576

2012-08-19 17:44:43 -07:00

# You can enable StatsD logging here:

Removed default value for log_statsd_host Multiple files and documents showed that log_statsd_host had a default value, usually localhost. This was incorrect, instead setting a value for log_statsd_host enables statsd logging. Removed any reference of log_statsd_host having a default value. Also changed descriptions to show setting a value enables logging. Change-Id: I3ca5c0e8b8e4981de3aa6db0c476072b5a59723d Closes-Bug: #1542227

2016-02-10 10:36:59 -06:00

# log_statsd_host =

Adding StatsD logging to Swift. Documentation, including a list of metrics reported and their semantics, is in the Admin Guide in a new section, "Reporting Metrics to StatsD". An optional "metric prefix" may be configured which will be prepended to every metric name sent to StatsD. Here is the rationale for doing a deep integration like this versus only sending metrics to StatsD in middleware. It's the only way to report some internal activities of Swift in a real-time manner. So to have one way of reporting to StatsD and one place/style of configuration, even some things (like, say, timing of PUT requests into the proxy-server) which could be logged via middleware are consistently logged the same way (deep integration via the logger delegate methods). When log_statsd_host is configured, get_logger() injects a swift.common.utils.StatsdClient object into the logger as logger.statsd_client. Then a set of delegate methods on LogAdapter either pass through to the StatsdClient object or become no-ops. This allows StatsD logging to look like: self.logger.increment('some.metric.here') and do the right thing in all cases and with no messy conditional logic. I wanted to use the pystatsd module for the StatsD client, but the version on PyPi is lagging the git repo (and is missing both the prefix functionality and timing_since() method). So I wrote my swift.common.utils.StatsdClient. The interface is the same as pystatsd.Client, but the code was written from scratch. It's pretty simple, and the tests I added cover it. This also frees Swift from an optional dependency on the pystatsd module, making this feature easier to enable. There's test coverage for the new code and all existing tests continue to pass. Refactored out _one_audit_pass() method in swift/account/auditor.py and swift/container/auditor.py. Fixed some misc. PEP8 violations. Misc test cleanups and refactorings (particularly the way "fake logging" is handled). Change-Id: Ie968a9ae8771f59ee7591e2ae11999c44bfe33b2

2012-04-01 16:47:08 -07:00

# log_statsd_port = 8125

Make statsd sample rate behave better. As Dieter pointed out in bug 1090495 (https://bugs.launchpad.net/swift/+bug/1090495), the volume of metrics can vary wildly between StatsD metrics. This patch implements a partial solution by reducing the sample_rate used for known high-volume metrics (operational experience will need to inform this over time) and introducing a new tunable, log_statsd_sample_rate_factor which is multiplied by the sample_rate for every statsd stat. This tunable can be used to reduce StatsD traffic proportionally for all metrics and is intended to replace log_statsd_default_sample_rate, which is left alone for backward-compatibility, should anyone be using it. This patch also includes a drive-by fix for log_udp_port which wasn't being converted to an int (I didn't verify that actually causes trouble in SysLogHandler(), but it's definitely an improvement regardless). Change-Id: Id404636e3629f6431cf1c4e64a143959750a3c23

2013-01-19 15:25:27 -08:00

# log_statsd_default_sample_rate = 1.0

# log_statsd_sample_rate_factor = 1.0

Adding StatsD logging to Swift. Documentation, including a list of metrics reported and their semantics, is in the Admin Guide in a new section, "Reporting Metrics to StatsD". An optional "metric prefix" may be configured which will be prepended to every metric name sent to StatsD. Here is the rationale for doing a deep integration like this versus only sending metrics to StatsD in middleware. It's the only way to report some internal activities of Swift in a real-time manner. So to have one way of reporting to StatsD and one place/style of configuration, even some things (like, say, timing of PUT requests into the proxy-server) which could be logged via middleware are consistently logged the same way (deep integration via the logger delegate methods). When log_statsd_host is configured, get_logger() injects a swift.common.utils.StatsdClient object into the logger as logger.statsd_client. Then a set of delegate methods on LogAdapter either pass through to the StatsdClient object or become no-ops. This allows StatsD logging to look like: self.logger.increment('some.metric.here') and do the right thing in all cases and with no messy conditional logic. I wanted to use the pystatsd module for the StatsD client, but the version on PyPi is lagging the git repo (and is missing both the prefix functionality and timing_since() method). So I wrote my swift.common.utils.StatsdClient. The interface is the same as pystatsd.Client, but the code was written from scratch. It's pretty simple, and the tests I added cover it. This also frees Swift from an optional dependency on the pystatsd module, making this feature easier to enable. There's test coverage for the new code and all existing tests continue to pass. Refactored out _one_audit_pass() method in swift/account/auditor.py and swift/container/auditor.py. Fixed some misc. PEP8 violations. Misc test cleanups and refactorings (particularly the way "fake logging" is handled). Change-Id: Ie968a9ae8771f59ee7591e2ae11999c44bfe33b2

2012-04-01 16:47:08 -07:00

# log_statsd_metric_prefix =
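A minimal sketch tying the StatsD options together: no client exists unless `log_statsd_host` has a value, the optional prefix is prepended to every metric name, and the effective sample rate is the per-metric rate multiplied by `log_statsd_sample_rate_factor`. `TinyStatsd` and `make_statsd` are hypothetical names; this is not Swift's `StatsdClient`, and it only implements counters using the standard `name:value|c|@rate` StatsD line format.

```python
import random
import socket


class TinyStatsd(object):
    """Hypothetical minimal StatsD counter client over UDP."""

    def __init__(self, host, port=8125, prefix="", sample_rate_factor=1.0,
                 default_sample_rate=1.0):
        self.addr = (host, int(port))
        self.prefix = prefix + "." if prefix else ""
        self.factor = float(sample_rate_factor)
        self.default_rate = float(default_sample_rate)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def payload(self, metric, value, kind, rate):
        parts = ["%s%s:%s|%s" % (self.prefix, metric, value, kind)]
        if rate < 1:
            parts.append("@%s" % rate)  # tells StatsD to scale counts up
        return "|".join(parts)

    def increment(self, metric, sample_rate=None):
        base = self.default_rate if sample_rate is None else sample_rate
        rate = base * self.factor
        if rate < 1 and random.random() >= rate:
            return None  # sampled out: nothing is sent
        data = self.payload(metric, 1, "c", rate)
        self.sock.sendto(data.encode("utf-8"), self.addr)
        return data


def make_statsd(conf):
    """StatsD logging is enabled only when log_statsd_host is set."""
    host = conf.get("log_statsd_host", "")
    if not host:
        return None
    return TinyStatsd(host, conf.get("log_statsd_port", 8125),
                      conf.get("log_statsd_metric_prefix", ""),
                      conf.get("log_statsd_sample_rate_factor", 1.0),
                      conf.get("log_statsd_default_sample_rate", 1.0))
```

For example, with `log_statsd_metric_prefix = swift` a sampled counter would be sent as `swift.errors:1|c|@0.5`.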

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013-06-06 15:35:19 -07:00

#

Add config option to turn eventlet debug on/off By default, this will be turned off. This will cause eventlet to not print stack traces to stderr which can be very annoying on production systems. It is still recommended to turn it on for development or debugging purposes. DocImpact Change-Id: I5e5b902d3d9ed85f784549e53f2ee2fc87cbe2e5

2012-12-06 15:09:53 -06:00

# eventlet_debug = false

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013-06-06 15:35:19 -07:00

#

Allow fallocate_reserve to be a percentage Add the ability to set the fallocate_reserve value as a percentage. This happens automatically when adding the '%' at the end of the value. Having the ability to set a % of free space rather than a byte value is useful especially when drive sizes are heterogeneous. The default for fallocate_reserve has been adjusted to 1%, having the fallocate_reserve set seems sensible for all deploys and percentages are far safer to default than byte values (across drives of any size). Tests added for using fallocate_reserve as a percentage. Duplicate tests for fallocate_reserve have been removed. Docs updated to reflect the fallocate_reserve change. Change-Id: I4aea613a708205c917e81d6b2861396655e73238

2016-03-03 11:14:39 +00:00

# You can set fallocate_reserve to the number of bytes or percentage of disk

# space you'd like fallocate to reserve, whether there is space for the given

# file size or not. Percentage will be used if the value ends with a '%'.

# fallocate_reserve = 1%
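A sketch of the reserve check described above, using `statvfs`-style fields: parse the value (a trailing `%` selects percentage mode), compute the free space that would remain after allocating `size` bytes, and allow the write only if that remainder is still above the reserve. Both function names are hypothetical; this illustrates the idea rather than reproducing Swift's `fallocate()` wrapper.

```python
def parse_fallocate_reserve(value):
    """Return (amount, is_percent) from e.g. '1%' or '10485760'."""
    value = str(value).strip()
    if value.endswith("%"):
        return float(value[:-1]), True
    return float(value), False


def has_room(size, f_frsize, f_bavail, f_blocks, reserve="1%"):
    """True if allocating `size` bytes still leaves more than the reserve.

    Hypothetical helper; in percentage mode the remaining free space is
    expressed as a percentage of the filesystem's total size.
    """
    amount, is_percent = parse_fallocate_reserve(reserve)
    free_after = f_frsize * f_bavail - size
    if is_percent:
        free_after = free_after / float(f_frsize * f_blocks) * 100
    return free_after > amount
```

On a real device the fields could come from `os.statvfs(path)` (`f_frsize`, `f_bavail`, `f_blocks`).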

Object replication ssync (an rsync alternative) For this commit, ssync is just a direct replacement for how we use rsync. Assuming we switch over to ssync completely someday and drop rsync, we will then be able to improve the algorithms even further (removing local objects as we successfully transfer each one rather than waiting for whole partitions, using an index.db with hash-trees, etc., etc.) For easier review, this commit can be thought of in distinct parts: 1) New global_conf_callback functionality for allowing services to perform setup code before workers, etc. are launched. (This is then used by ssync in the object server to create a cross-worker semaphore to restrict concurrent incoming replication.) 2) A bit of shifting of items up from object server and replicator to diskfile or DEFAULT conf sections for better sharing of the same settings. conn_timeout, node_timeout, client_timeout, network_chunk_size, disk_chunk_size. 3) Modifications to the object server and replicator to optionally use ssync in place of rsync. This is done in a generic enough way that switching to FutureSync should be easy someday. 4) The biggest part, and (at least for now) completely optional part, are the new ssync_sender and ssync_receiver files. Nice and isolated for easier testing and visibility into test coverage, etc. All the usual logging, statsd, recon, etc. instrumentation is still there when using ssync, just as it is when using rsync. Beyond the essential error and exceptional condition logging, I have not added any additional instrumentation at this time. Unless there is something someone finds super pressing to have added to the logging, I think such additions would be better as separate change reviews. FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION CLUSTERS. Some of us will be using it in a limited fashion to look for any subtle issues, tuning, etc. but generally ssync is an experimental feature.
In its current implementation it is probably going to be a bit slower than rsync, but if all goes according to plan it will end up much faster. There are no comparisons yet between ssync and rsync other than some raw virtual machine testing I've done to show it should compete well enough once we can put it in use in the real world. If you Tweet, Google+, or whatever, be sure to indicate it's experimental. It'd be best to keep it out of deployment guides, howtos, etc. until we all figure out if we like it, find it to be stable, etc. Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6

2013-08-28 16:10:43 +00:00

#

# Time to wait while attempting to connect to another backend node.

# conn_timeout = 0.5

# Time to wait while sending each chunk of data to another backend node.

# node_timeout = 3
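The two backend timeouts above divide a connection's life into a connect phase and a per-operation phase. The sketch below uses stdlib socket timeouts as a stand-in. Swift itself wraps these calls in eventlet-based timeouts, and `connect_to_backend` is a hypothetical name.

```python
import socket


def connect_to_backend(host, port, conn_timeout=0.5, node_timeout=3.0):
    """Open a TCP connection to another backend node (hypothetical helper).

    The connect itself may take at most conn_timeout seconds; afterwards each
    individual send/recv on the returned socket may take at most node_timeout."""
    sock = socket.create_connection((host, port), timeout=conn_timeout)
    sock.settimeout(node_timeout)
    return sock
```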

do container listing updates in another (green)thread The actual server-side changes are simple. The tests are a different matter. Many changes were needed to the object server tests to handle the now-async calls to the container server. In an effort to test this properly, some drive-by changes were made to improve tests. I tested this patch by doing zero-byte object writes to one container as fast as possible. Then I did it again while also saturating 2 of the container replica's disks. The results are linked below. https://gist.github.com/notmyname/2bb85acfd8fbc7fc312a DocImpact Change-Id: I737bd0af3f124a4ce3e0862a155e97c1f0ac3e52

2015年05月23日 15:40:03 -07:00

# Time to wait while sending a container update on object update.

# container_update_timeout = 1.0
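The change described above moves container listing updates into a separate (green)thread, and container_update_timeout bounds how long the object server waits for them. The sketch below uses a thread pool in place of greenthreads. `send_container_update` is hypothetical, and the sketch does not model what Swift does with an update that is still running after the wait ends.

```python
import concurrent.futures

# A small shared pool standing in for Swift's greenthreads (an assumption of this sketch).
_update_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def send_container_update(update, container_update_timeout=1.0):
    """Run `update` in the background and wait at most container_update_timeout seconds.

    Returns 'done' if the update finished in time, otherwise 'pending'; in the
    pending case the update keeps running and the caller simply stops waiting."""
    future = _update_pool.submit(update)
    try:
        future.result(timeout=container_update_timeout)
        return 'done'
    except concurrent.futures.TimeoutError:
        return 'pending'
```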

2013年08月28日 16:10:43 +00:00

# Time to wait while receiving each chunk of data from a client or another

# backend node.

Use socket_timeout kwarg instead of useless eventlet.wsgi.WRITE_TIMEOUT No version of eventlet that I'm aware of hasany sort of support for eventlet.wsgi.WRITE_TIMEOUT; I don't know why we've been setting that. On the other hand, the socket_timeout argument for eventlet.wsgi.Server has been supported for a while -- since 0.14 in 2013. Drive-by: Fix up handling of sub-second client_timeouts. Change-Id: I1dca3c3a51a83c9d5212ee5a0ad2ba1343c68cf9 Related-Change: I1d4d028ac5e864084a9b7537b140229cb235c7a3 Related-Change: I433c97df99193ec31c863038b9b6fd20bb3705b8

2020年11月11日 14:18:13 -08:00

# client_timeout = 60.0

2013年08月28日 16:10:43 +00:00

#

# network_chunk_size = 65536

# disk_chunk_size = 65536

Change schedule priority of daemon/server in config The goal is to modify schedule priority and I/O scheduling class and priority of daemon/server via configuration. Setting is optional, default keeps current behaviour. Use case: Prioritize object-server to object-auditor, because all user's requests needed to be served in peak hours and audit could wait. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> DocImpact Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396

2015年10月22日 10:19:49 +02:00

#

Move documented reclaim_age option to correct location The reclaim_age is a DiskFile option, it doesn't make sense for two different object services or nodes to use different values. I also driveby cleanup the reclaim_age plumbing from get_hashes to cleanup_ondisk_files since it's a method on the Manager and has access to the configured reclaim_age. This fixes a bug where finalize_put wouldn't use the [DEFAULT]/object-server configured reclaim_age - which is normally benign but leads to weird behavior on DELETE requests with really small reclaim_age. There's a couple of places in the replicator and reconstructor that reach into their manager to borrow the reclaim_age when emptying out the aborted PUTs that failed to cleanup their files in tmp - but that timeout doesn't really need to be coupled with reclaim_age and that method could have just as reasonably been implemented on the Manager. UpgradeImpact: Previously the reclaim_age was documented to be configurable in various object-* services config sections, but that did not work correctly unless you also configured the option for the object-server because of REPLICATE request rehash cleanup. All object services must use the same reclaim_age. If you require a non-default reclaim age it should be set in the [DEFAULT] section. If there are different non-default values, the greater should be used for all object services and configured only in the [DEFAULT] section. If you specify a reclaim_age value in any object related config you should move it to *only* the [DEFAULT] section before you upgrade. If you configure a reclaim_age less that your consistency window you are likely to be eaten by a Grue. Closes-Bug: #1626296 Change-Id: I2b9189941ac29f6e3be69f76ff1c416315270916 Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>

2016年07月25日 20:10:44 +05:30

# Reclamation of tombstone files is performed primarily by the replicator and

# the reconstructor, but the object-server and object-auditor also reference

# this value; it should be the same for all object services in the cluster

# and not greater than the container services' reclaim_age.

# reclaim_age = 604800
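A tombstone becomes reclaimable once it is older than reclaim_age seconds. The sample default of 604800 is exactly seven days. The sketch below assumes tombstones are named `<timestamp>.ts`, and `is_reclaimable` is a hypothetical helper.

```python
import time

RECLAIM_AGE = 604800  # seconds; 7 days * 24 h * 60 min * 60 s


def is_reclaimable(tombstone_name, reclaim_age=RECLAIM_AGE, now=None):
    """Return True if a '<timestamp>.ts' tombstone is older than reclaim_age seconds."""
    if now is None:
        now = time.time()
    if not tombstone_name.endswith('.ts'):
        raise ValueError('not a tombstone: %r' % tombstone_name)
    timestamp = float(tombstone_name[:-len('.ts')])
    return now - timestamp > reclaim_age
```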

#

2015年10月22日 10:19:49 +02:00

# You can set the scheduling priority of processes. Niceness values range from -20

# (most favorable to the process) to 19 (least favorable to the process).

# nice_priority =

#

# You can set the I/O scheduling class and priority of processes. I/O niceness

# class values are IOPRIO_CLASS_RT (realtime), IOPRIO_CLASS_BE (best-effort) and

# IOPRIO_CLASS_IDLE (idle). I/O niceness priority is a number which goes from

# 0 to 7. The higher the value, the lower the I/O priority of the process.

# ionice_priority works only in combination with ionice_class.

# ionice_class =

# ionice_priority =
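The constraints above can be checked before a process applies them: niceness runs from -20 to 19, I/O priority runs from 0 to 7, and a priority is set only together with a class. The sketch below shows such a check, and `validate_priorities` is hypothetical. Applying the values is out of scope. Python's os.setpriority() covers niceness, but I/O priority needs the Linux ioprio_set syscall, which the standard library does not wrap.

```python
IONICE_CLASSES = {'IOPRIO_CLASS_RT', 'IOPRIO_CLASS_BE', 'IOPRIO_CLASS_IDLE'}


def validate_priorities(conf):
    """Hypothetical checker for nice_priority, ionice_class and ionice_priority.

    `conf` is a dict of raw string values as read from the config file;
    empty or missing values mean "not set". Returns a list of error strings."""
    errors = []
    nice = conf.get('nice_priority', '').strip()
    if nice and not -20 <= int(nice) <= 19:
        errors.append('nice_priority must be between -20 and 19')
    io_class = conf.get('ionice_class', '').strip()
    io_prio = conf.get('ionice_priority', '').strip()
    if io_class and io_class not in IONICE_CLASSES:
        errors.append('unknown ionice_class: %s' % io_class)
    if io_prio:
        if not io_class:
            errors.append('ionice_priority requires ionice_class')
        elif not 0 <= int(io_prio) <= 7:
            errors.append('ionice_priority must be between 0 and 7')
    return errors
```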

Initial commit of middleware refactor

2010年08月20日 00:42:38 +00:00

[pipeline:main]

Allow optional, temporary healthcheck failure. A deployer may want to remove a Swift node from a load balancer for maintenance or upgrade. This patch provides an optional mechanism for this. The healthcheck filter config can specify "disable_path" which is a filesystem path. If a file is present at that location, the healthcheck middleware returns a 503 with a body of "DISABLED BY FILE". So a deployer can configure "disable_path" and then touch that filesystem path, wait for the proxy to be removed from the load balancer pool, perform maintenance/upgrade, and then remove the "disable_path" file. Also cleaned up the conf file man pages a bit. Change-Id: I1759c78c74910a54c720f298d4d8e6fa57a4dab4

2012年12月03日 16:05:44 -08:00

pipeline = healthcheck recon object-server

Initial commit of middleware refactor

2010年08月20日 00:42:38 +00:00

[app:object-server]

use = egg:swift#object

More doc updates for logger stuff

2011年01月23日 13:18:28 -08:00

# You can override the default log routing for this app here:

# set log_name = object-server

# set log_facility = LOG_LOCAL0

# set log_level = INFO

Unified format of boolean params in conf files In swift conf files, boolean options use different format: some use true/false, and some use True/False. This patch is aim to using lowcase true/false to unify boolean params formats in swift conf files. Fix Bug #1203421 Change-Id: I3e1bfc6e43231f51e0710aa54869f3774ee896b1

2013年07月21日 12:18:24 +08:00

# set log_requests = true

Patch for Swift Solaris (Illumos) compability. * Add new configuration option log_address. Change-Id: I636bd4116687629c997b70a0d804b7ed4bc46032

2012年05月17日 15:46:38 -07:00

# set log_address = /dev/log

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

# max_upload_time = 86400

Document slow option in etc/object-server.conf Change-Id: Ic9940b0b830a468887878f7b0d7ca42c2cbbebd5

2016年02月02日 09:38:55 +01:00

#

# slow is the minimum number of seconds an object PUT/DELETE request should

# take. If a request is faster, the object server sleeps for this amount of time minus

Document use-case for slow option Change-Id: Iec4087a896a2277179e3720d802cca101fa7ad54

2016年02月02日 11:44:39 -08:00

# the time already spent on the request. This is only useful for simulating slow

# devices on storage nodes during testing and development.

bug 661267 adding config eastereggs, fixing defaults Change-Id: I41356ee250c9088a2387b0d493586dd990a04ac3

2012年04月28日 16:31:00 +10:00

# slow = 0
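The slow option pads each PUT/DELETE up to a minimum duration. The sketch below shows that arithmetic. `pad_request` is hypothetical, and the clock and sleep functions are injectable only to make it testable.

```python
import time


def pad_request(start_time, slow, now=None, sleep=time.sleep):
    """Sleep so a PUT/DELETE takes at least `slow` seconds in total.

    Returns the number of seconds slept (0.0 when slow is 0 or the request
    already took longer)."""
    if now is None:
        now = time.time()
    remaining = slow - (now - start_time)
    if slow > 0 and remaining > 0:
        sleep(remaining)
        return remaining
    return 0.0
```

With slow = 2, a request that finished after 0.5 seconds sleeps for another 1.5 seconds.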

2013年06月06日 15:35:19 -07:00

#

Make object server's caching more configurable. The object server had a constant KEEP_CACHE_SIZE = 5*1024*1024; unauthenticated GET requests for files smaller than KEEP_CACHE_SIZE would not evict the file from the kernel's buffer cache after it was read from disk. Now that hardcoded constant is a configuration parameter ("keep_cache_size"), and now there is also another parameter called "keep_cache_private". If set, then both authenticated and unauthenticated GET requests for small files will not evict the data from the buffer cache. The default values are 5 MiB and False, respectively, so the default behavior is the same. Bonus: the "mb_per_sync" parameter is now documented in the deployment guide. Change-Id: I9a11dbe861f4c23535c6aa82a9111a6fe2db2a59

2012年06月07日 16:39:56 -07:00

# Objects smaller than this many bytes are not evicted from the buffer cache once read

fix example typo 5 * 1024 * 1024 = 5242880 Change-Id: I0eeb6e2d9fbd79103cd8c658627344f73fed9498

2014年11月20日 11:38:49 +09:00

# keep_cache_size = 5242880

2013年06月06日 15:35:19 -07:00

#

2012年06月07日 16:39:56 -07:00

# If true, objects for authenticated GET requests may also be kept in the buffer

# cache if they are small enough.

2013年07月21日 12:18:24 +08:00

# keep_cache_private = false
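The two buffer-cache settings combine as described above. Small objects stay cached for unauthenticated GETs, and for authenticated GETs too when keep_cache_private is true. The sketch below shows that decision. `keep_in_cache` is hypothetical, and deciding whether a request counts as "authenticated" is left to the caller.

```python
KEEP_CACHE_SIZE = 5 * 1024 * 1024  # 5 MiB == 5242880 bytes, the sample default


def keep_in_cache(obj_size, is_authenticated,
                  keep_cache_size=KEEP_CACHE_SIZE, keep_cache_private=False):
    """Return True if a GET should leave the object's pages in the buffer cache."""
    if obj_size >= keep_cache_size:
        return False
    # Unauthenticated (public) reads are always eligible; authenticated
    # reads only when keep_cache_private is enabled.
    return keep_cache_private or not is_authenticated
```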

2013年06月06日 15:35:19 -07:00

#

sample conf update

2010年10月13日 21:29:58 +00:00

# On PUTs, sync data to disk every n MB.

# mb_per_sync = 512
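mb_per_sync limits how much unsynced PUT data can accumulate. The sketch below treats MB as MiB (1024 * 1024 bytes), which is an assumption. It defaults to os.fsync because os.fdatasync is not available on every platform, and `write_with_periodic_sync` is hypothetical.

```python
import os


def write_with_periodic_sync(fd, chunks, mb_per_sync=512, sync=os.fsync):
    """Write byte chunks to fd, calling sync(fd) after every mb_per_sync MiB.

    Returns how many times sync was called."""
    bytes_per_sync = mb_per_sync * 1024 * 1024
    unsynced = 0
    syncs = 0
    for chunk in chunks:
        # os.write may write less than asked, so loop until the chunk is out.
        view = memoryview(chunk)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        unsynced += len(chunk)
        if unsynced >= bytes_per_sync:
            sync(fd)
            syncs += 1
            unsynced = 0
    return syncs
```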

2013年06月06日 15:35:19 -07:00

#

objects can now have arbitrary headers set in metadata that will be served back when they are fetched

2011年03月22日 20:05:44 -05:00

# Comma-separated list of headers that can be set in metadata on an object.

# This list is in addition to X-Object-Meta-* headers and cannot include

# Content-Type, etag, Content-Length, or deleted.

Add s3api headers to allowed_headers by default Previously, these headers had to be added by operators to their object-server.conf when enabling swift3 middleware. Since s3api is now imported into swift we should go ahead and add these headers by default too. Change-Id: Ib82e175096716e42aecdab48f01f079e09da6a1d Signed-off-by: Thiago da Silva <thiago@redhat.com>

2018年05月29日 16:00:03 -04:00

# allowed_headers = Content-Disposition, Content-Encoding, X-Delete-At, X-Object-Manifest, X-Static-Large-Object, Cache-Control, Content-Language, Expires, X-Robots-Tag
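The allowed_headers value above is a comma-separated list of names, and X-Object-Meta-* headers are always permitted in addition to it. The sketch below assumes case-insensitive matching. `parse_allowed_headers` and `is_persisted` are hypothetical, and silently dropping a forbidden name (rather than rejecting the config) is also an assumption.

```python
# Names the comment above says may not appear in allowed_headers (compared lowercase).
FORBIDDEN = {'content-type', 'etag', 'content-length', 'deleted'}


def parse_allowed_headers(value):
    """Hypothetical parser: split on commas, trim, lowercase, drop forbidden names."""
    headers = set()
    for item in value.split(','):
        name = item.strip().lower()
        if name and name not in FORBIDDEN:
            headers.add(name)
    return headers


def is_persisted(header, allowed):
    """X-Object-Meta-* headers are always kept; other headers only if listed."""
    name = header.lower()
    return name.startswith('x-object-meta-') or name in allowed
```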

Make eventlet.tpool's thread count configurable in object server If you're running servers_per_port > 0 and threads_per_disk = 0 (as it should be with servers_per_port on), each object-server process will have 20 IO threads waiting around to service eventlet.tpool calls. This is far too many; with servers_per_port, there's no real benefit to having so many IO threads. This commit makes it so that, when servers_per_port > 0, each object server defaults to having one main thread and one IO thread. Also, eventlet's tpool size is now configurable via the object-server config file. If a tpool size is set, that's what we'll use regardless of servers_per_port. This allows operators with an excess of threads to remove some regardless of servers_per_port. Change-Id: I8f8914b7e70f2510393eb7c5e6be9708631ac027 Closes-Bug: 1554233

2016年03月07日 18:18:35 -08:00

# The number of threads in eventlet's thread pool. Most IO will occur

# in the object server's main thread, but certain "heavy" IO

# operations will occur in separate IO threads, managed by eventlet.

#

[Trivialfix]Fix typos in swift Fix typos that found in swift. Change-Id: I52fad1a4882cec4456f22174b46d54e42ec66d97

2017年08月04日 00:23:36 -07:00

# The default value is auto, whose actual value is dependent on the

2016年03月07日 18:18:35 -08:00

# servers_per_port value:

#

# - When servers_per_port is zero, the default value of

# eventlet_tpool_num_threads is empty, which uses eventlet's default

# (currently 20 threads).

#

# - When servers_per_port is nonzero, the default value of

# eventlet_tpool_num_threads is 1.

#

# You may override this with any integer value.

#

# Note that this value is threads per object-server process, so to

# compute the total number of IO threads on a node, you must multiply

# this by the number of object-server processes on the node.

#

# eventlet_tpool_num_threads = auto
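The "auto" rule for eventlet_tpool_num_threads above can be written as a small function. In the sketch below, `resolve_tpool_threads` and `io_threads_on_node` are hypothetical, and None stands for "leave eventlet's default alone".

```python
def resolve_tpool_threads(value, servers_per_port):
    """Resolve eventlet_tpool_num_threads as described above.

    Returns None to mean "keep eventlet's own default" (currently 20 threads)."""
    value = str(value).strip().lower()
    if value in ('', 'auto'):
        return 1 if servers_per_port else None
    return int(value)


def io_threads_on_node(threads_per_process, object_server_processes):
    """The setting is per object-server process, so multiply for a node-wide total."""
    return threads_per_process * object_server_processes
```

A caller could pass a non-None result to eventlet's tpool.set_num_threads() before the pool is first used. For example, six object-server processes with 20 threads each give 120 IO threads on the node.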

Allow replication servers to handle all request methods Previously, the replication_server setting could take one of three states: * If unspecified, the server would handle all available methods. * If "true", "yes", "on", etc. it would only handle replication methods (REPLICATE, SSYNC). * If any other value (including blank), it would only handle non-replication methods. However, because SSYNC tunnels PUTs, POSTs, and DELETEs through the same object-server app that's responding to SSYNC, setting `replication_server = true` would break the protocol. This has been the case ever since ssync was introduced. Now, get rid of that second state -- operators can still set `replication_server = false` as a principle-of-least-privilege guard to ensure proxy-servers can't make replication requests, but replication servers will be able to serve all traffic. This will allow replication servers to be used as general internal-to-the-cluster endpoints, leaving non-replication servers to handle client-driven traffic. Closes-Bug: #1446873 Change-Id: Ica2b41a52d11cb10c94fa8ad780a201318c4fc87

2020年07月07日 21:28:36 -07:00

# You can disable REPLICATE and SSYNC handling (default is to allow it). When

# deploying a cluster with a separate replication network, you'll want multiple

# object-server processes running: one for client-driven traffic and another

# for replication traffic. The server handling client-driven traffic may set

# this to false. If there is only one object-server process, leave this as

# true.

# replication_server = true
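The commit message above describes three ways to read replication_server. If it is unset, every method is allowed. A true value now also allows every method. Any other value, including blank, refuses only REPLICATE and SSYNC. The sketch below follows that reading. `method_allowed` is hypothetical, and the set of "true" strings is an assumption.

```python
# Strings treated as "true"; an assumption modeled on common Swift config parsing.
TRUE_VALUES = {'true', '1', 'yes', 'on', 't', 'y'}
REPLICATION_METHODS = {'REPLICATE', 'SSYNC'}


def method_allowed(method, replication_server=None):
    """Return True if this object-server should handle `method`.

    replication_server is the raw config string, or None if the option is unset."""
    if replication_server is None:
        return True
    if replication_server.strip().lower() in TRUE_VALUES:
        return True
    return method.upper() not in REPLICATION_METHODS
```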

2013年08月28日 16:10:43 +00:00

#

2016年03月10日 06:42:57 -08:00

# Set to restrict the number of concurrent incoming SSYNC requests

2013年08月28日 16:10:43 +00:00

# Set to 0 for unlimited

2016年03月10日 06:42:57 -08:00

# Note that SSYNC requests are only used by the object reconstructor or the

# object replicator when configured to use ssync.

2013年08月28日 16:10:43 +00:00

# replication_concurrency = 4

#

Replace replication_one_per_device by custom count This commit replaces boolean replication_one_per_device by an integer replication_concurrency_per_device. The new configuration parameter is passed to utils.lock_path() which now accept as an argument a limit for the number of locks that can be acquired for a specific path. Instead of trying to lock path/.lock, utils.lock_path() now tries to lock files path/.lock-X, where X is in the range (0, N), N being the limit for the number of locks allowed for the path. The default value of limit is set to 1. Change-Id: I3c3193344c7a57a8a4fc7932d1b10e702efd3572

2016年10月26日 10:53:46 +02:00

# Set to restrict the number of concurrent incoming SSYNC requests per

# device; set to 0 for unlimited requests per device. This can help control

# I/O to each device. This does not override replication_concurrency described

# above, so you may need to adjust both parameters depending on your hardware

# or network capacity.

# replication_concurrency_per_device = 1
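The commit message above says the per-device limit works by trying lock files named .lock-0 through .lock-(N-1) under a path. The sketch below implements that idea with non-blocking flock(). `try_acquire_device_slot` is hypothetical, and it assumes a POSIX system where the fcntl module is available.

```python
import fcntl
import os


def try_acquire_device_slot(device_path, limit):
    """Try to take one of `limit` lock files (.lock-0 ... .lock-<limit-1>) under device_path.

    Returns the open fd that holds the lock (close it to release), or None if all
    slots are busy. A limit of 0 means unlimited, so callers should skip locking."""
    if limit < 1:
        raise ValueError('limit must be >= 1; 0 means no per-device lock at all')
    for i in range(limit):
        path = os.path.join(device_path, '.lock-%d' % i)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            # flock locks belong to the open file description, so a second
            # os.open of the same file conflicts even within one process.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            os.close(fd)
    return None
```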

Per device replication_lock New replication_one_per_device (True by default) that restricts incoming REPLICATION requests to one per device, replication_currency allowing. Also has replication_lock_timeout (15 by default) to control how long a request will wait to obtain a replication device lock before giving up. This should be very useful in that you can be assured any concurrent REPLICATION requests are each writing to distinct devices. If you have 100 devices on a server, you can set replication_concurrency to 100 and be confident that, even if 100 replication requests were executing concurrently, they'd each be writing to separate devices. Before, all 100 could end up writing to the same device, bringing it to a horrible crawl. NOTE: This is only for ssync replication. The current default rsync replication still has the potentially horrible behavior. Change-Id: I36e99a3d7e100699c76db6d3a4846514537ff685

2013年11月09日 03:18:11 +00:00

#

# Number of seconds to wait for an existing replication device lock before

# giving up.

# replication_lock_timeout = 15
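The per-device limit and the lock timeout work together. An incoming SSYNC request must obtain one of `replication_concurrency_per_device` lock slots on the target device within `replication_lock_timeout` seconds. The commit message describes this as locking one of the files `.lock-0` ... `.lock-(N-1)`. Below is a minimal Python sketch of that slot scheme. It assumes POSIX `fcntl.flock`, and `device_lock` is a hypothetical helper, not Swift's `utils.lock_path()`.

```python
import errno
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def device_lock(device_path, limit=1, timeout=15.0, poll_interval=0.01):
    """Hypothetical sketch: hold one of `limit` lock slots on a device.

    limit   ~ replication_concurrency_per_device (0 means unlimited)
    timeout ~ replication_lock_timeout (seconds)
    """
    if limit <= 0:
        yield None  # unlimited: no lock is taken
        return
    os.makedirs(device_path, exist_ok=True)
    deadline = time.monotonic() + timeout
    while True:
        for slot in range(limit):
            lock_file = os.path.join(device_path, '.lock-%d' % slot)
            fd = os.open(lock_file, os.O_WRONLY | os.O_CREAT, 0o644)
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as err:
                os.close(fd)
                if err.errno not in (errno.EAGAIN, errno.EACCES):
                    raise
                continue  # slot held by another request; try the next one
            try:
                yield slot
            finally:
                fcntl.flock(fd, fcntl.LOCK_UN)
                os.close(fd)
            return
        if time.monotonic() >= deadline:
            raise TimeoutError('no free lock slot on %s' % device_path)
        time.sleep(poll_interval)
```

Each slot is a separate file, so at most `limit` requests can write to one device at a time. A request that finds every slot busy until the timeout gives up instead of piling more I/O onto that device.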

#

2016年03月10日 06:42:57 -08:00

# These next two settings control when the SSYNC subrequest handler will

# abort an incoming SSYNC attempt. An abort will occur if there are at

Object replication ssync (an rsync alternative) For this commit, ssync is just a direct replacement for how we use rsync. Assuming we switch over to ssync completely someday and drop rsync, we will then be able to improve the algorithms even further (removing local objects as we successfully transfer each one rather than waiting for whole partitions, using an index.db with hash-trees, etc., etc.) For easier review, this commit can be thought of in distinct parts: 1) New global_conf_callback functionality for allowing services to perform setup code before workers, etc. are launched. (This is then used by ssync in the object server to create a cross-worker semaphore to restrict concurrent incoming replication.) 2) A bit of shifting of items up from object server and replicator to diskfile or DEFAULT conf sections for better sharing of the same settings. conn_timeout, node_timeout, client_timeout, network_chunk_size, disk_chunk_size. 3) Modifications to the object server and replicator to optionally use ssync in place of rsync. This is done in a generic enough way that switching to FutureSync should be easy someday. 4) The biggest part, and (at least for now) completely optional part, are the new ssync_sender and ssync_receiver files. Nice and isolated for easier testing and visibility into test coverage, etc. All the usual logging, statsd, recon, etc. instrumentation is still there when using ssync, just as it is when using rsync. Beyond the essential error and exceptional condition logging, I have not added any additional instrumentation at this time. Unless there is something someone finds super pressing to have added to the logging, I think such additions would be better as separate change reviews. FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION CLUSTERS. Some of us will be in a limited fashion to look for any subtle issues, tuning, etc. but generally ssync is an experimental feature. 
In its current implementation it is probably going to be a bit slower than rsync, but if all goes according to plan it will end up much faster. There are no comparisions yet between ssync and rsync other than some raw virtual machine testing I've done to show it should compete well enough once we can put it in use in the real world. If you Tweet, Google+, or whatever, be sure to indicate it's experimental. It'd be best to keep it out of deployment guides, howtos, etc. until we all figure out if we like it, find it to be stable, etc. Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6

2013年08月28日 16:10:43 +00:00

# least replication_failure_threshold failures and the value of failures /

# successes exceeds replication_failure_ratio. The defaults of 100 and 1.0

# mean that at least 100 failures have to occur, and there have to be more

# failures than successes, for an abort to occur.

# replication_failure_threshold = 100

# replication_failure_ratio = 1.0
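Here is a short Python sketch of the abort rule described in the comment above. `should_abort_ssync` is a hypothetical name used only for illustration. Treating zero successes as an infinite ratio is an assumption about the edge case.

```python
def should_abort_ssync(failures, successes, threshold=100, ratio=1.0):
    """Sketch of the SSYNC abort rule (not Swift's actual code).

    threshold ~ replication_failure_threshold
    ratio     ~ replication_failure_ratio
    """
    if failures < threshold:
        return False  # not enough failures yet, whatever the ratio
    if successes == 0:
        return failures > 0  # failures with zero successes exceed any ratio
    return failures / successes > ratio
```

With the defaults, 100 failures against 100 successes does not abort, because the ratio is exactly 1.0 and not greater. 100 failures against 99 successes does abort.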

Zero-copy object-server GET responses with splice() This commit lets the object server use splice() and tee() to move data from disk to the network without ever copying it into user space. Requires Linux. Sorry, FreeBSD folks. You still have the old mechanism, as does anyone who doesn't want to use splice. This requires a relatively recent kernel (2.6.38+) to work, which includes the two most recent Ubuntu LTS releases (Precise and Trusty) as well as RHEL 7. However, it excludes Lucid and RHEL 6. On those systems, setting "splice = on" will result in warnings in the logs but no actual use of splice. Note that this only applies to GET responses without Range headers. It can easily be extended to single-range GET requests, but this commit leaves that for future work. Same goes for PUT requests, or at least non-chunked ones. On some real hardware I had laying around (not a VM), this produced a 37% reduction in CPU usage for GETs made directly to the object server. Measurements were done by looking at /proc/<pid>/stat, specifically the utime and stime fields (user and kernel CPU jiffies, respectively). Note: There is a Python module called "splicetee" available on PyPi, but it's licensed under the GPL, so it cannot easily be added to OpenStack's requirements. That's why this patch uses ctypes instead. Also fixed a long-standing annoyance in FakeLogger: >>> fake_logger.warn('stuff') >>> fake_logger.get_lines_for_level('warn') [] >>> This, of course, is because the correct log level is 'warning'. Now you get a KeyError if you call get_lines_for_level with a bogus log level. Change-Id: Ic6d6b833a5b04ca2019be94b1b90d941929d21c8

2014年06月10日 14:15:27 -07:00

#

# Use splice() for zero-copy object GETs. This requires Linux kernel

# version 3.0 or greater. If you set "splice = yes" but the kernel

# does not support it, error messages will appear in the object server

# logs at startup, but your object servers should continue to function.

#

# splice = no

Change schedule priority of daemon/server in config The goal is to modify schedule priority and I/O scheduling class and priority of daemon/server via configuration. Setting is optional, default keeps current behaviour. Use case: Prioritize object-server to object-auditor, because all user's requests needed to be served in peak hours and audit could wait. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> DocImpact Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396

2015年10月22日 10:19:49 +02:00

#

# You can set the scheduling priority of processes. Niceness values range from -20

# (most favorable to the process) to 19 (least favorable to the process).

# nice_priority =

#

# You can set the I/O scheduling class and priority of processes. I/O niceness

# class values are IOPRIO_CLASS_RT (realtime), IOPRIO_CLASS_BE (best-effort) and

# IOPRIO_CLASS_IDLE (idle). I/O niceness priority is a number which goes from

# 0 to 7. The higher the value, the lower the I/O priority of the process.

# ionice_priority only takes effect when ionice_class is also set.

# ionice_class =

# ionice_priority =
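The Python sketch below shows how these four options might be validated and combined. It relies on the Linux ioprio ABI, where the value passed to `ioprio_set()` is `(class << 13) | priority` and RT, BE and IDLE are classes 1, 2 and 3. `parse_priority_conf` is hypothetical. Defaulting the priority to 0 when only `ionice_class` is given is also an assumption. The sketch validates the values and does not apply them. Applying them would need `os.setpriority` for niceness and the `ioprio_set` syscall for I/O.

```python
IOPRIO_CLASS_SHIFT = 13  # from the Linux ioprio ABI
IOPRIO_CLASSES = {
    'IOPRIO_CLASS_RT': 1,
    'IOPRIO_CLASS_BE': 2,
    'IOPRIO_CLASS_IDLE': 3,
}


def parse_priority_conf(conf):
    """Hypothetical sketch: validate nice/ionice options from a config dict.

    Returns {'nice': int} and/or {'ioprio': int}; 'ioprio' is the packed
    class/priority value that Linux's ioprio_set() syscall takes.
    """
    result = {}
    nice = conf.get('nice_priority', '').strip()
    if nice:
        value = int(nice)
        if not -20 <= value <= 19:
            raise ValueError('nice_priority must be in the range -20..19')
        result['nice'] = value
    io_class = conf.get('ionice_class', '').strip()
    io_priority = conf.get('ionice_priority', '').strip()
    if io_class:
        if io_class not in IOPRIO_CLASSES:
            raise ValueError('unknown ionice_class: %r' % io_class)
        data = int(io_priority) if io_priority else 0  # assumed default
        if not 0 <= data <= 7:
            raise ValueError('ionice_priority must be in the range 0..7')
        result['ioprio'] = (IOPRIO_CLASSES[io_class] << IOPRIO_CLASS_SHIFT) | data
    # ionice_priority without ionice_class is ignored.
    return result
```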

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

Allow optional, temporary healthcheck failure. A deployer may want to remove a Swift node from a load balancer for maintenance or upgrade. This patch provides an optional mechanism for this. The healthcheck filter config can specify "disable_path" which is a filesystem path. If a file is present at that location, the healthcheck middleware returns a 503 with a body of "DISABLED BY FILE". So a deployer can configure "disable_path" and then touch that filesystem path, wait for the proxy to be removed from the load balancer pool, perform maintenance/upgrade, and then remove the "disable_path" file. Also cleaned up the conf file man pages a bit. Change-Id: I1759c78c74910a54c720f298d4d8e6fa57a4dab4

2012年12月03日 16:05:44 -08:00

[filter:healthcheck]

use = egg:swift#healthcheck

# An optional filesystem path. If a file exists at this path, the healthcheck

# URL returns "503 Service Unavailable" with a body of "DISABLED BY FILE".

# disable_path =
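The `disable_path` behaviour can be sketched as a tiny Python WSGI middleware. `HealthcheckSketch` is hypothetical and is not Swift's `healthcheck` filter. It assumes only what the comment and commit message state: the `/healthcheck` URL returns a 503 with "DISABLED BY FILE" while the file exists.

```python
import os


class HealthcheckSketch:
    """Hypothetical WSGI middleware mimicking the disable_path behaviour."""

    def __init__(self, app, disable_path=''):
        self.app = app
        self.disable_path = disable_path

    def __call__(self, environ, start_response):
        if environ.get('PATH_INFO') != '/healthcheck':
            return self.app(environ, start_response)  # not ours; pass through
        if self.disable_path and os.path.exists(self.disable_path):
            status, body = '503 Service Unavailable', b'DISABLED BY FILE'
        else:
            status, body = '200 OK', b'OK'
        start_response(status, [('Content-Type', 'text/plain'),
                                ('Content-Length', str(len(body)))])
        return [body]
```

An operator would touch the file to drain the node from the load balancer, do the maintenance, then delete the file to bring the node back into the pool.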

Add documentation for Swift Recon. Change-Id: I37f4fb624bdc5b8bbf2e691d29aa6b15cd648aa8

2011年10月18日 21:10:50 +00:00

[filter:recon]

use = egg:swift#recon

Expand recon middleware support Expand recon middleware to include support for account and container servers in addition to the existing object servers. Also add support for retrieving recent information from auditors, replicators, and updaters. In the case of certain checks (such as container auditors) the stats returned are only for the most recent path processed. The middleware has also been refactored and should now also handle errors better in cases where stats are unavailable. While new check's have been added the output from pre-existing check's has not changed. This should allow existing 3rd party utilities such as the Swift ZenPack to continue to function. Change-Id: Ib9893a77b9b8a2f03179f2a73639bc4a6e264df7

2012年05月14日 18:01:48 -05:00

# recon_cache_path = /var/cache/swift

# recon_lock_path = /var/lock

Add documentation for Swift Recon. Change-Id: I37f4fb624bdc5b8bbf2e691d29aa6b15cd648aa8

2011年10月18日 21:10:50 +00:00

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

[object-replicator]

More doc updates for logger stuff

2011年01月23日 13:18:28 -08:00

# You can override the default log routing for this app here (don't use set!):

Refactored logging configuration so that it has sane defaults

2010年08月24日 13:41:58 +00:00

# log_name = object-replicator

More doc updates for logger stuff

2011年01月23日 13:18:28 -08:00

# log_facility = LOG_LOCAL0

# log_level = INFO

Patch for Swift Solaris (Illumos) compability. * Add new configuration option log_address. Change-Id: I636bd4116687629c997b70a0d804b7ed4bc46032

2012年05月17日 15:46:38 -07:00

# log_address = /dev/log

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

# daemonize = on

Replaced setting run_pause with standard interval The deprecated directive `run_pause` should be replaced with the more standard one `interval`. The `run_pause` should be still supported for backward compatibility. This patch updates object replicator to use `interval` and support `run_pause`. It also updates its sample config and documentation. Co-Authored-By: Joanna H. Huang <joanna.huitzu.huang@gmail.com> Co-Authored-By: Kamil Rykowski <kamil.rykowski@intel.com> Change-Id: Ie2a3414a96a94efb9273ff53a80b9d90c74fff09 Closes-Bug: #1364735

2014年10月21日 09:24:25 +00:00

#

# Time in seconds to wait between replication passes.

# interval = 30

# run_pause is deprecated; use interval instead.

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

# run_pause = 30
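The backward-compatible lookup described in the commit message (prefer `interval`, fall back to the deprecated `run_pause`) amounts to the following Python sketch. `replication_interval` is a hypothetical helper name.

```python
def replication_interval(conf, default=30):
    """Sketch: prefer `interval`, fall back to deprecated `run_pause`."""
    value = conf.get('interval') or conf.get('run_pause') or default
    return float(value)
```

If both options are set, `interval` wins, so an old `run_pause` line can stay in the file without effect.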

Replaced setting run_pause with standard interval The deprecated directive `run_pause` should be replaced with the more standard one `interval`. The `run_pause` should be still supported for backward compatibility. This patch updates object replicator to use `interval` and support `run_pause`. It also updates its sample config and documentation. Co-Authored-By: Joanna H. Huang <joanna.huitzu.huang@gmail.com> Co-Authored-By: Kamil Rykowski <kamil.rykowski@intel.com> Change-Id: Ie2a3414a96a94efb9273ff53a80b9d90c74fff09 Closes-Bug: #1364735

2014年10月21日 09:24:25 +00:00

#

Multiprocess object replicator Add a multiprocess mode to the object replicator. Setting the "replicator_workers" setting to a positive value N will result in the replicator using up to N worker processes to perform replication tasks. At most one worker per disk will be spawned, so one can set replicator_workers=99999999 to always get one worker per disk regardless of the number of disks in each node. This is the same behavior that the object reconstructor has. Worker process logs will have a bit of information prepended so operators can tell which messages came from which worker. It looks like this: [worker 1/2 pid=16529] 154/154 (100.00%) partitions replicated in 1.02s (150.87/sec, 0s remaining) The prefix is "[worker M/N pid=P] ", where M is the worker's index, N is the total number of workers, and P is the process ID. Every message from the replicator's logger will have the prefix; this includes messages from down in diskfile, but does not include things printed to stdout or stderr. Drive-by fix: don't dump recon stats when replicating only certain policies. When running the object replicator with replicator_workers > 0 and "--policies=X,Y,Z", the replicator would update recon stats after running. Since it only ran on a subset of objects, it should not update recon, much like it doesn't update recon when run with --devices or --partitions. Change-Id: I6802a9ad9f1f9b9dafb99d8b095af0fdbf174dc5

2018年03月22日 17:08:48 -07:00

# Number of concurrent replication jobs to run. This is per-process,

# so replicator_workers=W and concurrency=C will result in W*C

# replication jobs running at once.

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

# concurrency = 1

Multiprocess object replicator Add a multiprocess mode to the object replicator. Setting the "replicator_workers" setting to a positive value N will result in the replicator using up to N worker processes to perform replication tasks. At most one worker per disk will be spawned, so one can set replicator_workers=99999999 to always get one worker per disk regardless of the number of disks in each node. This is the same behavior that the object reconstructor has. Worker process logs will have a bit of information prepended so operators can tell which messages came from which worker. It looks like this: [worker 1/2 pid=16529] 154/154 (100.00%) partitions replicated in 1.02s (150.87/sec, 0s remaining) The prefix is "[worker M/N pid=P] ", where M is the worker's index, N is the total number of workers, and P is the process ID. Every message from the replicator's logger will have the prefix; this includes messages from down in diskfile, but does not include things printed to stdout or stderr. Drive-by fix: don't dump recon stats when replicating only certain policies. When running the object replicator with replicator_workers > 0 and "--policies=X,Y,Z", the replicator would update recon stats after running. Since it only ran on a subset of objects, it should not update recon, much like it doesn't update recon when run with --devices or --partitions. Change-Id: I6802a9ad9f1f9b9dafb99d8b095af0fdbf174dc5

2018年03月22日 17:08:48 -07:00

#

# Number of worker processes to use. No matter how big this number is,

# at most one worker per disk will be used. 0 means no forking; all work

# is done in the main process.

# replicator_workers = 0
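The comments on `concurrency` and `replicator_workers` combine as in this small Python sketch. `replication_parallelism` is hypothetical. It encodes the two stated rules: at most one worker per disk, and W*C concurrent jobs, with 0 workers meaning everything runs in the main process.

```python
def replication_parallelism(num_disks, replicator_workers=0, concurrency=1):
    """Sketch: (worker processes forked, replication jobs running at once)."""
    if replicator_workers > 0:
        workers = min(replicator_workers, num_disks)  # at most one per disk
    else:
        workers = 0  # no forking; the main process does all the work
    processes = workers or 1
    return workers, processes * concurrency
```

This is why `replicator_workers = 99999999` simply means "one worker per disk". The cap keeps the effective W at or below the number of disks.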

#

Make obj/replicator timeouts configurable

2010年10月19日 01:05:54 +00:00

# stats_interval = 300

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Update Erasure Coding Overview doc to remove Beta version The major functionality of EC has been released for Liberty and the beta version of the code has been removed since it is now in production. Change-Id: If60712045fb1af803093d6753fcd60434e637772

2015年11月20日 12:09:26 -06:00

# Replication method: rsync (the default) or ssync.

Object replication ssync (an rsync alternative) For this commit, ssync is just a direct replacement for how we use rsync. Assuming we switch over to ssync completely someday and drop rsync, we will then be able to improve the algorithms even further (removing local objects as we successfully transfer each one rather than waiting for whole partitions, using an index.db with hash-trees, etc., etc.) For easier review, this commit can be thought of in distinct parts: 1) New global_conf_callback functionality for allowing services to perform setup code before workers, etc. are launched. (This is then used by ssync in the object server to create a cross-worker semaphore to restrict concurrent incoming replication.) 2) A bit of shifting of items up from object server and replicator to diskfile or DEFAULT conf sections for better sharing of the same settings. conn_timeout, node_timeout, client_timeout, network_chunk_size, disk_chunk_size. 3) Modifications to the object server and replicator to optionally use ssync in place of rsync. This is done in a generic enough way that switching to FutureSync should be easy someday. 4) The biggest part, and (at least for now) completely optional part, are the new ssync_sender and ssync_receiver files. Nice and isolated for easier testing and visibility into test coverage, etc. All the usual logging, statsd, recon, etc. instrumentation is still there when using ssync, just as it is when using rsync. Beyond the essential error and exceptional condition logging, I have not added any additional instrumentation at this time. Unless there is something someone finds super pressing to have added to the logging, I think such additions would be better as separate change reviews. FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION CLUSTERS. Some of us will be in a limited fashion to look for any subtle issues, tuning, etc. but generally ssync is an experimental feature. 
In its current implementation it is probably going to be a bit slower than rsync, but if all goes according to plan it will end up much faster. There are no comparisions yet between ssync and rsync other than some raw virtual machine testing I've done to show it should compete well enough once we can put it in use in the real world. If you Tweet, Google+, or whatever, be sure to indicate it's experimental. It'd be best to keep it out of deployment guides, howtos, etc. until we all figure out if we like it, find it to be stable, etc. Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6

2013年08月28日 16:10:43 +00:00

# sync_method = rsync

#

Make obj/replicator timeouts configurable

2010年10月19日 01:05:54 +00:00

# Maximum duration, in seconds, of a partition rsync.

object replicator logging and increase rsync timeouts

2011年01月27日 21:02:53 +00:00

# rsync_timeout = 900

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Fix misspellings in swift Fix misspellings detected by: * pip install misspellings * git ls-files | grep -v locale | misspellings -f - Change-Id: I6594fc4ca5ae10bd30eac8a2f2493a376adcadee Closes-Bug: #1257295

2014年02月07日 16:06:12 +08:00

# Bandwidth limit for rsync, in kB/s; 0 means unlimited.

implement an rsync_bwlimit setting for object replicator Change-Id: I8789d6e4d22de83db9a2760d51a94eb56a48c3b5

2013年03月11日 11:15:41 -04:00

# rsync_bwlimit = 0

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Make obj/replicator timeouts configurable

2010年10月19日 01:05:54 +00:00

# I/O timeout, in seconds, passed to rsync.

object replicator logging and increase rsync timeouts

2011年01月27日 21:02:53 +00:00

# rsync_io_timeout = 30

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Allow rsync to use compression From rsync's man page: -z, --compress With this option, rsync compresses the file data as it is sent to the destination machine, which reduces the amount of data being transmitted -- something that is useful over a slow connection. A configurable option has been added to allow rsync to compress, but only if the remote node is in a different region than the local one. NOTE: Objects that are already compressed (for example: .tar.gz, .mp3) might slow down the syncing process. On wire compression can also be extended to ssync later in a different change if required. In case of ssync, we could explore faster compression libraries like lz4. rsync uses zlib which is slow but offers higher compression ratio. Change-Id: Ic9b9cbff9b5e68bef8257b522cc352fc3544db3c Signed-off-by: Prashanth Pai <ppai@redhat.com>

2015年01月20日 12:14:32 +05:30

# Allow rsync to compress data transmitted to the destination node during

# sync. This applies only when the destination node is in a different region

# than the local one.

# NOTE: Objects that are already compressed (for example: .tar.gz, .mp3) might

# slow down the syncing process.

# rsync_compress = no
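This Python sketch shows how the rsync-related options above map onto standard rsync flags (`--timeout`, `--bwlimit`, `--compress`), including the rule that compression applies only across regions. `rsync_command` is hypothetical and builds only a subset of flags, not Swift's exact argument list. The source and destination paths in the test are illustrative placeholders.

```python
def rsync_command(src, dest, local_region, remote_region,
                  rsync_compress=False, rsync_bwlimit=0, rsync_io_timeout=30):
    """Hypothetical sketch: map replicator options onto rsync flags.

    Only a subset of flags is shown; this is not Swift's exact argument list.
    """
    args = ['rsync', '--recursive',
            '--timeout=%d' % rsync_io_timeout,  # rsync_io_timeout
            '--bwlimit=%d' % rsync_bwlimit]     # rsync_bwlimit (0 = unlimited)
    if rsync_compress and local_region != remote_region:
        args.append('--compress')  # only for cross-region transfers
    args.extend([src, dest])
    return args
```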

#

Fixed rysnc -> rsync typo Change-Id: I671b4206072c6e22f4ae38033502336ec32e86ad

2016年10月19日 20:17:00 +02:00

# Format of the rsync module where the replicator will send data. See

Allows to configure the rsync modules where the replicators will send data Currently, the rsync module where the replicators send data is static. It forbids administrators to set rsync configuration based on their current deployment or needs. As an example, the rsyncd configuration example encourages to set a connections limit for the modules account, container and object. It permits to protect devices from excessives parallels connections, because it would impact performances. On a server with many devices, it is tempting to increase this number proportionally, but nothing guarantees that the distribution of the connections will be balanced. In the worst scenario, a single device can receive all the connections, which is a severe impact on performances. This commit adds a new option named 'rsync_module' to the *-replicator sections of the *-server configuration file. This configuration variable can be extrapolated with device attributes like ip, port, device, zone, ... by using the format {NAME}. eg: rsync_module = {replication_ip}::object_{device} With this configuration, an administrators can solve the problem of connections distribution by creating one module per device in rsyncd configuration. The default values are backward compatible: {replication_ip}::account {replication_ip}::container {replication_ip}::object Option vm_test_mode is deprecated by this commit, but backward compatibility is maintained. The option is only effective when rsync_module is not set. In that case, {replication_port} is appended to the default value of rsync_module. Change-Id: Iad91df50dadbe96c921181797799b4444323ce2e

2015年06月16日 12:47:26 +02:00

# etc/rsyncd.conf-sample for some usage examples.

# rsync_module = {replication_ip}::object
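According to the commit message, `rsync_module` is expanded using device attributes written as `{NAME}`. An example is `{replication_ip}::object_{device}`, which pairs with one rsyncd module per device. Here is a Python sketch of that expansion using `str.format`. `render_rsync_module` is hypothetical, and Swift's own interpolation may handle more cases.

```python
def render_rsync_module(template, node):
    """Sketch: expand rsync_module placeholders from a node's attributes."""
    try:
        return template.format(**node)
    except KeyError as err:
        raise ValueError('unknown placeholder %s in rsync_module' % err)
```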

#

Object replication ssync (an rsync alternative) For this commit, ssync is just a direct replacement for how we use rsync. Assuming we switch over to ssync completely someday and drop rsync, we will then be able to improve the algorithms even further (removing local objects as we successfully transfer each one rather than waiting for whole partitions, using an index.db with hash-trees, etc., etc.) For easier review, this commit can be thought of in distinct parts: 1) New global_conf_callback functionality for allowing services to perform setup code before workers, etc. are launched. (This is then used by ssync in the object server to create a cross-worker semaphore to restrict concurrent incoming replication.) 2) A bit of shifting of items up from object server and replicator to diskfile or DEFAULT conf sections for better sharing of the same settings. conn_timeout, node_timeout, client_timeout, network_chunk_size, disk_chunk_size. 3) Modifications to the object server and replicator to optionally use ssync in place of rsync. This is done in a generic enough way that switching to FutureSync should be easy someday. 4) The biggest part, and (at least for now) completely optional part, are the new ssync_sender and ssync_receiver files. Nice and isolated for easier testing and visibility into test coverage, etc. All the usual logging, statsd, recon, etc. instrumentation is still there when using ssync, just as it is when using rsync. Beyond the essential error and exceptional condition logging, I have not added any additional instrumentation at this time. Unless there is something someone finds super pressing to have added to the logging, I think such additions would be better as separate change reviews. FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION CLUSTERS. Some of us will be in a limited fashion to look for any subtle issues, tuning, etc. but generally ssync is an experimental feature. 
In its current implementation it is probably going to be a bit slower than rsync, but if all goes according to plan it will end up much faster. There are no comparisions yet between ssync and rsync other than some raw virtual machine testing I've done to show it should compete well enough once we can put it in use in the real world. If you Tweet, Google+, or whatever, be sure to indicate it's experimental. It'd be best to keep it out of deployment guides, howtos, etc. until we all figure out if we like it, find it to be stable, etc. Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6

2013年08月28日 16:10:43 +00:00

# node_timeout = <whatever's in the DEFAULT section or 10>

# Maximum duration, in seconds, of an HTTP request; this is used for REPLICATE

# finalization calls and so should be longer than node_timeout.

Make obj/replicator timeouts configurable

2010年10月19日 01:05:54 +00:00

# http_timeout = 60

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Make obj/replicator timeouts configurable

2010年10月19日 01:05:54 +00:00

# attempts to kill all workers if nothing replicates for lockup_timeout seconds

object replicator logging and increase rsync timeouts

2011年01月27日 21:02:53 +00:00

# lockup_timeout = 1800

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

bug 661267 adding config eastereggs, fixing defaults Change-Id: I41356ee250c9088a2387b0d493586dd990a04ac3

2012年04月28日 16:31:00 +10:00

# ring_check_interval = 15

Expand recon middleware support Expand recon middleware to include support for account and container servers in addition to the existing object servers. Also add support for retrieving recent information from auditors, replicators, and updaters. In the case of certain checks (such as container auditors) the stats returned are only for the most recent path processed. The middleware has also been refactored and should now also handle errors better in cases where stats are unavailable. While new check's have been added the output from pre-existing check's has not changed. This should allow existing 3rd party utilities such as the Swift ZenPack to continue to function. Change-Id: Ib9893a77b9b8a2f03179f2a73639bc4a6e264df7

2012年05月14日 18:01:48 -05:00

# recon_cache_path = /var/cache/swift

Make the length of a line logged configurable Failed calls to rysnc can result in very long log lines. These lines are mostly made up of file paths and are not always useful. This change will allow for reducing the length of these lines logged if desired. Change-Id: I9a28f19eadc07757da9d42b0d7be1ed82170d732

2013年07月22日 22:09:40 +00:00

#

# Maximum length of rsync error log lines;

# 0 means log the entire line.

# rsync_error_log_line_length = 0
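The length limit amounts to a simple truncation. `truncate_rsync_error` is a hypothetical helper that mirrors the comment's semantics in Python.

```python
def truncate_rsync_error(line, max_length=0):
    """Sketch of rsync_error_log_line_length: 0 keeps the whole line."""
    if max_length <= 0:
        return line
    return line[:max_length]
```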

Add missing sample config of object-replicator Change-Id: I2bca67023aeb9a012927c69e23d582d4a0ff2098

2014年01月27日 01:08:37 -08:00

#

# handoffs_first and handoff_delete are options for a special case,

# such as disks filling up in the cluster. These two options SHOULD NOT BE

# CHANGED except in such extreme situations (e.g. disks have filled up

# or are about to fill up; in any case, DO NOT let your drives fill up).

# handoffs_first is the flag to replicate handoffs prior to canonical

# partitions. It forces handoffs to be synced and deleted quickly.

# If set to a True value (e.g. "True" or "1"), partitions

# that are not supposed to be on the node will be replicated first.

# handoffs_first = False

#

# handoff_delete is the number of replicas that must be confirmed in the

# cluster before a local handoff is removed. If it is set lower than the

# number of replicas, object-replicator may delete local handoffs even if

# not all replicas are guaranteed to exist in the cluster. Object-replicator

# removes a local handoff partition directory after syncing the partition when

# the number of successful responses is greater than or equal to this number.

# By default (auto), handoff partitions are removed only when they have been

# successfully replicated to all the canonical nodes.

# handoff_delete = auto
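This Python sketch shows how the two emergency options change replicator behaviour. Both functions are hypothetical. Jobs are modelled as dicts with an illustrative boolean `handoff` key, and `auto` is taken to mean "all replicas", as the comment above describes.

```python
def order_jobs(jobs, handoffs_first=False):
    """Sketch: with handoffs_first, handoff partitions are processed first.

    sorted() is stable, so jobs otherwise keep their original order.
    """
    if not handoffs_first:
        return list(jobs)
    return sorted(jobs, key=lambda job: not job['handoff'])


def may_delete_handoff(successes, replica_count, handoff_delete='auto'):
    """Sketch: may a local handoff partition be removed after syncing?"""
    if str(handoff_delete).strip().lower() == 'auto':
        required = replica_count  # every canonical node must have it
    else:
        required = int(handoff_delete)
    return successes >= required
```

Lowering `handoff_delete` below the replica count frees disk space sooner, but a handoff can then be removed while fewer than all replicas are confirmed. That is why the comment reserves it for emergencies.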

Change schedule priority of daemon/server in config The goal is to modify schedule priority and I/O scheduling class and priority of daemon/server via configuration. Setting is optional, default keeps current behaviour. Use case: Prioritize object-server to object-auditor, because all user's requests needed to be served in peak hours and audit could wait. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> DocImpact Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396

2015年10月22日 10:19:49 +02:00

#

# You can set the scheduling priority of processes. Niceness values range from -20

# (most favorable to the process) to 19 (least favorable to the process).

# nice_priority =

#

# You can set the I/O scheduling class and priority of processes. I/O niceness

# class values are IOPRIO_CLASS_RT (realtime), IOPRIO_CLASS_BE (best-effort) and

# IOPRIO_CLASS_IDLE (idle). I/O niceness priority is a number which goes from

# 0 to 7. The higher the value, the lower the I/O priority of the process.

# ionice_priority only takes effect when ionice_class is also set.

# ionice_class =

# ionice_priority =

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

Erasure Code Reconstructor This patch adds the erasure code reconstructor. It follows the design of the replicator but: - There is no notion of update() or update_deleted(). - There is a single job processor - Jobs are processed partition by partition. - At the end of processing a rebalanced or handoff partition, the reconstructor will remove successfully reverted objects if any. And various ssync changes such as the addition of reconstruct_fa() function called from ssync_sender which performs the actual reconstruction while sending the object to the receiver Co-Authored-By: Alistair Coles <alistair.coles@hp.com> Co-Authored-By: Thiago da Silva <thiago@redhat.com> Co-Authored-By: John Dickinson <me@not.mn> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com> Co-Authored-By: Samuel Merritt <sam@swiftstack.com> Co-Authored-By: Christian Schwede <christian.schwede@enovance.com> Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com> blueprint ec-reconstructor Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51

2014年10月28日 09:51:06 -07:00

[object-reconstructor]

# Unless otherwise noted, each setting below has the same meaning as described

# in the [object-replicator] section; however, these settings apply to the EC

# reconstructor.

#

# You can override the default log routing for this app here (don't use set!):

# log_name = object-reconstructor

# log_facility = LOG_LOCAL0

# log_level = INFO

# log_address = /dev/log

#

# daemonize = on

Replaced setting run_pause with standard interval The deprecated directive `run_pause` should be replaced with the more standard one `interval`. The `run_pause` should be still supported for backward compatibility. This patch updates object replicator to use `interval` and support `run_pause`. It also updates its sample config and documentation. Co-Authored-By: Joanna H. Huang <joanna.huitzu.huang@gmail.com> Co-Authored-By: Kamil Rykowski <kamil.rykowski@intel.com> Change-Id: Ie2a3414a96a94efb9273ff53a80b9d90c74fff09 Closes-Bug: #1364735

2014年10月21日 09:24:25 +00:00

#

# Time in seconds to wait between reconstruction passes.

# interval = 30

# run_pause is deprecated; use interval instead.

Erasure Code Reconstructor This patch adds the erasure code reconstructor. It follows the design of the replicator but: - There is no notion of update() or update_deleted(). - There is a single job processor - Jobs are processed partition by partition. - At the end of processing a rebalanced or handoff partition, the reconstructor will remove successfully reverted objects if any. And various ssync changes such as the addition of reconstruct_fa() function called from ssync_sender which performs the actual reconstruction while sending the object to the receiver Co-Authored-By: Alistair Coles <alistair.coles@hp.com> Co-Authored-By: Thiago da Silva <thiago@redhat.com> Co-Authored-By: John Dickinson <me@not.mn> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com> Co-Authored-By: Samuel Merritt <sam@swiftstack.com> Co-Authored-By: Christian Schwede <christian.schwede@enovance.com> Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com> blueprint ec-reconstructor Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51

2014年10月28日 09:51:06 -07:00

# run_pause = 30
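This sketch shows how the deprecated `run_pause` can keep working alongside `interval`. The helper name `resolve_interval` is hypothetical and is not Swift's code. Two behaviours are assumptions made for illustration: `interval` wins when both are set, and an empty value counts as unset. The fallback of 30 seconds matches the sample value above.

```python
def resolve_interval(conf):
    """Hypothetical helper: prefer 'interval', fall back to deprecated 'run_pause'.

    Empty strings are treated as unset (an assumption of this sketch);
    30 seconds mirrors the sample-config value.
    """
    for key in ("interval", "run_pause"):
        value = conf.get(key)
        if value not in (None, ""):
            return float(value)
    return 30.0
```

For example, a config that only sets `run_pause = 60` still yields a 60-second pause. Adding `interval = 15` overrides it.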

Replaced setting run_pause with standard interval The deprecated directive `run_pause` should be replaced with the more standard one `interval`. The `run_pause` should be still supported for backward compatibility. This patch updates object replicator to use `interval` and support `run_pause`. It also updates its sample config and documentation. Co-Authored-By: Joanna H. Huang <joanna.huitzu.huang@gmail.com> Co-Authored-By: Kamil Rykowski <kamil.rykowski@intel.com> Change-Id: Ie2a3414a96a94efb9273ff53a80b9d90c74fff09 Closes-Bug: #1364735

2014年10月21日 09:24:25 +00:00

#

Add multiple worker processes strategy to reconstructor This change adds a new Strategy concept to the daemon module similar to how we manage WSGI workers. We need to leverage multiple python processes to get the concurrency properties we need. More workers will rebalance much faster on dense chassis with many devices. Currently the default is still only one process, and no workers. Set reconstructor_workers in the [object-reconstructor] section to some whole number <= the number of devices on a node to get that many reconstructor workers. Each worker will operate on a different subset of disks. Once mode works as before, but tends to want to update recon drops a little bit more. If you change the rings, the strategy will shutdown workers and spawn new ones. You can kill the worker pids and the daemon strategy will respawn them. New per-disk reconstructor stats are dumped to recon under the object_reconstruction_per_disk key. To maintain legacy compatibility and replication monitoring based on cycle times they are aggregated every stats_interval (default 5 mins). Change-Id: I28925a37f3985c9082b5a06e76af4dc3ec813abe

2017年06月02日 17:47:25 -07:00

# Maximum number of worker processes to spawn. Each worker will handle a

# subset of devices. Devices will be assigned evenly among the workers so that

# workers cycle at similar intervals (which can lead to fewer workers than

# requested). You cannot have more workers than devices. If you have no

# devices, only a single worker is spawned.

# reconstructor_workers = 0
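This sketch shows how assigning devices "evenly" can produce fewer workers than requested. It is hypothetical and is not Swift's implementation. It assumes three things:

- Every worker gets the same ceiling-rounded number of devices.
- Devices are dealt out round-robin.
- `reconstructor_workers = 0` means a single process handles every device.

With 10 devices and `reconstructor_workers = 6`, each worker needs 2 devices, so only 5 workers are started.

```python
import math


def plan_reconstructor_workers(devices, reconstructor_workers):
    """Hypothetical sketch: group local devices into evenly sized worker sets.

    Returns one list of devices per worker process.
    """
    if not devices or reconstructor_workers < 1:
        # No devices, or workers disabled: a single process covers everything.
        return [list(devices)]
    # Never more workers than devices.
    requested = min(reconstructor_workers, len(devices))
    # Same number of devices per worker so passes take similar time;
    # this can reduce the worker count below what was requested.
    per_worker = math.ceil(len(devices) / requested)
    n_workers = math.ceil(len(devices) / per_worker)
    # Round-robin assignment keeps group sizes within one of each other.
    return [devices[i::n_workers] for i in range(n_workers)]
```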

#

Erasure Code Reconstructor This patch adds the erasure code reconstructor. It follows the design of the replicator but: - There is no notion of update() or update_deleted(). - There is a single job processor - Jobs are processed partition by partition. - At the end of processing a rebalanced or handoff partition, the reconstructor will remove successfully reverted objects if any. And various ssync changes such as the addition of reconstruct_fa() function called from ssync_sender which performs the actual reconstruction while sending the object to the receiver Co-Authored-By: Alistair Coles <alistair.coles@hp.com> Co-Authored-By: Thiago da Silva <thiago@redhat.com> Co-Authored-By: John Dickinson <me@not.mn> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com> Co-Authored-By: Samuel Merritt <sam@swiftstack.com> Co-Authored-By: Christian Schwede <christian.schwede@enovance.com> Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com> blueprint ec-reconstructor Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51

2014年10月28日 09:51:06 -07:00

# concurrency = 1

# stats_interval = 300

# node_timeout = 10

# http_timeout = 60

# lockup_timeout = 1800

# ring_check_interval = 15

# recon_cache_path = /var/cache/swift

Deprecate broken handoffs_first in favor of handoffs_only The handoffs_first mode in the replicator has the useful behavior of processing all handoff parts across all disks until there aren't any handoffs anymore on the node [1] and then it seemingly tries to drop back into normal operation. In practice I've only ever heard of handoffs_first used while rebalancing and turned off as soon as the rebalance finishes - it's not recommended to run with handoffs_first mode turned on and it emits a warning on startup if option is enabled. The handoffs_first mode on the reconstructor doesn't work - it was prioritizing handoffs *per-part* [2] - which is really unfortunate because in the reconstructor during a rebalance it's often *much* more attractive from an efficiency disk/network perspective to revert a partition from a handoff than it is to rebuild an entire partition from another primary using the other EC fragments in the cluster. This change deprecates handoffs_first in favor of handoffs_only in the reconstructor which is far more useful - and just like handoffs_first mode in the replicator - it gives the operator the option of forcing the consistency engine to focus on rebalance. The handoffs_only behavior is somewhat consistent with the replicator's handoffs_first option (any error on any handoff in the replicactor will make it essentially handoff only forever) but the option does what you want and is named correctly in the reconstructor. For consistency with the replicator the reconstructor will mostly honor the handoffs_first option, but if you set handoffs_only in the config it always takes precedence. Having handoffs_first in your config always results in a warning, but if handoff_only is not set and handoffs_first is true the reconstructor will assume you need handoffs_only and behaves as such. 
When running in handoffs_only mode the reconstructor will start to log a warning every cycle if you leave it running in handoffs_only after it finishes reverting handoffs. However you should be monitoring on-disk partitions and disable the option as soon as the cluster finishes the full rebalance cycle. 1. Ia324728d42c606e2f9e7d29b4ab5fcbff6e47aea fixed replicator handoffs_first "mode" 2. Unlike replication each partition in a EC policy can have a different kind of job per frag_index, but the cardinality of jobs is typically only one (either sync or revert) unless there's been a bunch of errors during write and then handoffs partitions maybe hold a number of different fragments. Known-Issues: handoffs_only is not documented outside of the example config, see lp bug #1626290 Closes-Bug: #1653018 Change-Id: Idde4b6cf92fab6c45f2c0c2733277701eb436898

2017年01月25日 11:51:03 -08:00

# The handoffs_only mode option is for special-case emergency situations during

# a rebalance, such as a full disk in the cluster. This option SHOULD NOT BE

# CHANGED, except for extreme situations. When handoffs_only mode is enabled

Add reconstructor section to deployment guide Change-Id: I062998e813718828b7adf4e7c3f877b6a31633c0 Closes-Bug: #1626290

2017年07月20日 11:40:17 +01:00

# the reconstructor will *only* revert fragments from handoff nodes to primary

# nodes and will not sync primary nodes with neighboring primary nodes. This

# will force the reconstructor to sync and delete handoffs' fragments more

# quickly and minimize the time of the rebalance by limiting the number of

# rebuilds. The handoffs_only option is only for temporary use and should be

# disabled as soon as the emergency situation has been resolved. When

# handoffs_only is not set, the deprecated handoffs_first option will be

# honored as a synonym, but may be ignored in a future release.

Deprecate broken handoffs_first in favor of handoffs_only The handoffs_first mode in the replicator has the useful behavior of processing all handoff parts across all disks until there aren't any handoffs anymore on the node [1] and then it seemingly tries to drop back into normal operation. In practice I've only ever heard of handoffs_first used while rebalancing and turned off as soon as the rebalance finishes - it's not recommended to run with handoffs_first mode turned on and it emits a warning on startup if option is enabled. The handoffs_first mode on the reconstructor doesn't work - it was prioritizing handoffs *per-part* [2] - which is really unfortunate because in the reconstructor during a rebalance it's often *much* more attractive from an efficiency disk/network perspective to revert a partition from a handoff than it is to rebuild an entire partition from another primary using the other EC fragments in the cluster. This change deprecates handoffs_first in favor of handoffs_only in the reconstructor which is far more useful - and just like handoffs_first mode in the replicator - it gives the operator the option of forcing the consistency engine to focus on rebalance. The handoffs_only behavior is somewhat consistent with the replicator's handoffs_first option (any error on any handoff in the replicactor will make it essentially handoff only forever) but the option does what you want and is named correctly in the reconstructor. For consistency with the replicator the reconstructor will mostly honor the handoffs_first option, but if you set handoffs_only in the config it always takes precedence. Having handoffs_first in your config always results in a warning, but if handoff_only is not set and handoffs_first is true the reconstructor will assume you need handoffs_only and behaves as such. 
When running in handoffs_only mode the reconstructor will start to log a warning every cycle if you leave it running in handoffs_only after it finishes reverting handoffs. However you should be monitoring on-disk partitions and disable the option as soon as the cluster finishes the full rebalance cycle. 1. Ia324728d42c606e2f9e7d29b4ab5fcbff6e47aea fixed replicator handoffs_first "mode" 2. Unlike replication each partition in a EC policy can have a different kind of job per frag_index, but the cardinality of jobs is typically only one (either sync or revert) unless there's been a bunch of errors during write and then handoffs partitions maybe hold a number of different fragments. Known-Issues: handoffs_only is not documented outside of the example config, see lp bug #1626290 Closes-Bug: #1653018 Change-Id: Idde4b6cf92fab6c45f2c0c2733277701eb436898

2017年01月25日 11:51:03 -08:00

# handoffs_only = False
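The precedence between `handoffs_only` and the deprecated `handoffs_first` can be sketched as follows. The rules come from the description above:

- An explicit `handoffs_only` always wins.
- Otherwise a true `handoffs_first` is treated as `handoffs_only`.
- Enabling `handoffs_first` emits a warning.

The helper names and the set of accepted "true" strings are assumptions for this sketch, not Swift's API.

```python
import warnings

# Assumed spellings of "true" for this sketch; Swift has its own helper.
TRUE_VALUES = {"true", "1", "yes", "on", "t", "y"}


def is_true(value):
    return str(value).strip().lower() in TRUE_VALUES


def resolve_handoffs_only(conf):
    """Hypothetical sketch of the handoffs_only / handoffs_first precedence."""
    handoffs_first = is_true(conf.get("handoffs_first", "false"))
    if handoffs_first:
        # The deprecated option is still honored, but it warns when enabled.
        warnings.warn("handoffs_first is deprecated; use handoffs_only",
                      DeprecationWarning)
    if "handoffs_only" in conf:
        # An explicit handoffs_only setting always takes precedence.
        return is_true(conf["handoffs_only"])
    # Otherwise handoffs_first acts as a synonym.
    return handoffs_first
```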

Change schedule priority of daemon/server in config The goal is to modify schedule priority and I/O scheduling class and priority of daemon/server via configuration. Setting is optional, default keeps current behaviour. Use case: Prioritize object-server to object-auditor, because all user's requests needed to be served in peak hours and audit could wait. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> DocImpact Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396

2015年10月22日 10:19:49 +02:00

#

Rebuild frags for unmounted disks Change the behavior of the EC reconstructor to perform a fragment rebuild to a handoff node when a primary peer responds with 507 to the REPLICATE request. Each primary node in a EC ring will sync with exactly three primary peers, in addition to the left & right nodes we now select a third node from the far side of the ring. If any of these partners respond unmounted the reconstructor will rebuild it's fragments to a handoff node with the appropriate index. To prevent ssync (which is uninterruptible) receiving a 409 (Conflict) we must give the remote handoff node the correct backend_index for the fragments it will recieve. In the common case we will use determistically different handoffs for each fragment index to prevent multiple unmounted primary disks from forcing a single handoff node to hold more than one rebuilt fragment. Handoff nodes will continue to attempt to revert rebuilt handoff fragments to the appropriate primary until it is remounted or rebalanced. After a rebalance of EC rings (potentially removing unmounted/failed devices), it's most IO efficient to run in handoffs_only mode to avoid unnecessary rebuilds. Closes-Bug: #1510342 Change-Id: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec

2019年02月04日 15:46:40 -06:00

# The default strategy for unmounted drives will stage rebuilt data on a

# handoff node until updated rings are deployed. Because fragments are rebuilt

# on offset handoffs based on fragment index and the proxy limits how deep it

# will search for EC frags, we restrict how many nodes we'll try. Setting to 0

# will disable rebuilds to handoffs and only rebuild fragments for unmounted

# devices to mounted primaries after a ring change.

# Setting to -1 means "no limit".

# rebuild_handoff_node_count = 2
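This sketch shows how the three documented kinds of `rebuild_handoff_node_count` value limit which handoffs a rebuild may target:

- `0` disables rebuilding to handoffs.
- `-1` means no limit.
- A positive value caps the count.

The function is hypothetical and assumes a finite list of candidate handoff nodes. It does not model how Swift offsets handoffs by fragment index.

```python
def handoffs_to_try(handoff_nodes, rebuild_handoff_node_count=2):
    """Hypothetical sketch: which handoff nodes a rebuild may target.

    0 disables rebuilding to handoffs, -1 means no limit, and a positive
    value caps the search at that many handoff nodes.
    """
    nodes = list(handoff_nodes)
    if rebuild_handoff_node_count < 0:
        return nodes
    return nodes[:rebuild_handoff_node_count]
```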

#

Change schedule priority of daemon/server in config The goal is to modify schedule priority and I/O scheduling class and priority of daemon/server via configuration. Setting is optional, default keeps current behaviour. Use case: Prioritize object-server to object-auditor, because all user's requests needed to be served in peak hours and audit could wait. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> DocImpact Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396

2015年10月22日 10:19:49 +02:00

# You can set scheduling priority of processes. Niceness values range from -20

# (most favorable to the process) to 19 (least favorable to the process).

# nice_priority =

#

# You can set I/O scheduling class and priority of processes. I/O niceness

# class values are IOPRIO_CLASS_RT (realtime), IOPRIO_CLASS_BE (best-effort) and

# IOPRIO_CLASS_IDLE (idle). I/O niceness priority is a number which goes from

# 0 to 7. The higher the value, the lower the I/O priority of the process.

# ionice_priority works only together with ionice_class.

# ionice_class =

# ionice_priority =

ec: Add an option to write fragments with legacy crc When upgrading from liberasurecode<=1.5.0, you may want to continue writing legacy CRCs until all nodes are upgraded and capabale of reading fragments with zlib CRCs. Starting in liberasurecode>=1.6.2, we can use the environment variable LIBERASURECODE_WRITE_LEGACY_CRC to control whether we write zlib or legacy CRCs, but for many operators it's easier to manage swift configs than environment variables. Add a new option, write_legacy_ec_crc, to the proxy-server app and object-reconstructor; if set to true, ensure legacy frags are written. Note that more daemons instantiate proxy-server apps than just the proxy-server. The complete set of impacted daemons should be: * proxy-server * object-reconstructor * container-reconciler * any users of internal-client.conf UpgradeImpact ============= To ensure a smooth liberasurecode upgrade: 1. Determine whether your cluster writes legacy or zlib CRCs. Depending on the order in which shared libraries are loaded, your servers may already be reading and writing zlib CRCs, even with old liberasurecode. In that case, no special action is required and WRITING LEGACY CRCS DURING THE UPGRADE WILL CAUSE AN OUTAGE. Just upgrade liberasurecode normally. See the closed bug for more information and a script to determine which CRC is used. 2. On all nodes, ensure Swift is upgraded to a version that includes write_legacy_ec_crc support and write_legacy_ec_crc is enabled on all daemons. 3. On each node, upgrade liberasurecode and restart Swift services. Because of (2), they will continue writing legacy CRCs which will still be readable by nodes that have not yet upgraded. 4. Once all nodes are upgraded, remove the write_legacy_ec_crc option from all configs across all nodes. After restarting daemons, they will write zlib CRCs which will also be readable by all nodes. 
Change-Id: Iff71069f808623453c0ff36b798559015e604c7d Related-Bug: #1666320 Closes-Bug: #1886088 Depends-On: https://review.opendev.org/#/c/738959/

2020年07月02日 16:29:59 -07:00

#

# When upgrading from liberasurecode<=1.5.0, you may want to continue writing

# legacy CRCs until all nodes are upgraded and capable of reading fragments

# with zlib CRCs. liberasurecode>=1.6.2 checks for the environment variable

# LIBERASURECODE_WRITE_LEGACY_CRC; if set (value doesn't matter), it will use

# its legacy CRC. Set this option to true or false to ensure the environment

# variable is or is not set. Leave the option blank or absent to not touch

# the environment (default). For more information, see

# https://bugs.launchpad.net/liberasurecode/+bug/1886088

# write_legacy_ec_crc =
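The three documented states of `write_legacy_ec_crc` map onto `LIBERASURECODE_WRITE_LEGACY_CRC` as follows:

- true: the variable is set.
- false: the variable is removed.
- blank or absent: the environment is left untouched.

This hypothetical helper works on any dict-like environment. Writing `"1"` is arbitrary, since the comment above notes the value doesn't matter. The accepted true/false spellings are assumptions.

```python
import os

ENV_VAR = "LIBERASURECODE_WRITE_LEGACY_CRC"
# Assumed spellings for this sketch.
TRUE_VALUES = {"true", "1", "yes", "on", "t", "y"}
FALSE_VALUES = {"false", "0", "no", "off", "f", "n"}


def apply_write_legacy_ec_crc(value, environ=os.environ):
    """Hypothetical sketch: reflect write_legacy_ec_crc in the environment.

    true  -> set the variable (liberasurecode ignores its value)
    false -> make sure the variable is absent
    blank/absent -> leave the environment untouched
    """
    if value is None or str(value).strip() == "":
        return
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        environ[ENV_VAR] = "1"
    elif text in FALSE_VALUES:
        environ.pop(ENV_VAR, None)
    else:
        raise ValueError("write_legacy_ec_crc must be true or false")
```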

Erasure Code Reconstructor This patch adds the erasure code reconstructor. It follows the design of the replicator but: - There is no notion of update() or update_deleted(). - There is a single job processor - Jobs are processed partition by partition. - At the end of processing a rebalanced or handoff partition, the reconstructor will remove successfully reverted objects if any. And various ssync changes such as the addition of reconstruct_fa() function called from ssync_sender which performs the actual reconstruction while sending the object to the receiver Co-Authored-By: Alistair Coles <alistair.coles@hp.com> Co-Authored-By: Thiago da Silva <thiago@redhat.com> Co-Authored-By: John Dickinson <me@not.mn> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com> Co-Authored-By: Samuel Merritt <sam@swiftstack.com> Co-Authored-By: Christian Schwede <christian.schwede@enovance.com> Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com> blueprint ec-reconstructor Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51

2014年10月28日 09:51:06 -07:00

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

[object-updater]

More doc updates for logger stuff

2011年01月23日 13:18:28 -08:00

# You can override the default log routing for this app here (don't use set!):

Refactored logging configuration so that it has sane defaults

2010年08月24日 13:41:58 +00:00

# log_name = object-updater

More doc updates for logger stuff

2011年01月23日 13:18:28 -08:00

# log_facility = LOG_LOCAL0

# log_level = INFO

Patch for Swift Solaris (Illumos) compability. * Add new configuration option log_address. Change-Id: I636bd4116687629c997b70a0d804b7ed4bc46032

2012年05月17日 15:46:38 -07:00

# log_address = /dev/log

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

# interval = 300

Object replication ssync (an rsync alternative) For this commit, ssync is just a direct replacement for how we use rsync. Assuming we switch over to ssync completely someday and drop rsync, we will then be able to improve the algorithms even further (removing local objects as we successfully transfer each one rather than waiting for whole partitions, using an index.db with hash-trees, etc., etc.) For easier review, this commit can be thought of in distinct parts: 1) New global_conf_callback functionality for allowing services to perform setup code before workers, etc. are launched. (This is then used by ssync in the object server to create a cross-worker semaphore to restrict concurrent incoming replication.) 2) A bit of shifting of items up from object server and replicator to diskfile or DEFAULT conf sections for better sharing of the same settings. conn_timeout, node_timeout, client_timeout, network_chunk_size, disk_chunk_size. 3) Modifications to the object server and replicator to optionally use ssync in place of rsync. This is done in a generic enough way that switching to FutureSync should be easy someday. 4) The biggest part, and (at least for now) completely optional part, are the new ssync_sender and ssync_receiver files. Nice and isolated for easier testing and visibility into test coverage, etc. All the usual logging, statsd, recon, etc. instrumentation is still there when using ssync, just as it is when using rsync. Beyond the essential error and exceptional condition logging, I have not added any additional instrumentation at this time. Unless there is something someone finds super pressing to have added to the logging, I think such additions would be better as separate change reviews. FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION CLUSTERS. Some of us will be in a limited fashion to look for any subtle issues, tuning, etc. but generally ssync is an experimental feature. 
In its current implementation it is probably going to be a bit slower than rsync, but if all goes according to plan it will end up much faster. There are no comparisions yet between ssync and rsync other than some raw virtual machine testing I've done to show it should compete well enough once we can put it in use in the real world. If you Tweet, Google+, or whatever, be sure to indicate it's experimental. It'd be best to keep it out of deployment guides, howtos, etc. until we all figure out if we like it, find it to be stable, etc. Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6

2013年08月28日 16:10:43 +00:00

# node_timeout = <whatever's in the DEFAULT section or 10>
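The `<whatever's in the DEFAULT section or 10>` placeholder describes a fallback chain: section value, then `[DEFAULT]`, then 10. Python's standard `configparser` shows the same layering, because `[DEFAULT]` keys are visible from every section.

Swift reads its configs with its own helpers, so treat this only as an analogy. The sample text and the `effective_node_timeout` helper are hypothetical.

```python
import configparser

SAMPLE = """
[DEFAULT]
node_timeout = 20

[object-updater]
interval = 300

[object-reconstructor]
node_timeout = 15
"""


def effective_node_timeout(text, section, fallback=10.0):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # [DEFAULT] keys show up in every section, so a section only needs
    # node_timeout when it wants to override the shared value.
    return parser.getfloat(section, "node_timeout", fallback=fallback)
```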

Replace slowdown option with *_per_second option container and object updaters sleeps "slowdown" (default 0.01) seconds after every processed container/object. Because time.sleep call adds overhead, use ratelimit_sleep from common.utils instead. Same as in auditor. Change-Id: I362aa0f13c78ad03ce1f76ee0257b0646f981212

2017年03月21日 20:10:12 +01:00

#

object-updater: add concurrent updates The object updater now supports two configuration settings: "concurrency" and "updater_workers". The latter controls how many worker processes are spawned, while the former controls how many concurrent container updates are performed by each worker process. This should speed the processing of async_pendings. There is a change to the semantics of the configuration options. Previously, "concurrency" controlled the number of worker processes spawned, and "updater_workers" did not exist. I switched the meanings for consistency with other configuration options. In the object reconstructor, object replicator, object server, object expirer, container replicator, container server, account replicator, account server, and account reaper, "concurrency" refers to the number of concurrent tasks performed within one process (for reference, the container updater and object auditor use "concurrency" to mean number of processes). On upgrade, a node configured with concurrency=N will still handle async updates N-at-a-time, but will do so using only one process instead of N. UpgradeImpact: If you have a config file like this: [object-updater] concurrency = <N> and you want to take advantage of faster updates, then do this: [object-updater] concurrency = 8 # the default; you can omit this line updater_workers = <N> If you want updates to be processed exactly as before, do this: [object-updater] concurrency = 1 updater_workers = <N> Change-Id: I17e18088e61f664e1b9942d66423666d0cae1689

2018年06月04日 16:26:50 -07:00

# updater_workers controls how many processes the object updater will

# spawn, while concurrency controls how many async_pending records

# each updater process will operate on at any one time. With

# concurrency=C and updater_workers=W, there will be up to W*C

# async_pending records being processed at once.

# concurrency = 8

# updater_workers = 1
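The W*C bound can be illustrated with one simulated updater process that handles records at most `concurrency` at a time. Running `updater_workers` such processes multiplies the bound.

This is a hypothetical sketch. A thread pool stands in for whatever concurrency primitive Swift actually uses, and a short sleep stands in for the container update request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def max_in_flight(updater_workers, concurrency):
    # Upper bound on async_pending records processed at once on a node.
    return updater_workers * concurrency


def run_one_worker(records, concurrency):
    """Hypothetical sketch of one updater process; returns (results, peak)."""
    lock = threading.Lock()
    state = {"in_flight": 0, "peak": 0}

    def update_container(record):
        with lock:
            state["in_flight"] += 1
            state["peak"] = max(state["peak"], state["in_flight"])
        time.sleep(0.01)  # stand-in for the container update request
        with lock:
            state["in_flight"] -= 1
        return record

    # At most `concurrency` records are in flight inside this process.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(update_container, records))
    return results, state["peak"]
```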

#

Replace slowdown option with *_per_second option container and object updaters sleeps "slowdown" (default 0.01) seconds after every processed container/object. Because time.sleep call adds overhead, use ratelimit_sleep from common.utils instead. Same as in auditor. Change-Id: I362aa0f13c78ad03ce1f76ee0257b0646f981212

2017年03月21日 20:10:12 +01:00

# Send at most this many object updates per second

# objects_per_second = 50
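A per-second cap such as `objects_per_second` can be enforced by spacing calls a fixed interval apart. The commit above says Swift uses `ratelimit_sleep` from `common.utils`. The class below is not that function. It is a hypothetical fixed-interval limiter, and treating 0 as "no limit" is an assumption of this sketch.

```python
import time


class RateLimiter:
    """Hypothetical fixed-interval limiter: at most `per_second` calls/second.

    A value of 0 disables limiting in this sketch (an assumption).
    """

    def __init__(self, per_second):
        self.interval = 1.0 / per_second if per_second > 0 else 0.0
        self.next_allowed = time.monotonic()

    def wait(self):
        if not self.interval:
            return
        now = time.monotonic()
        if now < self.next_allowed:
            time.sleep(self.next_allowed - now)
        # Schedule the next slot; if we fell behind, count from now.
        self.next_allowed = max(now, self.next_allowed) + self.interval
```

Calling `wait()` before each object update keeps a process at or below the configured rate without a fixed sleep after every object.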

#

# slowdown will sleep that amount between objects. Deprecated; use

# objects_per_second instead.

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

# slowdown = 0.01

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Improve object-updater's stats logging The object updater has five different stats, but its logging only told you two of them (successes and failures), and it only told you after finishing all the async_pendings for a device. If you have a cluster that's been sick and has millions upon millions of async_pendings laying around, then your object-updaters are frustratingly silent. I've seen one cluster with around 8 million async_pendings per disk where the object-updaters only emitted stats every 12 hours. Yes, if you have StatsD logging set up properly, you can go look at your graphs and get real-time feedback on what it's doing. If you don't have that, all you get is a frustrating silence. Now, the object updater tells you all of its stats (successes, failures, quarantines due to bad pickles, unlinks, and errors), and it tells you incremental progress every five minutes. The logging at the end of a pass remains and has been expanded to also include all stats. Also included is a small change to what counts as an error: unmounted drives no longer do. The goal is that only abnormal things count as errors, like permission problems, malformed filenames, and so on. These are things that should never happen, but if they do, may require operator intervention. Drives fail, so logging an error upon encountering an unmounted drive is not useful. Change-Id: Idbddd507f0b633d14dffb7a9834fce93a10359ab

2018年01月12日 07:17:18 -08:00

# Log stats (at INFO level) every report_interval seconds. This

# logging is per-process, so with updater_workers > 1, the logs will

# contain one stats log per worker process every report_interval

# seconds.

# report_interval = 300
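Periodic progress logging comes down to comparing the time since the last report against `report_interval`. The function name, the log line format, and the injectable clock are hypothetical; the stat names come from the commit message above.

```python
def maybe_report(stats, last_report, now, report_interval=300, log=print):
    """Hypothetical sketch: emit one stats line once report_interval elapses.

    Returns the timestamp of the most recent report.
    """
    if now - last_report >= report_interval:
        log("object-updater stats: %s" % ", ".join(
            "%s=%d" % (key, value) for key, value in sorted(stats.items())))
        return now
    return last_report
```

Each updater process would keep its own `last_report`, which is why the logs contain one stats line per process.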

#

Expand recon middleware support Expand recon middleware to include support for account and container servers in addition to the existing object servers. Also add support for retrieving recent information from auditors, replicators, and updaters. In the case of certain checks (such as container auditors) the stats returned are only for the most recent path processed. The middleware has also been refactored and should now also handle errors better in cases where stats are unavailable. While new check's have been added the output from pre-existing check's has not changed. This should allow existing 3rd party utilities such as the Swift ZenPack to continue to function. Change-Id: Ib9893a77b9b8a2f03179f2a73639bc4a6e264df7

2012年05月14日 18:01:48 -05:00

# recon_cache_path = /var/cache/swift

Change schedule priority of daemon/server in config The goal is to modify schedule priority and I/O scheduling class and priority of daemon/server via configuration. Setting is optional, default keeps current behaviour. Use case: Prioritize object-server to object-auditor, because all user's requests needed to be served in peak hours and audit could wait. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> DocImpact Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396

2015年10月22日 10:19:49 +02:00

#

# You can set scheduling priority of processes. Niceness values range from -20

# (most favorable to the process) to 19 (least favorable to the process).

# nice_priority =

#

# You can set I/O scheduling class and priority of processes. I/O niceness

# class values are IOPRIO_CLASS_RT (realtime), IOPRIO_CLASS_BE (best-effort) and

# IOPRIO_CLASS_IDLE (idle). I/O niceness priority is a number which goes from

# 0 to 7. The higher the value, the lower the I/O priority of the process.

# ionice_priority works only together with ionice_class.

# ionice_class =

# ionice_priority =

Initial commit of Swift code

2010年07月12日 17:03:45 -05:00

[object-auditor]

More doc updates for logger stuff

2011年01月23日 13:18:28 -08:00

# You can override the default log routing for this app here (don't use set!):

Refactored logging configuration so that it has sane defaults

2010年08月24日 13:41:58 +00:00

# log_name = object-auditor

More doc updates for logger stuff

2011年01月23日 13:18:28 -08:00

# log_facility = LOG_LOCAL0

# log_level = INFO

Patch for Swift Solaris (Illumos) compability. * Add new configuration option log_address. Change-Id: I636bd4116687629c997b70a0d804b7ed4bc46032

2012年05月17日 15:46:38 -07:00

# log_address = /dev/log

Make sample configs more readable. Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83

2013年06月06日 15:35:19 -07:00

#

Allow to change auditor sleep interval in config Change-Id: Ic451c5e0b686509f8982ed1bf65a223a2d77b9a0

2016年01月12日 21:26:33 +01:00

# Time in seconds to wait between auditor passes

# interval = 30

#

Update docs to highlight that the auditor chunk size can be set May not be obvious, but existing code will let you change the disk_chunk_size just for the auditor so this just points that out in the docs. In one short test I ran with a 4 node cluster with 18GB of 4MB objects on it, changint he auditor chunk size from the default of 64K to 1MB creased the auditor CPU time from 10% to 4%. Also added test code to make sure this overridden value is actually used and checked other auditWorker conf values as well. Change-Id: Ia12e1c6127877dc2124b60cd963cd0b6d5f3d6ef

2014年07月10日 06:21:56 -07:00

# You can set the disk chunk size that the auditor uses, making it larger if

# you like, for more efficient local auditing of larger objects.

# disk_chunk_size = 65536
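The chunk size matters because the auditor re-reads object data to verify its checksum. This sketch assumes an MD5 check for illustration. A larger `disk_chunk_size` means fewer, bigger reads for the same bytes; the commit above reports lower CPU use with 1 MB chunks. The helper is hypothetical and not Swift's auditor code.

```python
import hashlib
import io


def md5_in_chunks(fp, disk_chunk_size=65536):
    """Hypothetical sketch: hash a binary file object disk_chunk_size bytes at a time."""
    md5 = hashlib.md5()
    for chunk in iter(lambda: fp.read(disk_chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


if __name__ == "__main__":
    data = b"x" * 200000
    # Same digest regardless of chunk size; only the number of reads differs.
    print(md5_in_chunks(io.BytesIO(data), 1048576) == hashlib.md5(data).hexdigest())
```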

adding defaults, docs, and unit tests

2010年12月28日 14:54:00 -08:00

# files_per_second = 20

Parallel object auditor We are soon going to put servers with a high ratio of disk to CPU into production as object servers. One of our concerns with this configuration is that the object auditor would take too long to complete its audit cycle. Therefore we decided to parallelise the auditor. The auditor already uses fork(), so we decided to use the parallel model from the replicator. Concurrency is set by the concurrency parameter in the auditor stanza, which sets the number of parallel checksum auditors. The actual number of parallel auditing processes is concurrency + 1 if zero_byte_fps is non-zero. Only one ZBF process is forked, and a new ZBF process is forked as soon as the current ZBF process finishes. Thus the last process running will always be a ZBF process. Both forever and once modes are parallelised. Each checksum auditor process submits a nested dictionary with keys {'object_auditor_stats_ALL': {'diskn': {..}}} to dump_recon_cache so that the object_auditor_stats_ALL dict in recon cache consists of individual sub-dicts for each of the object disks on the server. The recon cache is no different to before when the checksum auditor is run in serial mode. When swift-recon is run, it sums the stats for the individual disks. DocImpact Change-Id: I0ce3db57a43e482d4be351cc522fc9060af6e2d3

2014年03月26日 16:32:07 +00:00

# concurrency = 1

adding defaults, docs, and unit tests

2010年12月28日 14:54:00 -08:00

# bytes_per_second = 10000000

object replicator logging and increase rsync timeouts

2011年01月27日 21:02:53 +00:00

# log_time = 3600

simplifying options and code

2011年02月21日 16:37:12 -08:00

# zero_byte_files_per_second = 50

Expand recon middleware support Expand recon middleware to include support for account and container servers in addition to the existing object servers. Also add support for retrieving recent information from auditors, replicators, and updaters. In the case of certain checks (such as container auditors) the stats returned are only for the most recent path processed. The middleware has also been refactored and should now also handle errors better in cases where stats are unavailable. While new check's have been added the output from pre-existing check's has not changed. This should allow existing 3rd party utilities such as the Swift ZenPack to continue to function. Change-Id: Ib9893a77b9b8a2f03179f2a73639bc4a6e264df7

2012年05月14日 18:01:48 -05:00

# recon_cache_path = /var/cache/swift

Record some simple object stats in the object auditor Change-Id: I043a80c38091f59ce6707730363a4b43b29ae6ec

2013年07月01日 14:58:35 -07:00

# Takes a comma-separated list of ints. If set, the object auditor will

# increment a counter for every object whose size is <= the given break

# points and report the result after a full scan.

# object_size_stats =
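The break-point counting can be pictured with a short sketch. This is a hypothetical illustration, not the auditor's own code: the helper names are invented, and it assumes each object is counted once, under the smallest break point its size fits beneath, with anything larger tallied in an extra `OVER` bucket.

```python
from collections import Counter


def parse_size_breakpoints(value):
    """Parse a comma-separated list of ints such as "1024, 1048576"."""
    # Empty entries are skipped; int() tolerates surrounding whitespace.
    return sorted(int(v) for v in value.split(",") if v.strip())


def bucket_object_sizes(sizes, breakpoints):
    """Count each size once, under the smallest break point >= the size.

    Sizes larger than every break point are counted under "OVER".
    """
    counts = Counter()
    for size in sizes:
        for point in breakpoints:
            if size <= point:
                counts[point] += 1
                break
        else:
            counts["OVER"] += 1
    return counts
```

With `object_size_stats = 1024,1048576`, objects of 0, 512, 2048 and 5000000 bytes would land in the 1024, 1024, 1048576 and `OVER` buckets respectively under this assumed scheme.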

Change schedule priority of daemon/server in config The goal is to modify schedule priority and I/O scheduling class and priority of daemon/server via configuration. Setting is optional, default keeps current behaviour. Use case: Prioritize object-server to object-auditor, because all user's requests needed to be served in peak hours and audit could wait. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> DocImpact Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396

2015年10月22日 10:19:49 +02:00

#

# You can set scheduling priority of processes. Niceness values range from -20

# (most favorable to the process) to 19 (least favorable to the process).

# nice_priority =

#

# You can set I/O scheduling class and priority of processes. I/O niceness

# class values are IOPRIO_CLASS_RT (realtime), IOPRIO_CLASS_BE (best-effort) and

# IOPRIO_CLASS_IDLE (idle). I/O niceness priority is a number which goes from

# 0 to 7. The higher the value, the lower the I/O priority of the process.

# Works only with ionice_class.

# ionice_class =

# ionice_priority =
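The ranges stated above lend themselves to a quick sanity check before a config is deployed. The sketch below is a hypothetical validator (the function name and messages are invented); it only checks the bounds and the class names documented in these comments and does not apply any priority itself.

```python
IONICE_CLASSES = {"IOPRIO_CLASS_RT", "IOPRIO_CLASS_BE", "IOPRIO_CLASS_IDLE"}


def validate_priority_options(conf):
    """Return a list of problems in the nice/ionice options of *conf*.

    *conf* is a dict of raw string values from one config section.
    Non-numeric values raise ValueError from int().
    """
    problems = []
    nice = conf.get("nice_priority", "").strip()
    if nice and not -20 <= int(nice) <= 19:
        problems.append("nice_priority must be between -20 and 19")
    ioclass = conf.get("ionice_class", "").strip()
    ioprio = conf.get("ionice_priority", "").strip()
    if ioclass and ioclass not in IONICE_CLASSES:
        problems.append("unknown ionice_class %r" % ioclass)
    if ioprio:
        if not ioclass:
            # The priority is only meaningful together with a class.
            problems.append("ionice_priority requires ionice_class")
        elif not 0 <= int(ioprio) <= 7:
            problems.append("ionice_priority must be between 0 and 7")
    return problems
```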

Add profiling middleware in Swift The profile middleware provide a tool to profile Swift code on the fly and collect statistic data for performance analysis. An native simple Web UI is also provided to help query and visualize the data. Change-Id: I6a1554b2f8dc22e9c8cd20cff6743513eb9acc05 Implements: blueprint profiling-middleware

2013年10月24日 03:40:06 +08:00

Auditor will clean up stale rsync tempfiles DiskFile already fills in the _ondisk_info attribute when it tries to open a diskfile - even if the DiskFile's fileset is not valid or deleted. During this process the rsync tempfiles would be discovered and logged, but no-one would attempt to clean them up - even if they were really old. Instead of logging and ignoring unexpected files when validate a DiskFile fileset we'll add unexpected files to the unexpected key in the _ondisk_info attribute. With a little bit of re-organization in the auditor's object_audit method to get things into a single return path we can add an unconditional check for unexpected files and remove those that are "old enough". Since the replicator will kill any rsync processes that are running longer than the configured rsync_timeout we know that any rsync tempfiles older than this can be deleted. Split unlink_older_than in common.utils into two functions to allow an explicit list of previously discovered paths to be passed in to avoid an extra listdir. Since the getmtime handling already ignores OSError there's less concern of race condition where a previous discovered unexpected file is reaped by rsync while we're attempting to clean it up. Update some doc on the new config option. Closes-Bug: #1554005 Change-Id: Id67681cb77f605e3491b8afcb9c69d769e154283

2016年03月15日 17:09:21 -07:00

# The auditor will clean up old rsync tempfiles after they are "old

# enough" to delete. You can configure the time elapsed in seconds

# before rsync tempfiles will be unlinked, or leave the default value of

# "auto", which tries to use object-replicator's rsync_timeout + 900 and falls

# back to 86400 (1 day).

# rsync_tempfile_timeout = auto
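The "auto" resolution described above can be sketched as follows. Both function names are hypothetical, and whether the auditor compares the file age with `>` or `>=` is an assumption made for illustration.

```python
DEFAULT_RSYNC_TEMPFILE_TIMEOUT = 86400  # 1 day, the documented fallback


def effective_rsync_tempfile_timeout(value, replicator_rsync_timeout=None):
    """Resolve rsync_tempfile_timeout to a number of seconds.

    "auto" uses the object-replicator's rsync_timeout + 900 when it is
    known, otherwise 86400; any other value is taken as seconds.
    """
    if str(value).strip().lower() != "auto":
        return float(value)
    if replicator_rsync_timeout is not None:
        return float(replicator_rsync_timeout) + 900
    return float(DEFAULT_RSYNC_TEMPFILE_TIMEOUT)


def is_old_enough(mtime, now, timeout):
    """True if a tempfile last modified at *mtime* may be unlinked."""
    return now - mtime > timeout
```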

Let developers/operators add watchers to object audit Swift operators may find it useful to operate on each object in their cluster in some way. This commit provides them a way to hook into the object auditor with a simple, clearly-defined boundary so that they can iterate over their objects without additional disk IO. For example, a cluster operator may want to ensure a semantic consistency with all SLO segments accounted in their manifests, or locate objects that aren't in container listings. Now that Swift has encryption support, this could be used to locate unencrypted objects. The list goes on. This commit makes the auditor locate, via entry points, the watchers named in its config file. A watcher is a class with at least these four methods: __init__(self, conf, logger, **kwargs) start(self, audit_type, **kwargs) see_object(self, object_metadata, data_file_path, **kwargs) end(self, **kwargs) The auditor will call watcher.start(audit_type) at the start of an audit pass, watcher.see_object(...) for each object audited, and watcher.end() at the end of an audit pass. All method arguments are passed as keyword args. This version of the API is implemented on the context of the auditor itself, without spawning any additional processes. If the plugins are not working well -- hang, crash, or leak -- it's easier to debug them when there's no additional complication of processes that run by themselves. In addition, we include a reference implementation of plugin for the watcher API, as a help to plugin writers. Change-Id: I1be1faec53b2cdfaabf927598f1460e23c206b0a

2015年08月13日 17:05:25 -05:00

# A comma-separated list of watcher entry points. This lets operators

# programmatically see audited objects.

#

# The entry point group name is "swift.object_audit_watcher". If your

# setup.py has something like this:

#

# entry_points={'swift.object_audit_watcher': [

# 'some_watcher = some_module:Watcher']}

#

# then you would enable it with "watchers = some_package#some_watcher".

# For example, the built-in reference implementation is enabled as

# "watchers = swift#dark_data".

#

# watchers =

# Watcher-specific parameters can be added after "object-auditor:watcher:"

# like the following (note that entry points are qualified by package#):

#

# [object-auditor:watcher:swift#dark_data]

# action=log
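Going by the watcher API described in the commit message above (`__init__`, `start`, `see_object`, `end`, all called with keyword arguments), a minimal watcher might look like the sketch below. The class name and its size-totalling behaviour are hypothetical, and reading `Content-Length` from the object metadata as a string is an assumption.

```python
class SizeSummaryWatcher(object):
    """Hypothetical audit watcher that totals object sizes per pass."""

    def __init__(self, conf, logger, **kwargs):
        self.conf = conf
        self.logger = logger
        self.audit_type = None
        self.objects = 0
        self.bytes = 0

    def start(self, audit_type, **kwargs):
        # Called at the start of each audit pass; reset the counters.
        self.audit_type = audit_type
        self.objects = 0
        self.bytes = 0

    def see_object(self, object_metadata, data_file_path, **kwargs):
        # Called once per audited object.
        self.objects += 1
        self.bytes += int(object_metadata.get("Content-Length", 0))

    def end(self, **kwargs):
        # Called at the end of the pass; report what was seen.
        self.logger.info("%s audit saw %d objects, %d bytes",
                         self.audit_type, self.objects, self.bytes)
```

Such a class would be exposed through the "swift.object_audit_watcher" entry point group and listed in `watchers`, as the comments above describe.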

py3: add swift-dsvm-functional-py3 job Note that keystone wants to stick some UTF-8 encoded bytes into memcached, but we want to store it as JSON... or something? Also, make sure we can hit memcache for containers with invalid UTF-8. Although maybe it'd be better to catch that before we ever try memcache? Change-Id: I1fbe133c8ec73ef6644ecfcbb1931ddef94e0400

2019年04月17日 13:11:33 -07:00

[object-expirer]

Clarify usage of dequeue_from_legacy option Change-Id: Iae9aa7a91b9afc19cb8613b5bc31de463b853dde

2019年05月02日 16:10:55 -06:00

# If this is true, this expirer will execute tasks from the legacy expirer task

# queue. At least one object server should run with dequeue_from_legacy = true.

# dequeue_from_legacy = false

#

Enable to configure object-expirer in object-server.conf To prepare for object-expirer's general task queue feature [1], this patch enables to configure object-expirer in object-server.conf. Object-expirer.conf can be used in the same manner as before, but deprecated. If both of object-server.conf with "object-expirer" section and object-expirer.conf are in a node, only object-server.conf is used. Object-expirer.conf is used only if all object-server.conf doesn't have "object-expirer" section. There are two differences between "object-expirer.conf" style and "object-server.conf" style. The first difference is `dequeue_from_legacy` default value. `dequeue_from_legacy` defines task queue mode. In "object-expirer.conf" style, the default mode is legacy queue. In "object-server.conf" style, the default mode is general queue. But general mode means no-op mode for now, because general task queue is not implemented yet. The second difference is internal client config. In "object-expirer.conf" style, config file of internal client is the object-expirer.conf itself. In "object-server.conf" style, config file of internal client is another file. [1]: https://review.openstack.org/#/c/517389/ Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Change-Id: Ib21568f9b9d8547da87a99d65ae73a550e9c3230

2018年09月12日 07:35:51 +00:00

# Note: Be careful not to enable ``dequeue_from_legacy`` on too many expirers

# as all legacy tasks are stored in a single hidden account and the same hidden

# containers. On a large cluster one may inadvertently make the

# account/container servers for the hidden account too busy.

Enable to configure object-expirer in object-server.conf To prepare for object-expirer's general task queue feature [1], this patch enables to configure object-expirer in object-server.conf. Object-expirer.conf can be used in the same manner as before, but deprecated. If both of object-server.conf with "object-expirer" section and object-expirer.conf are in a node, only object-server.conf is used. Object-expirer.conf is used only if all object-server.conf doesn't have "object-expirer" section. There are two differences between "object-expirer.conf" style and "object-server.conf" style. The first difference is `dequeue_from_legacy` default value. `dequeue_from_legacy` defines task queue mode. In "object-expirer.conf" style, the default mode is legacy queue. In "object-server.conf" style, the default mode is general queue. But general mode means no-op mode for now, because general task queue is not implemented yet. The second difference is internal client config. In "object-expirer.conf" style, config file of internal client is the object-expirer.conf itself. In "object-server.conf" style, config file of internal client is another file. [1]: https://review.openstack.org/#/c/517389/ Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Change-Id: Ib21568f9b9d8547da87a99d65ae73a550e9c3230

2018年09月12日 07:35:51 +00:00

#

Clarify usage of dequeue_from_legacy option Change-Id: Iae9aa7a91b9afc19cb8613b5bc31de463b853dde

2019年05月02日 16:10:55 -06:00

# Note: the processes and process options can only be used in conjunction with

# nodes using `dequeue_from_legacy = true`. These options are ignored on nodes

# with `dequeue_from_legacy = false`.

Enable to configure object-expirer in object-server.conf To prepare for object-expirer's general task queue feature [1], this patch enables to configure object-expirer in object-server.conf. Object-expirer.conf can be used in the same manner as before, but deprecated. If both of object-server.conf with "object-expirer" section and object-expirer.conf are in a node, only object-server.conf is used. Object-expirer.conf is used only if all object-server.conf doesn't have "object-expirer" section. There are two differences between "object-expirer.conf" style and "object-server.conf" style. The first difference is `dequeue_from_legacy` default value. `dequeue_from_legacy` defines task queue mode. In "object-expirer.conf" style, the default mode is legacy queue. In "object-server.conf" style, the default mode is general queue. But general mode means no-op mode for now, because general task queue is not implemented yet. The second difference is internal client config. In "object-expirer.conf" style, config file of internal client is the object-expirer.conf itself. In "object-server.conf" style, config file of internal client is another file. [1]: https://review.openstack.org/#/c/517389/ Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Change-Id: Ib21568f9b9d8547da87a99d65ae73a550e9c3230

2018年09月12日 07:35:51 +00:00

#

# processes is how many parts to divide the legacy work into, one part per

# process that will be doing the work.

# Setting processes to 0 means that a single legacy process will do all the work.

# processes can also be specified on the command line and will override the

# config value.

# processes = 0

#

# process is which of the parts a particular legacy process will work on.

# process can also be specified on the command line and will override the config

# value.

# process is "zero based": if you want to use 3 processes, you should run

# processes with process set to 0, 1, and 2.

# process = 0
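To see how `processes` and `process` partition the legacy work, here is a sketch that assumes tasks are divided by hashing the task name modulo `processes`. That division rule and the helper name are assumptions for illustration; the expirer's actual scheme may differ, but the zero-based numbering and the "0 means one process does everything" rule follow the comments above.

```python
import hashlib


def handled_by(task_name, processes, process):
    """Return True if legacy expirer *process* should handle *task_name*.

    processes == 0 means a single process does all of the work.
    """
    if processes == 0:
        return True
    if not 0 <= process < processes:
        raise ValueError("process must be between 0 and processes - 1")
    digest = hashlib.md5(task_name.encode("utf-8")).hexdigest()
    # Every task maps to exactly one of the zero-based process numbers.
    return int(digest, 16) % processes == process
```

With `processes = 3`, running three expirers with `process` set to 0, 1 and 2 covers every task exactly once under this scheme.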

#

Clarify usage of dequeue_from_legacy option Change-Id: Iae9aa7a91b9afc19cb8613b5bc31de463b853dde

2019年05月02日 16:10:55 -06:00

# internal_client_conf_path = /etc/swift/internal-client.conf

#

# You can override the default log routing for this app here (don't use set!):

# log_name = object-expirer

# log_facility = LOG_LOCAL0

# log_level = INFO

# log_address = /dev/log

#

# interval = 300

#

Enable to configure object-expirer in object-server.conf To prepare for object-expirer's general task queue feature [1], this patch enables to configure object-expirer in object-server.conf. Object-expirer.conf can be used in the same manner as before, but deprecated. If both of object-server.conf with "object-expirer" section and object-expirer.conf are in a node, only object-server.conf is used. Object-expirer.conf is used only if all object-server.conf doesn't have "object-expirer" section. There are two differences between "object-expirer.conf" style and "object-server.conf" style. The first difference is `dequeue_from_legacy` default value. `dequeue_from_legacy` defines task queue mode. In "object-expirer.conf" style, the default mode is legacy queue. In "object-server.conf" style, the default mode is general queue. But general mode means no-op mode for now, because general task queue is not implemented yet. The second difference is internal client config. In "object-expirer.conf" style, config file of internal client is the object-expirer.conf itself. In "object-server.conf" style, config file of internal client is another file. [1]: https://review.openstack.org/#/c/517389/ Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Change-Id: Ib21568f9b9d8547da87a99d65ae73a550e9c3230

2018年09月12日 07:35:51 +00:00

# report_interval = 300

#

# request_tries is the number of times the expirer's internal client will

# attempt any given request in the event of failure. The default is 3.

# request_tries = 3

#

# concurrency is the level of concurrency to use to do the work; this value

# must be set to at least 1.

# concurrency = 1

#

Add tasks_per_second option to expirer This allows operators to throttle expirers as needed. Partial-Bug: #1784753 Change-Id: If75dabb431bddd4ad6100e41395bb6c31a4ce569

2020年10月02日 17:16:09 -05:00

# deletes can be ratelimited to prevent the expirer from overwhelming the cluster

# tasks_per_second = 50.0
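One simple way to honour a rate such as `tasks_per_second` (or the relinker's `files_per_second`) is to space operations `1 / rate` seconds apart. The class below is a hypothetical sketch of that approach, not Swift's internal rate limiter; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time


class RateLimiter(object):
    """Space out calls so at most *per_second* of them start each second.

    A per_second of 0 (or less) disables limiting.
    """

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / per_second if per_second > 0 else 0.0
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next operation is allowed to start."""
        if not self.interval:
            return
        now = self.clock()
        if self.next_allowed is None or self.next_allowed < now:
            # Idle time is not banked: restart the schedule from now.
            self.next_allowed = now
        delay = self.next_allowed - now
        if delay > 0:
            self.sleep(delay)
        self.next_allowed += self.interval
```

At the default of 50.0, calling `wait()` before each delete spaces deletes 0.02 seconds apart.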

#

Enable to configure object-expirer in object-server.conf To prepare for object-expirer's general task queue feature [1], this patch enables to configure object-expirer in object-server.conf. Object-expirer.conf can be used in the same manner as before, but deprecated. If both of object-server.conf with "object-expirer" section and object-expirer.conf are in a node, only object-server.conf is used. Object-expirer.conf is used only if all object-server.conf doesn't have "object-expirer" section. There are two differences between "object-expirer.conf" style and "object-server.conf" style. The first difference is `dequeue_from_legacy` default value. `dequeue_from_legacy` defines task queue mode. In "object-expirer.conf" style, the default mode is legacy queue. In "object-server.conf" style, the default mode is general queue. But general mode means no-op mode for now, because general task queue is not implemented yet. The second difference is internal client config. In "object-expirer.conf" style, config file of internal client is the object-expirer.conf itself. In "object-server.conf" style, config file of internal client is another file. [1]: https://review.openstack.org/#/c/517389/ Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Change-Id: Ib21568f9b9d8547da87a99d65ae73a550e9c3230

2018年09月12日 07:35:51 +00:00

# The expirer will re-attempt expiring if the source object is not available

# up to reclaim_age seconds before it gives up and deletes the entry in the

# queue.

# reclaim_age = 604800

#

# recon_cache_path = /var/cache/swift

#

# You can set scheduling priority of processes. Niceness values range from -20

# (most favorable to the process) to 19 (least favorable to the process).

# nice_priority =

#

# You can set I/O scheduling class and priority of processes. I/O niceness

# class values are IOPRIO_CLASS_RT (realtime), IOPRIO_CLASS_BE (best-effort) and

# IOPRIO_CLASS_IDLE (idle). I/O niceness priority is a number which goes from

# 0 to 7. The higher the value, the lower the I/O priority of the process.

# Works only with ionice_class.

# ionice_class =

# ionice_priority =

#

Add profiling middleware in Swift The profile middleware provide a tool to profile Swift code on the fly and collect statistic data for performance analysis. An native simple Web UI is also provided to help query and visualize the data. Change-Id: I6a1554b2f8dc22e9c8cd20cff6743513eb9acc05 Implements: blueprint profiling-middleware

2013年10月24日 03:40:06 +08:00

# Note: Put it at the beginning of the pipeline to profile all middleware. But

# it is safer to put this after healthcheck.

[filter:xprofile]

use = egg:swift#xprofile

# This option enables you to switch profilers, which should inherit from the

# Python standard profiler. Currently supported values include 'cProfile',

# 'eventlet.green.profile', etc.

# profile_module = eventlet.green.profile

#

# This prefix will be combined with the process ID and timestamp to name the

# profile data file. Make sure the executing user has permission to write

# into this path (missing path segments will be created, if necessary).

# If you enable profiling in more than one type of daemon, you must override

# it with a unique value like: /var/log/swift/profile/object.profile

# log_filename_prefix = /tmp/log/swift/profile/default.profile
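For a feel of what per-process profile dumps involve, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The `<prefix>.<pid>[.<timestamp>]` file-name format and all three helper names are assumptions for illustration; xprofile's actual naming scheme may differ.

```python
import cProfile
import os
import pstats


def profile_dump_path(prefix, pid=None, timestamp=None):
    """Build a profile file name from *prefix* (assumed format)."""
    # Assumed layout: <prefix>.<pid>, plus .<timestamp> when one is given.
    pid = os.getpid() if pid is None else pid
    path = "%s.%d" % (prefix, pid)
    if timestamp is not None:
        path += ".%d" % int(timestamp)
    return path


def profile_and_dump(func, prefix):
    """Profile one call of *func*, dump the stats, and return the path."""
    path = profile_dump_path(prefix)
    directory = os.path.dirname(path)
    if directory:
        # Create missing path segments, as the option description says.
        os.makedirs(directory, exist_ok=True)
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(path)
    return path


def top_functions(path, limit=5):
    """Print the *limit* most expensive entries of a dumped profile."""
    stats = pstats.Stats(path)
    stats.sort_stats("cumulative").print_stats(limit)
    return stats
```

Giving each daemon type its own prefix, as the comment above recommends, keeps their dump files from colliding.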

#

# The profile data will be dumped to local disk, following the naming rule

# above, at this interval (in seconds).

# dump_interval = 5.0

#

# Be careful: this option makes the profiler dump data into files with a

# timestamp in their names, which means lots of files will pile up in the directory.

# dump_timestamp = false

#

# This is the path of the URL to access the mini web UI.

# path = /__profile__

#

# Clear the data when the WSGI server shuts down.

# flush_at_shutdown = false

#

# unwind the iterator of applications

# unwind = false

relinker: Allow conf files for configuration Swap out the standard logger stuff in place of --logfile. Keep --device as a CLI-only option. Everything else is pretty standard stuff that ought to be in [DEFAULT]. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Change-Id: I32f979f068592eaac39dcc6807b3114caeaaa814

2021年02月01日 21:04:36 -08:00

[object-relinker]

# You can override the default log routing for this app here (don't use set!):

# log_name = object-relinker

# log_facility = LOG_LOCAL0

# log_level = INFO

# log_address = /dev/log

relinker: Add option to ratelimit relinking Sure, you could use stuff like ionice or cgroups to limit relinker I/O, but sometimes a nice simple blunt instrument is handy. Change-Id: I7fe29c7913a9e09bdf7a787ccad8bba2c77cf995

2021年01月28日 16:13:29 -08:00

#

relinker: Parallelize per disk Add a new option, workers, that works more or less like the same option from background daemons. Disks will be distributed across N worker sub-processes so we can make the best use of the I/O available. While we're at it, log final stats at warning if there were errors. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I039d2b8861f69a64bd9d2cdf68f1f534c236b2ba

2021年01月06日 16:18:04 -08:00

# Start up to this many sub-processes to process disks in parallel. Each disk

# will be handled by at most one child process. By default, one process is

# spawned per disk.

# workers = auto
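The rule that each disk goes to at most one child process can be illustrated with a small distribution helper. The round-robin spreading and the function name are assumptions for this sketch; the relinker's own assignment may differ, but "auto" yields one process per disk as documented above.

```python
def assign_disks(disks, workers="auto"):
    """Split *disks* among worker processes; each disk goes to one worker.

    workers="auto" (or a count >= the number of disks) gives one worker
    per disk.
    """
    disks = list(disks)
    if workers == "auto" or int(workers) >= len(disks):
        return [[disk] for disk in disks]
    count = int(workers)
    if count < 1:
        raise ValueError("workers must be 'auto' or a positive integer")
    # Round-robin so the disks are spread as evenly as possible.
    return [disks[i::count] for i in range(count)]
```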

#

# Target this many relinks/cleanups per second for each worker, to reduce the

# likelihood that the added I/O from a partition-power increase impacts

# client traffic. Use zero for unlimited.

# files_per_second = 0.0

relinker: retry links from older part powers If a previous partition power increase failed to cleanup all files in their old partition locations, then during the next partition power increase the relinker may find the same file to relink in more than one source partition. This currently leads to an error log due to the second relink attempt getting an EEXIST error. With this patch, when an EEXIST is raised, the relinker will attempt to create/verify a link from older partition power locations to the next part power location, and if such a link is found then suppress the error log. During the relink step, if an alternative link is verified and if a file is found that is neither linked to the next partition power location nor in the current part power location, then the file is removed during the relink step. That prevents the same EEXIST occuring again during the cleanup step when it may no longer be possible to verify that an alternative link exists. For example, consider identical filenames in the N+1th, Nth and N-1th partition power locations, with the N+1th being linked to the Nth: - During relink, the Nth location is visited and its link is verified. Then the N-1th location is visited and an EEXIST error is encountered, but the new check verifies that a link exists to the Nth location, which is OK. - During cleanup the locations are visited in the same order, but files are removed so that the Nth location file no longer exists when the N-1th location is visited. If the N-1th location still has a conflicting file then existence of an alternative link to the Nth location can no longer be verified, so an error would be raised. Therefore, the N-1th location file must be removed during relink. The error is only suppressed for tombstones. The number of partition power location that the relinker will look back over may be configured using the link_check_limit option in a conf file or --link-check-limit on the command line, and defaults to 2. 
Closes-Bug: 1921718 Change-Id: If9beb9efabdad64e81d92708f862146d5fafb16c

2021年03月26日 13:41:36 +00:00

#

# Maximum number of partition power locations to check for a valid link target

# if the relinker encounters an existing tombstone, but with a different inode,

# in the next partition power location. If the relinker fails to make a link

# because a different tombstone already exists in the next partition power

# location then it will try to validate that the existing tombstone links to a

# valid target in the current partition power location, or previous partition

# power locations, in descending order. This option limits the number of

# partition power locations searched, including the current partition power,

# and should be a whole number. A value of 0 implies that no validation is

# attempted, and an error is logged, when an existing tombstone prevents link

# creation. A value of 1 implies that an existing link is accepted if it links

# to a tombstone in the current partition power location. The default value of

# 2 implies that an existing link is acceptable if it links to a tombstone in

# the current or previous partition power locations. Increased values may be

# useful if previous partition power increases have failed to clean up

# tombstones from their old locations, causing duplicate tombstones with

# different inodes to be relinked to the next partition power location.

relinker: Parallelize per disk Add a new option, workers, that works more or less like the same option from background daemons. Disks will be distributed across N worker sub-processes so we can make the best use of the I/O available. While we're at it, log final stats at warning if there were errors. Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I039d2b8861f69a64bd9d2cdf68f1f534c236b2ba

2021年01月06日 16:18:04 -08:00

# link_check_limit = 2
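Which locations `link_check_limit` covers can be made concrete with a sketch. It assumes the usual Swift ring convention that an object's partition is the top `part_power` bits of the first 32 bits of its hash; the helper names are hypothetical, and the real relinker also has to inspect files on disk, which this sketch leaves out.

```python
def partition_for(hash_hex, part_power):
    """Partition of an object hash at the given partition power."""
    # Take the first 32 bits of the hash and keep the top part_power bits.
    return int(hash_hex[:8], 16) >> (32 - part_power)


def link_check_candidates(hash_hex, next_part_power, link_check_limit=2):
    """(part_power, partition) pairs to check, in descending power order.

    Starts at the current partition power (next_part_power - 1) and steps
    back one power at a time, returning at most link_check_limit entries.
    """
    current = next_part_power - 1
    candidates = []
    for power in range(current, current - link_check_limit, -1):
        if power < 1:
            break
        candidates.append((power, partition_for(hash_hex, power)))
    return candidates
```

A limit of 0 yields no candidates (no validation), 1 yields only the current location, and the default of 2 adds the previous one, matching the description above.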

openstack/swift - swift - OpenDev: Free Software Needs Free Tools

641 lines

25 KiB

Plaintext
