af8d842076ba269fed7f4128d0c7503ab5d1a94a
45 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
Joanna H. Huang
|
af8d842076 |
Replaced setting run_pause with standard interval
The deprecated directive `run_pause` should be replaced with the more standard one `interval`. The `run_pause` should be still supported for backward compatibility. This patch updates object replicator to use `interval` and support `run_pause`. It also updates its sample config and documentation. Co-Authored-By: Joanna H. Huang <joanna.huitzu.huang@gmail.com> Co-Authored-By: Kamil Rykowski <kamil.rykowski@intel.com> Change-Id: Ie2a3414a96a94efb9273ff53a80b9d90c74fff09 Closes-Bug: #1364735 |
||
|
Prashanth Pai
|
9c33bbde69 |
Allow rsync to use compression
From rsync's man page: -z, --compress With this option, rsync compresses the file data as it is sent to the destination machine, which reduces the amount of data being transmitted -- something that is useful over a slow connection. A configurable option has been added to allow rsync to compress, but only if the remote node is in a different region than the local one. NOTE: Objects that are already compressed (for example: .tar.gz, .mp3) might slow down the syncing process. On wire compression can also be extended to ssync later in a different change if required. In case of ssync, we could explore faster compression libraries like lz4. rsync uses zlib which is slow but offers higher compression ratio. Change-Id: Ic9b9cbff9b5e68bef8257b522cc352fc3544db3c Signed-off-by: Prashanth Pai <ppai@redhat.com> |
||
|
Rafael Rivero
|
c1f6569c00 |
Fixes several typos (Swift)
Corrects spelling errors found in comments. Change-Id: I228a888e3f256569ea32ef1613092dbd63e13c62 |
||
|
John Dickinson
|
b7281cf2c5 |
make the bind_port config setting required
In a long-term effort to change the recommended ports for Swift, the first step is to require the bind_port in config files. Later, we can change the recommended setting. Anyone currently explicitly setting the ports will not be affected. Anyone not setting the ports will need to specify them to match their rings. DocImpact Change-Id: Icca83a263acdd0afc9016424a3e9f8c15e944789 |
||
|
Matthew Oliver
|
090baa1fa9 |
Swift configuration parameter audit
This change is the result of an audit through the config parameters
provided by swift and how/if they are addressed in the swift
documentation. The documentation being the sample config files in
the /etc directory or the documentation.
This change is only concerned with the config files in etc/ next
I will look at the documentation in the doc/ folder.
This change makes the following assumptions:
- Unless stated otherwise, the commented out parameter in the
sample configuration is the default for swift.
- When the default in the code differs from that of the sample
configuration, the default in the code is correct.
Container reconciler:
Parameter: interval
- code: 30
- config: 300
Result: config = 30
Object Expirer:
Parameter: recon_cache_path
- code: /var/cache/swift
- config: Parameter missing
Result: Add parameter
swift-dispersion-populate && swift-dispersion-report
Parameter: auth_version
- code: 1.0
- config: 2.0 (due to being a confusing example of how to setup
version 2.0).
Result: Added 'auth_version = 1.0' to the right section (showing
default and make the sample configuration for auth version
2.0 easier to understand.
swift-drive-audit:
Parameter: log_file_pattern
- code: /var/log/kern.*[!.][!g][!z]
- config: /var/log/kern*
Result: config = /var/log/kern.*[!.][!g][!z]
NOTE: swift-drive-audit uses a parameter called device_dir which
defaults to '/srv/node'. In all other swift binaries/services
there is a similar parameter called devices which stores the
same thing. This is an inconsistency which I haven't fixed
as this could break existing swift clusters out in the wild.
Proxy Server:
Parameter: object_chunk_size
- code: 65536
- config: 8192
Result: config = 65536
Parameter: client_chunk_size
- code: 65536
- config: 8192
Result: config = 65536
Parameter: strict_cors_mode
- code: True
- config: No parameter
Result: config = True
Account and Container replicator configuration confusion:
NOTES:
The account and container replicators have parameters:
- interval
- run_pause
Both of these are loaded into the same variable in code:
self.interval = int(conf.get('interval') or
conf.get('run_pause') or 30)
If a user sets both to different values then interval is used.
Result: Update the configuration to make this more clear.
DocImpact
Change-Id: Iaadbb1a6284f8b3e0801bc343b29772f70f4bf6e
|
||
|
gholt
|
2d00f7b7ba |
New log_max_line_length option.
Log lines can get quite large, as we previously noticed with rsync error log lines. We added a setting to cap those, but it really looks like we should have just done this overall limit. We noticed the issue when we switched to UDP syslogging and it would occasionally blow past the 16436 lo MTU! This causes Python's logging code to get an error and hilarity ensues. Change-Id: I44bdbe68babd58da58c14360379e8fef8a6b75f7 |
||
|
zhang-hare
|
f5caac43ac |
Add profiling middleware in Swift
The profile middleware provide a tool to profile Swift code on the fly and collect statistic data for performance analysis. An native simple Web UI is also provided to help query and visualize the data. Change-Id: I6a1554b2f8dc22e9c8cd20cff6743513eb9acc05 Implements: blueprint profiling-middleware |
||
|
Jenkins
|
a2126add0b | Merge "Set default wsgi workers to cpu_count" | ||
|
Newptone
|
5c1a7871d9 |
Unified format of boolean params in conf files
In swift conf files, boolean options use different format: some use true/false, and some use True/False. This patch is aim to using lowcase true/false to unify boolean params formats in swift conf files. Fix Bug #1203421 Change-Id: I3e1bfc6e43231f51e0710aa54869f3774ee896b1 |
||
|
Clay Gerrard
|
de3acec4bf |
Set default wsgi workers to cpu_count
Change the default value of wsgi workers from 1 to auto. The new default value for workers in the proxy, container, account & object wsgi servers will spawn as many workers per process as you have cpu cores. This will not be ideal for some configurations, but it's much more likely to produce a successful out of the box deployment. Inspect the number of cpu_cores using python's multiprocessing when available. Multiprocessing was added in python 2.6, but I know I've compiled python without it before on accident. The cpu_count method seems to be pretty system agnostic, but it says it can raise NotImplementedError or sometimes return 0. Add a new utility method 'config_auto_int_value' to pull an integer out of the config which has a dynamic default. * drive by s/container/proxy/ in proxy-server.conf.5 * fix misplaced max_clients in *-server.conf-sample * update doc/development_saio to force workers = 1 DocImpact Change-Id: Ifa563d22952c902ab8cbe1d339ba385413c54e95 |
||
|
Samuel Merritt
|
efdb0e3681 |
Make sample configs more readable.
Inject some empty lines to avoid the wall-of-text effect and to make it a little clearer which descriptions go with which options. Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83 |
||
|
Donagh McCabe
|
34e2ab3f31 |
account-reaper warns if not making progress
DocImpact If account reaper has not managed to clean out an account after a long period, it prints a message to the log (you can search your system looking for such messages). Introduce reap_warn_after config variable to determine when to emit the message (defaults to 30 days). Also fix bug 1181995 (edge case where object name is an empty string) Change-Id: Ic0dfee04742d06b6a51b59f302d7a272d7c1de92 |
||
|
Sergey Kraynev
|
ea7858176b |
Implementation of replication servers
Support separate replication ip address: - Added new function in utils. This function provides ability to select separate IP address for replication service. - Db_replicator and object replicators were changed. Replication process uses new function now. Replication network parameters: - Replication network fields (replication_ip, replication_port) support was added to device dictionary in swift-ring-builder script. - Changes were made to support new fields in search, show and set_info functions. Implementation of replication servers: - Separate replication servers use the same code as normal replication servers, but with replication_server parameter = True. When using a separate replication network, the non-replication servers set replication_server = False. When there is no separate replication network (the default case), replication_server is not included in the config. DocImpact Change-Id: Ie9af5bdcdf9241c355e36053ca4adfe49dc35bd0 Implements: blueprint dedicated-replication-network |
||
|
Peter Portante
|
2d42b37303 |
Add the max_clients parameter to bound clients
The new max_clients parameter allows one full control over the maximum number of client requests that will be handled by a given worker for any of the proxy, account, container or object servers. Lowering the number of clients handled per worker, and raising the number of workers can lessen the impact that a CPU intensive, or blocking, request can have on other requests served by the same worker. If the maximum number of clients is set to one, then a given worker will not perform another accept(2) call while processing, allowing other workers a chance to process it. DocImpact Signed-off-by: Peter Portante <peter.portante@redhat.com> Change-Id: Ic01430f7a6c5ff48d7aa349dc86a5f8ac463a420 |
||
|
Jenkins
|
249a65461e | Merge "Adding speed limit options for DB auditor" | ||
|
Samuel Merritt
|
a4a047c4ec |
Fix descriptions in sample configs.
Change-Id: I7aca3c6cafd9391031f7a10cc233f99e81ee0393 |
||
|
yuan-zhou
|
09370862ca |
Adding speed limit options for DB auditor
Fix bug 1129760 Without speed limit, DB auditor will likely consume high CPU% on storage node. That will highly impact the cluster's performance. This patch adds two options for account/container auditor: - containers_per_second: Maximum containers audited per second - accounts_per_second: Maximum accounts audited per second DocImpact Change-Id: I9faa506438185a83ca77db4906969328624d015f |
||
|
Jenkins
|
23f33b2069 | Merge "Make statsd sample rate behave better." | ||
|
gholt
|
87a42ab9ca |
Added fallocate_reserve option
Some systems behave badly when they completely run out of space. To alleviate this problem, you can set the fallocate_reserve conf value to a number of bytes to "reserve" on each disk. When the disk free space falls at or below this amount, fallocate calls will fail, even if the underlying OS fallocate call would succeed. For example, a fallocate_reserve of 5368709120 (5G) would make all fallocate calls fail, even for zero-byte files, when the disk free space falls under 5G. The default fallocate_reserve is 0, meaning "no reserve", and so the software behaves exactly as it always has unless you set this conf value to something non-zero. Also fixed ring builder's search_devs doc bugs. Related: To get rsync to do the same, see https://github.com/rackspace/cloudfiles-rsync Specifically, see this patch: https://github.com/rackspace/cloudfiles-rsync/blob/master/debian/patches/limit-fs-fullness.diff DocImpact Change-Id: I8db176ae0ca5b41c9bcfeb7cb8abb31c2e614527 |
||
|
Darrell Bishop
|
8801b74090 |
Make statsd sample rate behave better.
As Dieter pointed out in bug 1090495 (https://bugs.launchpad.net/swift/+bug/1090495), the volume of metrics can vary wildly between StatsD metrics. This patch implements a partial solution by reducing the sample_rate used for known high-volume metrics (operational experience will need to inform this over time) and introducing a new tunable, log_statsd_sample_rate_factor which is multiplied by the sample_rate for every statsd stat. This tunable can be used to reduce StatsD traffic proportionally for all metrics and is intended to replace log_statsd_default_sample_rate, which is left alone for backward-compatibility, should anyone be using it. This patch also includes a drive-by fix for log_udp_port which wasn't being converted to an int (I didn't verify that actually causes trouble in SysLogHandler(), but it's definitely an improvement regardles). Change-Id: Id404636e3629f6431cf1c4e64a143959750a3c23 |
||
|
Jenkins
|
8b770aa55e | Merge "Add config option to turn eventlet debug on/off" | ||
|
Chuck Thier
|
4c6a354483 |
Add config option to turn eventlet debug on/off
By default, this will be turned off. This will cause eventlet to not print stack traces to stderr which can be very annoying on production systems. It is still recommended to turn it on for development or debuging purposes. DocImpact Change-Id: I5e5b902d3d9ed85f784549e53f2ee2fc87cbe2e5 |
||
|
clayg
|
3a70112d03 |
Add config of server start timeouts for probetests
Currently the timeout for a wsgi server successfully binding to a port and for a probetest background service to finish starting are hard coded to 30 seconds. While a reasonable default for most configurations, a small virtualized environment may need a little more time in order for probe tests to complete successfully. This patch adds a 'bind_timeout' option to the DEFAULT section of the main wsgi servers' config. Also a new [probe_test] section and 'check_server_timeout' option to test.conf DocImpact Change-Id: Ibcaff153c7633bbf32e460fd9dbf04932eddb56f |
||
|
Darrell Bishop
|
b8e3e9e1c2 |
Allow optional, temporary healthcheck failure.
A deployer may want to remove a Swift node from a load balancer for maintenance or upgrade. This patch provides an optional mechanism for this. The healthcheck filter config can specify "disable_path" which is a filesystem path. If a file is present at that location, the healthcheck middleware returns a 503 with a body of "DISABLED BY FILE". So a deployer can configure "disable_path" and then touch that filesystem path, wait for the proxy to be removed from the load balancer pool, perform maintenance/upgrade, and then remove the "disable_path" file. Also cleaned up the conf file man pages a bit. Change-Id: I1759c78c74910a54c720f298d4d8e6fa57a4dab4 |
||
|
Florian Hines
|
92826d0602 |
add support for custom log handlers
Add a hook to get_logger to run custom functions to add custom log handlers or the like. Change-Id: Ib04b12939dcac7e4ad6453dea9795682044c6ae0 |
||
|
Darrell Bishop
|
4a2ae2b460 |
Upating proxy-server StatsD logging.
Removed many StatsD logging calls in proxy-server and added
swift-informant-style catch-all logging in the proxy-logger middleware.
Many errors previously rolled into the "proxy-server.<type>.errors"
counter will now appear broken down by response code and with timing
data at: "proxy-server.<type>.<verb>.<status>.timing". Also, bytes
transferred (sum of in + out) will be at:
"proxy-server.<type>.<verb>.<status>.xfer". The proxy-logging
middleware can get its StatsD config from standard vars in [DEFAULT] or
from access_log_statsd_* config vars in its config section.
Similarly to Swift Informant, request methods ("verbs") are filtered
using the new proxy-logging config var, "log_statsd_valid_http_methods"
which defaults to GET, HEAD, POST, PUT, DELETE, and COPY. Requests with
methods not in this list use "BAD_METHOD" for <verb> in the metric name.
To avoid user error, access_log_statsd_valid_http_methods is also
accepted.
Previously, proxy-server metrics used "Account", "Container", and
"Object" for the <type>, but these are now all lowercase.
Updated the admin guide's StatsD docs to reflect the above changes and
also include the "proxy-server.<type>.handoff_count" and
"proxy-server.<type>.handoff_all_count" metrics.
The proxy server now saves off the original req.method and proxy_logging
will use this if it can (both for request logging and as the "<verb>" in
the statsd timing metric). This fixes bug 1025433.
Removed some stale access_log_* related code in proxy/server.py. Also
removed the BaseApplication/Application distinction as it's no longer
necessary.
Fixed up the sample config files a bit (logging lines, mostly).
Fixed typo in SAIO development guide.
Got proxy_logging.py test coverage to 100%.
Fixed proxy_logging.py for PEP8 v1.3.2.
Enhanced test.unit.FakeLogger to track more calls to enable testing
StatsD metric calls.
Change-Id: I45d94cb76450be96d66fcfab56359bdfdc3a2576
|
||
|
gholt
|
c509ac2371 |
Added ability to disable fallocate
Change-Id: Id8872c581ed23378a8e14cbf3bf049b5c0d21577 |
||
|
Victor Rodionov
|
13e4de1899 |
Patch for Swift Solaris (Illumos) compability.
* Add new configuration option log_address. Change-Id: I636bd4116687629c997b70a0d804b7ed4bc46032 |
||
|
Florian Hines
|
ccb6334c17 |
Expand recon middleware support
Expand recon middleware to include support for account and container servers in addition to the existing object servers. Also add support for retrieving recent information from auditors, replicators, and updaters. In the case of certain checks (such as container auditors) the stats returned are only for the most recent path processed. The middleware has also been refactored and should now also handle errors better in cases where stats are unavailable. While new check's have been added the output from pre-existing check's has not changed. This should allow existing 3rd party utilities such as the Swift ZenPack to continue to function. Change-Id: Ib9893a77b9b8a2f03179f2a73639bc4a6e264df7 |
||
|
gholt
|
9eb797b099 |
!! Changed db_preallocation to False
Long explanation, but hopefully answers any questions. We don't like changing the default behavior of Swift unless there's a really good reason and, up until now, I've tried doing this with this new db_preallocation setting. For clusters with dedicated account/container servers that usually have fewer disks overall but SSD for speed, having db_preallocation on will gobble up disk space quite quickly and the fragmentation it's designed to fight isn't that big a speed impact to SSDs anyway. For clusters with account/container servers spread across all servers along with object servers usually having standard disks for cost, having db_preallocation off will cause very fragmented database files impacting speed, sometimes dramatically. Weighing these two negatives, it seems the second is the lesser evil. The first can cause disks to fill up and disable the cluster. The second will cause performance degradation, but the cluster will still function. Furthermore, if just one piece of code that touches all databases runs with db_preallocation on, it's effectively on for the whole cluster. We discovered this most recently when we finally configured everything within the Swift codebase to have db_preallocation off, only to find out Slogging didn't know about the new setting and so ran with it on and starting filling up SSDs. So that's why I'm proposing this change to the default behavior. We will definitely need to post a prominent notice of this change with the next release. Change-Id: I48a43439264cff5d03c14ec8787f718ee44e78ea |
||
|
Darrell Bishop
|
3d3ed34f44 |
Adding StatsD logging to Swift.
Documentation, including a list of metrics reported and their semantics,
is in the Admin Guide in a new section, "Reporting Metrics to StatsD".
An optional "metric prefix" may be configured which will be prepended to
every metric name sent to StatsD.
Here is the rationale for doing a deep integration like this versus only
sending metrics to StatsD in middleware. It's the only way to report
some internal activities of Swift in a real-time manner. So to have one
way of reporting to StatsD and one place/style of configuration, even
some things (like, say, timing of PUT requests into the proxy-server)
which could be logged via middleware are consistently logged the same
way (deep integration via the logger delegate methods).
When log_statsd_host is configured, get_logger() injects a
swift.common.utils.StatsdClient object into the logger as
logger.statsd_client. Then a set of delegate methods on LogAdapter
either pass through to the StatsdClient object or become no-ops. This
allows StatsD logging to look like:
self.logger.increment('some.metric.here')
and do the right thing in all cases and with no messy conditional logic.
I wanted to use the pystatsd module for the StatsD client, but the
version on PyPi is lagging the git repo (and is missing both the prefix
functionality and timing_since() method). So I wrote my
swift.common.utils.StatsdClient. The interface is the same as
pystatsd.Client, but the code was written from scratch. It's pretty
simple, and the tests I added cover it. This also frees Swift from an
optional dependency on the pystatsd module, making this feature easier
to enable.
There's test coverage for the new code and all existing tests continue
to pass.
Refactored out _one_audit_pass() method in swift/account/auditor.py and
swift/container/auditor.py.
Fixed some misc. PEP8 violations.
Misc test cleanups and refactorings (particularly the way "fake logging"
is handled).
Change-Id: Ie968a9ae8771f59ee7591e2ae11999c44bfe33b2
|
||
|
Tom Fifield
|
9920aeb7d4 |
bug 661267 adding config eastereggs, fixing defaults
Change-Id: I41356ee250c9088a2387b0d493586dd990a04ac3 |
||
|
gholt
|
0becfab629 |
Added option to disable SQLite db preallocation
Added option to disable SQLite db preallocation. This can be very useful on pure ssd account/container servers where the extra space is worth more than the lesser fragmentation. Change-Id: I8fbb028a9b6143775b25b343e97896497a8b63a9 |
||
|
gholt
|
ac3cc680de |
Add an optional delay to account reaping.
Normally, the reaper begins deleting account information for deleted accounts immediately. With this patch you can set it to delay its work. You set the delay_reaping value in the [account-reaper] section of the account-server.conf. The value is in seconds; 2592000 = 30 days, for example. Unfortunately, there are currently zero tests for the account-reaper. This also needs fixing, but I thought I'd submit this delay patch alone for consideration. Change-Id: Ic077df9cdd95c5d3f8949dd3bbe9893cf24c6623 |
||
|
gholt
|
52ba08d67d |
Improvements to database replication.
Note: I'd like to get this released as soon as possible as it is a data durability issue. 1) Orders nodes so that none get starved (see code and footnote). 2) New max_diffs setting that caps how long the replicator will spend trying to sync a given database per pass so the other databases don't get starved. 3) Replaces run_pause with the more standard "interval", which means the replicator won't pause unless it takes less than the interval set. Change-Id: I986742229e65031df88f5251ca61746b7c8d2bde |
||
|
John Dickinson
|
5490c514fe | removed slogging references from docs | ||
|
gholt
|
4905c71669 | More doc updates for logger stuff | ||
|
John Dickinson
|
c53f49ce98 | merged with trunk | ||
|
Jay Payne
|
66c8b412c8 | Moved backlog setting into the [Default] section of the sample-conf files | ||
|
Jay Payne
|
223c2e9011 | add default backlog setting to sample configs | ||
|
John Dickinson
|
2579cf54c2 | updated container auditor to only do local work and updated auditor configs | ||
|
Chuck Thier
|
04a5ccb4b1 | Added vm_test_mode to the sample configs | ||
|
Chuck Thier
|
c62707ae72 | Refactored logging configuration so that it has sane defaults | ||
|
Chuck Thier
|
2c596c0a0f | Initial commit of middleware refactor | ||
|
Chuck Thier
|
001407b969 | Initial commit of Swift code |