c26c7b8edd464d1fcf1d219f7c3fb040914d9da3
687 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
Zuul
|
e10c2bafcb | Merge "proxy-logging: create field for access_user_id" | ||
|
Vitaly Bordyug
|
32eaab20b1 |
proxy-logging: create field for access_user_id
Added the new field to be able to log the access key during the s3api calls, while reserving the field to be filled with auth-relevant information in case of other middlewares. Added respective code to the tempauth and keystone middlewares. Since s3api creates a copy of the environ dict for the downstream request object when translating the s3req.to_swift_req, the environ dict that is seen/modified in other middleware modules is not the same instance seen in proxy-logging; using mutable objects lets the value get transferred into the swift_req.environ. Change the assert in test_proxy_logging from "the last field" to the index 21 in the interests of maintainability. Also added some regression tests for object, bucket and s3 v4 apis and updated the documentation with the details about the new field. Signed-off-by: Vitaly Bordyug <vbordug@gmail.com> Change-Id: I0ce4e92458e2b05a4848cc7675604c1aa2b64d64 |
||
|
Tim Burke
|
ae062f8b09 |
ring: Introduce a v2 ring format
There's a bunch of moving pieces here:
- Add a new RingWriter class. Stick it in a new swift.common.ring.io module. You *can* use it like the old gzip file, but you can also define named sections which can be referenced later on read. Section names may be arbitrary strings, but the "swift/" prefix is reserved for upstream use. Sections must contain a single length-value encoded BLOB. If sections are used, an additional BLOB is written at the end containing a JSON section-index, followed by an uncompressed offset for the index. Move RingReader to ring/io.py, too.
- Clean up some ring metadata handling:
  - Drop MD5 tracking in RingReader. It was brittle at best anyway, and nothing uses it. YAGNI
  - Fix size/raw_size attributes when loading only metadata.
- Add the ability to seek within RingReaders, though you need to know what you're doing and only seek to flush points.
- Let RingBuilder objects change how wide their replica2part2dev_id arrays are. Add a dev_id_bytes key to serialized ring metadata. dev_id_bytes may be either 2 or 4, but 4 requires v2 rings. We considered allowing dev_id_bytes of 1, but dropped it as unnecessary complexity for a niche use case.
- swift-ring-builder version subcommand added, which takes a ring. This lets operators see the serialization format of a ring on disk:
    $ swift-ring-builder object.ring.gz version
    object.ring.gz: Serialization version: 2 (2-byte IDs), build version: 54
Signed-off-by: Tim Burke <tim.burke@gmail.com>
Change-Id: Ia0ac4ea2006d8965d7fdb6659d355c77386adb70 |
||
|
Tim Burke
|
74030236ad |
tempauth: Support fernet tokens
Tempauth fernet tokens use a secret shared among all proxies to encrypt user group information. Because they are encrypted, clients can neither view nor edit this information; it is an opaque bearer token similar to the existing memcached-backed tokens (just much longer). Note that tokens still expire after the configured token_life.
Add a new set of config options of the form
    fernet_key_<keyid> = <32 url-safe base64-encoded bytes>
Any of the configured keys will be used to attempt to decrypt tokens starting with "ftk" and extract group information.
Another new config option
    active_fernet_key_id = <keyid>
dictates which key should be used when minting tokens. Such tokens will start with "ftk" to distinguish them from memcached-backed tokens (which continue to start with "tk"). If active_fernet_key_id is not configured, memcached-backed tokens continue to be used.
Together, these allow seamless transitions from memcached-backed tokens to fernet tokens, as well as transitions from one fernet key to another:
1. Add a new fernet_key_<keyid> entry.
2. Ensure all proxies have the new config with fernet_key_<keyid>.
3. Set active_fernet_key_id = <keyid>.
4. Ensure all proxies have the new config with the new active_fernet_key_id.
This is similar to the key-rotation process for the encryption feature, except that old keys may be pruned following a token_life period.
Additionally, opportunistically compress groups before minting tokens. Compressed tokens will begin with "zftk" but otherwise behave just like "ftk" tokens.
Change-Id: I0bdc98765d05e91f872ef39d4722f91711a5641f |
||
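As a rough illustration of the tempauth fernet commit above, the following sketch generates a value in the documented `<32 url-safe base64-encoded bytes>` form and classifies tokens by the "tk", "ftk" and "zftk" prefixes the message describes. The function names and the key id `mykey` are hypothetical and are not part of tempauth; this is not how the middleware itself is implemented.

```python
import base64
import os


def generate_fernet_key():
    """Return 32 random bytes, url-safe base64-encoded, as text."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def token_kind(token):
    """Classify a token by the prefixes named in the commit message."""
    if token.startswith("zftk"):
        return "fernet-compressed"
    if token.startswith("ftk"):
        return "fernet"
    if token.startswith("tk"):
        return "memcached"
    return "unknown"


if __name__ == "__main__":
    # "mykey" is a made-up key id used only for illustration.
    print("fernet_key_mykey = %s" % generate_fernet_key())
    print("active_fernet_key_id = mykey")
```

Following the rotation steps in the message, the `fernet_key_mykey` line would be rolled out to every proxy before `active_fernet_key_id` is set.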
|
Clay Gerrard
|
0e2791a88a |
Remove deprecated statsd label_mode
Hopefully if we never do a release that supports signalfx no one will ever use it and we won't have to maintain it. Drive-by: refactor label model dispatch to fix a weird bug where a config name could be a class attribute and blow up weird. Change-Id: I2c67b59820c5ca094077bf47628426f4b0445ba0 |
||
|
Tim Burke
|
7e5235894b |
stats: API for native labeled metrics
Introduce a LabeledStatsdClient API; no callers yet. Include three config options:
- statsd_label_mode, which specifies which label format to use
- statsd_emit_legacy, which dictates whether to emit old-style dotted metrics
- statsd_user_label_<name> = <value>, which supports user defined labels in restricted ASCII characters
Co-Authored-By: yanxiao@nvidia.com
Co-Authored-By: alistairncoles@gmail.com
Change-Id: I115ffb1dc601652a979895d7944e011b951a91c1 |
||
|
Clay Gerrard
|
b69a2bef45 |
Deprecate expirer options
The following configuration options are deprecated:
* expiring_objects_container_divisor
* expiring_objects_account_name
The upstream maintainers are not aware of any clusters where these have been configured to non-default values.
UpgradeImpact: Operators are encouraged to remove their "container_divisor" setting and use the default value of 86400. If a cluster was deployed with a non-standard "account_name", operators should remove the option from all configs so they are using a supported configuration going forward, but will need to deploy stand-alone expirer processes with legacy expirer config to clean-up old expiration tasks from the previously configured account name.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Jianjian Huo <jhuo@nvidia.com>
Change-Id: I5ea9e6dc8b44c8c5f55837debe24dd76be7d6248 |
||
|
Tim Burke
|
ae6300af86 |
wsgi: Reap stale workers (after a timeout) following a reload
Add a new tunable, `stale_worker_timeout`, defaulting to 86400 (i.e. 24 hours). Once this time elapses following a reload, the manager process will issue SIGKILLs to any remaining stale workers. This gives operators a way to configure a limit for how long old code and configs may still be running in their cluster.
To enable this, the temporary reload child (which waits for the reload to complete then closes the accept socket on all the old workers) has grown the ability to send state to the re-exec'ed manager. Currently, this is limited to just the set of pre-re-exec child PIDs and their reload times, though it was designed to be reasonably extensible. This allows the new manager to recognize stale workers as they exit instead of logging
    Ignoring wait() result from unknown PID ...
With the improved knowledge of subprocesses, we can kick the log level for the above message up from info to warning; we no longer expect it to trigger in practice.
Drive-by: Add logging to ServersPerPortStrategy.register_worker_exit that's comparable to what WorkersStrategy does.
Change-Id: I8227939d04fda8db66fb2f131f2c71ce8741c7d9 |
||
|
Zuul
|
94d3a5dee8 | Merge "obj: Add option to tune down etag validation in object-server" | ||
|
Tim Burke
|
3d8fb046cb |
obj: Add option to tune down etag validation in object-server
Historically, the object-server would validate the ETag of an object whenever it was streaming the complete object. This minimizes the possibility of returning corrupted data to clients, but
- Clients that only ever make ranged requests get no benefit and
- MD5 can be rather CPU-intensive; this is especially noticeable in all-flash clusters/policies where Swift is not disk-constrained.
Add a new `etag_validate_pct` option to tune down this validation. This takes values from 100 (default; all whole-object downloads are validated) down to 0 (none are).
Note that even with etag validation turned off, the object-auditor should eventually detect and quarantine corrupted objects. However, transient read errors may cause clients to download corrupted data.
Hat-tip to Jianjian for all the profiling work!
Co-Authored-By: Jianjian Huo <jhuo@nvidia.com>
Change-Id: Iae48e8db642f6772114c0ae7c6bdd9c653cd035b |
||
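One plausible reading of a percentage option such as `etag_validate_pct` is a per-request random draw. The sketch below shows that idea only. The function name and the use of a random draw are assumptions, since the commit message does not say how the object-server makes the decision.

```python
import random


def should_validate_etag(etag_validate_pct, rng=random):
    """Decide whether one whole-object download gets its ETag checked.

    100 means always validate and 0 means never; values in between
    validate roughly that percentage of downloads.
    """
    # rng.random() is in [0.0, 1.0), so the scaled draw is in [0, 100).
    return rng.random() * 100 < etag_validate_pct
```

With this shape, the default of 100 always validates because the scaled draw is strictly below 100, and 0 never validates.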
|
Tim Burke
|
a55a48ffc8 |
docs: Call out that xprofile is not intended for production
Change-Id: I1e9d4d5df403040d69db93a08647cd0abe1b8037 |
||
|
Jianjian Huo
|
ea1d84c1d7 |
Object-server: add periodic greenthread yielding during file write
Currently, when the object-server serves a PUT request and the DiskFile writer writes file chunks to disk, there is no explicit eventlet sleep called. When the network outpaces slow disk IO, it's possible that one large and slow PUT request could cause the eventlet hub not to schedule any other green threads for a long period of time. To improve this, this patch adds the configurable yield parameter 'cooperative_period' to the object server controller write path. Related-Change: I80b04bad0601b6cd6caef35498f89d4ba70a4fd4 Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I1c0aba9830433f093d024b4c39cd3a3b2f0d69f1 |
||
|
Zuul
|
7662cde704 | Merge "Add oldest failed async pending tracker" | ||
|
Chinemerem
|
0a5348eb48 |
Add oldest failed async pending tracker
In the past we have had some async pendings that repeatedly fail for months at a time. This patch adds an OldestAsyncPendingTracker class which manages the tracking of the oldest async pending updates for each account-container pair. This class maintains timestamps for pending updates associated with account-container pairs. It evicts the newest pairs when the max_entries is reached. It supports retrieving the N oldest pending updates or calculating the age of the oldest pending update. Change-Id: I6d9667d555836cfceda52708a57a1d29ebd1a80b |
||
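The behaviour described for the oldest-async-pending tracker can be sketched in a few lines. This is a simplified illustration, not the OldestAsyncPendingTracker class from the patch: the class name, method names and the dictionary-based storage are all hypothetical.

```python
class OldestPendingSketch:
    """Track the oldest pending timestamp per (account, container) pair."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.oldest = {}  # (account, container) -> oldest timestamp seen

    def add_update(self, account, container, timestamp):
        key = (account, container)
        current = self.oldest.get(key)
        if current is None or timestamp < current:
            self.oldest[key] = timestamp
        # Over capacity: evict the pair whose oldest update is newest.
        if len(self.oldest) > self.max_entries:
            newest_key = max(self.oldest, key=self.oldest.get)
            del self.oldest[newest_key]

    def get_n_oldest(self, n):
        """Return up to n (pair, timestamp) tuples, oldest first."""
        return sorted(self.oldest.items(), key=lambda item: item[1])[:n]

    def get_oldest_age(self, now):
        """Return the age of the oldest tracked update, or None if empty."""
        if not self.oldest:
            return None
        return now - min(self.oldest.values())
```

Evicting the newest entries keeps memory bounded while preserving the long-failing updates the commit message is concerned with.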
|
Clay Gerrard
|
df22032d79 |
object-expirer: add round_robin_cache_size option
Drive-Bys:
* DRY out redundant configuration examples in expiring objects overview documentation.
* Add missing delay_reaping man page docs.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I8879dbd13527233c878dff764ec411ce9619ee39 |
||
|
Tim Burke
|
ef8764cb06 |
logging: Add UPDATE to valid http methods
We introduced this a while back, but forgot to add it then. Related-Change: Ia13ee5da3d1b5c536eccaadc7a6fdcd997374443 Change-Id: Ib65ddf50d7f5c3e27475626000943eb18e65c73a |
||
|
Alistair Coles
|
d555755423 |
proxy_logging config: unit tests and doc pointers
Add unit tests to verify the precedence of access_log_ and log_ prefixes to options. Add pointers from proxy_logging sections in other sample config files to the proxy-server.conf-sample file. Change-Id: Id18176d3790fd187e304f0e33e3f74a94dc5305c |
||
|
Thomas Goirand
|
90da23c7d2 |
kms_keymaster: allow specifying barbican_endpoint
Under a multi-region deployment with a single Keystone server, specifying the Keystone auth credentials isn't enough. Indeed, Castellan succeeds when logging in, but may use the wrong Barbican endpoint (if there are 2 Barbican deployments). This is what happened to us when deploying our 2nd region. The way to fix it would be to tell Castellan what region to use; unfortunately, there's no such option in Castellan. We may, however, specify the barbican_endpoint, which is what this patch allows. Change-Id: Ib7f4219ef5fdef65e9cfd5701e28b5288741783e |
||
|
Zuul
|
d1aa735a37 | Merge "backend ratelimit: support per-method rate limits" | ||
|
Zuul
|
bf206ed2fe | Merge "backend ratelimit: support reloadable config file" | ||
|
Zuul
|
937af35e62 | Merge "object-expirer: add example to delay_reaping sample config" | ||
|
indianwhocodes
|
11eb17d3b2 |
support x-open-expired header for expired objects
If the global configuration option 'enable_open_expired' is set to true in the config, then the client will be able to make a request with the header 'x-open-expired' set to true in order to access an object that has expired, provided it is in its grace period. If this config flag is set to false, the client will not be able to access any expired objects, even with the header, which is the default behavior unless the flag is set.
When a client sets an 'x-open-expired' header to a true value for a GET/HEAD/POST request the proxy will forward x-backend-open-expired to the storage server. The storage server will allow clients that set x-backend-open-expired to open and read an object that has not yet been reaped by the object-expirer, even after the x-delete-at time has passed.
The header is always ignored when used with temporary URLs.
Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com>
Related-Change: I106103438c4162a561486ac73a09436e998ae1f0
Change-Id: Ibe7dde0e3bf587d77e14808b169c02f8fb3dddb3 |
||
|
Alistair Coles
|
ce619137db |
object-expirer: add example to delay_reaping sample config
Add an example of a delay_reaping config option with quoted key. Change-Id: I0c7ead6795822ea0fb0e81abc1e4685d7946942c Related-Change: I106103438c4162a561486ac73a09436e998ae1f0 |
||
|
Mandell Degerness
|
5961ba0ca7 |
expirer: account and container level delay_reaping
The object expirer can be configured to delay the reaping of objects from disk after their expiration time using account and container level delay_reaping values. The delay_reaping value of accounts and containers in seconds is configured in the object server config. The object expirer references these configured values to only reap objects from specified accounts and containers after their corresponding delays. The goal of the delay_reaping feature is to prevent accidental or premature data loss if an object marked for deletion with the 'x-delete-at' feature should not be reaped immediately, for whatever reason. Configuring the delay_reaping value at a granular account and container level is beneficial for being able to keep storage capacity consumption in control while maintaining a desired data recovery window. This patch also adds a sample configuration, documentation, and tests for bad configurations and grace period functionality. Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com> Change-Id: I106103438c4162a561486ac73a09436e998ae1f0 |
||
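The account and container level delay_reaping lookup could look roughly like the sketch below. The precedence rule (a container-level value overrides the account-level one) and the zero default are assumptions made for illustration, as are the function names and the dictionary layout; the commit message does not spell these out.

```python
def get_delay_reaping(delays, account, container):
    """Return the reaping delay in seconds for an account/container.

    `delays` maps (account, container) or (account, None) to seconds.
    Assumption: the most specific (container-level) entry wins.
    """
    if (account, container) in delays:
        return delays[(account, container)]
    return delays.get((account, None), 0.0)


def is_reapable(delete_at, now, delay):
    """True once the delay has elapsed after the x-delete-at time."""
    return now >= delete_at + delay
```

For example, with an account-level delay of 300 seconds, an object whose x-delete-at time passed 100 seconds ago would not yet be reaped under this sketch.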
|
Alistair Coles
|
3517ca453e |
backend ratelimit: support per-method rate limits
Add support for config options such as: head_requests_per_device_per_second = 100 Change-Id: I2936f799b6112155ff01dcd8e1f985849a1af178 |
||
|
Alistair Coles
|
e9abfd76ee |
backend ratelimit: support reloadable config file
Add support for a backend_ratelimit_conf_path option in the [filter:backend_ratelimit] config. If specified then the middleware will give precedence to config options from that file over config options from the [filter:backend_ratelimit] section. The path defaults to /etc/swift/backend-ratelimit.conf. The config file is periodically reloaded and any changed options are applied. The middleware will log a warning the first time it fails to load a config file that had previously been successfully loaded. The middleware also logs at info level when it first successfully loads a config file that had previously failed to be loaded. Otherwise, the middleware will log when a config file is loaded that results in the config being changed. Change-Id: I6554e37c6ab5b0a260f99b54169cb90ab5718f81 |
||
|
Tim Burke
|
6a426f7fa0 |
sharder: Add periodic_warnings_interval to example config
Change-Id: Ie3c64646373580b70557f2720a13a5a0c5ef7097 |
||
|
Zuul
|
07c8e8bcdc | Merge "Object-server: add periodic greenthread yielding during file read." | ||
|
Jianjian Huo
|
d5877179a5 |
Object-server: add periodic greenthread yielding during file read.
Currently, when the object-server serves a GET request and the DiskFile reader iterates over disk file chunks, there is no explicit eventlet sleep called. When the network outpaces slow disk IO, it's possible that one large and slow GET request could cause the eventlet hub not to schedule any other green threads for a long period of time. To improve this, this patch adds a configurable sleep parameter to the DiskFile reader, 'cooperative_period', with a default value of 0 (disabled). Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I80b04bad0601b6cd6caef35498f89d4ba70a4fd4 |
||
|
Alistair Coles
|
2500fbeea9 |
proxy: don't use recoverable_node_timeout with x-newest
Object GET requests with a truthy X-Newest header are not resumed if a backend request times out. The GetOrHeadHandler therefore uses the regular node_timeout when waiting for a backend connection response, rather than the possibly shorter recoverable_node_timeout. However, previously while reading data from a backend response the recoverable_node_timeout would still be used with X-Newest requests. This patch simplifies GetOrHeadHandler to never use recoverable_node_timeout when X-Newest is truthy. Change-Id: I326278ecb21465f519b281c9f6c2dedbcbb5ff14 |
||
|
Takashi Kajinami
|
bd64748a03 |
Document allowed_digests for formpost middleware
The allowed_digests option was added to the formpost middleware in
addition to the tempurl middleware[1], but the option was not added to
the formpost section in the example proxy config file.
[1]
|
||
|
Tim Burke
|
0c9b545ea7 |
docs: Clean up proxy logging docs
Change-Id: I6ef909e826d3901f24d3c42a78d2ab1e4e47bb64 |
||
|
Jianjian Huo
|
cb1e584e64 |
Object-server: keep SLO manifest files in page cache.
Currently, SLO manifest files are evicted from the page cache after being read, which keeps hard drives very busy when a user requests many parallel byte-range GETs for a particular SLO object. This patch adds a new config option, 'keep_cache_slo_manifest', and tries to keep the manifest files in the page cache by not evicting them after reading, if config settings allow. Co-Authored-By: Tim Burke <tim.burke@gmail.com> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Change-Id: I557bd01643375d7ad68c3031430899b85908a54f |
||
|
Tim Burke
|
469c38e9fb |
wsgi: Add keepalive_timeout option
Clients sometimes hold open connections "just in case" they might later pipeline requests. This can cause issues for proxies, especially if operators restrict max_clients in an effort to improve response times for the requests that *do* get serviced. Add a new keepalive_timeout option to give proxies a way to drop these established-but-idle connections without impacting active connections (as may happen when reducing client_timeout). Note that this requires eventlet 0.33.4 or later. Change-Id: Ib5bb84fa3f8a4b9c062d58c8d3689e7030d9feb3 |
||
|
Zuul
|
5fae344ef4 | Merge "internal_client: Remove allow_modify_pipeline option" | ||
|
Matthew Oliver
|
e5105ffa09 |
internal_client: Remove allow_modify_pipeline option
The internal client is supposed to be internal to the cluster, and as such we rely on it to not remove any headers we decide to send. However, if the allow_modify_pipeline option is set, the gatekeeper middleware is added to the internal client's proxy pipeline. So firstly, this patch removes the allow_modify_pipeline option from the internal client constructor. And when calling loadapp, allow_modify_pipeline is always passed as False. Further, an op could directly put the gatekeeper middleware into the internal client config. The internal client constructor will now check the pipeline and raise a ValueError if one has been placed in the pipeline. To do this, there is now a check_gatekeeper_loaded staticmethod that will walk the pipeline, which is called from the InternalClient.__init__ method. To enable this walk through the pipeline, we are now stashing the wsgi pipeline in each filter so that we don't have to rely on 'app' naming conventions to iterate the pipeline. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Change-Id: Idcca7ac0796935c8883de9084d612d64159d9f92 |
||
|
Tim Burke
|
cbba65ac91 |
quotas: Add account-level per-policy quotas
Reseller admins can set new headers on accounts like X-Account-Quota-Bytes-Policy-<policy-name>: <quota> This may be done to limit consumption of a faster, all-flash policy, for example. This is independent of the existing X-Account-Meta-Quota-Bytes header, which continues to limit the total storage for an account across all policies. Change-Id: Ib25c2f667e5b81301f8c67375644981a13487cfe |
||
|
Zuul
|
0470994a03 | Merge "slo: Default allow_async_delete to true" | ||
|
Jianjian Huo
|
4ed2b89cb7 |
Sharder: warn when sharding appears to have stalled.
This patch adds a configurable timeout after which the sharder will warn if a container DB has not completed sharding. The new config is container_sharding_timeout with a default of 172800 seconds (2 days). Drive-by fix: recording sharding progress will cover the case of shard range shrinking too. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Change-Id: I6ce299b5232a8f394e35f148317f9e08208a0c0f |
||
|
Zuul
|
8ab6af27c5 | Merge "proxy: Add a chance to skip memcache for get_*_info calls" | ||
|
Zuul
|
b05b27c0b6 | Merge "Add note about rsync_bwlimit suffixes" | ||
|
Tim Burke
|
5c6407bf59 |
proxy: Add a chance to skip memcache for get_*_info calls
If you've got thousands of requests per second for objects in a single container, you basically NEVER want that container's info to ever fall out of memcache. If it *does*, all those clients are almost certainly going to overload the container. Avoid this by allowing some small fraction of requests to bypass and refresh the cache, pushing out the TTL as long as there continue to be requests to the container. The likelihood of skipping the cache is configurable, similar to what we did for shard range sets. Change-Id: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f Closes-Bug: #1883324 |
||
|
Zuul
|
24acc6e56b | Merge "Add backend rate limiting middleware" | ||
|
Tim Burke
|
a9177a4b9d |
Add note about rsync_bwlimit suffixes
Change-Id: I019451e118d3bd7263a52cf4bf354d0d0d2b4607 |
||
|
Tim Burke
|
f6196b0a22 |
AUTHORS/CHANGELOG for 2.30.0
Change-Id: If7c9e13fc62f8104ccb70a12b9c839f78e7e6e3e |
||
|
Zuul
|
5ff37a0d5e | Merge "DB Replicator: Add handoff_delete option" | ||
|
Matthew Oliver
|
bf4edefce4 |
DB Replicator: Add handoff_delete option
Currently the object-replicator has an option called `handoff_delete` which allows us to define the number of replicas which are ensured in swift. Once a handoff node ensures that many successful responses it can go ahead and delete the handoff partition. By default it's 'auto' or rather the number of primary nodes. But this can be reduced. It's useful in draining full disks, but has to be used carefully. This patch adds the same option to the DB replicator and works the same way. But instead of deleting a partition it's done at the per DB level. Because it's done at the DB Replicator level it means the option is now available to both the Account and Container replicators. Change-Id: Ide739a6d805bda20071c7977f5083574a5345a33 |
||
|
Zuul
|
73b2730f71 | Merge "Add ring_ip option to object services" | ||
|
Clay Gerrard
|
12bc79bf01 |
Add ring_ip option to object services
This will be used when finding their own devices in rings, defaulting to the bind_ip. Notably, this allows services to be containerized while servers_per_port is enabled:
* For the object-server, the ring_ip should be set to the host ip and will be used to discover which ports need binding. Sockets will still be bound to the bind_ip (likely 0.0.0.0), with the assumption that the host will publish ports 1:1.
* For the replicator and reconstructor, the ring_ip will be used to discover which devices should be replicated. While bind_ip could previously be used for this, it would have required a separate config from the object-server.
Also rename object daemon's bind_ip attribute to ring_ip so that it's more obvious wherever we're using the IP for ring lookups instead of socket binding.
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: I1c9bb8086994f7930acd8cda8f56e766938c2218 |
||
|
Zuul
|
5398204f22 | Merge "tempurl: Deprecate sha1 signatures" |