52730e1037563ad8ba0e09da93886c856ed9e875
693 Commits

Author SHA1 Message Date
Alistair Coles
18f20daf38 Add absolute values for shard shrinking config options
Add two new sharder config options for configuring shrinking
behaviour:
 - shrink_threshold: the size below which a shard may shrink
 - expansion_limit: the maximum size to which an acceptor shard
 may grow
The new options match the 'swift-manage-shard-ranges' command line
options and take absolute values.
The new options provide alternatives to the current equivalent options
'shard_shrink_point' and 'shard_shrink_merge_point', which are
expressed as percentages of 'shard_container_threshold'.
'shard_shrink_point' and 'shard_shrink_merge_point' are deprecated and
will be overridden by the new options if the new options are
explicitly set in a config file.
The default values of the new options are the same as the values that
would result from the default 'shard_container_threshold',
'shard_shrink_point' and 'shard_shrink_merge_point' i.e.:
 - shrink_threshold: 100000
 - expansion_limit: 750000
Change-Id: I087eac961c1eab53540fe56be4881e01ded1f60e
2021-05-20 21:00:02 +01:00
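To illustrate how the new absolute defaults relate to the deprecated percentage options, here is a minimal sketch. It assumes a default shard_container_threshold of 1000000 with shard_shrink_point = 10 and shard_shrink_merge_point = 75, which is consistent with the defaults quoted above; the function name is hypothetical and this is not Swift's actual implementation.

```python
def resolve_shrink_options(conf):
    """Return (shrink_threshold, expansion_limit) as absolute row counts."""
    # Assumed defaults for the older, percentage-based options.
    threshold = int(conf.get('shard_container_threshold', 1000000))
    shrink_pct = float(conf.get('shard_shrink_point', 10))
    merge_pct = float(conf.get('shard_shrink_merge_point', 75))
    # Explicitly set absolute options override the deprecated percentages.
    shrink_threshold = int(
        conf.get('shrink_threshold', threshold * shrink_pct / 100))
    expansion_limit = int(
        conf.get('expansion_limit', threshold * merge_pct / 100))
    return shrink_threshold, expansion_limit
```

With an empty config the sketch yields the documented defaults; setting only shrink_threshold overrides just that value.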
Alistair Coles
f7fd99a880 Use ContainerSharderConf class in sharder and manage-shard-ranges
Change the swift-manage-shard-ranges default expansion-limit to equal
the sharder daemon default merge_size i.e. 750000. The previous default
of 500000 had erroneously differed from the sharder default value.
Introduce a ContainerSharderConf class to encapsulate loading of
sharder conf and the definition of defaults. ContainerSharder inherits
this and swift-manage-shard-ranges instantiates it.
Rename ContainerSharder member vars to match the equivalent vars and
cli options in manage_shard_ranges:
 shrink_size -> shrink_threshold
 merge_size -> expansion_limit
 split_size -> rows_per_shard
(This direction of renaming is chosen so that the manage_shard_ranges
cli options are not changed.)
Rename ContainerSharder member vars to match the conf file option name:
 scanner_batch_size -> shard_scanner_batch_size
Remove some ContainerSharder member vars that were not used outside of
the __init__ method:
 shrink_merge_point
 shard_shrink_point
Change-Id: I8a58a82c08ac3abaddb43c11d26fda9fb45fe6c1
2021-05-20 20:59:56 +01:00
Zuul
5ec3826246 Merge "Quarantine stale EC fragments after checking handoffs" 2021-05-11 22:44:16 +00:00
Alistair Coles
46ea3aeae8 Quarantine stale EC fragments after checking handoffs
If the reconstructor finds a fragment that appears to be stale then it
will now quarantine the fragment. Fragments are considered stale if
insufficient fragments at the same timestamp can be found to rebuild
missing fragments, and the number found is less than or equal to a new
reconstructor 'quarantine_threshold' config option.
Before quarantining a fragment the reconstructor will attempt to fetch
fragments from handoff nodes in addition to the usual primary nodes.
The handoff requests are limited by a new 'request_node_count'
config option.
'quarantine_threshold' defaults to zero i.e. no fragments will be
quarantined. 'request_node_count' defaults to '2 * replicas'.
Closes-Bug: 1655608
Change-Id: I08e1200291833dea3deba32cdb364baa99dc2816
2021-05-10 20:45:17 +01:00
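The staleness rule described in this commit can be sketched as a small predicate. This is a hedged illustration only: the function and argument names are hypothetical, and it assumes the count of fragments found includes the local fragment.

```python
def should_quarantine(frags_found, ndata, quarantine_threshold=0):
    """Decide whether a fragment looks stale enough to quarantine.

    frags_found: fragments found at the same timestamp after asking
        primaries and handoffs (assumed to include the local fragment).
    ndata: number of fragments needed to rebuild a missing fragment.
    """
    insufficient_to_rebuild = frags_found < ndata
    return insufficient_to_rebuild and frags_found <= quarantine_threshold
```

Because the local fragment is counted, the default threshold of zero never triggers quarantining in this sketch.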
Matthew Oliver
4ce907a4ae relinker: Add /recon/relinker endpoint and drop progress stats
To further benefit the stats capturing for the relinker, drop partition
progress to a new relinker.recon recon cache and add a new recon endpoint:
 GET /recon/relinker
To get live relinking progress data:
 $ curl http://127.0.0.3:6030/recon/relinker |python -mjson.tool
 {
   "devices": {
     "sdb3": {
       "parts_done": 523,
       "policies": {
         "1": {
           "next_part_power": 11,
           "start_time": 1618998724.845616,
           "stats": {
             "errors": 0,
             "files": 1630,
             "hash_dirs": 1630,
             "linked": 1630,
             "policies": 1,
             "removed": 0
           },
           "timestamp": 1618998730.24672,
           "total_parts": 1029,
           "total_time": 5.400741815567017
         }
       },
       "start_time": 1618998724.845946,
       "stats": {
         "errors": 0,
         "files": 836,
         "hash_dirs": 836,
         "linked": 836,
         "removed": 0
       },
       "timestamp": 1618998730.24672,
       "total_parts": 523,
       "total_time": 5.400741815567017
     },
     "sdb7": {
       "parts_done": 506,
       "policies": {
         "1": {
           "next_part_power": 11,
           "part_power": 10,
           "parts_done": 506,
           "start_time": 1618998724.845616,
           "stats": {
             "errors": 0,
             "files": 794,
             "hash_dirs": 794,
             "linked": 794,
             "removed": 0
           },
           "step": "relink",
           "timestamp": 1618998730.166175,
           "total_parts": 506,
           "total_time": 5.320528984069824
         }
       },
       "start_time": 1618998724.845616,
       "stats": {
         "errors": 0,
         "files": 794,
         "hash_dirs": 794,
         "linked": 794,
         "removed": 0
       },
       "timestamp": 1618998730.166175,
       "total_parts": 506,
       "total_time": 5.320528984069824
     }
   },
   "workers": {
     "100": {
       "drives": ["sda1"],
       "return_code": 0,
       "timestamp": 1618998730.166175
     }
   }
 }
Also, add a constant DEFAULT_RECON_CACHE_PATH to help fix failing tests
by mocking recon_cache_path, so that errors are not logged due
to dump_recon_cache exceptions.
Mock recon_cache_path more widely and assert no error logs more
widely.
Change-Id: I625147dadd44f008a7c48eb5d6ac1c54c4c0ef05
2021-05-10 16:13:32 +01:00
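As a rough illustration of consuming the new endpoint, the sketch below fetches the recon JSON and reduces it to a per-device completion fraction. The helper names are hypothetical; only the 'devices', 'parts_done' and 'total_parts' keys shown in the sample output above are relied upon.

```python
import json
from urllib.request import urlopen


def fetch_relinker_recon(url):
    """GET the recon endpoint and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def summarize_progress(recon):
    """Map each device to the fraction of its partitions already processed."""
    progress = {}
    for device, info in recon.get('devices', {}).items():
        total = info.get('total_parts', 0)
        done = info.get('parts_done', 0)
        progress[device] = done / total if total else 0.0
    return progress
```

For example, summarize_progress(fetch_relinker_recon('http://127.0.0.3:6030/recon/relinker')) would report one fraction per device on that node.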
Tim Burke
c374a7a851 Allow floats for all intervals
Change-Id: I91e9bc02d94fe7ea6e89307305705c383087845a
2021-05-05 15:30:21 -07:00
Zuul
e8580f0346 Merge "s3api: Add config option to return 429s on ratelimit" 2021-04-13 01:42:12 +00:00
Tim Burke
abfa6bee72 relinker: Parallelize per disk
Add a new option, workers, that works more or less like the same option
from background daemons. Disks will be distributed across N worker
sub-processes so we can make the best use of the I/O available.
While we're at it, log final stats at warning if there were errors.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I039d2b8861f69a64bd9d2cdf68f1f534c236b2ba
2021-04-05 12:15:56 -07:00
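One simple way to spread disks across N workers is round-robin assignment, sketched below. This is an assumption made for illustration; the commit message does not say which distribution strategy the relinker uses, and the function name is hypothetical.

```python
def distribute_disks(disks, workers):
    """Split disks round-robin into at most `workers` non-empty groups."""
    if workers < 1:
        # Treat "no workers" as a single in-process group.
        return [list(disks)]
    groups = [[] for _ in range(min(workers, len(disks)))]
    for index, disk in enumerate(disks):
        groups[index % len(groups)].append(disk)
    return groups
```

Each group would then be handed to one worker sub-process, so no worker is started without a disk to process.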
Zuul
7594d97f38 Merge "relinker: retry links from older part powers" 2021-04-02 00:27:22 +00:00
Alistair Coles
3bdd01cf4a relinker: retry links from older part powers
If a previous partition power increase failed to clean up all files in
their old partition locations, then during the next partition power
increase the relinker may find the same file to relink in more than
one source partition. This currently leads to an error log due to the
second relink attempt getting an EEXIST error.
With this patch, when an EEXIST is raised, the relinker will attempt
to create/verify a link from older partition power locations to the
next part power location, and if such a link is found then suppress
the error log.
During the relink step, if an alternative link is verified and if a
file is found that is neither linked to the next partition power
location nor in the current part power location, then the file is
removed during the relink step. That prevents the same EEXIST occurring
again during the cleanup step when it may no longer be possible to
verify that an alternative link exists.
For example, consider identical filenames in the N+1th, Nth and N-1th
partition power locations, with the N+1th being linked to the Nth:
 - During relink, the Nth location is visited and its link is
 verified. Then the N-1th location is visited and an EEXIST error
 is encountered, but the new check verifies that a link exists to
 the Nth location, which is OK.
 - During cleanup the locations are visited in the same order, but
 files are removed so that the Nth location file no longer exists
 when the N-1th location is visited. If the N-1th location still
 has a conflicting file then existence of an alternative link to
 the Nth location can no longer be verified, so an error would be
 raised. Therefore, the N-1th location file must be removed during
 relink.
The error is only suppressed for tombstones. The number of partition
power locations that the relinker will look back over may be configured
using the link_check_limit option in a conf file or --link-check-limit
on the command line, and defaults to 2.
Closes-Bug: 1921718
Change-Id: If9beb9efabdad64e81d92708f862146d5fafb16c
2021-04-01 18:56:57 +01:00
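To make the N+1th, Nth and N-1th locations concrete, the sketch below derives the partition for an object hash at several part powers, using the top-bits-of-the-hash scheme. Which part powers the relinker actually visits is an assumption here (the link_check_limit most recent ones below the next part power), and both function names are hypothetical.

```python
import binascii
import struct


def partition_for_hash(hex_hash, part_power):
    """Partition = the top `part_power` bits of the object's name hash."""
    raw = binascii.unhexlify(hex_hash)
    return struct.unpack_from('>I', raw)[0] >> (32 - part_power)


def older_partitions(hex_hash, next_part_power, link_check_limit=2):
    """Yield (part_power, partition) for older locations worth checking."""
    # Assumption: look back over `link_check_limit` part powers.
    lowest = max(next_part_power - link_check_limit, 0)
    for part_power in range(next_part_power - 1, lowest - 1, -1):
        yield part_power, partition_for_hash(hex_hash, part_power)
```

With a next part power of 11 and the default limit of 2, the sketch yields the locations for part powers 10 and 9.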
Alistair Coles
71a4aea31a Update docs to discourage policy names being numbers
There are times when it is convenient to specify a policy by name or
by index (see Related-Change), but policy names can unfortunately
collide with indexes. Using a number as a policy name should at least
be discouraged.
Change-Id: I0cdd3b86b527d6656b7fb50c699e3c0cc566e732
Related-Change: Icf1517bd930c74e9552b88250a7b4019e0ab413e
2021-03-26 09:17:34 +00:00
Tim Burke
e35365df51 s3api: Add config option to return 429s on ratelimit
Change-Id: If04c083ccc9f63696b1f53ac13edc932740a0654
2021-03-17 10:58:58 -07:00
Zuul
310298a948 Merge "s3api: Allow CORS preflight requests" 2021-03-16 04:20:52 +00:00
Tim Burke
27a734c78a s3api: Allow CORS preflight requests
Unfortunately, we can't identify the user, so we can't map to an
account, so we can't respect whatever CORS metadata might be set on the
container.
As a result, the allowed origins must be configured cluster-wide. Add a
new config option, cors_preflight_allow_origin, for that; default it
to blank (i.e., deny preflights from all origins, preserving existing
behavior), but allow either a comma-separated list of origins or
* (to allow all origins).
Change-Id: I985143bf03125a05792e79bc5e5f83722d6431b3
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
2021-03-15 13:52:05 -07:00
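The semantics of cors_preflight_allow_origin described above (blank denies everything, a comma-separated list allows those origins, * allows all) can be sketched as follows. The helper names are hypothetical and this is not the s3api middleware's code.

```python
def parse_allowed_origins(value):
    """Split a comma-separated option value into a list of origins."""
    return [origin.strip() for origin in value.split(',') if origin.strip()]


def preflight_allowed(origin, allowed_origins):
    """Return True if a CORS preflight from `origin` should be permitted."""
    if not allowed_origins:
        # Blank option: deny all preflights, preserving existing behavior.
        return False
    return '*' in allowed_origins or origin in allowed_origins
```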
Matthew Oliver
fb186f6710 Add a config file option to swift-manage-shard-ranges
While working on the shrinking recon drops, we want to display numbers
that directly relate to how the tool should behave. But currently all
options of the s-m-s-r tool are driven by cli options.
This creates a disconnect: defining what should be used in the sharder
and in the tool via separate options is bound for failure. It would be much
better to be able to define the required default options for your
environment in one place that both the sharder and tool could use.
This patch does some refactoring and adds max_shrinking and
max_expanding options to the sharding config. It also adds a
--config option to the tool.
The --config option expects a config with a '[container-sharder]'
section. It only supports the shard options:
 - max_shrinking
 - max_expanding
 - shard_container_threshold
 - shard_shrink_point
 - shard_merge_point
The latter 2 are used to generate the s-m-s-r's:
 - shrink_threshold
 - expansion_limit
 - rows_per_shard
Use of cli arguments takes precedence over that of the config.
Change-Id: I4d0147ce284a1a318b3cd88975e060956d186aec
2021-03-12 10:49:46 +11:00
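The precedence rule (cli arguments first, then the config file, then built-in defaults) can be sketched as a small lookup. The function and the dict-based inputs are hypothetical stand-ins for the tool's parsed arguments and config section.

```python
def effective_option(name, cli_args, conf, defaults):
    """Return the CLI value if given, else the config value, else the default."""
    for source in (cli_args, conf):
        value = source.get(name)
        if value is not None:
            return value
    return defaults[name]
```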
Zuul
5c3eb488f2 Merge "Report final in_progress when sharding is complete" 2021-02-26 18:42:36 +00:00
Matthew Oliver
1de9834816 Report final in_progress when sharding is complete
On every sharder cycle we update in-progress recon stats for each sharding
container. However, we tend not to run it one final time once sharding
is complete because the DB state is changed to SHARDED and therefore the
in_progress stats never get their final update.
For those collecting this data to monitor, this makes sharding/cleaving shards
appear to never complete.
This patch adds a new option `recon_sharded_timeout` which will now
allow sharded containers to be processed by `_record_sharding_progress()`
for an amount of time after they've finished sharding.
Change-Id: I5fa39d41f9cd3b211e45d2012fd709f4135f595e
2021-02-26 15:56:30 +00:00
Zuul
0c2cc63b59 Merge "tempauth: Add .reseller_reader group" 2021-02-26 00:26:14 +00:00
Tim Burke
53c0fc3403 relinker: Add option to ratelimit relinking
Sure, you could use stuff like ionice or cgroups to limit relinker I/O,
but sometimes a nice simple blunt instrument is handy.
Change-Id: I7fe29c7913a9e09bdf7a787ccad8bba2c77cf995
2021-02-11 11:31:39 -08:00
Tim Burke
cf4f320644 tempauth: Add .reseller_reader group
Change-Id: I8c5197ed327fbb175c8a2c0e788b1ae14e6dfe23
2021-02-09 16:35:03 -08:00
Zuul
0c072e244c Merge "relinker: Allow conf files for configuration" 2021-02-09 14:26:41 +00:00
Pete Zaitcev
98a0275a9d Add a read-only role to keystoneauth
An idea was floated recently of a read-only role that can be used for
cluster-wide audits, and is otherwise safe. It was also included into
the "Consistent and Secure Default Policies" effort in OpenStack,
where it implements "reader" personas in system, domain, and project
scopes. This patch implements it for system scope, where it's most
useful for operators.
Change-Id: I5f5fff2e61a3e5fb4f4464262a8ea558a6e7d7ef
2021-02-08 22:02:17 -06:00
Tim Burke
1b7dd34d38 relinker: Allow conf files for configuration
Swap out the standard logger stuff in place of --logfile. Keep --device
as a CLI-only option. Everything else is pretty standard stuff that
ought to be in [DEFAULT].
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I32f979f068592eaac39dcc6807b3114caeaaa814
2021-02-08 14:39:27 -08:00
Zuul
48b26ba833 Merge "docs: Clarify that encryption should not be in reconciler pipeline" 2021-01-22 15:35:16 +00:00
Tim Burke
13c0980e71 docs: Clarify that encryption should not be in reconciler pipeline
UpgradeImpact
=============
Operators should verify that encryption is not enabled in their
reconciler pipelines; having it enabled there may harm data durability.
For more information, see https://launchpad.net/bugs/1910804
Change-Id: I1a1d78ed91d940ef0b4eba186dcafd714b4fb808
Closes-Bug: #1910804
2021-01-21 15:39:35 -06:00
Alistair Coles
6896f1f54b s3api: actually execute check_pipeline in real world
Previously, S3ApiMiddleware.check_pipeline would always exit early
because the __file__ attribute of the Config instance passed to
check_pipeline was never set. The __file__ key is typically passed to
the S3ApiMiddleware constructor in the wsgi config dict, so this dict
is now passed to check_pipeline() for it to test for the existence of
__file__.
Also, the use of a Config object is replaced with a dict where it
mimics the wsgi conf object in the unit tests setup.
UpgradeImpact
=============
The bug prevented the pipeline order checks described in
proxy-server.conf-sample being made on the proxy-server pipeline when
s3api middleware was included. With this change, these checks will now
be made and an invalid pipeline configuration will result in a
ValueError being raised during proxy-server startup.
A valid pipeline has another middleware (presumed to be an auth
middleware) between s3api and the proxy-server app. If keystoneauth is
found, then a further check is made that s3token is configured after
s3api and before keystoneauth.
The pipeline order checks can be disabled by setting the s3api
auth_pipeline_check option to False in proxy-server.conf. This
mitigation is recommended if previously operating with what will now
be considered an invalid pipeline.
The bug also prevented a check for slo middleware being in the
pipeline between s3api and the proxy-server app. If the slo middleware
is not found then multipart uploads will now not be supported,
regardless of the value of the allow_multipart_uploads option
described in proxy-server.conf-sample. In this case a warning will be
logged during startup but no exception is raised.
Closes-Bug: #1912391
Change-Id: I357537492733b97e5afab4a7b8e6a5c527c650e4
2021-01-19 20:22:43 +00:00
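The ordering rules in the UpgradeImpact note above can be sketched as a check over a list of middleware names. This is an illustrative approximation, not S3ApiMiddleware.check_pipeline itself; the list-based pipeline representation and the error messages are assumptions.

```python
def check_pipeline(pipeline, auth_pipeline_check=True):
    """Validate middleware order; `pipeline` is a list ending in the app."""
    if not auth_pipeline_check or 's3api' not in pipeline:
        return
    # Middlewares strictly between s3api and the final proxy-server app.
    after_s3api = pipeline[pipeline.index('s3api') + 1:-1]
    if not after_s3api:
        raise ValueError(
            'expected an auth middleware between s3api and the app')
    if 'keystoneauth' in after_s3api:
        before_keystone = after_s3api[:after_s3api.index('keystoneauth')]
        if 's3token' not in before_keystone:
            raise ValueError(
                'expected s3token between s3api and keystoneauth')
```

Passing auth_pipeline_check=False mirrors the mitigation described above: the checks are skipped entirely.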
Tim Burke
10d9a737d8 s3api: Make allowable clock skew configurable
While we're at it, make the default match AWS's 15 minute limit (instead
of our old 5 minute limit).
UpgradeImpact
=============
This (somewhat) weakens some security protections for requests over the
S3 API; operators may want to preserve the prior behavior by setting
 allowable_clock_skew = 300
in the [filter:s3api] section of their proxy-server.conf
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I0da777fcccf056e537b48af4d3277835b265d5c9
2021-01-14 10:40:23 +00:00
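A clock skew check of this kind reduces to comparing two timestamps against the configured window. The sketch below is an assumption-laden illustration: the function name is hypothetical and treating the boundary as inclusive is a guess, while 900 seconds corresponds to the new 15 minute default.

```python
def within_clock_skew(request_time, server_time, allowable_clock_skew=900):
    """Return True if the request timestamp is close enough to server time.

    Times are POSIX timestamps in seconds.
    """
    return abs(server_time - request_time) <= allowable_clock_skew
```

Setting allowable_clock_skew to 300 reproduces the previous 5 minute limit.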
Zuul
d5bb644a17 Merge "Use cached shard ranges for container GETs" 2021-01-08 20:50:45 +00:00
Zuul
8c611be876 Merge "Memcached client TLS support" 2021-01-07 11:35:34 +00:00
Grzegorz Grasza
6930bc24b2 Memcached client TLS support
This patch specifies a set of configuration options required to build
a TLS context, which is used to wrap the client connection socket.
Closes-Bug: #1906846
Change-Id: I03a92168b90508956f367fbb60b7712f95b97f60
2021-01-06 09:47:38 -08:00
Alistair Coles
077ba77ea6 Use cached shard ranges for container GETs
This patch makes four significant changes to the handling of GET
requests for sharding or sharded containers:
 - container server GET requests may now result in the entire list of
 shard ranges being returned for the 'listing' state regardless of
 any request parameter constraints.
 - the proxy server may cache that list of shard ranges in memcache
 and the request's environ infocache dict, and subsequently use the
 cached shard ranges when handling GET requests for the same
 container.
 - the proxy now caches more container metadata so that it can
 synthesize a complete set of container GET response headers from
 cache.
 - the proxy server now enforces more container GET request validity
 checks that were previously only enforced by the backend server,
 e.g. checks for valid request parameter values
With this change, when the proxy learns from container metadata
that the container is sharded then it will cache shard
ranges fetched from the backend during a container GET in memcache.
On subsequent container GETs the proxy will use the cached shard
ranges to gather object listings from shard containers, avoiding
further GET requests to the root container until the cached shard
ranges expire from cache.
Cached shard ranges are most useful if they cover the entire object
name space in the container. The proxy therefore uses a new
X-Backend-Override-Shard-Name-Filter header to instruct the container
server to ignore any request parameters that would constrain the
returned shard range listing i.e. 'marker', 'end_marker', 'includes'
and 'reverse' parameters. Having obtained the entire shard range
listing (either from the server or from cache) the proxy now applies
those request parameter constraints itself when constructing the
client response.
When using cached shard ranges the proxy will synthesize response
headers from the container metadata that is also in cache. To enable
the full set of container GET response headers to be synthesized in
this way, the set of metadata that the proxy caches when handling a
backend container GET response is expanded to include various
timestamps.
The X-Newest header may be used to disable looking up shard ranges
in cache.
Change-Id: I5fc696625d69d1ee9218ee2a508a1b9be6cf9685
2021-01-06 16:28:49 +00:00
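Once the proxy holds the complete shard range list, applying the client's 'marker' and 'end_marker' constraints is a local filtering step. The sketch below is a simplified illustration that models each shard range as a hypothetical (lower, upper) tuple with an exclusive lower and inclusive upper bound, and an empty string meaning unbounded; it ignores 'includes' and 'reverse'.

```python
def filter_shard_ranges(shard_ranges, marker='', end_marker=''):
    """Keep (lower, upper) ranges that may hold names in (marker, end_marker).

    An empty string stands for "unbounded" at either end.
    """
    selected = []
    for lower, upper in shard_ranges:
        ends_after_marker = upper == '' or upper > marker
        starts_before_end = end_marker == '' or lower < end_marker
        if ends_after_marker and starts_before_end:
            selected.append((lower, upper))
    return selected
```

Only the selected shard containers would then need to be queried for object listings.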
Samuel Merritt
b971280907 Let developers/operators add watchers to object audit
Swift operators may find it useful to operate on each object in their
cluster in some way. This commit provides them a way to hook into the
object auditor with a simple, clearly-defined boundary so that they
can iterate over their objects without additional disk IO.
For example, a cluster operator may want to ensure a semantic
consistency with all SLO segments accounted in their manifests,
or locate objects that aren't in container listings. Now that Swift
has encryption support, this could be used to locate unencrypted
objects. The list goes on.
This commit makes the auditor locate, via entry points, the watchers
named in its config file.
A watcher is a class with at least these four methods:
 __init__(self, conf, logger, **kwargs)
 start(self, audit_type, **kwargs)
 see_object(self, object_metadata, data_file_path, **kwargs)
 end(self, **kwargs)
The auditor will call watcher.start(audit_type) at the start of an
audit pass, watcher.see_object(...) for each object audited, and
watcher.end() at the end of an audit pass. All method arguments are
passed as keyword args.
This version of the API is implemented in the context of the
auditor itself, without spawning any additional processes.
If the plugins are not working well -- hang, crash, or leak --
it's easier to debug them when there's no additional complication
of processes that run by themselves.
In addition, we include a reference implementation of a plugin for
the watcher API, as a help to plugin writers.
Change-Id: I1be1faec53b2cdfaabf927598f1460e23c206b0a
2020-12-26 17:16:14 -06:00
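A minimal watcher following the four-method interface listed above might look like the sketch below. The class name and its object-counting behavior are hypothetical; this is not the reference implementation shipped with the commit, and registering it via entry points is not shown.

```python
class ObjectCountWatcher(object):
    """Count the objects seen during one audit pass."""

    def __init__(self, conf, logger, **kwargs):
        self.conf = conf
        self.logger = logger
        self.audit_type = None
        self.count = 0

    def start(self, audit_type, **kwargs):
        # Called once at the start of an audit pass.
        self.audit_type = audit_type
        self.count = 0

    def see_object(self, object_metadata, data_file_path, **kwargs):
        # Called once per audited object; no extra disk IO is done here.
        self.count += 1

    def end(self, **kwargs):
        # Called once at the end of an audit pass.
        self.logger.info(
            '%s audit saw %d objects', self.audit_type, self.count)
```

Accepting **kwargs in every method keeps the sketch tolerant of additional keyword arguments, in line with all arguments being passed as keyword args.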
Zuul
ebfc3a61fa Merge "Use socket_timeout kwarg instead of useless eventlet.wsgi.WRITE_TIMEOUT" 2020-11-18 02:19:01 +00:00
Zuul
cd228fafad Merge "Add a new URL parameter to allow for async cleanup of SLO segments" 2020-11-18 00:50:54 +00:00
Tim Burke
918ab8543e Use socket_timeout kwarg instead of useless eventlet.wsgi.WRITE_TIMEOUT
No version of eventlet that I'm aware of has any sort of support for
eventlet.wsgi.WRITE_TIMEOUT; I don't know why we've been setting that.
On the other hand, the socket_timeout argument for eventlet.wsgi.Server
has been supported for a while -- since 0.14 in 2013.
Drive-by: Fix up handling of sub-second client_timeouts.
Change-Id: I1dca3c3a51a83c9d5212ee5a0ad2ba1343c68cf9
Related-Change: I1d4d028ac5e864084a9b7537b140229cb235c7a3
Related-Change: I433c97df99193ec31c863038b9b6fd20bb3705b8
2020-11-11 14:23:40 -08:00
Tim Burke
e78377624a Add a new URL parameter to allow for async cleanup of SLO segments
Add a new config option to SLO, allow_async_delete, to allow operators
to opt-in to this new behavior. If their expirer queues get out of hand,
they can always turn it back off.
If the option is disabled, handle the delete inline; this matches the
behavior of old Swift.
Only allow an async delete if all segments are in the same container and
none are nested SLOs, that way we only have two auth checks to make.
Have s3api try to use this new mode if the data seems to have been
uploaded via S3 (since it should be safe to assume that the above
criteria are met).
Drive-by: Allow the expirer queue and swift-container-deleter to use
high-precision timestamps.
Change-Id: I0bbe1ccd06776ef3e23438b40d8fb9a7c2de8921
2020-11-10 18:22:01 +00:00
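The eligibility rule for an async delete (all segments in one container, none of them nested SLOs) can be sketched as a predicate. The segment representation here, dicts with hypothetical 'container' and 'sub_slo' keys, is an assumption for illustration only.

```python
def can_async_delete(segments):
    """Return True if every segment shares one container and none is an SLO.

    `segments` is a list of dicts with assumed 'container' and 'sub_slo' keys.
    """
    containers = {segment['container'] for segment in segments}
    has_nested_slo = any(segment.get('sub_slo') for segment in segments)
    return len(containers) == 1 and not has_nested_slo
```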
Zuul
2593f7f264 Merge "memcache: Make error-limiting values configurable" 2020-11-07 01:32:38 +00:00
Tim Burke
aff65242ff memcache: Make error-limiting values configurable
Previously these were all hardcoded; let operators tweak them as needed.
Significantly, this also allows operators to disable error-limiting
entirely, which may be a useful protection in case proxies are
configured with a single memcached server.
Use error_suppression_limit and error_suppression_interval to mirror the
option names used by the proxy-server to ratelimit backend Swift
servers.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: Ife005cb8545dd966d7b0e34e5496a0354c003881
2020-11-05 23:37:24 +00:00
Zuul
b9a404b4d1 Merge "ec: Add an option to write fragments with legacy crc" 2020-11-02 23:03:49 +00:00
Clay Gerrard
b05ad82959 Add tasks_per_second option to expirer
This allows operators to throttle expirers as needed.
Partial-Bug: #1784753
Change-Id: If75dabb431bddd4ad6100e41395bb6c31a4ce569
2020-10-23 10:24:52 -05:00
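A tasks-per-second throttle can be sketched as a limiter that sleeps until the next permitted start time. This is a generic illustration rather than the expirer's code; the class name is hypothetical, and the clock and sleep functions are injectable only so the behavior can be exercised without real delays.

```python
import time


class RateLimiter(object):
    """Allow at most `tasks_per_second` calls to wait() to pass per second."""

    def __init__(self, tasks_per_second, clock=time.monotonic,
                 sleep=time.sleep):
        # A non-positive rate disables throttling in this sketch.
        self.interval = 1.0 / tasks_per_second if tasks_per_second > 0 else 0.0
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next task is allowed to start."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

A worker loop would call wait() before starting each task.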
Tim Burke
599f63e762 ec: Add an option to write fragments with legacy crc
When upgrading from liberasurecode<=1.5.0, you may want to continue
writing legacy CRCs until all nodes are upgraded and capable of reading
fragments with zlib CRCs.
Starting in liberasurecode>=1.6.2, we can use the environment variable
LIBERASURECODE_WRITE_LEGACY_CRC to control whether we write zlib or
legacy CRCs, but for many operators it's easier to manage swift configs
than environment variables. Add a new option, write_legacy_ec_crc, to the
proxy-server app and object-reconstructor; if set to true, ensure legacy
frags are written.
Note that more daemons instantiate proxy-server apps than just the
proxy-server. The complete set of impacted daemons should be:
 * proxy-server
 * object-reconstructor
 * container-reconciler
 * any users of internal-client.conf
UpgradeImpact
=============
To ensure a smooth liberasurecode upgrade:
 1. Determine whether your cluster writes legacy or zlib CRCs. Depending
 on the order in which shared libraries are loaded, your servers may
 already be reading and writing zlib CRCs, even with old
 liberasurecode. In that case, no special action is required and
 WRITING LEGACY CRCS DURING THE UPGRADE WILL CAUSE AN OUTAGE.
 Just upgrade liberasurecode normally. See the closed bug for more
 information and a script to determine which CRC is used.
 2. On all nodes, ensure Swift is upgraded to a version that includes
 write_legacy_ec_crc support and write_legacy_ec_crc is enabled on
 all daemons.
 3. On each node, upgrade liberasurecode and restart Swift services.
 Because of (2), they will continue writing legacy CRCs which will
 still be readable by nodes that have not yet upgraded.
 4. Once all nodes are upgraded, remove the write_legacy_ec_crc option
 from all configs across all nodes. After restarting daemons, they
 will write zlib CRCs which will also be readable by all nodes.
Change-Id: Iff71069f808623453c0ff36b798559015e604c7d
Related-Bug: #1666320
Closes-Bug: #1886088
Depends-On: https://review.opendev.org/#/c/738959/
2020-09-30 16:49:59 -07:00
Clay Gerrard
754defc39c Client should retry when there's just one 404 and a bunch of errors
During a rebalance, it's expected that we may get a 404 for data that
does exist elsewhere in the cluster. Normally this isn't a problem; the
proxy sees the 404, keeps digging, and one of the other primaries will
serve the response.
Previously, if the other replicas were heavily loaded, the proxy would
see a bunch of timeouts and the fresh (empty) primary, treat the 404 as
good, and send that on to the client.
Now, have the proxy throw out that first 404 (provided it doesn't have a
timestamp); it will then return a 503 to the client, indicating that it
should try again.
Add a new (per-policy) proxy-server config option,
rebalance_missing_suppression_count; operators may use this to increase
the number of 404-no-timestamp responses to discard if their rebalances
are going faster than replication can keep up, or set it to zero to
return to the previous behavior.
Change-Id: If4bd39788642c00d66579b26144af8f116735b4d
2020-09-08 14:33:09 -07:00
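The effect of rebalance_missing_suppression_count can be sketched by discarding the first few timestamp-less 404s before choosing a response. This deliberately simplifies the proxy's real response selection (no quorum logic), the function name and tuple format are hypothetical, and the default of 1 is inferred from "throw out that first 404".

```python
def best_response(statuses, rebalance_missing_suppression_count=1):
    """Pick a client status from backend (status_int, has_timestamp) pairs."""
    discarded = 0
    kept = []
    for status, has_timestamp in statuses:
        if (status == 404 and not has_timestamp
                and discarded < rebalance_missing_suppression_count):
            # Probably a fresh, empty primary during rebalance: ignore it.
            discarded += 1
            continue
        kept.append(status)
    for candidate in (200, 404):
        if candidate in kept:
            return candidate
    # Nothing trustworthy left: tell the client to retry.
    return 503
```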
Zuul
cca5e8b1de Merge "Make all concurrent_get options per-policy" 2020-09-04 19:31:12 +00:00
Zuul
20e1544ad8 Merge "Extend concurrent_gets to EC GET requests" 2020-09-04 14:22:20 +00:00
Clay Gerrard
f043aedec1 Make all concurrent_get options per-policy
Change-Id: Ib81f77cc343c3435d7e6258d4631563fa022d449
2020-09-02 12:11:49 -05:00
Zuul
7015ac2fdc Merge "py3: Work with proper native string paths in crypto meta" 2020-08-30 04:11:59 +00:00
Clay Gerrard
8f60e0a260 Extend concurrent_gets to EC GET requests
After the initial requests are started, if the proxy still does not have
enough backend responses to return a client response additional requests
will be spawned to remaining primaries at the frequency configured by
the concurrency_timeout.
A new tunable concurrent_ec_extra_requests allows operators to control
how many requests to backend fragments are started immediately with a
client request to an object stored in an EC storage policy. By default
the minimum ndata backend requests are started immediately, but
operators may increase concurrent_ec_extra_requests up to nparity which
is similar in effect to a concurrency_timeout of 0.
Change-Id: Ia0a9398107a400815be2e0097b1b8e76336a0253
2020-08-24 13:30:44 -05:00
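The number of backend requests started immediately follows directly from the description above: ndata plus the configured extras, capped at nparity. A minimal sketch, with a hypothetical function name:

```python
def initial_ec_requests(ndata, nparity, concurrent_ec_extra_requests=0):
    """Return how many backend fragment requests to start immediately."""
    extra = max(0, min(concurrent_ec_extra_requests, nparity))
    return ndata + extra
```

For a 10+4 policy this gives 10 requests by default and at most 14.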
Zuul
a4f2252e2b Merge "proxy-logging: Be able to configure log_route" 2020-08-20 04:59:23 +00:00
Zuul
50800aba37 Merge "Update SAIO & docker image to use 62xx ports" 2020-08-01 02:39:00 +00:00
Tim Burke
7d429318dd py3: Work with proper native string paths in crypto meta
Previously, we would work with these paths as WSGI strings -- this would
work fine when all data were read and written on the same major version
of Python, but fail pretty badly during and after upgrading Python.
In particular, if a py3 proxy-server tried to read existing data that
was written down by a py2 proxy-server, it would hit an error and
respond 500. Worse, if an un-upgraded py2 proxy tried to read data that
was freshly-written by a py3 proxy, it would serve corrupt data back to
the client (including a corrupt/invalid ETag and Content-Type).
Now, ensure that both py2 and py3 write down paths as native strings.
Make an effort to still work with WSGI-string metadata, though it can be
ambiguous as to whether a string is a WSGI string or not. The heuristic
used is if
 * the path from metadata does not match the (native-string) request
 path and
 * the path from metadata (when interpreted as a WSGI string) can be
 "un-wsgi-fied" without any encode/decode errors and
 * the native-string path from metadata *does* match the native-string
 request path
then trust the path from the request. By contrast, we usually prefer the
path from metadata in case there was a pipeline misconfiguration (see
related bug).
Add the ability to read and write a new, unambiguous version of metadata
that always has the path as a native string. To support rolling
upgrades, a new config option is added: meta_version_to_write. This
defaults to 2 to support rolling upgrades without configuration changes,
but the default may change to 3 in a future release.
UpgradeImpact
=============
When upgrading from Swift 2.20.0 or Swift 2.19.1 or earlier, set
 meta_version_to_write = 1
in your keymaster's configuration. Regardless of prior Swift version, set
 meta_version_to_write = 3
after upgrading all proxy servers.
When switching from Python 2 to Python 3, first upgrade Swift while on
Python 2, then upgrade to Python 3.
Change-Id: I00c6693c42c1a0220b64d8016d380d5985339658
Closes-Bug: #1888037
Related-Bug: #1813725
2020-07-29 17:33:54 -07:00
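The three-part heuristic for legacy metadata can be sketched for Python 3 as below. It assumes a WSGI string is UTF-8 bytes decoded as latin-1, which is the usual WSGI convention; the function names are hypothetical and this is not the keymaster's code.

```python
def wsgi_to_native(wsgi_str):
    """Reinterpret a WSGI string (latin-1 decoded bytes) as UTF-8 text."""
    return wsgi_str.encode('latin1').decode('utf8')


def choose_path(meta_path, req_path):
    """Pick which path to trust, following the three-part heuristic."""
    if meta_path == req_path:
        return meta_path
    try:
        native = wsgi_to_native(meta_path)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Cannot be "un-wsgi-fied": keep preferring the metadata path.
        return meta_path
    # Trust the request path only if the un-wsgi-fied metadata matches it.
    return req_path if native == req_path else meta_path
```

When any of the three conditions fails, the sketch falls back to the metadata path, matching the stated preference for metadata in case of pipeline misconfiguration.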