Commit Graph

366 Commits

Author SHA1 Message Date
Chinemerem
5281af5cf2 Add object_updater_last stat
Change-Id: I22674f2e887bdeeffe325efd2898fb90faa4235f
2024年12月19日 11:10:52 -08:00
Chinemerem
af57922cd8 Aggregate per-disk recon stats
Address an issue where `OldestAsyncManager` instances created before forking resulted in each child process maintaining its own isolated copy-on-write stats, leaving the parent process with an empty/unused instance. This caused the final `dump_recon` call at the end of `run_forever` to report no meaningful telemetry.
The fix aggregates per-disk recon stats collected by each child process. This is done by loading recon cache data from all devices, consolidating key metrics, and writing the aggregated stats back to the recon cache.
Change-Id: I70a60ae280e4fccc04ff5e7df9e62b18d916421e
2024年12月19日 02:02:41 -08:00
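As a rough illustration of the aggregation step described in the commit above, here is a minimal sketch that merges per-device counters; the file layout, key names and path are assumptions for illustration, not Swift's actual recon cache format:
    import glob
    import json
    import os

    def aggregate_recon_stats(cache_dir='/var/cache/swift'):
        # Hypothetical layout: one JSON stats file per device written by each
        # child process; sum the counters and return the consolidated dict.
        totals = {}
        for path in glob.glob(os.path.join(cache_dir, 'object_updater_*.recon')):
            with open(path) as f:
                per_disk = json.load(f)
            for key, value in per_disk.items():
                totals[key] = totals.get(key, 0) + value
        return totals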
Zuul
f9a3f142ab Merge "Make OldestAsyncPendingTracker timestamp float" 2024年12月09日 23:05:20 +00:00
Chinemerem
83528de743 Make OldestAsyncPendingTracker timestamp float
Previously, the OldestAsyncPendingTracker timestamp was stored as a string. This change updates it to be stored as a float.
UpgradeImpact: This will require an additional change to the recon parsers in order to process the timestamp as a float.
Change-Id: Iba43783e880e0860357ba8b9f0a11f28abf87555
2024年12月09日 10:30:08 -08:00
Zuul
1d240aa86c Merge "Rename probe/test_mixed_policy_upload.py to test_mpu.py" 2024年12月03日 18:09:49 +00:00
Alistair Coles
e751ccfb27 Add probe test for reconciler with object versioning
Verify that the reconciler is able to move objects to a container that
has, or will have, versioning enabled, and versions of the moved object
can subsequently be created.
Change-Id: I019447614ebadbb9e2cc8a18c0369bc16a89c0d9
2024年11月26日 18:09:02 +00:00
Alistair Coles
66e4887c91 Rename probe/test_mixed_policy_upload.py to test_mpu.py
Give this test file a less specific name in anticipation of further
mpu-related probe tests being added.
Change-Id: Iea01928e1daad25f3425f486b9dda4c6fb58510c
2024年11月18日 11:37:48 +00:00
Zuul
89815389d5 Merge "probe tests: Set default timeout for subprocesses" 2024年11月15日 02:37:19 +00:00
Tim Burke
32fbf9eaae probe tests: Set default timeout for subprocesses
There's an upstream eventlet bug that seems to cause process hangs
during an atexit hook; unfortunately, that means that every time we
call "once" in probe tests, we can hang indefinitely waiting for a
process that won't terminate.
See https://github.com/eventlet/eventlet/issues/989
Now, wait with a timeout; if it pops, kill the offending process and
hope for the best. Do this by patching out subprocess.Popen.wait, but
only in probe tests -- this ensures that we won't impact any real
systems, while also ensuring a broad coverage of probe tests (as
opposed to, say, plumbing some new wait_timeout kwarg into all the
Manager call sites).
Closes-Bug: #2088027
Change-Id: I8983eafbb575d73d1654c354815a7de7ae141873
2024年11月14日 10:32:33 -08:00
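A minimal sketch of the wait-with-timeout patching described above; the timeout value and where exactly the patch is installed are assumptions, but the subprocess calls are standard library:
    import subprocess

    DEFAULT_WAIT_TIMEOUT = 300  # assumed value for illustration

    _orig_wait = subprocess.Popen.wait

    def _wait_with_timeout(self, timeout=None):
        # Wait with a finite default timeout; if it pops, kill the child and
        # collect it so a stuck process cannot hang the test run forever.
        try:
            return _orig_wait(self, timeout=timeout or DEFAULT_WAIT_TIMEOUT)
        except subprocess.TimeoutExpired:
            self.kill()
            return _orig_wait(self)

    # Installed only in probe-test setup, never on real systems.
    subprocess.Popen.wait = _wait_with_timeout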
Zuul
71696d3a83 Merge "Remove PrefixLoggerAdapter and SwiftLoggerAdapter" 2024年11月14日 12:51:08 +00:00
Shreeya Deshpande
f88efdb4df Remove PrefixLoggerAdapter and SwiftLoggerAdapter
In order to modernize swift's statsd configuration we're working to
separate it from logging. This change is a prerequisite for the
Related-Change: it simplifies the stdlib base logger instance
wrapping into a single extended SwiftLogAdapter (previously LogAdapter)
which supports all the features swift's servers/daemons need
from our logger instance interface.
Related-Change-Id: I44694b92264066ca427bb96456d6f944e09b31c0
Change-Id: I8988c0add6bb4a65cc8be38f0bf527f141aac48a
2024年11月13日 15:40:41 -05:00
Tim Burke
b262b16d55 probe tests: Run relinker via subprocess
Otherwise, test-run state (eventlet.tpool initialization, in particular)
can bleed over and cause hangs.
Closes-Bug: #2088026
Change-Id: I4f7dd0755b8dc8806d9b9046ac192d94ca705383
2024年11月12日 15:39:19 -08:00
Zuul
7662cde704 Merge "Add oldest failed async pending tracker" 2024年11月05日 08:22:10 +00:00
Chinemerem
0a5348eb48 Add oldest failed async pending tracker
In the past we have had some async pendings that repeatedly fail for months at a time. This patch adds an OldestAsyncPendingTracker class which tracks the oldest async pending updates for each account-container pair. The class maintains timestamps for pending updates per account-container pair, evicts the newest pairs when max_entries is reached, and supports retrieving the N oldest pending updates or calculating the age of the oldest pending update (sketched below).
Change-Id: I6d9667d555836cfceda52708a57a1d29ebd1a80b
2024年11月01日 15:49:53 -07:00
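A simplified sketch of the behaviour described above; the class name matches the commit but the method names and internals here are illustrative assumptions:
    class OldestAsyncPendingTracker:
        """Track the oldest async pending timestamp per (account, container)."""

        def __init__(self, max_entries):
            self.max_entries = max_entries
            self.ac_to_timestamp = {}  # (account, container) -> oldest timestamp

        def track(self, account, container, timestamp):
            key = (account, container)
            if key in self.ac_to_timestamp:
                self.ac_to_timestamp[key] = min(self.ac_to_timestamp[key], timestamp)
                return
            self.ac_to_timestamp[key] = timestamp
            if len(self.ac_to_timestamp) > self.max_entries:
                # Evict the newest pair so the oldest pendings are retained.
                newest = max(self.ac_to_timestamp, key=self.ac_to_timestamp.get)
                del self.ac_to_timestamp[newest]

        def get_n_oldest(self, n):
            return sorted(self.ac_to_timestamp.items(), key=lambda kv: kv[1])[:n]

        def get_oldest_age(self, now):
            if not self.ac_to_timestamp:
                return None
            return now - min(self.ac_to_timestamp.values())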
Anish Kachinthaya
4f69ab3c5d fix x-open-expired 404 on HEAD?part-number reqs
Fixes a bug in the x-open-expired feature where our magic header
was not copied when refetching all manifests, causing 404s on HEAD
requests with a part-number=N query parameter, since the
object-server returns an empty response body and the proxy needs
to refetch. The fix also applies to segment GET requests when the
segments have expired.
Change-Id: If0382d433f73cc0333bb4d0319fe1487b7783e4c
2024年10月18日 18:33:09 +00:00
Matthew Oliver
e8affa7db5 Pass db_state to container-update and async_pending
When the proxy passes the container-update headers to the object server,
it now includes the db_state, which it already had in hand. This will be
written to async_pending and allow the object-updater to know more about
a container rather than just relying on the container_path attribute.
This patch also cleans up the PUT, POST and DELETE _get_update_target
paths, refactoring the call into _backend_requests (only used by these
methods) so it only happens once.
Change-Id: Ie665e5c656c7fb27b45ee7427fe4b07ad466e3e2
2024年07月12日 20:46:14 -05:00
Anish Kachinthaya
3637b1abd9 add bytes of expiring objects to queue entry
The size in bytes from object metadata of expiring objects is stored in
expirer queue entries under the content_type field.
The x-content-type-timestamp taken from object metadata is provided along
with the x-content-type update so the container replicator resolves the
latest content-type and ensures eventual consistency.
UpgradeImpact: During rolling upgrades you should expect expirer queue
entries to continue lacking swift_expirer_bytes= annotations until ALL
object server replicas have been upgraded to new code.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Ie4b25f1bd16def4069878983049b83de06f68e54
2024年06月13日 15:47:51 -05:00
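To illustrate the annotation described above, here is a sketch of encoding and decoding a byte count in the queue entry's content_type value; the exact parameter formatting is inferred from the swift_expirer_bytes= name in the commit message rather than copied from the implementation:
    def embed_expirer_bytes(content_type, nbytes):
        # Annotate the queued content_type with the object's size in bytes.
        return '%s;swift_expirer_bytes=%d' % (content_type, nbytes)

    def extract_expirer_bytes(content_type):
        # Return (plain content_type, byte count or None) from a queue entry.
        base, _, params = content_type.partition(';')
        for param in params.split(';'):
            key, _, value = param.partition('=')
            if key.strip() == 'swift_expirer_bytes' and value:
                return base, int(value)
        return base, None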
Clay Gerrard
a666010aae Lazy import is not needed
There was an abandoned change that made reference to a RecursionError
when running a probe test that imported boto3, which had something to do
with eventlet, ssl and a transitive dependency on requests-mock, but the
fix that actually got merged seemed to depend on another change to
tox.ini that disables requests-mock when we run pytest.
Either way, we already import from boto3 at the top of probe tests and
it's in test-requirements; so we require it to be installed even if you
don't have s3api in your pipeline.
Related-Change: I789b257635c031ac0cb6e4b5980f741e0cb5244d
Related-Change: I2793e335a08ad373c49cbbe6759d4e97cc420867
Related-Change: If14e4d2c1af2efcbc99e9b6fe10973a7eb94d589
Change-Id: Id2662bfc5ef2f21f901f1c98e6389c4cb01818a2
2024年06月04日 12:42:20 -05:00
indianwhocodes
11eb17d3b2 support x-open-expired header for expired objects
If the global configuration option 'enable_open_expired' is set
to true in the config, then a client can make a request with the
header 'x-open-expired' set to true in order to access an object
that has expired, provided it is in its grace period. If this
config flag is set to false (the default), clients cannot access
any expired objects, even with the header.
When a client sets a 'x-open-expired' header to a true value for a
GET/HEAD/POST request the proxy will forward x-backend-open-expired to
storage server. The storage server will allow clients that set
x-backend-open-expired to open and read an object that has not yet
been reaped by the object-expirer, even after the x-delete-at time
has passed.
The header is always ignored when used with temporary URLs.
Co-Authored-By: Anish Kachinthaya <akachinthaya@nvidia.com>
Related-Change: I106103438c4162a561486ac73a09436e998ae1f0
Change-Id: Ibe7dde0e3bf587d77e14808b169c02f8fb3dddb3
2024年04月26日 10:13:40 +01:00
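A condensed sketch of the gating described above, with hypothetical helper and parameter names; the real logic is spread across Swift's proxy and object servers:
    TRUE_VALUES = ('true', '1', 'yes', 'on', 't')

    def maybe_forward_open_expired(req_headers, enable_open_expired, is_tempurl):
        # Translate the client header into the backend header only when the
        # operator has enabled the feature and the request is not a tempurl.
        wants_expired = req_headers.get('X-Open-Expired', '').lower() in TRUE_VALUES
        if enable_open_expired and wants_expired and not is_tempurl:
            req_headers['X-Backend-Open-Expired'] = 'true'
        return req_headers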
Zuul
4d3f9fe952 Merge "sharding: don't replace own_shard_range without an epoch" 2024年02月08日 01:04:58 +00:00
Matthew Oliver
8227f4539c sharding: don't replace own_shard_range without an epoch
We've observed a root container suddenly think it's unsharded when its
own_shard_range is reset. This patch blocks a remote OSR with an epoch
of None from overwriting a local epoched OSR.
The only way we've observed this happen is when a new replica or handoff
node creates a container and its new own_shard_range is created without
an epoch and then replicated to older primaries.
However, if a bad node with a non-epoched OSR is on a primary, its
newer timestamp would prevent pulling the good OSR from its peers, so
it would be left stuck with its bad one.
When this happens expect to see a bunch of:
 Ignoring remote osr w/o epoch: x, from: y
When an OSR comes in from a replica that doesn't have an epoch when
it should, we do a pre-flight check to see if it would remove the epoch
before emitting the error above. We do this because when sharding is
first initiated it's perfectly valid to get OSRs without epochs from
replicas. This is expected and harmless.
Closes-bug: #1980451
Change-Id: I069bdbeb430e89074605e40525d955b3a704a44f
2024年02月07日 13:37:58 -08:00
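An illustrative version of the pre-flight check described above; attribute and parameter names only loosely follow Swift's ShardRange:
    def should_merge_remote_osr(local_osr, remote_osr, source, logger):
        # Block a remote own_shard_range without an epoch from overwriting a
        # local own_shard_range that already has one.
        if (local_osr is not None and local_osr.epoch is not None
                and remote_osr.epoch is None):
            logger.warning('Ignoring remote osr w/o epoch: %s, from: %s',
                           remote_osr, source)
            return False
        return True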
Zuul
486fb23447 Merge "proxy: only use listing shards cache for 'auto' listings" 2024年02月01日 11:59:47 +00:00
Alistair Coles
252f0d36b7 proxy: only use listing shards cache for 'auto' listings
The proxy should NOT read or write to memcache when handling a
container GET that explicitly requests 'shard' or 'object' record
type. A request for 'shard' record type may specify 'namespace'
format, but this request is unrelated to container listings or object
updates and passes directly to the backend.
This patch also removes unnecessary JSON serialisation and
de-serialisation of namespaces within the proxy GET path when a
sharded object listing is being built. The final response body will
contain a list of objects so there is no need to write intermediate
response bodies with a list of namespaces.
Requests that explicitly specify record type of 'shard' will of
course still have the response body with serialised shard dicts that
is returned from the backend.
Change-Id: Id79c156432350c11c52a4004d69b85e9eb904ca6
2024年01月31日 11:02:54 +00:00
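A sketch of the caching decision described above, with simplified plumbing; the cache key and helpers are assumptions:
    def get_listing_namespaces(req, memcache, backend_fetch):
        record_type = req.headers.get('X-Backend-Record-Type', 'auto').lower()
        if record_type != 'auto':
            # Explicit 'shard' or 'object' requests bypass the listing shards
            # cache entirely and go straight to the backend.
            return backend_fetch(req)
        cache_key = 'shard-listing/%s' % req.path  # illustrative key
        cached = memcache.get(cache_key) if memcache else None
        if cached is not None:
            return cached
        namespaces = backend_fetch(req)
        if memcache:
            memcache.set(cache_key, namespaces)
        return namespaces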
Zuul
569525a937 Merge "tests: Get test_handoff_non_durable passing with encryption enabled" 2024年01月18日 08:47:36 +00:00
Tim Burke
7e3925aa9c tests: Fix probe test when encryption is enabled
Change-Id: I94e8cfd154aa058d91255efc87776224a919f572
2024年01月17日 10:19:08 -08:00
Matthew Oliver
03b66c94f4 Proxy: Use namespaces when getting listing/updating shards
With the Related-Change, container servers can return a list of Namespace
objects in response to a GET request. This patch modifies the proxy
to take advantage of this when fetching namespaces. Specifically,
the proxy only needs Namespaces when caching 'updating' or 'listing'
shard range metadata.
In order to allow upgrades to clusters we can't just send
'X-Backend-Record-Type = namespace', as old container servers won't
know how to respond. Instead, proxies send a new header
'X-Backend-Record-Shard-Format = namespace' along with the existing
'X-Backend-Record-Type = shard' header. Newer container servers will
return namespaces, old container servers continue to return full
shard ranges and they are parsed as Namespaces by the new proxy.
This patch refactors _get_from_shards to clarify that it does not
require ShardRange objects. The method is now passed a list of
namespaces, which is parsed from the response body before the method
is called. Some unit tests are also refactored to be more realistic
when mocking _get_from_shards.
Also refactor the test_container tests to better test shard-range and
namespace responses from legacy and modern container servers.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Jianjian Huo <jhuo@nvidia.com>
Related-Change: If152942c168d127de13e11e8da00a5760de5ae0d
Change-Id: I7169fb767525753554a40e28b8c8c2e265d08ecd
2024年01月11日 10:46:53 +00:00
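A sketch of the fetch path described above; the header names come from the commit message, while the Namespace stand-in and parsing are illustrative assumptions:
    import json
    from collections import namedtuple

    # Minimal stand-in for a Namespace: just the attributes the proxy caches.
    Namespace = namedtuple('Namespace', ['name', 'lower'])

    def fetch_namespaces(backend_get):
        # Ask for shard records, signalling that a namespace-only body is
        # acceptable; old container servers ignore the unknown header and
        # return full shard range dicts, which still carry these keys.
        headers = {
            'X-Backend-Record-Type': 'shard',
            'X-Backend-Record-Shard-Format': 'namespace',
        }
        body = backend_get(headers)  # JSON list of dicts in either format
        return [Namespace(item['name'], item['lower'])
                for item in json.loads(body)]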
Jianjian Huo
c073933387 Container-server: add container namespaces GET
The proxy-server makes GET requests to the container server to fetch
full lists of shard ranges when handling object PUT/POST/DELETE and
container GETs, then it only stores the Namespace attributes (lower
and name) of the shard ranges into Memcache and reconstructs the list
of Namespaces based on those attributes. Thus, a namespaces GET
interface can be added into the backend container-server to only
return a list of those Namespace attributes.
On a container server setup which serves a container with ~12000
shard ranges, benchmarking results show that the request rate of the
HTTP GET all namespaces (states=updating) is ~12 op/s, while the
HTTP GET all shard ranges (states=updating) is ~3.2 op/s.
The new namespace GET interface supports most of the headers and
parameters supported by the shard range GET interface, for example
marker, end_marker, include and reverse. Two
exceptions are: 'x-backend-include-deleted' cannot be supported
because there is no way for a Namespace to indicate the deleted state;
the 'auditing' state query parameter is not supported because it is
specific to the sharder which only requests full shard ranges.
Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: If152942c168d127de13e11e8da00a5760de5ae0d
2024年01月11日 10:46:53 +00:00
Alistair Coles
71ad062bc3 proxy: remove x-backend-record-type=shard in object listing
When constructing an object listing from container shards, the proxy
would previously return the X-Backend-Record-Type header with the
value 'shard' that is returned with the initial GET response from the
root container. It didn't break anything but was plainly wrong.
This patch removes the header from object listing responses to requests
that did not have the header. The header value is not set to 'object'
because in a request that value specifically means 'do not recurse
into shards'.
Change-Id: I94c68e5d5625bc8b3d9cd9baa17a33bb35a7f82f
2023年12月11日 14:18:20 +00:00
Alistair Coles
60c04f116b s3api: Stop propagating storage policy to sub-requests
The proxy_logging middleware needs an X-Backend-Storage-Policy-Index
header to populate the storage policy field in logs, and will look in
both request and response headers to find it.
Previously, the s3api middleware would indiscriminately copy the
X-Backend-Storage-Policy-Index from swift backend requests into the
S3Request headers [1]. This works for logging but causes the header
to leak between backend requests [2] and break mixed policy
multipart uploads. This patch sets the X-Backend-Storage-Policy-Index
header on s3api responses rather than requests.
Additionally, the middleware now looks for the
X-Backend-Storage-Policy-Index header in the swift backend request
*and* response headers, in the same way that proxy_logging would
(preferring a response header over a request header). This means that
a policy index is now logged for bucket requests, which only have
X-Backend-Storage-Policy-Index header in their response headers.
The s3api adds the value from the *final* backend request/response
pair to its response headers. Returning the policy index from the
final backend request/response is consistent with swift.backend_path
being set to that backend request's path i.e. proxy_logging will log
the correct policy index for the logged path.
The FakeSwift helper no longer looks in registered object responses
for an X-Backend-Storage-Policy-Index header to update an object
request. Real Swift object responses do not have an
X-Backend-Storage-Policy-Index header. By default, FakeSwift will now
update *all* object requests with an X-Backend-Storage-Policy-Index as
follows:
 - If a matching container HEAD response has been registered then
 any X-Backend-Storage-Policy-Index found with that is used.
 - Otherwise the default policy index is used.
Furthermore, FakeSwift now adds the X-Backend-Storage-Policy-Index
header to the request *after* the request has been captured. Tests
using FakeSwift.calls_with_headers() to make assertions about captured
headers no longer need to make allowance for the header that FakeSwift
added.
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Closes-Bug: #2038459
[1] Related-Change: I5fe5ab31d6b2d9f7b6ecb3bfa246433a78e54808
[2] Related-Change: I40b252446b3a1294a5ca8b531f224ce9c16f9aba
Change-Id: I2793e335a08ad373c49cbbe6759d4e97cc420867
2023年11月14日 15:09:18 +00:00
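The header preference described above amounts to something like this proxy_logging-style helper (names illustrative):
    def policy_index_for_logging(req_headers, resp_headers):
        # Prefer a response header over a request header, so bucket requests
        # (which only carry the policy index in the response) still get a
        # storage policy field in the access log.
        return (resp_headers.get('X-Backend-Storage-Policy-Index')
                or req_headers.get('X-Backend-Storage-Policy-Index'))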
Tim Burke
cb468840b9 Add non-ascii meta values to ssync probe test
Change-Id: I61c7780a84a1f0cee6975da67d08417cf6aa4ea2
2023年08月03日 12:33:56 -07:00
Tim Burke
66e6ee6241 tests: Make dark data probe tests pass with sync_method = ssync
Change-Id: Ic94761e435d85a7fe4bd17a7d341b1655b98b3ff
2023年05月17日 15:25:22 -07:00
Tim Burke
88941ebe46 tests: Fix config numbers in test_versioning_with_metadata_replication
Closes-Bug: #2017021
Change-Id: If422f99a77245b35ab755857f9816c1e401a4e22
2023年04月27日 15:20:07 -07:00
Zuul
2e89e92cb7 Merge "ssync: fix decoding of ts_meta when ts_data has offset" 2023年04月14日 18:18:09 +00:00
Alistair Coles
29414ab146 Allow internal container POSTs to not update put_timestamp
There may be circumstances when an internal client wishes to modify
container sysmeta that is hidden from the user. It is desirable that
this happens without modifying the put-timestamp and therefore the
last-modified time that is reported in responses to client HEADs and
GETs.
This patch modifies the container server so that a POST will not
update the container put_timestamp if an X-Backend-No-Timestamp-Update
header is included with the request and has a truthy value.
Note: there are already circumstances in which container sysmeta is
modified without changing the put_timestamp:
 - PUT requests with shard range content do not update put_timestamp.
 - the sharder updates sysmeta directly via the ContainerBroker without
 modifying put_timestamp.
Change-Id: I835b2dd58bc1d4fb911629e4da2ea4b9697dd21b
2023年03月20日 11:41:27 +00:00
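A sketch of the server-side check described above; the truthiness tuple stands in for Swift's usual config truth handling:
    TRUE_VALUES = ('true', '1', 'yes', 'on', 't')

    def should_update_put_timestamp(post_headers):
        # Internal clients may set X-Backend-No-Timestamp-Update so that a
        # sysmeta-only POST does not bump put_timestamp (and Last-Modified).
        flag = post_headers.get('X-Backend-No-Timestamp-Update', '')
        return str(flag).lower() not in TRUE_VALUES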
Alistair Coles
2fe18b24cd ssync: fix decoding of ts_meta when ts_data has offset
The SsyncSender encodes object file timestamps in a compact form and
the SsyncReceiver decodes the timestamps and compares them to its
object file set.
The encoding represents the meta file timestamp as a delta from the
data file timestamp, NOT INCLUDING the data file timestamp offset.
Previously, the decoding was erroneously calculating the meta file
timestamp as the sum of the delta plus the data file timestamp
INCLUDING the offset.
For example, if the SsyncSender has object file timestamps:
 ts_data = t0_1.data
 ts_meta = t1.meta
then the receiver would erroneously perceive that the sender has:
 ts_data = t0_1.data
 ts_meta = t1_1.meta
As described in the referenced bug report, this erroneous decoding
could cause the SsyncReceiver to request that the SsyncSender sync an
object that is already in sync, which results in a 409 Conflict at the
receiver. The 409 causes the ssync session to terminate, and the same
process repeats on the next attempt.
Closes-Bug: #2007643
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: I74a0aac0ac29577026743f87f4b654d85e8fcc80
2023年02月27日 07:27:32 -06:00
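A worked sketch of the decode fix, using a toy timestamp type (a float plus an integer offset, so t0_1 means t0 with offset 1); real ssync uses Swift's Timestamp encoding, so treat the arithmetic as illustrative:
    from collections import namedtuple

    TS = namedtuple('TS', ['normal', 'offset'])  # toy timestamp

    def encode_meta_delta(ts_data, ts_meta):
        # Sender: the meta timestamp is encoded as a delta from the data
        # timestamp, NOT including the data timestamp's offset.
        return ts_meta.normal - ts_data.normal

    def decode_meta_buggy(ts_data, delta):
        # Old receiver bug: the delta was added to the offset-bearing data
        # timestamp, so t0_1 plus (t1 - t0) was perceived as t1_1.
        return TS(ts_data.normal + delta, ts_data.offset)

    def decode_meta_fixed(ts_data, delta):
        # Fix: add the delta to the data timestamp without its offset,
        # recovering t1 rather than t1_1.
        return TS(ts_data.normal + delta, 0)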
Zuul
247c17b60c Merge "Sharding: No stat updates before CLEAVED state" 2023年02月02日 04:07:55 +00:00
Jianjian Huo
b4124e0cd2 Memcached: add timing stats to set/get and other public functions
Change-Id: If6af519440fb444539e2526ea4dcca0ec0636388
2023年01月06日 10:02:15 -08:00
Jianjian Huo
ec95047339 Sharder: add a new probe test for the case of slow parent sharding.
Probe test to produce a scenario where a parent container is stuck
at sharding because of a gap in its shard ranges, where the gap is
caused by a deleted child shard range which finishes sharding before
its parent does.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I73918776ed91b19ba3fd6deda2fe4ca2820f4dbf
2023年01月03日 13:02:03 -08:00
Zuul
ec47bd12bf Merge "Switch to pytest" 2022年12月13日 19:51:08 +00:00
Matthew Oliver
ece4b04e82 Sharding: No stat updates before CLEAVED state
Once a shard container has been created as part of the sharder cycle it
pulls the shard's own_shard_range, updates the object_count and
bytes_used and pushes this to the root container. The root container can
use these to display the current container stats.
However, it is not until a shard gets to the CLEAVED state that it
holds enough information for its namespace, so before this the number
it returns is incorrect. Further, when we find and create a shard, it
starts out with the number of objects that, at the time, are expected to
go into it. This is a better answer than, say, nothing.
So it's better for the shard to send its current own_shard_range but
not update the stats until it can be authoritative for that answer.
This patch adds a new SHARD_UPDATE_STAT_STATES that tracks which
ShardRange states a shard needs to be in in order to be responsible;
the current definition is:
 SHARD_UPDATE_STAT_STATES = [ShardRange.CLEAVED, ShardRange.ACTIVE,
 ShardRange.SHARDING, ShardRange.SHARDED,
 ShardRange.SHRINKING, ShardRange.SHRUNK]
As we don't want to update the OSR stats and the meta_timestamp,
tombstone updates are also moved to only happen when in a
SHARD_UPDATE_STAT_STATES state.
Change-Id: I838dbba3c791fffa6a36ffdcf73eceeaff718373
2022年12月12日 17:34:53 +11:00
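A sketch of applying that gate when a shard reports to the root; the reporting function and stat keys are illustrative assumptions:
    def build_root_report(own_shard_range, broker_stats, stat_states):
        # Always report the shard's current own_shard_range, but only refresh
        # its object stats (and tombstones) once the shard is in a state from
        # SHARD_UPDATE_STAT_STATES, i.e. CLEAVED or later.
        report = dict(own_shard_range)
        if report['state'] in stat_states:
            report['object_count'] = broker_stats['object_count']
            report['bytes_used'] = broker_stats['bytes_used']
            report['tombstones'] = broker_stats['tombstones']
        return report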
Tim Burke
ef155bd74a Switch to pytest
nose has not seen active development for many years now. With py310, we
can no longer use it due to import errors.
Also update lower constraints
Closes-Bug: #1993531
Change-Id: I215ba0d4654c9c637c3b97953d8659ac80892db8
2022年12月09日 11:38:02 -08:00
Alistair Coles
001d931e6a sharder: update own_sr stats explicitly
Previously, when fetching a shard range from a container DB using
ContainerBroker.get_own_shard_range(), the stats of the returned shard
range were updated as a side effect. However, the stats were not
persisted in the own shard range row in the DB. Often the extra DB
queries to get the stats are unnecessary because we don't need
up-to-date stats in the returned shard range. The behavior also leads
to potential confusion because the object stats of the returned shard
range object are not necessarily consistent with the object stats of
the same shard range in the DB.
This patch therefore removes the stats updating behavior from
get_own_shard_range() and makes the stats updating happen as an
explicit separate operation, when needed. This is also more consistent
with how the tombstone count is updated.
Up-to-date own shard range stats are persisted when a container is
first enabled for sharding, and then each time a shard container
reports its stats to the root container.
Change-Id: Ib10ef918c8983ca006a3740db8cfd07df2dfecf7
2022年12月01日 14:23:37 +00:00
Alistair Coles
2bcf3d1a8e sharder: merge shard shard_ranges from root while sharding
We've seen shards become stuck while sharding because they had
incomplete or stale deleted shard ranges. The root container had more
complete and useful shard ranges into which objects could have been
cleaved, but the shard never merged the root's shard ranges.
While the sharder is auditing shard container DBs it would previously
only merge shard ranges fetched from root into the shard DB if the
shard was shrinking or the shard ranges were known to be children of
the shard. With this patch the sharder will now merge other shard
ranges from root during sharding as well as shrinking.
Shard ranges from root are only merged if they would not result in
overlaps or gaps in the set of shard ranges in the shard DB. Shard
ranges that are known to be ancestors of the shard are never merged,
except the root shard range which may be merged into a shrinking
shard. These checks were not previously applied when merging
shard ranges into a shrinking shard.
The two substantive changes with this patch are therefore:
 - shard ranges from root are now merged during sharding,
 subject to checks.
 - shard ranges from root are still merged during shrinking,
 but are now subjected to checks.
Change-Id: I066cfbd9062c43cd9638710882ae9bd85a5b4c37
2022年11月16日 16:12:32 +00:00
Alistair Coles
a46f2324ab sharder: always merge child shard ranges fetched from root
While the sharder is auditing shard container DBs it would previously
only merge shard ranges fetched from root into the shard DB if the
shard was shrinking; shrinking is the only time when a shard normally
*must* receive sub-shards from the root. With this patch the sharder
will also merge shard ranges fetched from the root if they are known
to be the children of the shard, regardless of the state of the shard.
Children shard ranges would previously only have been merged during
replication with peers of the shard; merging shard-ranges from the
root during audit potentially speeds their propagation to peers that
have yet to replicate.
Change-Id: I57aafc537ff94b081d0e1ea70e7fb7dd3598c61e
2022年09月30日 11:20:23 +01:00
Zuul
a554bb8861 Merge "Fix test.probe.brain CLI" 2022年09月13日 21:47:55 +00:00
Tim Burke
0860db1f60 Fix test.probe.brain CLI
Change-Id: I19716aeb4a4bf7b464928616a3ccb129fff7a7f2
Related-Change: Ib918f10e95970b9f562b88e923c25608b826b83f
2022年09月12日 13:54:20 -07:00
Jianjian Huo
a53270a15a swift-manage-shard-ranges repair: check for parent-child overlaps.
Stuck shard ranges have been seen in production; the root cause was
traced back to s-m-s-r failing to detect parent-child relationships in
overlaps, so it shrank either child shard ranges into parents or
the other way around. A patch has been added to check minimum age before
s-m-s-r performs repair, which will most likely prevent this from
happening again, but we also need to check for parent-child relationships
in overlaps explicitly during repairs. This patch does that: it removes
parent or child shard ranges from donors, and prevents s-m-s-r
from shrinking them into acceptor shard ranges.
Drive-by 1: fixup gap repair probe test.
The probe test is no longer appropriate because we're no longer
allowed to repair parent-child overlaps, so replace the test with a
manually created gap.
Drive-by 2: address probe test TODOs.
The commented assertion would fail because the node filtering
comparison failed to account for the same node having different indexes
when generated for the root versus the shard. Adding a new iterable
function filter_nodes makes the node filtering behave as expected.
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Iaa89e94a2746ba939fb62449e24bdab9666d7bab
2022年09月09日 11:04:43 -07:00
Jianjian Huo
61624ab837 swift-manage-shard-ranges repair: ignore recent overlaps
Add an option for the swift-manage-shard-ranges repair tool to ignore
overlaps where the overlapping shards appear to be recently created,
since this may indicate that they are shards of the parent with
which they overlap.
The new option is --min-shard-age with a default of 14400 seconds.
Change-Id: Ib82219a506732303a1157c2c9e1c452b4a56061b
2022年08月22日 21:45:29 -07:00
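The effect of the option can be sketched as a simple age filter; the timestamp key and the decision shape are assumptions for illustration:
    import time

    MIN_SHARD_AGE_DEFAULT = 14400  # seconds, matching the option's default

    def overlap_old_enough_to_repair(overlapping_shards,
                                     min_shard_age=MIN_SHARD_AGE_DEFAULT):
        # Skip repairing an overlap if any shard in it was created too
        # recently, since it may simply be a child of the range it overlaps.
        now = time.time()
        return all(now - shard['timestamp'] >= min_shard_age
                   for shard in overlapping_shards)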
Alistair Coles
38271142eb sharder: process deleted DBs
It is possible for some replicas of a container DB to be sharded, and
for the container to then be emptied and deleted, before another
replica of the DB has started sharding. Previously, the unsharded
replica would remain in the unsharded state because the sharder would
not process deleted DBs.
This patch modifies the sharder to always process deleted DBs; this
will result in the sharder making some additional DB queries for shard
ranges in order to determine whether any processing is required for
deleted DBs.
Auto-sharding will not act on deleted DBs.
Change-Id: Ia6ad92042aa9a51e3ddefec0b9b8acae44e9e9d7
2022年07月27日 15:48:40 +01:00
Alistair Coles
57f7145f73 sharder: always set state to CLEAVED after cleaving
During cleaving, if the sharder finds that zero object rows are copied
from the parent retiring DB to a cleaved shard DB, and if that shard
DB appears to have been freshly created by the cleaving process, then the
sharder skips replicating that shard DB and does not count the shard
range as part of the batch (see Related-Change).
Previously, any shard range treated in this way would not have its
state moved to CLEAVED but would remain in the CREATED state. However,
cleaving of following shard ranges does continue, leading to anomalous
sets of shard range states, including all other shard ranges moving to
ACTIVE but the skipped range remaining in CREATED (until another
sharder visitation finds object rows and actually replicates the
cleaved shard DB).
These anomalies can be avoided by moving the skipped shard range to
the CLEAVED state. This is exactly what would happen anyway if the
cleaved DB had only one object row copied to it, or if the cleaved DB
had zero object rows copied to it but happened to already exist on
disk.
Related-Change: Id338f6c3187f93454bcdf025a32a073284a4a159
Change-Id: I1ca7bf42ee03a169261d8c6feffc38b53226c97f
2022年07月13日 17:54:06 +01:00