00373dad617db30bda3e8b722ab8518f59e0cb10
4066 Commits
Author | SHA1 | Message
Zuul | 14a8797228 | Merge "Make test_greater_with_offset not fail on py36"

Tim Burke | afb6cb5835 | Try to avoid leaving (killed) long-running rsyncs in the process table

Also, add some guards against a NameError in particularly-bad races.

Change-Id: If90662b6996e25bde74e0a202301b52a1d266e92
Related-Change: Ifd14ce82de1f7ebb636d6131849e0fadb113a701

Zuul | cfb893eb87 | Merge "Cleanup for iterators in SegmentedIterable"

Zuul | ceb3c01bf6 | Merge "Make statsd errors correspond to 5xx only"

Zuul | 95225d3b12 | Merge "Solve the zombie process problem of Auditor"

Tim Burke | 57b632fbb5 | Fix object-server to not 400 all expirer DELETEs

In the related changes, we switched to using Timestamp.normal representations for the X-If-Delete-At header. However, the object-server required that the header be an int, and the trailing '.00000' would trip the "Bad X-If-Delete-At header value" error handling. Now, we convert both the expirer's header and the stored X-Delete-At to Timestamps, even though we expect them to have no fractional part.

Note that we *could* have changed the expirer to continue sending headers that are valid ints, but Timestamps are already the normal Swift way of passing and comparing times -- we should use that.

Related-Change: Ibf61eb1f767a48cb457dd494e1f7c12acfd205de
Related-Change: Ie82622625d13177e08a363686ec632f63d24f4e9
Change-Id: Ida22c1c8c5bf21bdc72c33e225e75fb750f8444b

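For illustration, a minimal sketch of why comparing as Timestamps tolerates the trailing '.00000' while an int conversion trips the 400 path (assumes a Swift checkout so that swift.common.utils.Timestamp is importable; not the object-server's actual code):

```python
from swift.common.utils import Timestamp

# The expirer now sends a normalized timestamp string, while the object has
# X-Delete-At stored as a plain integer string.
header_value = '1519830570.00000'
stored_delete_at = '1519830570'

# int(header_value) would raise ValueError and trigger the "Bad
# X-If-Delete-At header value" handling; comparing as Timestamps treats the
# two representations as the same instant.
assert Timestamp(header_value) == Timestamp(stored_delete_at)
```
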
Pete Zaitcev | fdaf052d73 | Make test_greater_with_offset not fail on py36

Reviewer, beware: we determined that the test was using the facilities improperly. This patch adjusts the test but does not fix the code under test.

The time.time() output looks like this:

    [zaitcev@lembas swift-tsrep]$ python2
    Python 2.7.14 (default, Dec 11 2017, 14:52:53)
    [GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux2
    >>> import time
    >>> time.time()
    1519861559.96239
    >>> time.time()
    1519861561.046204
    >>> time.time()
    1519861561.732341

(it's never beyond 6 digits on py2)

    [zaitcev@lembas swift-tsrep]$ python3
    Python 3.6.3 (default, Oct 9 2017, 12:07:10)
    [GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
    >>> import time
    >>> time.time()
    1519861541.7662468
    >>> time.time()
    1519861542.893482
    >>> time.time()
    1519861546.56222
    >>> time.time()
    1519861547.3297756

(can go beyond 6 digits on py3)

When the fraction is too long on py3, you get:

    >>> now = 1519830570.6949349
    >>> now
    1519830570.6949348
    >>> timestamp = Timestamp(now, offset=1)
    >>> timestamp
    1519830570.69493_0000000000000001
    >>> value = '%f' % now
    >>> value
    '1519830570.694935'
    >>> timestamp > value
    False

Note that the test fails in exactly the same way on py2 if time.time() returns enough digits, so rounding changes are not the culprit. The real problem is the assumption that you can take a float T, print it with '%f' into S, do arithmetic on T to get O, convert S, T, and O into Timestamps, and then make comparisons. This does not work, because rounding happens twice: once when you interpolate with '%f', and again when you construct a Timestamp. The only valid operation is to accept a timestamp (e.g. from X-Delete-At), whether as a floating-point number or a decimal string, and convert it once. Only then can you do arithmetic to find the expiration.

Change-Id: Ie3b002abbd4734c675ee48a7535b8b846032f9d1

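To make the convert-once rule concrete, a hedged sketch of the broken pattern versus the safe one (Timestamp here stands in for swift.common.utils.Timestamp; whether the first comparison actually flips to False depends on how many digits time.time() happens to return, as the transcript above shows):

```python
import time
from swift.common.utils import Timestamp

now = time.time()

# Broken pattern: the same instant is converted twice along different paths
# (once via '%f', once inside Timestamp), so the comparison depends on two
# independent roundings and can come out False.
as_string = '%f' % now
print(Timestamp(now, offset=1) > as_string)

# Convert-once pattern: parse the value a client would actually send exactly
# once, then derive the offset version from that single conversion.
base = Timestamp(as_string)
print(Timestamp(base.normal, offset=1) > base)   # reliably True
```
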
Tim Burke | 8b8a2a3406 | Tolerate 404s during setUp/tearDown in func tests

A couple of times, I've seen tests fail in the gate because we got back a 404 while trying to clean out the test account. The story that gets us here seems to be:

- One or more object servers take too long to respond to the initial DELETE request, so the test client gets back a 503 and sleeps so it can retry.
- Meanwhile, the servers finish writing their tombstones and want to respond 204 (but probably *actually* respond 408 because the proxy killed the connection).
- The test client sends its retry, and since the object servers now have tombstones, it gets back a 404.

But the thing is, this is *outside of the test scope* anyway; we're just trying to get back to a sane state. If it's gone, so much the better!

For an example of this, see the failures on patchset 3 of https://review.openstack.org/#/c/534978 (which both failed for the same reason on different tests).

Change-Id: I9ab2fd430d4800f9f55275959a20e30f09d9e1a4

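A minimal sketch of the kind of best-effort cleanup this describes (the helper name and use of requests are illustrative, not the functional tests' actual code):

```python
import requests

def delete_if_exists(url, token):
    """Best-effort cleanup: a 404 means the object is already gone,
    which is exactly the state we wanted to reach."""
    resp = requests.delete(url, headers={'X-Auth-Token': token})
    if resp.status_code == 404:
        return  # already deleted, possibly by an earlier, retried request
    resp.raise_for_status()
```
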
Tim Burke | 36c42974d6 | py3: Port more CLI tools

Bring under test:

- test/unit/cli/test_dispersion_report.py
- test/unit/cli/test_info.py
- test/unit/cli/test_relinker.py

I've verified that swift-*-info (at least) behave reasonably under py3, even swift-object-info when there's non-utf8 metadata on the data/meta file.

Change-Id: Ifed4b8059337c395e56f5e9f8d939c34fe4ff8dd

Zuul | 78439d95f4 | Merge "py3: port common/memcached.py"

Tim Burke | 624b5310b4 | py3: port common/wsgi.py

Note that we're punting on configuring socket buffer sizes (for now).

Change-Id: I285a9b521fd0af381a227e0e824bc391817547f4

Kazuhiro MIYAHARA | 1fadffeae0 | Split expirer methods and parametrize task account

To prepare for implementing the general task queue mode in the expirer, this patch splits the expirer's methods into smaller ones and parametrizes the task account. This will make the expirer's general task queue patch [1] simpler. This patch takes the following approach:

1. Split methods into smaller ones.
2. Parameterize the task account name, so the general task queue can work with many task accounts.
3. Include task account names in log messages.
4. Skip a task account when it has no task containers.

[1]: https://review.openstack.org/#/c/517389/

Change-Id: I907612f7c258495e9ccc53c1d57de4791b3e7ab7

Kota Tsuyuzaki | 9e5f434574 | Kill rsync coros when lockup detector tries to kill the process

On master, the replicator does not propagate the kill signal to the rsync subprocess running inside the coroutine. Because of that, the lockup detector leaves a lot of rsync processes behind even though it tries to reset the process. This patch makes the replicator kill the rsync procs when the lockup detector kills the eventlet threads.

Change-Id: Ifd14ce82de1f7ebb636d6131849e0fadb113a701

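A rough sketch of the pattern involved (illustrative, not the replicator's actual code): run the subprocess inside the greenthread, and make sure a GreenletExit raised by the lockup detector also terminates the child so no rsync lingers in the process table.

```python
import subprocess

import eventlet
from greenlet import GreenletExit

def run_rsync(args):
    # Spawn rsync and poll it cooperatively; if this greenthread is killed
    # (as the lockup detector does), kill and reap the child too instead of
    # leaving an orphaned rsync behind.
    proc = subprocess.Popen(args)
    try:
        while proc.poll() is None:
            eventlet.sleep(0.1)
        return proc.returncode
    except GreenletExit:
        proc.kill()   # propagate the kill to the subprocess
        proc.wait()   # reap it so it does not linger as a zombie
        raise
```
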
Zuul | f1f8591c6a | Merge "Fix expirer's invalid task object names in unit tests"

Zuul | d0f4fd6db5 | Merge "py3: port common/storage_policy.py"

Kazuhiro MIYAHARA | b3f1558acd | Fix expirer's invalid task object names in unit tests

An object-expirer task name should be in the format "<timestamp>-<account>/<container>/<obj>". In the object-expirer implementation, a ValueError is caught and handled when a task object has an invalid name. In an actual Swift cluster, invalid task object names are never created, because task objects are created by the object-server. However, without the ValueError handling some unit tests fail, because those tests create invalid task object names.

This patch fixes the invalid task object names in the unit tests. The ValueError handling is kept for unexpected errors, but in that case the task is simply skipped. This patch will help with refactoring the expirer's task object parsing.

Change-Id: I8fab8fd180481ce9e97c945904c5c89eec037110

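For reference, a hedged sketch of what parsing that task-name format looks like (the helper is illustrative; the expirer's real parsing lives in swift/obj/expirer.py):

```python
def parse_task_obj(task_obj):
    """Split '<timestamp>-<account>/<container>/<obj>' into its parts.

    Illustrative only: an invalid name raises ValueError, which the expirer
    catches and treats as a task to skip.
    """
    timestamp, target = task_obj.split('-', 1)
    account, container, obj = target.split('/', 2)
    return timestamp, account, container, obj

# parse_task_obj('1519830570-AUTH_test/foo/bar.jpg')
#   -> ('1519830570', 'AUTH_test', 'foo', 'bar.jpg')
# parse_task_obj('not-a-valid-task-name') raises ValueError
```
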
Tim Burke | 748b29ef80 | Make If-None-Match:* work properly with 0-byte PUTs

When PUTting an object with `If-None-Match: *`, we rely on 100-continue support: the proxy checks the responses from all object-servers, and if any of them respond 412, it closes down the connections. When there's actual data for the object, this ensures that even nodes that *don't* respond 412 will hit a ChunkReadTimeout and abort the PUT.

However, if the client does a PUT with a Content-Length of 0, that would get sent all the way to the object server, which had all the information it needed to respond 201. After replication, the PUT propagates to the other nodes and the old object is lost, despite the client receiving a 412 indicating the operation failed.

Now, when PUTting a zero-byte object, switch to a chunked transfer so the object-server still gets a ChunkReadTimeout.

Change-Id: Ie88e41aca2d59246c3134d743c1531c8e996f9e4

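For context, a zero-byte create-only PUT looks roughly like this from the client side (the endpoint and token are placeholders; only the header combination matters here):

```python
import requests

# Hypothetical endpoint and token; the interesting part is a create-only PUT
# ('If-None-Match: *') with an empty body, which before this fix could still
# overwrite an existing object on some nodes despite the 412.
resp = requests.put(
    'http://proxy.example.com/v1/AUTH_test/c/o',
    headers={'X-Auth-Token': 'tk-example', 'If-None-Match': '*'},
    data=b'',
)
print(resp.status_code)  # 412 if the object already exists, 201 otherwise
```
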
Tim Burke | 5cb0869743 | py3: port common/memcached.py

Change-Id: I7f04b3977971f0581b04180e5372686d8186346f

Samuel Merritt | e0d1869068 | Fix suffix-byte-range responses for zero-byte EC objects.

The object servers are correctly returning 200s for such objects, but we were misinterpreting the result in the proxy. We had assumed that a satisfiable byte-range contained at least one byte, which seems reasonable unless you gaze long into RFC 7233. Suffix byte ranges (e.g. "bytes=-32123") are not asking for the last N bytes of an object; they are asking for *up to* the last N bytes, or the whole thing if fewer than N bytes are available.

In the EC machinery, we had code that assumed "has no bytes" == "unsatisfiable", which is not true in that specific case. Now we correctly handle a suffix-byte-range request that is satisfiable but receives zero bytes.

Change-Id: I8295a6c1436f50f86a4c626d87de6bfedd74ab09
Closes-Bug: 1736840

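A small worked example of the RFC 7233 suffix-range rule the message describes (plain Python for illustration, not the proxy's code):

```python
def resolve_suffix_range(suffix_len, object_len):
    """Return the (start, end) offsets a 'bytes=-<suffix_len>' request
    resolves to: up to the last suffix_len bytes of the object."""
    if suffix_len == 0:
        return None  # 'bytes=-0' is unsatisfiable
    start = max(object_len - suffix_len, 0)
    return start, object_len - 1

print(resolve_suffix_range(32123, 10))  # (0, 9): the whole (tiny) object
print(resolve_suffix_range(32123, 0))   # (0, -1): satisfiable, but zero bytes to send
```
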
Zuul | 54509a6791 | Merge "Tighten up assertions around expirer's concurrency"

Zuul | 353a7ad07b | Merge "Remove confusing assertion from expirer's unit test"

Tim Burke | 4b19ac7723 | py3: port common/storage_policy.py

Change-Id: I7030280a8495628df9ed8edcc8abc31f901da72e

Tim Burke | 25540a415e | Tighten up assertions around expirer's concurrency

In particular, test that each work item is only done *once*.

Change-Id: I9cc610bffb2aa9a2f2b05f4c49e574ab56d05201
Related-Change: Ic0075a3718face8c509ed0524b63d9171f5b7d7a

Kazuhiro MIYAHARA | 532ac9e1c7 | Ensure reverting test env if the env is temporarily changed

test_tempurl_keys_hidden_from_acl_readonly temporarily changes a test env parameter for a container HEAD and reverts the change afterwards. But if the HEAD fails with an exception, the change is never reverted, and with the non-reverted change some other tests fail even though they have no problems of their own. This patch ensures the reversion by using try-finally.

Change-Id: I8cd7928da6211e5516992fe9f2bc8e568bcab443

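The shape of the fix, sketched (the attribute names are placeholders, not the functional test's actual fields):

```python
def head_with_readonly_creds(env):
    # Temporarily swap in read-only credentials for the container HEAD, and
    # guarantee the original setting comes back even if the HEAD raises.
    original_token = env.conn.auth_token          # placeholder attributes
    env.conn.auth_token = env.readonly_token
    try:
        return env.container.info()               # the HEAD that may fail
    finally:
        env.conn.auth_token = original_token
```
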
Kazuhiro MIYAHARA | 58f5d89066 | Remove confusing assertion from expirer's unit test

In test_expirer.TestObjectExpirer.test_process_based_concurrency, an assertion checks that the expirer executes tasks in round-robin order across target containers. But the assertion depends on the task object paths, because task assignment for each process depends on the md5 of the task object path, and that dependency makes the assertion confusing. We now have test_expirer.TestObjectExpirer.test_round_robin_order, added in [1], so this patch removes the confusing assertion.

This patch will help to refactor the expirer's task object parsing; I will push a patch for that refactoring after this one.

[1]: https://review.openstack.org/#/c/538171

Change-Id: Ic0075a3718face8c509ed0524b63d9171f5b7d7a

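The md5-based assignment the message refers to works roughly like this (a sketch of the idea, not the expirer's exact code):

```python
import hashlib

def is_assigned_to_me(task_obj, processes, process):
    # Each expirer process claims the tasks whose md5 digest maps to its
    # index, so which process handles a task depends on the task object's
    # full path -- which is why renaming tasks perturbed the old assertion.
    digest = int(hashlib.md5(task_obj.encode('utf8')).hexdigest(), 16)
    return digest % processes == process

# With 2 processes, process 0 handles only the tasks whose digest is even:
print(is_assigned_to_me('1519830570-AUTH_test/foo/bar', 2, 0))
```
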
Tim Burke | 6060af8db9 | Add more tests around ObjectExpirer.round_robin_order

Change-Id: I43b5e8d9513fd0566a61ff585dfdc1dde5b28343
Related-Change: Ibf61eb1f767a48cb457dd494e1f7c12acfd205de

Zuul | d296ec8be4 | Merge "Refactor expirer's task round robin implementation"

Kazuhiro MIYAHARA | 303635348b | Refactor expirer's task round robin implementation

The object-expirer changes the order of expiration tasks to avoid deleting objects in one container continuously. To make the expirer's task queue update patch [1] easier to review, this patch refactors the implementation of that reordering by pulling it out into its own function. In [1] there will be two implementations, one for the legacy task queue and one for the general task queue, with very similar code; dividing out the function helps avoid duplicating that code between the two.

Besides extracting the function, this patch also:

- Separates container iteration from object iteration, so the generator no longer has to terminate with a (container, None) tuple.
- Uses the Timestamp class for delete_timestamp, to be consistent with other modules.
- Changes the yielded delete task info from a tuple to a dict, because it carries several pieces of information (e.g. task_container, task_object, and target_path).
- Fixes minor docs and tests that depend on the changes above.

[1]: https://review.openstack.org/#/c/517389

Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: Ibf61eb1f767a48cb457dd494e1f7c12acfd205de

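A condensed sketch of the round-robin reordering described above (illustrative only: per the message, the real method yields dicts and works incrementally on an iterator, while this version buffers everything just to show the interleaving):

```python
from collections import defaultdict, deque

def round_robin_order(tasks):
    """Reorder (container, obj) tasks so that consecutive tasks come from
    different containers whenever possible."""
    buckets = defaultdict(deque)
    for container, obj in tasks:
        buckets[container].append(obj)
    queues = deque(buckets.items())
    while queues:
        container, objs = queues.popleft()
        yield container, objs.popleft()
        if objs:
            queues.append((container, objs))  # re-queue containers with work left

tasks = [('c1', 'o1'), ('c1', 'o2'), ('c2', 'o3'), ('c2', 'o4')]
print(list(round_robin_order(tasks)))
# [('c1', 'o1'), ('c2', 'o3'), ('c1', 'o2'), ('c2', 'o4')]
```
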
Zuul | ee282c1166 | Merge "Fix suffix-byte-range responses for zero-byte replicated objects."

Zuul | db03703443 | Merge "Using assertIsNone() instead of assertEqual(None)"

Samuel Merritt | 47fed6f2f9 | Add handoffs-only mode to DB replicators.

The object reconstructor has a handoffs-only mode that is very useful when a cluster requires rapid rebalancing, like when disks are nearing fullness. This mode's goal is to remove handoff partitions from disks without spending effort on primary partitions. The object replicator has a similar mode, though it varies in some details.

This commit adds a handoffs-only mode to the account and container replicators.

Change-Id: I588b151ee65ae49d204bd6bf58555504c15edf9f
Closes-Bug: 1668399

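Schematically, "handoffs only" means skipping any partition for which the local device is a primary (a sketch under assumed shapes for the ring lookup and the option, not the db_replicator change itself):

```python
def should_replicate(partition, local_device, ring, handoffs_only):
    """Decide whether to work on a partition during this pass.

    A partition is a handoff for this device if the device is not one of the
    partition's primary nodes; in handoffs-only mode we replicate (and then
    remove) only those, skipping primaries entirely.
    """
    primaries = {node['device'] for node in ring.get_part_nodes(int(partition))}
    is_handoff = local_device not in primaries
    if handoffs_only:
        return is_handoff
    return True
```
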
Samuel Merritt | 2bfd9c6a9b | Make DB replicators ignore non-partition directories

If a cluster operator has some tooling that makes directories in /srv/node/<disk>/accounts, then the account replicator will treat those directories as partition dirs and may remove empty subdirectories contained therein. This wastes time and confuses the operator.

This commit makes DB replicators skip partition directories whose names don't look like positive integers. This doesn't completely avoid the problem, since an operator can still use an all-digit name, but it will skip directories like "tmp21945".

Change-Id: I8d6682915a555f537fc0ce8c39c3d52c99ff3056

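The check is conceptually as simple as this (a sketch, not the exact db_replicator code; the datadir path is an example):

```python
import os

def looks_like_partition(name):
    # Partition directories are named with non-negative integers ('0', '1234');
    # anything else ('tmp21945', 'lost+found') is operator tooling to skip.
    return name.isdigit()

datadir = '/srv/node/sdb1/accounts'
partitions = [d for d in os.listdir(datadir) if looks_like_partition(d)]
```
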
Zuul | 696d26fedd | Merge "py3: port common/ring/ and common/utils.py"

Zuul | ea8df4293f | Merge "kill orphans during probe test setup"

Tim Burke | 642f79965a | py3: port common/ring/ and common/utils.py

I can't imagine us *not* having a py3 proxy server at some point, and that proxy server is going to need a ring. While we're at it (and since they were so close anyway), also port:

* cli/ringbuilder.py
* common/linkat.py
* common/daemon.py

Change-Id: Iec8d97e0ce925614a86b516c4c6ed82809d0ba9b

Zuul | 6544ae1848 | Merge "Move eventlet patch before call to loadapp"

Zuul | 1038ddacae | Merge "Fix typos in swift"

baiwenteng | a3d2aaba64 | Fix typos in swift

Change-Id: I0982b0046a16fda0a39d9b31402b2e4b3160a5c4

Zuul | 07a5f2f8db | Merge "Quarantine DB without *_stat row"

Alistair Coles | 1f4ebbc990 | kill orphans during probe test setup

Orphaned processes sometimes cause probe test failures, so get rid of them before each test.

Change-Id: I4ba6748d30fbb28371f13aa95387c49bc8223402

Thiago da Silva | c9410c7dd4 | Move eventlet patch before call to loadapp

Ran into an eventlet bug [0] while integrating Swift and Barbican in TripleO. It is very similar to a previous bug related to keystonemiddleware [1]. The suggestion from urllib3 [2] is to patch eventlet "as early as possible". The traceback [3] shows that urllib3 is being imported before the eventlet patch, so this moves the patch to before the loadapp call.

[0] http://paste.openstack.org/show/658046/
[1] https://bugs.launchpad.net/swift/+bug/1662473
[2] https://github.com/shazow/urllib3/issues/1104
[3] https://gist.github.com/thiagodasilva/12dad7dc4f940b046dd0863b6f82a78b

Change-Id: I74e580f31349bdefd187cc5d6770a7041a936bef

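The ordering constraint is the whole fix; roughly (a sketch of the pattern, with an example config path, not Swift's actual wsgi.py):

```python
import eventlet

# Patch the standard library before anything (directly or indirectly) imports
# urllib3, httplib, etc. Only then load the WSGI pipeline, which may pull in
# keystonemiddleware or Barbican clients that use those modules.
eventlet.monkey_patch()

from paste.deploy import loadapp  # noqa: E402 -- imported after patching on purpose

app = loadapp('config:/etc/swift/proxy-server.conf')
```
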
Ondřej Nový | bfe52a2e35 | Quarantine DB without *_stat row

Closes-Bug: #1747689
Change-Id: Ief6bd0ba6cf675edd8ba939a36fb9d90d3f4447f

Tim Burke | 5b30c1f811 | Fix flakey test_check_delete_headers_sets_delete_at

It was rare (saw it once in 10k runs locally), but it has occasionally blown up in the gate [1]. With this fix, there are no failures locally even after 100k runs.

[1] http://logs.openstack.org/11/538011/3/gate/swift-tox-py27/06c06f0/job-output.txt.gz#_2018年02月07日_03_29_09_578389

Change-Id: I7701d2db2ec82b48559c5b74a2e08c4403fd5dec
Related-Change: Ia126ad6988f387bbd2d1f5ddff0a56d457a1fc9b

Zuul | 4704eeaefb | Merge "Fix inconsistency of account info in expirer's unit tests"

Samuel Merritt | 98d185905a | Cleanup for iterators in SegmentedIterable

We had a pair of large, complicated iterators to handle fetching all the segment data, and they were hard to read and think about. I tried to break them out into some simpler pieces:

* one to handle coalescing multiple requests to the same segment
* one to handle fetching the bytes from each segment
* one to check that the download isn't taking too long
* one to count the bytes and make sure we sent the right number
* one to catch errors and handle cleanup

It's more nesting, but each level now does just one thing.

Change-Id: If6f5cbd79edeff6ecb81350792449ce767919bcc

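The layering reads roughly like this (a schematic sketch of generator composition; the function names and the segment API are placeholders, not the SLO middleware's actual methods):

```python
def fetch_bytes(segments):
    # Innermost layer: yield the raw chunks for each segment.
    for seg in segments:
        for chunk in seg.read_chunks():  # placeholder segment API
            yield chunk

def watch_time(chunks, check_deadline):
    # Abort if the overall download is taking too long.
    for chunk in chunks:
        check_deadline()
        yield chunk

def count_bytes(chunks, expected):
    # Make sure we sent exactly the number of bytes we promised.
    sent = 0
    for chunk in chunks:
        sent += len(chunk)
        yield chunk
    if sent != expected:
        raise ValueError('sent %d bytes, expected %d' % (sent, expected))

def app_iter(segments, check_deadline, expected):
    # Outermost layer: compose the pieces; each one does just one thing.
    return count_bytes(watch_time(fetch_bytes(segments), check_deadline),
                       expected)
```
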
Zuul | d6e911c623 | Merge "Refactor expirer unit tests"

Zuul | c97459b54a | Merge "Remove some cruft from ratelimit tests"

Zuul | 82844a3211 | Merge "Add support for data segments to SLO and SegmentedIterable"

vxlinux | 39910553df | Solve the zombie process problem of Auditor

As reported in bug 1743310, if we list the Swift processes we will see a zombie process every minute, and there may be more than one. The related code (swift/obj/auditor.py:386-397) is:

    while pids:
        pid = os.wait()[0]
        # ZBF scanner must be restarted as soon as it finishes
        # unless we're in run-once mode
        if self.conf_zero_byte_fps and pid == zbf_pid and \
                len(pids) > 1 and not once:
            kwargs['device_dirs'] = override_devices
            # sleep between ZBF scanner forks
            self._sleep()
            zbf_pid = self.fork_child(zero_byte_fps=True, **kwargs)
            pids.add(zbf_pid)
        pids.discard(pid)

The pids set includes a ZBF pid and one or more pids in (ALL - parallel, objectXXX) mode. If the ZBF process is the last one to finish, a second ZBF process is not forked. Otherwise the ZBF process is forked again, but before the fork the main process sleeps for self.interval seconds (30 seconds by default); if a non-ZBF process finishes during that time, it becomes a zombie, because the main process is asleep and cannot reap it.

In this solution, we move the sleep from the while loop into the subprocess, so the main process is no longer blocked in the loop and the subprocesses are reaped in time.

Closes-Bug: #1743310
Change-Id: I61c766aa2a1c4bad0247a44a8e78ef38d9f3ae47

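A reduced sketch of why sleeping in the parent creates zombies and how moving the pacing sleep into the child avoids it (illustrative only, not the auditor's actual patch):

```python
import os
import time

def child(interval):
    # Fixed pattern: the pacing sleep lives in the child, so the parent can
    # stay blocked in wait() and reap exiting children immediately.
    time.sleep(interval)
    # ... do the (zero-byte-file) audit pass here ...
    os._exit(0)

def parent(num_children, interval=30):
    pids = set()
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            child(interval)
        pids.add(pid)
    while pids:
        # The problem pattern slept *here* between re-forks, leaving any child
        # that exited during the sleep as a zombie; reaping promptly avoids it.
        pid, _status = os.waitpid(-1, 0)
        pids.discard(pid)
```
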
Kazuhiro MIYAHARA | 8140b7e7ad | Fix inconsistency of account info in expirer's unit tests

In expirer's unit tests, FakeInternalClient instances simulates expirer's task queue behavior. But get_account_info method of the FakeInternalClient returns container count = 1 and object count = 2, even if it simulate different count of containers or objects. This patch fixes the behavior. The return values of get_account_info will be equal to simulated container and object counts. This patch will make review for expirer's task queue upgrade patch [1] more easy. [1]: https://review.openstack.org/#/c/517389 Change-Id: Id5339ea7e10e4577ff22daeb91ec90f08704c98d |