00373dad617db30bda3e8b722ab8518f59e0cb10

4066 Commits

Author SHA1 Message Date
Zuul
14a8797228 Merge "Make test_greater_with_offset not fail on py36" 2018-03-03 23:55:53 +00:00
Tim Burke
afb6cb5835 Try to avoid leaving (killed) long-running rsyncs in the process table
Also, add some guards against a NameError in particularly-bad races.
Change-Id: If90662b6996e25bde74e0a202301b52a1d266e92
Related-Change: Ifd14ce82de1f7ebb636d6131849e0fadb113a701
2018-03-03 21:56:56 +00:00
Zuul
cfb893eb87 Merge "Cleanup for iterators in SegmentedIterable" 2018-03-03 03:21:10 +00:00
Zuul
ceb3c01bf6 Merge "Make statsd errors correspond to 5xx only" 2018-03-03 03:08:41 +00:00
Zuul
95225d3b12 Merge "Solve the zombie process problem of Auditor" 2018-03-03 02:43:43 +00:00
Tim Burke
57b632fbb5 Fix object-server to not 400 all expirer DELETEs
In the related changes, we switched to using
Timestamp.normal representations for the X-If-Delete-At header.
However, the object-server required that the header be an int,
and the trailing '.00000' would trip the
"Bad X-If-Delete-At header value" error handling.
Now, we'll convert both the expirer header and the stored X-Delete-At to
Timestamps, even though we expect them to have no fractional value.
Note that we *could* have changed the expirer to continue sending
headers that are valid ints, but Timestamps are already the normal
Swift-way of passing and comparing times -- we should use that.
Related-Change: Ibf61eb1f767a48cb457dd494e1f7c12acfd205de
Related-Change: Ie82622625d13177e08a363686ec632f63d24f4e9
Change-Id: Ida22c1c8c5bf21bdc72c33e225e75fb750f8444b
2018-03-02 15:25:38 +00:00
Pete Zaitcev
fdaf052d73 Make test_greater_with_offset not fail on py36
Reviewer, beware: we determined that the test was using the
facilities improperly. This patch adjusts the test but does
not fix the code under test.
The time.time() output looks like this:
[zaitcev@lembas swift-tsrep]$ python2
Python 2.7.14 (default, Dec 11 2017, 14:52:53)
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux2
>>> import time
>>> time.time()
1519861559.96239
>>> time.time()
1519861561.046204
>>> time.time()
1519861561.732341
>>>
(it's never beyond 6 digits on py2)
[zaitcev@lembas swift-tsrep]$ python3
Python 3.6.3 (default, Oct 9 2017, 12:07:10)
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
>>> import time
>>> time.time()
1519861541.7662468
>>> time.time()
1519861542.893482
>>> time.time()
1519861546.56222
>>> time.time()
1519861547.3297756
>>>
(can go beyond 6 digits on py3)
When fraction is too long on py3, you get:
>>> now = 1519830570.6949349
>>> now
1519830570.6949348
>>> timestamp = Timestamp(now, offset=1)
>>> timestamp
1519830570.69493_0000000000000001
>>> value = '%f' % now
>>> value
'1519830570.694935'
>>> timestamp > value
False
>>>
Note that the test fails in exactly the same way on py2, if time.time()
returns enough digits. Therefore, rounding changes are not the culprit.
The real problem is the assumption that you can take a float T, print
it with '%f' into S, then do arithmetic on T to get O, convert S, T,
and O into Timestamp, then make comparisons. This does not work,
because rounding happens twice: once when you interpolate %f, and
then again when you construct a Timestamp. The only valid operation is
to accept a timestamp (e.g. from X-Delete-At) given as a floating point
number in a decimal string, and convert it once. Only then can you
do arithmetic to find the expiration.
Change-Id: Ie3b002abbd4734c675ee48a7535b8b846032f9d1
2018-03-01 21:04:42 -06:00
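The first rounding step described in fdaf052d73 can be reproduced with plain floats, without Swift's Timestamp class. The sketch below reuses the float from the commit message; the helper `expiration_from_header` is a hypothetical name used only for illustration and is not part of Swift.

```python
# T: the float from the commit message; S: the same value printed with '%f'.
now = 1519830570.6949349
value = '%f' % now  # '%f' keeps only 6 fractional digits

# The string no longer round-trips to the original float: the value has
# already been rounded once, before any Timestamp is ever constructed.
reparsed = float(value)
drifted = reparsed != now


def expiration_from_header(header_value, delay_seconds):
    """Hypothetical helper: convert the decimal string once, then do
    arithmetic on the parsed number only."""
    return float(header_value) + delay_seconds
```

Because `value` has already been rounded, comparing anything derived from it against anything derived from `now` mixes two different numbers, which is the situation the adjusted test avoids.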
Tim Burke
8b8a2a3406 Tolerate 404s during setUp/tearDown in func tests
A couple times, I've seen tests fail in the gate because we got back a
404 while trying to clean out the test account. The story that gets us
here seems to be:
 - One or more object servers take too long to respond to the initial
 DELETE request, so the test client gets back a 503 and sleeps so
 it can retry.
 - Meanwhile, the servers finish writing their tombstones and want to
 respond 204 (but probably *actually* respond 408 because the proxy
 killed the connection).
 - The test client sends its retry, and since the object servers now
 have tombstones, it gets back a 404.
But the thing is, this is *outside of the test scope* anyway; we're just
trying to get back to a sane state. If it's gone, so much the better!
For an example of this, see the failures on patchset 3 of
https://review.openstack.org/#/c/534978 (which both failed for the same
reason on different tests).
Change-Id: I9ab2fd430d4800f9f55275959a20e30f09d9e1a4
2018-03-01 23:30:00 +00:00
Tim Burke
36c42974d6 py3: Port more CLI tools
Bring under test
 - test/unit/cli/test_dispersion_report.py
 - test/unit/cli/test_info.py and
 - test/unit/cli/test_relinker.py
I've verified that swift-*-info (at least) behave reasonably under
py3, even swift-object-info when there's non-utf8 metadata on the
data/meta file.
Change-Id: Ifed4b8059337c395e56f5e9f8d939c34fe4ff8dd
2018-02-28 21:10:01 +00:00
Zuul
78439d95f4 Merge "py3: port common/memcached.py" 2018-02-28 19:35:15 +00:00
Tim Burke
624b5310b4 py3: port common/wsgi.py
Note that we're punting on configuring socket buffer sizes (for now)
Change-Id: I285a9b521fd0af381a227e0e824bc391817547f4
2018-02-28 12:49:13 -05:00
Kazuhiro MIYAHARA
1fadffeae0 Split expirer methods and parametrize task account
To prepare for implementing a general task queue mode in the expirer,
this patch splits the expirer's methods into smaller ones and parametrizes
the task account. This change will make the expirer's general task queue
patch [1] simpler.
This patch takes the following approaches:
 1: Split methods into smaller ones
 2: Parameterize the task account name to accommodate many task accounts
 in the general task queue
 3: Include task account names in log messages
 4: Skip a task account when the account has no task containers
[1]: https://review.openstack.org/#/c/517389/
Change-Id: I907612f7c258495e9ccc53c1d57de4791b3e7ab7
2018-02-27 22:49:31 +00:00
Kota Tsuyuzaki
9e5f434574 Kill rsync coros when lockup detector tries to kill the process
The replicator on master doesn't propagate the kill signal to the
subprocesses running in its coroutines. As a result, the lockup detector
leaves behind a lot of rsync processes even though it tries to reset the
process. This patch makes the replicator kill rsync procs when the lockup
detector kills the eventlet threads.
Change-Id: Ifd14ce82de1f7ebb636d6131849e0fadb113a701
2018-02-27 19:34:51 +09:00
Zuul
f1f8591c6a Merge "Fix expirer's invalid task object names in unit tests" 2018-02-26 23:27:01 +00:00
Zuul
d0f4fd6db5 Merge "py3: port common/storage_policy.py" 2018-02-26 18:12:32 +00:00
Kazuhiro MIYAHARA
b3f1558acd Fix expirer's invalid task object names in unit tests
Object-expirer's task names should be in the format
"<timestamp>-<account>/<container>/<obj>". In the object-expirer
implementation, ValueError is caught and handled when the expirer's task
objects have invalid names. But in an actual Swift cluster, invalid task
object names are not created, because task objects are created by the
object-server.
However, without the ValueError catch, some unit tests fail,
because the unit tests create invalid task object names.
This patch fixes the invalid task object names in the unit tests. The
ValueError catch remains for unexpected errors, but in that case
the task will be skipped.
This patch will help with refactoring the expirer's task object parsing.
Change-Id: I8fab8fd180481ce9e97c945904c5c89eec037110
2018-02-26 16:10:40 +00:00
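The task name format named in b3f1558acd, "<timestamp>-<account>/<container>/<obj>", can be parsed along the following lines. This is a minimal sketch, not the expirer's actual parsing code; `parse_task_name` is a hypothetical function, and it relies only on the fact that Python raises ValueError when a split yields too few parts to unpack.

```python
def parse_task_name(name):
    """Hypothetical parser for '<timestamp>-<account>/<container>/<obj>'.

    Raises ValueError for names that do not match the format.
    """
    timestamp, target = name.split('-', 1)
    float(timestamp)  # ValueError if the prefix is not numeric
    # Split at most twice so that slashes inside the object name survive.
    account, container, obj = target.split('/', 2)
    return timestamp, account, container, obj
```

A caller that wants to skip invalid tasks, as the commit describes, would wrap the call in `try`/`except ValueError`.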
Tim Burke
748b29ef80 Make If-None-Match:* work properly with 0-byte PUTs
When PUTting an object with `If-None-Match: *`, we rely on 100-continue
support: the proxy checks the responses from all object-servers, and if
any of them respond 412, it closes down the connections. When there's
actual data for the object, this ensures that even nodes that *don't*
respond 412 will hit a ChunkReadTimeout and abort the PUT.
However, if the client does a PUT with a Content-Length of 0, that would
get sent all the way to the object server, which had all the information
it needed to respond 201. After replication, the PUT propagates to the
other nodes and the old object is lost, despite the client receiving a
412 indicating the operation failed.
Now, when PUTting a zero-byte object, switch to a chunked transfer so
the object-server still gets a ChunkReadTimeout.
Change-Id: Ie88e41aca2d59246c3134d743c1531c8e996f9e4
2018-02-26 13:12:44 +00:00
Tim Burke
5cb0869743 py3: port common/memcached.py
Change-Id: I7f04b3977971f0581b04180e5372686d8186346f
2018-02-26 12:39:16 +00:00
Samuel Merritt
e0d1869068 Fix suffix-byte-range responses for zero-byte EC objects.
The object servers are correctly returning 200s for such objects, but
we were misinterpreting the result in the proxy. We had assumed that a
satisfiable byte-range contained at least one byte, which seems
reasonable unless you gaze long into RFC 7233.
Suffix byte ranges (e.g. "bytes=-32123") are not asking for the last N
bytes of an object; they are asking for *up to* the last N bytes, or
the whole thing if fewer than N bytes are available. In the EC
machinery, we had code that assumed "has no bytes" == "unsatisfiable",
which is not true in that specific case. Now we correctly handle a
suffix-byte-range request that is satisfiable but receives zero bytes.
Change-Id: I8295a6c1436f50f86a4c626d87de6bfedd74ab09
Closes-Bug: 1736840
2018-02-26 12:17:53 +00:00
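The suffix-byte-range rule described in e0d1869068 ("up to the last N bytes") can be sketched as a small function. `resolve_suffix_range` is a hypothetical helper written for illustration, not the proxy's code; it returns a half-open (start, stop) pair, and treats only a zero suffix length as unsatisfiable.

```python
def resolve_suffix_range(suffix_length, object_length):
    """Hypothetical helper for a 'bytes=-N' request.

    Returns (start, stop) with stop exclusive, or None if unsatisfiable.
    """
    if suffix_length <= 0:
        # 'bytes=-0' asks for nothing and is not satisfiable.
        return None
    # Up to the last N bytes: clamp at the start of the object.
    start = max(object_length - suffix_length, 0)
    return (start, object_length)
```

For a zero-byte object the result is `(0, 0)`: a satisfiable range that happens to contain no bytes, which is the case the commit fixes.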
Zuul
54509a6791 Merge "Tighten up assertions around expirer's concurrency" 2018-02-26 11:13:12 +00:00
Zuul
353a7ad07b Merge "Remove confusing assertion from expirer's unit test" 2018-02-26 11:13:08 +00:00
Tim Burke
4b19ac7723 py3: port common/storage_policy.py
Change-Id: I7030280a8495628df9ed8edcc8abc31f901da72e
2018-02-26 10:57:41 +00:00
Tim Burke
25540a415e Tighten up assertions around expirer's concurrency
In particular, test that each work item is only done *once*.
Change-Id: I9cc610bffb2aa9a2f2b05f4c49e574ab56d05201
Related-Change: Ic0075a3718face8c509ed0524b63d9171f5b7d7a
2018-02-26 10:37:23 +00:00
Kazuhiro MIYAHARA
532ac9e1c7 Ensure reverting test env if the env is temporarily changed
test_tempurl_keys_hidden_from_acl_readonly temporarily changes a test env
parameter for a container HEAD, then reverts the change afterwards.
But if the HEAD fails with an exception, the change is not reverted.
With the change left in place, some other tests fail even though those tests
have no problems of their own.
This patch ensures the reversion by using try-finally.
Change-Id: I8cd7928da6211e5516992fe9f2bc8e568bcab443
2018-02-23 07:24:12 +00:00
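The try-finally pattern that 532ac9e1c7 relies on looks roughly like this. The function and argument names are hypothetical and stand in for the test env and the container HEAD; the sketch only shows that the original value is restored even when the call raises.

```python
def call_with_temporary_setting(env, key, temp_value, func):
    """Hypothetical helper: set env[key] temporarily, always restore it."""
    original = env[key]
    env[key] = temp_value
    try:
        return func()
    finally:
        # Runs on success and on exception alike.
        env[key] = original
```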
Kazuhiro MIYAHARA
58f5d89066 Remove confusing assertion from expirer's unit test
In test_expirer.TestObjectExpirer.test_process_based_concurrency,
an assertion checks that the expirer executes tasks in round-robin order
for target containers. But the assertion depends on the task object path,
because task assignment to each process depends on the md5 of the task
object path. That dependency makes the assertion confusing.
Now we have test_expirer.TestObjectExpirer.test_round_robin_order, which
was added in [1], so this patch removes the confusing assertion.
This patch will help with refactoring the expirer's task object parsing.
I will push a patch for the refactoring after this patch.
[1]: https://review.openstack.org/#/c/538171
Change-Id: Ic0075a3718face8c509ed0524b63d9171f5b7d7a
2018-02-22 11:05:46 -08:00
Tim Burke
6060af8db9 Add more tests around ObjectExpirer.round_robin_order
Change-Id: I43b5e8d9513fd0566a61ff585dfdc1dde5b28343
Related-Change: Ibf61eb1f767a48cb457dd494e1f7c12acfd205de
2018-02-22 10:26:00 -08:00
Zuul
d296ec8be4 Merge "Refactor expirer's task round robin implementation" 2018-02-22 12:35:07 +00:00
Kazuhiro MIYAHARA
303635348b Refactor expirer's task round robin implementation
Object-expirer changes the order of expiration tasks to avoid deleting
objects in one container continuously.
To make the expirer's task queue update patch [1] easier to review,
this patch refactors the implementation of the order change. In this
patch, the order change is split out into its own function.
In [1], there will be two implementations, one for the legacy task queue
and one for the general task queue. The two implementations have similar
code. This patch helps to avoid copying code between them.
Besides splitting out the function, this patch tries to:
- Separate container iteration and object iteration to avoid the generator
 termination with a (container, None) tuple.
- Use the Timestamp class for delete_timestamp to be consistent with other modules
- Change the yielded delete task object info from a tuple to a dict because it
 includes several pieces of complex info (e.g. task_container, task_object,
 and target_path)
- Fix minor docs and tests that depend on the changes above
[1]: https://review.openstack.org/#/c/517389
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: Ibf61eb1f767a48cb457dd494e1f7c12acfd205de
2018-02-22 18:43:11 +09:00
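The reordering idea in 303635348b, interleaving tasks so that no single container is hit continuously, can be illustrated with a generic round-robin generator. This is a sketch under assumptions: the (container, object) tuple shape and the function name `round_robin` are hypothetical, and this is not the expirer's actual implementation.

```python
from itertools import zip_longest


def round_robin(tasks):
    """Yield (container, obj) tasks interleaved across containers."""
    by_container = {}
    for container, obj in tasks:
        by_container.setdefault(container, []).append((container, obj))
    missing = object()  # marks containers that have run out of tasks
    for batch in zip_longest(*by_container.values(), fillvalue=missing):
        for task in batch:
            if task is not missing:
                yield task
```

Given tasks for containers c1 (three objects) and c2 (two objects), the generator alternates between the two containers until c2 is exhausted, then finishes c1.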
Zuul
ee282c1166 Merge "Fix suffix-byte-range responses for zero-byte replicated objects." 2018-02-20 07:38:10 +00:00
Zuul
db03703443 Merge "Using assertIsNone() instead of assertEqual(None)" 2018-02-19 20:45:18 +00:00
Samuel Merritt
47fed6f2f9 Add handoffs-only mode to DB replicators.
The object reconstructor has a handoffs-only mode that is very useful
when a cluster requires rapid rebalancing, like when disks are nearing
fullness. This mode's goal is to remove handoff partitions from disks
without spending effort on primary partitions. The object replicator
has a similar mode, though it varies in some details.
This commit adds a handoffs-only mode to the account and container
replicators.
Change-Id: I588b151ee65ae49d204bd6bf58555504c15edf9f
Closes-Bug: 1668399
2018-02-16 16:56:13 -08:00
Samuel Merritt
2bfd9c6a9b Make DB replicators ignore non-partition directories
If a cluster operator has some tooling that makes directories in
/srv/node/<disk>/accounts, then the account replicator will treat
those directories as partition dirs and may remove empty
subdirectories contained therein. This wastes time and confuses the
operator.
This commit makes DB replicators skip partition directories whose
names don't look like positive integers. This doesn't completely avoid
the problem since an operator can still use an all-digit name, but it
will skip directories like "tmp21945".
Change-Id: I8d6682915a555f537fc0ce8c39c3d52c99ff3056
2018-02-16 16:56:13 -08:00
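The filtering described in 2bfd9c6a9b, skipping directory names that don't look like integers, can be sketched with a simple name check. `looks_like_partition` is a hypothetical name, and using `str.isdigit()` is an assumption about how such a check might be written, not a quote of the replicator code.

```python
def looks_like_partition(name):
    """Hypothetical check: True only for all-digit directory names."""
    return name.isdigit()


def partition_dirs(names):
    """Keep only the names that pass the check, preserving order."""
    return [name for name in names if looks_like_partition(name)]
```

As the commit notes, an all-digit name created by operator tooling would still pass this check; names like "tmp21945" are skipped.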
Zuul
696d26fedd Merge "py3: port common/ring/ and common/utils.py" 2018-02-16 08:23:37 +00:00
Zuul
ea8df4293f Merge "kill orphans during probe test setup" 2018-02-15 07:22:02 +00:00
Tim Burke
642f79965a py3: port common/ring/ and common/utils.py
I can't imagine us *not* having a py3 proxy server at some point, and
that proxy server is going to need a ring.
While we're at it (and since they were so close anyway), port
* cli/ringbuilder.py,
* common/linkat.py, and
* common/daemon.py
Change-Id: Iec8d97e0ce925614a86b516c4c6ed82809d0ba9b
2018-02-12 06:42:24 +00:00
Zuul
6544ae1848 Merge "Move eventlet patch before call to loadapp" 2018-02-09 22:29:27 +00:00
Zuul
1038ddacae Merge "Fix typos in swift" 2018-02-09 07:35:22 +00:00
baiwenteng
a3d2aaba64 Fix typos in swift
Change-Id: I0982b0046a16fda0a39d9b31402b2e4b3160a5c4
2018-02-09 12:22:08 +08:00
Zuul
07a5f2f8db Merge "Quarantine DB without *_stat row" 2018-02-09 02:05:18 +00:00
Alistair Coles
1f4ebbc990 kill orphans during probe test setup
Orphan processes sometimes cause probe test failures, so
get rid of them before each test.
Change-Id: I4ba6748d30fbb28371f13aa95387c49bc8223402
2018-02-08 16:43:18 -08:00
Thiago da Silva
c9410c7dd4 Move eventlet patch before call to loadapp
Ran into an eventlet bug[0] while integrating Swift/Barbican
in TripleO. It is very similar to a previous bug related
to keystonemiddleware[1]. Suggestion from urllib3[2] is to
patch eventlet "as early as possible". Traceback[3] shows that
urllib3 is being imported before the eventlet patch, so moved
the patch to before the loadapp call.
[0] - http://paste.openstack.org/show/658046/
[1] - https://bugs.launchpad.net/swift/+bug/1662473
[2] - https://github.com/shazow/urllib3/issues/1104
[3] - https://gist.github.com/thiagodasilva/12dad7dc4f940b046dd0863b6f82a78b
Change-Id: I74e580f31349bdefd187cc5d6770a7041a936bef
2018-02-08 18:52:56 -05:00
Ondřej Nový
bfe52a2e35 Quarantine DB without *_stat row
Closes-Bug: #1747689
Change-Id: Ief6bd0ba6cf675edd8ba939a36fb9d90d3f4447f
2018-02-07 19:35:05 +01:00
Tim Burke
5b30c1f811 Fix flakey test_check_delete_headers_sets_delete_at
It was rare (saw it once in 10k runs running locally), but it's
occasionally blown up in the gate [1]. With this, no fails locally even
after 100k runs.
[1] http://logs.openstack.org/11/538011/3/gate/swift-tox-py27/06c06f0/job-output.txt.gz#_2018-02-07_03_29_09_578389
Change-Id: I7701d2db2ec82b48559c5b74a2e08c4403fd5dec
Related-Change: Ia126ad6988f387bbd2d1f5ddff0a56d457a1fc9b
2018-02-07 05:50:12 +00:00
Zuul
4704eeaefb Merge "Fix inconsistency of account info in expirer's unit tests" 2018-02-05 22:25:51 +00:00
Samuel Merritt
98d185905a Cleanup for iterators in SegmentedIterable
We had a pair of large, complicated iterators to handle fetching all
the segment data, and they were hard to read and think about. I tried
to break them out into some simpler pieces:
 * one to handle coalescing multiple requests to the same segment
 * one to handle fetching the bytes from each segment
 * one to check that the download isn't taking too long
 * one to count the bytes and make sure we sent the right number
 * one to catch errors and handle cleanup
It's more nesting, but each level now does just one thing.
Change-Id: If6f5cbd79edeff6ecb81350792449ce767919bcc
2018-02-02 11:30:49 -08:00
Zuul
d6e911c623 Merge "Refactor expirer unit tests" 2018-02-02 06:40:37 +00:00
Zuul
c97459b54a Merge "Remove some cruft from ratelimit tests" 2018-02-01 18:08:19 +00:00
Zuul
82844a3211 Merge "Add support for data segments to SLO and SegmentedIterable" 2018-02-01 12:52:55 +00:00
vxlinux
39910553df Solve the zombie process problem of Auditor
As bug 1743310 reported, if we list the swift processes, we will
see a new zombie process every minute. There may be more than one
zombie process.
The related code is as follows (swift/obj/auditor.py:386~397):
    while pids:
        pid = os.wait()[0]
        # ZBF scanner must be restarted as soon as it finishes
        # unless we're in run-once mode
        if self.conf_zero_byte_fps and pid == zbf_pid and \
                len(pids) > 1 and not once:
            kwargs['device_dirs'] = override_devices
            # sleep between ZBF scanner forks
            self._sleep()
            zbf_pid = self.fork_child(zero_byte_fps=True, **kwargs)
            pids.add(zbf_pid)
        pids.discard(pid)
The pids set includes a zbf pid and one or more pids with mode
(ALL - parallel, objectXXX). If the zbf process is the last one to
finish, then a second zbf process will not be forked. Otherwise,
the zbf process will be forked again. Before forking, the main
process sleeps for self.interval seconds (30 seconds by default).
If a non-zbf process finishes during that time, it becomes a zombie,
because the main process is asleep.
In this solution, we move the sleep from the while loop into the
subprocess, so the main process is no longer blocked in the while loop
and the subprocess is reaped promptly.
Closes-Bug: #1743310
Change-Id: I61c766aa2a1c4bad0247a44a8e78ef38d9f3ae47
2018-02-01 12:30:54 +00:00
Kazuhiro MIYAHARA
8140b7e7ad Fix inconsistency of account info in expirer's unit tests
In the expirer's unit tests, FakeInternalClient instances simulate
the expirer's task queue behavior. But the get_account_info method of
FakeInternalClient returns container count = 1 and object
count = 2, even if it simulates a different count of containers or
objects.
This patch fixes that behavior. The return values of get_account_info
will be equal to the simulated container and object counts.
This patch will make the expirer's task queue upgrade patch [1]
easier to review.
[1]: https://review.openstack.org/#/c/517389
Change-Id: Id5339ea7e10e4577ff22daeb91ec90f08704c98d
2018-02-01 09:43:46 +00:00