00373dad617db30bda3e8b722ab8518f59e0cb10
219 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
| Tim Burke | 5652dec43b | container-updater: Always report zero objects/bytes used for shards. Otherwise, a sharded container AUTH_test/sharded will have its stats included in the totals for both AUTH_test *and* .shards_AUTH_test. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Change-Id: I7fa74e13347601c5f44fd7e6cf65656cc3ebc2c5 | |
| Zuul | c568b4b100 | Merge "Resolve TODO's in test/probe/test_sharder.py" | |
| Alistair Coles | a59c5e3bae | Resolve TODO's in test/probe/test_sharder.py. Resolve outstanding TODO's. One TODO is removed because there isn't an easy way to arrange for an async pending to be targeted at a shard container. Change-Id: I0b003904f73461ddb995b2e6a01e92f14283278d | |
| Zuul | ec066392b5 | Merge "Make If-None-Match:* work properly with 0-byte PUTs" | |
| Zuul | 9d2a1a1d14 | Merge "Make the decision between primary/handoff sets more obvious" | |
| Tim Burke | 8c386fff40 | Make the decision between primary/handoff sets more obvious. Change-Id: I419de59df3317d67c594fe768f5696de24148280 | |
| Alistair Coles | 37ee89e47a | Avoid premature shrinking in sharder probe test. Previously test_misplaced_object_movement() deleted objects from both shards and then relied on the override-partitions option to selectively run the sharder on root or shard containers and thereby control when each shard range was identified for shrinking. This approach is flawed when the second shard container lands in the same partition as the root: running the sharder on the empty second shard's partition would also cause the sharder to process the root and identify the second shard for shrinking, resulting in premature shrinking of the second shard. Now, objects are only deleted from each shard range when that shard is due to shrink. Change-Id: I9f51621e8414e446e4d3f3b5027f6c40e01192c3 Drive-by: use the run_sharders() helper more often. | |
| Alistair Coles | c35285f14b | Use correct policy when faking misplaced objects in probe test. Before, merge_objects() always used storage policy index of 0 when inserting a fake misplaced object into a shard container. If the shard broker had a different policy index then the misplaced object would not show in listings, causing test_misplaced_object_movement() to fail. This test bug might be exposed by having policy index 0 be an EC policy, since the probe test requires a replication policy and would therefore choose a non-zero policy index. The fix is simply to specify the shard's policy index when inserting the fake object. Change-Id: Iec3f8ec29950220bb1b2ead9abfdfb1a261517d6 | |
| Matthew Oliver | 2641814010 | Add sharder daemon, manage_shard_ranges tool and probe tests. The sharder daemon visits container dbs and when necessary executes the sharding workflow on the db. The workflow is, in overview: perform an audit of the container for sharding purposes; move any misplaced objects that do not belong in the container to their correct shard; move shard ranges from FOUND state to CREATED state by creating shard containers; move shard ranges from CREATED to CLEAVED state by cleaving objects to shard dbs and replicating those dbs. By default this is done in batches of 2 shard ranges per visit. Additionally, when the auto_shard option is True (NOT yet recommended in production), the sharder will identify shard ranges for containers that have exceeded the threshold for sharding, and will also manage the sharding and shrinking of shard containers. The manage_shard_ranges tool provides a means to manually identify shard ranges and merge them to a container in order to trigger sharding. This is currently the recommended way to shard a container. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Co-Authored-By: Tim Burke <tim.burke@gmail.com> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f | |
| Alistair Coles | 9d742b85ad | Refactoring, test infrastructure changes and cleanup ...in preparation for the container sharding feature. Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Co-Authored-By: Tim Burke <tim.burke@gmail.com> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I4455677abb114a645cff93cd41b394d227e805de | |
| Tim Burke | b640631daf | Apply remote metadata in _handle_sync_response. We've already got it in the response, so we may as well apply it now rather than wait for the other end to get around to running its replicators. Change-Id: Ie36a6dd075beda04b9726dfa2bba9ffed025c9ef | |
| Tim Burke | 748b29ef80 | Make If-None-Match:* work properly with 0-byte PUTs. When PUTting an object with `If-None-Match: *`, we rely on 100-continue support: the proxy checks the responses from all object-servers, and if any of them respond 412, it closes down the connections. When there's actual data for the object, this ensures that even nodes that *don't* respond 412 will hit a ChunkReadTimeout and abort the PUT. However, if the client does a PUT with a Content-Length of 0, that would get sent all the way to the object server, which had all the information it needed to respond 201. After replication, the PUT propagates to the other nodes and the old object is lost, despite the client receiving a 412 indicating the operation failed. Now, when PUTting a zero-byte object, switch to a chunked transfer so the object-server still gets a ChunkReadTimeout (a client-side sketch of this conditional PUT appears after the table). Change-Id: Ie88e41aca2d59246c3134d743c1531c8e996f9e4 | |
| Alistair Coles | 1f4ebbc990 | kill orphans during probe test setup. Orphan processes sometimes cause probe test failures, so get rid of them before each test. Change-Id: I4ba6748d30fbb28371f13aa95387c49bc8223402 | |
| Samuel Merritt | 745581ff2f | Don't make async_pendings during object expiration. After deleting an object, the object expirer deletes the corresponding row from the expirer queue by making DELETE requests directly to the container servers. The same thing happens after attempting to delete an object, but failing because the object has already been deleted. If the DELETE requests fail, then the expirer will encounter that row again on its next pass and retry the DELETE at that time. Therefore, it is not necessary for the object server to write an async_pending for that queue row's deletion. Currently, however, two of the object servers do write such async_pendings. Given Rc container replicas, that's 2 * Rc updates from async_pendings and another Rc from the object expirer directly. Given a typical Rc of 3, that's 9 container updates per expiring object. This commit makes the object server write no async_pendings for DELETE requests coming from the object expirer. This reduces the number of container server requests to Rc (typically 3), all issued directly from the object expirer. Closes-Bug: 1076202 Change-Id: Icd63c80c73f864d2561e745c3154fbfda02bd0cc | |
| Clay Gerrard | 7afc6a06ee | Remove un-needed hack in probetest. If you ran this probe test with ssync before the related change it would demonstrate the related bug. The hack isn't harmful, but it isn't needed anymore. Related-Change-Id: I7f90b732c3268cb852b64f17555c631d668044a8 Related-Bug: 1652323 Change-Id: I09e3984a0500a0f4eceec392e7970b84070a5b39 | |
| Clay Gerrard | 1d5cf3e730 | add symlink to probetest for reconciler. Change-Id: Ib2c5616f2965ab92b1c76d573e869206c91464c6 | |
| Robert Francis | 99b89aea10 | Symlink implementation. Add symbolic link ("symlink") object support to Swift. This object will reference another object. GET and HEAD requests for a symlink object will operate on the referenced object. DELETE and PUT requests for a symlink object will operate on the symlink object, not the referenced object, and will delete or overwrite it, respectively. POST requests are *not* forwarded to the referenced object and should be sent directly. POST requests sent to a symlink object will result in a 307 Error. Historical information on the symlink design can be found here: https://github.com/openstack/swift-specs/blob/master/specs/in_progress/symlinks.rst and https://etherpad.openstack.org/p/swift_symlinks Co-Authored-By: Thiago da Silva <thiago@redhat.com> Co-Authored-By: Janie Richling <jrichli@us.ibm.com> Co-Authored-By: Kazuhiro MIYAHARA <miyahara.kazuhiro@lab.ntt.co.jp> Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp> Change-Id: I838ed71bacb3e33916db8dd42c7880d5bb9f8e18 Signed-off-by: Thiago da Silva <thiago@redhat.com> | |
| Steve Kowalik | 5a06e3da3b | No longer import nose. Since Python 2.7, unittest in the standard library has included multiple facilities for skipping tests via decorators as well as an exception. Switch to that directly, rather than importing nose. Change-Id: I4009033473ea24f0d0faed3670db844f40051f30 | |
| Zuul | 70a47b3187 | Merge "Return 404 on a GET if tombstone is newer" | |
| Clay Gerrard | feee399840 | Use check_drive consistently. We added check_drive to the account/container servers to unify how all the storage wsgi servers treat device dirs/mounts. This pushes that unification down into the consistency engine. Drive-by: use FakeLogger less; clean up some repetition in the probe utility for device re-"mounting". Related-Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764 Change-Id: I941ffbc568ebfa5964d49964dc20c382a5e2ec2a | |
| Thiago da Silva | 8d88209537 | Return 404 on a GET if tombstone is newer. Currently the proxy keeps iterating through the connections in the hope of finding a success even if it has already found a tombstone (404). This change updates the code to compare the timestamps of a 200 and a 404: if the tombstone is newer, then it should be returned, instead of returning a stale 200. Closes-Bug: #1560574 Co-Authored-By: Tim Burke <tim.burke@gmail.com> Change-Id: Ia81d6832709d18fe9a01ad247d75bf765e8a89f4 Signed-off-by: Thiago da Silva <thiago@redhat.com> | |
| Romain LE DISEZ | e199192cae | Replace replication_one_per_device by custom count. This commit replaces the boolean replication_one_per_device with an integer replication_concurrency_per_device. The new configuration parameter is passed to utils.lock_path(), which now accepts as an argument a limit for the number of locks that can be acquired for a specific path. Instead of trying to lock path/.lock, utils.lock_path() now tries to lock files path/.lock-X, where X is in the range (0, N), N being the limit for the number of locks allowed for the path. The default value of the limit is set to 1 (a generic sketch of this locking technique appears after the table). Change-Id: I3c3193344c7a57a8a4fc7932d1b10e702efd3572 | |
| Kota Tsuyuzaki | 1e79f828ad | Remove all post_as_copy related code and configs. It was deprecated, and we discussed this topic at the Denver PTG for the Queens cycle. The main motivation for this work is that the deprecated post_as_copy option and its gate block future symlink work. Change-Id: I411893db1565864ed5beb6ae75c38b982a574476 | |
| Jenkins | 8ca5bf2364 | Merge "Add probe test for ssync of unexpired metadata to an expired object" | |
| Alistair Coles | e109c7800f | Add probe test for ssync of unexpired metadata to an expired object. Verify that metadata can be sync'd to a frag that has missed a POST and consequently that frag appears to be expired, when in fact the POST removed the X-Delete-At header. Tests the fix added by the Related-Change. Related-Bug: #1683689 Related-Change: I919994ead2b20dbb6c5671c208823e8b7f513715 Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I9af9fc26098893db4043cc9a8d05d772772d4259 | |
| Tim Burke | 00ca1ce6fe | Tolerate swiftclient *not* mutating args. Change-Id: If82fe9e1d2da8c5122881f34dfbaaa7944c66265 Related-Change: Ia1638c216eff9db6fbe416bc0570c27cfdcfe730 | |
| Romain LE DISEZ | 69df458254 | Allow to rebuild a fragment of an expired object. When a fragment of an expired object was missing, the reconstructor ssync job would send a DELETE sub-request. This leads to a situation where, for the same object and timestamp, some nodes have a data file, while others can have a tombstone file. This patch forces the reconstructor to reconstruct a data file, even for expired objects. DELETE requests are only sent for tombstoned objects. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Closes-Bug: #1652323 Change-Id: I7f90b732c3268cb852b64f17555c631d668044a8 | |
| Thiago da Silva | d0bfd036af | ready yet? nope, please wait! Related-Change: Iab923c4f48ac7a5dd41237761ed91d01a59dc77c Change-Id: Id4e17569e9ec856663e1539eaf72872296698367 Signed-off-by: Thiago da Silva <thiago@redhat.com> | |
| Jenkins | e94b383655 | Merge "Add support to increase object ring partition power" | |
| liuyamin | 006a378193 | Add license in swift code file. Source code should be licensed under the Apache 2.0 license. Add the Apache License to the `swift/probe/__init__.py` file. Change-Id: I3b6bc2ec5fe5caac87ee23f637dbcc7a5d8fc331 | |
| Christian Schwede | e1140666d6 | Add support to increase object ring partition power. This patch adds methods to increase the partition power of an existing object ring without downtime for the users, using a 3-step process. Data won't be moved to other nodes; objects using the new, increased partition power will be located on the same device and are hardlinked to avoid data movement. 1. A new setting "next_part_power" will be added to the rings, and once the proxy server has reloaded the rings it will send this value to the object servers on any write operation. Object servers will now create a hard link in the new location to the original DiskFile object. Already existing data will be relinked into the new locations, using hardlinks, by a new relinker tool. 2. The actual partition power itself will be increased. Servers will now use the new partition power for both reads and writes. Hard links in the old object locations that are no longer required have to be removed by the relinker tool; the relinker tool reads the next_part_power setting to find object locations that need to be cleaned up. 3. The "next_part_power" flag will be removed. This mostly implements the spec in [1]; however it's not using an "epoch" as described there. The idea of the epoch was to store data using different partition powers in their own namespace to avoid conflicts with auditors and replicators, as well as being able to abort such an operation and just remove the new tree. This would require a heavy change to the on-disk data layout, and other object-server implementations would be required to adopt this scheme too. Instead the object-replicator is now aware that there is a partition power increase in progress and will skip replication of data in that storage policy; the relinker tool should simply be run and afterwards the partition power will be increased. This shouldn't take much time (it's only walking the filesystem and hardlinking), so the impact should be low. The relinker should be run on all storage nodes at the same time in parallel to decrease the required time (though this is not mandatory). Failures during relinking should not affect cluster operations - relinking can even be aborted manually and restarted later. Auditors do not quarantine objects written to a path with a different partition power and therefore work as before (though they read each object twice in the worst case before the no longer needed hard links are removed). A simplified sketch of the partition/hardlink relationship appears after the table. Co-Authored-By: Alistair Coles <alistair.coles@hpe.com> Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Co-Authored-By: Tim Burke <tim.burke@gmail.com> [1] https://specs.openstack.org/openstack/swift-specs/specs/in_progress/increasing_partition_power.html Change-Id: I7d6371a04f5c1c4adbb8733a71f3c177ee5448bb | |
| Jenkins | 75c1bea7a5 | Merge "Cleanup db replicator probetest" | |
| Jenkins | d46b0f29f9 | Merge "Limit number of revert tombstone SSYNC requests" | |
| Mahati Chamarthy | 188c07e12a | Limit number of revert tombstone SSYNC requests. Revert tombstone-only parts try to talk to all primary nodes - this fixes it to randomize selection within part_nodes. The corresponding probe test is modified to reflect this change. The primary improvement of this patch is that the reconstructor at a handoff node is able to delete local tombstones when it succeeds in syncing to fewer than all primary nodes. (Before this patch, it required all primary nodes to receive the REVERT requests.) The number of primary nodes the reconstructor communicates with can be discussed further, but right now with this patch it is (replicas - k + 1), which is able to prevent stale reads. *BONUS* - Fix a test mis-setting (it was setting fewer replicas than ec_k + ec_m) for the reconstructor ring in the unit test. Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I05ce8fe75f1c4a7971cc8995b003df818b69b3c1 Closes-Bug: #1668857 | |
| Jenkins | c36438aa26 | Merge "Use config_number method instead of node_id + 1" | |
| Jenkins | e640563b18 | Merge "Sync metadata in 'rsync_then_merge' in db_replicator" | |
| Clay Gerrard | 6687d2fcd0 | Cleanup db replicator probetest. Use a manager and config_number helper for clarity - then make one last assertion on the final consistent state. Change-Id: I5030c314076003d17c41b8b136bcbda252474bad Related-Change-Id: Icdf0a936fc456c5462471938cbc365bd012b05d4 | |
| Kota Tsuyuzaki | c23b6db264 | Use config_number method instead of node_id + 1. Change-Id: I596a2ac947f7e5f9c0cb3286779ece5e40feefd0 | |
| Jenkins | e11eb88ad9 | Merge "Remove deprecated vm_test_mode option" | |
| Jenkins | 1b24b0eb7a | Merge "Make probe tests work when policy-0 is EC" | |
| Tim Burke | f64fa46f3a | Make probe tests work when policy-0 is EC. test_object_metadata_replication requires a replicated policy. We even have it subclass ReplProbeTest, but then we hardcoded the policy index! Stop doing that. Change-Id: I8871cc0beceb0909abaf59babe40c3cafbcd0cc9 | |
| Vu Cong Tuan | cf3c970a77 | Trivial fix typos. Change-Id: I7e1e3b2f92183b2a249299659f0778fe838212e2 | |
| Tim Burke | 675145ef4a | Remove deprecated vm_test_mode option. This was deprecated in the 2.5.0 release (i.e. the Liberty cycle), and we've been warning about it ever since. A year and a half seems like a long enough time. Change-Id: I5688e8f7dedb534071e67d799252bf0b2ccdd9b6 Related-Change: Iad91df50dadbe96c921181797799b4444323ce2e | |
| Daisuke Morita | 843184f3fe | Sync metadata in 'rsync_then_merge' in db_replicator. Previously, in 'rsync_then_merge' remote objects were merged with the rsync'ed local objects, but remote metadata was not merged with the local metadata. The account/container replicator sometimes uses rsync for db sync if there is a big difference in record history between the 'local' and 'remote' db files. If the replicator needs to rsync the local db to the remote server but the metadata in the local db is older, the older metadata can be distributed, and some metadata values can then be missing or revert to older values. This patch fixes the problem by merging the 'remote' metadata with the rsync'ed local db file. Closes-Bug: #1570118 Change-Id: Icdf0a936fc456c5462471938cbc365bd012b05d4 | |
| Romain LE DISEZ | 091157fc7f | Fix encoding issue in ssync_sender.send_put(). EC object metadata can currently have a mixture of bytestrings and unicode. The ssync_sender.send_put() method raises a UnicodeDecodeError when it attempts to concatenate the metadata values, if any bytestring has non-ascii characters. The root cause of this issue is that the object server uses unicode for the keys of some object metadata items that are received in the footer of an EC PUT request, whereas all other object metadata keys and values are persisted as bytestrings. This patch fixes the bug by changing the diskfile write_metadata() function to encode all unicode metadata keys and values as utf8-encoded bytes before writing to disk. To cope with existing objects that have a mixture of unicode and bytestring metadata, the diskfile read_metadata() function is also changed so that all returned unicode metadata keys and values are utf8 encoded. This ensures that ssync_sender.send_put() (and any other caller of diskfile read_metadata) only reads bytestrings from object metadata. Closes-Bug: #1678018 Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Change-Id: Ic23c55754ee142f6f5388dcda592a3afc9845c39 | |
| Alistair Coles | 83750cf79c | Fix UnicodeDecodeError in reconstructor _full_path function. Object paths can have non-ascii characters. Device dicts will have unicode values. Forming a string using both will cause the object path to be coerced to UTF8, which currently causes a UnicodeDecodeError. This causes _get_response() to not return, and the reconstructor hangs. The call to _full_path() is moved outside of _get_response() (where its result is used in the exception handler logging) so that _get_response() will always return even if _full_path() raises an exception. Unit tests are refactored to split out a new class with those tests using an object name and the _full_path method, so that the class can be subclassed to use an object name with non-ascii characters. Existing probe tests are subclassed to repeat using non-ascii chars in object paths. Change-Id: I4c570c08c770636d57b1157e19d5b7034fd9ed4e Closes-Bug: 1679175 | |
| XieYingYun | 36b1a2f69f | Fix some reST field lists in docstrings. Probably the most common format for documenting arguments is reST field lists [1]. This change updates some docstrings to comply with the field lists syntax. [1] http://sphinx-doc.org/domains.html#info-field-lists Change-Id: I87e77a9bbd5bcb834b35460ce0adff5bc59d9168 | |
| Timur Alperovich | 2e199be604 | Probe tests fail, as requests checks for strings. The requests library checks that the headers are either strings or bytes. Currently, the two test_object_expirer tests fail with the message: InvalidHeader: Header value 1487879553 must be of type str or bytes, not <type 'int'>. The header in question is "x-delete-at". The patch converts it to a string before making a Swift Client request. Change-Id: I738697cb6b696f0e346345f75e0069048961f2ff | |
| Thiago da Silva | 04502a9f64 | Fix test comment and remove extra parameter. Fixed the comment in the test to match exactly what's being removed and what the expected result is. Also, removed that extra '/' parameter which was causing the assert to test at the wrong directory level. Change-Id: I2f27f0d12c08375c61047a3f861c94a3dd3915c6 Signed-off-by: Thiago da Silva <thiago@redhat.com> | |
| Jenkins | c8a2b77313 | Merge "Fix test_delete_propagate probe test" | |
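
Commit 748b29ef80 above describes how `If-None-Match: *` interacts with zero-byte PUTs. Below is a minimal client-side sketch of that conditional create; the endpoint, account, container, object name, and token are placeholder assumptions, not values taken from these commits.

```python
# Minimal sketch of a "create only if absent" PUT of a zero-byte object,
# the client-side view of the case fixed in 748b29ef80.
# The URL and token below are hypothetical placeholders.
import requests

resp = requests.put(
    "http://proxy.example.com:8080/v1/AUTH_test/c/obj",
    headers={
        "X-Auth-Token": "AUTH_tk_placeholder",
        "If-None-Match": "*",   # ask Swift to fail if the object already exists
    },
    data=b"",                   # zero-byte body, the case that used to slip through
)
# 201 means the object was created; 412 means an object (or a newer tombstone)
# already exists and nothing should have been overwritten.
print(resp.status_code)
```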
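Commit e199192cae describes utils.lock_path() accepting a per-path limit and cycling through path/.lock-X files. The following is a generic sketch of that N-slot lock-file technique under stated assumptions; it is illustrative only and is not Swift's actual utils.lock_path() implementation.

```python
# Generic sketch: allow at most `limit` concurrent holders of a per-directory
# lock by trying numbered lock files .lock-0 .. .lock-<limit-1> in turn.
import errno
import fcntl
import os
from contextlib import contextmanager

@contextmanager
def lock_path(path, limit=1):
    """Hold one of `limit` lock slots for `path` (illustrative sketch only)."""
    os.makedirs(path, exist_ok=True)
    held = None
    for i in range(limit):
        fd = os.open(os.path.join(path, '.lock-%d' % i),
                     os.O_WRONLY | os.O_CREAT)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            held = fd          # got slot i
            break
        except OSError as err:
            os.close(fd)       # slot busy, try the next one
            if err.errno not in (errno.EAGAIN, errno.EACCES):
                raise
    if held is None:
        raise RuntimeError('all %d lock slots for %s are busy' % (limit, path))
    try:
        yield
    finally:
        os.close(held)         # closing the descriptor releases the flock

# e.g. allow up to 2 concurrent replication jobs on one device directory:
# with lock_path('/srv/node/sda1', limit=2):
#     ...
```

With limit=1 this behaves like a single exclusive lock; raising the limit, as replication_concurrency_per_device does, allows that many concurrent holders per device path.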
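Commit e1140666d6 relies on the fact that raising the partition power by one only splits each partition in two, which is why the new locations can be populated with hard links on the same device. Below is a simplified sketch of that relationship; the hash prefix/suffix handling is approximated and this is not the relinker's actual code.

```python
# Simplified sketch of the partition/hardlink relationship behind e1140666d6.
import hashlib
import struct

def partition(obj_path, part_power, suffix=b'changeme'):
    # Swift-style partition: the top `part_power` bits of the md5 of the
    # hashed object path (prefix/suffix handling is simplified here).
    digest = hashlib.md5(obj_path.encode('utf8') + suffix).digest()
    return struct.unpack('>I', digest[:4])[0] >> (32 - part_power)

old_part = partition('/AUTH_test/c/obj', 10)   # current partition power
new_part = partition('/AUTH_test/c/obj', 11)   # after increasing by one

# Each old partition splits into exactly two new ones, so the new on-disk
# location is on the same device and can be created as a hard link.
assert new_part in (2 * old_part, 2 * old_part + 1)
print(old_part, new_part)
```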