c374a7a85152699fe1edc6078076f66fcbce3dfb
9386 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
Tim Burke
|
c374a7a851 |
Allow floats for all intervals
Change-Id: I91e9bc02d94fe7ea6e89307305705c383087845a |
||
|
Zuul
|
c5fe114c96 | Merge "sharder: stall cleaving at shard range gaps" | ||
|
Zuul
|
cf6095c906 | Merge "Fix shrinking making acceptors prematurely active" | ||
|
Zuul
|
c672a1cd2c | Merge "relinker: Only mark partitions "done" if there were no (new) errors" | ||
|
Tim Burke
|
926c61bccf |
relinker: Only mark partitions "done" if there were no (new) errors
This way operators can re-run the relinker in the face of errors without needing to manually clear the state file. Change-Id: Ida1c1c0c8a695b1b226121b426b8226a43f3056b Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> |
||
|
Zuul
|
443677f104 | Merge "object: Plumb logger_thread_locals through _finalize_put" | ||
|
Alistair Coles
|
ed6586c460 |
sharder: stall cleaving at shard range gaps
Previously the sharder cleaving process would skip over gaps in shard ranges. Gaps are not normally expected, but could occur if, for example, multiple inconsistent decisions are made to configure shards for shrinking, resulting in a shrinking shard having insufficient acceptor shards to cover its namespace. In these circumstances the shrinking shard's cleaving process should stall when it encounters a gap in the acceptors. This is achieved by always checking that the lower bound of the next shard range to cleave is less than or equal to the current cleaving cursor. Cleaving will resume when a suitable acceptor becomes available to cover the namespace gap. Change-Id: I1046a5cf809d2a905ede5e1f285939c91843074d |
||
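The stall check described in the commit above can be illustrated with a minimal sketch. It assumes a simplified model in which a shard range is a plain `(lower, upper)` pair of strings; the function name `ranges_to_cleave` and this data model are hypothetical and are not taken from the Swift sharder code.

```python
from typing import List, Tuple

# Hypothetical simplified model: a shard range is a (lower, upper) pair of
# strings covering the namespace (lower, upper]. Real Swift shard ranges
# carry much more state than this.
Range = Tuple[str, str]


def ranges_to_cleave(cursor: str, acceptors: List[Range]) -> List[Range]:
    """Return the acceptor ranges that can be cleaved to before any gap."""
    cleavable = []
    for lower, upper in sorted(acceptors):
        if upper <= cursor:
            # This range lies entirely behind the cursor: already cleaved.
            continue
        if lower > cursor:
            # The next range starts beyond the cursor, so there is a gap.
            # Stall here rather than skipping over the gap.
            break
        cleavable.append((lower, upper))
        cursor = upper  # advance the cursor to the end of this range
    return cleavable


# With a gap between "h" and "m", only the first two ranges are returned.
with_gap = ranges_to_cleave("", [("", "d"), ("d", "h"), ("m", "z")])
# Once an acceptor covers the gap, cleaving can proceed through all ranges.
no_gap = ranges_to_cleave("", [("", "d"), ("d", "h"), ("h", "m"), ("m", "z")])
```

In this sketch the stall is simply an early `break`; calling the function again later with an updated acceptor list models cleaving resuming once the gap is covered.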
|
Alistair Coles
|
29418998b7 |
Fix shrinking making acceptors prematurely active
During sharding a shard range is moved to CLEAVED state when cleaved from its parent. However, during shrinking an acceptor shard should not be moved to CLEAVED state when the shrinking shard cleaves to it, because the shrinking shard is not the acceptor's parent and does not know if the acceptor has yet been cleaved from its parent. The existing attempt to prevent a shrinking shard updating its acceptor state relied on comparing the acceptor namespace to the shrinking shard namespace: if the acceptor namespace fully enclosed the shrinking shard then it was inferred that shrinking was taking place. That check is sufficient for normal shrinking of one shard into an expanding acceptor, but is not sufficient when shrinking in order to fix overlaps, when a shard might shrink into more than one acceptor, none of which completely encloses the shrinking shard. Fortunately, since [1], it is possible to determine that a shard is shrinking from its own shard range state being either SHRINKING or SHRUNK. It is still advantageous to delete and merge the shrinking shard range into the acceptor when the acceptor fully encloses the shrinking shard because that increases the likelihood of the root being updated with the deleted shard range in a timely manner. [1] Related-Change: I9034a5715406b310c7282f1bec9625fe7acd57b6 Change-Id: I91110bc747323e757d8b63003ad3d38f915c1f35 |
||
|
Zuul
|
7cfdb50f93 | Merge "reconstructor: extract closure for handle_response" | ||
|
Zuul
|
020a13ed3c | Merge "reconstructor: log more details when rebuild fails" | ||
|
Zuul
|
cd87034eba | Merge "swift-manage-shard-ranges: fix exit codes" | ||
|
Zuul
|
b174d665b6 | Merge "Add sharding to swift-recon" | ||
|
Zuul
|
b709a1d4aa | Merge "Cleanup tests' import of debug_logger" | ||
|
Zuul
|
ae06540381 | Merge "ec: Don't copy EC metadata to replicated objects" | ||
|
Pete Zaitcev
|
ba00ff4376 |
Add sharding to swift-recon
Note that this does not modify the recon middleware: we already support the /sharding endpoint in it. But inexplicably there's no CLI that interrogates and parses the sharding information. The overview_container_sharding.rst document tells operators to read raw JSON. It has the advantage of always having the full picture, but these days we deserve a digest by CLI. Change-Id: Iac71f68f6633764e0c926ca60990be3b16ef6855 |
||
|
Clay Gerrard
|
ab8accbb0a |
reconstructor: extract closure for handle_response
Since _get_response was already a method, and called after the inline definition of the handle_response closure (but before we called the closure) I found the control flow to be confusingly different from the visual layout of the code. Passing a few extra params around felt worth doing a: def _get_response(self, ... def _handle_response(self, ... ... and then using them, in that order, in the next method we define _make_fragment_requests (which is now quite short and obvious, despite doing some concurrency with a GreenAsyncPile) Related-Change-Id: I3f87933f788685775ce59f3724f17d5db948d502 Change-Id: I8bca2d0804569952d31aee7de4ffe60ede4343d2 |
||
|
Clay Gerrard
|
2a312d1cd5 |
Cleanup tests' import of debug_logger
Change-Id: I19ca860deaa6dbf388bdcd1f0b0f77f72ff19689 |
||
|
Alistair Coles
|
7960097f02 |
reconstructor: log more details when rebuild fails
When the reconstructor fails to gather enough fragments to rebuild a missing fragment, log more details about the responses that it *did* get: - log total number of ok responses, as well as the number of useful responses, to reveal if, for example, there might have been duplicate frag indexes or mixed etags. - log the mix of error status codes received to reveal if, for example, they were all 404s. Also refactor reconstruct_fa to track all state related to a timestamp in a small data encapsulation class rather than in multiple dicts. Related-Bug: 1655608 Change-Id: I3f87933f788685775ce59f3724f17d5db948d502 |
||
|
Tim Burke
|
8823b45b93 |
docs: Get rid of useless page
I think this was mostly an artifact of the old admin-guide getting pushed in-tree. Change-Id: I61e63bd66ffd8207599b61721e7f0e4dc45d2e01 |
||
|
Zuul
|
8bf93f1f40 | Merge "Include sharding cycle time in recon" | ||
|
Zuul
|
3d4833bbd5 | Merge "relinker: Add start/end logs to parallel_process" | ||
|
Zuul
|
079ffbd5ab | Merge "Add more detail to ECFragGetter logging" | ||
|
Clay Gerrard
|
5516bf46c0 |
Add more detail to ECFragGetter logging
* Add new useful log messages at appropriate levels for development and production logging. * Remove more i18n translations. * Differentiate EC errors from replicated object controller messages. Related-Change-Id: I0654815543be3df059eb2875d9b3669dbd97f5b4 Change-Id: I5d22278a2be5bc1b4a9e97e3cf12b5e630ea7893 |
||
|
Tim Burke
|
f2a4c50dce |
Include sharding cycle time in recon
Change-Id: Id7e828a56c8a62a1f3e9a1dbbff5a56c928ac6b8 |
||
|
Tim Burke
|
9d6006f646 |
auditors: Log and dump recon *before* sleeping
No good reason to delay getting the info about a completed run out there. Change-Id: I7d5c19304a5c5f83558e91624de356caaf0ab4d5 |
||
|
Zuul
|
4137326378 | Merge "Refactor db auditors into a db_auditor base class" | ||
|
Zuul
|
a8e5d7646d | Merge "Use underscores instead of dashes in setup.cfg" | ||
|
Zuul
|
bd17780e7b | Merge "fix not clear cause for invalid username" | ||
|
Tim Burke
|
bacef722a3 |
Use underscores instead of dashes in setup.cfg
We've been seeing warnings like UserWarning: Usage of dash-separated 'home-page' will not be supported in future versions. Please use the underscore name 'home_page' instead for a while in our pep8 jobs; this ought to clean them up. As best I can tell, setuptools has supported either format for several years. Change-Id: I4d538859f278ac26a6b6d6845df103ebd4c49da3 |
||
|
Tim Burke
|
7087fb0d7e |
object: Plumb logger_thread_locals through _finalize_put
It doesn't explain how we saw the traceback we did (self.logger.txn_id should have been set a few lines earlier in __call__!), but at least now the transaction ID should come across if we're on the happy path. Change-Id: Ifa6f81bc02c7c84ad1f4c9ff694b645348c7c6c4 Related-Bug: #1922810 |
||
|
Tiago Primini
|
717d21ccbd |
fix not clear cause for invalid username
- add a message saying the reason for the value error exception - add a unit test to validate the expected message Change-Id: I1d6cc0faa3a43852c46089e509d48cc3ee9f9cf8 Closes-Bug: #1911811 |
||
|
Clay Gerrard
|
4a4d899680 |
Refactor EC multipart/byteranges control flow
The multipart document handling in the proxy is consumed via iteration, but the error handling code is not consistent with how it applies conversions of IO errors/timeouts and retry failures to StopIteration. In an effort to make the code more obvious and easier to debug and maintain I've added comments and additional tests as well as tightening up StopIteration exception handling. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Change-Id: I0654815543be3df059eb2875d9b3669dbd97f5b4 |
||
|
Zuul
|
c9b5d44e9e | Merge "Update AUTHORS" | ||
|
Zuul
|
e8580f0346 | Merge "s3api: Add config option to return 429s on ratelimit" | ||
|
Alistair Coles
|
e76ba21077 |
relinker: Add start/end logs to parallel_process
Change-Id: I2a5021c4833ff713fb6d0d39f16a6d0e523e3697 |
||
|
Zuul
|
c0fad7714e | Merge "Fix reclaim to use RECLAIM_PAGE_SIZE batches" | ||
|
Alistair Coles
|
7f35b1cc8b |
swift-manage-shard-ranges: fix exit codes
Previously swift-manage-shard-ranges would variously return either 1 or 2 in cases of invalid arguments, errors or unexpected outcomes, or if the user chose to quit before a change was applied. This patch applies a more consistent pattern to the exit codes: 0 = success; 1 = an unexpected outcome, including errors; 2 = invalid command line arguments or conf file options; 3 = user quit. Some errors that previously resulted in an exit code 2 will now exit with code 1. Change-Id: Icf170fef26ed36aab3bf845c5560f1e579a69c2b |
||
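The exit-code scheme listed in the commit above could be modelled roughly as follows. This is only a sketch: the `ExitCode` enum, the `InvalidArgs` exception and the `run` helper are hypothetical names chosen for illustration, not the identifiers used by swift-manage-shard-ranges.

```python
import enum


class ExitCode(enum.IntEnum):
    # Values follow the commit message; the member names are assumptions.
    SUCCESS = 0
    ERROR = 1          # an unexpected outcome, including errors
    INVALID_ARGS = 2   # invalid command line arguments or conf file options
    USER_QUIT = 3      # user chose to quit before a change was applied


class InvalidArgs(ValueError):
    """Hypothetical exception for bad arguments or conf options."""


def run(action, confirm=lambda: True) -> ExitCode:
    """Run an action and map its outcome onto a single exit code."""
    try:
        if not confirm():
            return ExitCode.USER_QUIT
        action()
    except InvalidArgs:
        return ExitCode.INVALID_ARGS
    except Exception:
        # Any other failure is reported as an unexpected outcome.
        return ExitCode.ERROR
    return ExitCode.SUCCESS


def bad_args():
    raise InvalidArgs("unknown option")


def broken():
    raise RuntimeError("database unavailable")
```

A caller would typically pass the result to `sys.exit()`, so that scripts wrapping the tool can tell a user abort (3) apart from a genuine failure (1).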
|
Alistair Coles
|
751deb9881 |
Fix reclaim to use RECLAIM_PAGE_SIZE batches
The batched reclaim would previously take batches of size RECLAIM_PAGE_SIZE + 1 from the database: fixed this to be RECLAIM_PAGE_SIZE. Change-Id: I9e15762e6886b3f63f20da2452de55e699a32900 |
||
|
Zuul
|
7bd6548510 | Merge "swift-account-audit: Log the bad status" | ||
|
Matthew Oliver
|
4cb52b44dd |
Refactor db auditors into a db_auditor base class
The container and account auditors, and their tests are almost identical. This patch reduces the code duplication by refactoring into a base class called DatabaseAuditor. This also means the container and account auditor tests have also mostly been refactored. Change-Id: I9765d65f12afec295d9eaae52858e4e7272c9c4c |
||
|
Tim Burke
|
c8de76c7fd |
swift-account-audit: Log the bad status
Change-Id: Ib28d1948a571acf31926df82dd8c24910c227053 |
||
|
Alistair Coles
|
122840cc04 |
probe test: use helper functions more widely
Use the recently added assert_subprocess_success [1] helper function more widely. Add run_custom_sharder helper. Add container-sharder key to the ProbeTest.configs dict. [1] Related-Change: I9ec411462e4aaf9f21aba6c5fd7698ff75a07de3 Change-Id: Ic2bc4efeba5ae5bc8881f0deaf4fd9e10213d3b7 |
||
|
Romain LE DISEZ
|
c13c9cc675 |
Update AUTHORS
It's been a real pleasure to work on Swift all these years with you guys. You're doing an amazing job in the best mind. Don't change anything! Change-Id: I1805fa2b471882c500755c6136b0d5a9ba7cd5b3 |
||
|
Zuul
|
d9a6fe4362 | Merge "sharder: Prevent ValueError when no cleaving contexts" | ||
|
Tim Burke
|
e53c82cd32 |
sharder: Prevent ValueError when no cleaving contexts
Otherwise, we trip an error in logs: ValueError: max() arg is an empty sequence Co-Authored-By: Matthew Oliver <matt@oliver.net.au> Change-Id: I68f52f28edf9bdb9c534983cf353b72ecfaac426 |
||
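The error quoted in the commit above is what Python's built-in `max()` raises when given an empty iterable. The sketch below shows one common guard; the function name and the `(context, last_modified)` pair layout are hypothetical, and the actual fix in the sharder may differ.

```python
def newest_context_timestamp(contexts):
    """Return the largest last-modified timestamp, or None if there are none.

    `contexts` is assumed to be a list of (context, last_modified) pairs;
    this layout is illustrative only.
    """
    if not contexts:
        # Without this guard, max() raises:
        #   ValueError: max() arg is an empty sequence
        return None
    return max(ts for _ctx, ts in contexts)


def unguarded_fails():
    """Demonstrate the failure mode on an empty list."""
    try:
        max(ts for _ctx, ts in [])
    except ValueError:
        return True
    return False
```

On Python 3 alone, `max(..., default=None)` would achieve the same result; an explicit emptiness check is used here because it also works on Python 2.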
|
Zuul
|
4ff1ff1086 | Merge "Use debug_logger instead of FakeLogger in relinker tests" | ||
|
Zuul
|
6c745fc9c0 | Merge "sharding: constrain fill_gaps to own shard range bounds" | ||
|
Zuul
|
4780f812bc | Merge "relinker: trivial comment and test fixes" | ||
|
Zuul
|
346d46cd88 | Merge "relinker: Parallelize per disk" | ||
|
Tim Burke
|
1895213d25 |
Update some constraints for py2
We've recently started seeing some failures in the gate related to these projects, and they have final py2-supporting versions. Change-Id: If81fc352c8b2b1f03f3fa7b79c56dfcf981ced70 |