94241a81c0ed164f618c33770eafc535a1a6cb3d
468 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
| Jenkins | 94241a81c0 | Merge "Improve domain_remap docs" | |
| Jenkins | ec3fffabfa | Merge "Add account_autocreate=true to internal-client.conf-sample" | |
| Alistair Coles | 93fc9d2de8 | Add cautionary note re delay_reaping in account-server.conf-sample. Change-Id: I2c3eea783321338316eecf467d30ba0b3217256c Related-Bug: #1514528 | |
| Alistair Coles | c662e5fc8e | Add account_autocreate=true to internal-client.conf-sample. Closes-Bug: #1698426 Change-Id: I8a29a685bb12e60f4da4a0dc8270b408241ec415 | |
| Jenkins | 23de16b0bf | Merge "Move listing formatting out to proxy middleware" | |
| Jenkins | 3fda32470b | Merge "Remove all post_as_copy related code and configs" | |
| Kota Tsuyuzaki | 1e79f828ad | Remove all post_as_copy related code and configs. It was deprecated, and we discussed this topic at the Denver PTG for the Queens cycle. The main motivation for this work is that the deprecated post_as_copy option and its gate block future symlink work. Change-Id: I411893db1565864ed5beb6ae75c38b982a574476 | |
| Tim Burke | 4806434cb0 | Move listing formatting out to proxy middleware. Move the json -> (text, xml) listing conversion into a common module and reference it in the account/container servers so we don't break existing clients (including out-of-date proxies), but have the proxy controllers always force a json listing. This simplifies operations on listings (such as the ones already happening in decrypter, or the ones planned for symlink and sharding) by only needing to consider a single response type. There is a downside of larger backend requests for text/plain listings, but it seems like a net win. Change-Id: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d | |
| junboli | df00122e74 | doc migration: update the doc link address [2/3]. Update the doc links affected by the doc migration. Although some effort has already gone into fixing these, many bad doc links remain; these changes are split into 3 patches to fix all of them, and this is the 2nd patch, covering doc/manpages. Change-Id: Id426c5dd45a812ef801042834c93701bb6e63a05 | |
| Jenkins | b0142d0cd2 | Merge "Retrieve encryption root secret from Barbican" | |
| Alistair Coles | 9c3c388091 | Improve domain_remap docs: make the conditions for remapping clearer, mention the path_root, mention the '-' -> '_' replacement in account names, and make the example consistent with the default options. Change-Id: Ifd3f3775bb8b13367d964010f35813018b5b41b3 | |
| shangxiaobj | c93c0c0c6e | [Trivialfix] Fix typos in swift. Fix typos found in swift. Change-Id: I52fad1a4882cec4456f22174b46d54e42ec66d97 | |
| Mathias Bjoerkqvist | 77bd74da09 | Retrieve encryption root secret from Barbican. This patch adds support for retrieving the encryption root secret from an external key management system; in practice, this is currently limited to Barbican. Change-Id: I1700e997f4ae6fa1a7e68be6b97539a24046e80b (config sketch below) | |
| Jenkins | 019515e153 | Merge "Update and optimize documentation links" | |
| junboli | 0d07e3fdb1 | Update and optimize documentation links: update URLs according to the document migration and fix dead and outdated links. Change-Id: Id92552f4a2d0fb79ddefc55a08636f2e7aeb07cb | |
| Clay Gerrard | 701a172afa | Add multiple worker processes strategy to reconstructor. This change adds a new Strategy concept to the daemon module, similar to how we manage WSGI workers. We need to leverage multiple python processes to get the concurrency properties we need; more workers will rebalance much faster on dense chassis with many devices. Currently the default is still one process and no workers. Set reconstructor_workers in the [object-reconstructor] section to some whole number <= the number of devices on a node to get that many reconstructor workers; each worker will operate on a different subset of disks. Running in "once" mode works as before, but tends to update recon drops a little more often. If you change the rings, the strategy will shut down workers and spawn new ones. You can kill the worker pids and the daemon strategy will respawn them. New per-disk reconstructor stats are dumped to recon under the object_reconstruction_per_disk key. To maintain legacy compatibility and replication monitoring based on cycle times, they are aggregated every stats_interval (default 5 mins). Change-Id: I28925a37f3985c9082b5a06e76af4dc3ec813abe (config sketch below) | |
| Alistair Coles | 9c5628b4f1 | Add reconstructor section to deployment guide. Change-Id: I062998e813718828b7adf4e7c3f877b6a31633c0 Closes-Bug: #1626290 | |
| Jenkins | c3f6e82ae1 | Merge "Write-affinity aware object deletion" | |
| Jenkins | f1e1dbb80a | Merge "Make eventlet.tpool's thread count configurable in object server" | |
| Lingxian Kong | 831eb6e3ce | Write-affinity aware object deletion. When deleting objects in a multi-region swift deployment with write affinity configured, users always get a 404 when deleting an object before it has been replicated to the appropriate nodes. This patch adds a config item 'write_affinity_handoff_delete_count' so that an operator can define how many local handoff nodes swift should send requests to for more candidates for the final response, or by default just leave it to swift to calculate the appropriate number. Change-Id: Ic4ef82e4fc1a91c85bdbc6bf41705a76f16d1341 Closes-Bug: #1503161 (config sketch below) | |
| Samuel Merritt | d9c4913e3b | Make eventlet.tpool's thread count configurable in object server. If you're running servers_per_port > 0 and threads_per_disk = 0 (as it should be with servers_per_port on), each object-server process will have 20 IO threads waiting around to service eventlet.tpool calls. This is far too many; with servers_per_port, there's no real benefit to having so many IO threads. This commit makes it so that, when servers_per_port > 0, each object server defaults to having one main thread and one IO thread. Also, eventlet's tpool size is now configurable via the object-server config file; if a tpool size is set, that's what we'll use regardless of servers_per_port. This allows operators with an excess of threads to remove some regardless of servers_per_port. Change-Id: I8f8914b7e70f2510393eb7c5e6be9708631ac027 Closes-Bug: 1554233 (config sketch below) | |
| Jenkins | 2d18ecdf4b | Merge "Replace slowdown option with *_per_second option" | |
| Jenkins | 169d1d8ab8 | Merge "Require that known-bad EC schemes be deprecated" | |
| Ondřej Nový | a8bc94c7e3 | Replace slowdown option with *_per_second option. The container and object updaters sleep "slowdown" (default 0.01) seconds after every processed container/object. Because the time.sleep call adds overhead, use ratelimit_sleep from common.utils instead, the same as in the auditor. Change-Id: I362aa0f13c78ad03ce1f76ee0257b0646f981212 (config sketch below) | |
| Tim Burke | 2c3ac543f4 | Require that known-bad EC schemes be deprecated. We said we were going to do it, we've had two releases saying we'd do it, we've even backported our saying it to Newton -- let's actually do it. Upgrade Consideration: erasure-coded storage policies using isa_l_rs_vand and nparity >= 5 must be configured as deprecated, preventing any new containers from being created with such a policy. This configuration is known to harm data durability; any data in such policies should be migrated to a new policy. See https://bugs.launchpad.net/swift/+bug/1639691 for more information. UpgradeImpact Related-Change: I50159c9d19f2385d5f60112e9aaefa1a68098313 Change-Id: I8f9de0bec01032d9d9b58848e2a76ac92e65ab09 Closes-Bug: 1639691 (config sketch below) | |
| Jenkins | 7bbe02b290 | Merge "Allow to configure the nameservers in cname_lookup" | |
| Jenkins | e11eb88ad9 | Merge "Remove deprecated vm_test_mode option" | |
| Romain LE DISEZ | 420e73fabd | Allow to configure the nameservers in cname_lookup. For various reasons, an operator might want to use specific nameservers instead of the system ones to resolve CNAMEs in cname_lookup. This patch creates a new configuration variable, nameservers, which accepts a list of nameservers separated by commas. If not specified or empty, the system nameservers are used as before. Co-Authored-By: Tim Burke <tim.burke@gmail.com> Change-Id: I34219e6ab7e45678c1a80ff76a1ac0730c64ddde (config sketch below) | |
| Tim Burke | 5ecf828b17 | Follow-up for per-policy proxy configs: only use one StringIO in ConfigString; rename the write_affinity_node_count function to write_affinity_node_count_fn; use comprehensions instead of six.moves.filter; rename OverrideConf to ProxyOverrideOptions; make ProxyOverrideOptions's __repr__ eval()able; various conf -> options renames; stop trying to handle a KeyError that should never come up; be explicit about how deep we need to copy in proxy/test_server.py; drop an unused return value; add a test for a non-"proxy-server" app name; combine bad-section-name tests; try to clean up (at least a little) a self-described "hokey test". Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc Change-Id: I4e81175d5445049bc1f48b3ac02c5bc0f77e6f59 | |
| Tim Burke | 675145ef4a | Remove deprecated vm_test_mode option. This was deprecated in the 2.5.0 release (i.e. Liberty cycle), and we've been warning about it ever since. A year and a half seems like a long enough time. Change-Id: I5688e8f7dedb534071e67d799252bf0b2ccdd9b6 Related-Change: Iad91df50dadbe96c921181797799b4444323ce2e | |
| Alistair Coles | 45884c1102 | Enable per policy proxy config options. This is an alternative approach to that proposed in [1]. Adds support for optional per-policy config sections to be added in proxy-server.conf. This is highly desirable to allow per-policy affinity options to be set for use with duplicated EC policies [2] and composite rings [3]. Certain options found in per-policy conf sections will override their equivalents that may be set in the [app:proxy-server] section; currently the options handled that way are sorting_method, read_affinity, write_affinity and write_affinity_node_count. For example: [proxy-server:policy:0] sorting_method = affinity read_affinity = r1=100 write_affinity = r1 write_affinity_node_count = 1 * replicas (reproduced as a config sketch below). The corresponding attributes of the proxy-server Application are now available from instances of an OverrideConf object that is obtained from Application.get_policy_options(policy). [1] Related-Change: I9104fc789ba85ab3ab5ccd34096125b482821389 [2] Related-Change: Idd155401982a2c48110c30b480966a863f6bd305 [3] Related-Change: I0d8928b55020592f8e75321d1f7678688301d797 Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp> Change-Id: I3f718f425f525baa80045ba067950c752bcaaefc | |
| Alistair Coles | f02ec4de81 | Add read and write affinity options to deployment guide. Add entries for these options in the deployment guide and make the text in proxy-server.conf-sample and the man page consistent. Change-Id: I5854ddb3e5864ddbeaf9ac2c930bfafdb47517c3 | |
| lijunbo | 21396bc106 | keep consistent naming convention of swift and urls. Change-Id: Iddd4f69abf77a5c643ce8b164fc6cfd72c068229 | |
| Jenkins | b43414c905 | Merge "Accept storage_domain as a list in domain_remap" | |
| Jenkins | 1e9b8888bf | Merge "Enable cluster-wide CORS Expose-Headers setting" | |
| Romain LE DISEZ | 9b47de3095 | Enable cluster-wide CORS Expose-Headers setting. An operator offering a web UX to its customers might want to allow web browsers to access some headers by default (e.g. X-Storage-Policy, X-Container-Read, ...). This commit adds a new setting to the proxy-server to allow some headers to be added cluster-wide to the CORS header Access-Control-Expose-Headers. Change-Id: I5ca90a052f27c98a514a96ee2299bfa1b6d46334 (config sketch below) | |
| Kota Tsuyuzaki | 40ba7f6172 | EC Fragment Duplication - Foundational Global EC Cluster Support. This patch enables efficient PUT/GET for a globally distributed cluster[1]. Problem: erasure coding can decrease the amount of actually stored data compared to the replicated model. For example, ec_k=6, ec_m=3 parameters can store 1.5x of the original data, which is smaller than 3x replicated. However, unlike replication, erasure coding requires availability of at least ec_k fragments of the total ec_k + ec_m fragments to service a read (e.g. 6 of 9 in the case above). As such, if we stored the EC object in a swift cluster spanning 2 geographically distributed data centers with the same volume of disks, it is likely the fragments will be stored evenly (about 4 and 5), so we still need to access a faraway data center to decode the original object. In addition, if one of the data centers was lost in a disaster, the stored objects will be lost forever, and we have to cry a lot. To ensure highly durable storage, you would think of making *more* parity fragments (e.g. ec_k=6, ec_m=10); unfortunately this causes *significant* performance degradation due to the cost of the mathematical calculation for erasure coding encode/decode. How this resolves the problem: EC Fragment Duplication extends the initial solution to add *more* fragments from which to rebuild an object, similar to the solution described above; the difference is making *copies* of encoded fragments. With experimental results[1][2], employing small ec_k and ec_m shows enough performance to store/retrieve objects. On PUT: encode the incoming object with small ec_k and ec_m (faster!), make duplicated copies of the encoded fragments (the number of copies is determined by 'ec_duplication_factor' in swift.conf), and store all fragments in the Swift Global EC Cluster. The duplicated fragments increase pressure on existing requirements when decoding objects in service to a read request. All fragments are stored with their X-Object-Sysmeta-Ec-Frag-Index. In this change, the X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index encoded by PyECLib, so there *will* be duplicates. Anytime we must decode the original object data, we must only consider ec_k fragments as unique according to their X-Object-Sysmeta-Ec-Frag-Index; no duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an object, and duplicates should be expected and avoided if possible. On GET, this patch includes the following changes: change the GET path to sort primary nodes grouped as subsets, so that each subset includes unique fragments, and change the Reconstructor to be more aware of possibly duplicate fragments. For example, with this change, a policy could be configured in swift.conf with ec_num_data_fragments = 2, ec_num_parity_fragments = 1, ec_duplication_factor = 2 (the object ring must have 6 replicas). At the object server, the node index from the object ring is 0 1 2 3 4 5 (kept for the reconstruct decision), while X-Object-Sysmeta-Ec-Frag-Index is 0 1 2 0 1 2 (each object keeps its actual fragment index for the backend, PyECLib). Additional improvements to Global EC Cluster Support will require features such as Composite Rings and more efficient fragment rebalance/reconstruction. 1: http://goo.gl/IYiNPk (Swift Design Spec Repository) 2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo) Doc-Impact Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: Idd155401982a2c48110c30b480966a863f6bd305 (config sketch below) | |
| Romain LE DISEZ | 5c93d6f238 | Accept storage_domain as a list in domain_remap. The domain_remap middleware can work with the cname_lookup middleware, and the latter accepts storage_domain as a list of domains. To be consistent, domain_remap should have the same behavior. Closes-Bug: #1664647 Change-Id: Iacc6619968cc7c677bf63e0b8d101a20c86ce599 (config sketch below) | |
| Clay Gerrard | da557011ec | Deprecate broken handoffs_first in favor of handoffs_only. The handoffs_first mode in the replicator has the useful behavior of processing all handoff parts across all disks until there aren't any handoffs anymore on the node [1], and then it seemingly tries to drop back into normal operation. In practice I've only ever heard of handoffs_first used while rebalancing and turned off as soon as the rebalance finishes - it's not recommended to run with handoffs_first mode turned on, and it emits a warning on startup if the option is enabled. The handoffs_first mode on the reconstructor doesn't work - it was prioritizing handoffs *per-part* [2] - which is really unfortunate because in the reconstructor during a rebalance it's often *much* more attractive from an efficiency disk/network perspective to revert a partition from a handoff than it is to rebuild an entire partition from another primary using the other EC fragments in the cluster. This change deprecates handoffs_first in favor of handoffs_only in the reconstructor, which is far more useful - and, just like handoffs_first mode in the replicator, it gives the operator the option of forcing the consistency engine to focus on rebalance. The handoffs_only behavior is somewhat consistent with the replicator's handoffs_first option (any error on any handoff in the replicator will make it essentially handoff only forever) but the option does what you want and is named correctly in the reconstructor. For consistency with the replicator, the reconstructor will mostly honor the handoffs_first option, but if you set handoffs_only in the config it always takes precedence. Having handoffs_first in your config always results in a warning, but if handoffs_only is not set and handoffs_first is true the reconstructor will assume you need handoffs_only and behave as such. When running in handoffs_only mode the reconstructor will start to log a warning every cycle if you leave it enabled after it finishes reverting handoffs; you should be monitoring on-disk partitions and disable the option as soon as the cluster finishes the full rebalance cycle. 1. Ia324728d42c606e2f9e7d29b4ab5fcbff6e47aea fixed replicator handoffs_first "mode" 2. Unlike replication, each partition in an EC policy can have a different kind of job per frag_index, but the cardinality of jobs is typically only one (either sync or revert) unless there have been a bunch of errors during write, in which case handoff partitions may hold a number of different fragments. Known-Issues: handoffs_only is not documented outside of the example config, see lp bug #1626290 Closes-Bug: #1653018 Change-Id: Idde4b6cf92fab6c45f2c0c2733277701eb436898 (config sketch below) | |
| Jenkins | 63b351893d | Merge "Default object_post_as_copy to False" | |
| Tim Burke | 4ee20dba48 | Default object_post_as_copy to False. Additionally, emit deprecation warnings when running POST-as-COPY. Change-Id: I11324e711057f7332577fd38f9bff82bdc6aac90 (config sketch below) | |
| Mahati Chamarthy | 69f7be99a6 | Move documented reclaim_age option to correct location. The reclaim_age is a DiskFile option; it doesn't make sense for two different object services or nodes to use different values. I also drive-by cleaned up the reclaim_age plumbing from get_hashes to cleanup_ondisk_files, since it's a method on the Manager and has access to the configured reclaim_age. This fixes a bug where finalize_put wouldn't use the [DEFAULT]/object-server configured reclaim_age - which is normally benign but leads to weird behavior on DELETE requests with a really small reclaim_age. There are a couple of places in the replicator and reconstructor that reach into their manager to borrow the reclaim_age when emptying out the aborted PUTs that failed to clean up their files in tmp - but that timeout doesn't really need to be coupled with reclaim_age and that method could have just as reasonably been implemented on the Manager. UpgradeImpact: Previously the reclaim_age was documented to be configurable in various object-* services config sections, but that did not work correctly unless you also configured the option for the object-server, because of REPLICATE request rehash cleanup. All object services must use the same reclaim_age. If you require a non-default reclaim age it should be set in the [DEFAULT] section. If there are different non-default values, the greater should be used for all object services and configured only in the [DEFAULT] section. If you specify a reclaim_age value in any object related config you should move it to *only* the [DEFAULT] section before you upgrade. If you configure a reclaim_age less than your consistency window you are likely to be eaten by a Grue. Closes-Bug: #1626296 Change-Id: I2b9189941ac29f6e3be69f76ff1c416315270916 Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> (config sketch below) | |
| Ondřej Nový | ae7dddd801 | Added comment for "user" option in drive-audit config. Change-Id: I24362826bee85ac3304e9b63504c9465da673014 | |
| Tim Burke | f850ff065e | SLO: Concurrently HEAD segments. Before creating a static large object, we must verify that all of the referenced segments exist. Previously, this was done sequentially; due to latency between proxy and object nodes, clients must be careful to either keep their segment count low or use very long (minute+) timeouts. We mitigate this somewhat by enforcing a hard limit on segment count, but even then, HEADing a thousand segments (the default limit) with an average latency of (say) 100ms will require more than a minute and a half. Further, the nested-SLO approach requires multiple requests from the client -- as a result, Swift3 is in the position of enforcing a lower limit than S3's 10,000 (which will break some clients) or requiring that clients have timeouts on the order of 15-20 minutes (!). Now, we'll perform the segment HEADs in parallel, with a concurrency factor set by the operator. This is very similar to (and builds upon) the parallel-bulk-delete work. By default, two HEAD requests will be allowed at a time. As a side-effect, we'll also only ever HEAD a path once per manifest. Previously, if a manifest alternated between two paths repeatedly (for instance, because the user wanted to splice together various ranges from two sub-SLOs), then each entry in the manifest would trigger a fresh HEAD request. Upgrade Consideration: if operators would like to preserve the prior (single-threaded) SLO creation behavior, they must add the line concurrency = 1 to their [filter:slo] proxy config section. This may be done prior to upgrading Swift. UpgradeImpact Closes-Bug: #1637133 Related-Change: I128374d74a4cef7a479b221fd15eec785cc4694a Change-Id: I567949567ecdbd94fa06d1dd5d3cdab0d97207b6 (config sketch below) | |
| Jenkins | ae24c802a9 | Merge "Set owner of drive-audit recon cache to swift user" | |
| Ondřej Nový | 99a13d9386 | Fixed rysnc -> rsync typo. Change-Id: I671b4206072c6e22f4ae38033502336ec32e86ad | |
| Ondřej Nový | 9847796f01 | Set owner of drive-audit recon cache to swift user. Fixes this problem: swift-drive-audit needs to be run by root, because only root has "umount" permission; swift-object servers typically run as user swift; if swift-drive-audit is run by root, /var/cache/swift/drive.recon is owned by root with mode 0o600; so the recon middleware (inside swift-object-server) can't read this cache file ("swift-object: Error reading recon cache file"). This patch adds a "user" option to the drive-audit config file; the recon cache is chowned to this user. Change-Id: Ibf20543ee690b7c5a37fabd1540fd5c0c7b638c9 (config sketch below) | |
| Pete Zaitcev | f62df7b80c | Add a configurable URL base to staticweb. This came to light because someone ran Tempest against a standard installation of RDO, which helpfully terminates SSL for Swift in a pre-configured load balancer. In such a case, staticweb has no way to know what scheme to use and guesses wrong, causing Tempest to fail. Related upstream bug: https://bugs.launchpad.net/mos/+bug/1537071 Change-Id: Ie15cf2aff4f7e6bcf68b67ae733c77bb9353587a Closes-Bug: 1572011 (config sketch below) | |
| Jenkins | 23c2d69ee1 | Merge "Add more comment to authtoken sample options" | |
| gecong1973 | a09e42732a | Fix a typo in proxy-server.conf-sample. TrivialFix Change-Id: If650e25979a9488c93fe93621c905003946c27e5 | |
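For 77bd74da09 (retrieve the encryption root secret from Barbican), a minimal keymaster sketch. The commit message above does not name the middleware or its options, so the filter name kms_keymaster, the keymaster_config_path option, and every option in the external keymaster file below are assumptions; check the conf samples shipped with this release before copying anything.

```ini
# proxy-server.conf -- assumed filter name and option
[filter:kms_keymaster]
use = egg:swift#kms_keymaster
keymaster_config_path = /etc/swift/keymaster.conf

# /etc/swift/keymaster.conf -- assumed section and option names; points the
# keymaster at a Barbican secret holding the encryption root secret
[kms_keymaster]
key_id = <barbican-secret-uuid>
auth_endpoint = http://keystone.example.com:5000/v3
username = swift
password = changeme
project_name = service
```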
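For 701a172afa (reconstructor worker processes), a minimal sketch using the option named in the commit message; the value 4 is only an example and should stay <= the number of devices on the node.

```ini
# object-server.conf
[object-reconstructor]
# default is no workers (single process); each worker handles a disjoint
# subset of the node's disks
reconstructor_workers = 4
```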
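For 831eb6e3ce (write-affinity aware deletion), a sketch of the option named in the commit message; placing it next to the existing write-affinity options in [app:proxy-server] and the value 2 are assumptions here, and leaving it unset lets swift calculate the count itself.

```ini
# proxy-server.conf
[app:proxy-server]
sorting_method = affinity
write_affinity = r1
# also send DELETEs to this many local handoff nodes so the final response
# has more candidates than the (possibly not yet replicated) primaries
write_affinity_handoff_delete_count = 2
```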
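For d9c4913e3b (configurable eventlet.tpool size), the commit message says the size can be set in the object-server config but does not name the option; the option name eventlet_tpool_num_threads and its section below are assumptions.

```ini
# object-server.conf
[app:object-server]
# assumed option name -- overrides the automatic sizing (one IO thread when
# servers_per_port > 0, eventlet's default of 20 otherwise)
eventlet_tpool_num_threads = 1
```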
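For a8bc94c7e3 (slowdown -> *_per_second), a sketch of the replacement options in the updaters; the exact option names and the example rates are assumptions based on the commit subject, not stated in the message above.

```ini
# object-server.conf
[object-updater]
# replaces: slowdown = 0.01 (sleep after every object)
objects_per_second = 50

# container-server.conf
[container-updater]
# replaces: slowdown = 0.01 (sleep after every container)
containers_per_second = 50
```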
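For 2c3ac543f4 (require known-bad EC schemes to be deprecated), a swift.conf sketch of the kind of policy that must now carry the deprecated flag; the policy index, name, and fragment counts are invented for illustration.

```ini
# swift.conf -- an isa_l_rs_vand policy with >= 5 parity fragments must be
# marked deprecated, which blocks new containers from using it
[storage-policy:2]
name = ec-isa-l-legacy
policy_type = erasure_coding
ec_type = isa_l_rs_vand
ec_num_data_fragments = 10
ec_num_parity_fragments = 5
deprecated = yes
```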
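For 420e73fabd (cname_lookup nameservers), a sketch using the option named in the commit message; the resolver addresses and storage_domain are placeholders.

```ini
# proxy-server.conf
[filter:cname_lookup]
use = egg:swift#cname_lookup
storage_domain = example.com
# comma-separated resolvers; unset or empty falls back to the system ones
nameservers = 203.0.113.53, 198.51.100.53
```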
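For 45884c1102 (per-policy proxy options), the commit message's own example reproduced as a config block; the surrounding [app:proxy-server] defaults are added here only for context and are an assumption.

```ini
# proxy-server.conf
[app:proxy-server]
sorting_method = shuffle

# overrides for storage policy 0 only; the four options below are the ones
# the commit lists as overridable per policy
[proxy-server:policy:0]
sorting_method = affinity
read_affinity = r1=100
write_affinity = r1
write_affinity_node_count = 1 * replicas
```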
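For 9b47de3095 (cluster-wide CORS Expose-Headers), the commit message does not name the new proxy-server setting; the option name cors_expose_headers and its [DEFAULT] placement below are assumptions.

```ini
# proxy-server.conf
[DEFAULT]
# assumed option name: headers appended cluster-wide to
# Access-Control-Expose-Headers on CORS responses
cors_expose_headers = X-Storage-Policy, X-Container-Read
```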
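For 40ba7f6172 (EC fragment duplication), the swift.conf parameters from the commit message's own example; the policy index, name, and ec_type are filled in here as assumptions.

```ini
# swift.conf -- 2 data + 1 parity fragments, duplicated twice, so the
# matching object ring needs (2 + 1) * 2 = 6 replicas
[storage-policy:1]
name = global-ec
policy_type = erasure_coding
ec_type = liberasurecode_rs_vand
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_duplication_factor = 2
```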
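For 5c93d6f238 (storage_domain as a list in domain_remap), a minimal sketch; the domains are placeholders.

```ini
# proxy-server.conf
[filter:domain_remap]
use = egg:swift#domain_remap
# now accepts a comma-separated list, matching cname_lookup's behavior
storage_domain = example.com, example.org
```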
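For da557011ec (handoffs_only in the reconstructor), a sketch of the rebalance-time setting described in the commit message; it should be removed (or set back to false) as soon as the handoffs have been reverted.

```ini
# object-server.conf -- temporary setting while a rebalance is in flight
[object-reconstructor]
handoffs_only = True
```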
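For 4ee20dba48 (object_post_as_copy defaults to False), a sketch of setting the option explicitly; placing it in the copy middleware's [filter:copy] section reflects where the option lived at this point and is an assumption here.

```ini
# proxy-server.conf
[filter:copy]
use = egg:swift#copy
# the new default; setting this to true restores POST-as-COPY but now logs
# a deprecation warning
object_post_as_copy = false
```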
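For 69f7be99a6 (reclaim_age location), a sketch of the upgrade guidance from the commit message: keep a single value in [DEFAULT] of the object server config; 604800 (7 days) is the usual default.

```ini
# object-server.conf
[DEFAULT]
reclaim_age = 604800

[object-replicator]
# per-service overrides like "reclaim_age = ..." here should be removed and
# folded into [DEFAULT] before upgrading
```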
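For f850ff065e (concurrent SLO segment HEADs), a sketch of the new knob; 2 is the default stated in the commit message and 1 restores the old single-threaded behavior.

```ini
# proxy-server.conf
[filter:slo]
use = egg:swift#slo
# number of segment HEAD requests issued in parallel when validating a
# manifest; set to 1 to keep the pre-upgrade behavior
concurrency = 2
```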
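For 9847796f01 (drive-audit recon cache owner), a sketch of the new "user" option; the [drive-audit] section name is an assumption based on the usual layout of drive-audit.conf.

```ini
# drive-audit.conf -- swift-drive-audit still runs as root (it needs umount),
# but chowns the recon cache so the object server's recon middleware can read it
[drive-audit]
user = swift
```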
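For f62df7b80c (configurable URL base in staticweb), the commit message does not name the option; the name url_base below is an assumption, and the host is a placeholder.

```ini
# proxy-server.conf
[filter:staticweb]
use = egg:swift#staticweb
# assumed option name: scheme/host to use in generated links when SSL is
# terminated in front of swift and the proxy would otherwise guess wrong
url_base = https://swift.example.com
```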