Commit Graph

468 Commits

Author SHA1 Message Date
Jenkins
94241a81c0 Merge "Improve domain_remap docs" 2017年10月09日 21:34:31 +00:00
Jenkins
ec3fffabfa Merge "Add account_autocreate=true to internal-client.conf-sample" 2017年10月05日 13:17:56 +00:00
Alistair Coles
93fc9d2de8 Add cautionary note re delay_reaping in account-server.conf-sample
Change-Id: I2c3eea783321338316eecf467d30ba0b3217256c
Related-Bug: #1514528 
2017年09月27日 22:52:47 +01:00
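A minimal sketch of the option discussed in the commit above, as it might appear in account-server.conf; the [account-reaper] section placement and the value are assumptions for illustration:
    [account-reaper]
    # Seconds to wait after an account DELETE before the reaper starts
    # removing its containers/objects; a reaped account cannot be recovered.
    delay_reaping = 604800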
Alistair Coles
c662e5fc8e Add account_autocreate=true to internal-client.conf-sample
Closes-Bug: #1698426
Change-Id: I8a29a685bb12e60f4da4a0dc8270b408241ec415
2017年09月26日 11:52:10 +01:00
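A minimal sketch of the setting added by the commit above, as it would sit in the proxy app section of internal-client.conf (the section layout is assumed from the standard proxy app):
    [app:proxy-server]
    use = egg:swift#proxy
    # Internal clients need the proxy app to auto-create accounts
    # rather than return 404 for accounts that do not exist yet.
    account_autocreate = true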
Jenkins
23de16b0bf Merge "Move listing formatting out to proxy middleware" 2017年09月20日 01:15:28 +00:00
Jenkins
3fda32470b Merge "Remove all post_as_copy related code and configs" 2017年09月19日 19:19:00 +00:00
Kota Tsuyuzaki
1e79f828ad Remove all post_as_copy related code and configs
It was deprecated, and we discussed this topic at the Denver PTG for
the Queens cycle. The main motivation for this work is that the
deprecated post_as_copy option and its gate block future symlink work.
Change-Id: I411893db1565864ed5beb6ae75c38b982a574476
2017年09月16日 05:50:41 +00:00
Tim Burke
4806434cb0 Move listing formatting out to proxy middleware
Move the json -> (text, xml) conversion into a common module and
reference it in the account/container servers so we don't break
existing clients (including out-of-date proxies), but have the proxy
controllers always force a json listing.
This simplifies operations on listings (such as the ones already
happening in the decrypter, or the ones planned for symlink and
sharding) by only needing to consider a single response type.
There is a downside of larger backend requests for text/plain listings,
but it seems like a net win.
Change-Id: Id3ce37aa0402e2d8dd5784ce329d7cb4fbaf700d
2017年09月15日 06:38:26 +00:00
junboli
df00122e74 doc migration: update the doc link address [2/3]
Update the doc links affected by the doc migration.
Although some effort has already gone into fixing these, lots of bad
doc links remain. I have separated the changes into 3 patches aiming
to fix all of them; this is the 2nd patch, covering doc/manpages.
Change-Id: Id426c5dd45a812ef801042834c93701bb6e63a05
2017年09月15日 06:31:00 +00:00
Jenkins
b0142d0cd2 Merge "Retrieve encryption root secret from Barbican" 2017年08月21日 21:19:09 +00:00
Alistair Coles
9c3c388091 Improve domain_remap docs
* Make the conditions for remapping clearer
* Mention the path_root
* Mention '-' -> '_' replacement in account names
* Make example consistent with default options
Change-Id: Ifd3f3775bb8b13367d964010f35813018b5b41b3
2017年08月18日 09:23:45 +01:00
shangxiaobj
c93c0c0c6e [Trivialfix] Fix typos in swift
Fix typos found in swift.
Change-Id: I52fad1a4882cec4456f22174b46d54e42ec66d97
2017年08月04日 07:50:10 +00:00
Mathias Bjoerkqvist
77bd74da09 Retrieve encryption root secret from Barbican
This patch adds support for retrieving the encryption root secret from
an external key management system. In practice, this is currently
limited to Barbican.
Change-Id: I1700e997f4ae6fa1a7e68be6b97539a24046e80b
2017年08月02日 15:53:09 +03:00
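A hedged sketch of the kind of configuration this enables; the kms_keymaster filter name, the keymaster.conf layout, and every option name below are assumptions based on the feature description and should be checked against the sample configs:
    # proxy-server.conf
    [filter:kms_keymaster]
    use = egg:swift#kms_keymaster
    keymaster_config_path = /etc/swift/keymaster.conf

    # keymaster.conf (credentials used to fetch the root secret from Barbican)
    [kms_keymaster]
    key_id = <barbican secret UUID>
    auth_endpoint = http://keystone:5000/v3
    username = swift
    password = <password>
    project_name = service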
Jenkins
019515e153 Merge "Update and optimize documentation links" 2017年08月01日 21:27:18 +00:00
junboli
0d07e3fdb1 Update and optimize documentation links
* Update URLs according to document migration
 * Update the dead and outdated links
Change-Id: Id92552f4a2d0fb79ddefc55a08636f2e7aeb07cb
2017年08月01日 15:12:00 +01:00
Clay Gerrard
701a172afa Add multiple worker processes strategy to reconstructor
This change adds a new Strategy concept to the daemon module similar to
how we manage WSGI workers. We need to leverage multiple python
processes to get the concurrency properties we need. More workers will
rebalance much faster on dense chassis with many devices.
Currently the default is still only one process, and no workers. Set
reconstructor_workers in the [object-reconstructor] section to some
whole number <= the number of devices on a node to get that many
reconstructor workers.
Each worker will operate on a different subset of disks.
Once mode works as before, but tends to update its recon drops a
little more often.
If you change the rings, the strategy will shutdown workers and spawn
new ones.
You can kill the worker pids and the daemon strategy will respawn them.
New per-disk reconstructor stats are dumped to recon under the
object_reconstruction_per_disk key. To maintain legacy compatibility
and replication monitoring based on cycle times they are aggregated
every stats_interval (default 5 mins).
Change-Id: I28925a37f3985c9082b5a06e76af4dc3ec813abe
2017年07月26日 16:55:10 -07:00
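A minimal sketch of the new option described above, in object-server.conf style (the worker count of 4 is illustrative):
    [object-reconstructor]
    # Spawn this many worker processes, each operating on a different
    # subset of disks. Use a whole number <= the number of devices on
    # the node; leaving it unset keeps the single-process behaviour.
    reconstructor_workers = 4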
Alistair Coles
9c5628b4f1 Add reconstructor section to deployment guide
Change-Id: I062998e813718828b7adf4e7c3f877b6a31633c0
Closes-Bug: #1626290 
2017年07月20日 11:40:17 +01:00
Jenkins
c3f6e82ae1 Merge "Write-affinity aware object deletion" 2017年07月06日 14:00:05 +00:00
Jenkins
f1e1dbb80a Merge "Make eventlet.tpool's thread count configurable in object server" 2017年07月04日 11:49:24 +00:00
Lingxian Kong
831eb6e3ce Write-affinity aware object deletion
When deleting objects in a multi-region swift deployment with write
affinity configured, users always get a 404 when deleting an object
before it has been replicated to the appropriate nodes.
This patch adds a config item 'write_affinity_handoff_delete_count' so
that operators can define how many local handoff nodes swift should
send requests to in order to get more candidates for the final
response, or by default just leave it to swift to calculate the
appropriate number.
Change-Id: Ic4ef82e4fc1a91c85bdbc6bf41705a76f16d1341
Closes-Bug: #1503161 
2017年06月27日 22:42:02 +12:00
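A hedged sketch of how the new option might be set alongside write affinity in the proxy config (the section placement and the 'auto' value are assumptions from this description):
    [app:proxy-server]
    write_affinity = r1
    # How many local handoff nodes to also send the DELETE to so the
    # final response has more candidates; 'auto' lets swift calculate it.
    write_affinity_handoff_delete_count = auto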
Samuel Merritt
d9c4913e3b Make eventlet.tpool's thread count configurable in object server
If you're running servers_per_port > 0 and threads_per_disk = 0 (as it
should be with servers_per_port on), each object-server process will
have 20 IO threads waiting around to service eventlet.tpool
calls. This is far too many; with servers_per_port, there's no real
benefit to having so many IO threads.
This commit makes it so that, when servers_per_port > 0, each object
server defaults to having one main thread and one IO thread.
Also, eventlet's tpool size is now configurable via the object-server
config file. If a tpool size is set, that's what we'll use regardless
of servers_per_port. This allows operators with an excess of threads
to remove some regardless of servers_per_port.
Change-Id: I8f8914b7e70f2510393eb7c5e6be9708631ac027
Closes-Bug: 1554233
2017年06月23日 16:16:03 +10:00
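A hedged sketch of the object-server knob this adds; the option name eventlet_tpool_num_threads and its section are assumptions from memory of this change and should be verified against object-server.conf-sample:
    [app:object-server]
    # Explicit eventlet.tpool size. If unset, the server defaults to a
    # single IO thread when servers_per_port > 0 (per this commit).
    eventlet_tpool_num_threads = 1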
Jenkins
2d18ecdf4b Merge "Replace slowdown option with *_per_second option" 2017年06月22日 01:18:26 +00:00
Jenkins
169d1d8ab8 Merge "Require that known-bad EC schemes be deprecated" 2017年06月22日 01:11:03 +00:00
Ondřej Nový
a8bc94c7e3 Replace slowdown option with *_per_second option
The container and object updaters sleep "slowdown" seconds (default
0.01) after every processed container/object. Because the time.sleep
call adds overhead, use ratelimit_sleep from common.utils instead, as
the auditor already does.
Change-Id: I362aa0f13c78ad03ce1f76ee0257b0646f981212
2017年06月16日 19:22:00 +00:00
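A hedged sketch of the replacement options; the exact names objects_per_second and containers_per_second are assumptions inferred from the "*_per_second" pattern, and the values are illustrative:
    [object-updater]
    # Replaces the old 'slowdown' sleep; rate-limit to this many
    # objects per second (0 disables rate limiting).
    objects_per_second = 50

    [container-updater]
    containers_per_second = 50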
Tim Burke
2c3ac543f4 Require that known-bad EC schemes be deprecated
We said we were going to do it, we've had two releases saying we'd do
it, we've even backported our saying it to Newton -- let's actually do
it.
Upgrade Consideration
=====================
Erasure-coded storage policies using isa_l_rs_vand and nparity >= 5 must
be configured as deprecated, preventing any new containers from being
created with such a policy. This configuration is known to harm data
durability. Any data in such policies should be migrated to a new
policy. See https://bugs.launchpad.net/swift/+bug/1639691 for more
information.
UpgradeImpact
Related-Change: I50159c9d19f2385d5f60112e9aaefa1a68098313
Change-Id: I8f9de0bec01032d9d9b58848e2a76ac92e65ab09
Closes-Bug: 1639691
2017年06月16日 17:58:43 +00:00
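A minimal sketch of the swift.conf change this requires for an affected policy (the policy index, name and fragment counts are illustrative):
    [storage-policy:2]
    name = ec-isa-l-vand
    policy_type = erasure_coding
    ec_type = isa_l_rs_vand
    ec_num_data_fragments = 10
    ec_num_parity_fragments = 5
    # Now mandatory for isa_l_rs_vand with >= 5 parity fragments:
    # marks the policy deprecated so no new containers can use it.
    deprecated = yes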
Jenkins
7bbe02b290 Merge "Allow to configure the nameservers in cname_lookup" 2017年06月12日 19:48:45 +00:00
Jenkins
e11eb88ad9 Merge "Remove deprecated vm_test_mode option" 2017年06月06日 05:42:14 +00:00
Romain LE DISEZ
420e73fabd Allow to configure the nameservers in cname_lookup
For various reasons, an operator might want to use specific nameservers
instead of the system ones to resolve CNAMEs in cname_lookup. This patch
creates a new configuration variable, nameservers, which accepts a list
of nameservers separated by commas. If not specified or empty, the
system nameservers are used as before.
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: I34219e6ab7e45678c1a80ff76a1ac0730c64ddde
2017年06月01日 14:02:08 -07:00
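A minimal sketch of the new variable in the cname_lookup filter section (the resolver addresses are illustrative):
    [filter:cname_lookup]
    use = egg:swift#cname_lookup
    storage_domain = example.com
    # Comma-separated resolvers to use instead of the system ones;
    # leave unset or empty to keep the previous behaviour.
    nameservers = 10.0.0.2,10.0.0.3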
Tim Burke
5ecf828b17 Follow-up for per-policy proxy configs
* Only use one StringIO in ConfigString
* Rename the write_affinity_node_count function to be
 write_affinity_node_count_fn
* Use comprehensions instead of six.moves.filter
* Rename OverrideConf to ProxyOverrideOptions
* Make ProxyOverrideOptions's __repr__ eval()able
* Various conf -> options renames
* Stop trying to handle a KeyError that should never come up
* Be explicit about how deep we need to copy in proxy/test_server.py
* Drop an unused return value
* Add a test for a non-"proxy-server" app name
* Combine bad-section-name tests
* Try to clean up (at least a little) a self-described "hokey test"
Related-Change: I3f718f425f525baa80045ba067950c752bcaaefc
Change-Id: I4e81175d5445049bc1f48b3ac02c5bc0f77e6f59
2017年06月01日 20:36:19 +00:00
Tim Burke
675145ef4a Remove deprecated vm_test_mode option
This was deprecated in the 2.5.0 release (i.e. Liberty cycle), and we've
been warning about it ever since. A year and a half seems like a long
enough time.
Change-Id: I5688e8f7dedb534071e67d799252bf0b2ccdd9b6
Related-Change: Iad91df50dadbe96c921181797799b4444323ce2e
2017年05月25日 13:02:42 -07:00
Alistair Coles
45884c1102 Enable per policy proxy config options
This is an alternative approach to that proposed in [1]
Adds support for optional per-policy config sections
to be added in proxy-server.conf. This is highly desirable
to allow per-policy affinity options to be set for use with
duplicated EC policies [2] and composite rings [3].
Certain options found in per-policy conf sections will
override their equivalents that may be set in the
[app:proxy-server] section. Currently the options
handled that way are:
 sorting_method
 read_affinity
 write_affinity
 write_affinity_node_count
For example:
 [proxy-server:policy:0]
 sorting_method = affinity
 read_affinity = r1=100
 write_affinity = r1
 write_affinity_node_count = 1 * replicas
The corresponding attributes of the proxy-server Application
are now available from instances of an OverrideConf object
that is obtained from Application.get_policy_options(policy).
[1] Related-Change: I9104fc789ba85ab3ab5ccd34096125b482821389
[2] Related-Change: Idd155401982a2c48110c30b480966a863f6bd305
[3] Related-Change: I0d8928b55020592f8e75321d1f7678688301d797
Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp>
Change-Id: I3f718f425f525baa80045ba067950c752bcaaefc
2017年05月23日 20:22:30 +01:00
Alistair Coles
f02ec4de81 Add read and write affinity options to deployment guide
Add entries for these options in the deployment guide and
make the text in proxy-server.conf-sample and man page
consistent.
Change-Id: I5854ddb3e5864ddbeaf9ac2c930bfafdb47517c3
2017年05月18日 10:42:44 -07:00
lijunbo
21396bc106 keep consistent naming convention of swift and urls
Change-Id: Iddd4f69abf77a5c643ce8b164fc6cfd72c068229
2017年03月23日 02:28:41 +00:00
Jenkins
b43414c905 Merge "Accept storage_domain as a list in domain_remap" 2017年03月16日 12:35:48 +00:00
Jenkins
1e9b8888bf Merge "Enable cluster-wide CORS Expose-Headers setting" 2017年03月13日 19:24:20 +00:00
Romain LE DISEZ
9b47de3095 Enable cluster-wide CORS Expose-Headers setting
An operator offering a web UX to its customers might want to allow web
browsers to access some headers by default (e.g. X-Storage-Policy,
X-Container-Read, ...). This commit adds a new setting to the
proxy-server to allow some headers to be added cluster-wide to the CORS
header Access-Control-Expose-Headers.
Change-Id: I5ca90a052f27c98a514a96ee2299bfa1b6d46334
2017年02月25日 19:00:28 +01:00
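A hedged sketch of the new proxy setting; the option name cors_expose_headers and its [DEFAULT] placement are assumptions based on this description:
    [DEFAULT]
    # Headers exposed cluster-wide to browsers via the CORS header
    # Access-Control-Expose-Headers.
    cors_expose_headers = X-Storage-Policy, X-Container-Read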
Kota Tsuyuzaki
40ba7f6172 EC Fragment Duplication - Foundational Global EC Cluster Support
This patch enables efficient PUT/GET for a globally distributed cluster[1].
Problem:
Erasure coding can decrease the amount of actually stored data compared
with the replicated model. For example, the parameters ec_k=6, ec_m=3
store only 1.5x the original data, which is smaller than 3x replicated.
However, unlike replication, erasure coding requires at least ec_k of
the total ec_k + ec_m fragments to be available to service a read
(e.g. 6 of 9 in the case above). As such, if we stored the EC object
in a swift cluster spanning 2 geographically distributed data centers
with the same volume of disks, the fragments would likely be stored
evenly (about 4 and 5), so we would still need to access a faraway data
center to decode the original object. In addition, if one of the data
centers were lost in a disaster, the stored objects would be lost
forever, and we would have to cry a lot. To ensure highly durable
storage, you might think of making *more* parity fragments (e.g.
ec_k=6, ec_m=10); unfortunately this causes *significant* performance
degradation due to the cost of the mathematical calculation for erasure
coding encode/decode.
How this resolves the problem:
EC Fragment Duplication extends the initial solution to add *more*
fragments from which to rebuild an object, similar to the solution
described above. The difference is making *copies* of encoded fragments.
Experimental results[1][2] show that employing small ec_k and ec_m gives
enough performance to store/retrieve objects.
On PUT:
- Encode the incoming object with small ec_k and ec_m <- faster!
- Make duplicated copies of the encoded fragments. The # of copies
 is determined by 'ec_duplication_factor' in swift.conf
- Store all fragments in the Swift Global EC Cluster
The duplicated fragments increase pressure on existing requirements
when decoding objects in service to a read request. All fragments are
stored with their X-Object-Sysmeta-Ec-Frag-Index. After this change,
the X-Object-Sysmeta-Ec-Frag-Index represents the actual fragment index
encoded by PyECLib, so there *will* be duplicates. Anytime we must
decode the original object data, we may only use ec_k fragments that
are unique according to their X-Object-Sysmeta-Ec-Frag-Index: no
duplicate X-Object-Sysmeta-Ec-Frag-Index may be used when decoding an
object, so duplicates should be expected and avoided where possible.
On GET:
This patch includes the following changes:
- Change the GET path to sort primary nodes into subsets, so that
 each subset includes unique fragments
- Change the Reconstructor to be more aware of possibly duplicate fragments
For example, with this change, a policy could be configured such that
swift.conf:
ec_num_data_fragments = 2
ec_num_parity_fragments = 1
ec_duplication_factor = 2
(object ring must have 6 replicas)
At Object-Server:
node index (from object ring):  0 1 2 3 4 5  <- keep node index for
                                                reconstruct decision
X-Object-Sysmeta-Ec-Frag-Index: 0 1 2 0 1 2  <- each object keeps actual
                                                fragment index for
                                                backend (PyECLib)
Additional improvements to Global EC Cluster Support will require
features such as Composite Rings, and more efficient fragment
rebalance/reconstruction.
1: http://goo.gl/IYiNPk (Swift Design Spec Repository)
2: http://goo.gl/frgj6w (Slide Share for OpenStack Summit Tokyo)
Doc-Impact
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Change-Id: Idd155401982a2c48110c30b480966a863f6bd305
2017年02月22日 10:56:13 -08:00
Romain LE DISEZ
5c93d6f238 Accept storage_domain as a list in domain_remap
The domain_remap middleware can work with the cname_lookup middleware.
The latter accepts storage_domain as a list of domains; to be
consistent, domain_remap should have the same behavior.
Closes-Bug: #1664647
Change-Id: Iacc6619968cc7c677bf63e0b8d101a20c86ce599
2017年02月18日 10:41:27 +01:00
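A minimal sketch of the now list-valued option in the domain_remap filter (domains are illustrative):
    [filter:domain_remap]
    use = egg:swift#domain_remap
    # Accepts a comma-separated list, matching cname_lookup's behaviour.
    storage_domain = example.com, storage.example.org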
Clay Gerrard
da557011ec Deprecate broken handoffs_first in favor of handoffs_only
The handoffs_first mode in the replicator has the useful behavior of
processing all handoff parts across all disks until there aren't any
handoffs anymore on the node [1] and then it seemingly tries to drop
back into normal operation. In practice I've only ever heard of
handoffs_first used while rebalancing and turned off as soon as the
rebalance finishes - it's not recommended to run with handoffs_first
mode turned on, and it emits a warning on startup if the option is enabled.
The handoffs_first mode on the reconstructor doesn't work - it was
prioritizing handoffs *per-part* [2] - which is really unfortunate
because in the reconstructor during a rebalance it's often *much* more
attractive from an efficiency disk/network perspective to revert a
partition from a handoff than it is to rebuild an entire partition from
another primary using the other EC fragments in the cluster.
This change deprecates handoffs_first in favor of handoffs_only in the
reconstructor which is far more useful - and just like handoffs_first
mode in the replicator - it gives the operator the option of forcing the
consistency engine to focus on rebalance. The handoffs_only behavior is
somewhat consistent with the replicator's handoffs_first option (any
error on any handoff in the replicator will make it essentially handoff
only forever), but the option does what you want and is named correctly
in the reconstructor.
For consistency with the replicator the reconstructor will mostly honor
the handoffs_first option, but if you set handoffs_only in the config it
always takes precedence. Having handoffs_first in your config always
results in a warning, but if handoffs_only is not set and handoffs_first
is true, the reconstructor will assume you need handoffs_only and behave
as such.
When running in handoffs_only mode the reconstructor will start to log a
warning every cycle if you leave it running in handoffs_only after it
finishes reverting handoffs. However, you should monitor on-disk
partitions and disable the option as soon as the cluster finishes the
full rebalance cycle.
1. Ia324728d42c606e2f9e7d29b4ab5fcbff6e47aea fixed replicator
handoffs_first "mode"
2. Unlike replication, each partition in an EC policy can have a
different kind of job per frag_index, but the cardinality of jobs is
typically only one (either sync or revert) unless there have been a
bunch of errors during write, in which case handoff partitions may hold
a number of different fragments.
Known-Issues:
handoffs_only is not documented outside of the example config, see lp
bug #1626290
Closes-Bug: #1653018
Change-Id: Idde4b6cf92fab6c45f2c0c2733277701eb436898
2017年02月13日 21:13:29 -08:00
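A minimal sketch of the new reconstructor option (remember to disable it once the rebalance completes):
    [object-reconstructor]
    # Only revert handoff partitions; skips rebuilding parts on
    # primaries. A warning is logged every cycle while enabled.
    handoffs_only = True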
Jenkins
63b351893d Merge "Default object_post_as_copy to False" 2017年01月24日 20:58:34 +00:00
Tim Burke
4ee20dba48 Default object_post_as_copy to False
Additionally, emit deprecation warnings when running POST-as-COPY
Change-Id: I11324e711057f7332577fd38f9bff82bdc6aac90
2017年01月20日 12:37:01 -05:00
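A hedged sketch of how an operator could pin the old behaviour after this change; the [filter:copy] placement is an assumption from the copy middleware's config:
    [filter:copy]
    use = egg:swift#copy
    # Now defaults to false; setting it true restores POST-as-COPY
    # and emits a deprecation warning.
    object_post_as_copy = true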
Mahati Chamarthy
69f7be99a6 Move documented reclaim_age option to correct location
The reclaim_age is a DiskFile option; it doesn't make sense for two
different object services or nodes to use different values.
As a drive-by, I also clean up the reclaim_age plumbing from get_hashes
to cleanup_ondisk_files, since it's a method on the Manager and has
access to the configured reclaim_age. This fixes a bug where
finalize_put wouldn't use the [DEFAULT]/object-server configured
reclaim_age - which is normally benign but leads to weird behavior on
DELETE requests with a really small reclaim_age.
There are a couple of places in the replicator and reconstructor that
reach into their manager to borrow the reclaim_age when emptying out
the aborted PUTs that failed to clean up their files in tmp - but that
timeout doesn't really need to be coupled with reclaim_age, and that
method could just as reasonably have been implemented on the Manager.
UpgradeImpact: Previously the reclaim_age was documented to be
configurable in various object-* services config sections, but that did
not work correctly unless you also configured the option for the
object-server because of REPLICATE request rehash cleanup. All object
services must use the same reclaim_age. If you require a non-default
reclaim age it should be set in the [DEFAULT] section. If there are
different non-default values, the greater should be used for all object
services and configured only in the [DEFAULT] section.
If you specify a reclaim_age value in any object related config you
should move it to *only* the [DEFAULT] section before you upgrade. If
you configure a reclaim_age less than your consistency window, you are
likely to be eaten by a Grue.
Closes-Bug: #1626296
Change-Id: I2b9189941ac29f6e3be69f76ff1c416315270916
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
2017年01月13日 03:10:47 +00:00
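A minimal sketch of the recommended placement per the note above (604800 seconds, i.e. 7 days, is the usual default; set it larger than your consistency window):
    # object-server.conf
    [DEFAULT]
    # One value shared by all object services on the node.
    reclaim_age = 604800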
Ondřej Nový
ae7dddd801 Added comment for "user" option in drive-audit config
Change-Id: I24362826bee85ac3304e9b63504c9465da673014
2016年11月21日 22:13:11 +01:00
Tim Burke
f850ff065e SLO: Concurrently HEAD segments
Before creating a static large object, we must verify that all of the
referenced segments exist. Previously, this was done sequentially; due
to latency between proxy and object nodes, clients must be careful to
either keep their segment count low or use very long (minute+) timeouts.
We mitigate this somewhat by enforcing a hard limit on segment count,
but even then, HEADing a thousand segments (the default limit) with an
average latency of (say) 100ms will require more than a minute and a
half.
Further, the nested-SLO approach requires multiple requests from the
client -- as a result, Swift3 is in the position of enforcing a lower
limit than S3's 10,000 (which will break some clients) or requiring that
clients have timeouts on the order of 15-20 minutes (!).
Now, we'll perform the segment HEADs in parallel, with a concurrency
factor set by the operator. This is very similar to (and builds upon)
the parallel-bulk-delete work. By default, two HEAD requests will be
allowed at a time.
As a side-effect, we'll also only ever HEAD a path once per manifest.
Previously, if a manifest alternated between two paths repeatedly (for
instance, because the user wanted to splice together various ranges from
two sub-SLOs), then each entry in the manifest would trigger a fresh
HEAD request.
Upgrade Consideration
=====================
If operators would like to preserve the prior (single-threaded) SLO
creation behavior, they must add the following line to their
[filter:slo] proxy config section:
 concurrency = 1
This may be done prior to upgrading Swift.
UpgradeImpact
Closes-Bug: #1637133
Related-Change: I128374d74a4cef7a479b221fd15eec785cc4694a
Change-Id: I567949567ecdbd94fa06d1dd5d3cdab0d97207b6
2016年11月16日 12:12:06 -08:00
Jenkins
ae24c802a9 Merge "Set owner of drive-audit recon cache to swift user" 2016年10月27日 13:26:25 +00:00
Ondřej Nový
99a13d9386 Fixed rysnc -> rsync typo
Change-Id: I671b4206072c6e22f4ae38033502336ec32e86ad
2016年10月19日 20:17:00 +02:00
Ondřej Nový
9847796f01 Set owner of drive-audit recon cache to swift user
Fixes this problem:
* swift-drive-audit needs to be run by root, because only root has
 "umount" permission
* swift-object servers typically run as user swift
* if swift-drive-audit is run by root, /var/cache/swift/drive.recon is
 owned by root, with 0o600
* the recon middleware (inside swift-object-server) can't read this
 cache file: swift-object: Error reading recon cache file
This patch adds a "user" option to the drive-audit config file. The
recon cache is chowned to this user.
Change-Id: Ibf20543ee690b7c5a37fabd1540fd5c0c7b638c9
2016年10月19日 17:16:42 +00:00
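A hedged sketch of the new option in the drive-audit config (the [drive-audit] section name is assumed from the config file's conventions):
    [drive-audit]
    # Owner for the recon cache written by the root-run swift-drive-audit,
    # so the recon middleware inside swift-object-server can read it.
    user = swift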
Pete Zaitcev
f62df7b80c Add a configurable URL base to staticweb
This came to light because someone ran Tempest against a standard
installation of RDO, which helpfully terminates SSL for Swift in
a pre-configured load balancer. In such a case, staticweb has no
way to know what scheme to use and guesses wrong, causing Tempest
to fail.
Related upstream bug:
 https://bugs.launchpad.net/mos/+bug/1537071
Change-Id: Ie15cf2aff4f7e6bcf68b67ae733c77bb9353587a
Closes-Bug: 1572011
2016年10月03日 21:08:15 -06:00
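A hedged sketch of the configurable URL base this adds; the option name url_base is an assumption based on the commit title:
    [filter:staticweb]
    use = egg:swift#staticweb
    # Force the scheme/host used in generated links when SSL is
    # terminated by a load balancer in front of Swift.
    url_base = https://swift.example.com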
Jenkins
23c2d69ee1 Merge "Add more comment to authtoken sample options" 2016年09月30日 04:21:18 +00:00
gecong1973
a09e42732a Fix a typo in proxy-server.conf-sample
TrivialFix
Change-Id: If650e25979a9488c93fe93621c905003946c27e5
2016年09月27日 17:14:13 +08:00