89eced960c5bf5c2e14b6245c70b615dc23d45a6
516 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
| Tim Burke | d748851766 | s3token: Add note about config change when upgrading from swift3. Change-Id: I2610cbdc9b7bc2b4d614eaedb4f3369d7a424ab3 | |
| Clay Gerrard | ea8e545a27 | Rebuild frags for unmounted disks. Change the behavior of the EC reconstructor to perform a fragment rebuild to a handoff node when a primary peer responds with 507 to the REPLICATE request. Each primary node in an EC ring will sync with exactly three primary peers; in addition to the left and right nodes we now select a third node from the far side of the ring. If any of these partners respond unmounted, the reconstructor will rebuild its fragments to a handoff node with the appropriate index. To prevent ssync (which is uninterruptible) from receiving a 409 (Conflict), we must give the remote handoff node the correct backend_index for the fragments it will receive. In the common case we will use deterministically different handoffs for each fragment index to prevent multiple unmounted primary disks from forcing a single handoff node to hold more than one rebuilt fragment. Handoff nodes will continue to attempt to revert rebuilt handoff fragments to the appropriate primary until it is remounted or rebalanced. After a rebalance of EC rings (potentially removing unmounted/failed devices), it is most IO-efficient to run in handoffs_only mode to avoid unnecessary rebuilds. Closes-Bug: #1510342 Change-Id: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec | |
| Zuul | 3043c54f28 | Merge "s3api: Allow concurrent multi-deletes" | |
| Tim Burke | 00be3f595e | s3api: Allow concurrent multi-deletes. Previously, a thousand-item multi-delete request would consider each object to delete serially, and not start trying to delete one until the previous was deleted (or hit an error). Now, allow operators to configure a concurrency factor to allow multiple deletes at the same time. Default the concurrency to 2, like we did for slo and bulk. See also: http://lists.openstack.org/pipermail/openstack-dev/2016-May/095737.html Change-Id: If235931635094b7251e147d79c8b7daa10cdcb3d Related-Change: I128374d74a4cef7a479b221fd15eec785cc4694a | |
| Tim Burke | 692a03473f | s3api: Change default location to us-east-1. This is more likely to be the default region that a client would try for v4 signatures. UpgradeImpact: Deployers with clusters that relied on the old implicit default location of US should explicitly set location = US in the [filter:s3api] section of proxy-server.conf before upgrading. Change-Id: Ib6659a7ad2bd58d711002125e7820f6e86383be8 | |
| Clay Gerrard | 06cf5d298f | Add databases_per_second to db daemons. Most daemons have a "go as fast as you can then sleep for 30 seconds" strategy towards resource utilization; the object-updater and object-auditor however have some "X_per_second" options that allow operators much better control over how they spend their I/O budget. This change extends that pattern into the account-replicator, container-replicator, and container-sharder, which have been known to peg CPUs when they're not IO limited. Partial-Bug: #1784753 Change-Id: Ib7f2497794fa2f384a1a6ab500b657c624426384 | |
| Zuul | 5cc4a72c76 | Merge "Configure diskfile per storage policy" | |
| Alistair Coles | 904e7c97f1 | Add more doc and test for cors_expose_headers option. In follow-up to the related change, mention the new cors_expose_headers option (and other proxy-server.conf options) in the CORS doc. Add a test for the cors options being loaded into the proxy server. Improve CORS comments in docs. Change-Id: I647d8f9e9cbd98de05443638628414b1e87d1a76 Related-Change: I5ca90a052f27c98a514a96ee2299bfa1b6d46334 | |
| Zuul | 5d46c0d8b3 | Merge "Adding keep_idle config value to socket" | |
| FatemaKhalid | cfeb32c66b | Adding keep_idle config value to socket. Users can configure the KEEPIDLE time for sockets in TCP connections. The default value is the old value, which is 600. Change-Id: Ib7fb166deb8a87ae4e97ba0671048b1ec079a2ef Closes-Bug: #1759606 | |
| Tim Burke | 5a8cfd6e06 | Add another user for s3api func tests. Previously we'd use two users, one admin and one unprivileged. Ceph's s3-tests, however, assume that both users should have access to create buckets. Further, there are different errors that may be returned depending on whether you are the *bucket* owner or not when using s3_acl. So now we've got: test:tester1 (admin), test:tester2 (also admin), test:tester3 (unprivileged). Change-Id: I0b67c53de3bcadc2c656d86131fca5f2c3114f14 | |
| Romain LE DISEZ | 673fda7620 | Configure diskfile per storage policy. With this commit, each storage policy can define the diskfile to use to access objects. Selection of the diskfile is done in swift.conf. Example: [storage-policy:0] name = gold policy_type = replication default = yes diskfile = egg:swift#replication.fs The diskfile configuration item accepts the same format as middleware declarations: [[scheme:]egg_name#]entry_point The egg_name is optional and defaults to "swift". The scheme is optional and defaults to the only valid value "egg". The upstream entry points are "replication.fs" and "erasure_coding.fs". Co-Authored-By: Alexandre Lécuyer <alexandre.lecuyer@corp.ovh.com> Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Change-Id: I070c21bc1eaf1c71ac0652cec9e813cadcc14851 | |
| Alistair Coles | 2722e49a8c | Add support for multiple root encryption secrets. For some use cases operators would like to periodically introduce a new encryption root secret that would be used when new object data is written. However, existing encrypted data does not need to be re-encrypted with keys derived from the new root secret. Older root secret(s) would still be used as necessary to decrypt older object data. This patch modifies the KeyMaster class to support multiple root secrets indexed via unique secret_ids, and to store the id of the root secret used for an encryption operation in the crypto meta. The decrypter is modified to fetch appropriate keys based on the secret id in the retrieved crypto meta. The changes are backwards compatible with previous crypto middleware configurations and existing encrypted object data. Change-Id: I40307acf39b6c1cc9921f711a8da55d03924d232 | |
| Zuul | 00373dad61 | Merge "Add keymaster to fetch root secret from KMIP service" | |
| Samuel Merritt | 8e651a2d3d | Add fallocate_reserve to account and container servers. The object server can be configured to leave a certain amount of disk space free; default is 1%. This is useful in avoiding 100%-full filesystems, as those can get Swift in a state where the filesystem is too full to write tombstones, so you can't delete objects to free up space. When a cluster has accounts/containers and objects on the same disks, then you can wind up with a 100%-full disk since account and container servers don't respect fallocate_reserve. This commit makes account and container servers respect fallocate_reserve so that disks shared between account/container and object rings won't get 100% full. When a disk's free space falls below the configured reserve, account and container PUT, POST, and REPLICATE requests will fail with a 507 status code. These are the operations that can significantly increase the disk space used by a given database. I called the parameter "fallocate_reserve" for consistency with the object server. No actual fallocate() call happens under Swift's control in the account or container servers (sqlite3 might make such a call, but it's out of our hands). Change-Id: I083442eef14bf83c0ea717b1decb3e6b56dbf1d0 | |
| Alistair Coles | 1951dc7e9a | Add keymaster to fetch root secret from KMIP service. Add a new middleware that can be used to fetch an encryption root secret from a KMIP service. The middleware uses a PyKMIP client to interact with a KMIP endpoint. The middleware is configured with a unique identifier for the key to be fetched and options required for the PyKMIP client. Co-Authored-By: Tim Burke <tim.burke@gmail.com> Change-Id: Ib0943fb934b347060fc66c091673a33bcfac0a6d | |
| Zuul | ea33638d0c | Merge "object-updater: add concurrent updates" | |
| Samuel Merritt | d5c532a94e | object-updater: add concurrent updates. The object updater now supports two configuration settings: "concurrency" and "updater_workers". The latter controls how many worker processes are spawned, while the former controls how many concurrent container updates are performed by each worker process. This should speed the processing of async_pendings. There is a change to the semantics of the configuration options. Previously, "concurrency" controlled the number of worker processes spawned, and "updater_workers" did not exist. I switched the meanings for consistency with other configuration options. In the object reconstructor, object replicator, object server, object expirer, container replicator, container server, account replicator, account server, and account reaper, "concurrency" refers to the number of concurrent tasks performed within one process (for reference, the container updater and object auditor use "concurrency" to mean number of processes). On upgrade, a node configured with concurrency=N will still handle async updates N-at-a-time, but will do so using only one process instead of N. UpgradeImpact: If you have a config file like this: [object-updater] concurrency = <N> and you want to take advantage of faster updates, then do this: [object-updater] concurrency = 8 # the default; you can omit this line updater_workers = <N> If you want updates to be processed exactly as before, do this: [object-updater] concurrency = 1 updater_workers = <N> Change-Id: I17e18088e61f664e1b9942d66423666d0cae1689 | |
| Zuul | c01c43d982 | Merge "Adds read_only middleware" | |
| Greg Lange | 5d601b78f3 | Adds read_only middleware. This patch adds a read_only middleware to swift. It gives the ability to make an entire cluster or individual accounts read only. When a cluster or an account is in read only mode, requests that would result in writes to the cluster are not allowed. DocImpact Change-Id: I7e0743aecd60b171bbcefcc8b6e1f3fd4cef2478 | |
| Thiago da Silva | 36dbd38e48 | Add s3api headers to allowed_headers by default. Previously, these headers had to be added by operators to their object-server.conf when enabling swift3 middleware. Since s3api is now imported into swift we should go ahead and add these headers by default too. Change-Id: Ib82e175096716e42aecdab48f01f079e09da6a1d Signed-off-by: Thiago da Silva <thiago@redhat.com> | |
| Darrell Bishop | 661838d968 | Add support for PROXY protocol v1 (only) to the proxy-server. The point is to allow the Swift proxy server to log accurate client IP addresses when there is a proxy or SSL-terminator between the client and the Swift proxy server. Example servers supporting this PROXY protocol: stud (v1 only), stunnel, haproxy, hitch (v2 only), varnish. See http://www.haproxy.org/download/1.7/doc/proxy-protocol.txt The feature is enabled by adding this to your proxy config file: [app:proxy-server] use = egg:swift#proxy ... require_proxy_protocol = true The protocol specification states: The receiver MUST be configured to only receive the protocol described in this specification and MUST not try to guess whether the protocol header is present or not. So valid deployments are: 1) require_proxy_protocol = false (or missing; default is false) and NOT behind a proxy that adds or proxies existing PROXY lines. 2) require_proxy_protocol = true and IS behind a proxy that adds or proxies existing PROXY lines. Specifically, in the default configuration, one cannot send the swift proxy PROXY lines (no change from before this patch). When this feature is enabled, one _must_ send PROXY lines. Change-Id: Icb88902f0a89b8d980c860be032d5e822845d03a | |
| Matthew Oliver | 2641814010 | Add sharder daemon, manage_shard_ranges tool and probe tests. The sharder daemon visits container dbs and when necessary executes the sharding workflow on the db. The workflow is, in overview: perform an audit of the container for sharding purposes; move any misplaced objects that do not belong in the container to their correct shard; move shard ranges from FOUND state to CREATED state by creating shard containers; move shard ranges from CREATED to CLEAVED state by cleaving objects to shard dbs and replicating those dbs. By default this is done in batches of 2 shard ranges per visit. Additionally, when the auto_shard option is True (NOT yet recommended in production), the sharder will identify shard ranges for containers that have exceeded the threshold for sharding, and will also manage the sharding and shrinking of shard containers. The manage_shard_ranges tool provides a means to manually identify shard ranges and merge them to a container in order to trigger sharding. This is currently the recommended way to shard a container. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Co-Authored-By: Tim Burke <tim.burke@gmail.com> Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com> Change-Id: I7f192209d4d5580f5a0aa6838f9f04e436cf6b1f | |
| Zuul | 3313392462 | Merge "Import swift3 into swift repo as s3api middleware" | |
| Kota Tsuyuzaki | 636b922f3b | Import swift3 into swift repo as s3api middleware. This imports the openstack/swift3 package into the swift upstream repository and namespace. It is mostly a straightforward port, except for the following items: 1. Rename the swift3 namespace to swift.common.middleware.s3api (1.1 Rename some conflicting class names, e.g. Request/Response); 2. Port unit tests to the test/unit/s3api dir so they can run on the gate; 3. Port func tests to test/functional/s3api and set up in-process testing; 4. Port docs to the doc dir and address the namespace change; 5. Use get_logger() instead of a global logger instance; 6. Avoid a global conf instance; plus various minor fixes along those steps (e.g. packages, dependencies, deprecated things). The details and patch references for the work on feature/s3api are listed at https://trello.com/b/ZloaZ23t/s3api (completed board). Note that, because this is just a port, no new feature has been developed since the last swift3 release; in future work, Swift upstream may continue to work on the remaining items for further improvements and the best compatibility with Amazon S3. Please read the new docs for your deployment and keep track of what will change in future releases. Change-Id: Ib803ea89cfee9a53c429606149159dd136c036fd Co-Authored-By: Thiago da Silva <thiago@redhat.com> Co-Authored-By: Tim Burke <tim.burke@gmail.com> | |
| Zuul | 47efb5b969 | Merge "Multiprocess object replicator" | |
| Samuel Merritt | c28004deb0 | Multiprocess object replicator. Add a multiprocess mode to the object replicator. Setting the "replicator_workers" setting to a positive value N will result in the replicator using up to N worker processes to perform replication tasks. At most one worker per disk will be spawned, so one can set replicator_workers=99999999 to always get one worker per disk regardless of the number of disks in each node. This is the same behavior that the object reconstructor has. Worker process logs will have a bit of information prepended so operators can tell which messages came from which worker. It looks like this: [worker 1/2 pid=16529] 154/154 (100.00%) partitions replicated in 1.02s (150.87/sec, 0s remaining) The prefix is "[worker M/N pid=P] ", where M is the worker's index, N is the total number of workers, and P is the process ID. Every message from the replicator's logger will have the prefix; this includes messages from down in diskfile, but does not include things printed to stdout or stderr. Drive-by fix: don't dump recon stats when replicating only certain policies. When running the object replicator with replicator_workers > 0 and "--policies=X,Y,Z", the replicator would update recon stats after running. Since it only ran on a subset of objects, it should not update recon, much like it doesn't update recon when run with --devices or --partitions. Change-Id: I6802a9ad9f1f9b9dafb99d8b095af0fdbf174dc5 | |
| wangqi | 708b24aef1 | Deprecate auth_uri option. Option auth_uri from group keystone_authtoken is deprecated [1]. Use option www_authenticate_uri from group keystone_authtoken. [1] https://review.openstack.org/#/c/508522/ Change-Id: I43bbc8b8c986e54a9a0829a0631d78d4077306f8 | |
| melissaml | 3bc267d10c | fix a typo in documentation. Change-Id: I0492ae1d50493585ead919904d6d9502b7738266 | |
| Samuel Merritt | 47fed6f2f9 | Add handoffs-only mode to DB replicators. The object reconstructor has a handoffs-only mode that is very useful when a cluster requires rapid rebalancing, like when disks are nearing fullness. This mode's goal is to remove handoff partitions from disks without spending effort on primary partitions. The object replicator has a similar mode, though it varies in some details. This commit adds a handoffs-only mode to the account and container replicators. Change-Id: I588b151ee65ae49d204bd6bf58555504c15edf9f Closes-Bug: 1668399 | |
| Zuul | 82844a3211 | Merge "Add support for data segments to SLO and SegmentedIterable" | |
| Zuul | bf172e2936 | Merge "tempurl: Make the digest algorithm configurable" | |
| Tim Burke | 5a4d3bdfc4 | tempurl: Make the digest algorithm configurable, and add support for SHA-256 and SHA-512 by default. This allows us to start moving toward replacing SHA-1-based signatures. We've known this would eventually be necessary for a while [1], and earlier this year we've seen SHA-1 collisions [2]. Additionally, allow signatures to be base64-encoded, provided they start with a digest name followed by a colon. Trailing padding is optional for base64-encoded signatures, and both normal and "url-safe" modes are supported. For example, all of the following SHA-1 signatures are equivalent: da39a3ee5e6b4b0d3255bfef95601890afd80709 sha1:2jmj7l5rSw0yVb/vlWAYkK/YBwk= sha1:2jmj7l5rSw0yVb/vlWAYkK/YBwk sha1:2jmj7l5rSw0yVb_vlWAYkK_YBwk= sha1:2jmj7l5rSw0yVb_vlWAYkK_YBwk (Note that "normal" base64 encodings will require that you url-encode all "+" characters as "%2B" so they aren't misinterpreted as spaces.) This was done for two reasons: 1. A hex-encoded SHA-512 is rather lengthy at 128 characters -- 88 isn't *that* much better, but it's something. 2. This will allow us to more easily add support for different digests with the same bit length in the future. Base64-encoding is required for SHA-512 signatures; hex-encoding is supported for SHA-256 signatures so we aren't needlessly breaking from what Rackspace is doing. [1] https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html [2] https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html Change-Id: Ia9dd1a91cc3c9c946f5f029cdefc9e66bcf01046 Related-Bug: #1733634 | |
| Joel Wright | 11bf9e4588 | Add support for data segments to SLO and SegmentedIterable. This patch updates the SLO middleware and SegmentedIterable to add support for user-specified inlined-data segments. Such segments will contain base64-encoded data to be added before/after an object-backed segment within an SLO. To accommodate the potential extra data we increase the default SLO maximum manifest size from 2MiB to 8MiB. The default maximum number of segments remains 1000, but this will only be enforced for object-backed segments. This patch is a prerequisite for a future patch enabling the download of large objects as tarballs. The TLO patch will be added as a dependent patch later. UpgradeImpact: During a rolling upgrade, an updated proxy may write a manifest that out-of-date proxies will not be able to read. This will resolve itself once the upgrade completes on all nodes. Change-Id: Ib8dc216a84d370e6da7d6b819af79582b671d699 | |
| Zuul | 9ae5de09e8 | Merge "fix barbican integration" | |
| Zuul | 17eb570a6c | Merge "Improve object-updater's stats logging" | |
| Samuel Merritt | f64c00b00a | Improve object-updater's stats logging. The object updater has five different stats, but its logging only told you two of them (successes and failures), and it only told you after finishing all the async_pendings for a device. If you have a cluster that's been sick and has millions upon millions of async_pendings laying around, then your object-updaters are frustratingly silent. I've seen one cluster with around 8 million async_pendings per disk where the object-updaters only emitted stats every 12 hours. Yes, if you have StatsD logging set up properly, you can go look at your graphs and get real-time feedback on what it's doing. If you don't have that, all you get is a frustrating silence. Now, the object updater tells you all of its stats (successes, failures, quarantines due to bad pickles, unlinks, and errors), and it tells you incremental progress every five minutes. The logging at the end of a pass remains and has been expanded to also include all stats. Also included is a small change to what counts as an error: unmounted drives no longer do. The goal is that only abnormal things count as errors, like permission problems, malformed filenames, and so on. These are things that should never happen, but if they do, may require operator intervention. Drives fail, so logging an error upon encountering an unmounted drive is not useful. Change-Id: Idbddd507f0b633d14dffb7a9834fce93a10359ab | |
| Alistair Coles | 6e394bba0a | Add request_tries option to object-expirer.conf-sample, and update the object-expirer man page. Change-Id: Idca1b8e3b7d5b40481af0d60477510e2557b88c0 | |
| Thiago da Silva | a9964a7fc3 | fix barbican integration. Added auth_url to the context we pass to the castellan library. A change [1] intended to deprecate the use of auth_endpoint passed via oslo config actually removed its use entirely [2], so this change became necessary to keep the integration from breaking. [1] https://review.openstack.org/#/c/483457 [2] https://review.openstack.org/#/c/483457/6/castellan/key_manager/barbican_key_manager.py@143 Change-Id: I933367fa46aa0a3dc9aedf078b1be715bfa8c054 | |
| Zuul | 61c770a6d9 | Merge "add symlink to container sync default and sample config" | |
| Clay Gerrard | 90134ee968 | add symlink to container sync default and sample config. Change-Id: I83e51daa994b9527eccbb8a88952c630aacd39c3 | |
| Alistair Coles | 8df263184b | Symlink doc clean up. Cleanup for docs and docstrings. Related-Change: I838ed71bacb3e33916db8dd42c7880d5bb9f8e18 Change-Id: Ie8de0565dfaca5bd8a5693a75e6ee14ded5b7161 | |
| Robert Francis | 99b89aea10 | Symlink implementation. Add symbolic link ("symlink") object support to Swift. A symlink object references another object. GET and HEAD requests for a symlink object will operate on the referenced object. DELETE and PUT requests for a symlink object will operate on the symlink object, not the referenced object, and will delete or overwrite it, respectively. POST requests are *not* forwarded to the referenced object and should be sent directly; POST requests sent to a symlink object will result in a 307 Error. Historical information on the symlink design can be found here: https://github.com/openstack/swift-specs/blob/master/specs/in_progress/symlinks.rst https://etherpad.openstack.org/p/swift_symlinks Co-Authored-By: Thiago da Silva <thiago@redhat.com> Co-Authored-By: Janie Richling <jrichli@us.ibm.com> Co-Authored-By: Kazuhiro MIYAHARA <miyahara.kazuhiro@lab.ntt.co.jp> Co-Authored-By: Kota Tsuyuzaki <tsuyuzaki.kota@lab.ntt.co.jp> Change-Id: I838ed71bacb3e33916db8dd42c7880d5bb9f8e18 Signed-off-by: Thiago da Silva <thiago@redhat.com> | |
| Thiago da Silva | b86bf15a64 | remove left-over old doc in config file. Removing the rest of the fast-post documentation from the proxy configuration file. Change-Id: I6108be20606f256a020a8583878e2302e3c98ff1 | |
| Zuul | 2596b3ca9d | Merge "Let clients request heartbeats during SLO PUTs" | |
| Tim Burke | 77a8a4455d | Let clients request heartbeats during SLO PUTs. An SLO PUT requires that we HEAD every referenced object; as a result, it can be a very time-intensive operation. This makes it difficult as a client to differentiate between a proxy-server that's still doing work and one that's crashed but left the socket open. Now, clients can opt in to receiving heartbeats during long-running PUTs by including the query parameter heartbeat=on With heartbeating turned on, the proxy will start its response immediately with 202 Accepted then send a single whitespace character periodically until the request completes. At that point, a final summary chunk will be sent which includes a "Response Status" key indicating success or failure and (if successful) an "Etag" key indicating the Etag of the resulting SLO. This mechanism is very similar to the way bulk extractions and deletions work, and even the way SLO behaves for ?multipart-manifest=delete requests. Note that this is opt-in: this prevents us from sending the 202 response to existing clients that may misinterpret it as an immediate indication of success. Co-Authored-By: Alistair Coles <alistairncoles@gmail.com> Related-Bug: 1718811 Change-Id: I65cee5f629c87364e188aa05a06d563c3849c8f3 | |
| Romain LE DISEZ | e199192cae | Replace replication_one_per_device by custom count. This commit replaces the boolean replication_one_per_device with an integer replication_concurrency_per_device. The new configuration parameter is passed to utils.lock_path(), which now accepts as an argument a limit for the number of locks that can be acquired for a specific path. Instead of trying to lock path/.lock, utils.lock_path() now tries to lock files path/.lock-X, where X is in the range (0, N), N being the limit for the number of locks allowed for the path. The default value of limit is set to 1. Change-Id: I3c3193344c7a57a8a4fc7932d1b10e702efd3572 | |
| Clay Gerrard | 45ca39fc68 | add mangle_client_paths to example config. Change-Id: Ic1126fc95e8152025fccf25356c253facce3e3ec | |
| Jenkins | 94241a81c0 | Merge "Improve domain_remap docs" | |
| Jenkins | ec3fffabfa | Merge "Add account_autocreate=true to internal-client.conf-sample" | |
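
Several of the commits above introduce operator-facing configuration. The sketches below illustrate how those options might be set; they are minimal examples based only on the commit messages above, and section placement and default values are assumptions unless the commit states them. For the `Add databases_per_second to db daemons` change, a per-daemon rate limit might look like this (the option name comes from the commit title, the value of 50 and the section placement are assumptions):

```ini
# container-server.conf; account-server.conf would carry the analogous
# [account-replicator] setting
[container-replicator]
databases_per_second = 50

[container-sharder]
databases_per_second = 50
```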
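For `Add fallocate_reserve to account and container servers`, a minimal sketch assuming the option lives in the [DEFAULT] section as it does for the object server, using the 1% default mentioned in the commit:

```ini
# account-server.conf and container-server.conf
[DEFAULT]
# once a disk's free space falls below this reserve, PUT, POST and
# REPLICATE requests fail with 507
fallocate_reserve = 1%
```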
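For `Adding keep_idle config value to socket`, the commit names the option and gives the previous hard-coded value of 600 seconds as the default; the section placement is an assumption:

```ini
[DEFAULT]
# TCP_KEEPIDLE for server sockets, in seconds
keep_idle = 600
```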
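For `s3api: Allow concurrent multi-deletes`, the commit sets the default concurrency to 2; the option name below is a hypothetical one modeled on other s3api settings, not confirmed by the commit message:

```ini
# proxy-server.conf
[filter:s3api]
# hypothetical option name: objects deleted in parallel per multi-delete request
multi_delete_concurrency = 2
```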
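For `Multiprocess object replicator`, the commit names the replicator_workers setting; treating 0 as "keep the old single-process behaviour" is an assumption:

```ini
# object-server.conf
[object-replicator]
# spawn up to this many worker processes, capped at one worker per disk
replicator_workers = 4
```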
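For `Add handoffs-only mode to DB replicators`, a sketch assuming the option mirrors the object reconstructor's handoffs_only flag; like the reconstructor mode, it would be intended as a temporary setting during rapid rebalancing:

```ini
[account-replicator]
handoffs_only = yes

[container-replicator]
handoffs_only = yes
```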
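For `tempurl: Make the digest algorithm configurable`, a sketch assuming the option is named allowed_digests; the commit adds SHA-256 and SHA-512 support alongside SHA-1:

```ini
# proxy-server.conf
[filter:tempurl]
# digests accepted for tempurl signatures (assumed option name)
allowed_digests = sha1 sha256 sha512
```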
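For `Add support for multiple root encryption secrets`, a sketch of a keymaster section carrying an old and a new root secret; the suffixed option name and active_root_secret_id are assumptions about how the secret_id indexing described in the commit surfaces in configuration:

```ini
# proxy-server.conf
[filter:keymaster]
# existing secret, still used to decrypt data written before the rotation
encryption_root_secret = <old base64 secret>
# new secret, identified by the suffix "1", used for all new writes (assumed syntax)
encryption_root_secret_1 = <new base64 secret>
active_root_secret_id = 1
```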