7015ac2fdc34bbb46b6be6084fe6a5bd41acc74c
547 Commits
**Zuul** · 7015ac2fdc · Merge "py3: Work with proper native string paths in crypto meta"

**Zuul** · a4f2252e2b · Merge "proxy-logging: Be able to configure log_route"

**Zuul** · 50800aba37 · Merge "Update SAIO & docker image to use 62xx ports"

**Tim Burke** · 7d429318dd · py3: Work with proper native string paths in crypto meta

Previously, we would work with these paths as WSGI strings -- this would work fine when all data were read and written on the same major version of Python, but fail pretty badly during and after upgrading Python. In particular, if a py3 proxy-server tried to read existing data that was written down by a py2 proxy-server, it would hit an error and respond 500. Worse, if an un-upgraded py2 proxy tried to read data that was freshly-written by a py3 proxy, it would serve corrupt data back to the client (including a corrupt/invalid ETag and Content-Type).

Now, ensure that both py2 and py3 write down paths as native strings. Make an effort to still work with WSGI-string metadata, though it can be ambiguous as to whether a string is a WSGI string or not. The heuristic used is: if

* the path from metadata does not match the (native-string) request path, and
* the path from metadata (when interpreted as a WSGI string) can be "un-wsgi-fied" without any encode/decode errors, and
* the native-string path from metadata *does* match the native-string request path,

then trust the path from the request. By contrast, we usually prefer the path from metadata in case there was a pipeline misconfiguration (see related bug).

Add the ability to read and write a new, unambiguous version of metadata that always has the path as a native string. To support rolling upgrades, a new config option is added: meta_version_to_write. This defaults to 2 to support rolling upgrades without configuration changes, but the default may change to 3 in a future release.

UpgradeImpact
=============

When upgrading from Swift 2.20.0 or Swift 2.19.1 or earlier, set meta_version_to_write = 1 in your keymaster's configuration.

Regardless of prior Swift version, set meta_version_to_write = 3 after upgrading all proxy servers.

When switching from Python 2 to Python 3, first upgrade Swift while on Python 2, then upgrade to Python 3.

Change-Id: I00c6693c42c1a0220b64d8016d380d5985339658
Closes-Bug: #1888037
Related-Bug: #1813725

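A minimal keymaster sketch for the meta_version_to_write rollout described above; the filter section name and the root-secret value are illustrative, while the option itself is the one this commit introduces:

```ini
# proxy-server.conf
[filter:keymaster]
use = egg:swift#keymaster
encryption_root_secret = <base64-encoded 32-byte secret>
# 1 while any proxy still runs Swift <= 2.20.0 / 2.19.1,
# 2 (the default) during a rolling upgrade,
# 3 once every proxy server has been upgraded
meta_version_to_write = 2
```
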
**Tim Burke** · 9eb81f6e69 · Allow replication servers to handle all request methods

Previously, the replication_server setting could take one of three states:

* If unspecified, the server would handle all available methods.
* If "true", "yes", "on", etc., it would only handle replication methods (REPLICATE, SSYNC).
* If any other value (including blank), it would only handle non-replication methods.

However, because SSYNC tunnels PUTs, POSTs, and DELETEs through the same object-server app that's responding to SSYNC, setting `replication_server = true` would break the protocol. This has been the case ever since ssync was introduced.

Now, get rid of that second state -- operators can still set `replication_server = false` as a principle-of-least-privilege guard to ensure proxy-servers can't make replication requests, but replication servers will be able to serve all traffic. This will allow replication servers to be used as general internal-to-the-cluster endpoints, leaving non-replication servers to handle client-driven traffic.

Closes-Bug: #1446873
Change-Id: Ica2b41a52d11cb10c94fa8ad780a201318c4fc87

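A sketch of the one state that remains useful, assuming a split between client-facing and replication object servers (port and layout illustrative):

```ini
# object-server.conf on client-facing object servers only;
# dedicated replication servers simply omit the option and serve everything
[DEFAULT]
bind_port = 6200
# least-privilege guard: refuse REPLICATE/SSYNC here so only the
# replication servers accept replication traffic
replication_server = false
```
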
**Tim Burke** · 314347a3cb · Update SAIO & docker image to use 62xx ports

Note that existing SAIOs with 60xx ports should still work fine.

Change-Id: If5dd79f926fa51a58b3a732b212b484a7e9f00db
Related-Change: Ie1c778b159792c8e259e2a54cb86051686ac9d18

**Tim Burke** · 2ffe598f48 · proxy-logging: Be able to configure log_route

This lets you have separate loggers for the left and right proxy-logging middlewares, so you can have a config like

```ini
[pipeline:main]
pipeline = ... proxy-logging-client ... proxy-logging-subrequest proxy-server

[proxy-logging-client]
use = egg:swift#proxy_logging
access_log_statsd_metric_prefix = client-facing

[proxy-logging-subrequest]
use = egg:swift#proxy_logging
access_log_route = subrequest
access_log_statsd_metric_prefix = subrequest
```

to isolate subrequest metrics from client-facing metrics.

Change-Id: If41e3d542b30747da7ca289708e9d24873c46e2e

**Tim Burke** · 1db11df4f2 · ratelimit: Allow multiple placements

We usually want to have ratelimit fairly far left in the pipeline -- the assumption is that something like an auth check will be fairly expensive, and we should try to shield the auth system so it doesn't melt under the load of a misbehaved swift client. But with S3 requests, we can't know the account/container that a request is destined for until *after* auth.

Fortunately, we've already got some code to make s3api play well with ratelimit. So, let's have our cake and eat it, too: allow operators to place ratelimit once, before auth, for swift requests and again, after auth, for s3api. They'll both use the same memcached keys (so users can't switch APIs to effectively double their limit), but still only have each S3 request counted against the limit once.

Change-Id: If003bb43f39427fe47a0f5a01dbcc19e1b3b67ef

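A hedged pipeline sketch of the double placement; the surrounding middleware names and ordering are illustrative, and both placements reference the same filter section:

```ini
[pipeline:main]
# first ratelimit: counts plain swift requests before auth;
# second ratelimit: counts s3api requests, whose account/container
# are only known after auth
pipeline = catch_errors proxy-logging cache ratelimit s3api s3token authtoken keystoneauth ratelimit proxy-logging proxy-server

[filter:ratelimit]
use = egg:swift#ratelimit
account_ratelimit = 10
```
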
**Clay Gerrard** · e264ca88e2 · Recommend better rsync.conf settings

Swift doesn't recommend any rsync hostname allow/deny rules for inside your cluster network, and I've never heard of anyone using them. The reverse lookup on connect (even for connections denied for max connections) can be overwhelming during a rebalance. Since rsync allows explicit control of this behavior after 3.1, we should suggest operators use it; it's also nominally more efficient in all cases.

A possible drawback is that in the future a Swift operator may have good reason to use host allow/deny rules and not realize the rsync settings we recommend are mutually exclusive with their customizations.

Change-Id: I2fdffdf1cc0a77f994c1d7894d5a1c8e5d951755

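A sketch of the rsyncd.conf setting in question, assuming rsync >= 3.1 ("reverse lookup" is a real rsyncd.conf global; the module shown is illustrative). Note this is exactly the setting that conflicts with hostname-based hosts allow/deny rules:

```ini
# /etc/rsyncd.conf
# skip the PTR lookup performed on every incoming connection,
# including connections that will be denied for max connections
reverse lookup = no

[object]
path = /srv/node
read only = false
max connections = 8
```
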
**John Dickinson** · d358b9130d · added value and notes to a sample config file for s3token

Change-Id: I18accffb2cf6ba6a3fff6fd5d95f06a424d1d919

**Romain LE DISEZ** · 27fd97cef9 · Middleware that allows a user to have quoted Etags

Users have complained for a while that Swift's ETags don't match the expected RFC formats. We've resisted fixing this for just as long, worrying that the fix would break innumerable clients that expect the value to be a hex-encoded MD5 digest and *nothing else*.

But users keep asking for it, and some consumers (including some CDNs) break if we *don't* have quoted etags -- so, let's make it an option. With this middleware, Swift users can set metadata per-account or even per-container to explicitly request RFC-compliant etags or not. Swift operators also get an option to change the default behavior cluster-wide; it defaults to the old, non-compliant format.

See also:

- https://tools.ietf.org/html/rfc2616#section-3.11
- https://tools.ietf.org/html/rfc7232#section-2.3

Closes-Bug: 1099087
Closes-Bug: 1424614
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: I380c6e34949d857158e11eb428b3eda9975d855d

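A minimal sketch of enabling the middleware, assuming the etag-quoter entry point and cluster-wide default option that shipped with this feature (verify against your release's sample config):

```ini
[pipeline:main]
pipeline = ... etag-quoter proxy-server

[filter:etag-quoter]
use = egg:swift#etag_quoter
# cluster-wide default: keep the old, non-RFC-compliant format;
# accounts and containers can opt in via metadata
enable_by_default = false
```
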
**Clay Gerrard** · 2759d5d51c · New Object Versioning mode

This patch adds a new object versioning mode. This new mode provides a new set of APIs for users to interact with older versions of an object. It also changes the naming scheme of older versions and adds a version-id to each object.

This new mode is not backwards compatible or interchangeable with the other two modes (i.e., stack and history), especially due to the changes in the naming scheme of older versions. This new mode will also serve as a foundation for adding S3 versioning compatibility in the s3api middleware.

Note that this does not (yet) support using a versioned container as a source in container-sync. Container sync should be enhanced to sync previous versions of objects.

Change-Id: Ic7d39ba425ca324eeb4543a2ce8d03428e2225a1
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Co-Authored-By: Thiago da Silva <thiagodasilva@gmail.com>

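A sketch of switching a cluster to the new mode, assuming the allow_object_versioning switch on the versioned_writes filter (an assumption based on how this feature surfaced; check your release's sample config):

```ini
[filter:versioned_writes]
use = egg:swift#versioned_writes
# legacy stack/history modes
allow_versioned_writes = true
# the new version-id based mode added by this patch
allow_object_versioning = true
```
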
**Clay Gerrard** · 4601548dab · Deprecate per-service auto_create_account_prefix

If we move it to constraints it's more globally accessible in our code, but, more importantly, it's more obvious to ops that everything breaks if you try to mis-configure different values per-service.

Change-Id: Ib8f7d08bc48da12be5671abe91a17ae2b49ecfee

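A sketch of the cluster-wide home this deprecation points to; the value shown is Swift's historical default prefix:

```ini
# swift.conf
[swift-constraints]
# one cluster-wide value replaces the per-service
# auto_create_account_prefix options
auto_create_account_prefix = .
```
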
**Zuul** · 7d97e9e519 · Merge "Add option for debug query logging"

**Clay Gerrard** · 698717d886 · Allow internal clients to use reserved namespace

Reserve the namespace starting with the NULL byte for internal use-cases. Backend services will allow path names to include the NULL byte in urls and validate names in the reserved namespace. Database services will filter all names starting with the NULL byte from responses unless the request includes the header `X-Backend-Allow-Reserved-Names: true`.

The proxy server will not allow path names to include the NULL byte in urls unless a middleware has set the X-Backend-Allow-Reserved-Names header. Middlewares can use the reserved namespace to create objects and containers that cannot be directly manipulated by clients. Any objects and bytes created in the reserved namespace will be aggregated to the user's account totals.

When deploying internal proxies, developers and operators may configure the gatekeeper middleware to translate the X-Allow-Reserved-Names header to the Backend header so they can manipulate the reserved namespace directly through the normal API.

UpgradeImpact: it's not safe to roll back from this change.

Change-Id: If912f71d8b0d03369680374e8233da85d8d38f85

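A sketch of the internal-proxy gatekeeper setup described above; the option name here is an assumption based on the commit's description, so verify it against your release's gatekeeper middleware:

```ini
# proxy-server.conf on an internal-only proxy
[filter:gatekeeper]
use = egg:swift#gatekeeper
# assumed option: translate a client-supplied X-Allow-Reserved-Names
# header into X-Backend-Allow-Reserved-Names
allow_reserved_names_header = true
```
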
**Zuul** · cf33b3dac7 · Merge "proxy: stop sending chunks to objects with a Queue"

**Romain LE DISEZ** · 2f1111a436 · proxy: stop sending chunks to objects with a Queue

During a PUT of an object, the proxy instantiates one Putter per object-server that will store data (either the full object or a fragment, depending on the storage policy). Each Putter owns a Queue that is used to buffer data chunks before they are written to the socket connected to the object-server. The chunks are moved from the queue to the socket by a greenthread; there is one greenthread per Putter. If the client is uploading faster than the object-servers can manage, the Queue could grow and consume a lot of memory. To avoid that, the queue is bounded (default: 10). Having a bounded queue also ensures that all object-servers will get the data at the same rate, because if one queue is full, the greenthread reading from the client socket will block when trying to write to the queue. So the global rate is that of the slowest object-server.

The thing is, every operating system manages socket buffers for incoming and outgoing data. Concerning the send buffer, the behavior is such that if the buffer is full, a call to write() will block; otherwise the call will return immediately. It behaves a lot like the Putter's Queue, except that the size of the buffer is dynamic, so it adapts itself to the speed of the receiver. Thus, managing a queue in addition to the socket send buffer is duplicate queueing/buffering that provides no benefit but is, as shown by profiling and benchmarks, very costly in CPU.

This patch removes the queueing mechanism. Instead, the greenthread reading data from the client will write directly to the socket. If an object-server is getting slow, the buffer will fill up, blocking the reader greenthread. Benchmarks show a CPU consumption reduction of more than 30%, while the observed rate for an upload increases by about 45%.

Change-Id: Icf8f800cb25096f93d3faa1e6ec091eb29500758

**Clay Gerrard** · e7cd8df5e9 · Add option for debug query logging

Change-Id: Ic16b505a37748f50dc155212671efb45e2c5051f

**Tim Burke** · 405a2b2a55 · py3: Fix swift-drive-audit

Walking through the kernel logs backwards requires that we open them in binary mode. Add a new option to allow users to specify which encoding should be used to interpret those logs; default to the same encoding that open() uses by default.

Change-Id: Iae332bb58388b5521445e75beba6ee2e9f06bfa6
Closes-Bug: #1847955

**Zuul** · cf18e1f47b · Merge "sharding: Cache shard ranges for object writes"

**Tim Burke** · a1af3811a7 · sharding: Cache shard ranges for object writes

Previously, we issued a GET to the root container for every object PUT, POST, and DELETE. This puts load on the container server, potentially leading to timeouts, error limiting, and erroneous 404s (!).

Now, cache the complete set of 'updating' shards, and find the shard for this particular update in the proxy. Add a new config option, recheck_updating_shard_ranges, to control the cache time; it defaults to one hour. Set to 0 to fall back to previous behavior.

Note that we should be able to tolerate stale shard data just fine; we already have to worry about async pendings that got written down with one shard but may not get processed until that shard has itself sharded or shrunk into another shard.

Also note that memcache has a default value limit of 1MiB, which may be exceeded if a container has thousands of shards. In that case, set() will act like a delete(), causing increased memcache churn but otherwise preserving existing behavior. In the future, we may want to add support for gzipping the cached shard ranges, as they should compress well.

Change-Id: Ic7a732146ea19a47669114ad5dbee0bacbe66919
Closes-Bug: 1781291

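A sketch of the new cache knob, which sits with the proxy's other recheck_* settings (the value shown is the one-hour default from the commit):

```ini
# proxy-server.conf
[app:proxy-server]
use = egg:swift#proxy
# how long to cache the root container's 'updating' shard ranges
# for object PUT/POST/DELETE; 0 falls back to a GET per update
recheck_updating_shard_ranges = 3600
```
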
**zengjia** · 0ae1ad63c1 · Update auth_url in install docs

Beginning with the Queens release, the keystone install guide recommends running all interfaces on the same port. This patch updates the swift install guide to reflect that change.

Change-Id: Id00cfd2c921da352abdbbbb6668b921f3cb31a1a
Closes-Bug: #1754104

**Zuul** · e62f07d988 · Merge "py3: port staticweb and domain_remap func tests"

**Zuul** · 9367bff8fc · Merge "py3: add swift-dsvm-functional-py3 job"

**Tim Burke** · 9d1b749740 · py3: port staticweb and domain_remap func tests

Drive-by: Tighten domain_remap assertions on listings, which required that we fix proxy pipeline placement. Add a note about it to the sample config.

Change-Id: I41835148051294088a2c0fb4ed4e7a7b61273e5f

**Tim Burke** · 345f577ff1 · s3token: fix conf option name

Related-Change: Ica740c28b47aa3f3b38dbfed4a7f5662ec46c2c4
Change-Id: I71f411a2e99fa8259b86f11ed29d1b816ff469cb

**Tim Burke** · 4f7c44a9d7 · Add information about secret_cache_duration to sample config

Related-Change-Id: Id0c01da6aa6ca804c8f49a307b5171b87ec92228
Change-Id: Ica740c28b47aa3f3b38dbfed4a7f5662ec46c2c4

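A sketch of the option being documented, assuming the s3token middleware from the related change (endpoint illustrative; the caching semantics are an assumption based on the option's name):

```ini
[filter:s3token]
use = egg:swift#s3token
auth_uri = http://keystonehost:5000/v3
# cache validated Keystone credentials in memcache for this many
# seconds; 0 disables caching
secret_cache_duration = 0
```
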
**Tim Burke** · 39a54fecdc · py3: add swift-dsvm-functional-py3 job

Note that keystone wants to stick some UTF-8 encoded bytes into memcached, but we want to store it as JSON... or something? Also, make sure we can hit memcache for containers with invalid UTF-8. Although maybe it'd be better to catch that before we ever try memcache?

Change-Id: I1fbe133c8ec73ef6644ecfcbb1931ddef94e0400

**Clay Gerrard** · 34bd4f7fa3 · Clarify usage of dequeue_from_legacy option

Change-Id: Iae9aa7a91b9afc19cb8613b5bc31de463b853dde

**Kazuhiro MIYAHARA** · 443f029a58 · Enable to configure object-expirer in object-server.conf

To prepare for the object-expirer's general task queue feature [1], this patch makes the object-expirer configurable in object-server.conf. Object-expirer.conf can still be used in the same manner as before, but is deprecated. If a node has both an object-server.conf with an [object-expirer] section and an object-expirer.conf, only object-server.conf is used; object-expirer.conf is used only if no object-server.conf has an [object-expirer] section.

There are two differences between the "object-expirer.conf" style and the "object-server.conf" style. The first is the `dequeue_from_legacy` default value. `dequeue_from_legacy` defines the task queue mode. In the "object-expirer.conf" style, the default mode is the legacy queue; in the "object-server.conf" style, the default mode is the general queue. But general mode means no-op mode for now, because the general task queue is not implemented yet.

The second difference is the internal client config. In the "object-expirer.conf" style, the config file of the internal client is the object-expirer.conf itself. In the "object-server.conf" style, the config file of the internal client is a separate file.

[1]: https://review.openstack.org/#/c/517389/

Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
Change-Id: Ib21568f9b9d8547da87a99d65ae73a550e9c3230

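A sketch of the new "object-server.conf" style, with the internal-client config path as an illustrative assumption:

```ini
# object-server.conf
[object-expirer]
# keep consuming the legacy expirer queue until the general
# task queue feature exists
dequeue_from_legacy = true
# in this style the internal client gets its own config file
internal_client_conf_path = /etc/swift/internal-client.conf
```
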
**Gilles Biannic** · a4cc353375 · Make log format for requests configurable

Add the log_msg_template option in proxy-server.conf and log_format in a/c/o-server.conf. It is a string parsable by Python's format() function. Some fields containing user data might be anonymized by using log_anonymization_method and log_anonymization_salt.

Change-Id: I29e30ef45fe3f8a026e7897127ffae08a6a80cd9

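A hedged sketch of the proxy-side template; the replacement fields shown are typical of proxy-logging's documented set, but treat the exact field list as an assumption:

```ini
# proxy-server.conf
[filter:proxy-logging]
use = egg:swift#proxy_logging
log_msg_template = {client_ip} {remote_addr} {method} {path} {protocol} {status_int} {request_time}
# hash fields that carry user data before they hit the log
log_anonymization_method = MD5
log_anonymization_salt = mysalt
```
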
**Tim Burke** · d748851766 · s3token: Add note about config change when upgrading from swift3

Change-Id: I2610cbdc9b7bc2b4d614eaedb4f3369d7a424ab3

**Clay Gerrard** · ea8e545a27 · Rebuild frags for unmounted disks

Change the behavior of the EC reconstructor to perform a fragment rebuild to a handoff node when a primary peer responds with 507 to the REPLICATE request. Each primary node in an EC ring will sync with exactly three primary peers; in addition to the left and right nodes, we now select a third node from the far side of the ring. If any of these partners respond unmounted, the reconstructor will rebuild its fragments to a handoff node with the appropriate index.

To prevent ssync (which is uninterruptible) from receiving a 409 (Conflict), we must give the remote handoff node the correct backend_index for the fragments it will receive. In the common case we will use deterministically different handoffs for each fragment index, to prevent multiple unmounted primary disks from forcing a single handoff node to hold more than one rebuilt fragment.

Handoff nodes will continue to attempt to revert rebuilt handoff fragments to the appropriate primary until it is remounted or rebalanced. After a rebalance of EC rings (potentially removing unmounted/failed devices), it's most IO-efficient to run in handoffs_only mode to avoid unnecessary rebuilds.

Closes-Bug: #1510342
Change-Id: Ief44ed39d97f65e4270bf73051da9a2dd0ddbaec

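A sketch of the post-rebalance mode mentioned above; handoffs_only is an existing reconstructor option, and it should be switched back off once handoffs drain:

```ini
# object-server.conf, temporarily, after rebalancing EC rings
[object-reconstructor]
# only revert handoff fragments to their primaries; skip
# rebuild work until the handoffs are empty
handoffs_only = true
```
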
**Zuul** · 3043c54f28 · Merge "s3api: Allow concurrent multi-deletes"

**Tim Burke** · 00be3f595e · s3api: Allow concurrent multi-deletes

Previously, a thousand-item multi-delete request would consider each object to delete serially, and not start trying to delete one until the previous was deleted (or hit an error). Now, allow operators to configure a concurrency factor to allow multiple deletes at the same time. Default the concurrency to 2, like we did for slo and bulk.

See also: http://lists.openstack.org/pipermail/openstack-dev/2016-May/095737.html

Change-Id: If235931635094b7251e147d79c8b7daa10cdcb3d
Related-Change: I128374d74a4cef7a479b221fd15eec785cc4694a

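A sketch of the new knob, using the default the commit describes (the option name is the one this change introduced):

```ini
[filter:s3api]
use = egg:swift#s3api
# number of object deletions processed in parallel within a
# single multi-delete request
multi_delete_concurrency = 2
```
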
**Tim Burke** · 692a03473f · s3api: Change default location to us-east-1

This is more likely to be the default region that a client would try for v4 signatures.

UpgradeImpact
=============

Deployers with clusters that relied on the old implicit default location of US should explicitly set `location = US` in the [filter:s3api] section of proxy-server.conf before upgrading.

Change-Id: Ib6659a7ad2bd58d711002125e7820f6e86383be8

**Clay Gerrard** · 06cf5d298f · Add databases_per_second to db daemons

Most daemons have a "go as fast as you can then sleep for 30 seconds" strategy towards resource utilization; the object-updater and object-auditor, however, have some "X_per_second" options that allow operators much better control over how they spend their I/O budget. This change extends that pattern into the account-replicator, container-replicator, and container-sharder, which have been known to peg CPUs when they're not IO-limited.

Partial-Bug: #1784753
Change-Id: Ib7f2497794fa2f384a1a6ab500b657c624426384

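A sketch of the new knob on one of the daemons named above (the value shown is an assumption patterned on the other *_per_second options):

```ini
# container-server.conf
[container-replicator]
# cap how many databases are processed per second so replication
# spreads its I/O and CPU budget over time
databases_per_second = 50
```
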
**Zuul** · 5cc4a72c76 · Merge "Configure diskfile per storage policy"

**Alistair Coles** · 904e7c97f1 · Add more doc and test for cors_expose_headers option

In follow-up to the related change, mention the new cors_expose_headers option (and other proxy-server.conf options) in the CORS doc. Add a test for the cors options being loaded into the proxy server. Improve CORS comments in docs.

Change-Id: I647d8f9e9cbd98de05443638628414b1e87d1a76
Related-Change: I5ca90a052f27c98a514a96ee2299bfa1b6d46334

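A sketch of the option under test, assuming it sits with the other CORS settings in proxy-server.conf (header names illustrative):

```ini
# proxy-server.conf
[DEFAULT]
cors_allow_origin = https://dashboard.example.com
# extra headers to list in Access-Control-Expose-Headers
cors_expose_headers = X-Object-Meta-Color, X-Trans-Id-Extra
```
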
**Zuul** · 5d46c0d8b3 · Merge "Adding keep_idle config value to socket"

**FatemaKhalid** · cfeb32c66b · Adding keep_idle config value to socket

Users can configure the KEEPIDLE time for sockets in TCP connections. The default value is the old value, which is 600.

Change-Id: Ib7fb166deb8a87ae4e97ba0671048b1ec079a2ef
Closes-Bug: #1759606

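A sketch, assuming keep_idle joins the other socket options in a server's [DEFAULT] section:

```ini
[DEFAULT]
# seconds a TCP connection may sit idle before keepalive probes
# start (TCP_KEEPIDLE); 600 preserves the old hard-coded value
keep_idle = 600
```
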
**Tim Burke** · 5a8cfd6e06 · Add another user for s3api func tests

Previously we'd use two users, one admin and one unprivileged. Ceph's s3-tests, however, assume that both users should have access to create buckets. Further, there are different errors that may be returned depending on whether you are the *bucket* owner or not when using s3_acl.

So now we've got:

- test:tester1 (admin)
- test:tester2 (also admin)
- test:tester3 (unprivileged)

Change-Id: I0b67c53de3bcadc2c656d86131fca5f2c3114f14

**Romain LE DISEZ** · 673fda7620 · Configure diskfile per storage policy

With this commit, each storage policy can define the diskfile to use to access objects. Selection of the diskfile is done in swift.conf. Example:

```ini
[storage-policy:0]
name = gold
policy_type = replication
default = yes
diskfile = egg:swift#replication.fs
```

The diskfile configuration item accepts the same format as middleware declarations:

    [[scheme:]egg_name#]entry_point

The egg_name is optional and defaults to "swift". The scheme is optional and defaults to the only valid value, "egg". The upstream entry points are "replication.fs" and "erasure_coding.fs".

Co-Authored-By: Alexandre Lécuyer <alexandre.lecuyer@corp.ovh.com>
Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
Change-Id: I070c21bc1eaf1c71ac0652cec9e813cadcc14851

**Alistair Coles** · 2722e49a8c · Add support for multiple root encryption secrets

For some use cases, operators would like to periodically introduce a new encryption root secret that would be used when new object data is written. However, existing encrypted data does not need to be re-encrypted with keys derived from the new root secret. Older root secret(s) would still be used as necessary to decrypt older object data.

This patch modifies the KeyMaster class to support multiple root secrets indexed via unique secret_id's, and to store the id of the root secret used for an encryption operation in the crypto meta. The decrypter is modified to fetch appropriate keys based on the secret id in retrieved crypto meta.

The changes are backwards compatible with previous crypto middleware configurations and existing encrypted object data.

Change-Id: I40307acf39b6c1cc9921f711a8da55d03924d232

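A sketch of a keymaster carrying two root secrets, following the secret_id-indexed scheme this patch describes (the active_root_secret_id selector is an assumption based on later sample configs; secret values illustrative):

```ini
[filter:keymaster]
use = egg:swift#keymaster
# legacy secret, still available to decrypt older data
encryption_root_secret = <base64 legacy secret>
# newer secret, registered under secret_id "2"
encryption_root_secret_2 = <base64 new secret>
# assumed selector: use the newer secret for all new writes
active_root_secret_id = 2
```
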
**Zuul** · 00373dad61 · Merge "Add keymaster to fetch root secret from KMIP service"

**Samuel Merritt** · 8e651a2d3d · Add fallocate_reserve to account and container servers.

The object server can be configured to leave a certain amount of disk space free; the default is 1%. This is useful in avoiding 100%-full filesystems, as those can get Swift in a state where the filesystem is too full to write tombstones, so you can't delete objects to free up space.

When a cluster has accounts/containers and objects on the same disks, you can wind up with a 100%-full disk since account and container servers don't respect fallocate_reserve. This commit makes account and container servers respect fallocate_reserve so that disks shared between account/container and object rings won't get 100% full.

When a disk's free space falls below the configured reserve, account and container PUT, POST, and REPLICATE requests will fail with a 507 status code. These are the operations that can significantly increase the disk space used by a given database.

I called the parameter "fallocate_reserve" for consistency with the object server. No actual fallocate() call happens under Swift's control in the account or container servers (sqlite3 might make such a call, but it's out of our hands).

Change-Id: I083442eef14bf83c0ea717b1decb3e6b56dbf1d0

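A sketch of the setting on the account/container side, mirroring the object server's existing option and its 1% default:

```ini
# account-server.conf (and likewise container-server.conf)
[DEFAULT]
# fail PUT/POST/REPLICATE with 507 once a disk's free space falls
# below the reserve; accepts bytes or a percentage
fallocate_reserve = 1%
```
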
**Alistair Coles** · 1951dc7e9a · Add keymaster to fetch root secret from KMIP service

Add a new middleware that can be used to fetch an encryption root secret from a KMIP service. The middleware uses a PyKMIP client to interact with a KMIP endpoint. The middleware is configured with a unique identifier for the key to be fetched and options required for the PyKMIP client.

Co-Authored-By: Tim Burke <tim.burke@gmail.com>
Change-Id: Ib0943fb934b347060fc66c091673a33bcfac0a6d

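A hedged sketch of a KMIP keymaster section; the connection options are assumptions patterned on PyKMIP client settings, so check the middleware's sample config before use:

```ini
[filter:kmip_keymaster]
use = egg:swift#kmip_keymaster
# unique identifier of the root-secret key on the KMIP service
key_id = 1234-5678-abcd
host = kmip.example.com
port = 5696
certfile = /etc/swift/kmip_client.crt
keyfile = /etc/swift/kmip_client.key
ca_certs = /etc/swift/kmip_server_ca.crt
```
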
**Zuul** · ea33638d0c · Merge "object-updater: add concurrent updates"

**Samuel Merritt** · d5c532a94e · object-updater: add concurrent updates

The object updater now supports two configuration settings: "concurrency" and "updater_workers". The latter controls how many worker processes are spawned, while the former controls how many concurrent container updates are performed by each worker process. This should speed the processing of async_pendings.

There is a change to the semantics of the configuration options. Previously, "concurrency" controlled the number of worker processes spawned, and "updater_workers" did not exist. I switched the meanings for consistency with other configuration options. In the object reconstructor, object replicator, object server, object expirer, container replicator, container server, account replicator, account server, and account reaper, "concurrency" refers to the number of concurrent tasks performed within one process (for reference, the container updater and object auditor use "concurrency" to mean number of processes).

On upgrade, a node configured with concurrency=N will still handle async updates N-at-a-time, but will do so using only one process instead of N.

UpgradeImpact
=============

If you have a config file like this:

```ini
[object-updater]
concurrency = <N>
```

and you want to take advantage of faster updates, then do this:

```ini
[object-updater]
concurrency = 8  # the default; you can omit this line
updater_workers = <N>
```

If you want updates to be processed exactly as before, do this:

```ini
[object-updater]
concurrency = 1
updater_workers = <N>
```

Change-Id: I17e18088e61f664e1b9942d66423666d0cae1689

**Zuul** · c01c43d982 · Merge "Adds read_only middleware"