Commit Graph

94 Commits

Author SHA1 Message Date
shangxiaobj
c93c0c0c6e [Trivialfix] Fix typos in swift
Fix typos found in swift.
Change-Id: I52fad1a4882cec4456f22174b46d54e42ec66d97
2017-08-04 07:50:10 +00:00
Clay Gerrard
701a172afa Add multiple worker processes strategy to reconstructor
This change adds a new Strategy concept to the daemon module similar to
how we manage WSGI workers. We need to leverage multiple python
processes to get the concurrency properties we need. More workers will
rebalance much faster on dense chassis with many devices.
Currently the default is still only one process, and no workers. Set
reconstructor_workers in the [object-reconstructor] section to some
whole number <= the number of devices on a node to get that many
reconstructor workers.
Each worker will operate on a different subset of disks.
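For example, a sketch enabling four workers (only the option and section
names above come from this change; the value is illustrative):
 [object-reconstructor]
 reconstructor_workers = 4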
Once mode works as before, but tends to update recon a little more often.
If you change the rings, the strategy will shutdown workers and spawn
new ones.
You can kill the worker pids and the daemon strategy will respawn them.
New per-disk reconstructor stats are dumped to recon under the
object_reconstruction_per_disk key. To maintain legacy compatibility
and replication monitoring based on cycle times they are aggregated
every stats_interval (default 5 mins).
Change-Id: I28925a37f3985c9082b5a06e76af4dc3ec813abe
2017-07-26 16:55:10 -07:00
Alistair Coles
9c5628b4f1 Add reconstructor section to deployment guide
Change-Id: I062998e813718828b7adf4e7c3f877b6a31633c0
Closes-Bug: #1626290 
2017-07-20 11:40:17 +01:00
Jenkins
f1e1dbb80a Merge "Make eventlet.tpool's thread count configurable in object server" 2017-07-04 11:49:24 +00:00
Samuel Merritt
d9c4913e3b Make eventlet.tpool's thread count configurable in object server
If you're running servers_per_port > 0 and threads_per_disk = 0 (as it
should be with servers_per_port on), each object-server process will
have 20 IO threads waiting around to service eventlet.tpool
calls. This is far too many; with servers_per_port, there's no real
benefit to having so many IO threads.
This commit makes it so that, when servers_per_port > 0, each object
server defaults to having one main thread and one IO thread.
Also, eventlet's tpool size is now configurable via the object-server
config file. If a tpool size is set, that's what we'll use regardless
of servers_per_port. This allows operators with an excess of threads
to remove some regardless of servers_per_port.
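A minimal sketch, assuming the new option is named
eventlet_tpool_num_threads (the name is not stated above):
 [object-server]
 eventlet_tpool_num_threads = 1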
Change-Id: I8f8914b7e70f2510393eb7c5e6be9708631ac027
Closes-Bug: 1554233
2017-06-23 16:16:03 +10:00
Ondřej Nový
a8bc94c7e3 Replace slowdown option with *_per_second option
The container and object updaters sleep "slowdown" (default 0.01) seconds
after every processed container/object. Because the time.sleep call adds
overhead, use ratelimit_sleep from common.utils instead, same as in the auditor.
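A sketch, assuming the replacement options are named objects_per_second
and containers_per_second (the names are not stated above):
 [object-updater]
 objects_per_second = 50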
Change-Id: I362aa0f13c78ad03ce1f76ee0257b0646f981212
2017-06-16 19:22:00 +00:00
Clay Gerrard
da557011ec Deprecate broken handoffs_first in favor of handoffs_only
The handoffs_first mode in the replicator has the useful behavior of
processing all handoff parts across all disks until there aren't any
handoffs anymore on the node [1] and then it seemingly tries to drop
back into normal operation. In practice I've only ever heard of
handoffs_first used while rebalancing and turned off as soon as the
rebalance finishes - it's not recommended to run with handoffs_first
mode turned on, and it emits a warning on startup if the option is enabled.
The handoffs_first mode on the reconstructor doesn't work - it was
prioritizing handoffs *per-part* [2] - which is really unfortunate
because in the reconstructor during a rebalance it's often *much* more
attractive from an efficiency disk/network perspective to revert a
partition from a handoff than it is to rebuild an entire partition from
another primary using the other EC fragments in the cluster.
This change deprecates handoffs_first in favor of handoffs_only in the
reconstructor which is far more useful - and just like handoffs_first
mode in the replicator - it gives the operator the option of forcing the
consistency engine to focus on rebalance. The handoffs_only behavior is
somewhat consistent with the replicator's handoffs_first option (any
error on any handoff in the replicator will make it essentially handoffs
only forever) but the option does what you want and is named correctly
in the reconstructor.
For consistency with the replicator the reconstructor will mostly honor
the handoffs_first option, but if you set handoffs_only in the config it
always takes precedence. Having handoffs_first in your config always
results in a warning, but if handoffs_only is not set and handoffs_first
is true the reconstructor will assume you want handoffs_only and behave
as such.
When running in handoffs_only mode the reconstructor will start to log a
warning every cycle if you leave it running in handoffs_only after it
finishes reverting handoffs. However you should be monitoring on-disk
partitions and disable the option as soon as the cluster finishes the
full rebalance cycle.
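For example, during a rebalance (option and section names as above):
 [object-reconstructor]
 handoffs_only = True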
1. Ia324728d42c606e2f9e7d29b4ab5fcbff6e47aea fixed replicator
handoffs_first "mode"
2. Unlike replication, each partition in an EC policy can have a different
kind of job per frag_index, but the cardinality of jobs is typically
only one (either sync or revert) unless there have been a bunch of errors
during writes, in which case handoff partitions may hold a number of
different fragments.
Known-Issues:
handoffs_only is not documented outside of the example config, see lp
bug #1626290
Closes-Bug: #1653018
Change-Id: Idde4b6cf92fab6c45f2c0c2733277701eb436898
2017-02-13 21:13:29 -08:00
Mahati Chamarthy
69f7be99a6 Move documented reclaim_age option to correct location
The reclaim_age is a DiskFile option; it doesn't make sense for two
different object services or nodes to use different values.
I also drive-by cleaned up the reclaim_age plumbing from get_hashes to
cleanup_ondisk_files, since it's a method on the Manager and has access
to the configured reclaim_age. This fixes a bug where finalize_put
wouldn't use the [DEFAULT]/object-server configured reclaim_age - which
is normally benign but leads to weird behavior on DELETE requests with
really small reclaim_age.
There are a couple of places in the replicator and reconstructor that
reach into their manager to borrow the reclaim_age when emptying out
the aborted PUTs that failed to cleanup their files in tmp - but that
timeout doesn't really need to be coupled with reclaim_age and that
method could have just as reasonably been implemented on the Manager.
UpgradeImpact: Previously the reclaim_age was documented to be
configurable in various object-* services config sections, but that did
not work correctly unless you also configured the option for the
object-server because of REPLICATE request rehash cleanup. All object
services must use the same reclaim_age. If you require a non-default
reclaim age it should be set in the [DEFAULT] section. If there are
different non-default values, the greater should be used for all object
services and configured only in the [DEFAULT] section.
If you specify a reclaim_age value in any object related config you
should move it to *only* the [DEFAULT] section before you upgrade. If
you configure a reclaim_age less than your consistency window you are
likely to be eaten by a Grue.
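For example, a single one-week reclaim age for all object services (a
sketch; the value is illustrative):
 [DEFAULT]
 reclaim_age = 604800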
Closes-Bug: #1626296
Change-Id: I2b9189941ac29f6e3be69f76ff1c416315270916
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
2017-01-13 03:10:47 +00:00
Ondřej Nový
99a13d9386 Fixed rysnc -> rsync typo
Change-Id: I671b4206072c6e22f4ae38033502336ec32e86ad
2016-10-19 20:17:00 +02:00
Peter Lisák
ed772236c7 Change schedule priority of daemon/server in config
The goal is to allow the scheduling priority and I/O scheduling class and
priority of a daemon/server to be modified via configuration.
The setting is optional; the default keeps the current behaviour.
Use case:
Prioritize the object-server over the object-auditor, because all users'
requests need to be served in peak hours while auditing can wait.
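A sketch of that use case, assuming the options are named nice_priority,
ionice_class and ionice_priority (the names are not stated above):
 [object-auditor]
 nice_priority = 10
 ionice_class = IOPRIO_CLASS_IDLE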
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
DocImpact
Change-Id: I1018a18f4706daabdb84574ffd9a58d831e68396
2016-08-10 23:56:15 +02:00
Jenkins
a403faadd4 Merge "Allow fallocate_reserve to be a percentage" 2016-05-12 08:18:39 +00:00
Jenkins
6a88f27eb0 Merge "Remove threads_per_disk setting" 2016-05-11 01:36:43 +00:00
Shashirekha Gundur
cf48e75c25 change default ports for servers
Changing the recommended ports for Swift services
from ports 6000-6002 to unused ports 6200-6202;
so they do not conflict with X-Windows or other services.
Updated SAIO docs.
DocImpact
Closes-Bug: #1521339
Change-Id: Ie1c778b159792c8e259e2a54cb86051686ac9d18
2016-04-29 14:47:38 -04:00
Christian Schwede
9d6a055b31 Remove threads_per_disk setting
This patch removes the threads_per_disk setting. It was already a deprecated
setting, set to 0 by default, which effectively meant not using a per-disk
thread pool at all. Users are encouraged to use servers_per_port instead.
DocImpact
Change-Id: Ie76be5c8a74d60a1330627caace19e06d1b9383c
2016-04-28 12:06:24 -05:00
Andy McCrae
0da9da5131 Allow fallocate_reserve to be a percentage
Add the ability to set the fallocate_reserve value as a percentage.
This happens automatically when adding the '%' at the end of the value.
Having the ability to set a % of free space rather than a byte value is
especially useful when drive sizes are heterogeneous.
The default for fallocate_reserve has been adjusted to 1%, having the
fallocate_reserve set seems sensible for all deploys and percentages are
far safer to default than byte values (across drives of any size).
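For example (the '%' suffix switches to percentage mode, as described):
 fallocate_reserve = 2%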
Tests added for using fallocate_reserve as a percentage.
Duplicate tests for fallocate_reserve have been removed.
Docs updated to reflect the fallocate_reserve change.
Change-Id: I4aea613a708205c917e81d6b2861396655e73238
2016-04-23 08:02:00 -05:00
Clay Gerrard
1d03803a85 Auditor will clean up stale rsync tempfiles
DiskFile already fills in the _ondisk_info attribute when it tries to open
a diskfile - even if the DiskFile's fileset is not valid or deleted.
During this process the rsync tempfiles would be discovered and logged,
but no-one would attempt to clean them up - even if they were really old.
Instead of logging and ignoring unexpected files when validating a DiskFile
fileset we'll add unexpected files to the unexpected key in the
_ondisk_info attribute.
With a little bit of re-organization in the auditor's object_audit method
to get things into a single return path we can add an unconditional check
for unexpected files and remove those that are "old enough".
Since the replicator will kill any rsync processes that are running longer
than the configured rsync_timeout we know that any rsync tempfiles older
than this can be deleted.
Split unlink_older_than in common.utils into two functions to allow an
explicit list of previously discovered paths to be passed in to avoid an
extra listdir. Since the getmtime handling already ignores OSError
there's less concern about a race condition where a previously discovered
unexpected file is reaped by rsync while we're attempting to clean it up.
Update some doc on the new config option.
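Presumably something like the following, assuming the new option is named
rsync_tempfile_timeout and lives with the auditor settings (both are
assumptions here):
 [object-auditor]
 rsync_tempfile_timeout = 86400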
Closes-Bug: #1554005
Change-Id: Id67681cb77f605e3491b8afcb9c69d769e154283
2016-03-23 19:34:34 +00:00
Kota Tsuyuzaki
ecbcc94989 Fix ssync related object-server docs
Swift now uses the SSYNC verb instead of the old REPLICATION verb for the
ssync protocol. This patch replaces REPLICATION with SSYNC throughout the
docs and fixes a few words of explanation.
Change-Id: I1253210d4f49749e7d425d6252dd262b650d9548
2016-03-16 08:58:31 +00:00
gh159m
b5311f63db Removed default value for log_statsd_host
Multiple files and documents showed that log_statsd_host had
a default value, usually localhost. This was incorrect; instead,
setting a value for log_statsd_host enables statsd logging.
Removed any reference of log_statsd_host having a default value.
Also changed descriptions to show setting a value enables logging.
Change-Id: I3ca5c0e8b8e4981de3aa6db0c476072b5a59723d
Closes-Bug: #1542227 
2016-02-10 10:36:59 -06:00
Clay Gerrard
f27ad34e1d Document use-case for slow option
Change-Id: Iec4087a896a2277179e3720d802cca101fa7ad54
2016-02-02 16:10:47 -08:00
Christian Schwede
ccdf4a9f30 Document slow option in etc/object-server.conf
Change-Id: Ic9940b0b830a468887878f7b0d7ca42c2cbbebd5
2016-02-02 09:39:32 +01:00
Ondřej Nový
a4c2fe95ab Allow changing the auditor sleep interval in config
Change-Id: Ic451c5e0b686509f8982ed1bf65a223a2d77b9a0
2016-01-14 12:52:52 +01:00
Bill Huber
0bcd7fd50e Update Erasure Coding Overview doc to remove Beta version
The major functionality of EC has been released for Liberty and
the beta version of the code has been removed since it is now
in production.
Change-Id: If60712045fb1af803093d6753fcd60434e637772
2015-12-18 11:43:12 -06:00
Romain LE DISEZ
71f6fd025e Allow configuring the rsync modules to which the replicators send data
Currently, the rsync module to which the replicators send data is static.
This prevents administrators from adapting the rsync configuration to their
current deployment or needs.
As an example, the rsyncd configuration example encourages setting a
connections limit for the account, container and object modules, to protect
devices from excessive parallel connections that would hurt performance.
On a server with many devices, it is tempting to increase this number
proportionally, but nothing guarantees that the distribution of the connections
will be balanced. In the worst scenario, a single device can receive all the
connections, severely impacting performance.
This commit adds a new option named 'rsync_module' to the *-replicator sections
of the *-server configuration file. This configuration variable can be
extrapolated with device attributes like ip, port, device, zone, ... by using
the format {NAME}. eg:
 rsync_module = {replication_ip}::object_{device}
With this configuration, an administrator can solve the problem of connection
distribution by creating one module per device in rsyncd configuration.
The default values are backward compatible:
 {replication_ip}::account
 {replication_ip}::container
 {replication_ip}::object
Option vm_test_mode is deprecated by this commit, but backward compatibility is
maintained. The option is only effective when rsync_module is not set. In that
case, {replication_port} is appended to the default value of rsync_module.
Change-Id: Iad91df50dadbe96c921181797799b4444323ce2e
2015-09-07 08:00:18 +02:00
John Dickinson
2289137164 do container listing updates in another (green)thread
The actual server-side changes are simple. The tests are a different
matter. Many changes were needed to the object server tests to
handle the now-async calls to the container server. In an effort to
test this properly, some drive-by changes were made to improve tests.
I tested this patch by doing zero-byte object writes to one container
as fast as possible. Then I did it again while also saturating 2 of the
container replica's disks. The results are linked below.
https://gist.github.com/notmyname/2bb85acfd8fbc7fc312a
DocImpact
Change-Id: I737bd0af3f124a4ce3e0862a155e97c1f0ac3e52
2015-07-22 01:19:58 -07:00
Darrell Bishop
df134df901 Allow 1+ object-servers-per-disk deployment
Enabled by a new > 0 integer config value, "servers_per_port" in the
[DEFAULT] config section for object-server and/or replication server
configs. The setting's integer value determines how many different
object-server workers handle requests for any single unique local port
in the ring. In this mode, the parent swift-object-server process
continues to run as the original user (i.e. root if low-port binding
is required), binds to all ports as defined in the ring, and forks off
the specified number of workers per listen socket. The child per-port
servers drop privileges and behave pretty much how object-server workers
always have, except that because the ring has unique ports per disk, the
object-servers will only be handling requests for a single disk. The
parent process detects dead servers and restarts them (with the correct
listen socket), starts missing servers when an updated ring file is
found with a device on the server with a new port, and kills extraneous
servers when their port is found to no longer be in the ring. The ring
files are stat'ed at most every "ring_check_interval" seconds, as
configured in the object-server config (same default of 15s).
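For example (option and section names as above; the value is illustrative):
 [DEFAULT]
 servers_per_port = 4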
Immediately stopping all swift-object-worker processes still works by
sending the parent a SIGTERM. Likewise, a SIGHUP to the parent process
still causes the parent process to close all listen sockets and exit,
allowing existing children to finish serving their existing requests.
The drop_privileges helper function now has an optional param to
suppress the setsid() call, which otherwise screws up the child workers'
process management.
The class method RingData.load() can be told to only load the ring
metadata (i.e. everything except replica2part2dev_id) with the optional
kwarg, header_only=True. This is used to keep the parent and all
forked off workers from unnecessarily having full copies of all storage
policy rings in memory.
A new helper class, swift.common.storage_policy.BindPortsCache,
provides a method to return a set of all device ports in all rings for
the server on which it is instantiated (identified by its set of IP
addresses). The BindPortsCache instance will track mtimes of ring
files, so they are not opened more frequently than necessary.
This patch includes enhancements to the probe tests and
object-replicator/object-reconstructor config plumbing to allow the
probe tests to work correctly both in the "normal" config (same IP but
unique ports for each SAIO "server") and a server-per-port setup where
each SAIO "server" must have a unique IP address and unique port per
disk within each "server". The main probe tests only work with 4
servers and 4 disks, but you can see the difference in the rings for the
EC probe tests where there are 2 disks per server for a total of 8
disks. Specifically, swift.common.ring.utils.is_local_device() will
ignore the ports when the "my_port" argument is None. Then,
object-replicator and object-reconstructor both set self.bind_port to
None if server_per_port is enabled. Bonus improvement for IPv6
addresses in is_local_device().
This PR for vagrant-swift-all-in-one will aid in testing this patch:
https://github.com/swiftstack/vagrant-swift-all-in-one/pull/16/
Also allow SAIO to answer is_local_device() better; common SAIO setups
have multiple "servers" all on the same host with different ports for
the different "servers" (which happen to match the IPs specified in the
rings for the devices on each of those "servers").
However, you can configure the SAIO to have different localhost IP
addresses (e.g. 127.0.0.1, 127.0.0.2, etc.) in the ring and in the
servers' config files' bind_ip setting.
This new whataremyips() implementation combined with a little plumbing
allows is_local_device() to accurately answer, even on an SAIO.
In the default case (an unspecified bind_ip defaults to '0.0.0.0') as
well as an explicit "bind to everything" like '0.0.0.0' or '::',
whataremyips() behaves as it always has, returning all IP addresses for
the server.
Also updated probe tests to handle each "server" in the SAIO having a
unique IP address.
For some (noisy) benchmarks that show servers_per_port=X is at least as
good as the same number of "normal" workers:
https://gist.github.com/dbishop/c214f89ca708a6b1624a#file-summary-md
Benchmarks showing the benefits of I/O isolation with a small number of
slow disks:
https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-results-md
If you were wondering what the overhead of threads_per_disk looks like:
https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md
DocImpact
Change-Id: I2239a4000b41a7e7cc53465ce794af49d44796c6
2015-06-18 12:43:50 -07:00
Joanna H. Huang
af8d842076 Replaced setting run_pause with standard interval
The deprecated directive `run_pause` should be replaced with the more
standard `interval`. `run_pause` is still supported for
backward compatibility. This patch updates the object replicator to use
`interval` and support `run_pause`. It also updates its sample config
and documentation.
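For example (run_pause remains accepted for backward compatibility):
 [object-replicator]
 interval = 30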
Co-Authored-By: Joanna H. Huang <joanna.huitzu.huang@gmail.com>
Co-Authored-By: Kamil Rykowski <kamil.rykowski@intel.com>
Change-Id: Ie2a3414a96a94efb9273ff53a80b9d90c74fff09
Closes-Bug: #1364735 
2015-05-25 11:47:47 +02:00
Jenkins
2ea8bae389 Merge "Allow rsync to use compression" 2015-05-13 10:58:08 +00:00
paul luse
647b66a2ce Erasure Code Reconstructor
This patch adds the erasure code reconstructor. It follows the
design of the replicator but:
 - There is no notion of update() or update_deleted().
 - There is a single job processor
 - Jobs are processed partition by partition.
 - At the end of processing a rebalanced or handoff partition, the
 reconstructor will remove successfully reverted objects if any.
It also includes various ssync changes, such as the addition of a
reconstruct_fa() function, called from ssync_sender, which performs the
actual reconstruction while sending the object to the receiver.
Co-Authored-By: Alistair Coles <alistair.coles@hp.com>
Co-Authored-By: Thiago da Silva <thiago@redhat.com>
Co-Authored-By: John Dickinson <me@not.mn>
Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
Co-Authored-By: Tushar Gohad <tushar.gohad@intel.com>
Co-Authored-By: Samuel Merritt <sam@swiftstack.com>
Co-Authored-By: Christian Schwede <christian.schwede@enovance.com>
Co-Authored-By: Yuan Zhou <yuan.zhou@intel.com>
blueprint ec-reconstructor
Change-Id: I7d15620dc66ee646b223bb9fff700796cd6bef51
2015-04-14 00:52:17 -07:00
Prashanth Pai
9c33bbde69 Allow rsync to use compression
From rsync's man page:
-z, --compress
With this option, rsync compresses the file data as it is sent to the
destination machine, which reduces the amount of data being transmitted --
something that is useful over a slow connection.
A configurable option has been added to allow rsync to compress, but only
if the remote node is in a different region than the local one.
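A sketch, assuming the new option is named rsync_compress (the name is
not stated above):
 rsync_compress = true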
NOTE: Objects that are already compressed (for example: .tar.gz, .mp3)
might slow down the syncing process.
On wire compression can also be extended to ssync later in a different
change if required. In case of ssync, we could explore faster
compression libraries like lz4. rsync uses zlib which is slow but offers
higher compression ratio.
Change-Id: Ic9b9cbff9b5e68bef8257b522cc352fc3544db3c
Signed-off-by: Prashanth Pai <ppai@redhat.com>
2015-03-02 14:39:58 +05:30
Eohyung Lee
87d8626505 Fix example typo
5 * 1024 * 1024 = 5242880
Change-Id: I0eeb6e2d9fbd79103cd8c658627344f73fed9498
2014-11-20 11:38:49 +09:00
Samuel Merritt
7d0e5ebe69 Zero-copy object-server GET responses with splice()
This commit lets the object server use splice() and tee() to move data
from disk to the network without ever copying it into user space.
Requires Linux. Sorry, FreeBSD folks. You still have the old
mechanism, as does anyone who doesn't want to use splice. This
requires a relatively recent kernel (2.6.38+) to work, which includes
the two most recent Ubuntu LTS releases (Precise and Trusty) as well
as RHEL 7. However, it excludes Lucid and RHEL 6. On those systems,
setting "splice = on" will result in warnings in the logs but no
actual use of splice.
Note that this only applies to GET responses without Range headers. It
can easily be extended to single-range GET requests, but this commit
leaves that for future work. Same goes for PUT requests, or at least
non-chunked ones.
On some real hardware I had laying around (not a VM), this produced a
37% reduction in CPU usage for GETs made directly to the object
server. Measurements were done by looking at /proc/<pid>/stat,
specifically the utime and stime fields (user and kernel CPU jiffies,
respectively).
Note: There is a Python module called "splicetee" available on PyPi,
but it's licensed under the GPL, so it cannot easily be added to
OpenStack's requirements. That's why this patch uses ctypes instead.
Also fixed a long-standing annoyance in FakeLogger:
 >>> fake_logger.warn('stuff')
 >>> fake_logger.get_lines_for_level('warn')
 []
 >>>
This, of course, is because the correct log level is 'warning'. Now
you get a KeyError if you call get_lines_for_level with a bogus log
level.
Change-Id: Ic6d6b833a5b04ca2019be94b1b90d941929d21c8
2014-09-18 16:02:47 -07:00
John Dickinson
b7281cf2c5 make the bind_port config setting required
In a long-term effort to change the recommended ports for Swift,
the first step is to require the bind_port in config files. Later,
we can change the recommended setting.
Anyone currently explicitly setting the ports will not be affected.
Anyone not setting the ports will need to specify them to match their
rings.
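For example, in an object-server config (the port must match your ring):
 [DEFAULT]
 bind_port = 6200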
DocImpact
Change-Id: Icca83a263acdd0afc9016424a3e9f8c15e944789
2014-09-08 07:28:43 -07:00
paul luse
7ea30df9cd Update docs to highlight that the auditor chunk size can be set
It may not be obvious, but the existing code will let you change the
disk_chunk_size just for the auditor so this just points that
out in the docs. In one short test I ran with a 4 node cluster
with 18GB of 4MB objects on it, changing the auditor chunk size
from the default of 64K to 1MB decreased the auditor CPU time from
10% to 4%.
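The test above corresponds to a setting like (1MB = 1048576 bytes):
 [object-auditor]
 disk_chunk_size = 1048576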
Also added test code to make sure this overridden value is
actually used and checked other AuditorWorker conf values as
well.
Change-Id: Ia12e1c6127877dc2124b60cd963cd0b6d5f3d6ef
2014-07-10 14:13:27 -07:00
Eamonn O'Toole
d317888a7e Parallel object auditor
We are soon going to put servers with a high ratio of disk to CPU
into production as object servers. One of our concerns with this
configuration is that the object auditor would take too long to
complete its audit cycle. Therefore we decided to parallelise
the auditor.
The auditor already uses fork(), so we decided to use the parallel
model from the replicator. Concurrency is set by the concurrency
parameter in the auditor stanza, which sets the number of parallel
checksum auditors. The actual number of parallel auditing processes
is concurrency + 1 if zero_byte_fps is non-zero.
Only one ZBF process is forked, and a new ZBF process is forked as
soon as the current ZBF process finishes. Thus the last process
running will always be a ZBF process.
Both forever and once modes are parallelised.
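For example, four parallel checksum auditors (the
zero_byte_files_per_second option name is an assumption, not stated above):
 [object-auditor]
 concurrency = 4
 zero_byte_files_per_second = 50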
Each checksum auditor process submits a nested dictionary with keys
{'object_auditor_stats_ALL': {'diskn': {..}}} to dump_recon_cache
so that the object_auditor_stats_ALL dict in recon cache consists
of individual sub-dicts for each of the object disks on the server.
The recon cache is no different to before when the checksum auditor
is run in serial mode. When swift-recon is run, it sums the stats
for the individual disks.
DocImpact
Change-Id: I0ce3db57a43e482d4be351cc522fc9060af6e2d3
2014-06-25 16:57:53 +01:00
gholt
2d00f7b7ba New log_max_line_length option.
Log lines can get quite large, as we previously noticed with rsync error
log lines. We added a setting to cap those, but it really looks like we
should have just done this overall limit. We noticed the issue when we
switched to UDP syslogging and it would occasionally blow past the 16436
lo MTU! This causes Python's logging code to get an error and hilarity
ensues.
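For example, to stay well below the 16436-byte lo MTU mentioned above
(the value is illustrative):
 log_max_line_length = 8192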
Change-Id: I44bdbe68babd58da58c14360379e8fef8a6b75f7
2014-05-22 20:30:34 +00:00
zhang-hare
f5caac43ac Add profiling middleware in Swift
The profile middleware provides a tool to profile Swift
code on the fly and collect statistical data for performance
analysis. A simple native Web UI is also provided to help
query and visualize the data.
Change-Id: I6a1554b2f8dc22e9c8cd20cff6743513eb9acc05
Implements: blueprint profiling-middleware
2014-05-08 18:31:07 +08:00
Shane Wang
a94be9443d Fix misspellings in swift
Fix misspellings detected by:
* pip install misspellings
* git ls-files | grep -v locale | misspellings -f -
Change-Id: I6594fc4ca5ae10bd30eac8a2f2493a376adcadee
Closes-Bug: #1257295 
2014-02-20 16:15:48 +08:00
Prashanth Pai
aad81528d4 Make .expiring_objects account name configurable
The account which tracks objects scheduled for deletion had its name
hard-coded to 'expiring_objects'. This is made configurable via the
expiring_objects_account_name option. Backend file-systems
integration efforts may want to treat these "special" accounts in a
different way.
This would still go undocumented, hence 'pseudo-hidden'.
UpgradeImpact: None, as the default value continues to be the same,
 which is '.expiring_objects'.
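A sketch (section placement is an assumption; the leading '.' in the
default account name comes from the auto-create account prefix):
 expiring_objects_account_name = expiring_objects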
Change-Id: I1a093b0d0e2bdd0c3d723090af03fc0adf2ad7e3
Signed-off-by: Prashanth Pai <ppai@redhat.com>
2014-02-06 16:38:34 +05:30
Kota Tsuyuzaki
e078dc3da0 Add missing sample config of object-replicator
Change-Id: I2bca67023aeb9a012927c69e23d582d4a0ff2098
2014-01-30 23:04:46 -08:00
gholt
c859ebf5ce Per device replication_lock
New replication_one_per_device option (True by default)
that restricts incoming REPLICATION requests to
one per device, replication_concurrency permitting.
Also has replication_lock_timeout (15 by default)
to control how long a request will wait to obtain
a replication device lock before giving up.
This should be very useful in that you can be
assured any concurrent REPLICATION requests are
each writing to distinct devices. If you have 100
devices on a server, you can set
replication_concurrency to 100 and be confident
that, even if 100 replication requests were
executing concurrently, they'd each be writing to
separate devices. Before, all 100 could end up
writing to the same device, bringing it to a
horrible crawl.
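For example, for the 100-device server described above (option names as
given in this message):
 replication_concurrency = 100
 replication_one_per_device = true
 replication_lock_timeout = 15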
NOTE: This is only for ssync replication. The
current default rsync replication still has the
potentially horrible behavior.
Change-Id: I36e99a3d7e100699c76db6d3a4846514537ff685
2013-11-22 21:40:29 +00:00
gholt
a80c720af5 Object replication ssync (an rsync alternative)
For this commit, ssync is just a direct replacement for how
we use rsync. Assuming we switch over to ssync completely
someday and drop rsync, we will then be able to improve the
algorithms even further (removing local objects as we
successfully transfer each one rather than waiting for whole
partitions, using an index.db with hash-trees, etc., etc.)
For easier review, this commit can be thought of in distinct
parts:
1) New global_conf_callback functionality for allowing
 services to perform setup code before workers, etc. are
 launched. (This is then used by ssync in the object
 server to create a cross-worker semaphore to restrict
 concurrent incoming replication.)
2) A bit of shifting of items up from object server and
 replicator to diskfile or DEFAULT conf sections for
 better sharing of the same settings. conn_timeout,
 node_timeout, client_timeout, network_chunk_size,
 disk_chunk_size.
3) Modifications to the object server and replicator to
 optionally use ssync in place of rsync. This is done in
 a generic enough way that switching to FutureSync should
 be easy someday.
4) The biggest part, and (at least for now) completely
 optional part, are the new ssync_sender and
 ssync_receiver files. Nice and isolated for easier
 testing and visibility into test coverage, etc.
All the usual logging, statsd, recon, etc. instrumentation
is still there when using ssync, just as it is when using
rsync.
Beyond the essential error and exceptional condition
logging, I have not added any additional instrumentation at
this time. Unless there is something someone finds super
pressing to have added to the logging, I think such
additions would be better as separate change reviews.
FOR NOW, IT IS NOT RECOMMENDED TO USE SSYNC ON PRODUCTION
CLUSTERS. Some of us will be in a limited fashion to look
for any subtle issues, tuning, etc. but generally ssync is
an experimental feature. In its current implementation it is
probably going to be a bit slower than rsync, but if all
goes according to plan it will end up much faster.
There are no comparisons yet between ssync and rsync other
than some raw virtual machine testing I've done to show it
should compete well enough once we can put it in use in the
real world.
If you Tweet, Google+, or whatever, be sure to indicate it's
experimental. It'd be best to keep it out of deployment
guides, howtos, etc. until we all figure out if we like it,
find it to be stable, etc.
Change-Id: If003dcc6f4109e2d2a42f4873a0779110fff16d6
2013-11-07 16:52:01 +00:00
Greg Lange
176a34161f Make the length of a line logged configurable
Failed calls to rsync can result in very long log lines. These lines are
mostly made up of file paths and are not always useful. This change allows
reducing the length of the lines logged if desired.
Change-Id: I9a28f19eadc07757da9d42b0d7be1ed82170d732
2013-08-21 19:38:24 +00:00
Jenkins
a2126add0b Merge "Set default wsgi workers to cpu_count" 2013-07-30 19:12:28 +00:00
Newptone
5c1a7871d9 Unified format of boolean params in conf files
In swift conf files, boolean options use different formats:
some use true/false, and some use True/False.
This patch aims to use lowercase true/false to unify the
boolean param format in swift conf files.
Fix Bug #1203421
Change-Id: I3e1bfc6e43231f51e0710aa54869f3774ee896b1
2013-07-23 15:40:05 +08:00
Clay Gerrard
de3acec4bf Set default wsgi workers to cpu_count
Change the default value of wsgi workers from 1 to auto. The new default
value for workers in the proxy, container, account & object wsgi servers will
spawn as many worker processes as you have cpu cores.
This will not be ideal for some configurations, but it's much more likely to
produce a successful out of the box deployment.
Inspect the number of cpu_cores using python's multiprocessing when available.
Multiprocessing was added in python 2.6, but I know I've compiled python
without it before by accident. The cpu_count method seems to be pretty system
agnostic, but it says it can raise NotImplementedError or sometimes return 0.
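For example (the new default; an explicit integer still overrides it):
 [DEFAULT]
 workers = auto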
Add a new utility method 'config_auto_int_value' to pull an integer out of the
config which has a dynamic default.
 * drive by s/container/proxy/ in proxy-server.conf.5
 * fix misplaced max_clients in *-server.conf-sample
 * update doc/development_saio to force workers = 1
DocImpact
Change-Id: Ifa563d22952c902ab8cbe1d339ba385413c54e95
2013-07-18 22:57:18 -07:00
David Goetz
043bfb77f4 Record some simple object stats in the object auditor
Change-Id: I043a80c38091f59ce6707730363a4b43b29ae6ec
2013-07-02 13:41:18 -07:00
Jenkins
b63b5d590a Merge "Use threadpools in the object server for performance." 2013-06-11 22:47:07 +00:00
Samuel Merritt
b491549ac2 Use threadpools in the object server for performance.
Without a (per-disk) threadpool, requests to a slow disk would affect
all clients by blocking the entire eventlet reactor on
read/write/etc. The slower the disk, the worse the performance. On an
object server, you frequently have at least one slow disk due to
auditing and replication activity sucking up all the available IO. By
kicking those blocking calls out to a separate OS thread, we let the
eventlet reactor make progress in other greenthreads, and by having a
per-disk pool, we ensure that one slow disk can't suck up all the
resources of an entire object server.
There were a few blocking calls that were done with eventlet.tpool,
but that's a fixed-size global threadpool, so I moved them to the
per-disk threadpools. If the object server is configured not to use
per-disk threadpools, (i.e. threads_per_disk = 0, which is the
default), those call sites will still ultimately end up using
eventlet.tpool.execute. You won't end up blocking a whole object
server while waiting for a huge fsync.
If you decide not to use threadpools, the only extra overhead should
be a few extra Python function calls here and there. This is
accomplished by setting threads_per_disk = 0 in the config.
blueprint concurrent-disk-io
Change-Id: I490f8753d926fdcee3a0c65c5aaf715bc2b7c290
2013-06-07 13:06:04 -07:00
Samuel Merritt
efdb0e3681 Make sample configs more readable.
Inject some empty lines to avoid the wall-of-text effect and to make
it a little clearer which descriptions go with which options.
Change-Id: I58914b83dad76ea5ca330903a246bee7ffaeba83
2013-06-06 15:35:19 -07:00
Dieter Plaetinck
442fd83a8b Implement an rsync_bwlimit setting for object replicator
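Presumably passed through to rsync's --bwlimit option (a sketch; the
value and unit are an assumption):
 rsync_bwlimit = 5M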
Change-Id: I8789d6e4d22de83db9a2760d51a94eb56a48c3b5
2013-05-31 15:57:19 -04:00