merging in stats stuff
41 changed files with 2699 additions and 67 deletions
@@ -47,7 +47,7 @@ If you need more throughput to either Account or Container Services, they may
each be deployed to their own servers. For example you might use faster (but
more expensive) SAS or even SSD drives to get faster disk I/O to the databases.
-Load balancing and network design is left as an excercise to the reader,
+Load balancing and network design is left as an exercise to the reader,
but this is a very important part of the cluster, so time should be spent
designing the network for a Swift cluster.
@@ -59,7 +59,7 @@ Preparing the Ring
The first step is to determine the number of partitions that will be in the
ring. We recommend that there be a minimum of 100 partitions per drive to
-insure even distribution accross the drives. A good starting point might be
+insure even distribution across the drives. A good starting point might be
to figure out the maximum number of drives the cluster will contain, and then
multiply by 100, and then round up to the nearest power of two.
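As a sanity check, the partition rule above can be computed directly (the helper below is our own illustration, not part of Swift):

```python
import math

def partition_power(max_drives, parts_per_drive=100):
    """Smallest power of two holding max_drives * parts_per_drive partitions."""
    return math.ceil(math.log2(max_drives * parts_per_drive))

# A cluster expected to grow to 5000 drives needs at least 500,000
# partitions; the next power of two is 2**19 = 524,288.
power = partition_power(5000)
print(power, 2 ** power)
```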
@@ -154,8 +154,8 @@ Option Default Description
------------------ ---------- ---------------------------------------------
swift_dir /etc/swift Swift configuration directory
devices /srv/node Parent directory of where devices are mounted
-mount_check true Weather or not check if the devices are
-mounted to prevent accidently writing
+mount_check true Whether or not check if the devices are
+mounted to prevent accidentally writing
to the root device
bind_ip 0.0.0.0 IP Address for server to bind to
bind_port 6000 Port for server to bind to
@@ -173,7 +173,7 @@ use paste.deploy entry point for the object
log_name object-server Label used when logging
log_facility LOG_LOCAL0 Syslog log facility
log_level INFO Logging level
-log_requests True Weather or not to log each request
+log_requests True Whether or not to log each request
user swift User to run as
node_timeout 3 Request timeout to external services
conn_timeout 0.5 Connection timeout to external services
@@ -193,7 +193,7 @@ Option Default Description
log_name object-replicator Label used when logging
log_facility LOG_LOCAL0 Syslog log facility
log_level INFO Logging level
-daemonize yes Weather or not to run replication as a
+daemonize yes Whether or not to run replication as a
daemon
run_pause 30 Time in seconds to wait between
replication passes
@@ -249,9 +249,9 @@ The following configuration options are available:
Option Default Description
------------------ ---------- --------------------------------------------
swift_dir /etc/swift Swift configuration directory
-devices /srv/node Parent irectory of where devices are mounted
-mount_check true Weather or not check if the devices are
-mounted to prevent accidently writing
+devices /srv/node Parent directory of where devices are mounted
+mount_check true Whether or not check if the devices are
+mounted to prevent accidentally writing
to the root device
bind_ip 0.0.0.0 IP Address for server to bind to
bind_port 6001 Port for server to bind to
@@ -339,8 +339,8 @@ Option Default Description
------------------ ---------- ---------------------------------------------
swift_dir /etc/swift Swift configuration directory
devices /srv/node Parent directory or where devices are mounted
-mount_check true Weather or not check if the devices are
-mounted to prevent accidently writing
+mount_check true Whether or not check if the devices are
+mounted to prevent accidentally writing
to the root device
bind_ip 0.0.0.0 IP Address for server to bind to
bind_port 6002 Port for server to bind to
@@ -353,7 +353,7 @@ user swift User to run as
================== ============== ==========================================
Option Default Description
------------------ -------------- ------------------------------------------
-use paste.deploy entry point for the account
+use Entry point for paste.deploy for the account
server. For most cases, this should be
`egg:swift#account`.
log_name account-server Label used when logging
@@ -412,6 +412,11 @@ conn_timeout 0.5 Connection timeout to external services
Proxy Server Configuration
--------------------------
An example Proxy Server configuration can be found at
etc/proxy-server.conf-sample in the source code repository.
The following configuration options are available:
[DEFAULT]
============================ =============== =============================
@@ -432,7 +437,7 @@ key_file Path to the ssl .key
============================ =============== =============================
Option Default Description
---------------------------- --------------- -----------------------------
-use paste.deploy entry point for
+use Entry point for paste.deploy for
the proxy server. For most
cases, this should be
`egg:swift#proxy`.
@@ -443,10 +448,10 @@ log_headers True If True, log headers in each
request
recheck_account_existence 60 Cache timeout in seconds to
send memcached for account
-existance
+existence
recheck_container_existence 60 Cache timeout in seconds to
send memcached for container
-existance
+existence
object_chunk_size 65536 Chunk size to read from
object servers
client_chunk_size 65536 Chunk size to read from
@@ -474,7 +479,7 @@ rate_limit_account_whitelist Comma separated list of
rate limit
rate_limit_account_blacklist Comma separated list of
account name hashes to block
-completly
+completely
============================ =============== =============================
[auth]
@@ -482,7 +487,7 @@ rate_limit_account_blacklist Comma separated list of
============ =================================== ========================
Option Default Description
------------ ----------------------------------- ------------------------
-use paste.deploy entry point
+use Entry point for paste.deploy
to use for auth. To
use the swift dev auth,
set to:
@@ -500,7 +505,7 @@ Memcached Considerations
------------------------
Several of the Services rely on Memcached for caching certain types of
-lookups, such as auth tokens, and container/account existance. Swift does
+lookups, such as auth tokens, and container/account existence. Swift does
not do any caching of actual object data. Memcached should be able to run
on any servers that have available RAM and CPU. At Rackspace, we run
Memcached on the proxy servers. The `memcache_servers` config option
@@ -526,7 +531,7 @@ Most services support either a worker or concurrency value in the settings.
This allows the services to make effective use of the cores available. A good
starting point to set the concurrency level for the proxy and storage services
to 2 times the number of cores available. If more than one service is
-sharing a server, then some experimentaiton may be needed to find the best
+sharing a server, then some experimentation may be needed to find the best
balance.
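A minimal way to derive that starting point on a given box (illustrative only; the actual worker count is set in each server's config file):

```python
import multiprocessing

# Starting point: 2 workers per available core, then tune from there
# based on how many services share the server.
workers = 2 * multiprocessing.cpu_count()
print(workers)
```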
At Rackspace, our Proxy servers have dual quad core processors, giving us 8
@@ -548,7 +553,7 @@ Filesystem Considerations
-------------------------
Swift is designed to be mostly filesystem agnostic--the only requirement
-beeing that the filesystem supports extended attributes (xattrs). After
+being that the filesystem supports extended attributes (xattrs). After
thorough testing with our use cases and hardware configurations, XFS was
the best all-around choice. If you decide to use a filesystem other than
XFS, we highly recommend thorough testing.
@@ -611,5 +616,5 @@ Logging Considerations
Swift is set up to log directly to syslog. Every service can be configured
with the `log_facility` option to set the syslog log facility destination.
-It is recommended to use syslog-ng to route the logs to specific log
+We recommended using syslog-ng to route the logs to specific log
files locally on the server and also to remote log collecting servers.
@@ -7,9 +7,7 @@ Instructions for setting up a dev VM
------------------------------------
This documents setting up a virtual machine for doing Swift development. The
-virtual machine will emulate running a four node Swift cluster. It assumes
-you're using *VMware Fusion 3* on *Mac OS X Snow Leopard*, but should give a
-good idea what to do on other environments.
+virtual machine will emulate running a four node Swift cluster.
* Get the *Ubuntu 10.04 LTS (Lucid Lynx)* server image:
@@ -17,20 +15,9 @@ good idea what to do on other environments.
- Ubuntu Live/Install: http://cdimage.ubuntu.com/releases/10.04/release/ubuntu-10.04-dvd-amd64.iso (4.1 GB)
- Ubuntu Mirrors: https://launchpad.net/ubuntu/+cdmirrors
-* Create guest virtual machine:
-  #. `Continue without disc`
-  #. `Use operating system installation disc image file`, pick the .iso
-     from above.
-  #. Select `Linux` and `Ubuntu 64-bit`.
-  #. Fill in the *Linux Easy Install* details.
-  #. `Customize Settings`, name the image whatever you want
-     (`SAIO` for instance.)
-  #. When the `Settings` window comes up, select `Hard Disk`, create an
-     extra disk (the defaults are fine).
-  #. Start the virtual machine up and wait for the easy install to
-     finish.
+* Create guest virtual machine from the Ubuntu image (if you are going to use
+  a separate partition for swift data, be sure to add another device when
+  creating the VM)
* As root on guest (you'll have to log in as you, then `sudo su -`):
#. `apt-get install python-software-properties`
@@ -41,11 +28,22 @@ good idea what to do on other environments.
python-xattr sqlite3 xfsprogs python-webob python-eventlet
python-greenlet python-pastedeploy`
#. Install anything else you want, like screen, ssh, vim, etc.
-#. `fdisk /dev/sdb` (set up a single partition)
-#. `mkfs.xfs -i size=1024 /dev/sdb1`
+#. If you would like to use another partition for storage:
+   #. `fdisk /dev/sdb` (set up a single partition)
+   #. `mkfs.xfs -i size=1024 /dev/sdb1`
+   #. Edit `/etc/fstab` and add
+      `/dev/sdb1 /mnt/sdb1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0`
+#. If you would like to use a loopback device instead of another partition:
+   #. `dd if=/dev/zero of=/srv/swift-disk bs=1024 count=0 seek=1000000`
+      (modify seek to make a larger or smaller partition)
+   #. `mkfs.xfs -i size=1024 /srv/swift-disk`
+   #. Edit `/etc/fstab` and add
+      `/srv/swift-disk /mnt/sdb1 xfs loop,noatime,nodiratime,nobarrier,logbufs=8 0 0`
#. `mkdir /mnt/sdb1`
-#. Edit `/etc/fstab` and add
-   `/dev/sdb1 /mnt/sdb1 xfs noatime,nodiratime,nobarrier,logbufs=8 0 0`
#. `mount /mnt/sdb1`
#. `mkdir /mnt/sdb1/1 /mnt/sdb1/2 /mnt/sdb1/3 /mnt/sdb1/4 /mnt/sdb1/test`
#. `chown <your-user-name>:<your-group-name> /mnt/sdb1/*`
@@ -56,7 +54,7 @@ good idea what to do on other environments.
#. Add to `/etc/rc.local` (before the `exit 0`)::
mkdir /var/run/swift
-chown <your-user-name>:<your-user-name> /var/run/swift
+chown <your-user-name>:<your-group-name> /var/run/swift
#. Create /etc/rsyncd.conf::
@@ -64,7 +62,7 @@ good idea what to do on other environments.
gid = <Your group name>
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
address = 127.0.0.1
[account6012]
max connections = 25
@@ -472,6 +470,11 @@ good idea what to do on other environments.
sudo service rsyslog restart
sudo service memcached restart
+.. note::
+    If you are using a loopback device, substitute `/dev/sdb1` above with
+    `/srv/swift-disk`
#. Create `~/bin/remakerings`::
#!/bin/bash
@@ -24,6 +24,7 @@ Overview:
overview_reaper
overview_auth
overview_replication
+overview_stats
rate_limiting
Development:
doc/source/overview_stats.rst (new file, 184 lines)
@@ -0,0 +1,184 @@
==================
Swift stats system
==================
The swift stats system is composed of three parts: log creation, log
uploading, and log processing. The system handles two types of logs (access
and account stats), but it can be extended to handle other types of logs.
---------
Log Types
---------
***********
Access logs
***********
Access logs are the proxy server logs. Rackspace uses syslog-ng to redirect
the proxy log output to an hourly log file. For example, a proxy request that
is made on August 4, 2010 at 12:37 gets logged in a file named 2010080412.
This allows easy log rotation and easy per-hour log processing.
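The hourly file name is just the request's timestamp formatted without separators; for example (assuming the naming shown above):

```python
from datetime import datetime

# A proxy request logged 2010-08-04 12:37 lands in the file "2010080412".
ts = datetime(2010, 8, 4, 12, 37)
hourly_name = ts.strftime("%Y%m%d%H")
print(hourly_name)  # -> 2010080412
```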
******************
Account stats logs
******************
Account stats logs are generated by a stats system process.
swift-account-stats-logger runs on each account server (via cron) and walks
the filesystem looking for account databases. When an account database is
found, the logger selects the account hash, bytes_used, container_count, and
object_count. These values are then written out as one line in a csv file. One
csv file is produced for every run of swift-account-stats-logger. This means
that, system wide, one csv file is produced for every storage node. Rackspace
runs the account stats logger every hour. Therefore, in a cluster of ten
account servers, ten csv files are produced every hour. Also, every account
will have one entry for every replica in the system. On average, there will be
three copies of each account in the aggregate of all account stat csv files
created in one system-wide run.
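A single output row might look like the following sketch (the hash and numbers are made up, and the exact column order is defined by the stats processor, not by this example):

```python
import csv
import io

# Hypothetical per-account values as read from one account database;
# the real logger appends one such row per account found on the node.
account_hash = "098f6bcd4621d373cade4e832627b4f6"
bytes_used, container_count, object_count = 73400320, 12, 5021

buf = io.StringIO()
csv.writer(buf).writerow([account_hash, bytes_used, container_count, object_count])
print(buf.getvalue().strip())
```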
----------------------
Log Processing plugins
----------------------
The swift stats system is written to allow a plugin to be defined for every
log type. Swift includes plugins for both access logs and storage stats logs.
Each plugin is responsible for defining, in a config section, where the logs
are stored on disk, where the logs will be stored in swift (account and
container), the filename format of the logs on disk, the location of the
plugin class definition, and any plugin-specific config values.
The plugin class must define three methods. The constructor must accept one
argument (the dict representation of the plugin's config section). The process
method must accept an iterator, plus the account, container, and object name
of the log. The keylist_mapping method accepts no parameters.
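A skeleton of that three-method interface might look like this (class name, config keys, and the hard-coded hour are all illustrative, not Swift's actual code):

```python
class ExampleLogProcessor:
    """Sketch of the plugin interface described above."""

    def __init__(self, conf):
        # conf is the dict form of this plugin's config section.
        self.log_dir = conf.get("log_dir", "/var/log/swift/hourly/")

    def process(self, obj_stream, account, container, object_name):
        # Consume an iterator of log lines and key the results on
        # (account, year, month, day, hour); a real plugin would parse
        # the timestamp out of each line instead of hard-coding it.
        counts = {}
        for line in obj_stream:
            key = (account, "2010", "08", "04", "12")
            counts.setdefault(key, {"lines": 0})
            counts[key]["lines"] += 1
        return counts

    def keylist_mapping(self):
        # Maps internal count keys to final csv column names; no parameters.
        return {"lines": "total_lines"}
```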
-------------
Log Uploading
-------------
swift-log-uploader accepts a config file and a plugin name. It finds the log
files on disk according to the plugin config section and uploads them to the
swift cluster. This means one uploader process will run on each proxy server
node and each account server node. To avoid uploading partially-written log
files, the uploader skips any file whose mtime falls within the last two hours.
Rackspace runs this process once an hour via cron.
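The two-hour cutoff is a simple mtime comparison; a sketch (function name is ours, not the uploader's):

```python
import os
import time

def old_enough(path, cutoff_hours=2):
    """True if the file was last modified at least cutoff_hours ago,
    i.e. it is safe to assume nothing is still writing to it."""
    age = time.time() - os.path.getmtime(path)
    return age >= cutoff_hours * 3600
```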
--------------
Log Processing
--------------
swift-log-stats-collector accepts a config file and generates a csv that is
uploaded to swift. It loads all plugins defined in the config file, generates
a list of all log files in swift that need to be processed, and passes an
iterable of the log file data to the appropriate plugin's process method. The
process method returns a dictionary of data in the log file keyed on (account,
year, month, day, hour). The log-stats-collector process then combines all
dictionaries from all calls to a process method into one dictionary. Key
collisions within each (account, year, month, day, hour) dictionary are
summed. Finally, the summed dictionary is mapped to the final csv values with
each plugin's keylist_mapping method.
The resulting csv file has one line per (account, year, month, day, hour) for
all log files processed in that run of swift-log-stats-collector.
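The combine-and-sum step can be sketched as follows (helper name is ours; real plugins return richer dictionaries than a single count):

```python
from collections import Counter

def combine(results):
    """Merge dicts keyed on (account, year, month, day, hour),
    summing any colliding per-key counts."""
    totals = {}
    for result in results:
        for key, counts in result.items():
            totals.setdefault(key, Counter()).update(counts)
    return totals

hour = ("AUTH_test", "2010", "08", "04", "12")
merged = combine([{hour: {"lines": 3}}, {hour: {"lines": 2}}])
print(merged[hour]["lines"])  # -> 5
```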
================================
Running the stats system on SAIO
================================
#. Create a swift account to use for storing stats information, and note the
account hash. The hash will be used in config files.
#. Install syslog-ng::
sudo apt-get install syslog-ng
#. Add the following to the end of `/etc/syslog-ng/syslog-ng.conf`::
# Added for swift logging
destination df_local1 { file("/var/log/swift/proxy.log" owner(<username>) group(<groupname>)); };
destination df_local1_err { file("/var/log/swift/proxy.error" owner(<username>) group(<groupname>)); };
destination df_local1_hourly { file("/var/log/swift/hourly/$YEAR$MONTH$DAY$HOUR" owner(<username>) group(<groupname>)); };
filter f_local1 { facility(local1) and level(info); };
filter f_local1_err { facility(local1) and not level(info); };
# local1.info -/var/log/swift/proxy.log
# write to local file and to remote log server
log {
source(s_all);
filter(f_local1);
destination(df_local1);
destination(df_local1_hourly);
};
# local1.error -/var/log/swift/proxy.error
# write to local file and to remote log server
log {
source(s_all);
filter(f_local1_err);
destination(df_local1_err);
};
#. Restart syslog-ng
#. Create the log directories::
mkdir /var/log/swift/hourly
mkdir /var/log/swift/stats
chown -R <username>:<groupname> /var/log/swift
#. Create `/etc/swift/log-processor.conf`::
[log-processor]
swift_account = <your-stats-account-hash>
user = <your-user-name>
[log-processor-access]
swift_account = <your-stats-account-hash>
container_name = log_data
log_dir = /var/log/swift/hourly/
source_filename_format = %Y%m%d%H
class_path = swift.stats.access_processor.AccessLogProcessor
user = <your-user-name>
[log-processor-stats]
swift_account = <your-stats-account-hash>
container_name = account_stats
log_dir = /var/log/swift/stats/
source_filename_format = %Y%m%d%H_*
class_path = swift.stats.stats_processor.StatsLogProcessor
account_server_conf = /etc/swift/account-server/1.conf
user = <your-user-name>
#. Add the following under [app:proxy-server] in `/etc/swift/proxy-server.conf`::
log_facility = LOG_LOCAL1
#. Create a `cron` job to run once per hour to create the stats logs. In
`/etc/cron.d/swift-stats-log-creator`::
0 * * * * <your-user-name> swift-account-stats-logger /etc/swift/log-processor.conf
#. Create a `cron` job to run once per hour to upload the stats logs. In
`/etc/cron.d/swift-stats-log-uploader`::
10 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf stats
#. Create a `cron` job to run once per hour to upload the access logs. In
`/etc/cron.d/swift-access-log-uploader`::
5 * * * * <your-user-name> swift-log-uploader /etc/swift/log-processor.conf access
#. Create a `cron` job to run once per hour to process the logs. In
`/etc/cron.d/swift-stats-processor`::
30 * * * * <your-user-name> swift-log-stats-collector /etc/swift/log-processor.conf
After running for a few hours, you should start to see .csv files in the
log_processing_data container in the swift stats account that was created
earlier. Each file will have one entry per hour for each account with
activity in that hour. One .csv file should be produced per hour. Note
that the stats will be delayed by at least two hours by default. This can be
changed with the new_log_cutoff variable in the config file. See
`log-processing.conf-sample` for more details.