Toolserver/For users

This is an archived version of this page, as edited by LeonWeber at 07:37, 20 May 2006. It may differ significantly from the current version.

This page provides information for zedler users.

Your account

You have an account. You can log in via ssh to login-services.zedler.knams.wikimedia.org, or tools.wikimedia.de for short. (This is currently 145.97.39.142.)

All accounts expire at some point in the future; you will be told the expiry date when you log in. Accounts that are in use will be renewed before then, according to policy (see Toolserver).

Access control is done via public keys. To add a new ssh key to your account, edit ~/.ssh/authorized_keys and add the new key on a new line (OpenSSH-format SSHv2 keys only).

If you have any questions about your account, please contact zedler-admins@wikimedia.org. Alternatively, you can mail all users by using the toolserver-l mailing list, or ask on the #wikimedia-toolserver IRC channel.

Database

You have a MySQL database called "u_<username>". The password for this is in your ~/.my.cnf file. Only you have access to this database, with the exception of tables whose names begin with "pub_"; these tables can be read by any user. The database host is "sql", not localhost.

Databases for most public wikis are available. These are called <lang><project>_p, where <lang> is the language (e.g. "en" or "de"), and <project> is the project ("wiki" for Wikipedia, otherwise "wiktionary", "wikibooks", etc.). Access to these databases is read-only. A list of all wiki databases, along with information about the content language and size, is in the wiki table in the toolserver database.
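The naming rules above can be sketched in a few lines of Python. The function names here are hypothetical and exist only for illustration; the only facts taken from this page are the "u_" prefix, the "pub_" prefix and the <lang><project>_p pattern.

```python
def user_db_name(username: str) -> str:
    """Name of a user's private database, e.g. 'u_kate'."""
    return f"u_{username}"


def wiki_db_name(lang: str, project: str = "wiki") -> str:
    """Name of a read-only wiki database, e.g. 'enwiki_p' or 'dewiktionary_p'."""
    return f"{lang}{project}_p"


def is_world_readable(table: str) -> bool:
    """Tables in a user database are readable by everyone only if named pub_*."""
    return table.startswith("pub_")
```

For example, wiki_db_name("de", "wiktionary") gives "dewiktionary_p". Whether a given database actually exists should still be checked against the wiki table in the toolserver database.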

The following tables are available for use:

categorylinks
cur
hitcounter
image
imagelinks
interwiki
logging
math
oldimage
page
pagelinks
querycache
revision
site_stats
text
updates
user_groups
recentchanges
archive
user_ids (a user_name -> user_id mapping)

Note: a MySQL bug prevents efficient queries being run on the views in some cases. This will hopefully be fixed at some point.

Some of these databases have additional indexes not present in the standard schemata; these are documented in /usr/local/etc/views. If you need another index added, ask...

Format of Text in cur and text Tables

Text in the 'cur' or 'text' tables may be compressed, or in object format.

Compressed text has 'gzip' in old_flags. This data is compressed with headerless zlib compression.
Text with "utf8" in old_flags is in UTF-8 encoding. Text without this flag may or may not be latin-1. (??? explain further)
Text with "object" in old_flags is encoded as a serialised PHP object. This may refer to:
- cur stubs. The PHP object contains the cur_id for the relevant row in the cur table containing the actual text.
- history blob stubs. This is concatenated compressed storage. (??? is this documented?)
- something else?
Text with "external" in old_flags is stored in separate external databases. This text is not yet available on zedler.

You can use the MediaWiki function Revision::getRevisionText to extract the actual text automatically (see Revision.php for details).
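For tools not written in PHP, the simple cases can be handled by hand. The following is a minimal sketch, not a replacement for Revision::getRevisionText. It assumes that old_flags is a comma-separated list, that "headerless zlib" means a raw deflate stream (which Python reads with a negative wbits value), and that unflagged text which is not valid UTF-8 should be read as latin-1; that last fallback is a guess, since the page itself leaves the question open. The function name decode_text is hypothetical. The flag is accepted in both spellings, "utf8" as written above and "utf-8", in case the stored value differs.

```python
import zlib


def decode_text(old_text: bytes, old_flags: str) -> str:
    """Decode a row of the text table for the simple (non-object) cases."""
    flags = {flag.strip() for flag in old_flags.split(",") if flag.strip()}

    # Serialised PHP objects and external storage are not handled here.
    if "object" in flags or "external" in flags:
        raise NotImplementedError("object/external text needs MediaWiki itself")

    data = old_text
    if "gzip" in flags:
        # Negative wbits: raw deflate stream without a zlib header.
        data = zlib.decompress(data, -zlib.MAX_WBITS)

    if "utf8" in flags or "utf-8" in flags:
        return data.decode("utf-8")

    # Assumption: try UTF-8 first, then fall back to latin-1.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")
```

Rows flagged "object" or "external" raise NotImplementedError on purpose, so that a tool notices them instead of silently processing serialised PHP data as article text.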

At some point, the missing old text will be imported from an XML dump, and compressed data will be uncompressed. However, this needs to wait until we have more disk space available.

Warning: the databases containing the data are marked as "latin1". However, they do not contain latin1 data. Most of the data is in UTF-8. This is a holdover from MySQL 4.0's (lack of) UTF-8 support. Be very careful that the MySQL client does not try to convert the data to UTF-8 for you, or you will end up with garbage.
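The effect described in this warning can be reproduced without a database. The sketch below simulates a client that believes the bytes are latin-1 and "helpfully" converts them to UTF-8, and then shows how such double-encoded text can usually be repaired. Both function names are hypothetical. Python's "latin-1" codec is used as a stand-in for MySQL's latin1, which is an approximation: MySQL's latin1 is closer to Windows-1252 for some byte values.

```python
def double_encode(utf8_bytes: bytes) -> bytes:
    """Simulate an unwanted latin1 -> UTF-8 conversion of bytes that are already UTF-8."""
    return utf8_bytes.decode("latin-1").encode("utf-8")


def repair(garbled: bytes) -> bytes:
    """Undo the conversion above, recovering the original UTF-8 bytes."""
    return garbled.decode("utf-8").encode("latin-1")


original = "K\u00f6ln".encode("utf-8")   # b'K\xc3\xb6ln'
garbled = double_encode(original)        # b'K\xc3\x83\xc2\xb6ln', displayed as 'KÃ¶ln'
restored = repair(garbled)               # b'K\xc3\xb6ln' again
```

The safer approach is to avoid the conversion in the first place: leave the connection character set as latin1, treat the returned values as raw bytes, and decode them as UTF-8 in your own code.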

Timestamps

Timestamps are NOT numbers; they are strings consisting of numeric characters. Always enclose timestamps in quotes (e.g. rc_timestamp > '200500000000' instead of rc_timestamp > 200500000000). If you forget this, your query can run 50x slower or more.
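When a query is built from a script, it helps to produce the timestamp as a string from the start. The sketch below assumes the usual MediaWiki timestamp layout of 14 digits, YYYYMMDDHHMMSS; the helper names are hypothetical.

```python
from datetime import datetime


def mw_timestamp(dt: datetime) -> str:
    """Format a datetime as a 14-character MediaWiki-style timestamp string."""
    return dt.strftime("%Y%m%d%H%M%S")


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a 14-character timestamp string back into a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def newer_than(column: str, dt: datetime) -> str:
    """Build a comparison with the timestamp quoted, so it is compared as a string."""
    return f"{column} > '{mw_timestamp(dt)}'"


clause = newer_than("rc_timestamp", datetime(2006, 5, 20, 7, 37))
# clause == "rc_timestamp > '20060520073700'"
```

Because all timestamps have the same length and consist only of digits, string comparison gives the same ordering as chronological comparison.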

SQL-Queries

Zombie queries: Just quitting the client won't stop a long-running query. If you accidentally start a query that takes too long, you can find its thread ID with SHOW PROCESSLIST and terminate it with KILL thread_id.
Bulk insert: Loading a tab-separated file is much faster than inserting single records: LOAD DATA LOCAL INFILE 'absolute-path-to-file' IGNORE INTO TABLE table
If you run slow queries on the database, you may want to read [1] for a way to make them faster.
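For the bulk insert tip, the input file has to match what LOAD DATA expects. The sketch below writes rows in the format MySQL uses by default: fields separated by tabs, lines ended by a newline, backslash as the escape character and \N for NULL. The function names are hypothetical, and the sketch only covers the default format; if you pass your own FIELDS or LINES options to LOAD DATA, the escaping has to change accordingly.

```python
def tsv_field(value) -> str:
    """Render one value for LOAD DATA's default format."""
    if value is None:
        return "\\N"  # MySQL reads \N as NULL
    text = str(value)
    # Escape the backslash first, then the characters that would break the layout.
    return text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def tsv_line(row) -> str:
    """Render one row as a tab-separated, newline-terminated line."""
    return "\t".join(tsv_field(value) for value in row) + "\n"


def write_tsv(path: str, rows) -> None:
    """Write all rows to path, ready for LOAD DATA LOCAL INFILE."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        for row in rows:
            handle.write(tsv_line(row))
```

A call such as write_tsv("/tmp/links.tsv", [(1, "Main Page"), (2, None)]) produces a two-line file whose second row has NULL in its second column; the path here is only an example.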

Web hosting

You have a web page at http://tools.wikimedia.de/~<username>/. Put the contents in ~/public_html/. You can use PHP for scripting; alternatively, you can use CGI scripts by putting them in ~/public_html/cgi-bin/. Both CGI scripts and PHP scripts will run as your uid, not Apache's.
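As an illustration of the CGI route, here is a minimal script that could be placed in ~/public_html/cgi-bin/ and made executable. It assumes a Python 3 interpreter is installed and reachable through /usr/bin/env, which this page does not state; ask zedler-admins if in doubt. The function name render is hypothetical.

```python
#!/usr/bin/env python3
import os
import sys
from html import escape


def render(environ) -> str:
    """Build a complete CGI response: headers, a blank line, then the body."""
    query = environ.get("QUERY_STRING", "")
    body = (
        "<html><body>"
        f"<p>Query string: {escape(query)}</p>"
        "</body></html>\n"
    )
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body


if __name__ == "__main__":
    # The web server passes request details in environment variables.
    sys.stdout.write(render(os.environ))
```

The query string is escaped before being echoed back, so that user-supplied input cannot inject markup into the page. Since CGI scripts run as your own uid, the script can read files such as ~/.my.cnf, which is also a reason to be careful about what it outputs.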

Webalizer statistics for tools.wikimedia.de are here: [2] (updated every 6 hours). Contact User:Duesentrieb if this is broken.

Disk quota

You have a disk quota of 256MB on the /u01 filesystem. This is not meant to restrict usage; rather, it is meant to prevent users from accidentally using too much disk space. If you would like a larger quota, please mail zedler-admins@wikimedia.org explaining how much you need and why.

Software

Most of the usual software you would expect is installed. Ask zedler-admins if you want something else. There are a few points to note:

64-bit compilation environment: By default, on Solaris, everything is compiled in 32-bit mode. To compile a 64-bit binary, use "-xarch=amd64" (cc) or "-m64" (gcc).
MySQL: The 32-bit MySQL client libraries are installed in /opt/mysql50/. The 64-bit libraries are in /usr/local/mysql/.
C/C++ compiler: Two are available: GCC and Sun Studio. For GCC, use /usr/sfw/bin/gcc (C) or /usr/sfw/bin/g++ (C++). For Studio, use /opt/SUNWspro/bin/cc (C) or /opt/SUNWspro/bin/CC (C++).
/bin/sh: Don't use it. Use /usr/bin/ksh or /usr/xpg4/bin/sh instead.

GNU userland

If you prefer to use a GNU-style userland, place /usr/local/gnu/bin at the front of your path:

$ PATH=/usr/local/gnu/bin:$PATH; export PATH

You can add this to your .profile.

Batch jobs

Sometimes people want to run long, slow jobs that use a lot of CPU without disturbing other users. Using the Fair Share Scheduler[3], it's possible to allocate these processes less CPU time. (In our particular configuration, if one normal job and one batch job are both on the same CPU, the batch job will receive only 20% of the CPU time.)

To start a new process in the batch scheduling class, use newtask:

$ /bin/id -p # (GNU id, /usr/local/gnu/bin/id, does not support the p flag)
uid=1001(kate) gid=102(users) projid=3(default)
$ newtask -p batch bash
$ id -p
uid=1001(kate) gid=102(users) projid=4(batch)

Any processes you now start will run with lower priority until you return to your default shell with:

$ exit
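A script that launches its own heavy work can apply the same idea by prefixing the command with newtask instead of opening a batch shell first. The sketch below only builds and runs the command line; the helper names are hypothetical, and it assumes newtask is on the PATH and that the project is called "batch" as in the transcript above.

```python
import subprocess


def batch_argv(command, project="batch"):
    """Prefix a command (list of strings) so it starts in the given project."""
    return ["newtask", "-p", project, *command]


def run_in_batch(command) -> int:
    """Run the command in the batch project and return its exit status."""
    return subprocess.run(batch_argv(command), check=False).returncode
```

For example, batch_argv(["/usr/bin/ksh", "rebuild.sh"]) yields ["newtask", "-p", "batch", "/usr/bin/ksh", "rebuild.sh"], where rebuild.sh stands for whatever long-running job you want to start. Only the launched command and its children run in the batch project; the calling script keeps its own priority.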
