Help:Toolforge/ToolsDB
For admin documentation about ToolsDB, see Portal:Toolforge/Admin/ToolsDB.
Toolforge tools have access to a shared MariaDB server at tools.db.svc.wikimedia.cloud for tool-created databases. Database names must start with the name of the credential user followed by two underscores and then the name of the database: <credentialUser>__<DBName> (e.g. "s51234__mydb").
The credential user is not your user name. It can be found in your $HOME/replica.my.cnf file. The name of the credential user looks something like 'u1234' for a user and 's51234' for a tool account. You can also find the name of the credential user using a live database connection:
SELECTSUBSTRING_INDEX(CURRENT_USER(),'@',1);
Tools can request the ability to create Trove databases via a Toolforge quota-request task.
Privileges on the database
Users have all privileges and have access to all grant options on their databases. Database names ending with _p are granted read access for everyone. Please create a ticket if you need more fine-grained permissions, like sharing a database only between 2 users, or other special permissions.
_p) can also be accessed from Quarry and Superset.
Note that the table names and approximate sizes of all databases, including non-public ones, are publicly visible in the tool-db-usage tool (example).
Steps to create a user database
To create a database on tools.db.svc.wikimedia.cloud:
- Become your tool account.
maintainer@tools-bastion-NN:~$ becometoolaccount - Connect to tools.db.svc.wikimedia.cloud with the replica.my.cnf credentials: You could also just type:
$ mariadb--defaults-file=$HOME/replica.my.cnf-htools.db.svc.wikimedia.cloud
$ sqltools - In the MariaDB console, create a new database (where CREDENTIALUSER is your credentials user, which can be found in your ~/replica.my.cnf file, and DBNAME the name you want to give to your database. Note that there are 2 underscores between CREDENTIALUSER and DBNAME):
MariaDB[(none)]>CREATEDATABASECREDENTIALUSER__DBNAME;
You can then connect to your database using:
$ mariadb--defaults-file=$HOME/replica.my.cnf-htools.db.svc.wikimedia.cloudCREDENTIALUSER__DBNAME
Or:
$ sqltools
MariaDB[(none)]>USECREDENTIALUSER__DBNAME;
Example
Assuming that your tool account is called "mytool", this is what it would look like:
maintainer@tools-bastion-NN:~$ becomemytool tools.mytool@tools-bastion-NN:~$ mariadb--defaults-file=$HOME/replica.my.cnf-htools.db.svc.wikimedia.cloud
MariaDB[(none)]>selectsubstring_index(current_user(),'@',1)asuname; +---------------+ |uname| +---------------+ |u123something| +---------------+ 1rowinset(0.00sec) MariaDB[(none)]>createdatabaseu123something__wiki;
tools-db service name was deprecated in September 2017 and removed in May 2019. Use tools.db.svc.wikimedia.cloud instead.
Note: Some projects like python-Django can throw an exception like MySQLdb._exceptions.OperationalError: (1709, 'Index column size too large. The maximum column size is 767 bytes.') when migrated using the setup above. This can be fixed by altering the database charset to utf-8in most cases. To avoid this, create the database using the following command instead to specify the charset:
MariaDB[(none)]>CREATEDATABASECREDENTIALUSER__DBNAMECHARACTERSETutf8;
Read-only replica host
We maintain two copies of the ToolsDB database, using a MariaDB primary-replica setup.
The read-only replica host can be accessed using the same credentials and the following hostname: tools-readonly.db.svc.wikimedia.cloud
Using the read-only replica host is recommended if you have to run queries that take a long time to complete, as in this way you will reduce the load on the primary host.
Please note that the replica host can sometimes lag behind the primary host, but we are doing our best to keep this lag at a minimum.
Backups
We don't do offline backups of any of the databases in ToolsDB. ToolsDB users can backup their data using mariadb-dump (included in the mariadb image) if necessary:
:# use umask to make the dump private (use unless the database is public) $ toolforgejobsrun--command'umask o-r; ( mariadb-dump --defaults-file=$TOOL_DATA_DIR/replica.my.cnf --host=tools-readonly.db.svc.wikimedia.cloud credentialUser__DBName > $TOOL_DATA_DIR/DBname-$(date -I).sql )'--imagemariadbbackup
Note that we don't recommend storing backups permanently on NFS (/data/project, /home, or /data/scratch on Toolforge) or on any other Cloud VPS hosted drive. True backups should be kept offsite.
Caveats
The Toolsforge team tries to keep ToolsDB configurations as close to MariaDB defaults as possible. This can lead to surprising behaviors, such as:
- Transactions not rolled back on query timeouts, which can be common during high load on a shared database (see this issue)
If you encounter an issue, feel free to add it above.
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud connect or the bridged Telegram group
- Discuss via email after you have subscribed to the cloud@ mailing list
- Subscribe to the cloud-announce@ mailing list (all messages are also mirrored to the cloud@ list)
- Read the News wiki page
Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)