I am working on an SMS marketing project based on Yii2 Framework (PHP 7.3 & MariaDb (innodb engine)) where we have to move the logs from different tables to the archive db, which is total a different DB server from the live one.
The log tables keep growing and we had to setup an automated job which runs 3 times a week at midnight and will keep filtering out the logs that are older than 45 days and move them to another database.
I decided to use the Range partitioning and then use EXCHANGE PARTITION
to move all the related data to a separate new table so that i keep the main/live table locked for partition process only and keep the copying/moving process that involves Select
operation on a different table.
So I divided the whole process into 2 different processes.
- Partitioning
- Archiving
The partition process is further divided into
- Drop any Previously created Backup tables
- Create a new backup table like live table
- Create Partition on live table
- Exchange the partition with the backup table
- Remove the partition from the live table.
My main focus is on the partition process if it can be improved to work more efficiently and less time. Currently I have the following stats for the partition process; I am only adding for one of the large tables
Transaction History table stats
Rows : total 172,899,990 rows approx
Time to Partition : 1472.429115057 secs(24.54048525095 Mins) with total rows in the partition (12,937,902)
Exchange partition : 0.062991857528687 secs
Removed Partition : 1293.8012390137 secs.(21.56335398356167 Mins)
Transaction History Schema
CREATE TABLE `transaction_history` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`amount` decimal(19,6) NOT NULL,
`description` varchar(150) COLLATE utf8_unicode_ci DEFAULT NULL,
`transaction_type` varchar(30) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'blast',
`remaining_balance` decimal(19,6) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`updated_at` timestamp NULL DEFAULT NULL ON UPDATE current_timestamp(),
PRIMARY KEY (`id`,`created_at`),
KEY `transaction_type` (`transaction_type`),
KEY `user_id_transaction_type` (`user_id`,`transaction_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The code is for the Yii2 console app.
<?php
namespace console\controllers;
use Yii;
use DateTime;
use Exception;
use DateInterval;
use yii\helpers\Console;
use yii\console\Controller;
use yii\helpers\ArrayHelper;
use console\controllers\traits\ArchiveTraits;
class PartitionController extends Controller
{
use ArchiveTraits;
/**
* Contains array of tables and their range column to be used for partition
* in the key=>value format like [table_name=>range_column_name]
*
* Note: before you add any new table to the below list make sure you have
* added the range column to unique index or part of the primary key
*/
const BACKUP_TABLES = [
'message_inbound' => 'created_at',
'transaction_history' => 'created_at',
'sms_api_log' => 'created_at',
'balance_history' => 'created_on',
'buy_number_logs' => 'created_at',
'destination_delivery' => 'created_at',
'message_delivery' => 'created_at',
'email_delivery_logs' => 'created_at',
'responder_keywords_logs' => 'created_at',
'sms_alert_logs' => 'created_at',
'suppression_message_logs' => 'created_at',
];
private $_date = null;
/**
* @var batch size for the
* inserts in the database
*/
const BATCH_SIZE = 10000;
/**
* @var limit for the rows to be migrated to the
* database in one iteration
*/
const MIGRATE_LIMIT = 50000;
public function actionIndex($date = null)
{
$this->_date = $date;
$this->startNotification("Partition Process started", 'partition-start');
ini_set("memory_limit", 0);
$this->stdout("Starting Partition Process.\n");
$this->startPartition();
$this->stdout("Completed Partition Process.\n");
$date = date("Y-m-d");
$this->sendSummaryReport("Partitioning Process Complete for {$date}", "partition-complete", "partition");
}
/**
* @param int $start the start timestamp
*/
public function end($start)
{
return microtime(true) - $start;
}
public function start()
{
return microtime(true);
}
/**
* Starts the partitioning process for the live DB log tables
*
* @return null
* @throws Exception
*/
protected function startPartition()
{
foreach (self::BACKUP_TABLES as $tableName => $rangeColumn) {
try {
$this->partitionNow($tableName, $rangeColumn);
$this->stdout("\n");
} catch (Exception $e) {
$this->sendExceptionEmail($tableName, $e, "Exception on Partitioning table", "partition-exception");
$this->stdout("There was an error while trying to archive the {$tableName} .\n");
$this->stdout($e->getMessage() . "\n===============\n");
$this->stdout("Continuing to archive the next table.\n");
}
}
}
/**
* Creates the backup for the specified table and the range column
* by creating a partition and then exchanging the old data
* partition with the backup table and then move the data to the
* archive database
*
* @param string $tableName the name of the table to backup data from live DB
* @param string $rangeColumn the name of the column used for the range partition
*
* @return null
*/
protected function partitionNow($tableName, $rangeColumn = 'created_at')
{
$rangeOldPartition = $this->rangeOldPartition();
$backupTableName = $this->generateBackupTableName($tableName);
$dbLive = self::_getDsnAttribute('dbname');
//drop backup table if exists
$this->dropBackupTables($tableName);
$this->stdout("Started Partitioning {$tableName}\n");
$startTime = $this->start();
try {
$sql = <<<SQL
-- create the backup table and remove partitioning from the backup table
CREATE TABLE `{$dbLive}`.{{%{$backupTableName}}} LIKE `{$dbLive}`.{{%{$tableName}}};
-- start partitioning the source table
ALTER TABLE `{$dbLive}`.{{%{$tableName}}}
PARTITION BY RANGE(UNIX_TIMESTAMP({$rangeColumn}))
(
PARTITION oldPt VALUES LESS THAN (UNIX_TIMESTAMP("{$rangeOldPartition}")),
PARTITION activePt VALUES LESS THAN (MAXVALUE)
);
SQL;
$command = Yii::$app->db->createCommand($sql);
$command->execute();
//necessary to catch exceptions or errors when
// using multiple SQL statements with createcommand
while ($command->pdoStatement->nextRowSet()) {
//leave blank do nothing
}
$this->stdout("Partitioned table in {$this->end($startTime)} secs.\n", Console::FG_GREEN);
$startTime = $this->start();
$sql = <<<SQL
-- exchange the partition with the backup table
ALTER TABLE `{$dbLive}`.{{%{$tableName}}}
EXCHANGE PARTITION oldPt WITH TABLE `{$dbLive}`.{{%{$backupTableName}}};
SQL;
$command = Yii::$app->db->createCommand($sql);
$command->execute();
$this->stdout("Completed Exchange partition {$this->end($startTime)} secs\n");
$startTime = $this->start();
$sql = <<<SQL
-- remove partition from the source table once data moved to separate table
ALTER TABLE `{$dbLive}`.{{%{$tableName}}} REMOVE PARTITIONING;
SQL;
$command = Yii::$app->db->createCommand($sql);
$command->execute();
$this->stdout("Removed Partition in {$this->end($startTime)} secs.\n");
$this->stdout("Filterd out data from live table.\n");
} catch (Exception $e) {
throw $e;
}
}
/**
* Takes the source table name ad
*
* @param $tableName
*/
protected function dropBackupTables($tableName)
{
$backupTableName = $this->generateBackupTableName($tableName);
$sql = <<<SQL
DROP TABLE IF EXISTS {{%{$backupTableName}}};
SQL;
Yii::$app->db->createCommand($sql)->execute();
}
/**
* Generates the backup table name from the source table name
*
* @param string $tableName the source table to mock the backup table from
*
* @return string $backupTableName the name of the backup table
*/
protected function generateBackupTableName($tableName)
{
$backupTableAlias = 'bkp_' . date('Ymd') . '_';
return "{$backupTableAlias}{$tableName}";
}
/**
* Returns the create table command for the given table
* @param $tableName
*/
protected function getCreateTable($tableName)
{
$data = Yii::$app->db->createCommand("show create table {{%{$tableName}}}")->queryOne();
return str_replace('CREATE TABLE', 'CREATE TABLE IF NOT EXISTS', $data['Create Table']);
}
/**
* @param $tableName
*/
protected function getPageData($tableName, $offset = 0)
{
$limit = self::MIGRATE_LIMIT;
$sql = <<<SQL
SELECT * FROM {{%{$tableName}}}
order by id
LIMIT {$offset},{$limit}
SQL;
return Yii::$app->db->createCommand($sql)->queryAll();
}
/**
* @param $tableName
*/
protected function getRowCount($tableName)
{
$sql = <<<SQL
SELECT COUNT(*) as total FROM {{%{$tableName}}}
SQL;
$data = Yii::$app->db->createCommand($sql)->queryOne();
$this->stdout("Found {$data['total']} records.");
return $data['total'];
}
/**
* Returns the columns names for a table
*
* @param string $db the database name
* @param string $tableName the table name
*
* @return mixed
*/
protected function getTableColumns($db, $tableName)
{
$sql = <<<SQL
select
COLUMN_NAME
from
information_schema.columns
where
table_schema = "{$db}" and table_name="{$tableName}"
order by
table_name,ordinal_position;
SQL;
return ArrayHelper::getColumn(
Yii::$app->db->createCommand($sql)->queryAll(),
'COLUMN_NAME'
);
}
/**
* Returns the date for the specified interval
* to backup default interval is 45 days.
*
* @return mixed
*/
protected function rangeOldPartition()
{
$date = new DateTime();
$date->sub(new DateInterval('P0M45D'));
return $date->format("Y-m-d");
}
/**
* Returns the database name after extracting the
* specific string from the Dsn property
*
* @param string $name the name of the property in dsn string.
* @param string $target the target connection of the database live|archive, default "live"
*
* @return mixed
*/
private static function _getDsnAttribute($name, $target = 'live')
{
if ($target === 'live') {
if (preg_match("/{$name}=([^;]*)/", Yii::$app->getDb()->dsn, $match)) {
return $match[1];
}
} else {
if (preg_match("/{$name}=([^;]*)/", Yii::$app->db_backup->dsn, $match)) {
return $match[1];
}
}
throw new Exception("Unable to extract the db Name");
}
}
2 Answers 2
As I see it, 99% of your code is spent in ALTER TABLE
-- both in adding partitioning in REMOVE PARTITIONING
, so my answer focuses on how to speed up those steps. The following may eliminate most of that 99%.
Adding partitioning to a non-partitioned table takes a long time because it must copy all the data over and reconstruct all the indexes. [In the following example, I am assuming that each partition holds on month's worth of data, and the March, 2022 partition is called p202203
.] Leave the table partitioned, then do only
ALTER TABLE ... EXCHANGE PARTITION ...` -- fast for removing a partition
ALTER TABLE ... REORGANIZE PARTITION future INTO
PARTITION p202203 VALUES LESS THAN (TO_DAYS('2022-04-01')),
PARTITION future VALUES LESS THAN MAXVALUE;
Do that right before 04/01, while future
is empty (or at least very small); it will take essentially no time.
More details and examples: Partitioning and Transportable tablespaces for 5.7+
For your situation, daily partitions and doing the REORGANIZE PARTITION
just before midnight is optimal to minimize shoveling data around. DROP PARTITION
just after midnight takes care of data for more than 45 days ago.
Please provide the main SELECTs
, I may have further advice on Indexing. (I worry, especially, about the low cardinality of transaction_type
.)
A minor change: Do the archiving nightly and have 48 'daily' partitions (see the first link for discussion).
-
\$\begingroup\$ Hmm the first thing you mentioned does make sense but i didnt knew about the reorganize partition part which can be helpful with an already partitioned table, if i am getting it right. Because that was the only reason for me to drop the partition every time and then create it again. I will be working on that part first, and follow up but i dont understand the part where you say Do that right before 04/01, are you referring to the actual date here? \$\endgroup\$Muhammad Omer Aslam– Muhammad Omer Aslam2022年02月15日 08:31:57 +00:00Commented Feb 15, 2022 at 8:31
-
\$\begingroup\$ About the
SELECTs
, which ones are you referring to here ? I am not following 100% \$\endgroup\$Muhammad Omer Aslam– Muhammad Omer Aslam2022年02月15日 08:33:01 +00:00Commented Feb 15, 2022 at 8:33 -
\$\begingroup\$ the partitioning and archiving both are run via cron at 12 mid night, but what do you mean when you say have 48 daily partitions ? I will follow up here after reading those links you provided. \$\endgroup\$Muhammad Omer Aslam– Muhammad Omer Aslam2022年02月15日 08:34:55 +00:00Commented Feb 15, 2022 at 8:34
-
\$\begingroup\$ @MuhammadOmerAslam - My example involved April. Build the April partition just before April 1. If you wait until April has started, the
REORGANIZATION
will take longer (because it copies all the data since start of April over). Oops, I mixed "monthly" and "daily". Each partition covering one "day" leads to a reasonable number of partitions. Running the script each night is good; not each "month" like my example. I'll clean it up. \$\endgroup\$Rick James– Rick James2022年02月15日 16:38:47 +00:00Commented Feb 15, 2022 at 16:38 -
\$\begingroup\$ the thing is i am using the above script to remove those logs that are older than last 35 days, meaning if today is 16 FEB then there wont be any logs before 2nd of Jan , why would i be creating a partition that targets a date that is in future ? or maybe i am not following correctly what you are implying here. \$\endgroup\$Muhammad Omer Aslam– Muhammad Omer Aslam2022年02月21日 09:24:51 +00:00Commented Feb 21, 2022 at 9:24
This review will focus on the PHP to generate the queries. Other reviewers may have suggestions about improving the partition process.
General feedback
The code is easy to read and mostly is in line with the recommendations in the PSR-12 style guide. The methods have good docblocks to describe the purpose and list parameters, return values and possible exceptions thrown.
Suggestions
Coding Style
the underscore to indicate private properties and methods is a convention from PHP 4 though PSR-12 advises against this practice:
4.3 Properties and Constants
Visibility MUST be declared on all properties.
Visibility MUST be declared on all constants if your project PHP minimum version supports constant visibilities (PHP 7.1 or later).
The
var
keyword MUST NOT be used to declare a property.There MUST NOT be more than one property declared per statement.
Property names MUST NOT be prefixed with a single underscore to indicate protected or private visibility. That is, an underscore prefix explicitly has no meaning.
4.4 Methods and Functions
Visibility MUST be declared on all methods.
Method names MUST NOT be prefixed with a single underscore to indicate protected or private visibility. That is, an underscore prefix explicitly has no meaning.
Also note it recommends declaring the visibility on constants when using PHP 7.1 or later.
Type declarations
Type declarations can be added for arguments and return values of methods, plus with PHP 7.4 they can be added to properties.
Method length
Most methods are quite short and concise, thoughpartitionNow()
is rather long- perhaps moving the code to get the queries to sub-methods would help. Also at the end are these lines:
catch (Exception $e) { throw $e; }
If there was code to handle the exception or at least log it somewhere then it would make sense to catch it but this makes the try/catch
seem useless.
In the function generateTableBackupName()
does this:
$backupTableAlias = 'bkp_' . date('Ymd') . '_'; return "{$backupTableAlias}{$tableName}";
There seems little point in creating a variable used only once - it would be simpler to merely have it be:
return 'bkp_' . date('Ymd') . '_' . $tableName;
Repetitive code
The method _getDsnAttribute()
could be simplified from the current code:
if ($target === 'live') { if (preg_match("/{$name}=([^;]*)/", Yii::$app->getDb()->dsn, $match)) { return $match[1]; } } else { if (preg_match("/{$name}=([^;]*)/", Yii::$app->db_backup->dsn, $match)) { return $match[1]; } } throw new Exception("Unable to extract the db Name");
The only thing that appears to be different is the second argument in the call to preg_match()
. That difference can be stored in a variable which allows the regular expression condition to only be listed once, in turn reducing indentation levels.
if ($target === 'live') {
$dsn = Yii::$app->getDb()->dsn;
} else {
$dsn = Yii::$app->db_backup->dsn;
}
if (preg_match("/{$name}=([^;]*)/", $dsn, $match)) {
return $match[1];
}
throw new Exception("Unable to extract the db Name");
And the assignment of $dsn
could be consolidated to a ternary unless that is too long for one line:
$dsn = $target === 'live' ? Yii::$app->getDb()->dsn : Yii::$app->db_backup->dsn;
docblock inaccuracy
The docblock for startPartition
states that it "@throws Exception
" however that doesn’t appear to be true since it catches Exception
s.
-
\$\begingroup\$ I would be updating the visibility section in the code . and the other deficiencies mentioned. If there was code to handle the exception or at least log it somewhere then it would make sense to catch it but this makes the try/catch seem useless. Well this class is called as a cronjob (as mentioned in the first line of the 2nd paragraph of my post) and that cron logs everything into a file so those exceptions are the part of the logs that are recorded. if you like to see a sample file that terminated due to exception i can provide that. \$\endgroup\$Muhammad Omer Aslam– Muhammad Omer Aslam2021年12月05日 12:53:09 +00:00Commented Dec 5, 2021 at 12:53
-
\$\begingroup\$ That docblock inaccuracy is due to the changed code recently i would be updating that too thanks for the highlighted sections \$\endgroup\$Muhammad Omer Aslam– Muhammad Omer Aslam2021年12月05日 12:53:30 +00:00Commented Dec 5, 2021 at 12:53
-
\$\begingroup\$ and that suggestion will be an error i think
return = 'bkp_' . date('Ymd') . '_' . $tableName;
syntax error, unexpected '=' ? or you were suggesting something likereturn 'bkp_' . date('Ymd') . '_' . $tableName;
? \$\endgroup\$Muhammad Omer Aslam– Muhammad Omer Aslam2021年12月05日 13:16:36 +00:00Commented Dec 5, 2021 at 13:16 -
\$\begingroup\$ Okay - yes I meant the return without assignment- I have updated that section \$\endgroup\$2021年12月05日 14:20:00 +00:00Commented Dec 5, 2021 at 14:20
partitionNow()
andgenerateBackupTableName()
\$\endgroup\$partitionNow()
is the first method listed in the code, and thegenerateBackupTableName()
is only generating the table name for the bakcup i have added it too \$\endgroup\$ALTER TABLE
is fast forADD PARTITION
, and may be OK for `REORGANIZE PARTITION if only one partition is involved. \$\endgroup\$