I have a table whose partition default_partition holds 3.2TB of data, and all reads and writes go to this partition, because the table's partitions are defined as follows:
PARTITION BY RANGE (to_days(created_at))
(PARTITION 2024_03_19 VALUES LESS THAN (739330) ENGINE = InnoDB,
PARTITION default_partition VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
We want to remove data older than 7 days from default_partition and store incoming data in separate daily partitions based on the created_at column, such as "2024_06_06", "2024_06_07", "2024_06_08", etc. The goal is to keep only the last 7 days of data; an application scheduler will keep dropping partitions older than 7 days if they exist.
But it seems that to achieve even that, I have to reorganize default_partition using:
Query 1: ALTER TABLE mytable REORGANIZE PARTITION default_partition INTO (
PARTITION past VALUES LESS THAN (TO_DAYS('2022-04-05')),
PARTITION default_partition VALUES LESS THAN MAXVALUE
)
Once the above query has executed, default_partition should be empty. A daily scheduler can then keep creating future partitions using queries like the examples below, so that new data for the 6th, 7th, 8th, etc. goes into its own partition.
ALTER TABLE mytable REORGANIZE PARTITION default_partition INTO (
PARTITION `2022-04-05` VALUES LESS THAN (TO_DAYS('2022-04-06')),
PARTITION default_partition VALUES LESS THAN MAXVALUE
)
ALTER TABLE mytable REORGANIZE PARTITION default_partition INTO (
PARTITION `2022-04-06` VALUES LESS THAN (TO_DAYS('2022-04-07')),
PARTITION default_partition VALUES LESS THAN MAXVALUE
)... etc.
Issue:
Running Query 1 above will copy 3.2TB of data into the new partition (past), and I am not sure whether that can be done in production without downtime.
Is there a way to just create future partitions such as '2024-04-07', '2024-04-08', etc. without touching the data in default_partition for now, so that future days' data is saved into these partitions? Once the last 7 days of data are in these partitions, we would delete the old data in default_partition, taking some downtime.
Alternatively, I could create a new table, direct reads and writes to it for the next 7 days, and then delete the old table completely. This could avoid downtime, but it requires code changes.
What would be the best way to remove data older than 7 days from default_partition with minimal or zero downtime?
1 Answer
Plan A There is essentially no way to avoid a full copy of the table.
Plan B On the other hand, pt-online-schema-change may be able to handle this (see percona.com). Or gh-ost.
Plan C It seems that PARTITION 2024_03_19 is huge, but PARTITION default_partition may be manageable for the time being.
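Before choosing between the plans, it may help to check how the data is actually spread across the partitions. The query below is a minimal sketch: 'mydb' is a placeholder schema name, mytable is the table name used in the question, and TABLE_ROWS is only an estimate for InnoDB.

```sql
-- Per-partition boundary, estimated row count, and on-disk size.
-- 'mydb' is a placeholder; substitute the real schema name.
SELECT PARTITION_NAME,
       PARTITION_DESCRIPTION,   -- the VALUES LESS THAN boundary
       TABLE_ROWS,              -- an estimate for InnoDB tables
       ROUND((DATA_LENGTH + INDEX_LENGTH) / POW(1024, 3), 1) AS size_gb
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'mytable'
ORDER BY PARTITION_ORDINAL_POSITION;
```

The partition with the largest size_gb is the one whose reorganization will dominate the copy time.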
For a "time series" it is wise to always have an empty "future" partition. Just before the next day starts, build a new partition via "REORGANIZE PARTITION future INTO 2024-..., future". That way it copies zero rows and has no impact on the table. Since `default` is already kinda big, it is too late to avoid some pain. But the pain need be only one shot of reorganizing that PARTITION:
ALTER TABLE mytable REORGANIZE PARTITION default_partition INTO (
    PARTITION some_of_2024 VALUES LESS THAN (TO_DAYS('2024-06-06')),
    PARTITION future VALUES LESS THAN MAXVALUE
);
(Do verify the syntax.) Note: Adjust that date to be after the day you run it. After that, the daily REORGANIZE will be very fast every night/week/whatever.
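As a sketch of what the recurring step might look like, the statement below splits one day off the empty future partition. The partition name p2024_06_06 and the dates are hypothetical; the scheduler would substitute the upcoming day each time it runs.

```sql
-- Run shortly before 2024-06-06 begins, while `future` is still empty,
-- so that no rows have to be copied.
-- p2024_06_06 is a hypothetical naming scheme (one partition per day).
ALTER TABLE mytable REORGANIZE PARTITION future INTO (
    PARTITION p2024_06_06 VALUES LESS THAN (TO_DAYS('2024-06-07')),
    PARTITION future      VALUES LESS THAN MAXVALUE
);
```

Each day's partition is bounded by the start of the following day, which matches the existing 2024_03_19 partition in the question (739330 is TO_DAYS('2024-03-20')).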
How many days will you keep in the table? If more than, say 50, I recommend instead to partition by weeks or months.
Here's another thought on how to gradually add partitioning:
After getting that established, set up another process to continually chip away at the 3.2TB. I recommend walking through the PRIMARY KEY 1K rows at a time. See link below. Perhaps also be sure to tackle only that big partition.
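One chunk of such a walk could look roughly like the sketch below. It assumes the PRIMARY KEY starts with a numeric column named id (the question does not show the key, so this is a guess) and restricts both statements to the big partition. The application or a stored procedure would repeat the chunk in a loop.

```sql
-- One chunk of the walk. Assumption: the PRIMARY KEY starts with `id`.
SET @lo = 0;     -- where the previous chunk stopped (0 for the first chunk)
SET @hi = NULL;

-- Find the id that ends the next chunk of 1000 rows.
SELECT id INTO @hi
FROM mytable PARTITION (default_partition)
WHERE id > @lo
ORDER BY id
LIMIT 999, 1;

-- Delete only the expired rows inside that chunk.
-- If @hi is still NULL, fewer than 1000 rows remain: run a final
-- DELETE with only "id > @lo" and the date condition, then stop.
DELETE FROM mytable PARTITION (default_partition)
WHERE id > @lo AND id <= @hi
  AND created_at < NOW() - INTERVAL 7 DAY;

SET @lo = @hi;   -- the next chunk starts after this one
```

Keeping each DELETE small limits lock time and undo growth, at the cost of the whole purge taking much longer than a single statement would.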
Another issue: usually all the partitions need to be changed when adding or removing partitioning.
More info:
Plan D
If you are tossing all but 7 days' worth of data and that is only some number of GB, then the best way is to start a new table with partitioning in place.
CREATE TABLE just_7_days ...; -- with new partitioning
-- stop writing to table
INSERT INTO just_7_days
SELECT * FROM the_table
WHERE ... > NOW() - INTERVAL 7 DAY;
RENAME TABLE the_table TO old,
just_7_days TO the_table;
-- allow writing
DROP TABLE old;
Assuming that much less than 3.2TB exists in the last 7 days, the downtime is only during the INSERT..SELECT. The RENAME is fast.
If you keep only 7 days' worth of data, then keep only 10 partitions in the table, as discussed in my link.
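The retention side of that scheme is a single statement per day. This is a sketch only; p2024_05_30 is a hypothetical name for the oldest daily partition.

```sql
-- Drop the oldest daily partition once it is outside the 7-day window.
-- This discards every row stored in that partition.
-- p2024_05_30 is a hypothetical partition name.
ALTER TABLE mytable DROP PARTITION p2024_05_30;
```

Because the rows are removed by discarding the partition rather than by row-by-row deletes, this is far cheaper than a DELETE over the same date range.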
-
Actually, PARTITION 2024_03_19 is 60 GB, and default_partition is 3.2TB and growing. We should be able to drop 2024_03_19, but first we want to reorganize default_partition and make it empty, then follow the strategy of creating empty daily partitions ahead of time so that incoming data lands there. Then, instead of deleting 1K rows at a time, we would want to drop the reorganized 3.2TB partition in one go with a DROP PARTITION query. Would that be recommended/possible without a table lock or downtime? I read in the MySQL docs that dropping a partition can be done without a table lock and without blocking read/write queries. – tusharRawat, Jun 6, 2024 at 18:32
-
DROP PARTITION -- Dropping one partition is almost instantaneous; the OS may take a little bit of time carrying out the garbage. – Rick James, Jun 6, 2024 at 20:40
-
Also, is there a way to measure or estimate the downtime impact of the reorganize query on default_partition with 3.2TB of data, so that we can be sure the downtime won't affect our large user base during low-traffic hours? – tusharRawat, Jun 7, 2024 at 8:27
-
@tusharRawat - disk speed, size, etc. It may take days for 3.2TB. Is there really that much data in a partition since 2024_03_19? See my addition. – Rick James, Jun 7, 2024 at 14:06
-
Yes, that's the size of default_partition - 3.2TB. – tusharRawat, Jun 15, 2024 at 11:38