We are using MySQL.
My situation
I have a large number of tables with millions of rows each. Those tables are updated every second and are used both for adding info and retrieving info. Each table can be 5GB or 10GB or even more.
I have one table in which I keep sums of information (something like a summary table of the data I need), but this is starting to get big as well.
My limitations
At the moment I cannot change the database, for various reasons (mainly lack of knowledge, time, and budget).
All the extra power we add to the server goes to other resources that are needed, so I cannot run very heavy queries.
Temporary ways I have considered for scaling
Having these things in mind I am trying to think of ways to scale with what I have:
For the tables with millions of rows, I have thought of keeping separate databases (which could make my life easier for backups / exports / changes): keep my main data in one database and all the peripherals (the huge tables) in other databases. Let's say, a different database for each different need.
For the table that I really need regularly and that is growing fast, I was thinking of splitting it into XX tables. It could be one table per user (which might be too much) or one table per XXX users.
Are these ideas totally crazy and really bad DB design?
If yes... any suggestions other than changing everything at once?
2 Answers
PARTITIONing does not save space. Each partition has 4M-7M of "free" space; that can add up. Partitioning can, however, be used for removing "old" data from the server (or moving it to another server). More discussion: http://mysql.rjweb.org/doc.php/partitionmaint
Nor is any ad hoc partitioning likely to help. That is, splitting tables up or moving them between databases (I'm talking about CREATE DATABASE, not a separate server) will not change performance or space. On the other hand, "easier for backups / exports" may be a valid reason. Please provide details so we can discuss.
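As a sketch of the kind of partition maintenance the link above describes (the table and column names here are hypothetical, not from the question), time-based RANGE partitioning lets you remove old data as a cheap metadata operation:

```sql
-- Hypothetical log table partitioned by month.
-- Note: the partitioning column must be part of every unique key,
-- hence the composite primary key.
CREATE TABLE access_log (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    logged_at DATETIME NOT NULL,
    user_id   INT UNSIGNED NOT NULL,
    PRIMARY KEY (id, logged_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(logged_at)) (
    PARTITION p202004 VALUES LESS THAN (TO_DAYS('2020-05-01')),
    PARTITION p202005 VALUES LESS THAN (TO_DAYS('2020-06-01')),
    PARTITION pfuture VALUES LESS THAN MAXVALUE
);

-- Removing all of April's data is near-instant, unlike a DELETE:
ALTER TABLE access_log DROP PARTITION p202004;
```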
"Sharding" is complex, but it is the way to split data across multiple servers. (Note: I am using "server" to mean a separate physical machine.)
A small fix is to shrink the datatypes in your tables. Are you using BIGINT (8 bytes) when MEDIUMINT (3 bytes) would suffice? Etc., etc.
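For example (the table and column names are made up for illustration), shrinking an 8-byte counter that will never exceed a few million saves 5 bytes per row, times every row and every index that contains the column:

```sql
-- MEDIUMINT UNSIGNED tops out at 16,777,215 -- check the data first:
SELECT MAX(view_count) FROM page_stats;

-- If the current max (plus expected growth) fits, shrink the column:
ALTER TABLE page_stats MODIFY view_count MEDIUMINT UNSIGNED NOT NULL;
```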
Summary tables are an excellent way to save space and speed up "reports". But perhaps they are not as small as they could be? Let's see SHOW CREATE TABLE.
"growing fast I was thinking into splitting it into XX tables" -- NO, NO, NO! The only exception is when "sharding".
How do you populate the summary table(s)? Some tips: http://mysql.rjweb.org/doc.php/summarytables I assume your summary tables are updated either in real time or nightly, and only incrementally, not by starting over?
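One common incremental pattern from that link (the table and column names below are illustrative, not from the question): fold only the new detail rows into the summary with INSERT ... ON DUPLICATE KEY UPDATE, instead of recomputing everything from scratch:

```sql
-- Summary keyed by (user_id, day); assumes a detail table `events`.
CREATE TABLE daily_summary (
    user_id      INT UNSIGNED NOT NULL,
    day          DATE NOT NULL,
    event_count  INT UNSIGNED NOT NULL,
    total_amount DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (user_id, day)
);

-- Run periodically, covering only rows added since the last run:
INSERT INTO daily_summary (user_id, day, event_count, total_amount)
SELECT user_id, DATE(created_at), COUNT(*), SUM(amount)
FROM events
WHERE created_at >= @last_run AND created_at < @this_run
GROUP BY user_id, DATE(created_at)
ON DUPLICATE KEY UPDATE
    event_count  = event_count + VALUES(event_count),
    total_amount = total_amount + VALUES(total_amount);
```

Queries over the summary then touch thousands of rows instead of millions, which is why table size stops mattering for report speed.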
You say 1 insert per second? That is a very low rate. Perhaps it is faster than that?
When the data is properly indexed and/or summarized, the size of the table does not matter (for performance). Are you concerned about disk space or speed?
I think the first step to solving your problems is acceptance that you cannot achieve something with nothing. :-)
With that tautology out of the way, it isn't at all obvious what difference it would make if you rearranged tables between different databases. Why do you think this would help?
Same for the table you are looking at splitting into one table per user - why would this help? Are you doing full table scans? Partitioning the table would probably be a cleaner approach, but it is difficult to give specific advice based on the limited information you provided. A common solution to full table scans is summary tables, which you mentioned you are already using.
- Hi @gordan-bobic :) I am trying to avoid having tables with a) lots of data and b) a big size, so that I won't need lots of extra server resources to run my queries fast. I want to buy time to focus on building my product in other areas and not redo the entire database at the moment. In some cases I do full table scans, but when that got a bit slow, I created cron jobs to sum the data I need and place it in a different table. I will check out partitioning tables - Verace also mentioned this in the comments above. – Vicky Dallas, May 9, 2020 at 21:04
- Last but not least, I was thinking of different tables just to keep things organized / it was just an idea. It could all be in the same database, but I want to be flexible with backups etc. I do not want to end up with a database of 50 or 100 GB, which would make my life harder at the moment. You already helped a lot just by asking me those questions. – Vicky Dallas, May 9, 2020 at 21:04
- You mentioned keeping data small - I know I'm stating the obvious, but keeping the data small means keeping less data. That means pruning data that is no longer useful. Partitioning can make that much cheaper: ALTER TABLE ... DROP PARTITION is instantaneous; DELETE, on the other hand, can be very slow. – Gordan Bobić, May 9, 2020 at 21:09
-
Yes, I am sorry @gordan-bobic. By saying "keep data small" I mean: if I have a table with 10M rows vs. a table with 1M rows, wouldn't the queries run faster without having to add extra resources? At the moment all the data I have is useful; I cannot remove any of it. Whatever could be removed is already out. I will read more about partitions; I didn't know about them - maybe they are a good solution for me. – Vicky Dallas, May 9, 2020 at 21:10
-
@VickyDallas when you read about partitioning, just remember that it is not a "performance" feature. – NikitaSerbskiy, May 10, 2020 at 9:06
"updated every second and are used both for adding info and retrieving info" - when retrieving info, is this an OLTP (single record by single record) scenario, or are you running reports (selecting over large ranges of data) at the same time as your updates?