0

I have a huge database(16GB) to which I wish to add a new column based on an existing column.

The existing column looks like this:-

MariaDB [demo]> SELECT my_mon FROM mytable;
+----------+
| my_mon |
+----------+
| Jan 2020 |
| Feb 2020 |
| Mar 2020 |
| Apr 2020 |
| May 2020 |
| Jun 2020 |
| Jul 2020 |
| Aug 2020 |
| Sep 2020 |
| Oct 2020 |
| Nov 2020 |
| Dec 2020 |
| Jan 2021 |
| Feb 2021 |
| Mar 2021 |
| Apr 2021 |
| May 2021 |
| Jun 2021 |
| Jul 2021 |
| Aug 2021 |
| Sep 2021 |
| Oct 2021 |
| Nov 2021 |
| Dec 2021 |
+----------+
24 rows in set (0.015 sec)
MariaDB [demo]> 

I wish to create a new column by doing :

MariaDB [demo]> alter table mytable add column my_wave char(8);
MariaDB [demo]> update mytable set my_wave = concat("W",ceiling(month(str_to_date(substr(my_mon,1,3),"%b"))/4)," ",substr(my_mon,5,8));

This new column looks like this:


MariaDB [demo]> select * from mytable;
+----+----------+---------+
| id | my_mon | my_wave |
+----+----------+---------+
| 1 | Jan 2020 | W1 2020 |
| 2 | Feb 2020 | W1 2020 |
| 3 | Mar 2020 | W1 2020 |
| 4 | Apr 2020 | W1 2020 |
| 5 | May 2020 | W2 2020 |
| 6 | Jun 2020 | W2 2020 |
| 7 | Jul 2020 | W2 2020 |
| 8 | Aug 2020 | W2 2020 |
| 9 | Sep 2020 | W3 2020 |
| 10 | Oct 2020 | W3 2020 |
| 11 | Nov 2020 | W3 2020 |
| 12 | Dec 2020 | W3 2020 |
| 13 | Jan 2021 | W1 2021 |
| 14 | Feb 2021 | W1 2021 |
| 15 | Mar 2021 | W1 2021 |
| 16 | Apr 2021 | W1 2021 |
| 17 | May 2021 | W2 2021 |
| 18 | Jun 2021 | W2 2021 |
| 19 | Jul 2021 | W2 2021 |
| 20 | Aug 2021 | W2 2021 |
| 21 | Sep 2021 | W3 2021 |
| 22 | Oct 2021 | W3 2021 |
| 23 | Nov 2021 | W3 2021 |
| 24 | Dec 2021 | W3 2021 |
+----+----------+---------+
24 rows in set (0.000 sec)
MariaDB [demo]> 

My query is: Will increasing the innodb_buffer_pool_size make this and similar queries run faster ?

I have a total of 8GB of RAM.

My current (default setting I think) is :

| innodb_buffer_pool_size | 134217728 |

which is 128MB.

I would like to clarify my query.

The innodb_buffer_pool_size is a CACHE. A big value for this will ensure that multiple reads especially repeated requests for the same data will be fast.

In my case I am doing an update(that is why I described my update at the beginning of this question), which means I need to read one column and write once to another column.

Will the innodb_buffer_pool_size be relevant to the update?

Does updating the table happen "chunk" by "chunk" of rows, or does it load the table into memory and then update it ?

Perhaps, I would like to see the code/algorithm for the UPDATE.

Can someone clarify if and why this setting would affect this update query?

Note: I am trying to modify the innodb_buffer_pool_size to see if it affects the update time as suggested by Ergest Basha.

My original server was LIVE DEVELOPMENT server so I am trying to run this experiment on my laptop. I created a 3GB dataset. My innodb_buffer_pool_size on this computer is also the same. I have not yet modified it.

MariaDB [demo]> show variables like "%innodb_buffer_pool_size%";
+-------------------------+-----------+
| Variable_name | Value |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
1 row in set (0.002 sec)
MariaDB [demo]> LOAD DATA INFILE './mydata.csv' INTO TABLE mytable FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS; 
Query OK, 132000000 rows affected (56 min 11.379 sec)
Records: 132000000 Deleted: 0 Skipped: 0 Warnings: 0
MariaDB [demo]> ALTER TABLE mytable ADD COLUMN my_wave CHAR(7);
Query OK, 0 rows affected (1.650 sec)
Records: 0 Duplicates: 0 Warnings: 0
MariaDB [demo]> UPDATE mytable SET my_wave=concat("W",ceiling(month(str_to_date(substr(my_mon,1,3),"%b"))/4)," ",substr(my_mon,5,8));
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [demo]> UPDATE mytable SET my_wave=concat("W",ceiling(month(str_to_date(substr(my_mon,1,3),"%b"))/4)," ",substr(my_mon,5,8));
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 36
Current database: demo
ERROR 2013 (HY000): Lost connection to MySQL server during query
$ systemctl status mysql
くろまる mariadb.service - MariaDB 10.3.31 database server
 Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
 Active: active (running) since Sun 2022年02月06日 14:56:15 IST; 12s ago
 Docs: man:mysqld(8)
 https://mariadb.com/kb/en/library/systemd/
 Process: 28797 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
 Process: 28798 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Process: 28800 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] 
 Process: 28970 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
 Process: 28972 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
 Main PID: 28848 (mysqld)
 Status: "To roll back: 1 transactions, 11705889 rows"
 Tasks: 32 (limit: 4915)
 Memory: 440.3M
 CGroup: /system.slice/mariadb.service
 └─28848 /usr/sbin/mysqld

Restarted the server but the query still does not go through.

MariaDB [demo]> UPDATE mytable SET my_wave=concat("W",ceiling(month(str_to_date(substr(my_mon,1,3),"%b"))/4)," ",substr(my_mon,5,8));
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [demo]> 

Hi again,

When I set the innodb_buffer_pool_size=5G

and try to run the UPDATE to create a new column, it froze my laptop.

I restarted my computer and now my mysql refuses to start.

$ sudo mysql
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
asked Feb 4, 2022 at 7:44
2
  • @ErgestBasha please see my edit. The links you have posted talk about innodb_buffer_pool_size in general. My query is about this variable affecting this particular query. Commented Feb 4, 2022 at 12:55
  • 1
    You could add it as a virtual column. The buffer pool also stores the MVCC history and provides a breathing room for the purging to disk of changes. In general, recommend storing a date in a date format and handle the representation changes in the application. Commented Feb 5, 2022 at 6:01

1 Answer 1

1

Possibly.

128M is an old, too-small, default value for that setting. Change to about 70% of available memory, which might be:

innodb_buffer_pool_size = 5G

Because it is a "cache", a bigger value will cut down on I/O, which is a big part of any big query on a big table.

UPDATEing a 16GB table will be quite slow, but for another reason. The system will keep an old copy of all the rows in case you crash. (That way it can recover the data back to consistent state.)

That is, the UPDATE will take a long time, regardless of the cache size.

All actions on all tables do use the buffer_pool.

UPDATE will not chunk by itself. You can do the chunking. See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . (It is written in a DELETE context, but the changes for UPDATE should be easy.)

Another approach to consider: Build a new table with the old column(s) plus the new column(s), such as

CREATE TABLE newtable (
 id ...,
 my_mon ...,
 my_wave ...,
 PRIMARY KEY(id)
) ENGINE=InnoDB
 SELECT id, my_mon, concat(...) AS my_wave;
answered Feb 5, 2022 at 16:36

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.