0

i'm trying to run this simple query:

insert into diff_set_values(value)
select value 
from diff_sets
group by value;

but after it is finished, there is no data in the destination table diff_set_values. Unfortunately, there's so much data in the source table diff_sets, that the query runs a little more than an hour. Too long for MySQL Workbench to keep the connection to the MySQL server. It says "Error Code: 2013. Lost connection to MySQL server during query" after 30 seconds, but doesn't tell me if any other error occured.

information about the tables and server

The source table diff_sets contains about 186 million records with 4 columns (ID1 int, ID2 int, diffType int, value varchar(64)). Primary Key: ID1, ID2, diffType. The column value also has an index idx_diff_sets_value.

Example data:

ID1 ID2 diffType value
25151 24978 15 13ddfb5fa96ffebffbdfd3ff7dd7cabd7
66675 66670 14 13ddfb5fbb5cdefe7bffffbd359775ffe
66701 66675 18 13ddfb5fbb5cdefe7bffffbd359775ffe
25789 25750 14 13ddfb5fbf7fdeff7fabff3edbfcefadf
25789 25778 17 13ddfb5fbf7fdeff7fabff3edbfcefadf

The destination table diff_set_values is empty and contains a column value varchar(64) and 6 other int columns that will be filled with null. Those columns will be updated later.

I'm running MySQL Community Server 8.0.16 on my Notebook. I'm accessing it with MySQL Workbench 8.0 and I have exclusive access to the database.

What I tried

I've created a stored procedure and inserted all error handlers I could think of.

CREATE DEFINER=`root`@`localhost` PROCEDURE `fillInterDiffSets`()
BEGIN
 declare continue handler for sqlexception
 begin
 get diagnostics condition 1 @p1 = returned_sqlstate, @p2 = message_text;
 insert into sp_error(error_code, error_description, error_time)
 select @p1, @p2, now();
 end;
 
 declare continue handler for sqlwarning
 begin
 get diagnostics condition 1 @p1 = returned_sqlstate, @p2 = message_text;
 insert into sp_error(error_code, error_description, error_time)
 select @p1, @p2, now();
 end;
 declare continue handler for not found
 begin
 get diagnostics condition 1 @p1 = returned_sqlstate, @p2 = message_text;
 insert into sp_error(error_code, error_description, error_time)
 select @p1, @p2, now();
 end;
 
 insert into diff_set_values(value)
 select value 
 from diff_sets
 group by value;
END

When I run that procedure, I can watch the size of the diff_set_values.ibd file increasing. After a bit more than an hour the increasing stops at about 5.2 GiB. When I run show processlist, the process is still active for a few minutes.

When the process disappears from the show processlist, there are no records in the diff_set_values table and yet no error messages in the sp_errors table.

I tried, if the error handlers work by removing the group by part of the query, which immediately results in a "duplicate key" message in the sp errors table.

I tried to use select distinct value from diff_sets instead of select value from diff_sets group by value. Doesn't help.

I tried select sql_big_result distinct value from diff_sets. The only difference is that the file size of the ibd file doesn't increase and show processlist only tells me "creating sort index", but eventually disappears from the processlist.

I limited the query by appending limit 0, 1000000 to it. That works as expected.

A few months ago, the diff_sets table was smaller and the query worked properly.

I'm running out of ideas what to do. Is there a way to see all errors that occured on the server? Do you have any idea how to solve the problem?

asked Jul 15, 2020 at 10:06
5
  • 1
    One possible cause is that the tx runs out of log space. What is the value of: innodb_log_file_size ? Commented Jul 15, 2020 at 10:19
  • That's 50331648. The innodb_log_buffer_size is 16777216. Commented Jul 15, 2020 at 10:23
  • 1
    Looks like non-commited transaction. Destination table is empty initially - monitor table's file size during insert.. Commented Jul 15, 2020 at 19:47
  • 1
    Too long for MySQL Workbench to keep the connection to the MySQL server. It says "Error Code: 2013. Lost connection to MySQL server during query" after 30 seconds, but doesn't tell me if any other error occured - lost connection means that you will not receive messages occured after, both success and error ones. Try to insert using CLI and look for messages. PS. Use DISTINCT instead GROUP BY. Commented Jul 15, 2020 at 19:50
  • I managed to solve the problem. I deleted the idx_diff_sets_value index. The file size of that table shrunk from about 32 GiB to 13 GiB. After that, the query ran as expected. I think it was some kind of memory issue. Thanks for your help anyway. Commented Jul 16, 2020 at 11:26

1 Answer 1

1

This may run faster:

 INSERT INTO diff_set_values(value)
 select DISTINCT value 
 from diff_sets;

Either would run faster if diff_set_values had INDEX(value). However, adding the index would take a long time.

Increase the timeout.

answered Jul 16, 2020 at 1:20
2
  • I already tried distinct instead of group by. I think I had some memory issues. After deleting the index in the source table, the file size shrunk by 20 GiB and I was able to run the query. Thank you for your help. Commented Jul 16, 2020 at 11:28
  • @Dorian - That could indicate a different problem -- and the slowness will return after you add a bunch of data. What is the value of innodb_buffer_pool_size? Commented Jul 16, 2020 at 16:43

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.