i'm trying to run this simple query:
insert into diff_set_values(value)
select value
from diff_sets
group by value;
but after it is finished, there is no data in the destination table diff_set_values
. Unfortunately, there's so much data in the source table diff_sets
, that the query runs a little more than an hour. Too long for MySQL Workbench to keep the connection to the MySQL server. It says "Error Code: 2013. Lost connection to MySQL server during query" after 30 seconds, but doesn't tell me if any other error occured.
information about the tables and server
The source table diff_sets
contains about 186 million records with 4 columns (ID1 int
, ID2 int
, diffType int
, value varchar(64)
). Primary Key: ID1, ID2, diffType. The column value
also has an index idx_diff_sets_value
.
Example data:
ID1 ID2 diffType value
25151 24978 15 13ddfb5fa96ffebffbdfd3ff7dd7cabd7
66675 66670 14 13ddfb5fbb5cdefe7bffffbd359775ffe
66701 66675 18 13ddfb5fbb5cdefe7bffffbd359775ffe
25789 25750 14 13ddfb5fbf7fdeff7fabff3edbfcefadf
25789 25778 17 13ddfb5fbf7fdeff7fabff3edbfcefadf
The destination table diff_set_values
is empty and contains a column value varchar(64)
and 6 other int columns that will be filled with null. Those columns will be updated later.
I'm running MySQL Community Server 8.0.16 on my Notebook. I'm accessing it with MySQL Workbench 8.0 and I have exclusive access to the database.
What I tried
I've created a stored procedure and inserted all error handlers I could think of.
CREATE DEFINER=`root`@`localhost` PROCEDURE `fillInterDiffSets`()
BEGIN
declare continue handler for sqlexception
begin
get diagnostics condition 1 @p1 = returned_sqlstate, @p2 = message_text;
insert into sp_error(error_code, error_description, error_time)
select @p1, @p2, now();
end;
declare continue handler for sqlwarning
begin
get diagnostics condition 1 @p1 = returned_sqlstate, @p2 = message_text;
insert into sp_error(error_code, error_description, error_time)
select @p1, @p2, now();
end;
declare continue handler for not found
begin
get diagnostics condition 1 @p1 = returned_sqlstate, @p2 = message_text;
insert into sp_error(error_code, error_description, error_time)
select @p1, @p2, now();
end;
insert into diff_set_values(value)
select value
from diff_sets
group by value;
END
When I run that procedure, I can watch the size of the diff_set_values.ibd file increasing. After a bit more than an hour the increasing stops at about 5.2 GiB. When I run show processlist
, the process is still active for a few minutes.
When the process disappears from the show processlist
, there are no records in the diff_set_values
table and yet no error messages in the sp_errors
table.
I tried, if the error handlers work by removing the group by
part of the query, which immediately results in a "duplicate key" message in the sp errors
table.
I tried to use select distinct value from diff_sets
instead of select value from diff_sets group by value
. Doesn't help.
I tried select sql_big_result distinct value from diff_sets
. The only difference is that the file size of the ibd file doesn't increase and show processlist
only tells me "creating sort index", but eventually disappears from the processlist.
I limited the query by appending limit 0, 1000000
to it. That works as expected.
A few months ago, the diff_sets
table was smaller and the query worked properly.
I'm running out of ideas what to do. Is there a way to see all errors that occured on the server? Do you have any idea how to solve the problem?
1 Answer 1
This may run faster:
INSERT INTO diff_set_values(value)
select DISTINCT value
from diff_sets;
Either would run faster if diff_set_values
had INDEX(value)
. However, adding the index would take a long time.
Increase the timeout.
-
I already tried
distinct
instead ofgroup by
. I think I had some memory issues. After deleting the index in the source table, the file size shrunk by 20 GiB and I was able to run the query. Thank you for your help.Dorian– Dorian2020年07月16日 11:28:51 +00:00Commented Jul 16, 2020 at 11:28 -
@Dorian - That could indicate a different problem -- and the slowness will return after you add a bunch of data. What is the value of
innodb_buffer_pool_size
?Rick James– Rick James2020年07月16日 16:43:59 +00:00Commented Jul 16, 2020 at 16:43
idx_diff_sets_value
index. The file size of that table shrunk from about 32 GiB to 13 GiB. After that, the query ran as expected. I think it was some kind of memory issue. Thanks for your help anyway.