What would be the fastest equivalent of CHECKSUM TABLE that returns the same value in MySQL and in Postgres when the tables contain the same dataset?
It could be a function, or just an equivalent of MySQL's checksum in Postgres.
I'm looking for a way to verify consistency with an additional check after migrating from MySQL to Postgres.
1 Answer
Moving house at the moment so can't actually take time to give a proper answer - but what I would do is use a MySQL FDW (*) (Foreign Data Wrapper) to do something like this using hashes - MD5 in this case:
(*) EnterpriseDB is the biggest commercial company in the PostgreSQL world and this wrapper from them is F/LOSS
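Setting up the wrapper might look roughly like the sketch below, run on the PostgreSQL side. The option names (host, port, username, password, dbname, table_name) are the ones mysql_fdw uses; the server name, host, credentials, database name and the foreign table name mysql_test_remote are all placeholders invented for illustration, and z is declared as SMALLINT on the assumption that the MySQL BOOLEAN is stored as TINYINT(1).

```sql
-- Run on the PostgreSQL server; requires the mysql_fdw extension to be installed.
CREATE EXTENSION IF NOT EXISTS mysql_fdw;

-- Placeholder connection details: replace with your own MySQL host and port.
CREATE SERVER mysql_server
    FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (host '127.0.0.1', port '3306');

-- Placeholder credentials for the MySQL account used by the wrapper.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER mysql_server
    OPTIONS (username 'mysql_user', password 'mysql_password');

-- Hypothetical foreign table pointing at the MySQL table mysql_test.
CREATE FOREIGN TABLE mysql_test_remote
(
    x INTEGER,
    y TEXT,
    z SMALLINT   -- assumption: MySQL BOOLEAN arrives as 0/1
)
SERVER mysql_server
OPTIONS (dbname 'source_db', table_name 'mysql_test');

-- The MySQL rows can now be hashed with PostgreSQL's own MD5().
SELECT
    x,
    MD5(x::TEXT) || MD5(y) || MD5(z::TEXT) AS md5_z
FROM
    mysql_test_remote;
```

Because both sides are then hashed by the same PostgreSQL function, differences in how the two servers render values as text matter less than when each server computes its own MD5.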
There are three answers below - one complex and two simpler ones - I was messing around with it and my curiosity got the better of me - but I learnt a lot, so +1! :-)
The answers here and here (both from PostgreSQL heavy hitters) led me to the simpler approaches - see below.
Complex answer:
Sample data and MD5 hash:
See PostgreSQL fiddle here.
SELECT
    x, y, z, z::INT AS z_int,
    MD5(z::INT::CHAR),
    MD5(x::TEXT) || MD5(y) || MD5(z::INT::TEXT) AS md5_z
FROM
    pg_test;
Result:
| x | y | z | z_int | md5 | md5_z |
|---|---|---|---|---|---|
| 34343 | table_rec_1 | t | 1 | c4ca4238a0b923820dcc509a6f75849b | aef5a7530aaa272bbaec34e49251b25a7081960ce69ce1909d11a4602c3d1b64c4ca4238a0b923820dcc509a6f75849b |
| 5445 | table_rec_2 | f | 0 | cfcd208495d565ef66e7dff9f98764da | bdc363788b2b48c031bf406cf15aa25234b63ad71c9faa41d18d4b7ec3f41f24cfcd208495d565ef66e7dff9f98764da |
With the FDW, you would run this code from your PostgreSQL server, but I can't install one on db<>fiddle.
MySQL (fiddle):
SELECT
    x, y, z, CAST(z AS SIGNED INTEGER) AS z_int,
    CONCAT
    (
        CAST(MD5(x) AS CHAR), CAST(MD5(y) AS CHAR),
        MD5(CASE
                WHEN z <> 0 THEN CAST(1 AS SIGNED INTEGER)
                ELSE CAST(0 AS SIGNED INTEGER)
            END)
    ) AS md5_z
FROM
    mysql_test;
Result:
| x | y | z | z_int | md5_z |
|---|---|---|---|---|
| 34343 | table_rec_1 | 1 | 1 | aef5a7530aaa272bbaec34e49251b25a7081960ce69ce1909d11a4602c3d1b64c4ca4238a0b923820dcc509a6f75849b |
| 5445 | table_rec_2 | 0 | 0 | bdc363788b2b48c031bf406cf15aa25234b63ad71c9faa41d18d4b7ec3f41f24cfcd208495d565ef66e7dff9f98764da |
You may even want to use the LEFT function (PostgreSQL manual) to reduce the size of the concatenated MD5 strings, bearing in mind that shorter hashes collide more easily. MySQL's syntax is identical, except that it doesn't accept a negative second parameter, which doesn't apply in this case!
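As a rough sketch of that idea against the pg_test table above: one thing to watch is that taking LEFT of the whole concatenated string would keep only the start of the first column's hash, so this version shortens each per-column hash before concatenating. The length of 8 and the alias md5_short are arbitrary choices for illustration.

```sql
-- PostgreSQL: keep the first 8 hex characters of each column's MD5.
-- Shorter hashes mean a higher chance of two different values colliding.
SELECT
    x,
    LEFT(MD5(x::TEXT), 8)
    || LEFT(MD5(y), 8)
    || LEFT(MD5(z::INT::TEXT), 8) AS md5_short
FROM
    pg_test;
```

On the MySQL side the same shape should work with CONCAT(LEFT(MD5(x), 8), ...) in place of the || operator.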
Simple answer 1:
Still involves an FDW.
From the fiddle by one of the heavy hitters referred to earlier, we have this gem (be sure to read the whole post and also the accepted answer for a different approach - which might be useful under different circumstances):
You can circumvent the entire process above by using EXCEPT.
SELECT
    (TABLE pg_md5_check EXCEPT TABLE mysql_md5_check)
UNION ALL
    (TABLE mysql_md5_check EXCEPT TABLE pg_md5_check);
Result:
pg_md5
null
Or, more elegantly:
SELECT
    CASE
        WHEN EXISTS (TABLE pg_md5_check EXCEPT TABLE mysql_md5_check)
          OR EXISTS (TABLE mysql_md5_check EXCEPT TABLE pg_md5_check)
        THEN 'different'
        ELSE 'same'
    END AS result;
Result:
result
same
TABLE my_table in this context is just shorthand for SELECT * FROM my_table; - it is actually part of the SQL standard, though rarely seen and confusing (at least when I first encountered it!), but it produces nice tidy queries betimes.
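If you want to see which rows differ rather than just whether any do, a variation along the following lines may help. It is only a sketch: the side labels and the alias d are made up for illustration, it assumes both tables have the same column list and types, and it uses EXCEPT ALL because plain EXCEPT collapses duplicate rows, so two tables that differ only in how many times a row appears would otherwise compare as equal.

```sql
-- Rows present in one table but not the other, labelled by side.
-- SELECT * FROM t is written out here; TABLE t is equivalent.
(
    SELECT 'only_in_pg' AS side, d.*
    FROM (
        SELECT * FROM pg_md5_check
        EXCEPT ALL
        SELECT * FROM mysql_md5_check
    ) AS d
)
UNION ALL
(
    SELECT 'only_in_mysql' AS side, d.*
    FROM (
        SELECT * FROM mysql_md5_check
        EXCEPT ALL
        SELECT * FROM pg_md5_check
    ) AS d
);
```

An empty result means the two tables hold the same rows, duplicates included.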
Simple answer 2:
Still involves FDW.
From here (another PostgreSQL guru) we have the undocumented HASH_RECORD_EXTENDED function (CAUTION - C code...) - see fiddle here. Its signature is:
hash_record_extended(record, bigint)
So, we do the following (adapted from fiddle above):
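SELECT
    CASE
        -- IS DISTINCT FROM treats NULL as a comparable value: SUM() over an
        -- empty table is NULL, and a plain != would then fall through to 'same'
        WHEN (SELECT SUM(HASH_RECORD_EXTENDED(t, 0)) FROM pg_md5_check t)
             IS DISTINCT FROM
             (SELECT SUM(HASH_RECORD_EXTENDED(t, 0)) FROM mysql_md5_check t)
        THEN 'different'
        ELSE 'same'
    END AS is_different;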
Result:
is_different
same
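Be careful of NULLs - in PostgreSQL, concatenating with || returns NULL if any operand is NULL, and MySQL's CONCAT behaves the same way, so a single NULL column wipes out the hash for the whole row. Judicious use of the COALESCE function (PostgreSQL manual) might be called for - e.g.
SELECT field_1, field_2, COALESCE(possibly_NULL_field, 0), field...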
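Applied to the MD5 approach from the complex answer, that could look something like this sketch for pg_test. The '<NULL>' sentinel and the alias md5_row are assumptions made for illustration; the sentinel has to be a string that can never occur as real data, and the MySQL side would need to substitute exactly the same value.

```sql
-- PostgreSQL: replace NULLs with a sentinel before hashing, so that
-- one NULL column does not turn the whole concatenated hash into NULL.
SELECT
    x,
    MD5(COALESCE(x::TEXT, '<NULL>'))
    || MD5(COALESCE(y, '<NULL>'))
    || MD5(COALESCE(z::INT::TEXT, '<NULL>')) AS md5_row
FROM
    pg_test;
```

The trade-off is that a genuine NULL and a literal '<NULL>' string hash identically, which is why the sentinel choice matters.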
I had to jump through some hoops to get the answer to work for the MySQL BOOLEAN type - from here, I found out that the Boolean value had to be CAST as a SIGNED INTEGER and not just INTEGER. Much weeping and gnashing of teeth (not to mention swearing!) ensued before I found this MySQL idiosyncrasy (one of many that I have stumbled across down the years... personal opinion - go with PostgreSQL whenever possible!).
As mentioned above, this is untested, so you may find other issues, but I hope this answer sets you on the road to solving your problem! No performance testing was possible as of the time of writing (2024-12-01 10:15).
There are also interesting solutions discussed here, here, here and here. I hope to take a further look when I have time (moving house, as mentioned above).