
What is the fastest way to compute a table checksum that returns the same value in MySQL and in Postgres when both tables hold the same data?

It could be a function, or simply a Postgres equivalent of MySQL's CHECKSUM TABLE.

I am looking for an additional check to verify consistency after migrating from MySQL to Postgres.

Erwin Brandstetter
asked Nov 29, 2024 at 19:19

1 Answer


I'm moving house at the moment, so I can't take the time to give a proper answer, but what I would do is use a MySQL FDW (*) (Foreign Data Wrapper) and compare hashes of the rows on both sides - MD5 in this case:

(*) EnterpriseDB is the biggest commercial company in the PostgreSQL world, and this wrapper from them is F/LOSS.

There are three answers below: one complex and two simpler ones. I was messing around with it and my curiosity got the better of me, but I learnt a lot, so +1! :-)

The answers here and here (both from PostgreSQL heavy hitters) led me to the simpler approaches - see below.

Complex answer:

Sample data and MD5 hash:

See PostgreSQL fiddle here.

SELECT
 x, y, z, z::INT AS z_int,
 MD5(z::INT::CHAR),
 MD5(x::TEXT) || MD5(y) || MD5(z::INT::TEXT) AS md5_z
FROM
 pg_test;

Result:

x y z z_int md5 md5_z
34343 table_rec_1 t 1 c4ca4238a0b923820dcc509a6f75849b aef5a7530aaa272bbaec34e49251b25a7081960ce69ce1909d11a4602c3d1b64c4ca4238a0b923820dcc509a6f75849b
5445 table_rec_2 f 0 cfcd208495d565ef66e7dff9f98764da bdc363788b2b48c031bf406cf15aa25234b63ad71c9faa41d18d4b7ec3f41f24cfcd208495d565ef66e7dff9f98764da

With the FDW, you would run this code from your PostgreSQL server, but I can't install one on db<>fiddle.

MySQL (fiddle):

SELECT
 x, y, z, CAST(z AS SIGNED INTEGER) AS z_int,
 CONCAT
 (
 CAST(MD5(x) AS CHAR), CAST(MD5(y) AS CHAR), 
 MD5(CASE 
 WHEN z <> 0 THEN CAST(1 AS SIGNED INTEGER)
 ELSE CAST(0 AS SIGNED INTEGER)
 END) 
 ) AS md5_z
FROM
 mysql_test;

Result:

x y z z_int md5_z
34343 table_rec_1 1 1 aef5a7530aaa272bbaec34e49251b25a7081960ce69ce1909d11a4602c3d1b64c4ca4238a0b923820dcc509a6f75849b
5445 table_rec_2 0 0 bdc363788b2b48c031bf406cf15aa25234b63ad71c9faa41d18d4b7ec3f41f24cfcd208495d565ef66e7dff9f98764da
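
To sanity-check the hashes outside of either database, the same per-row string can be rebuilt in a few lines of Python. This is only a sketch under assumptions: the `md5_hex` and `row_md5` helpers and the hard-coded rows are my own stand-ins (they are not part of either fiddle), and it assumes both servers hash the UTF-8 text of each value, with the boolean normalised to 1/0 as in the two queries above.

```python
import hashlib

def md5_hex(value):
    """MD5 of the text form of a value, as a lowercase hex string."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()

def row_md5(x, y, z):
    """Concatenate per-column MD5s; z is normalised to 1/0 first.

    Hypothetical helper mirroring MD5(x) || MD5(y) || MD5(z as 0/1).
    """
    z_int = 1 if z else 0
    return md5_hex(x) + md5_hex(y) + md5_hex(z_int)

# Stand-in rows: PostgreSQL would hand back True/False, MySQL 1/0.
pg_rows = [(34343, "table_rec_1", True), (5445, "table_rec_2", False)]
mysql_rows = [(34343, "table_rec_1", 1), (5445, "table_rec_2", 0)]

pg_hashes = [row_md5(*r) for r in pg_rows]
mysql_hashes = [row_md5(*r) for r in mysql_rows]

# Both sides agree once the boolean is normalised.
print(pg_hashes == mysql_hashes)
```

The last 32 characters of each string are the MD5 of '1' or '0', which is why the results above end in c4ca4238... and cfcd2084... respectively.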

You may even want to use the LEFT function (PostgreSQL manual) to shorten the concatenated MD5 strings, bearing in mind that a shorter hash means a higher chance of collisions. MySQL's syntax is identical, except that it doesn't accept a negative second parameter, which isn't needed here anyway!

Simple answer 1:

Still involves an FDW.

From the fiddle by one of the heavy hitters referred to earlier, we have this gem (be sure to read the whole post and also the accepted answer for a different approach, which might be useful under different circumstances):

You can circumvent the entire process above by using EXCEPT.

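(TABLE pg_md5_check EXCEPT TABLE mysql_md5_check)
UNION ALL
(TABLE mysql_md5_check EXCEPT TABLE pg_md5_check);

Result:

pg_md5
(0 rows)

An empty result means that neither table contains a row that is missing from the other. (Note that there is no leading SELECT: wrapping the first branch in SELECT (...) would turn it into a scalar subquery, which returns a single NULL row when the tables match and raises an error as soon as more than one row differs.)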

Or, more elegantly:

SELECT 
 CASE 
 WHEN EXISTS (TABLE pg_md5_check EXCEPT TABLE mysql_md5_check)
 OR EXISTS (TABLE mysql_md5_check EXCEPT TABLE pg_md5_check)
 THEN 'different'
 ELSE 'same'
 END AS result;

Result:

result
same

TABLE my_table in this context is just shorthand for SELECT * FROM my_table; - it looks non-standard and is confusing (at least it was when I first encountered it!), but as far as I can tell it is part of the SQL standard (the "explicit table" form), and it produces nice tidy queries at times.
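
One thing to keep in mind with EXCEPT is that it removes duplicates, so a row that occurs twice on one side and once on the other is not reported; EXCEPT ALL compares the counts as well. Here is a small Python sketch of the difference, using made-up single-column rows as stand-ins for the two check tables (the function names are mine, purely for illustration):

```python
from collections import Counter

def symmetric_difference(a, b):
    """Rows present on only one side, ignoring duplicates (like EXCEPT)."""
    return (set(a) - set(b)) | (set(b) - set(a))

def multiset_difference(a, b):
    """Rows whose counts differ, respecting duplicates (like EXCEPT ALL)."""
    ca, cb = Counter(a), Counter(b)
    return (ca - cb) + (cb - ca)

# Made-up data: 'h2' appears twice on the PostgreSQL side only.
pg = [("h1",), ("h2",), ("h2",)]
my = [("h1",), ("h2",)]

print(symmetric_difference(pg, my))  # empty set: the tables look the same
print(multiset_difference(pg, my))   # one surplus ('h2',) row is reported
```

If the tables have a primary key (and the key is part of the hashed row), duplicates cannot occur and plain EXCEPT is enough.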

Simple answer 2:

Still involves an FDW.

From here (another PostgreSQL guru) we have the undocumented HASH_RECORD_EXTENDED function (CAUTION - C code...) - see fiddle here:

hash_record_extended(record, bigint)

So, we do the following (adapted from fiddle above):

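SELECT
 CASE 
 -- IS DISTINCT FROM treats NULL as a comparable value, so an empty table
 -- (whose SUM is NULL) is not reported as 'same' against a non-empty one
 WHEN (SELECT SUM(HASH_RECORD_EXTENDED(t, 0)) FROM pg_md5_check t)
 IS DISTINCT FROM
 (SELECT SUM(HASH_RECORD_EXTENDED(t, 0)) FROM mysql_md5_check t)
 THEN 'different'
 ELSE 
 'same'
 END AS is_different;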

Result:

is_different
same
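
The idea behind summing per-row hashes is that the total does not depend on row order, so no ORDER BY is needed on either side. A rough Python illustration follows; note that `row_hash` is a hypothetical stand-in built on SHA-256 and does not reproduce the values that hash_record_extended returns, and the sample rows are invented.

```python
import hashlib

def row_hash(row):
    """Signed 64-bit hash of a row; a stand-in for hash_record_extended."""
    digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def table_checksum(rows):
    """Sum of per-row hashes; None for an empty table, like SQL SUM()."""
    rows = list(rows)
    return sum(row_hash(r) for r in rows) if rows else None

a = [(1, "x"), (2, "y"), (3, "z")]
b = [(3, "z"), (1, "x"), (2, "y")]  # same rows, different order
c = [(1, "x"), (2, "y"), (3, "Z")]  # one value changed

print(table_checksum(a) == table_checksum(b))  # order does not matter
print(table_checksum(a) == table_checksum(c))  # a changed value shows up
```

Since hash_record_extended only exists in PostgreSQL, both sums have to be computed there, which is why this approach still needs the FDW. As with any checksum, equal sums make a difference very unlikely rather than impossible.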

Be careful of NULLs - judicious use of the COALESCE (PostgreSQL manual) function might be called for - e.g.

SELECT field_1, field_2, COALESCE(possibly_NULL_field, 0), field...
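
The reason NULLs need care is that MD5(NULL) is NULL, and concatenating NULL (with || in PostgreSQL or CONCAT in MySQL) makes the whole row hash NULL. The replacement value matters too: substituting 0 makes a NULL indistinguishable from a genuine 0. The Python sketch below illustrates this with invented rows; `NULL_MARKER` is a hypothetical sentinel that I am assuming never occurs in the real data.

```python
import hashlib

# Hypothetical sentinel, assumed never to occur in real column values.
NULL_MARKER = "\x00NULL\x00"

def md5_or_marker(value):
    """MD5 of the value's text, or of the sentinel when the value is None."""
    text = NULL_MARKER if value is None else str(value)
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def row_md5(row):
    """Concatenated per-column MD5s for one row."""
    return "".join(md5_or_marker(v) for v in row)

with_null = (7, None)
with_zero = (7, 0)

# Substituting 0 for NULL, as COALESCE(col, 0) would, hides the difference...
coalesced = tuple(0 if v is None else v for v in with_null)
print(row_md5(coalesced) == row_md5(with_zero))  # True

# ...whereas a dedicated sentinel keeps the two rows apart.
print(row_md5(with_null) == row_md5(with_zero))  # False
```

In SQL the equivalent would be a COALESCE to a value that cannot appear in the column, applied identically on both servers.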

I had to jump through some hoops to get the answer to work for the MySQL BOOLEAN type. From here, I found out that the Boolean variable had to be CAST as a SIGNED INTEGER and not just INTEGER. Much weeping and gnashing of teeth (not to mention swearing!) ensued before I found this MySQL idiosyncrasy, one of many that I have stumbled across down the years. Personal opinion: go with PostgreSQL whenever possible!

As noted above, I couldn't install an FDW on the fiddle, so the cross-server part of this is untested and you may find other issues, but I hope this answer sets you on the road to solving your problem! No performance testing was possible as of the time of writing (2024-12-01 10:15).

There are also interesting solutions discussed here, here, here and here. I hope to take a further look when I have time (moving house, as mentioned above).

answered Dec 1, 2024 at 10:21
