From the PostgreSQL docs:
jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.
I have a jsonb column that contains some data. I extract the data under a specific key (which is also JSON) and hash it using sha256, something like:
SELECT sha256(to_jsonb(rule_element) :: TEXT :: BYTEA);
Where rule_element was extracted from the original data. Given that jsonb does not maintain the order of keys and the original jsonb column may be updated in the future, I do not think it is safe to compute hashes, as they might change.
Can I compute consistent hashes for jsonb columns? (99% sure you can't, 1% hope to make my work easier.)
The hashes would be used for unique identification purposes, basically acting as a unique key for different types of JSON compositions.
1 Answer
There is no official guarantee that the implementation (and the text representation) of jsonb won't change, so I wouldn't depend on the stability of such hashes if I were dealing with data that will be used 50 years from now.
However, if the on-disk representation of jsonb changed, that would be a release that breaks upgrading with pg_upgrade, so it is not likely to happen anytime soon. The string representation could of course change, but I cannot think of a reason to change that unless the on-disk representation of jsonb changes.
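As a minimal illustration of the current behavior (not a guarantee about future versions): inputs that differ only in key order or white space produce the same jsonb text, so a hash over that text is the same too. The literal values are made up. The sketch uses convert_to() rather than the ::TEXT::BYTEA cast from the question, because that cast parses backslashes as bytea escape sequences, which as far as I know can raise an error for JSON text containing escapes such as \".

```sql
-- Two made-up inputs that differ only in key order and white space
SELECT '{"b": 1, "a": 2}'::jsonb::text
     = '{ "a":2,"b":1 }'::jsonb::text AS same_text;

-- Hash the UTF-8 bytes of the text form; sha256(bytea) needs PostgreSQL 11 or later
SELECT sha256(convert_to('{"b": 1, "a": 2}'::jsonb::text, 'UTF8'))
     = sha256(convert_to('{ "a":2,"b":1 }'::jsonb::text, 'UTF8')) AS same_hash;
```

Both comparisons should come out true. One caveat: jsonb keeps trailing fractional zeroes, so 1 and 1.0 compare equal as jsonb but have different text forms and therefore different sha256 values.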
If you want to use the hashes to enforce uniqueness, you could create a unique index on jsonb_hash_extended(jsoncol, 0), as long as you only store the hashes in a way that wouldn't break if the hash values change during a dump/restore. I would feel unsafe about using those hashes as immutable identifiers outside the database.
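A sketch of what that could look like, assuming a hypothetical table rule_elements with a jsonb column rule_element (both names are invented for illustration). The hash lives only in an expression index, so it is recomputed whenever the index is rebuilt, for example after a dump/restore:

```sql
-- Hypothetical table and column names
CREATE TABLE rule_elements (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    rule_element jsonb NOT NULL
);

-- The hash is never stored in a column, only in this expression index
CREATE UNIQUE INDEX rule_elements_hash_uniq
    ON rule_elements (jsonb_hash_extended(rule_element, 0));

INSERT INTO rule_elements (rule_element) VALUES ('{"a": 1, "b": 2}');

-- Same document with a different key order: this should fail
-- with a unique violation on rule_elements_hash_uniq
INSERT INTO rule_elements (rule_element) VALUES ('{"b": 2, "a": 1}');
```

Keep in mind that this is a 64-bit hash, so two different documents could in principle collide and be rejected as duplicates.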
But perhaps I am too worried: if the result of jsonb_hash_extended() changed in a different PostgreSQL version, then hash partitioning on jsonb would break during an upgrade. So perhaps we can never change that anyway.
-
Please also see stackoverflow.com/q/77830609/15412365. The hashes would be used for unique identification purposes, basically acting as a unique key for different types of JSON compositions. – VIAGC, Jan 17, 2024 at 7:42
-
Interesting, I did not know about jsonb_hash and jsonb_hash_extended. Why can't we just use jsonb_hash to get a unique hash value? As long as the actual JSON in jsonb doesn't change, however it may be represented, it should give a consistent unique value, right? – VIAGC, Jan 17, 2024 at 10:57
-
> As long as you only store the hashes in a way that wouldn't break if the hash values change during a dump/restore.
Why would the hash values change during a dump/restore? To get consistent hash output for logically identical JSON, I was thinking of something like sorting all the keys and then computing the hash: jsonb -> sort -> sorted and consistent -> hash. – VIAGC, Jan 17, 2024 at 11:00
-
I do not understand the complete picture and your concerns here; can you please elaborate? I really appreciate you taking the time. – VIAGC, Jan 17, 2024 at 11:02
-
I wouldn't enforce uniqueness on JSON values. I think that is a design problem. But if I had to, I would use the built-in hash functions, make sure that they are not used anywhere outside the database, and that a dump/restore will work even if the hash values change. – Laurenz Albe, Jan 17, 2024 at 11:36