I've got a scenario where I need to compare two JSON files and override one if the values are different. These JSONs also include arrays (i.e. [
)
My approach is to traverse one JSON to dictionary O(n), and then traverse the second JSON against the dictionary (also O(n)??).
There by getting 2O(n) ~ O(n).
Am I day dreaming? Or is there more efficient approach for such problem (or possibly something similar)?
EDIT:
- Both files have the same structure, format and order. The only difference is key value.
- I can share the code I've now, if you feel this is appropriate forum.
-
3If you find a difference, what happens? Do you always pick the value (for example) from file A? Then why don't you simply copy A to B wholesale in the first place?Kilian Foth– Kilian Foth2017年08月10日 05:59:04 +00:00Commented Aug 10, 2017 at 5:59
-
@KilianFoth thank you for the comment. To add a bit, both files have identical structure and keys, but the values may differ.Simply_me– Simply_me2017年08月10日 07:00:00 +00:00Commented Aug 10, 2017 at 7:00
-
@KilianFoth you are right, copying A to B would be much easier, but as a learning experience I'm interested to know what's an efficient way to do so.Simply_me– Simply_me2017年08月10日 07:08:02 +00:00Commented Aug 10, 2017 at 7:08
-
What is the point of translating one JSON file to a dictionary first? Why not comparing the data of the JSON files directly?JanDotNet– JanDotNet2017年08月10日 07:32:43 +00:00Commented Aug 10, 2017 at 7:32
-
1@JanDotNet: JSON files are considered equal if parsing them gives the same result. There are trivial differences like whitespace. Strings can have different escaped characters. One can be UTF-8 and one little-endian UTF-32. But worst, dictionaries can be in arbitrary order.gnasher729– gnasher7292017年08月10日 18:51:28 +00:00Commented Aug 10, 2017 at 18:51
1 Answer 1
First of all, yes: doing two O(n) things results in another O(n) operation. Constant terms or even factors such as "two" or even "seventeen" don't count in this calculus.
Second, well, if you really need to compare every value in both structures, then you obviously can't do any better than accessing each at least once, so there's no point worrying how to get a faster solution than that.
But more generally, what the best solution is depends on what exactly the requirements are. Do you have to find out whether the two are different, or how they differ, or do you also have to write one of the two do make them equal?
Finding out whether the two are different means that you can stop once you've found the first difference. That's still an O(n) operation, but it might save considerable time nevertheless. If you have to compile a list of differences, then obviously you can't take that short cut. And if you have to write one file, then that will probably take much more time than whatever operations you can program on the contents (disk I/O is tremendously slower than memory operations), so you're probably better off simply copying one file to the other. (In fact, most file systems only allow you to write out an entire file, so you'd have to pay for the entire write anyway, even if you have a list of specific differences to process.)
-
thank you for the answer. I would like to keep it at linear time, not faster :-). Right now I'm reading one file to dicts, and working on comparing the key-Val pairs.Simply_me– Simply_me2017年08月10日 07:42:12 +00:00Commented Aug 10, 2017 at 7:42