I have the following unstructured but valid JSON, which needs to be converted to a structured format using any C# library, e.g. Newtonsoft.Json:
{
"root_id": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_tittel": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_mottaker_adresse1": {
"Path": "InsertDocuments",
"MainContract": "CreateDocumentParameter"
},
"root_mottaker_adresse2": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_web_id_guid": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
}
I want to make it structured as below:
{
"id": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"tittel": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"mottaker": {
"adresse1": {
"Path": "InsertDocuments",
"MainContract": "CreateDocumentParameter"
},
"adresse2": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
},
"web": {
"id": {
"guid": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
}
}
}
If you look at the difference, the hierarchy is encoded in the keys and split with _ (underscore). I want to make it nested instead, i.e.
root_element -> element
root_element1_element2 -> element1 is the parent and element2 is the child.
Code
JObject obj = JObject.Parse(jsonString);
JObject finalObj = new JObject();

foreach (var item in obj)
{
    var keys = item.Key.Replace("root_", "").Split("_").Reverse();
    bool nestedKeyProcessed = false;
    JObject tempObj = new JObject();

    foreach (string key in keys)
    {
        if (keys.Count() > 1 && !nestedKeyProcessed)
        {
            tempObj = CreateJObject(key, item.Value);
            nestedKeyProcessed = true;
        }
        else
        {
            if (keys.Count() == 1)
                finalObj.Add(new JProperty(key, item.Value));
            else
                tempObj = CreateJObjectUsingJProperty(key, tempObj);
        }
    }

    if (keys.Count() > 1)
        finalObj.Merge(tempObj, new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union });
}

string json = JsonConvert.SerializeObject(finalObj);

JObject CreateJObject(string key, JToken? data)
{
    JObject obj = new JObject();
    obj.Add(key, data);
    return obj;
}

JObject CreateJObjectUsingJProperty(string key, object? data)
{
    JObject obj = new JObject(new JProperty(key, data));
    return obj;
}
Please review and let me know if it can be optimized in any way.
- How large is your typical input json? – Peter Csala, Nov 24, 2022 at 18:09
- Welcome to Code Review! What should be optimized? Speed, memory usage, time, readability? Please edit to clarify. – Sᴀᴍ Onᴇᴌᴀ ♦, Nov 24, 2022 at 19:55
- @SᴀᴍOnᴇᴌᴀ In all aspects; if we can optimise it, it would be good enough! – Pranav Bilurkar, Nov 25, 2022 at 5:35
- @PeterCsala For now in bytes and not more than 50 flattened fields. – Pranav Bilurkar, Nov 25, 2022 at 6:12
- @PeterCsala Utmost 5, not more than that. – Pranav Bilurkar, Nov 25, 2022 at 7:01
2 Answers
Let me present an alternative solution here:
static readonly JsonMergeSettings MergeSettings = new() { MergeArrayHandling = MergeArrayHandling.Union };
const char LevelSeparator = '_';

static string DeflattenJson(string json)
{
    var mappings = JObject.Parse(json).Properties().ToDictionary(prop => prop.Name, prop => prop.Value);

    var objectsWithHierarchy = (from kv in mappings
                                let entryLevels = kv.Key.Split(LevelSeparator).Skip(1).Reverse()
                                let deflattened = CreateHierarchy(new Queue<string>(entryLevels.Skip(1)),
                                                                  new JObject(new JProperty(entryLevels.First(), kv.Value)))
                                select deflattened).ToList();

    var baseObject = new JObject();
    objectsWithHierarchy.ForEach(obj => baseObject.Merge(obj, MergeSettings));
    return baseObject.ToString();
}

static JObject CreateHierarchy(Queue<string> pathLevels, JObject currentNode)
{
    if (pathLevels.Count == 0) return currentNode;

    var newNode = new JObject(new JProperty(pathLevels.Dequeue(), currentNode));
    return CreateHierarchy(pathLevels, newNode);
}
- mappings: the top-level field names must be unique, which is why we can create a Dictionary
  - the key is the field name
  - the value is an object which contains Path and MainContract
- objectsWithHierarchy: this LINQ query does the heavy lifting; it iterates through the previous Dictionary
  - entryLevels: splits the field name by underscore, then skips the root and reverses the order
    - for example, from root_mottaker_adresse2 we get adresse2, mottaker
  - deflattened: calls a recursive function to create the hierarchy from the innermost level to the outermost
    - it utilises a Queue to support a depth greater than 2
- Finally we merge the JObjects together by taking their union
  - Please note that we could also use the first element of objectsWithHierarchy as the baseObject
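To see it in action, here is a minimal usage sketch (not part of the answer itself), assuming it runs in the same class that declares DeflattenJson; reading the flattened JSON from sample.json is only an assumption, any string source works:

// Minimal usage sketch; "sample.json" is an assumed file name.
var flattenedJson = File.ReadAllText("sample.json");
string nestedJson = DeflattenJson(flattenedJson);
Console.WriteLine(nestedJson);
// For the key "root_web_id_guid" the printed JSON contains the nested structure
// "web": { "id": { "guid": { "Path": "InsertCases", "MainContract": "CreateCaseParameter" } } }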
UPDATE #1
I've put together the following benchmark, where the Original is your version and the Alternative is mine:
class Program
{
    static void Main()
    {
        BenchmarkRunner.Run<Versions>();
    }
}

[HtmlExporter]
[MemoryDiagnoser]
[SimpleJob(RunStrategy.Monitoring, targetCount: 5)]
public class Versions
{
    string json;

    [GlobalSetup]
    public void Setup()
    {
        json = File.ReadAllText("sample.json");
    }

    [Benchmark(Baseline = true)]
    public void RunOriginal() => Original(json);

    [Benchmark()]
    public void RunAlternative() => Alternative(json);

    ...
}
With the above setup I've run this on the following machine:
BenchmarkDotNet=v0.13.2, OS=macOS Catalina 10.15.7 (19H2026) [Darwin 19.6.0]
Intel Core i9-9980HK CPU 2.40GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2 [AttachedDebugger]
Job-FYODYN : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
The results are the following:
| Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| RunOriginal | 121.0 us | 200.8 us | 52.14 us | 121.77 us | 1.00 | 0.00 | 31.78 KB | 1.00 |
| RunAlternative | 111.1 us | 333.5 us | 86.61 us | 70.54 us | 0.90 | 0.36 | 36.41 KB | 1.15 |
From the above results I can see the following:
- my version's mean execution time is around 10% faster
- your version's memory consumption is around 15% lower
- Thanks! Could you also please let me know in what respect this solution is more efficient than the one I shared? – Pranav Bilurkar, Nov 25, 2022 at 10:16
- Let me put together a BenchmarkDotNet experiment to compare the two versions from the execution time and memory consumption perspectives. – Peter Csala, Nov 25, 2022 at 10:25
- @PPB I've updated my post, please check it. I would say there is no significant difference between the two versions. – Peter Csala, Nov 25, 2022 at 11:00
- A proper naming like unstructured and structured instead of obj and finalObj (or any better names) would improve readability.
- nestedKeyProcessed can be omitted if you initialise tempObj = null and replace the flag check with tempObj == null.
- Reverse() adds extra cost to the operation; it can be omitted since you can loop over the keys in reverse order yourself.
- Count() is expensive; you could reduce its cost by storing the result outside the inner loop and reusing it, or drop it entirely once Reverse() is gone and use the array's Length instead.
- CreateJObject and CreateJObjectUsingJProperty are unnecessary.
- JsonMergeSettings can be cached and reused instead of creating a new instance on each iteration.
Revision Proposal
private static readonly JsonMergeSettings _jsonMergeSettings = new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union };

public JObject RestructureJson(string jsonString)
{
    var unstructured = JObject.Parse(jsonString);
    var structured = new JObject();

    foreach (var item in unstructured)
    {
        var keys = item.Key.Split('_');

        // keys[0] == root
        if (keys.Length == 2)
        {
            structured.Add(new JProperty(keys[1], item.Value));
        }
        else if (keys.Length > 2)
        {
            JObject? tempObj = null;

            // Reverse() replacement
            for (var x = keys.Length - 1; x != 0; x--)
            {
                tempObj = new JObject(new JProperty(keys[x], tempObj ?? item.Value));
            }

            structured.Merge(tempObj, _jsonMergeSettings);
        }
    }

    return structured;
}
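For reference, a quick usage sketch of the proposal, assuming it is called from within the declaring class; reading from unstructured.json mirrors the benchmark setup below and is only an assumption here:

// Hypothetical usage of RestructureJson; the file name is an assumption.
var flattenedJson = File.ReadAllText("unstructured.json");
JObject structured = RestructureJson(flattenedJson);
Console.WriteLine(structured); // JObject.ToString() prints the nested JSON indented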
UPDATE
Here are some benchmarks using BenchmarkDotNet; they should give you a better view of how the code performs in general. Environment and available resources will affect the overall performance as well, so your mileage may vary.
Setup:
[SimpleJob]
[HtmlExporter]
[MemoryDiagnoser]
public class JsonRestructureBenchmark
{
    private static readonly JsonMergeSettings _jsonMergeSettings = new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union };
    private const char LevelSeparator = '_';

    private string json;

    [GlobalSetup]
    public void Setup()
    {
        json = File.ReadAllText("C:\\TempFolder\\unstructured.json");
    }

    [Benchmark(Baseline = true)]
    public string Original() => Original(json);

    [Benchmark()]
    public string Revised() => Revised(json);

    private string Revised(string jsonString)
    {
        var unstructured = JObject.Parse(jsonString);
        var structured = new JObject();

        foreach (var item in unstructured)
        {
            var keys = item.Key.Split('_');

            if (keys.Length == 2)
            {
                structured.Add(new JProperty(keys[1], item.Value));
            }
            else if (keys.Length > 2)
            {
                JObject? tempObj = null;

                for (var x = keys.Length - 1; x != 0; x--)
                {
                    tempObj = new JObject(new JProperty(keys[x], tempObj ?? item.Value));
                }

                structured.Merge(tempObj, _jsonMergeSettings);
            }
        }

        return structured.ToString();
    }

    private string Original(string jsonString)
    {
        JObject obj = JObject.Parse(jsonString);
        JObject finalObj = new JObject();

        foreach (var item in obj)
        {
            var keys = item.Key.Replace("root_", "").Split('_').Reverse();
            bool nestedKeyProcessed = false;
            JObject tempObj = new JObject();

            foreach (string key in keys)
            {
                if (keys.Count() > 1 && !nestedKeyProcessed)
                {
                    tempObj = CreateJObject(key, item.Value);
                    nestedKeyProcessed = true;
                }
                else
                {
                    if (keys.Count() == 1)
                        finalObj.Add(new JProperty(key, item.Value));
                    else
                        tempObj = CreateJObjectUsingJProperty(key, tempObj);
                }
            }

            if (keys.Count() > 1)
                finalObj.Merge(tempObj, new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union });
        }

        JObject CreateJObject(string key, JToken? data) => new JObject { { key, data } };
        JObject CreateJObjectUsingJProperty(string key, object? data) => new JObject(new JProperty(key, data));

        return finalObj.ToString();
    }
}
Results:
BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.819)
Intel Core i7-8565U CPU 1.80GHz (Whiskey Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2 [AttachedDebugger]
DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| Original | 21.49 μs | 0.429 μs | 0.985 μs | 1.00 | 0.00 | 7.5989 | - | 31.14 KB | 1.00 |
| Revised | 20.03 μs | 0.395 μs | 0.566 μs | 0.95 | 0.05 | 7.2632 | - | 29.73 KB | 0.95 |
As you can see from the results, the Revised version consumes less memory because we eliminated Reverse(), which saves roughly 1% to 7% of the allocations.
Looking at the Mean, you will also see some improvement there; this comes from eliminating the Count() calls in favour of Length and reusing the cached _jsonMergeSettings.
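To make the Count() versus Length point concrete, here is a small illustrative sketch that is not taken from either version:

string[] keys = "root_web_id_guid".Split('_');
var reversed = keys.Reverse();
int viaCount = reversed.Count(); // each call goes through LINQ and, depending on the runtime, may walk the sequence again
int viaLength = keys.Length;     // direct O(1) field read on the array, no LINQ involved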
- Thanks! Have you done a BenchmarkDotNet experiment to compare? – Pranav Bilurkar, Nov 30, 2022 at 6:36
- @PPB While BenchmarkDotNet is an open-source library and anyone can use it, I did it anyway as requested ;). I hope this helps. – iSR5, Nov 30, 2022 at 12:59