I have the following unstructured but valid JSON, which needs to be converted into a structured (nested) format using any C# library, e.g. Newtonsoft.Json:
{
"root_id": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_tittel": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_mottaker_adresse1": {
"Path": "InsertDocuments",
"MainContract": "CreateDocumentParameter"
},
"root_mottaker_adresse2": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_web_id_guid": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
}
I want to make it structured as below:
{
"id": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"tittel": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"mottaker": {
"adresse1": {
"Path": "InsertDocuments",
"MainContract": "CreateDocumentParameter"
},
"adresse2": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
},
"web": {
"id": {
"guid": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
}
}
}
As you can see from the difference, the hierarchy is encoded in the key names and separated with _ (underscore), and I want to turn it into nested objects. That is:
root_element -> element
root_element1_element2 -> element1 is the parent and element2 is the child.
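For a single key, the splitting rule works like this; a minimal sketch for illustration only, not part of the code under review:
// Hypothetical illustration of the key-splitting rule described above.
var parts = "root_web_id_guid".Split('_'); // ["root", "web", "id", "guid"]
// Dropping the "root" prefix, "web" is the outermost parent, "id" its child,
// and "guid" is the innermost key that should hold the Path/MainContract object.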
Code
JObject obj = JObject.Parse(jsonString);
JObject finalObj = new JObject();
foreach (var item in obj)
{
var keys = item.Key.Replace("root_", "").Split("_").Reverse();
bool nestedKeyProcessed = false;
JObject tempObj = new JObject();
foreach (string key in keys)
{
if (keys.Count() > 1 && !nestedKeyProcessed)
{
tempObj = CreateJObject(key, item.Value);
nestedKeyProcessed = true;
}
else
{
if (keys.Count() == 1)
finalObj.Add(new JProperty(key, item.Value));
else
tempObj = CreateJObjectUsingJProperty(key, tempObj);
}
}
if (keys.Count() > 1)
finalObj.Merge(tempObj, new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union });
}
string json = JsonConvert.SerializeObject(finalObj);
JObject CreateJObject(string key, JToken? data)
{
JObject obj = new JObject();
obj.Add(key, data);
return obj;
}
JObject CreateJObjectUsingJProperty(string key, object? data)
{
JObject obj = new JObject(new JProperty(key, data));
return obj;
}
Please review and let me know whether it can be optimized in any way.
- How large is your typical input json? – Peter Csala, Nov 24, 2022 at 18:09
- Welcome to Code Review! What should be optimized? Speed, memory usage, time, readability? Please edit to clarify. – Sᴀᴍ Onᴇᴌᴀ ♦, Nov 24, 2022 at 19:55
- @SᴀᴍOnᴇᴌᴀ In all aspects if we can optimise it would be good enough! – Pranav Bilurkar, Nov 25, 2022 at 5:35
- @PeterCsala For now in bytes and not more than 50 flattened fields. – Pranav Bilurkar, Nov 25, 2022 at 6:12
- @PeterCsala Utmost 5, not more than that. – Pranav Bilurkar, Nov 25, 2022 at 7:01
2 Answers
Let me present an alternative solution:
static readonly JsonMergeSettings MergeSettings = new() { MergeArrayHandling = MergeArrayHandling.Union };
const char LevelSeparator = '_';
static string DeflattenJson(string json)
{
var mappings = JObject.Parse(json).Properties().ToDictionary(prop => prop.Name, prop => prop.Value);
var objectsWithHierarchy = (from kv in mappings
let entryLevels = kv.Key.Split(LevelSeparator).Skip(1).Reverse()
let deflattened = CreateHierarchy(new Queue<string>(entryLevels.Skip(1)),
new JObject(new JProperty(entryLevels.First(), kv.Value)))
select deflattened).ToList();
var baseObject = new JObject();
objectsWithHierarchy.ForEach(obj => baseObject.Merge(obj, MergeSettings));
return baseObject.ToString();
}
static JObject CreateHierarchy(Queue<string> pathLevels, JObject currentNode)
{
if (pathLevels.Count == 0) return currentNode;
var newNode = new JObject(new JProperty(pathLevels.Dequeue(), currentNode));
return CreateHierarchy(pathLevels, newNode);
}
- mappings: the top-level field names must be unique, that's why we could create a Dictionary
  - the key is the field name
  - the value is the object which contains Path and MainContract
- objectsWithHierarchy: this LINQ query does the heavy lifting
  - it iterates through the previous Dictionary
    - entryLevels: splits the field name by underscore, then skips the root and reverses the order
      - for example, from root_mottaker_adresse2 we will have adresse2, mottaker
    - deflattened: calls a recursive function to create the hierarchy from the innermost to the outermost level
      - it utilises a Queue to support a depth greater than 2
- Finally we merge the JObjects together by taking their union
  - Please note that we could also use the first element of objectsWithHierarchy as the baseObject
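For completeness, here is a minimal usage sketch of DeflattenJson (the file name and variable names are illustrative assumptions, not part of the original post):
// Hypothetical usage of the DeflattenJson method above.
string flatJson = File.ReadAllText("sample.json"); // any string holding the flattened input
string nested = DeflattenJson(flatJson);
Console.WriteLine(nested);
// For "root_web_id_guid" the recursion builds the object from the inside out:
// { "guid": {...} } -> { "id": { "guid": {...} } } -> { "web": { "id": { "guid": {...} } } }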
UPDATE #1
I've put together the following benchmark, where Original is your version and Alternative is mine:
class Program
{
static void Main()
{
BenchmarkRunner.Run<Versions>();
}
}
[HtmlExporter]
[MemoryDiagnoser]
[SimpleJob(RunStrategy.Monitoring, targetCount: 5)]
public class Versions
{
string json;
[GlobalSetup]
public void Setup()
{
json = File.ReadAllText("sample.json");
}
[Benchmark(Baseline = true)]
public void RunOriginal() => Original(json);
[Benchmark()]
public void RunAlternative() => Alternative(json);
...
}
With the above setup I've run this on the following machine:
BenchmarkDotNet=v0.13.2, OS=macOS Catalina 10.15.7 (19H2026) [Darwin 19.6.0]
Intel Core i9-9980HK CPU 2.40GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2 [AttachedDebugger]
Job-FYODYN : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
The results are the following
Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|
RunOriginal | 121.0 us | 200.8 us | 52.14 us | 121.77 us | 1.00 | 0.00 | 31.78 KB | 1.00 |
RunAlternative | 111.1 us | 333.5 us | 86.61 us | 70.54 us | 0.90 | 0.36 | 36.41 KB | 1.15 |
From the above results I can see the following:
- the mean execution time of my version is around 10% faster
- the memory consumption of your version is around 15% lower
- Thanks! Could you also please let me know in what/which parameter this solution is more efficient than the one I shared? – Pranav Bilurkar, Nov 25, 2022 at 10:16
- Let me put together a BenchmarkDotNet experiment to compare the two versions from the execution time and memory consumption perspectives. – Peter Csala, Nov 25, 2022 at 10:25
- @PPB I've updated my post, please check it. I would say there is no significant difference between the two versions. – Peter Csala, Nov 25, 2022 at 11:00
Here are a few observations:
- Use proper naming like unstructured and structured instead of obj and finalObj, or any better names that would improve readability.
- nestedKeyProcessed can be omitted if tempObj is initiated to null and the flag check is replaced with tempObj == null.
- Reverse() adds extra cost to the operation; it can be omitted, since you can loop over the keys in reverse order instead.
- Count() is expensive. You could reduce its cost by storing the result outside the inner loop and reusing the stored value, or avoid it entirely: once you omit Reverse(), you can use the Length of the array returned by Split.
- CreateJObject and CreateJObjectUsingJProperty are unnecessary.
- JsonMergeSettings can be cached and reused instead of creating a new instance on each iteration.
Revision Proposal
private static readonly JsonMergeSettings _jsonMergeSettings = new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union };
public string RestructureJson(string jsonString)
{
var unstructured = JObject.Parse(jsonString);
var structured = new JObject();
foreach (var item in unstructured)
{
var keys = item.Key.Split('_');
// keys[0] == root
if (keys.Length == 2)
{
structured.Add(new JProperty(keys[1], item.Value));
}
else if (keys.Length > 2)
{
JObject? tempObj = null;
// Reverse() replacement
for (var x = keys.Length - 1; x != 0; x--)
{
tempObj = new JObject(new JProperty(keys[x], tempObj ?? item.Value));
}
structured.Merge(tempObj, _jsonMergeSettings);
}
}
return structured.ToString();
}
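A minimal usage sketch of the proposal (the flattenedJson variable is an assumption standing in for the flattened input from the question):
// Hypothetical usage of RestructureJson above; flattenedJson is assumed to hold the input JSON.
string restructured = RestructureJson(flattenedJson);
Console.WriteLine(restructured);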
UPDATE
Here are some benchmarks using BenchmarkDotNet; they give you a better view of how the code performs in general. Environment and resources will affect the overall performance as well, so your mileage may vary.
Setup:
[SimpleJob]
[HtmlExporter]
[MemoryDiagnoser]
public class JsonRestructureBenchmark
{
private static readonly JsonMergeSettings _jsonMergeSettings = new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union };
private const char LevelSeparator = '_';
private string json;
[GlobalSetup]
public void Setup()
{
json = File.ReadAllText("C:\\TempFolder\\unstructured.json");
}
[Benchmark(Baseline = true)]
public string Original() => Original(json);
[Benchmark()]
public string Revised() => Revised(json);
private string Revised(string jsonString)
{
var unstructured = JObject.Parse(jsonString);
var structured = new JObject();
foreach (var item in unstructured)
{
var keys = item.Key.Split('_');
if (keys.Length == 2)
{
structured.Add(new JProperty(keys[1], item.Value));
}
else if (keys.Length > 2)
{
JObject? tempObj = null;
for (var x = keys.Length - 1; x != 0; x--)
{
tempObj = new JObject(new JProperty(keys[x], tempObj ?? item.Value));
}
structured.Merge(tempObj, _jsonMergeSettings);
}
}
return structured.ToString();
}
private string Original(string jsonString)
{
JObject obj = JObject.Parse(jsonString);
JObject finalObj = new JObject();
foreach (var item in obj)
{
var keys = item.Key.Replace("root_", "").Split('_').Reverse();
bool nestedKeyProcessed = false;
JObject tempObj = new JObject();
foreach (string key in keys)
{
if (keys.Count() > 1 && !nestedKeyProcessed)
{
tempObj = CreateJObject(key, item.Value);
nestedKeyProcessed = true;
}
else
{
if (keys.Count() == 1)
finalObj.Add(new JProperty(key, item.Value));
else
tempObj = CreateJObjectUsingJProperty(key, tempObj);
}
}
if (keys.Count() > 1)
finalObj.Merge(tempObj, new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union });
}
JObject CreateJObject(string key, JToken? data) => new JObject { { key, data } };
JObject CreateJObjectUsingJProperty(string key, object? data) => new JObject(new JProperty(key, data));
return finalObj.ToString();
}
}
Results:
BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.819)
Intel Core i7-8565U CPU 1.80GHz (Whiskey Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2 [AttachedDebugger]
DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|
Original | 21.49 μs | 0.429 μs | 0.985 μs | 1.00 | 0.00 | 7.5989 | - | 31.14 KB | 1.00 |
Revised | 20.03 μs | 0.395 μs | 0.566 μs | 0.95 | 0.05 | 7.2632 | - | 29.73 KB | 0.95 |
As you can see in the results, the Revised version consumes less memory because we eliminated Reverse(), which saves between 1% and 7% on memory consumption.
If you look at the Mean, you will also see some improvement there: this is because we eliminated Count() in favour of Length and used the cached _jsonMergeSettings.
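To make the Count()/Length point concrete, here is a small sketch (the literal key is illustrative, and the usual System.Linq using is assumed):
// With the deferred Reverse() enumerable, every Count() call goes through LINQ
// and, depending on the runtime, may enumerate the sequence again.
IEnumerable<string> keys = "root_mottaker_adresse2".Replace("root_", "").Split('_').Reverse();
int viaCount = keys.Count();

// Split already returns a string[], so Length is a plain field read and the
// loop can simply walk the array backwards instead of calling Reverse().
string[] parts = "root_mottaker_adresse2".Split('_');
int viaLength = parts.Length;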
- Thanks! Have you done a BenchmarkDotNet experiment to compare? – Pranav Bilurkar, Nov 30, 2022 at 6:36
- @PPB While BenchmarkDotNet is an open-source library and anyone can use it, I did it anyway as requested ;). I hope this helps. – iSR5, Nov 30, 2022 at 12:59