I have the following unstructured but valid JSON, which needs to be converted into a structured (nested) format using any C# library, e.g. Newtonsoft.Json:
{
"root_id": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_tittel": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_mottaker_adresse1": {
"Path": "InsertDocuments",
"MainContract": "CreateDocumentParameter"
},
"root_mottaker_adresse2": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"root_web_id_guid": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
}
I want to make it structured as below:
{
"id": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"tittel": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
},
"mottaker": {
"adresse1": {
"Path": "InsertDocuments",
"MainContract": "CreateDocumentParameter"
},
"adresse2": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
},
"web": {
"id": {
"guid": {
"Path": "InsertCases",
"MainContract": "CreateCaseParameter"
}
}
}
}
As you can see from the difference, the hierarchy is encoded in the key names and separated with _ (underscore), and I want to turn it into nested objects. That is:
root_element -> element
root_element1_element2 -> element1 is the parent and element2 is the child.
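For a single key, the splitting rule works like this; a minimal sketch for illustration only, not part of the code under review:
// Hypothetical illustration of the key-splitting rule described above.
var parts = "root_web_id_guid".Split('_'); // ["root", "web", "id", "guid"]
// Dropping the "root" prefix, "web" is the outermost parent, "id" its child,
// and "guid" is the innermost key that should hold the Path/MainContract object.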
Code
JObject obj = JObject.Parse(jsonString);
JObject finalObj = new JObject();
foreach (var item in obj)
{
var keys = item.Key.Replace("root_", "").Split("_").Reverse();
bool nestedKeyProcessed = false;
JObject tempObj = new JObject();
foreach (string key in keys)
{
if (keys.Count() > 1 && !nestedKeyProcessed)
{
tempObj = CreateJObject(key, item.Value);
nestedKeyProcessed = true;
}
else
{
if (keys.Count() == 1)
finalObj.Add(new JProperty(key, item.Value));
else
tempObj = CreateJObjectUsingJProperty(key, tempObj);
}
}
if (keys.Count() > 1)
finalObj.Merge(tempObj, new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union });
}
string json = JsonConvert.SerializeObject(finalObj);
JObject CreateJObject(string key, JToken? data)
{
JObject obj = new JObject();
obj.Add(key, data);
return obj;
}
JObject CreateJObjectUsingJProperty(string key, object? data)
{
JObject obj = new JObject(new JProperty(key, data));
return obj;
}
Please review and let me know whether it can be optimized in any way.
- How large is your typical input json? – Peter Csala, Nov 24, 2022 at 18:09
- Welcome to Code Review! What should be optimized? Speed, memory usage, time, readability? Please edit to clarify. – Sᴀᴍ Onᴇᴌᴀ ♦, Nov 24, 2022 at 19:55
- @SᴀᴍOnᴇᴌᴀ In all aspects if we can optimise it would be good enough! – Pranav Bilurkar, Nov 25, 2022 at 5:35
- @PeterCsala For now in bytes and not more than 50 flattened fields. – Pranav Bilurkar, Nov 25, 2022 at 6:12
- @PeterCsala Utmost 5, not more than that. – Pranav Bilurkar, Nov 25, 2022 at 7:01
2 Answers
Let me present an alternative solution:
static readonly JsonMergeSettings MergeSettings = new() { MergeArrayHandling = MergeArrayHandling.Union };
const char LevelSeparator = '_';
static string DeflattenJson(string json)
{
var mappings = JObject.Parse(json).Properties().ToDictionary(prop => prop.Name, prop => prop.Value);
var objectsWithHierarchy = (from kv in mappings
let entryLevels = kv.Key.Split(LevelSeparator).Skip(1).Reverse()
let deflattened = CreateHierarchy(new Queue<string>(entryLevels.Skip(1)),
new JObject(new JProperty(entryLevels.First(), kv.Value)))
select deflattened).ToList();
var baseObject = new JObject();
objectsWithHierarchy.ForEach(obj => baseObject.Merge(obj, MergeSettings));
return baseObject.ToString();
}
static JObject CreateHierarchy(Queue<string> pathLevels, JObject currentNode)
{
if (pathLevels.Count == 0) return currentNode;
var newNode = new JObject(new JProperty(pathLevels.Dequeue(), currentNode));
return CreateHierarchy(pathLevels, newNode);
}
- mappings: the top-level field names must be unique, that's why we could create a Dictionary
  - the key is the field name
  - the value is the object which contains Path and MainContract
- objectsWithHierarchy: this LINQ query does the heavy lifting
  - it iterates through the previous Dictionary
    - entryLevels: splits the field name by underscore, then skips the root and reverses the order
      - for example, from root_mottaker_adresse2 we will have adresse2, mottaker
    - deflattened: calls a recursive function to create the hierarchy from the innermost to the outermost level
      - it utilises a Queue to support a depth greater than 2
- Finally we merge the JObjects together by taking their union
  - Please note that we could also use the first element of objectsWithHierarchy as the baseObject
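For completeness, here is a minimal usage sketch of DeflattenJson (the file name and variable names are illustrative assumptions, not part of the original post):
// Hypothetical usage of the DeflattenJson method above.
string flatJson = File.ReadAllText("sample.json"); // any string holding the flattened input
string nested = DeflattenJson(flatJson);
Console.WriteLine(nested);
// For "root_web_id_guid" the recursion builds the object from the inside out:
// { "guid": {...} } -> { "id": { "guid": {...} } } -> { "web": { "id": { "guid": {...} } } }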
UPDATE #1
I've put together the following benchmark, where Original is your version and Alternative is mine:
class Program
{
static void Main()
{
BenchmarkRunner.Run<Versions>();
}
}
[HtmlExporter]
[MemoryDiagnoser]
[SimpleJob(RunStrategy.Monitoring, targetCount: 5)]
public class Versions
{
string json;
[GlobalSetup]
public void Setup()
{
json = File.ReadAllText("sample.json");
}
[Benchmark(Baseline = true)]
public void RunOriginal() => Original(json);
[Benchmark()]
public void RunAlternative() => Alternative(json);
...
}
With the above setup I've run this on the following machine:
BenchmarkDotNet=v0.13.2, OS=macOS Catalina 10.15.7 (19H2026) [Darwin 19.6.0]
Intel Core i9-9980HK CPU 2.40GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2 [AttachedDebugger]
Job-FYODYN : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
The results are the following
Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|
RunOriginal | 121.0 us | 200.8 us | 52.14 us | 121.77 us | 1.00 | 0.00 | 31.78 KB | 1.00 |
RunAlternative | 111.1 us | 333.5 us | 86.61 us | 70.54 us | 0.90 | 0.36 | 36.41 KB | 1.15 |
From the above results I can see the following:
- the mean execution time of my version is around 10% faster
- the memory consumption of your version is around 15% lower
- Thanks! Could you also please let me know in what/which parameter this solution is more efficient than the one I shared? – Pranav Bilurkar, Nov 25, 2022 at 10:16
- Let me put together a BenchmarkDotNet experiment to compare the two versions from the execution time and memory consumption perspectives. – Peter Csala, Nov 25, 2022 at 10:25
- @PPB I've updated my post, please check it. I would say there is no significant difference between the two versions. – Peter Csala, Nov 25, 2022 at 11:00
Here are a few observations:
- Use proper naming like unstructured and structured instead of obj and finalObj, or any better names that would improve readability.
- nestedKeyProcessed can be omitted if tempObj is initiated to null and the flag check is replaced with tempObj == null.
- Reverse() adds extra cost to the operation; it can be omitted, since you can loop over the keys in reverse order instead.
- Count() is expensive. You could reduce its cost by storing the result outside the inner loop and reusing the stored value, or avoid it entirely: once you omit Reverse(), you can use the Length of the array returned by Split.
- CreateJObject and CreateJObjectUsingJProperty are unnecessary.
- JsonMergeSettings can be cached and reused instead of creating a new instance on each iteration.
Revision Proposal
private static readonly JsonMergeSettings _jsonMergeSettings = new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union };
public string RestructureJson(string jsonString)
{
var unstructured = JObject.Parse(jsonString);
var structured = new JObject();
foreach (var item in unstructured)
{
var keys = item.Key.Split('_');
// keys[0] == root
if (keys.Length == 2)
{
structured.Add(new JProperty(keys[1], item.Value));
}
else if (keys.Length > 2)
{
JObject? tempObj = null;
// Reverse() replacement
for (var x = keys.Length - 1; x != 0; x--)
{
tempObj = new JObject(new JProperty(keys[x], tempObj ?? item.Value));
}
structured.Merge(tempObj, _jsonMergeSettings);
}
}
return structured.ToString();
}
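A minimal usage sketch of the proposal (the flattenedJson variable is an assumption standing in for the flattened input from the question):
// Hypothetical usage of RestructureJson above; flattenedJson is assumed to hold the input JSON.
string restructured = RestructureJson(flattenedJson);
Console.WriteLine(restructured);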
UPDATE
Here are some benchmarks using BenchmarkDotNet; they give you a better view of how the code performs in general. Environment and resources will affect the overall performance as well, so your mileage may vary.
Setup:
[SimpleJob]
[HtmlExporter]
[MemoryDiagnoser]
public class JsonRestructureBenchmark
{
private static readonly JsonMergeSettings _jsonMergeSettings = new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union };
private const char LevelSeparator = '_';
private string json;
[GlobalSetup]
public void Setup()
{
json = File.ReadAllText("C:\\TempFolder\\unstructured.json");
}
[Benchmark(Baseline = true)]
public string Original() => Original(json);
[Benchmark()]
public string Revised() => Revised(json);
private string Revised(string jsonString)
{
var unstructured = JObject.Parse(jsonString);
var structured = new JObject();
foreach (var item in unstructured)
{
var keys = item.Key.Split('_');
if (keys.Length == 2)
{
structured.Add(new JProperty(keys[1], item.Value));
}
else if (keys.Length > 2)
{
JObject? tempObj = null;
for (var x = keys.Length - 1; x != 0; x--)
{
tempObj = new JObject(new JProperty(keys[x], tempObj ?? item.Value));
}
structured.Merge(tempObj, _jsonMergeSettings);
}
}
return structured.ToString();
}
private string Original(string jsonString)
{
JObject obj = JObject.Parse(jsonString);
JObject finalObj = new JObject();
foreach (var item in obj)
{
var keys = item.Key.Replace("root_", "").Split('_').Reverse();
bool nestedKeyProcessed = false;
JObject tempObj = new JObject();
foreach (string key in keys)
{
if (keys.Count() > 1 && !nestedKeyProcessed)
{
tempObj = CreateJObject(key, item.Value);
nestedKeyProcessed = true;
}
else
{
if (keys.Count() == 1)
finalObj.Add(new JProperty(key, item.Value));
else
tempObj = CreateJObjectUsingJProperty(key, tempObj);
}
}
if (keys.Count() > 1)
finalObj.Merge(tempObj, new JsonMergeSettings { MergeArrayHandling = MergeArrayHandling.Union });
}
JObject CreateJObject(string key, JToken? data) => new JObject { { key, data } };
JObject CreateJObjectUsingJProperty(string key, object? data) => new JObject(new JProperty(key, data));
return finalObj.ToString();
}
}
Results:
BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.819)
Intel Core i7-8565U CPU 1.80GHz (Whiskey Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2 [AttachedDebugger]
DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|
Original | 21.49 μs | 0.429 μs | 0.985 μs | 1.00 | 0.00 | 7.5989 | - | 31.14 KB | 1.00 |
Revised | 20.03 μs | 0.395 μs | 0.566 μs | 0.95 | 0.05 | 7.2632 | - | 29.73 KB | 0.95 |
As you can see in the results, the Revised version consumes less memory because we eliminated Reverse(), which saves between 1% and 7% on memory consumption.
If you look at the Mean, you will also see some improvement there: this is because we eliminated Count() in favour of Length and used the cached _jsonMergeSettings.
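To make the Count()/Length point concrete, here is a small sketch (the literal key is illustrative, and the usual System.Linq using is assumed):
// With the deferred Reverse() enumerable, every Count() call goes through LINQ
// and, depending on the runtime, may enumerate the sequence again.
IEnumerable<string> keys = "root_mottaker_adresse2".Replace("root_", "").Split('_').Reverse();
int viaCount = keys.Count();

// Split already returns a string[], so Length is a plain field read and the
// loop can simply walk the array backwards instead of calling Reverse().
string[] parts = "root_mottaker_adresse2".Split('_');
int viaLength = parts.Length;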
- Thanks! Have you done a BenchmarkDotNet experiment to compare? – Pranav Bilurkar, Nov 30, 2022 at 6:36
- @PPB While BenchmarkDotNet is an open-source library and anyone can use it, I did it anyway as requested ;). I hope this helps. – iSR5, Nov 30, 2022 at 12:59