5c92d47 to 3139b2c
e84e78d to e2f0def
@arouel thanks very much, I have fixed the code based on your suggestion.
When the paths to be parsed are known in advance, SimdJsonParserWithFixPath provides better performance and can return map and list values as compressed strings. It quickly skips paths that do not need to be parsed and avoids creating a JSON node instance for every node in the document.
Benchmark results (JMH throughput, ops/s, higher is better):
Environment: vector species Species[byte, 32, S_256_BIT]
Result "org.simdjson.ParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_Jackson":
693.528 ±(99.9%) 18.073 ops/s [Average]
(min, avg, max) = (687.806, 693.528, 699.113), stdev = 4.694
CI (99.9%): [675.455, 711.601] (assumes normal distribution)
Result "org.simdjson.ParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_SimdJson":
2258.495 ±(99.9%) 41.596 ops/s [Average]
(min, avg, max) = (2242.400, 2258.495, 2269.942), stdev = 10.802
CI (99.9%): [2216.899, 2300.091] (assumes normal distribution)
Result "org.simdjson.ParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_SimdJsonParserWithFixPath":
4075.984 ±(99.9%) 104.804 ops/s [Average]
(min, avg, max) = (4029.568, 4075.984, 4100.273), stdev = 27.217
CI (99.9%): [3971.180, 4180.789] (assumes normal distribution)
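The averages above translate into relative throughput by simple division. The snippet below is only a worked calculation over the three reported numbers; the class and constant names are made up for illustration.

```java
// Relative throughput computed from the JMH averages quoted above (ops/s, higher is better).
public class BenchmarkSpeedup {
    static final double JACKSON = 693.528;
    static final double SIMDJSON = 2258.495;
    static final double SIMDJSON_FIX_PATH = 4075.984;

    // How many times more operations per second `candidate` achieves than `baseline`.
    static double speedup(double candidate, double baseline) {
        return candidate / baseline;
    }

    public static void main(String[] args) {
        System.out.printf("FixPath vs Jackson:  %.2fx%n", speedup(SIMDJSON_FIX_PATH, JACKSON));
        System.out.printf("FixPath vs SimdJson: %.2fx%n", speedup(SIMDJSON_FIX_PATH, SIMDJSON));
        System.out.printf("SimdJson vs Jackson: %.2fx%n", speedup(SIMDJSON, JACKSON));
    }
}
```

In this run the fixed-path variant reaches roughly 1.8 times the throughput of the parseMultiValuesForFixPaths_SimdJson benchmark and about 5.9 times that of Jackson, and the three 99.9% confidence intervals do not overlap.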
e2f0def to 4bed300
piotrrzysko
commented
Oct 20, 2024
How is this different from the On-Demand parsing available in the C++ version of simdjson?
I introduced a form of on-demand parsing in #51 (see: org.simdjson.OnDemandJsonIterator). The API requires specifying a target class to which the JSON will be parsed. However, it should be relatively easy to extend this to support a DOM-like API (JsonValue, JsonIterator, etc.), which I believe is more intuitive than introducing syntax for accessing fields and then returning an array of strings with the corresponding values.
arouel
commented
Oct 21, 2024
@piotrrzysko I agree with you. A DOM-like API (JsonValue, JsonIterator, etc.) would be very helpful in use cases where only specific parts of the JSON are relevant, and only conditionally, so that mapping the whole document to an object would cause allocations you want to avoid.
Can you guide us a bit, so that we can prepare a PR?
@heykirby I just want to share some thoughts/questions:
With some minor API changes in simdjson-java, could we keep SimdJsonParserWithFixPath in another codebase, or could it live in a contrib module, since it is tailored to a very specific use case?
Wouldn't a record JsonNode be sufficient instead of using Lombok?
ecd8e0e to 204fed7
@arouel Thanks, arouel, the unused imports have been removed.
204fed7 to f6fc9e5
heykirby
commented
Nov 11, 2024
Hello piotrrzysko, I have used on-demand parsing. It is a very convenient and efficient way to deserialize JSON strings into Java classes, and it is also the approach offered by many mainstream JSON libraries.
However, this approach requires defining Java classes before any field can be parsed, which is inconvenient for users, especially for deep paths. For example, to get the field at $.a.b.c.d, we first need to define nested classes a { b { c { d } } } and only then parse the value. In addition, every time a JSON string is parsed, an instance is created for each node along the path, which may hurt performance on large-scale data.
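To make the per-level cost concrete, here is roughly what the target types for `$.a.b.c.d` look like as Java records. This is a hypothetical sketch: the record names are invented, and it does not call the simdjson-java parser; it only shows the one-object-per-level structure that a target-class parser would have to populate for every document.

```java
// Hypothetical target types (invented names): reaching $.a.b.c.d through a schema-based
// parser needs one type per nesting level, even if only the innermost value is used.
public class NestedTargetTypes {
    record Root(A a) {}
    record A(B b) {}
    record B(C c) {}
    record C(String d) {}

    // Mirrors what a target-class parser would build for {"a":{"b":{"c":{"d":"x"}}}}:
    // four objects per document just to read one string.
    static Root sample() {
        return new Root(new A(new B(new C("x"))));
    }

    public static void main(String[] args) {
        System.out.println(sample().a().b().c().d());
    }
}
```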
With SimdJsonParserWithFixPath, to get the values for multiple paths such as [$.a.c, $.a, $.a.d, $.b], we only need to provide the JSON paths; the usage is similar to Hive's user-defined function json_tuple. It can also return the compressed string value of a container object while at the same time returning the values of that container's children.
The path tree is built only once, during initialization, and the result array can be reused every time a JSON string is parsed. With large amounts of data this avoids repeatedly creating and discarding class instances, which gives a performance advantage.
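The approach described above (a path tree built once, unrequested subtrees skipped, a reusable result array, compact strings for containers) can be sketched in plain Java. This is a hypothetical, simplified illustration and not the code from this PR: it scans characters instead of using simdjson's structural index, supports only object-field paths of the form `$.a.b` (no array indices), assumes well-formed input, and does not decode escape sequences. `FixedPathExtractor` and its methods are invented names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PR's SimdJsonParserWithFixPath: extracts the values of a fixed set
// of "$.a.b"-style paths (object fields only; well-formed input; escapes are left undecoded).
public class FixedPathExtractor {
    // One path-tree node per segment; resultIndex >= 0 when a requested path ends here.
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        int resultIndex = -1;
    }

    private final Node root = new Node();
    private final String[] result; // allocated once, overwritten by every extract() call
    private String json;
    private int pos;

    public FixedPathExtractor(List<String> paths) {
        for (int i = 0; i < paths.size(); i++) { // the path tree is built only once, here
            Node node = root;
            for (String segment : paths.get(i).substring(2).split("\\."))
                node = node.children.computeIfAbsent(segment, k -> new Node());
            node.resultIndex = i;
        }
        result = new String[paths.size()];
    }

    public String[] extract(String json) {
        this.json = json;
        this.pos = 0;
        Arrays.fill(result, null); // paths absent from this document stay null
        value(root);
        return result;
    }

    private void value(Node node) {
        skipWs();
        int start = pos;
        char c = json.charAt(pos);
        if (c == '{' && !node.children.isEmpty()) {
            pos++; // walk the object, descending only into fields on a requested path
            skipWs();
            while (json.charAt(pos) != '}') {
                Node child = node.children.get(string());
                skipWs();
                pos++; // skip ':'
                if (child != null) value(child); else skip();
                skipWs();
                if (json.charAt(pos) == ',') { pos++; skipWs(); }
            }
            pos++; // skip '}'
        } else {
            skip(); // leaf or unrequested subtree: scan past it without building anything
        }
        if (node.resultIndex >= 0) { // strings without quotes, everything else compacted
            String raw = json.substring(start, pos);
            result[node.resultIndex] = c == '"' ? raw.substring(1, raw.length() - 1) : compact(raw);
        }
    }

    private void skip() {
        skipWs();
        char c = json.charAt(pos);
        if (c == '"' || c == '{' || c == '[') { // string or container: track nesting depth
            int depth = 0;
            do {
                char ch = json.charAt(pos);
                if (ch == '"') { string(); continue; }
                if (ch == '{' || ch == '[') depth++;
                else if (ch == '}' || ch == ']') depth--;
                pos++;
            } while (depth > 0);
            return;
        }
        while (pos < json.length() && ",}] \t\r\n".indexOf(json.charAt(pos)) < 0) pos++; // scalar
    }

    private String string() { // returns the raw text between the quotes, cursor ends after them
        int start = ++pos;
        while (json.charAt(pos) != '"') pos += json.charAt(pos) == '\\' ? 2 : 1;
        return json.substring(start, pos++);
    }

    private void skipWs() {
        while (pos < json.length() && Character.isWhitespace(json.charAt(pos))) pos++;
    }

    private static String compact(String raw) { // drops whitespace outside string literals
        StringBuilder sb = new StringBuilder(raw.length());
        boolean inString = false;
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            if (inString && ch == '\\') { sb.append(ch).append(raw.charAt(++i)); continue; }
            if (ch == '"') inString = !inString;
            if (inString || !Character.isWhitespace(ch)) sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FixedPathExtractor ex = new FixedPathExtractor(List.of("$.a.c", "$.a", "$.a.d", "$.b"));
        String json = "{\"a\": {\"c\": 1, \"d\": [1, 2], \"e\": \"x\"}, \"b\": \"hi\", \"z\": {\"y\": 2}}";
        System.out.println(Arrays.toString(ex.extract(json)));
    }
}
```

With the paths from the json_tuple-style example, the document in `main` fills the four slots with `1`, `{"c":1,"d":[1,2],"e":"x"}`, `[1,2]` and `hi`, while `"z"` and `"a.e"` are scanned past without being materialized. Because the same array is returned on every call, a caller that needs to keep the values must copy them before parsing the next document.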
piotrrzysko
commented
Nov 22, 2024
Hi, sorry for the delayed reply.
@heykirby
What I meant was that we can introduce on-demand parsing for a DOM-like API, which would significantly reduce the need for creating new objects. In fact, we could have a single instance of something like OnDemandJsonValue, which would be mutable and traverse a parsed JSON under the hood (likely leveraging org.simdjson.OnDemandJsonIterator).
The schema-based API you’re referring to is simply using logic that could potentially be utilized by the on-demand DOM API as well.
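A minimal sketch of the single-mutable-instance idea, assuming well-formed JSON text as input. The class name `OnDemandJsonValue` is taken from the comment above, but the methods (`reset`, `field`, `asLong`, `asString`) and the character-scanning implementation are invented for illustration and do not reflect how `org.simdjson.OnDemandJsonIterator` works.

```java
// Hypothetical sketch: one mutable value object that is re-pointed at each document and moves a
// cursor through the text. Integers and strings only; input is assumed to be well-formed JSON.
public final class OnDemandJsonValue {
    private String json;
    private int pos;

    // Re-points this same instance at a new document instead of allocating a new one.
    public OnDemandJsonValue reset(String json) {
        this.json = json;
        this.pos = 0;
        skipWs();
        return this;
    }

    // Moves from the current object to the value of `name`; earlier values are skipped unparsed.
    public OnDemandJsonValue field(String name) {
        expect('{');
        while (json.charAt(pos) != '}') {
            boolean match = keyEquals(name);
            expect(':');
            if (match) return this;
            skipValue();
            skipWs();
            if (json.charAt(pos) == ',') { pos++; skipWs(); }
        }
        throw new IllegalArgumentException("missing field: " + name);
    }

    public long asLong() {
        int start = pos;
        while (pos < json.length() && "-0123456789".indexOf(json.charAt(pos)) >= 0) pos++;
        return Long.parseLong(json, start, pos, 10);
    }

    public String asString() { // raw text between the quotes; escape sequences are not decoded
        int start = pos + 1;
        skipValue();
        return json.substring(start, pos - 1);
    }

    // Compares the key under the cursor with `name` in place, without allocating a String.
    private boolean keyEquals(String name) {
        int start = pos + 1;
        skipValue();
        int len = pos - 1 - start;
        return len == name.length() && json.regionMatches(start, name, 0, len);
    }

    private void skipValue() {
        char c = json.charAt(pos);
        if (c == '"' || c == '{' || c == '[') { // string or container: track nesting depth
            int depth = 0;
            do {
                char ch = json.charAt(pos);
                if (ch == '"') {
                    pos++; // skip the string body, stepping over escaped characters
                    while (json.charAt(pos) != '"') pos += json.charAt(pos) == '\\' ? 2 : 1;
                } else if (ch == '{' || ch == '[') {
                    depth++;
                } else if (ch == '}' || ch == ']') {
                    depth--;
                }
                pos++;
            } while (depth > 0);
            return;
        }
        while (pos < json.length() && ",}] \t\r\n".indexOf(json.charAt(pos)) < 0) pos++; // scalar
    }

    private void expect(char c) {
        skipWs();
        if (json.charAt(pos) != c) throw new IllegalStateException("expected '" + c + "' at " + pos);
        pos++;
        skipWs();
    }

    private void skipWs() {
        while (pos < json.length() && Character.isWhitespace(json.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        OnDemandJsonValue value = new OnDemandJsonValue(); // one instance for all documents
        String doc = "{\"id\": 7, \"user\": {\"name\": \"ann\", \"tags\": [\"x\", \"y\"], \"score\": -42}}";
        System.out.println(value.reset(doc).field("user").field("score").asLong());
        System.out.println(value.reset(doc).field("user").field("name").asString());
    }
}
```

Navigating only moves `pos`, and keys are compared in place with `String.regionMatches`, so no `String` is created until a leaf accessor such as `asString` is called. The cursor only moves forward, so reading another field of an already-passed object requires `reset` and a fresh walk; a real implementation would also need arrays, floating-point numbers, escape decoding and validation.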
@arouel wrote: "Can you guide us a bit, so that we can prepare a PR?"
I’d be happy to help. Perhaps I could start by creating a skeleton of the on-demand DOM API.
heykirby
commented
Nov 24, 2024
Thanks, piotrrzysko, this is a long-awaited feature.
heykirby
commented
Nov 26, 2024
@piotrrzysko I have submitted a new PR, #63. Could you give me some guidance?
issue: #59