-
Notifications
You must be signed in to change notification settings - Fork 13.7k
rustdoc-json-types
: Intern Type
s to deduplicate and flatten
#142327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustdoc-json-types
: Intern Type
s to deduplicate and flatten
#142327
Conversation
rustbot has assigned @aDotInTheVoid.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r?
to explicitly pick a reviewer
rustdoc-json-types is a public (although nightly-only) API. If possible, consider changing src/librustdoc/json/conversions.rs
; otherwise, make sure you bump the FORMAT_VERSION
constant.
cc @CraftSpider, @aDotInTheVoid, @Enselic, @obi1kenobi
This PR modifies run-make
tests.
cc @jieyouxu
These commits modify tests/rustdoc-json
.
rustdoc-json is a public (but unstable) interface.
Please ensure that if you've changed the output:
- It's intentional.
- The
FORMAT_VERSION
insrc/librustdoc-json-types
is bumped if necessary.
This comment has been minimized.
This comment has been minimized.
Apparently rustdoc_json is the hot new optimization target: I just filed #142335. That PR doesn't touch rustdoc-json-types, however; its changes are all on the rustdoc side. There will be some conflicts between that PR and this PR (especially in conversions.rs
) but I'm pretty sure they are superficial and both PRs will be able to merge.
rustdoc-types
deBox
ification (削除ここまで)rustdoc-json-types
: Intern Types
to deduplicate and flatten (追記ここまで)
rustdoc-json-types
: Intern Types
to deduplicate and flatten (削除ここまで)rustdoc-json-types
: Intern Type
s to deduplicate and flatten (追記ここまで)
The overall approach of interning Type
s seems like a good idea. The size win from de-duplicating types is nice. That said, I don't know how painful this would make things for consumers.
I've fixed all broken tests by just hardcoding type identifiers, though something tells me this isn't really deterministic, but I have no idea about other options, so I'll really appreciate any help here.
Yeah, this PR's current approach is fragile to any change to the order types are inserted to the types
set, and seems like it'd be really painful to write and adjust tests.
If we could do something like $.types[$.index[?(@.docs="foo")].type]
is jsonpath (or $.types[MY_VAR]
), then the tests would be much more manageable. I think it makes sense to first work on improving the test infra so we can express the tests we want. Do you want to work on that?
Also, some advice: This would be alot easier to review if it was split into logically separate commits (i.e. all the tests still pass after each commit). In paticular, if the Id
->ItemId
rename was in its own commit that did just that, it'd be alot easier to review the rest of the changes in separate commits.
According to the JSONPath spec, there's no way to achieve dynamic indexing. We can try to implement some kind of extension like it's done in jsonpath_rust
or add a new directive, but why was JSONPath chosen in the first place? It seems like a very inflexible query language that needs a ton of non-standard addons to be usable, at least in the rustdoc-types
testsuit. Why not something like jq
?
Based on the comments from when it was first added (#81063), it was because we used XPath for the HTML tests, and JSONPath was a similar thing to that but for json. I'd not be opposed to moving rustdoc-json tests to jq, especially given that it seems to natively support having variables.
☔ The latest upstream changes (presumably #142358) made this pull request unmergeable. Please resolve the merge conflicts.
I can confirm that the choice of JSONPath was largely unprincipled - jq
was mentioned, I believe, but I only know of jq
as a command-line tool, and didn't want to deal with adding more external dependencies. Something that improves the tests further would definitely be nice.
Ok, I'll try to tackle the infra. @aDotInTheVoid, could you please provide up-to-date info about what needs to be done besides fixing the JSONPath constraints? I've found #94140, but it seems to be outdated in some parts. I think you can try to update it and assign me to it.
Opened #142479, with instructions on how to do the test infra change. Any questions, ask their, or the t-rustdoc stream on zulip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u32
would be better. There's no risk of overflowing; rustc uses u32
for code locations which means it can't handle more than 4 GiB of source code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And probably better as a newtype, like ItemId
, to avoid possibly mixing it up with any other kinds of index types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add this impl I think you can turn this into f.into_json(renderer)
:
impl<T, U> FromClean<Box<T>> for U
where
U: FromClean<T>,
{
fn from_clean(opt: &Box<T>, renderer: &JsonRenderer<'_>) -> U {
opt.into_json(renderer)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added in #142747.
Fun historical note: Doing this was suggested 5 years ago, on the first PR implementing rustdoc-json: #75114 (comment)!
Uh oh!
There was an error while loading. Please reload this page.
The current format of
rustdoc-types
has a problem that potentially prevents it from being parsed by the majority of JSON parsers, includingserde_json
. Generally, they're implemented via recursion meaning that they'll hit their depth limit or invoke a stack overflow if a JSON blob has too deeply nested data. This creates issues like obi1kenobi/cargo-semver-checks#108. I don't think that the problem has anything to do with parsers' limitations. Instead, I consider it as a format design flaw.rustdoc-types
implementsType
as a recursive structure by usingBox
es. I've removed them and added thetypes
field toCrate
that can be indexed akin toCrate::index
. This eliminated the recursive property ofType
makingrustdoc-types
parseable with much more deeply nested crates likediesel
.I've also added a test that demonstrates that
rustdoc-types
doesn't grow in depth anymore. Thanks to @weiznich for this snippet.The only known to me problem with these changes is that the current testing approach for
rustdoc-types
ostensibly makes everything to stop you from using indexes from the same JSON blob it's querying. I've fixed all broken tests by just hardcoding type identifiers, though something tells me this isn't really deterministic, but I have no idea about other options, so I'll really appreciate any help here.As for other improvements, due to hashing of types at the render stage, there are no more duplicated types in a resulted JSON blob, i.e. resulted JSON blobs are much smaller now. I've done a test with
diesel
by generating 2 JSON blobs withcargo rustdoc -Zunstable-options --lib --output-format=json -F extras,128-column-tables -p diesel -- --document-private-items
.rustdoc
1.87 generated 231 MB, the PR'srustdoc
generated 76 MB (3x smaller). This should drastically improve the speed of tools that userustdoc-types
.