-
Notifications
You must be signed in to change notification settings - Fork 7.9k
uri: Remove php_uri_parse()
and php_uri_free()
#19624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9ea0b89
to
80e51a7
Compare
We can try parsing before allocating the `uri_internal_t` struct.
This API is both less powerful as well as less efficient compared to just calling the `->parse_uri()` handler of the parser directly. With regard to efficiency it needlessly allocates 32 byte of memory to store two pointers, of which one is already known, because it's the input. While it would be possible to just return the resulting `uri_internal_t` struct (instead of a pointer to a freshly allocated one), which would be returned as a register pair, users of the API can also just create the struct themselves for even more flexibility in allocations. The API is also less powerful, because it does not support base URIs.
80e51a7
to
8b70aa8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed these design details a while ago here: #18672 (comment) Back then, you suggested a simple API for internal usage (that's why php_uri_parse()
doesn't support the base_url
parameter), but now all the capabilities of the raw parse_uri
handler would be directly exposed for 3rd parties.
In the last sentence of my response (#18672 (comment)), I mentioned the reason why I think these API functions are important: they can hide the implementation details. E.g. if we remove the PHPAPI functions then we won't be able to change the signature (e.g. add new parameters) without causing a serious BC break for any caller of the API.
What made you change your mind in this question? What kind of future changes do you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note how my last reply said "I'll look into this more at a later time" and this is what happened here. I still believe that simplifying the handler itself would be the right decision.
When I worked my way through the code, I realized that the php_uri_parse()
function is not much simpler than using the parse_uri
handler directly (its main benefit is not having the errors
parameter that is inapplicable to a generic API), but the way it works it comes with quite a bit of overhead due to the dynamic allocation (as mentioned in the commit message).
What kind of future changes do you have in mind?
I'm hoping to overall reduce the amount of stuff passed by pointer, since this adds quite a bit of overhead and makes the code harder to reason about. Specifically uri_internal_t
is currently often a dynamic allocation that is passed by pointer, which results in a 8 byte pointer being passed to a function just to look up a 16 byte struct containing two pointers, one of which is already known in many cases.
However I see that this PR by itself might not be particularly convincing. I can offer to prepare a little larger PR for you to see the "full picture" that I can then split into its individual parts, how does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note how my last reply said "I'll look into this more at a later time" and this is what happened here. I still believe that simplifying the handler itself would be the right decision.
Yeah, when rereading it, I was sure that the time had come now ^^
I'm hoping to overall reduce the amount of stuff passed by pointer, since this adds quite a bit of overhead and makes the code harder to reason about. Specifically uri_internal_t is currently often a dynamic allocation that is passed by pointer, which results in a 8 byte pointer being passed to a function just to look up a 16 byte struct containing two pointers, one of which is already known in many cases.
Thanks for the efforts!
However I see that this PR by itself might not be particularly convincing. I can offer to prepare a little larger PR for you to see the "full picture" that I can then split into its individual parts, how does that sound?
Yes, that sounds much better! I'm happy about performance/code quality improvements as long as the code continues to offer its users some kind of protection against future BC breaks. Basically, that's my main - and probably only - cornerstone :)
This API is both less powerful as well as less efficient compared to just calling the
->parse_uri()
handler of the parser directly.With regard to efficiency it needlessly allocates 32 byte of memory to store two pointers, of which one is already known, because it's the input. While it would be possible to just return the resulting
uri_internal_t
struct (instead of a pointer to a freshly allocated one), which would be returned as a register pair, users of the API can also just create the struct themselves for even more flexibility in allocations.The API is also less powerful, because it does not support base URIs.