Do not copy the normalized URI when cloning RFC 3986 URIs - alternative approach #19744

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Closed

kocsismate wants to merge 1 commit into php:master from kocsismate:ext-uri10

Closed

Do not copy the normalized URI when cloning RFC 3986 URIs - alternative approach #19744

kocsismate wants to merge 1 commit into php:master from kocsismate:ext-uri10

+15 −8

Conversation

kocsismate

Copy link

Member

@kocsismate kocsismate commented Sep 6, 2025

Altenative to #19588

@kocsismate


 Do not copy the normalized URI when cloning RFC 3986 URIs - alternati...

8b5ead3

...ve approach

@kocsismate kocsismate requested review from TimWolla and nielsdos

September 6, 2025 21:01

@github-actions github-actions bot added ABI break Extension: uri labels

Sep 6, 2025

kocsismate

kocsismate commented

Sep 6, 2025

View reviewed changes

ext/uri/php_uri.c

ZEND_USER_CODE(execute_data->func->type) &&

execute_data->opline != NULL &&

execute_data->opline->opcode == ZEND_CLONE;

void *uri = internal_uri->parser->clone_uri(internal_uri->uri, is_clone_op == false);

Copy link

Member Author

@kocsismate kocsismate Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way the extra check affects all URI parsers - so the above check could be directly added to the clone_uri handler of RFC 3986... 🤔

@kocsismate kocsismate mentioned this pull request

Sep 6, 2025

uri: Do not copy the normalized URI when cloning RFC 3986 URIs #19588

Merged

nielsdos

nielsdos requested changes

Sep 6, 2025

View reviewed changes

ext/uri/php_uri.c

execute_data->func &&

ZEND_USER_CODE(execute_data->func->type) &&

execute_data->opline != NULL &&

execute_data->opline->opcode == ZEND_CLONE;

Copy link

Member

@nielsdos nielsdos Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no, I object against these kinds of hacks. This might also not work well with the clone-as-function feature.

Copy link

Member Author

@kocsismate kocsismate Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( alright

Copy link

Member Author

@kocsismate kocsismate Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any other solution you can see that achieves similar result? The only thing I can is to set a global variable before cloning that the RFC 3986 implementation could check. But I guess this is also going to be considered hacky :)

Copy link

Member

@arnaud-lb arnaud-lb Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume that uri_clone_obj_handler() is always called by clone, and introduce a separate function with extra arguments for other use-cases. uri_clone_obj_handler() can call the new function.

Copy link

Member Author

@kocsismate kocsismate Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a reasonable assumption, however Tim has just changed the code of the withers to call the clone obj handler instead of directly calling uri_clone_obj_handler() (https://github.com/php/php-src/pull/19649/files#diff-5b8085788a05a46216af5a9b0c9db7e47351d41f3550952d497ba3d54922be15L92). He did this so that the chance of calling the improper clone obj handler is eliminated, and this also makes sense at the first sight, but now that I think about it again, uri_write_component_ex is not public code, and it's only used by PHP's own implementations... So I guess we could assume that uri_clone_obj_handler() can be called directly.

Copy link

Member

@TimWolla TimWolla Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still unclear to me, why we try to add this amount of complexity to the C code, when my previous suggestion of "just don't clone the normalized URI, because it's very likely to be invalidated" would be obviously correct. As part of my cleanup and review of ext/uri, I've fixed more one one case of memory unsafety.

Copy link

Member

@nielsdos nielsdos Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're doing too much premature optimization here. Go for the simple solution of Tim where we just don't copy the normalized URI. It's going to be much easier and less error-prone.
If you find in a (realistic) benchmark that it does matter, you can come up with a solution. But keep it simple at first and only when numbers show it you should optimize.

Copy link

Member Author

@kocsismate kocsismate Sep 7, 2025 •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to properly copy normalized URIs when needed didn't seem complex to me: we could go back using uri_clone_obj_handler() inside uri_write_component_ex() and then we could pass false/true depending on the use-case (by using Arnaud's suggestion).

Yes, this improves the situation for only a small minority of use-cases, but for them, it's a huge improvement not to have to reparse the whole URI. Of course, real world benchmarks won't reveal this, unless I deliberately start cloning URIs like crazy for whatever reason.

In my opinion, this warrants the extra care. Not missing this opportunity seems more vital for me than optimizing other smaller details like converting zend_string_release() calls to zend_string_release_ex().

With all that said, if you both agree that we shouldn't take care of this usecase, then I'll accept and respect your decision, and close this PR.

Copy link

Member

@TimWolla TimWolla Sep 7, 2025 •

edited

Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

than optimizing other smaller details like converting zend_string_release() calls to zend_string_release_ex().

FWIW: I also agree with that. For me it's needless mental load that all these variants exist and just always use zend_string_release(). Doing C correctly itself is already complicated enough.

Copy link

Member

@TimWolla TimWolla Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah and what I also wanted to add: One issue I'm having here is that this would be adding another parameter to the API that only matters for one of the implementations, similarly to the errors parameter existing only for WhatWg. The current implementation tries to use generic helpers for something that differs in detail, leading to combinatorial explosion of (boolean) parameters in those helpers.

@kocsismate kocsismate closed this

Sep 7, 2025

Labels

ABI break Extension: uri

4 participants

@kocsismate @TimWolla @arnaud-lb @nielsdos

Do not copy the normalized URI when cloning RFC 3986 URIs - alternative approach #19744

Do not copy the normalized URI when cloning RFC 3986 URIs - alternative approach #19744

Conversation

@kocsismate kocsismate commented Sep 6, 2025

Uh oh!

@kocsismate kocsismate Sep 6, 2025

Choose a reason for hiding this comment

Uh oh!

@nielsdos nielsdos Sep 6, 2025

Choose a reason for hiding this comment

Uh oh!

@kocsismate kocsismate Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

@kocsismate kocsismate Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

@arnaud-lb arnaud-lb Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

@kocsismate kocsismate Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

@TimWolla TimWolla Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

@nielsdos nielsdos Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

@kocsismate kocsismate Sep 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

@TimWolla TimWolla Sep 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

@TimWolla TimWolla Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

@kocsismate kocsismate Sep 7, 2025 •

edited

Loading

@TimWolla TimWolla Sep 7, 2025 •

edited

Loading