Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Imply UTF8 validity in explode function #10805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mvorisek wants to merge 1 commit into php:master from mvorisek:imply_more_utf8

Conversation

Copy link
Contributor

@mvorisek mvorisek commented Mar 7, 2023
edited
Loading

like #10780, needs #10853/#10870

when valid UTF-8 string is exploded by valid UTF-8 delimiter, the result is always valid UTF-8 string

invalid UTF-8 string might became valid, this case requires check on the whole string, thus cannot be implied from the source

Copy link
Contributor Author

mvorisek commented Mar 8, 2023

Any idea why for the following test case:

$string = 'můj žlutý kůň';
var_dump(explode('ů', $string, -2));

the result is marked as NOT valid UTF-8 string? The real result is an array with single character: https://3v4l.org/oRGNM/rfc#vgit.master, maybe due some interning?

Copy link
Member

Both the delimiter and string don't have the "valid UTF8" flag. Using mb_check_encoding to attempt to add the flag doesn't work because they're both indeed interned.

Copy link
Contributor Author

Both the delimiter and string don't have the "valid UTF8" flag. Using mb_check_encoding to attempt to add the flag doesn't work because they're both indeed interned.

If the delimiter is valid UTF-8 and the string is valid as well, the result "is the same but has the delimiter cut away", ie. leading to a valid UTF-8 string. What do you exactly mean?

Copy link
Member

What do you exactly mean?

Print the flags of the delimiter and the input string and you'll see they both don't have the UTF-8 valid flag. They were interned by the compiler and the compiler doesn't set that flag. Trying to add that flag using mbstring wouldn't work either because a flag can't be added once they're interned.

Copy link
Contributor Author

Thank you for the hint. I created #10853 feature request, as the check is quite cheap, all interned/compiled strings should be checked upfront to utilize opcache.

Copy link
Member

Thank you for the hint. I created #10853 feature request, as the check is quite cheap, all interned/compiled strings should be checked upfront to utilize opcache.

You're welcome. I think Alex was going to look into something like this. I'm not sure anymore but I thought I saw him say something like that in a comment on a PR some time ago...

Copy link
Contributor Author

maybe later

@mvorisek mvorisek deleted the imply_more_utf8 branch March 23, 2025 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Reviewers

@bukka bukka Awaiting requested review from bukka bukka will be requested when the pull request is marked ready for review bukka is a code owner

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /