Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-138284 : urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 #138291

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Davda-James wants to merge 6 commits into python:main
base: main
Choose a base branch
Loading
from Davda-James:fix/rfc3986-parse_qsl-strict

Conversation

Copy link

@Davda-James Davda-James commented Aug 31, 2025
edited
Loading

urllib.parse.parse_qsl earlier it was accepting the illegal characters as well.

Proof (that I reproduce) :
Before_Fix

Closes issue : #138284

Proof (after fixing error):
After_fix

I added the test for it as well.
Test for urlparse only :
test

All tests:
all_tests

  • Passes all tests

shloktech reacted with heart emoji
Copy link

python-cla-bot bot commented Aug 31, 2025
edited
Loading

All commit authors signed the Contributor License Agreement.

CLA signed

Copy link

bedevere-app bot commented Aug 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@Davda-James Davda-James changed the title (削除) urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 (削除ここまで) (追記) gh-138284 : urllib.parse.parse_qsl now raises ValueError if illegal characters is passed, according to RFC 3986 (追記ここまで) Aug 31, 2025
Copy link

bedevere-app bot commented Aug 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@@ -91,6 +91,9 @@
# Unsafe bytes to be removed per WHATWG spec
_UNSAFE_URL_BYTES_TO_REMOVE = ['\t', '\r', '\n']

# Allowed valid characters in parse_qsl
_VALID_QUERY_CHARS = re.compile(r"^[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*$")
Copy link
Member

@StanFromIreland StanFromIreland Aug 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be replaced with str.isascii, str.isdecimal and a strings with the others, this should be faster.

Copy link
Author

@Davda-James Davda-James Aug 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I will do it and add new commit.

Copy link
Member

Please add a NEWS entry, and this does break existing code.

Davda-James and shloktech reacted with thumbs up emoji

Copy link

bedevere-app bot commented Aug 31, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Copy link
Author

@StanFromIreland I have added your suggestion. Can you please review it again.
Thank You

Copy link

@shloktech shloktech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am the reporter of the issue which is being solved here. The changes look good to me, and I think they solve the issue very well. Approving from my end.

Note: @Davda-James do get it reviewed by Stan.

Suggestion: Squash your commits below to have a single commit. It is a good practice to have :)

Image

cc: @StanFromIreland

Copy link
Member

I have requested the expert for this module.

Suggestion: Squash your commits below to have a single commit.

Please do not, it confuses gh making it difficult to review. They will be squashed when merged anyway.

shloktech reacted with thumbs up emoji

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that this change is expected, the following must be done:

  • The documentation of urllib.parse.parse_qsl must be updated accordingly.
  • Test coverage must be increased.

I'm not entirely sure that we necessarily need to consider this as a bug fix. The rationale is as follows:

The urlsplit() and urlparse() APIs do not perform validation of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity.

I do not know whether we should consider this is a pitfall or not.

@@ -854,6 +866,11 @@ def _unquote(s):
name, has_eq, value = name_value.partition(eq)
if not has_eq and strict_parsing:
raise ValueError("bad query field: %r" % (name_value,))
if strict_parsing:
# Validate RFC3986 characters
to_check = (name_value.decode() if isinstance(name_value, bytes) else name_value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use _unquote as this handles the %-encoded values and takes care of the encoding parameter as well.

Copy link
Author

@Davda-James Davda-James Aug 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if strict_parsing:
# Validate RFC3986 characters
to_check = _unquote(name_value)
if isinstance(to_check, (bytes, bytearray)):
to_check = to_check.decode(encoding, errors)
if not _is_valid_rfc3986_query(to_check): using like this is it good as we need to decode back as _unquote returns bytes and _is_valid_rfc3986_query accepts the string ?

Davda-James and others added 2 commits August 31, 2025 23:43
...MOp4k.rst
Updated it according to suggestion
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Reviewers

@picnixz picnixz picnixz left review comments

@StanFromIreland StanFromIreland StanFromIreland left review comments

@orsenthil orsenthil Awaiting requested review from orsenthil

+1 more reviewer

@shloktech shloktech shloktech approved these changes

Reviewers whose approvals may not affect merge requirements
Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /