Note greediness of PEP 723 reference parser #1960
Issue Description
While preparing a PR for PEP 723 support in pip, I noticed that the reference parser defined by the PEP (and listed in the PyPA docs) collates multiple adjacent `# /// TYPE` blocks into a single match when they are separated only by a comment line (what the spec calls a "content line"). This greedy collation is surprising and makes it harder to distinguish error cases, so I think it merits a warning in the docs if the specification itself cannot be updated.
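I believe this quirk is caused by the last `+` in the reference regex being greedy. The closing `# ///` line is itself a valid content line (`#` followed by a space and some text), so the content group keeps consuming lines up to the last `# ///` it can reach instead of stopping at the first one. In my limited experimentation, replacing this quantifier with the lazy `+?` resolves the issue and produces the expected number of matches.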
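A minimal sketch of that comparison, assuming Python 3 and only the standard `re` module. `GREEDY` is the regex from the PEP's reference implementation; `LAZY` is the same pattern with the single `+` changed to `+?` as suggested above. The names `SCRIPT` and `count_blocks` are hypothetical, and the lazy pattern is only the proposed tweak, not an official change to the spec.

```python
import re

# Regex from the PEP 723 reference implementation (greedy `+` on the content group).
GREEDY = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
# Same pattern with the proposed lazy `+?` (an assumption, not the spec's regex).
LAZY = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+?)^# ///$"

# Two `script` blocks separated only by a bare "#" comment line.
SCRIPT = """\
# /// script
# data (1)
# ///
#
# /// script
# data (2)
# ///
"""


def count_blocks(regex: str, script: str, name: str = "script") -> int:
    """Count matches of `regex` in `script` whose TYPE group equals `name`."""
    return sum(1 for m in re.finditer(regex, script) if m.group("type") == name)


print(count_blocks(GREEDY, SCRIPT))  # -> 1 (both blocks collated)
print(count_blocks(LAZY, SCRIPT))    # -> 2 (one match per block)
```

With `+?`, each block ends at the first `# ///` that can close it; with `+`, the engine consumes every `#`-prefixed line and backtracks only as far as the last closing line.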
This shouldn't slip through anyone's code unnoticed, since the collated content is invalid TOML (the interior `///` lines are not valid syntax), but it is a surprising enough edge case that I thought it worth reporting here.
```python
import re

# Two `script` blocks separated by a bare "#" comment line.
script_A = """
# /// script
# data (1)
# ///
#
# /// script
# data (2)
# ///
"""

# Two `script` blocks separated by a blank (non-comment) line.
script_B = """
# /// script
# data (1)
# ///

# /// script
# data (2)
# ///
"""

# These lines adapted from PEP 723's reference parser:
# https://peps.python.org/pep-0723/#reference-implementation
REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"

name = "script"
matches_A = list(
    filter(lambda m: m.group("type") == name, re.finditer(REGEX, script_A))
)
matches_B = list(
    filter(lambda m: m.group("type") == name, re.finditer(REGEX, script_B))
)

# output:
# 1
# 2
print(len(matches_A))
print(len(matches_B))
```