Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

constant.numeric.dec.python only has 2 capture groups #198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
vpetrovykh merged 1 commit into MagicStack:master from asottile:dec_illegal_captures
Jul 31, 2020
Merged

constant.numeric.dec.python only has 2 capture groups #198

vpetrovykh merged 1 commit into MagicStack:master from asottile:dec_illegal_captures
Jul 31, 2020

Conversation

@asottile
Copy link
Contributor

@asottile asottile commented Feb 22, 2020

also validated this against oniguruma via onigurumacffi

>>> REG = '''\
... (?x)
... (?<![\w\.])(?:
... [1-9](?: _?[0-9] )*
... |
... 0+
... |
... [0-9](?: _?[0-9] )* ([jJ])
... |
... 0 ([0-9]+)(?![eE\.])
... )\b
... '''
>>> import onigurumacffi
>>> reg = onigurumacffi.compile(REG)
>>> reg.number_of_captures()
2

Copy link
Contributor Author

actually, on second thought -- I should probably check the rest of the regexen too 🤔

Copy link
Contributor Author

ok there are more, but I am having a hard time wrapping my head around the include scheme

here's the script I used to find them (can probably be improved a little bit to have more/better output):

import argparse
import re
import plistlib
from typing import Any
from typing import Dict
import onigurumacffi
_BACKREF_RE = re.compile(r'((?<!\\)(?:\\\\)*)\\([0-9]+)')
def _fix_end(s: str) -> str:
 """end can have backreferences"""
 return _BACKREF_RE.sub('ZZZ', s)
def _visit_captures(reg: str, captures: Dict[str, Dict[str, Any]]) -> None:
 max_n = onigurumacffi.compile(reg).number_of_captures()
 for k, v in captures.items():
 if int(k) > max_n:
 print(f'{k} > {max_n}: {reg!r} {v}')
 _visit_rule(v)
def _visit_rule(rule: Dict[str, Any]) -> None:
 if 'match' in rule and 'captures' in rule:
 _visit_captures(rule['match'], rule['captures'])
 if 'begin' in rule:
 if 'captures' in rule:
 _visit_captures(rule['begin'], rule['captures'])
 _visit_captures(_fix_end(rule['end']), rule['captures'])
 if 'beginCaptures' in rule:
 _visit_captures(rule['begin'], rule['beginCaptures'])
 if 'endCaptures' in rule:
 _visit_captures(_fix_end(rule['end']), rule['endCaptures'])
 for sub_rule in rule.get('patterns', ()):
 _visit_rule(sub_rule)
 for sub_rule in rule.get('repository', {}).values():
 _visit_rule(sub_rule)
def main() -> int:
 parser = argparse.ArgumentParser()
 parser.add_argument('filename')
 args = parser.parse_args()
 with open(args.filename, 'rb') as f:
 contents = plistlib.load(f)
 _visit_rule(contents)
 return 0
if __name__ == '__main__':
 exit(main())
$ python3 t.py grammars/MagicPython.tmLanguage 
2 > 1: "(\\]|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\]|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\'\\'\\')" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(""")' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\'\\'\\')" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(""")' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}

Copy link
Contributor Author

@elprans @vpetrovykh anything left to do here?

Copy link
Contributor Author

@1st1 maybe?

Copy link
Contributor

vpetrovykh commented Jul 31, 2020
edited
Loading

OK, the change looks good.

As for the other hits, all of the invalid.illegal.newline.python are, as far as I can tell, the product of us using templates in our syntax.yaml files to build various flavors of strings and regexp rules. For single-line strings we have a rule for detecting the illegal newline (unterminated string), but there's no such thing for the multi-line variants, however the template still has the match group and the scope assigned to that. So barring making a more elaborate template system we'll probably keep the current setup as it appears to do no harm.

Sorry for the delay in merging this.

asottile reacted with thumbs up emoji

@vpetrovykh vpetrovykh merged commit 225fa4c into MagicStack:master Jul 31, 2020
@asottile asottile deleted the dec_illegal_captures branch July 31, 2020 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@vpetrovykh vpetrovykh Awaiting requested review from vpetrovykh

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /