-
Notifications
You must be signed in to change notification settings - Fork 97
constant.numeric.dec.python only has 2 capture groups #198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
asottile
commented
Feb 22, 2020
actually, on second thought -- I should probably check the rest of the regexen too 🤔
asottile
commented
Feb 22, 2020
ok there are more, but I am having a hard time wrapping my head around the include scheme
here's the script I used to find them (can probably be improved a little bit to have more/better output):
import argparse import re import plistlib from typing import Any from typing import Dict import onigurumacffi _BACKREF_RE = re.compile(r'((?<!\\)(?:\\\\)*)\\([0-9]+)') def _fix_end(s: str) -> str: """end can have backreferences""" return _BACKREF_RE.sub('ZZZ', s) def _visit_captures(reg: str, captures: Dict[str, Dict[str, Any]]) -> None: max_n = onigurumacffi.compile(reg).number_of_captures() for k, v in captures.items(): if int(k) > max_n: print(f'{k} > {max_n}: {reg!r} {v}') _visit_rule(v) def _visit_rule(rule: Dict[str, Any]) -> None: if 'match' in rule and 'captures' in rule: _visit_captures(rule['match'], rule['captures']) if 'begin' in rule: if 'captures' in rule: _visit_captures(rule['begin'], rule['captures']) _visit_captures(_fix_end(rule['end']), rule['captures']) if 'beginCaptures' in rule: _visit_captures(rule['begin'], rule['beginCaptures']) if 'endCaptures' in rule: _visit_captures(_fix_end(rule['end']), rule['endCaptures']) for sub_rule in rule.get('patterns', ()): _visit_rule(sub_rule) for sub_rule in rule.get('repository', {}).values(): _visit_rule(sub_rule) def main() -> int: parser = argparse.ArgumentParser() parser.add_argument('filename') args = parser.parse_args() with open(args.filename, 'rb') as f: contents = plistlib.load(f) _visit_rule(contents) return 0 if __name__ == '__main__': exit(main())
$ python3 t.py grammars/MagicPython.tmLanguage 2 > 1: "(\\]|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\]|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\'\\'\\')" {'name': 'invalid.illegal.newline.python'} 2 > 1: '(""")' {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'} 2 > 1: "(\\'\\'\\')" {'name': 'invalid.illegal.newline.python'} 2 > 1: '(""")' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'} 2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
asottile
commented
Apr 22, 2020
@elprans @vpetrovykh anything left to do here?
asottile
commented
Jul 30, 2020
@1st1 maybe?
OK, the change looks good.
As for the other hits, all of the invalid.illegal.newline.python are, as far as I can tell, the product of us using templates in our syntax.yaml files to build various flavors of strings and regexp rules. For single-line strings we have a rule for detecting the illegal newline (unterminated string), but there's no such thing for the multi-line variants, however the template still has the match group and the scope assigned to that. So barring making a more elaborate template system we'll probably keep the current setup as it appears to do no harm.
Sorry for the delay in merging this.
also validated this against oniguruma via onigurumacffi