constant.numeric.dec.python only has 2 capture groups #198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

vpetrovykh merged 1 commit into MagicStack:master from asottile:dec_illegal_captures

Jul 31, 2020

Merged

constant.numeric.dec.python only has 2 capture groups #198

vpetrovykh merged 1 commit into MagicStack:master from asottile:dec_illegal_captures

Jul 31, 2020

Conversation

@asottile

Copy link

Contributor

@asottile asottile commented Feb 22, 2020

also validated this against oniguruma via onigurumacffi

>>> REG = '''\
... (?x)
... (?<![\w\.])(?:
... [1-9](?: _?[0-9] )*
... |
... 0+
... |
... [0-9](?: _?[0-9] )* ([jJ])
... |
... 0 ([0-9]+)(?![eE\.])
... )\b
... '''
>>> import onigurumacffi
>>> reg = onigurumacffi.compile(REG)
>>> reg.number_of_captures()
2

@asottile


 constant.numeric.dec.python only has 2 capture groups

bbc1a54

@asottile

Copy link

Contributor Author

asottile commented Feb 22, 2020

actually, on second thought -- I should probably check the rest of the regexen too 🤔

@asottile

Copy link

Contributor Author

asottile commented Feb 22, 2020

ok there are more, but I am having a hard time wrapping my head around the include scheme

here's the script I used to find them (can probably be improved a little bit to have more/better output):

import argparse
import re
import plistlib
from typing import Any
from typing import Dict
import onigurumacffi
_BACKREF_RE = re.compile(r'((?<!\\)(?:\\\\)*)\\([0-9]+)')
def _fix_end(s: str) -> str:
 """end can have backreferences"""
 return _BACKREF_RE.sub('ZZZ', s)
def _visit_captures(reg: str, captures: Dict[str, Dict[str, Any]]) -> None:
 max_n = onigurumacffi.compile(reg).number_of_captures()
 for k, v in captures.items():
 if int(k) > max_n:
 print(f'{k} > {max_n}: {reg!r} {v}')
 _visit_rule(v)
def _visit_rule(rule: Dict[str, Any]) -> None:
 if 'match' in rule and 'captures' in rule:
 _visit_captures(rule['match'], rule['captures'])
 if 'begin' in rule:
 if 'captures' in rule:
 _visit_captures(rule['begin'], rule['captures'])
 _visit_captures(_fix_end(rule['end']), rule['captures'])
 if 'beginCaptures' in rule:
 _visit_captures(rule['begin'], rule['beginCaptures'])
 if 'endCaptures' in rule:
 _visit_captures(_fix_end(rule['end']), rule['endCaptures'])
 for sub_rule in rule.get('patterns', ()):
 _visit_rule(sub_rule)
 for sub_rule in rule.get('repository', {}).values():
 _visit_rule(sub_rule)
def main() -> int:
 parser = argparse.ArgumentParser()
 parser.add_argument('filename')
 args = parser.parse_args()
 with open(args.filename, 'rb') as f:
 contents = plistlib.load(f)
 _visit_rule(contents)
 return 0
if __name__ == '__main__':
 exit(main())

$ python3 t.py grammars/MagicPython.tmLanguage 
2 > 1: "(\\]|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\]|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\'\\'\\')" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(""")' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\'\\'\\')" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(""")' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}

@elprans elprans requested a review from vpetrovykh

February 22, 2020 16:39

@elprans elprans assigned vpetrovykh

Feb 22, 2020

@asottile

Copy link

Contributor Author

asottile commented Apr 22, 2020

@elprans @vpetrovykh anything left to do here?

@asottile

Copy link

Contributor Author

asottile commented Jul 30, 2020

@1st1 maybe?

@vpetrovykh

Copy link

Contributor

vpetrovykh commented Jul 31, 2020 •

edited

Loading

OK, the change looks good.

As for the other hits, all of the invalid.illegal.newline.python are, as far as I can tell, the product of us using templates in our syntax.yaml files to build various flavors of strings and regexp rules. For single-line strings we have a rule for detecting the illegal newline (unterminated string), but there's no such thing for the multi-line variants, however the template still has the match group and the scope assigned to that. So barring making a more elaborate template system we'll probably keep the current setup as it appears to do no harm.

Sorry for the delay in merging this.

@vpetrovykh vpetrovykh merged commit 225fa4c into MagicStack:master

Jul 31, 2020

@asottile asottile deleted the dec_illegal_captures branch

July 31, 2020 14:52

Labels

None yet

2 participants

@asottile @vpetrovykh

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

constant.numeric.dec.python only has 2 capture groups #198

constant.numeric.dec.python only has 2 capture groups #198

Uh oh!

Conversation

@asottile asottile commented Feb 22, 2020

Uh oh!

asottile commented Feb 22, 2020

Uh oh!

asottile commented Feb 22, 2020

Uh oh!

asottile commented Apr 22, 2020

Uh oh!

asottile commented Jul 30, 2020

Uh oh!

vpetrovykh commented Jul 31, 2020 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

constant.numeric.dec.python only has 2 capture groups #198

constant.numeric.dec.python only has 2 capture groups #198

Uh oh!

Conversation

@asottile asottile commented Feb 22, 2020

Uh oh!

asottile commented Feb 22, 2020

Uh oh!

asottile commented Feb 22, 2020

Uh oh!

asottile commented Apr 22, 2020

Uh oh!

asottile commented Jul 30, 2020

Uh oh!

vpetrovykh commented Jul 31, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vpetrovykh commented Jul 31, 2020 •

edited

Loading