I've been trying to clean up some text, but got stuck on the regex. I finally got somewhere with re.sub, but ended up with a syntax error. Original code:
Test for name cleanup
import re
input = u'CHEZ MADU 東久留米店(シェマディ)【東京都東久留米市】'
pattern = re.compile(ur'(【(.*?)\】)', re.UNICODE)\
print(re.sub(input, pattern, ''))
It gave me this error:
File "retest01.py", line 6
pattern = re.compile(ur'(【(.*?)\】)', re.UNICODE)\
^
SyntaxError: invalid syntax
I've also tested code from another regex thread: python regular expression with utf8 issue
It gave the same error. What could be the source of the problem here?
2 Answers
If you drop the raw string notation (the r in ur'...'), it works fine for me. Additionally, I don't think you're calling re.sub properly; the pattern comes first, then the replacement, then the string:
re.sub(pattern, repl, string, count=0, flags=0)
This didn't throw an error for me:
import re
input = u'CHEZ MADU 東久留米店(シェマディ)【東京都東久留米市】'
pattern = re.compile(u'(【(.*?)\】)', re.UNICODE)
print(re.sub(pattern, '', input))
This works on Python 2 and on Python 3.3 or later, but you don't need the unicode specifier (the u prefix) on Python 3.
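As a minimal sketch for Python 3 only, the same cleanup can be written with a plain raw string and no u prefix. The variable names text and cleaned are my own (the question's name input shadows a builtin), and the pattern is simplified on the assumption that the capture groups and the backslash before 】 are not needed, since 【 and 】 are not regex metacharacters.

```python
import re

# Sample text from the question; the 【...】 suffix is the part to remove.
text = 'CHEZ MADU 東久留米店(シェマディ)【東京都東久留米市】'

# Raw string, no u prefix: valid on every Python 3 version.
# Simplified pattern (assumption): no groups, no escaping of the brackets.
pattern = re.compile(r'【.*?】')

# On a compiled pattern the order is sub(replacement, string);
# with the module-level function it is re.sub(pattern, replacement, string).
cleaned = pattern.sub('', text)
print(cleaned)
```

Calling re.sub(pattern, '', text) instead of pattern.sub('', text) should give the same result; the method form just avoids repeating the pattern argument.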
1 Comment
The ur'....' syntax is not valid in Python 3, even after Python 3.3 restored the plain u prefix (see http://bugs.python.org/issue15096).
The syntax error is, a bit surprisingly, indicated at the end of the string...
>>> ru'my string'
File "<stdin>", line 1
ru'my string'
^
SyntaxError: invalid syntax
So, in Python 3, you can use either:
'my string' or u'my string', which mean the same (the latter was reintroduced in Python 3.3 for compatibility with Python 2 code, see PEP 414), or
r'my string with \backslashes' for a "raw" string.
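A quick way to check these prefix rules, sketched here for Python 3.3 or later. The variable names are made up for illustration, and compile() is used only so that the rejected ur'...' literal can be tested at run time without breaking the script itself.

```python
# u'...' and '...' produce the same str object type and value in Python 3.3+.
plain = 'my string'
prefixed = u'my string'
same_text = (plain == prefixed) and (type(plain) is type(prefixed))

# A raw string keeps backslashes literally, so it equals the
# doubled-backslash spelling of a normal string.
raw = r'my string with \backslashes'
escaped = 'my string with \\backslashes'

# The combined ur prefix is rejected by the Python 3 parser.
try:
    compile("ur'my string'", '<example>', 'eval')
    rejected = False
except SyntaxError:
    rejected = True

print(same_text, raw == escaped, rejected)
```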
There is no need for the u prefix on the strings, since all strings are unicode. pattern = re.compile(u'(【(.*?)\】)') is working for me with the u prefix.