pandas invalid escape sequence after update

Question 1

I am parsing a csv with multi char delimiters in pandas as follows

big_df = pd.read_csv(os.path.expanduser('~/path/to/csv/with/special/delimiters.csv'),
                     encoding='utf8',
                     sep='\$\$><\$\$',
                     decimal=',',
                     engine='python')
big_df.iloc[:, -1] = big_df.iloc[:, -1].str.replace('\$\$>$', '')
big_df = big_df.replace(['^<', '>$'], ['', ''], regex=True)
big_df.columns = big_df.columns.to_series().replace(['^<', '>$', '>\$\$'], ['', '', ''], regex=True)

This worked fine until I recently upgraded my pandas installation. Now I see a lot of deprecation warnings:

<input>:3: DeprecationWarning: invalid escape sequence \$
<input>:3: DeprecationWarning: invalid escape sequence \$
<input>:3: DeprecationWarning: invalid escape sequence \$
<input>:3: DeprecationWarning: invalid escape sequence \$
<input>:3: DeprecationWarning: invalid escape sequence \$
<ipython-input-6-1ba5b58b9e9e>:3: DeprecationWarning: invalid escape sequence \$
 sep='\$\$><\$\$',
<ipython-input-6-1ba5b58b9e9e>:7: DeprecationWarning: invalid escape sequence \$
 big_df.iloc[:, -1] = big_df.iloc[:, -1].str.replace('\$\$>$', '')

As I need the special delimiters with the $ symbols, I am unsure how to properly handle these warnings.

Comment

Use raw strings: r'\$\$><\$\$' etc. That way string escaping and regex escaping don't interfere.

Comment

Thanks, this is already the answer. If you want, feel free to post it as an answer.

Comment

Thanks. I was going to refuse, but this deprecation seems to be a pretty new thing: I mostly find GitHub issues for libraries such as jinja, scikit, sympy, etc., all from the past week or so.

Answer

The problem is that escaping in string literals can interfere with escaping in regular expressions. While '\s' is a valid regex token, to Python it looks like the escape for a special character that doesn't exist. Python keeps such unrecognized escapes unchanged, so the string literal '\s' silently becomes the two characters '\\s' (i.e. r'\s'), and it is this silent fallback that has been deprecated since Python 3.6.
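To see where the warning comes from, here is a minimal sketch (not part of the original answer) that compiles the question's separator literal at runtime with eval() and records the warnings it produces. The source text uses doubled backslashes so that the example file itself contains no invalid escapes. On Python 3.6 to 3.11 the recorded category is DeprecationWarning; from Python 3.12 on it is SyntaxWarning.

```python
import warnings

# Source code of the question's non-raw literal '\$\$><\$\$'. The backslashes are
# doubled here so that this file itself contains no invalid escape sequences.
source = "'\\$\\$><\\$\\$'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning instead of applying default filters
    value = eval(source)             # compiling the literal is what triggers the warning

# The unrecognized escape is kept as-is, so the value equals the raw-string version...
print(value == r'\$\$><\$\$')  # True
# ...but the compiler reports it (DeprecationWarning before 3.12, SyntaxWarning after).
for w in caught:
    print(w.category.__name__, w.message)
```

The warning is emitted by the Python compiler while it parses the literal, which is why it points at the line containing the string rather than at a pandas call.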

The point is to always use raw string literals when constructing regular expressions, so that Python doesn't interpret the backslashes itself. Older Python versions silently tolerated this ambiguity by keeping invalid escape sequences as-is; newer versions warn about it to push programmers towards being explicit and unambiguous (which I fully support).
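Unrecognized escapes at least produce a warning; recognized ones are converted silently, which is the stronger reason to prefer raw strings. A short sketch with made-up text: '\b' is both the regex word-boundary token and Python's backspace escape, so without the r prefix the pattern no longer means what it says.

```python
import re

text = "total: 5 dollars"

# In a normal literal, '\b' is the backspace character (\x08), so this pattern
# looks for backspace + "dollars" + backspace and finds nothing.
plain = re.findall('\bdollars\b', text)

# In a raw literal, the regex engine receives the two characters '\' and 'b',
# i.e. a word boundary, and the match succeeds.
raw = re.findall(r'\bdollars\b', text)

print(plain)  # []
print(raw)    # ['dollars']
```

No warning is raised for the first pattern, because '\b' is a valid Python escape; the bug only shows up as a missing match.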

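In your specific case, your patterns should be changed from, say, '\$\$><\$\$' to r'\$\$><\$\$'. On pandas 2.0 and later, Series.str.replace treats the pattern as a literal string unless regex=True is passed, so pass it explicitly:

big_df.iloc[:, -1] = big_df.iloc[:, -1].str.replace(r'\$\$>$', '', regex=True)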
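Putting this together, here is a minimal end-to-end sketch of the question's pipeline with every pattern written as a raw string. It assumes a hypothetical file layout inferred from the question's clean-up patterns (each field wrapped in <...>, fields separated by $$><$$, each line ending in $$>) and reads from an in-memory io.StringIO instead of the real file; encoding and decimal are omitted because the toy data contains no numbers.

```python
import io

import pandas as pd

# Hypothetical sample data in the layout the question's patterns suggest.
data = io.StringIO(
    "<name>$$><$$<city>$$>\n"
    "<Alice>$$><$$<Paris>$$>\n"
    "<Bob>$$><$$<Oslo>$$>\n"
)

# A multi-character sep is treated as a regex by the python engine, so the '$'
# characters must be escaped; the r prefix keeps the backslashes intact.
df = pd.read_csv(data, sep=r'\$\$><\$\$', engine='python')

# Strip the trailing '$$>' from the last column (regex=True is required on pandas >= 2.0).
last = df.columns[-1]
df[last] = df[last].str.replace(r'\$\$>$', '', regex=True)

# Strip the surrounding '<' and '>' from every cell, then clean up the header.
df = df.replace([r'^<', r'>$'], ['', ''], regex=True)
df.columns = df.columns.to_series().replace([r'^<', r'>$', r'>\$\$'], ['', '', ''], regex=True)

print(df)
```

Selecting the last column by label rather than with iloc is a stylistic choice here; the question's iloc version works the same way once its patterns are raw strings.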

What actually happens is that each backslash itself has to be escaped for Python, so that your regex pattern contains the literal two-character sequence '\$'. A raw string does this for you, as its repr shows:

>>> r'\$\$><\$\$'
'\\$\\$><\\$\\$'
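The raw string and the doubled-backslash spelling are the same string. As an alternative that is not part of the original answer, the standard library's re.escape can build the pattern from the plain delimiter so you never write the backslashes by hand; on Python 3.7+ it escapes only regex metacharacters, so '<' and '>' are left alone.

```python
import re

delimiter = '$$><$$'

# Three spellings of the same regex: raw string, doubled backslashes, re.escape.
raw = r'\$\$><\$\$'
doubled = '\\$\\$><\\$\\$'
escaped = re.escape(delimiter)  # escapes only regex metacharacters on Python 3.7+

print(raw == doubled == escaped)  # True on Python 3.7+
print(len(r'\$'))                 # 2: a backslash followed by '$'

# The pattern matches the delimiter literally, which is what read_csv(sep=...) needs.
print(re.split(raw, 'a$$><$$b$$><$$c'))  # ['a', 'b', 'c']
```

With pandas this would be used as pd.read_csv(..., sep=re.escape('$$><$$'), engine='python').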

score 13 · Accepted Answer · 2017-06-02 10:19:53Z
