I am trying to remove all non-numeric characters from two columns and convert the result to float, but whenever a cell contains string characters I get the error "ValueError: could not convert string to float".
Please suggest how to resolve it.
Input File
col1         col2
122.45       NaN
Ninety Five  3585/-
9987         178@#?
225 Nine     1983.86
Twelve       7363*
Output File
col1    col2
122.45  NaN
NaN     3585
9987    178
225     1983.86
NaN     7363
Code I am using:
df[['col1','col2']] = df[['col1','col2']].replace('([^\d/.])', '', regex=True).astype(float)
Getting the Error:
ValueError: could not convert string to float
1 Answer
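Use a raw string (with the r prefix) for regex patterns, or double the backslashes (\\), so that Python does not treat \d as a string escape. Also, use \. to match a literal . character, not /. (the latter keeps / characters, so a value like "3585/-" still cannot be converted). Finally, cells that contain no digits become empty strings after the replacement, and those must be turned into NaN before calling astype(float):
df[['col1', 'col2']] = df[['col1', 'col2']].replace(r'(-?[^\d\.])', '', regex=True).replace('', float('NaN')).astype(float)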
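As a minimal sketch of the approach above, assuming the sample data from the question is held as strings (the DataFrame is built by hand here for illustration and is not the asker's actual file; the name `cleaned` is also just illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the question, constructed by hand (an assumption
# about how the input file looks once loaded).
df = pd.DataFrame({
    "col1": ["122.45", "Ninety Five", "9987", "225 Nine", "Twelve"],
    "col2": [np.nan, "3585/-", "178@#?", "1983.86", "7363*"],
})

cols = ["col1", "col2"]
cleaned = (
    df[cols]
    .replace(r"[^\d.]", "", regex=True)  # drop everything except digits and dots
    .replace("", np.nan)                 # all-text cells are now empty strings
    .astype(float)
)
print(cleaned)
```

The second `replace` is the step that avoids the ValueError: without it, cells such as "Twelve" are left as empty strings, which `astype(float)` cannot convert.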
answered Jun 18, 2021 at 15:31
Will Da Silva
7 Comments
Manz
@Willdasilva - Using the above line still gives the same error when it reaches a row containing the value "One Lakh Two Thousand Three hundred & Twenty Paise".
Manz
@AnuragDabas - When a row value consists entirely of alphabetic characters, e.g. "One Lakh Two Thousand Three hundred & Twenty Paise", it still gives the same error.
Anurag Dabas
@Manz so df[['col1','col2']]=df[['col1','col2']].replace('([^\d\.])', '', regex=True).replace('',float("NaN"),regex=True).astype(float) doesn't work?
Manz
@willdasilva - Any suggestion for a row value such as "-122.45"? It does not work in that case. How can I resolve this?
Manz
@AnuragDabas - Thanks for the answer, it works. But when a row value is "-122.45" it doesn't work; it gives the same value as output.
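For the two cases raised in the comments above (all-text rows and negative values such as "-122.45"), one possible sketch, which is not part of the original answer, keeps a leading minus sign with a negative lookahead and lets `pd.to_numeric` with `errors="coerce"` turn anything unparseable into NaN. The sample values come from the comments; the DataFrame and the name `cleaned` are illustrative assumptions.

```python
import pandas as pd

# Values taken from the comments; the frame itself is hypothetical.
df = pd.DataFrame({
    "col1": [
        "-122.45",
        "One Lakh Two Thousand Three hundred & Twenty Paise",
        "3585/-",
        "225 Nine",
    ]
})

cleaned = (
    df[["col1"]]
    # Remove every character that is not a digit or a dot, except a
    # minus sign at the very start of the string.
    .replace(r"(?!^-)[^\d.]", "", regex=True)
    # Coerce leftovers such as "" or a lone "-" to NaN instead of raising.
    .apply(pd.to_numeric, errors="coerce")
)
print(cleaned)
```

Because of the coercion, values that are still malformed after stripping (for example, a string with two dots) also end up as NaN rather than raising an error.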
'[^\d\.]' and add .replace('', np.nan) like so: df[['col1','col2']].replace('([^\d\.])', '', regex=True).replace('',np.nan).astype(float)