I'm trying to decode the strings in the list below. They were all encoded as UTF-8.

_str = ['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

Expected output:

['.The vicar', ':--In the', 'cathedral']

My attempts

>>> for x in _str:
...     x.decode('string_escape')
...     print x
...
'."\n\nThe vicar\''
."
The vicar'
':--\n\nIn the'
:--
In the
'cathedral'
cathedral
>>> print [x.decode('string_escape') for x in _str]
['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

Both attempts failed. Any ideas?

asked Apr 8, 2014 at 13:55

1 Answer

So you want to remove some characters from the strings in your list. That can be done with a simple regex, like the following:

import re
print [re.sub(r'[."\'\n]','',x) for x in _str]

This regex removes every ., ", ' and \n character, so the result will be:

['The vicar', ':--In the', 'cathedral']
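
As for why decode('string_escape') seemed to change nothing, here is a minimal sketch of what is probably going on. It assumes the strings were written as ordinary literals, as in the question, so each \n is already a real newline character. The variable name s is just for illustration, and print is called with parentheses so the snippet also runs on Python 3.

```python
s = '."\n\nThe vicar\''

# In the literal above, \n is one real newline character,
# not a backslash followed by the letter n.
print(len('\n'))   # 1
print('\\' in s)   # False: there are no backslashes left to unescape

# Printing a list shows repr() of each element, which writes the
# newline back out as the two characters \ and n.
print([s])

# Printing the string itself shows the actual characters.
print(s)
```

So the \n and \' in the list output are only how the list is displayed. The newlines and quotes are real characters in the data and have to be removed explicitly.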

Hope this helps.

answered Apr 8, 2014 at 14:11

3 Comments

I want to retain all the punctuation marks. Sorry, I did not state that in my question or expected output. There are too many punctuation marks, and I don't know of any method that decodes automatically, other than selectively removing unwanted characters with a regex.
Any characters you want to keep, just leave out of the regex. So if you want the . to be in the output, as in your last edit, make the regex ["\'\n]
You are right about that, but my dataset is too large and the characters are numerous. If there's no standard method for decoding, then I'll have to build a punctuation list and adopt your solution. Thanks a lot bro.
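
The suggestion from the comments can be sketched roughly as follows. It assumes the only unwanted characters are double quotes, single quotes and newlines. The name UNWANTED is made up for this example, and the character set would need extending for a real dataset. print is called with parentheses so the snippet also runs on Python 3.

```python
import re

_str = ['."\n\nThe vicar\'', ':--\n\nIn the', 'cathedral']

# Assumed removal set: double quote, single quote, newline.
# Everything else, including . : and -, is left untouched.
UNWANTED = re.compile(r'["\'\n]')

cleaned = [UNWANTED.sub('', x) for x in _str]
print(cleaned)  # ['.The vicar', ':--In the', 'cathedral']
```

Listing the characters to remove keeps the pattern short when the punctuation to keep is numerous, since anything not in the set passes through unchanged.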
