Timeline for Decoding and Encoding in Python
Current License: CC BY-SA 3.0
27 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Mar 10, 2018 at 11:19 | vote | accept | cordelia | ||
| Mar 10, 2018 at 11:05 | comment | added | cordelia | How do I avoid losing the 're in "you're"? I am reposting this here because I first posted it in the Answer section by mistake. | |
| Mar 10, 2018 at 10:23 | vote | accept | cordelia | ||
| Mar 10, 2018 at 10:56 | |||||
| Mar 10, 2018 at 10:23 | vote | accept | cordelia | ||
| Mar 10, 2018 at 10:23 | |||||
| Mar 10, 2018 at 10:20 | answer | added | Jens | timeline score: 1 | |
| Mar 10, 2018 at 10:13 | comment | added | cordelia | `original_tweet = "I luv my &lt;3 iphone &amp; you’re awsm apple. DisplayIsAwesome, sooo happppppy 🙂 apple.com"; tweet = html.parser.unescape(original_tweet); print(tweet)` | |
| Mar 10, 2018 at 10:13 | comment | added | cordelia | Do you recommend my using a parser on original_tweet and then applying your encode and decode code to that? | |
| Mar 10, 2018 at 10:12 | comment | added | cordelia | That seems to work, except that it does not correct the `&lt;` and `&amp;` the way they are corrected on the website. | |
| Mar 10, 2018 at 10:06 | comment | added | Jens | If you can live with some data loss, then `original_tweet.encode("utf-8").decode("ascii", errors="ignore")` should work. First, `encode()` the string into a sequence of bytes, then `decode()` those bytes and ignore any decode errors. | |
| Mar 10, 2018 at 10:01 | comment | added | cordelia | Thank you @Jens, but how do I do this in Python 3.6? Everything seems to be done in Python 2.7. | |
| Mar 10, 2018 at 9:58 | comment | added | Jens | @cordelia, that's unlikely to work, considering the Unicode emoji 🙂 in the original string, which cannot be represented in plain ASCII. Take a look at this or this question to convert the UTF-8 encoded string `original_tweet` into a plain ASCII string. | |
| Mar 10, 2018 at 9:56 | comment | added | cordelia | I think ASCII is the format. | |
| Mar 10, 2018 at 9:51 | comment | added | Jens | @cordelia Change the encoding format to what? | |
| Mar 10, 2018 at 9:50 | comment | added | cordelia | I need to transform the data and change the encoding format. | |
| Mar 10, 2018 at 9:49 | comment | added | Jens | @cordelia What are you trying to achieve? Every string in Python 3 is already a Unicode string. | |
| Mar 10, 2018 at 9:48 | comment | added | cordelia | How do I tweak this for Python 3.6 then? Should I put a `u` in front of the `original_tweet` string? | |
| Mar 10, 2018 at 9:48 | history | rollback | jonrsharpe | Rollback to Revision 2 | |
| Mar 10, 2018 at 9:48 | comment | added | cordelia | Thanks a lot for your comments. This has put me on the right track and prevented me from going round in circles. | |
| Mar 10, 2018 at 9:47 | comment | added | Ulrich Eckhardt | I believe that the code on that website was written for Python 2. There, a regular string (without the `u` prefix) is a byte sequence, which can be decoded. | |
| Mar 10, 2018 at 9:44 | comment | added | phihag | @cordelia That website's code does not make any sense. If your `original_tweet` value is a character string already, there's no need to encode or decode it. If it's a byte string (i.e. a `bytes` object), decode it once to get a character string. | |
| S Mar 10, 2018 at 9:43 | history | suggested | baduker | CC BY-SA 3.0 | Minor style improvements. |
| Mar 10, 2018 at 9:43 | comment | added | Jens | You cannot use `str.encode()` and `bytes.decode()` to handle the HTML entities `&lt;` and `&amp;`, if that's what you're trying to do. Look into libraries such as lxml (see "Parsing HTML with lxml") for that, given that you are importing an HTML parser. However, your string `original_tweet` isn't proper HTML, so you may consider fudging that first... | |
| Mar 10, 2018 at 9:41 | comment | added | cordelia | This is the website I am following and am unable to understand what is going on: analyticsvidhya.com/blog/2014/11/… | |
| Mar 10, 2018 at 9:39 | comment | added | Ulrich Eckhardt | You seem to be confused about what a string in Python represents and what encoding and decoding do. Encoding turns a string into bytes; decoding does the opposite. In that light, your call doesn't make sense, and hence it also fails. | |
| Mar 10, 2018 at 9:38 | review | Suggested edits | |||
| S Mar 10, 2018 at 9:43 | |||||
| Mar 10, 2018 at 9:38 | history | edited | jonrsharpe | CC BY-SA 3.0 | deleted 17 characters in body |
| Mar 10, 2018 at 9:36 | history | asked | cordelia | CC BY-SA 3.0 |
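
The two steps discussed in the comments (turning HTML entities back into characters, then dropping everything that is not ASCII) can be sketched in Python 3 roughly as follows. This is only an illustration, not the code from the accepted answer: the helper name `clean_tweet` is made up, the sample text is a shortened version of the tweet with the entities assumed to be written out as `&lt;` and `&amp;`, and `html.unescape` from the standard library is used instead of the lxml route suggested above.

```python
import html

# Shortened sample modelled on the tweet quoted in the comments.
# \u2019 is the curly apostrophe and \U0001F642 is the emoji; the
# &lt; and &amp; entities are an assumption about the original input.
original_tweet = (
    "I luv my &lt;3 iphone &amp; you\u2019re awsm apple. "
    "sooo happppppy \U0001F642"
)


def clean_tweet(text):
    """Hypothetical helper: unescape HTML entities, then keep ASCII only."""
    # Step 1: turn entities such as &lt; and &amp; back into characters.
    unescaped = html.unescape(text)
    # Step 2: str -> bytes (UTF-8), then bytes -> str as ASCII; every byte
    # that is not valid ASCII is silently dropped (this loses data).
    return unescaped.encode("utf-8").decode("ascii", errors="ignore")


cleaned = clean_tweet(original_tweet)
print(cleaned)
```

Because the curly apostrophe in "you’re" is not ASCII either, it is dropped together with the emoji. One way to keep the 're readable would be to replace it with a plain apostrophe before the ASCII step, for example with `text.replace("\u2019", "'")`.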