Timeline for Decoding and Encoding in Python

Current License: CC BY-SA 3.0

27 events
when what by license comment
Mar 10, 2018 at 11:19 vote accept cordelia
Mar 10, 2018 at 11:05 comment added cordelia How do I avoid losing the 're in you're? I have reposted this because I posted it in the Answer section by mistake.
Mar 10, 2018 at 10:23 vote accept cordelia
Mar 10, 2018 at 10:56
Mar 10, 2018 at 10:23 vote accept cordelia
Mar 10, 2018 at 10:23
Mar 10, 2018 at 10:20 answer added Jens timeline score: 1
Mar 10, 2018 at 10:13 comment added cordelia Code: original_tweet = "I luv my &lt;3 iphone &amp; you’re awsm apple. DisplayIsAwesome, sooo happppppy 🙂 apple.com"; tweet = html.parser.unescape(original_tweet); print(tweet)
Mar 10, 2018 at 10:13 comment added cordelia Do you recommend my using a parser on original_tweet and then applying your encode and decode code to that?
Mar 10, 2018 at 10:12 comment added cordelia That seems to work, except it does not correct the &lt; and &amp; the way it is corrected on the website.
Mar 10, 2018 at 10:06 comment added Jens If you can live with some data loss, then original_tweet.encode("utf-8").decode("ascii", errors="ignore") should work. First, encode() the string into an array of bytes, then decode() that array, ignoring any decode errors.
Mar 10, 2018 at 10:01 comment added cordelia Thank you @Jens, but how do I do this in Python 3.6? Everything seems to be done in Python 2.7.
Mar 10, 2018 at 9:58 comment added Jens @cordelia, that’s unlikely to work considering the Unicode emoji 🙂 in the original string, which cannot be represented in plain ASCII. Take a look at this or this question to convert the UTF8 encoded string original_tweet into a plain ASCII string.
Mar 10, 2018 at 9:56 comment added cordelia I think ASCII is the format.
Mar 10, 2018 at 9:51 comment added Jens @cordelia Change the encoding format to what?
Mar 10, 2018 at 9:50 comment added cordelia I need to transform the data and change the encoding format.
Mar 10, 2018 at 9:49 comment added Jens @cordelia what are you trying to achieve? Every string in Py3+ is a UTF8 encoded Unicode string already.
Mar 10, 2018 at 9:48 comment added cordelia How do I tweak this for Python 3.6 then? Should I put a u in front of the original_tweet code?
Mar 10, 2018 at 9:48 history rollback jonrsharpe
Rollback to Revision 2
Mar 10, 2018 at 9:48 comment added cordelia Thanks a lot for your comments. This has put me on the right track and prevented me from going round in circles.
Mar 10, 2018 at 9:47 comment added Ulrich Eckhardt I believe that the code on that website was written for Python 2. There, a regular string (without u prefix) is a byte sequence, which can be decoded.
Mar 10, 2018 at 9:44 comment added phihag @cordelia That website's code does not make any sense. If your original_tweet value is a character string already, there's no need to encode or decode it. If it's a byte string (i.e. a bytes object), decode it once to get a character string.
S Mar 10, 2018 at 9:43 history suggested baduker CC BY-SA 3.0
Minor style improvements.
Mar 10, 2018 at 9:43 comment added Jens You cannot use str.encode() and bytes.decode() to handle the HTML entities &lt; and &amp; if that’s what you’re trying to do. Look into libs like Parsing HTML with lxml for that (based on you importing an HTML parser). However, your string original_tweet isn’t proper HTML, so you may consider fudging that first...
Mar 10, 2018 at 9:41 comment added cordelia This is the website I am following and am unable to understand what is going on: analyticsvidhya.com/blog/2014/11/…
Mar 10, 2018 at 9:39 comment added Ulrich Eckhardt You seem to be confused about what a string in Python represents and what encoding or decoding does. Encoding turns a string into bytes; decoding does the opposite. In that light, your call doesn't make sense, and hence it also fails.
Mar 10, 2018 at 9:38 review Suggested edits
S Mar 10, 2018 at 9:43
Mar 10, 2018 at 9:38 history edited jonrsharpe CC BY-SA 3.0
deleted 17 characters in body
Mar 10, 2018 at 9:36 history asked cordelia CC BY-SA 3.0
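
The steps discussed in the comments above can be sketched in Python 3 roughly as follows. This is only an illustration under assumptions, not the asker's or the linked website's actual code: it assumes the raw tweet contains the literal entities &lt; and &amp;, it uses the standard-library html.unescape() for the entity step, and the variable names other than original_tweet are made up for the example. The last step, replacing the curly apostrophe before dropping non-ASCII characters, is one possible way to keep the 're in you're, not something proposed in the thread.

```python
import html

# Sample text modelled on the tweet quoted in the comments. Writing the
# entities out as &lt; and &amp; is an assumption about the raw input.
original_tweet = (
    "I luv my &lt;3 iphone &amp; you’re awsm apple. "
    "DisplayIsAwesome, sooo happppppy 🙂 apple.com"
)

# Step 1: HTML entities are not a character-encoding problem, so
# encode()/decode() will not touch them. html.unescape() resolves them.
unescaped = html.unescape(original_tweet)

# Step 2: encode() turns a str into bytes, decode() turns bytes back into a str.
as_bytes = unescaped.encode("utf-8")
round_trip = as_bytes.decode("utf-8")

# Step 3 (lossy): decode the UTF-8 bytes as ASCII and skip every byte that
# is not valid ASCII. The emoji and the curly apostrophe are both dropped.
ascii_only = unescaped.encode("utf-8").decode("ascii", errors="ignore")

# Hypothetical variant: map the curly apostrophe to a plain one first, so
# that only characters without an obvious ASCII stand-in are lost.
ascii_kept = (
    unescaped.replace("\u2019", "'")
    .encode("ascii", errors="ignore")
    .decode("ascii")
)

print(unescaped)
print(ascii_only)
print(ascii_kept)
```

Whether dropping characters is acceptable depends on what the cleaned text is used for; if the emoji carries meaning, keeping the str as it is after step 1 may be the better choice.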
