Timeline for Decoding and Encoding in Python

Current License: CC BY-SA 3.0

27 events

when	what	by	license	comment
Mar 10, 2018 at 11:19	vote	accept	cordelia
Mar 10, 2018 at 11:05	comment	added	cordelia		How do I avoid losing the 're in you're? I reposted this because I had posted it in the Answer section by mistake.
Mar 10, 2018 at 10:23	vote	accept	cordelia		(removed Mar 10, 2018 at 10:56)
Mar 10, 2018 at 10:23	vote	accept	cordelia		(removed Mar 10, 2018 at 10:23)
Mar 10, 2018 at 10:20	answer	added	Jens		timeline score: 1
Mar 10, 2018 at 10:13	comment	added	cordelia		`import html.parser; original_tweet = "I luv my &lt;3 iphone &amp; you’re awsm apple. DisplayIsAwesome, sooo happppppy 🙂 apple.com"; tweet = html.parser.unescape(original_tweet); print(tweet)`
Mar 10, 2018 at 10:13	comment	added	cordelia		Would you recommend running a parser on original_tweet first and then applying your encode and decode code to the result?
Mar 10, 2018 at 10:12	comment	added	cordelia		That seems to work, except that it does not convert &lt; and &amp; the way the website does.
Mar 10, 2018 at 10:06	comment	added	Jens		If you can live with some data loss, then `original_tweet.encode("utf-8").decode("ascii", errors="ignore")` should work. First, encode() the string into a bytes object, then decode() those bytes as ASCII, ignoring the decode errors; every non-ASCII character is dropped.
Mar 10, 2018 at 10:01	comment	added	cordelia		Thank you @Jens, but how do I do this in Python 3.6? Everything seems to be done in Python 2.7.
Mar 10, 2018 at 9:58	comment	added	Jens		@cordelia, that’s unlikely to work, because the original string contains the emoji 🙂, which cannot be represented in plain ASCII. Take a look at this or this question on converting the Unicode string `original_tweet` into a plain ASCII string.
Mar 10, 2018 at 9:56	comment	added	cordelia		I think ASCII is the format.
Mar 10, 2018 at 9:51	comment	added	Jens		@cordelia Change the encoding format to what?
Mar 10, 2018 at 9:50	comment	added	cordelia		I need to transform the data and change the encoding format.
Mar 10, 2018 at 9:49	comment	added	Jens		@cordelia what are you trying to achieve? Every string (`str`) in Python 3 is already a Unicode string.
Mar 10, 2018 at 9:48	comment	added	cordelia		How do I tweak this for Python 3.6, then? Should I put a `u` in front of the `original_tweet` string literal?
Mar 10, 2018 at 9:48	history	rollback	jonrsharpe		Rollback to Revision 2
Mar 10, 2018 at 9:48	comment	added	cordelia		Thanks a lot for your comments. This has put me on the right track and prevented me from going round in circles.
Mar 10, 2018 at 9:47	comment	added	Ulrich Eckhardt		I believe that the code on that website was written for Python 2. There, a regular string (without `u` prefix) is a byte sequence, which can be decoded.
Mar 10, 2018 at 9:44	comment	added	phihag		@cordelia That website's code does not make any sense. If your `original_tweet` value is a character string already, there's no need to encode or decode it. If it's a byte string (i.e. a `bytes` object), `decode` it once to get a character string.
Mar 10, 2018 at 9:43	history	suggested	baduker	CC BY-SA 3.0	Minor style improvements.
Mar 10, 2018 at 9:43	comment	added	Jens		You cannot use `str.encode()` and `bytes.decode()` to handle the HTML entities `&lt;` and `&amp;`, if that’s what you’re trying to do. Look into libraries such as lxml (see Parsing HTML with lxml) for that, given that you import an HTML parser. However, your string `original_tweet` isn’t proper HTML, so you may consider fudging that first...
Mar 10, 2018 at 9:41	comment	added	cordelia		This is the website I am following, and I am unable to understand what is going on: analyticsvidhya.com/blog/2014/11/…
Mar 10, 2018 at 9:39	comment	added	Ulrich Eckhardt		You seem to be confused about what a string in Python represents and what encoding or decoding does. Encoding turns a string into bytes; decoding does the opposite. In that light, your call doesn't make sense, and hence it also fails.
Mar 10, 2018 at 9:38	review	Suggested edits		(completed Mar 10, 2018 at 9:43)
Mar 10, 2018 at 9:38	history	edited	jonrsharpe	CC BY-SA 3.0	deleted 17 characters in body
Mar 10, 2018 at 9:36	history	asked	cordelia	CC BY-SA 3.0
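As a minimal Python 3 sketch of what Ulrich Eckhardt and phihag describe in the comments: `str.encode()` turns text into a `bytes` object, `bytes.decode()` turns bytes back into text, and a `str` has no `decode()` method at all, so Python 2 style code that calls `.decode()` on a text string fails. The sample text is an assumption, a shortened piece of the tweet quoted in the comments, written with escape sequences so the exact characters are unambiguous.

```python
# In Python 3, str holds Unicode text and bytes holds encoded data.
# Sample text: "you’re awsm 🙂" (curly apostrophe U+2019, emoji U+1F642).
text = "you\u2019re awsm \U0001F642"

# Encoding turns text into bytes using a chosen codec.
data = text.encode("utf-8")
print(type(data).__name__, data)

# Decoding turns those bytes back into text; decode once, with the same codec.
round_trip = data.decode("utf-8")
print(round_trip == text)  # True

# str has no decode() method in Python 3, so text.decode(...) would
# raise AttributeError.
print(hasattr(text, "decode"))  # False
```

A `str` value that is already text needs neither call; you only decode when you actually hold `bytes`.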
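The two steps discussed in the comments can be combined in Python 3 roughly as follows. This sketch assumes the tweet arrives with its HTML entities still escaped (`&lt;`, `&amp;`). It uses the documented standard-library function `html.unescape()` (Python 3.4+) and then applies the lossy ASCII conversion Jens suggested. The sample string is reconstructed from the comments and is illustrative only.

```python
import html

# Assumed input: tweet text with HTML entities still escaped.
original_tweet = ("I luv my &lt;3 iphone &amp; you\u2019re awsm apple. "
                  "DisplayIsAwesome, sooo happppppy \U0001F642 apple.com")

# Step 1: turn entities such as &lt; and &amp; back into characters.
tweet = html.unescape(original_tweet)

# Step 2 (lossy): keep only ASCII, silently dropping every other character,
# including the curly apostrophe and the emoji.
ascii_tweet = tweet.encode("utf-8").decode("ascii", errors="ignore")
print(ascii_tweet)
```

The second step turns "you’re" into "youre" and leaves a double space where the emoji was; this is the data loss Jens warned about.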
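cordelia's follow-up asks how to keep the apostrophe in you’re. The character in the tweet is U+2019 (right single quotation mark), not the ASCII apostrophe, so an ASCII-only conversion drops it and produces "youre". One option is to map such typographic characters to ASCII look-alikes with `str.translate()` before discarding whatever non-ASCII remains. The mapping table `ASCII_PUNCT` and the `to_ascii` helper are hypothetical and were hand-picked for illustration; extend them for the characters your data actually contains.

```python
# Hypothetical hand-picked table: typographic punctuation -> ASCII.
ASCII_PUNCT = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the one in you’re)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})


def to_ascii(text: str) -> str:
    """Map known punctuation to ASCII, then drop anything still non-ASCII."""
    mapped = text.translate(ASCII_PUNCT)
    return mapped.encode("ascii", errors="ignore").decode("ascii")


print(to_ascii("you\u2019re awsm \U0001F642"))
```

With this table, the apostrophe survives as `'`. Characters that have no entry, such as the emoji, are still dropped.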
