I have a Python script that connects to the Twitter Firehose and sends data downstream for processing. It was working fine before, but now I'm trying to get only the text body. (This is not a question about how to extract data from Twitter or how to encode/decode ASCII characters.) When I launch my script directly like this:

python -u fetch_script.py

it works fine, and I can see the messages printed to the screen. For example:

root@domU-xx-xx-xx-xx:/usr/local/streaming# python -u fetch_script.py 
Cuz I'm checking you out >on Facebook<
RT @SearchlightNV: #BarryLies👳🎌 has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A...
"Why do men chase after women? Because they fear death."~Moonstruck
RT @SearchlightNV: #BarryLies👳🎌 has crapped on all honest patriotic hard-working citizens in the USA but his abuse of WWII Vets is sick #2A...
Never let anyone tell you not to chase your dreams. My sister came home crying today, because someone told her she's not good enough.
"I can't even ask anyone out on a date because if it doesn't end up in a high speed chase, I get bored."
RT @ColIegeStudent: Double-checking the attendance policy while still in bed
Well I just handed my life savings to ya.. #trustingyou #abouttomakebankkkkk
Zillow $Z and Redfin useless to Wells Fargo Home Mortgage, $WFC, and FannieMae $FNM. Sale history LTV now 48%, $360 appraisal fee 4 no PMI.
The latest Dump and Chase Podcast http://somedomain.com/viaRSA9W3i check it out and subscribe on iTunes, or your favorite android app #Isles

But if I redirect the output to a file like this:

python -u fetch_script.py >fetch_output.txt

it immediately throws an error:

root@domU-xx-xx-xx-xx:/usr/local/streaming# python -u fetch_script.py >fetch_output.txt
ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
 File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
 callback(*args)
 File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
 raise_exc_info(exc)
 File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
 ret = fn(*args, **kwargs)
 File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
 self.parse_response(response)
 File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
 self._callback(response)
 File "fetch_script.py", line 57, in callback
 print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)
ERROR:tornado.application:Exception in callback <functools.partial object at 0x187c2b8>
Traceback (most recent call last):
 File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 458, in _run_callback
 callback()
 File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
 raise_exc_info(exc)
 File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
 ret = fn(*args, **kwargs)
 File "/usr/local/lib/python2.7/dist-packages/tornado/iostream.py", line 341, in wrapper
 callback(*args)
 File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 331, in wrapped
 raise_exc_info(exc)
 File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 302, in wrapped
 ret = fn(*args, **kwargs)
 File "/usr/local/streaming/twitter-stream.py", line 203, in parse_json
 self.parse_response(response)
 File "/usr/local/streaming/twitter-stream.py", line 226, in parse_response
 self._callback(response)
 File "fetch_script.py", line 57, in callback
 print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128)

P.S.

A little more context:

The error occurs in the callback function:

def callback(self, message):
    if message:
        msg = message
        msg_props = pika.BasicProperties()
        msg_props.content_type = 'application/text'
        msg_props.delivery_mode = 2
        #print self.count
        print msg['text']
        #self.count += 1
        ...

However, if I remove ['text'] and leave only print msg, both cases work like a charm.

Rob Bednark
asked Oct 2, 2013 at 19:24
  • You get the same problem with a simple script: print u'\u2026', so don't worry about adding context! The problem is that Python sets up an output encoding when you write to a terminal but not when you write to a file. I'm not sure what the current best practice for fixing it is, and I'm interested in the answers. Commented Oct 2, 2013 at 19:48
  • That is a good point, I'll have to google it. But why don't I have problems when I write the whole payload to the file, as I explained in the P.S. section? Commented Oct 2, 2013 at 19:51
  • That's because you printed the string representation of the dict. print {'text':u'\2026'} outputs {'text': u'\x826'}, that is, it's printing an ASCII view of the escaped unicode character. Commented Oct 2, 2013 at 20:00
  • Possible duplicate of Setting the correct encoding when piping stdout in Python Commented Jan 10, 2016 at 23:15
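
To illustrate the point about the dict's string representation: Python 2's repr() of a unicode string contains only ASCII escape sequences, which is why print msg survives the redirect while print msg['text'] does not. A minimal sketch of that idea, using the unicode_escape codec as a stand-in for repr() so that it behaves the same under Python 2 and 3 (the dict and variable names are just for illustration):

```python
# Sample payload containing U+2026 (horizontal ellipsis), as in the traceback.
msg = {'text': u'\u2026'}

# Escaped view of the value, similar to what Python 2's repr() produces.
escaped = msg['text'].encode('unicode_escape')

# The escaped form is pure ASCII, so decoding it with the ascii codec succeeds.
escaped_as_text = escaped.decode('ascii')
```

The escaped form is six ASCII characters (a backslash, "u" and four hex digits), so nothing in it can trip the ascii codec.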

1 Answer

Since nobody's jumped in yet, here's my shot. Python sets stdout's encoding when writing to a console but not when writing to a file. This script reproduces the problem:

import sys
msg = {'text':u'\2026'}  # '\2026' is the octal escape \202 followed by '6', i.e. u'\x826'
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
print msg['text']

Running the above shows the error:

$ python bad.py>/tmp/xxx
default encoding: None
Traceback (most recent call last):
 File "fix.py", line 5, in <module>
 print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\x82' in position 0: ordinal not in range(128)
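
As far as I can tell, when sys.stdout.encoding is None, Python 2's print falls back to the default codec (normally ascii), so the failing print is roughly equivalent to an explicit encode call. A minimal sketch of that, written without print so it behaves the same under Python 2 and 3; the sample string is just the ellipsis character from the question's traceback:

```python
# U+2026 (horizontal ellipsis), the character named in the question's traceback.
text = u'\u2026'

# Encoding with the ascii codec fails for any code point above 127.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding with UTF-8 succeeds and yields a three-byte sequence.
utf8_bytes = text.encode('utf-8')
```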

Adding the encoding to the above script:

import sys
msg = {'text':u'\2026'}
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
encoding = sys.stdout.encoding or 'utf-8'
print msg['text'].encode(encoding)

and the problem is solved:

$ python good.py >/tmp/xxx
default encoding: None
$ cat /tmp/xxx
6
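
If you'd rather not call .encode() at every print, another option is to wrap the output stream once with a codecs writer. This is only a sketch, assuming UTF-8 is what you want in the file; fake_stdout is a hypothetical in-memory stand-in for the redirected stdout so the example is self-contained and runs under both Python 2 and 3:

```python
import codecs
import io

# Hypothetical stand-in for a redirected stdout: a byte stream with no encoding.
fake_stdout = io.BytesIO()

# Wrap the byte stream once; every unicode string written is encoded as UTF-8.
writer = codecs.getwriter('utf-8')(fake_stdout)
writer.write(u'tweet text that ends in an ellipsis\u2026\n')

# The underlying stream now holds UTF-8 bytes instead of raising an error.
written = fake_stdout.getvalue()
```

In a Python 2.7 script the same wrapper is commonly applied to the real stream with sys.stdout = codecs.getwriter('utf-8')(sys.stdout). Setting the PYTHONIOENCODING environment variable to utf-8 before launching the script should have a similar effect without touching the code.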
Rob Bednark
answered Oct 2, 2013 at 20:45

4 Comments

Man you ROCK! thank you so much! )) I was breaking my head how to do this.
That's not trivial at all :) Thanks. You saved me a lot of time with your answer.
This has been really useful.
Thanks, this works in another context, where I'm using ASCII color codes.
