Linked Questions

41 votes
1 answer
69k views

I have a string in Python 2.7 like this, This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying! How do I convert it to this, This is some text that has to be cleaned! its ...
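A minimal sketch, assuming Python 3 rather than the question's Python 2.7: encode with `errors="ignore"` to drop every non-ASCII code point, then collapse the double space left where the π was. It keeps the apostrophe, because `\u0027` is plain ASCII. The helper name `strip_non_ascii` is hypothetical.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside ASCII (code points 0-127)."""
    # errors="ignore" silently skips characters the ASCII codec cannot encode.
    return text.encode("ascii", errors="ignore").decode("ascii")


sample = "This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!"
cleaned = strip_non_ascii(sample)
# Removing the pi leaves two adjacent spaces; collapse runs of whitespace.
tidy = " ".join(cleaned.split())
print(tidy)  # This is some text that has to be cleaned! it's annoying!
```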
2 votes
1 answer
25k views

I am pulling tweets in Python using tweepy. It gives the entire data in type unicode. E.g.: print type(data) gives me <type 'unicode'>. It contains unicode characters in it. E.g.: hello\u2026 im am ...
-2 votes
2 answers
2k views

I am converting a word file to text string using Python. The resultant text string has Bullet points (in word file) converted to  (in converted string). How can I remove this from the text string ...
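One hedged approach, assuming the bullet comes through as a private-use code point (Word's Symbol-font bullet often shows up as U+F0B7, but check with `repr()` on your own text): drop every character whose Unicode general category is `Co`. The helper name `strip_private_use` is hypothetical.

```python
import unicodedata


def strip_private_use(text: str) -> str:
    """Remove characters in the Unicode 'Co' (private use) category."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


converted = "\uf0b7 First item\n\uf0b7 Second item"  # made-up sample
print(repr(strip_private_use(converted)))  # ' First item\n Second item'
```

The space that followed each bullet survives; strip it per line if it matters.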
0 votes
2 answers
668 views

I'm trying to write an algorithm to remove non-ASCII characters from a list of strings of text. I put together the list by scraping paragraphs from a web page and adding them to a list. To do this, I ...
0 votes
0 answers
62 views

I need to read data from an external source, which is from MS. As you know, MS likes to embed binary to simple text, so sometimes I run into trouble when I encounter such issue: UnicodeEncodeError: ...
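`UnicodeEncodeError` is raised when text is encoded with a codec (often ASCII) that cannot represent some character. Python's codec error handlers let you pick a substitution instead of crashing; the sample string below is made up for illustration.

```python
text = "na\u00efve \u2013 caf\u00e9"  # 'naïve – café'

results = {}
for handler in ("ignore", "replace", "backslashreplace", "xmlcharrefreplace"):
    # Each handler substitutes unencodable characters differently.
    results[handler] = text.encode("ascii", errors=handler)
    print(f"{handler:>17}: {results[handler]!r}")

try:
    text.encode("ascii")  # default errors="strict"
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # ordinal not in range(128)
```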
386 votes
16 answers
549k views

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove ...
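A sketch assuming the text has already been extracted as a Python 3 `str` (the sample string stands in for `get_text()` output). `\xa0` is U+00A0 NO-BREAK SPACE; it can be replaced directly, or folded to a plain space by NFKC normalization.

```python
import unicodedata

text = "Price:\xa0100\xa0USD"  # stand-in for extracted text

# Option 1: target the no-break space explicitly.
replaced = text.replace("\xa0", " ")

# Option 2: NFKC normalization maps U+00A0 (and other compatibility
# characters) to their plain equivalents.
normalized = unicodedata.normalize("NFKC", text)

print(replaced)    # Price: 100 USD
print(normalized)  # Price: 100 USD
```

NFKC also rewrites other characters, such as the ligature "ﬁ" to "fi" and full-width letters to ASCII, so `replace` is the narrower choice if only `\xa0` should change.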
270 votes
15 answers
305k views

I want to check whether a string is in ASCII or not. I am aware of ord(), however when I try ord('é'), I get TypeError: ord() expected a character, but string of length 2 found. I understood it is ...
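The `ord('é')` error is likely Python 2 behaviour: in a UTF-8 source file, the byte-string literal `'é'` is two bytes. In Python 3 a `str` is a sequence of code points, so `ord()` works per character. A minimal sketch (the name `is_ascii` is hypothetical):

```python
def is_ascii(s: str) -> bool:
    """Return True if every code point is in the ASCII range 0-127."""
    return all(ord(ch) < 128 for ch in s)


print(is_ascii("cafe"), is_ascii("caf\u00e9"))  # True False
print(ord("\u00e9"))                             # 233
# Python 3.7+ also provides the built-in str.isascii().
print("caf\u00e9".isascii())                     # False
```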
193 votes
8 answers
338k views

I am trying to use a Python package called bidi. In a module in this package (algorithm.py) there are some lines that give me error, although it is part of the package. Here are the lines: # utf-8 ? ...
115 votes
16 answers
196k views

I used to run $s =~ s/[^[:print:]]//g; in Perl to get rid of non-printable characters. In Python there are no POSIX regex classes, and I can't write [:print:] having it mean what I want. I know of no ...
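Two sketches of the same idea in Python 3. `str.isprintable()` gives a Unicode-aware filter, while a byte-range regex reproduces Perl's `[:print:]` for the ASCII (C locale) case, where printable means 0x20 through 0x7E. The names `strip_unprintable` and `ASCII_UNPRINTABLE` are hypothetical.

```python
import re


def strip_unprintable(text: str) -> str:
    # str.isprintable() is False for Unicode "Other" and "Separator"
    # characters (controls, format, private use, ...), except ASCII space.
    return "".join(ch for ch in text if ch.isprintable())


# ASCII-only analogue of s/[^[:print:]]//g: keep space (0x20) to tilde (0x7E).
ASCII_UNPRINTABLE = re.compile(r"[^\x20-\x7e]")

sample = "abc\x00\x07def\tghi\u00e9"
print(strip_unprintable(sample))          # abcdefghié
print(ASCII_UNPRINTABLE.sub("", sample))  # abcdefghi
```

Both variants also remove tabs and newlines, which are control characters; add them back to the allowed set if needed.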
139 votes
8 answers
263k views

I'm working with a .txt file. I want a string of the text from the file with no non-ASCII characters. However, I want to leave spaces and periods. At present, I'm stripping those too. Here's the code: ...
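Spaces and periods are themselves ASCII, so a pure non-ASCII filter would keep them. If they are being stripped, the filter is probably narrower than intended. A hedged sketch of an explicit allowlist, assuming the goal is "ASCII letters, digits, spaces and periods only" (`ALLOWED` and `keep_allowed` are hypothetical names):

```python
import string

# Hypothetical allowlist: ASCII letters, digits, space and period.
ALLOWED = frozenset(string.ascii_letters + string.digits + " .")


def keep_allowed(text: str) -> str:
    """Keep only characters in ALLOWED; everything else is dropped."""
    return "".join(ch for ch in text if ch in ALLOWED)


print(keep_allowed("Caf\u00e9 au lait. 3\u20ac, please!"))  # Caf au lait. 3 please
```

For a file, read it as text first, e.g. `open(path, encoding="utf-8")`, with the encoding matching the file.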
112 votes
13 answers
234k views

I have a string that looks like so: 6Â 918Â 417Â 712 The clear cut way to trim this string (as I understand Python) is simply to say the string is in a variable called s, we get: s.replace('Â ', '') ...
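The `Â ` pattern is typical mojibake. A UTF-8 encoded no-break space (bytes `C2 A0`) was decoded as Latin-1, giving `Â` followed by U+00A0. A sketch assuming exactly that (the character after `Â` is U+00A0, not an ordinary space) reverses the bad decode instead of string-matching:

```python
garbled = "6\u00c2\u00a0918\u00c2\u00a0417\u00c2\u00a0712"  # displays as 6Â 918Â ...

# Undo the wrong decode: Latin-1 maps each of these characters back to one
# byte, recovering the original UTF-8 bytes (b"\xc2\xa0" is U+00A0).
repaired = garbled.encode("latin-1").decode("utf-8")

number = repaired.replace("\u00a0", "")
print(number)  # 6918417712
```

If the character after `Â` is an ordinary space, `decode("utf-8")` raises `UnicodeDecodeError` and the plain `replace` approach from the question applies. The cleaner fix is to decode the original bytes as UTF-8 in the first place.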
113 votes
7 answers
34k views

Joining a list: >>> ''.join([ str(_) for _ in xrange(10) ]) '0123456789' join must take an iterable. Apparently, join's argument is [ str(_) for _ in xrange(10) ], and it's a list ...
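`str.join` accepts any iterable of strings, so a generator expression works as well as a list. When the generator is the sole argument, it needs no extra parentheses. Shown in Python 3, where `range` replaces `xrange`:

```python
# All three produce the same string; join only needs an iterable of str.
from_list = "".join([str(i) for i in range(10)])
from_gen = "".join(str(i) for i in range(10))
from_map = "".join(map(str, range(10)))

print(from_list, from_gen, from_map)  # 0123456789 0123456789 0123456789
```

In CPython, `str.join` materializes its argument into a sequence internally, so the list-comprehension form is not wasteful and is often marginally faster.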
26 votes
6 answers
47k views

I often work with UTF-8 text containing characters like: \xc2\x99 \xc2\x95 \xc2\x85 etc. These characters confuse other libraries I work with, so they need to be replaced. What is an ...
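`\xc2\x99`, `\xc2\x95` and `\xc2\x85` are the UTF-8 encodings of the C1 control characters U+0099, U+0095 and U+0085. A common cause, assumed here, is Windows-1252 text (where bytes 0x99, 0x95 and 0x85 are ™, • and …) being decoded as Latin-1 upstream. This sketch maps each C1 code point to its cp1252 character and deletes the few bytes cp1252 leaves undefined. The names `C1_TO_CP1252` and `fix_c1` are hypothetical.

```python
def _build_c1_table() -> dict:
    """Map C1 control code points U+0080-U+009F to their cp1252 characters."""
    table = {}
    for cp in range(0x80, 0xA0):
        try:
            table[cp] = bytes([cp]).decode("cp1252")
        except UnicodeDecodeError:
            table[cp] = None  # byte undefined in cp1252: delete it
    return table


C1_TO_CP1252 = _build_c1_table()


def fix_c1(text: str) -> str:
    return text.translate(C1_TO_CP1252)


raw = b"Brand\xc2\x99 \xc2\x95 item\xc2\x85"
text = raw.decode("utf-8")  # 'Brand\x99 \x95 item\x85'
print(fix_c1(text))         # Brand™ • item…
```

To delete these characters instead of repairing them, map every C1 code point to `None`.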
15 votes
3 answers
49k views

I am unable to convert the following Unicode to ASCII without losing data: u'ABRA\xc3O JOS\xc9' I tried encode and decode and they won’t do it. Does anyone have a suggestion?
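Strictly, converting to ASCII cannot be lossless, but a common approximation strips diacritics. NFKD normalization splits `Ã` into `A` plus a combining tilde, and encoding to ASCII with `errors="ignore"` then drops the combining marks. A sketch in Python 3 (`to_ascii_approx` is a hypothetical name):

```python
import unicodedata


def to_ascii_approx(text: str) -> str:
    """Strip accents; characters with no ASCII base letter are dropped."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(to_ascii_approx("ABRA\xc3O JOS\xc9"))  # ABRAAO JOSE
print(to_ascii_approx("Stra\u00dfe"))        # Strae (no decomposition for ß)
```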
4 votes
3 answers
18k views

I am currently inserting data into my Django models using a CSV file. Below is a simple save function that I am using: def save(self): myfile = file.csv data = csv.reader(myfile, delimiter=',', quotechar='"...
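In Python 3 the `csv` module works on `str`, so decoding happens when the file is opened. Pass an explicit `encoding` and `newline=""`, as the `csv` docs recommend. A sketch using an in-memory stand-in for the file (the file name in the comment is hypothetical, and the Django model code is not shown in the question):

```python
import csv
import io

# Stand-in for a real file; on disk you would use
# open("data.csv", newline="", encoding="utf-8") instead.
fake_file = io.StringIO('name,city\n"Jos\u00e9","S\u00e3o Paulo"\n', newline="")

reader = csv.reader(fake_file, delimiter=",", quotechar='"')
rows = list(reader)
print(rows)  # [['name', 'city'], ['José', 'São Paulo']]
```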
