Linked Questions
54 questions linked to/from Replace non-ASCII characters with a single space
41
votes
1
answer
69k
views
Removing unicode \u2026 like characters in a string in python2.7 [duplicate]
I have a string in Python 2.7 like this:
This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!
How do I convert it to this?
This is some text that has to be cleaned! its ...
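A minimal sketch of one way to do this, assuming Python 3 (the question targets Python 2.7) and assuming the goal is simply to drop every character outside the ASCII range. The function name `strip_non_ascii` is made up for illustration.

```python
import re

# Matches one or more consecutive characters outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")

def strip_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character removed."""
    return NON_ASCII.sub("", text)

sample = "This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!"
print(strip_non_ascii(sample))
```

Note that `\u0027` is the plain ASCII apostrophe, so this sketch keeps it, and the space on each side of the removed `\u03c0` is left in place.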
2
votes
1
answer
25k
views
Remove unicode characters python [duplicate]
I am pulling tweets in Python using tweepy.
It returns all of the data as type unicode.
Eg: print type(data) gives me <type 'unicode'>
It contains unicode characters in it.
Eg: hello\u2026 im am ...
-2
votes
2
answers
2k
views
Removing from the text [duplicate]
I am converting a word file to text string using Python. The resultant text string has Bullet points (in word file) converted to (in converted string). How can I remove this from the text string ...
0
votes
2
answers
668
views
Removing non-supported unicode characters using a list comprehension [duplicate]
I'm trying to write an algorithm to remove non-ASCII characters from a list of strings of text. I put together the list by scraping paragraphs from a web page and adding them to a list. To do this, I ...
0
votes
0
answers
62
views
how to turn characters in wrong codec into space in python? [duplicate]
I need to read data from an external source, which comes from MS.
As you know, MS likes to embed binary data in plain text, so sometimes I run into trouble when I encounter an issue like this:
UnicodeEncodeError: ...
386
votes
16
answers
549k
views
How to remove \xa0 from string in Python?
I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove ...
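As a rough sketch, assuming Python 3 and assuming the non-breaking spaces should become ordinary spaces rather than disappear, a plain `str.replace` is often enough. The helper name `replace_nbsp` is hypothetical.

```python
def replace_nbsp(text: str) -> str:
    """Replace each non-breaking space (U+00A0) with an ordinary space."""
    return text.replace("\xa0", " ")

print(replace_nbsp("Dear\xa0Parent,\xa0welcome"))
```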
270
votes
15
answers
305k
views
How to check if a string in Python is in ASCII?
I want to check whether a string is in ASCII or not.
I am aware of ord(), however when I try ord('é'), I get TypeError: ord() expected a character, but string of length 2 found. I understood it is ...
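One possible check, sketched here for Python 3 text strings (the error quoted above comes from a Python 2 byte string, so this is an assumption about the target version): try to encode the string as ASCII and see whether that fails. The name `is_ascii` is chosen for illustration.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character in text is in the ASCII range."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

print(is_ascii("hello"))
print(is_ascii("\u00e9"))
```

On Python 3.7 and later, the built-in `str.isascii()` method gives the same answer without the try/except.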
193
votes
8
answers
338k
views
NameError: global name 'unicode' is not defined - in Python 3
I am trying to use a Python package called bidi. In a module in this package (algorithm.py) there are some lines that give me an error, although they are part of the package.
Here are the lines:
# utf-8 ? ...
115
votes
16
answers
196k
views
Stripping non printable characters from a string in python
I used to run
$s =~ s/[^[:print:]]//g;
in Perl to get rid of non-printable characters.
In Python there are no POSIX regex classes, and I can't write [:print:] having it mean what I want. I know of no ...
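A hedged sketch of one Python 3 alternative: filter characters with `str.isprintable()`. This is not an exact equivalent of POSIX `[:print:]`, since it follows Unicode character categories rather than the locale, and it treats tabs and newlines as non-printable. The function name `strip_non_printable` is made up for this example.

```python
def strip_non_printable(text: str) -> str:
    """Keep only the characters that str.isprintable() accepts."""
    return "".join(ch for ch in text if ch.isprintable())

print(strip_non_printable("abc\x00def\x07 ghi"))
```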
139
votes
8
answers
263k
views
How can I remove non-ASCII characters but leave periods and spaces?
I'm working with a .txt file. I want a string of the text from the file with no non-ASCII characters. However, I want to leave spaces and periods. At present, I'm stripping those too. Here's the code:
...
112
votes
13
answers
234k
views
How to make the python interpreter correctly handle non-ASCII characters in string operations?
I have a string that looks like so:
6Â 918Â 417Â 712
The clear-cut way to trim this string (as I understand Python) is simply the following, assuming the string is in a variable called s:
s.replace('Â ', '')
...
113
votes
7
answers
34k
views
List comprehension without [ ] in Python [duplicate]
Joining a list:
>>> ''.join([ str(_) for _ in xrange(10) ])
'0123456789'
join must take an iterable.
Apparently, join's argument is [ str(_) for _ in xrange(10) ], and it's a list ...
26
votes
6
answers
47k
views
efficiently replace bad characters
I often work with utf-8 text containing characters like:
\xc2\x99
\xc2\x95
\xc2\x85
etc
These characters confuse other libraries I work with, so they need to be replaced.
What is an ...
15
votes
3
answers
49k
views
Python - Unicode to ASCII conversion
I am unable to convert the following Unicode to ASCII without losing data:
u'ABRA\xc3O JOS\xc9'
I tried encode and decode and they won’t do it.
Does anyone have a suggestion?
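One common approach, sketched here for Python 3 and not taken from the question itself, is to decompose accented letters with `unicodedata.normalize` and then drop whatever cannot be encoded as ASCII. This keeps the base letters but still discards the accents, so it is lossy in that sense. The name `to_ascii` is hypothetical.

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Approximate text in ASCII by stripping combining marks."""
    # NFKD splits characters such as U+00C9 into a base letter plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything still outside ASCII (including the combining accents) is dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("ABRA\xc3O JOS\xc9"))  # ABRAAO JOSE
```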
4
votes
3
answers
18k
views
Removing non-ascii characters in a csv file
I am currently inserting data into my Django models using a CSV file. Below is a simple save function that I am using:
def save(self):
myfile = file.csv
data = csv.reader(myfile, delimiter=',', quotechar='"...