Linked Questions

54 questions linked to/from Replace non-ASCII characters with a single space

41 votes

1 answer

69k views

Removing unicode \u2026 like characters in a string in python2.7 [duplicate]

I have a string in Python 2.7 like this: This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying! How do I convert it to this: This is some text that has to be cleaned! its ...

Sandeep Raju Prabhakar

20.5k

asked Mar 10, 2013 at 10:17

2 votes

1 answer

25k views

Remove unicode characters python [duplicate]

I am pulling tweets in python using tweepy. It gives the entire data in type unicode. Eg: print type(data) gives me <type 'unicode'> It contains unicode characters in it. Eg: hello\u2026 im am ...

ashish1512

asked May 5, 2016 at 7:43

-2 votes

2 answers

2k views

Removing  from the text [duplicate]

I am converting a word file to text string using Python. The resultant text string has Bullet points (in word file) converted to  (in converted string). How can I remove this from the text string ...

Srinivasan A

asked Jun 30, 2016 at 10:23

0 votes

2 answers

668 views

Removing non-supported unicode characters using a list comprehension [duplicate]

I'm trying to write an algorithm to remove non-ASCII characters from a list of strings of text. I put together the list by scraping paragraphs from a web page and adding them to a list. To do this, I ...

pancham2016

asked Mar 7, 2021 at 5:08

0 votes

0 answers

62 views

how to turn characters in wrong codec into space in python? [duplicate]

I need to read data from an external source, which is from MS. As you know, MS likes to embed binary in simple text, so sometimes I run into trouble when I encounter an issue such as: UnicodeEncodeError: ...

Jason Hu

6,363

asked Jun 8, 2015 at 17:38

386 votes

16 answers

549k views

How to remove \xa0 from string in Python?

I am currently using Beautiful Soup to parse an HTML file and calling get_text(), but it seems like I'm being left with a lot of \xa0 Unicode representing spaces. Is there an efficient way to remove ...

zhuyxn

7,121

asked Jun 12, 2012 at 9:12

270 votes

15 answers

305k views

How to check if a string in Python is in ASCII?

I want to check whether a string is in ASCII or not. I am aware of ord(), however when I try ord('é'), I have TypeError: ord() expected a character, but string of length 2 found. I understood it is ...

Nico

2,729

asked Oct 13, 2008 at 0:13

193 votes

8 answers

338k views

NameError: global name 'unicode' is not defined - in Python 3

I am trying to use a Python package called bidi. In a module in this package (algorithm.py) there are some lines that give me error, although it is part of the package. Here are the lines: # utf-8 ? ...

TJ1

8,710

asked Nov 9, 2013 at 14:51

115 votes

16 answers

196k views

Stripping non printable characters from a string in python

I used to run $s =~ s/[^[:print:]]//g; on Perl to get rid of non printable characters. In Python there are no POSIX regex classes, and I can't write [:print:] having it mean what I want. I know of no ...

Vinko Vrsalovic

342k

asked Sep 18, 2008 at 13:17

139 votes

8 answers

263k views

How can I remove non-ASCII characters but leave periods and spaces?

I'm working with a .txt file. I want a string of the text from the file with no non-ASCII characters. However, I want to leave spaces and periods. At present, I'm stripping those too. Here's the code: ...

user1120342

asked Dec 31, 2011 at 18:23

112 votes

13 answers

234k views

How to make the python interpreter correctly handle non-ASCII characters in string operations?

I have a string that looks like so: 6Â 918Â 417Â 712 The clear cut way to trim this string (as I understand Python) is simply to say the string is in a variable called s, we get: s.replace('Â ', '') ...

adergaard

1,231

asked Aug 27, 2009 at 15:53

113 votes

7 answers

34k views

List comprehension without [ ] in Python [duplicate]

Joining a list: >>> ''.join([ str(_) for _ in xrange(10) ]) '0123456789' join must take an iterable. Apparently, join's argument is [ str(_) for _ in xrange(10) ], and it's a list ...

Alcott

18.8k

asked Jan 30, 2012 at 7:29

26 votes

6 answers

47k views

efficiently replace bad characters

I often work with utf-8 text containing characters like: \xc2\x99 \xc2\x95 \xc2\x85 etc These characters confuse other libraries I work with so need to be replaced. What is an ...

hoju

29.7k

asked Jul 7, 2011 at 11:31

15 votes

3 answers

49k views

Python - Unicode to ASCII conversion

I am unable to convert the following Unicode to ASCII without losing data: u'ABRA\xc3O JOS\xc9' I tried encode and decode and they won’t do it. Does anyone have a suggestion?

Adriano Almeida

5,396

asked Oct 22, 2013 at 20:05

4 votes

3 answers

18k views

Removing non-ascii characters in a csv file

I am currently inserting data in my django models using a csv file. Below is a simple save function that I am using: def save(self): myfile = file.csv data = csv.reader(myfile, delimiter=',', quotechar='"...

Njogu Mbau

asked Aug 29, 2013 at 22:43
