Skip to main content
Stack Overflow
  1. About
  2. For Teams

You are not logged in. Your edit will be placed in a queue until it is peer reviewed.

We welcome edits that make the post easier to understand and more valuable for readers. Because community members review edits, please try to make the post substantially better than how you found it, for example, by fixing grammar or adding additional resources and hyperlinks.

Required fields*

Replace non-ASCII characters with a single space

I need to replace all non-ASCII (\x00-\x7F) characters with a space. I'm surprised that this is not dead-easy in Python, unless I'm missing something. The following function simply removes all non-ASCII characters:

def remove_non_ascii_1(text):
 return ''.join(i for i in text if ord(i)<128)

And this one replaces non-ASCII characters with the amount of spaces as per the amount of bytes in the character code point (i.e. the character is replaced with 3 spaces):

def remove_non_ascii_2(text):
 return re.sub(r'[^\x00-\x7F]',' ', text)

How can I replace all non-ASCII characters with a single space?

Of the myriad of similar SO questions, none address character replacement as opposed to stripping, and additionally address all non-ascii characters not a specific character.

Answer*

Draft saved
Draft discarded
Cancel
5
  • 2
    Though this is rather inelegant, it is very readable. Thank you. Commented Aug 21, 2016 at 6:28
  • 1
    +1 for unicode handling... @dotancohen IMNSHO "readable" implies "practical" which adds to "elegant", so i'd say "a bit inelegant" Commented Oct 1, 2016 at 0:00
  • notional -1 for calling non-ascii characters "trash" Commented Feb 10, 2022 at 21:30
  • @axolotl I meant no offense. If I recall correctly when I was writing it I was indeed dealing with very weird characters that are not from any alphabet. Commented Feb 10, 2022 at 23:01
  • 1
    I know :) it's a light hearted comment Commented Feb 10, 2022 at 23:02

lang-py

AltStyle によって変換されたページ (->オリジナル) /