Function to censor single word in a sentence

Question 1

The following function censors a single word of choice in a sentence with asterisks, regardless of how many times the word appears. How can I further simplify this code?

def censor(text,word):
  text_list = text.split()
  for i in range(0,len(text_list)):
    if text_list[i] == word:
      text_list[i] = '*' * len(text_list[i])
    else:
      continue
  return" ".join(text_list)

Question 2

Why not use the replace method of strings?

Question 3

Simple string replacement would be one option and I would even think about regular expressions to cover special cases like different whitespace separators.
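
To illustrate the whitespace point, here is a minimal sketch (the name censor_keep_whitespace is hypothetical, not from this thread). It splits on runs of whitespace with a capturing group, so tabs, newlines and double spaces survive instead of being collapsed to single spaces by split() and join():

```python
import re


def censor_keep_whitespace(text, word):
    """Censor exact matches of `word`, preserving the original whitespace."""
    # A capturing group makes re.split() keep the separators in the result.
    parts = re.split(r"(\s+)", text)
    return "".join("*" * len(p) if p == word else p for p in parts)


print(repr(censor_keep_whitespace("one  two\tone\nthree", "one")))
# → '***  two\t***\nthree'
```

This still compares whole whitespace-delimited chunks, so it shares the punctuation limitation of the original function.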

Question 4

Your code does not deal with punctuation. For instance, if the word to be censored is at the end of a sentence, .split() will not separate the word from the punctuation mark, so it is not matched by your if-clause.
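
A quick way to see the problem, using a condensed version of the function from the question (same logic, just shorter):

```python
def censor(text, word):
    # Same logic as the function in the question, condensed.
    return " ".join("*" * len(w) if w == word else w for w in text.split())


print(censor("I like spam and more spam.", "spam"))
# → I like **** and more spam.
# The last token is "spam." (with the full stop), so it is never equal to "spam".
```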

Daniel (4,612 reputation; 2 gold, 18 silver, 40 bronze badges) · Answer 1 · 2017-08-27 11:47:18Z

As mentioned in the comments, you could use str.replace() to replace all occurrences of a certain substring:

def censor(text, word):
    return text.replace(word, "*"*len(word))

To ensure that only entire words are replaced, though, you should use regular expressions:

import re

def censor(text, word):
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, "*"*len(word), text)

Unlike your original code, the regular expression solution also handles words that have punctuation attached.
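
To make the difference concrete, here is a small side-by-side comparison. The names censor_replace and censor_regex are only there to keep the two variants apart; the bodies are the two functions above.

```python
import re


def censor_replace(text, word):
    # Plain substring replacement: also hits "cat" inside longer words.
    return text.replace(word, "*" * len(word))


def censor_regex(text, word):
    # \b anchors the match at word boundaries, so only whole words match.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, "*" * len(word), text)


sentence = "The cat sat; concatenate the cat."
print(censor_replace(sentence, "cat"))  # → The *** sat; con***enate the ***.
print(censor_regex(sentence, "cat"))    # → The *** sat; concatenate the ***.
```

One caveat: \b only matches between a word character and a non-word character, so a term that itself starts or ends with punctuation (for example "c++") would need a different boundary check.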

Some other things to think about:

Your else: continue statement can be omitted.
Your function does not have a docstring. If you plan on making it a public function (as part of an API, for example), you should explain what arguments it expects and what it returns. If you plan on using it internally, at the very least, provide a single-line docstring explaining what the function does.
range() starts at 0 when no explicit start is passed, so you can write for i in range(len(text_list)): instead.
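
Applied to the original loop, those three points might look roughly like this (a sketch only; the docstring wording is mine, so adjust it to your needs):

```python
def censor(text, word):
    """Return `text` with each whitespace-separated occurrence of `word`
    replaced by asterisks of the same length."""
    text_list = text.split()
    for i in range(len(text_list)):  # start defaults to 0
        if text_list[i] == word:     # no else/continue needed
            text_list[i] = "*" * len(word)
    return " ".join(text_list)
```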

score 4 · Answer 2 · 2017-08-27 11:46:02Z

First things first: in Python, you can iterate over the words in the text together with their indexes by using enumerate. That is, you might want to do something like this:

def censor(text, word):
    text = text.split()
    for count, part in enumerate(text):
        if part == word:
            text[count] = '*' * len(part)
    return ' '.join(text)

As you can see, the else clause is not necessary anymore (it wasn't necessary before either) as we're only changing a word if the condition is met.

The above piece of code might be better re-written as a list comprehension:

def censor(list_text, word):
    return ' '.join(['*' * len(part) if part == word else part
                     for part in list_text])

You can use it like this:

print(censor('some string some other string string'.split(), 'string'))

Output:

some ****** some other ****** ******

Also, in Python the indentation should consist of 4 spaces (not 2), as PEP 8 recommends, and each comma should be followed by a space. I'd also recommend you stick to a constant censored length rather than calculating each time how many * to put in there.
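
One possible reading of the constant-length suggestion is a fixed mask that does not depend on the censored word at all. In the sketch below, MASK and the mask parameter are hypothetical names, and the length of 5 is arbitrary:

```python
MASK = "*" * 5  # hypothetical fixed-length mask; the length is arbitrary


def censor(text, word, mask=MASK):
    """Replace each whitespace-separated occurrence of `word` with `mask`."""
    return " ".join(mask if part == word else part for part in text.split())


print(censor("some string some other string string", "string"))
# → some ***** some other ***** *****
```

A side effect of a fixed mask is that the output no longer reveals how long the censored word was.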

I feel like the .split should happen in the function, not in the calling code. Or you should return an array, rather than a string. Nothing in the function name suggests that it should change the type.
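
Both options from this comment could be sketched as follows. The name censor_words is made up for the list-in, list-out variant; neither function is from the answer itself.

```python
def censor(text, word):
    """Take a string and return a string; splitting is an internal detail."""
    return " ".join("*" * len(part) if part == word else part
                    for part in text.split())


def censor_words(words, word):
    """Take a list of words and return a list; joining is left to the caller."""
    return ["*" * len(part) if part == word else part for part in words]


print(censor("some string some other string string", "string"))
# → some ****** some other ****** ******
print(censor_words(["some", "string"], "string"))
# → ['some', '******']
```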

alecxe (17.5k reputation; 8 gold, 52 silver, 93 bronze badges) · Answer 3 · 2017-08-27 13:19:40Z

A more robust option would be to properly handle the natural language with nltk - tokenize a sentence into words, make replacements and detokenize back into a string:

from nltk import word_tokenize
# Note: later NLTK releases dropped nltk.tokenize.moses; the Moses
# detokenizer is now distributed in the separate sacremoses package.
from nltk.tokenize.moses import MosesDetokenizer

def censor(text, word):
    words = word_tokenize(text)
    replacement = "*" * len(word)
    words = [replacement if current_word == word else current_word
             for current_word in words]
    detokenizer = MosesDetokenizer()
    return detokenizer.detokenize(words, return_str=True)

Demo:

In [1]: sentence = 'some string some other string string'
In [2]: censor(sentence, 'string')
Out[2]: 'some ****** some other ****** ******'
In [3]: sentence = 'The following function censors a single word of choice in a sentence with asterisks, regardless of how many times the word appears.'
In [4]: censor(sentence, 'word')
Out[4]: 'The following function censors a single **** of choice in a sentence with asterisks, regardless of how many times the **** appears.'

One of the bonus advantages of solving it with nltk, aside from simplicity of the solution and not worrying about punctuation, is that you can take it a step further and categorize/tag words and replace words based on an assigned tag or category.
