8
\$\begingroup\$

The following function censors a single word of choice in a sentence with asterisks, regardless of how many times the word appears. How can I further simplify this code?

def censor(text,word):
  text_list = text.split()
  for i in range(0,len(text_list)):
    if text_list[i] == word:
      text_list[i] = '*' * len(text_list[i])
    else:
      continue
  return" ".join(text_list)
200_success
asked Aug 27, 2017 at 10:18
\$\endgroup\$
3
  • Why not use the replace method of strings? Commented Aug 27, 2017 at 10:51
  • Simple string replacement would be one option, and I would even think about regular expressions to cover special cases like different whitespace separators. Commented Aug 27, 2017 at 11:27
  • Your code does not deal with punctuation. For instance, if the word to be censored is at the end of a sentence, .split() will not separate the word from the punctuation mark, meaning it is not matched by your if-clause. Commented Aug 27, 2017 at 12:45

3 Answers

8
\$\begingroup\$

As mentioned in the comments, you could use str.replace() to replace all occurrences of a certain substring:

def censor(text, word):
    return text.replace(word, "*" * len(word))
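
One caveat with plain substring replacement, sketched below: str.replace() has no notion of word boundaries, so it also masks the word where it appears inside a longer word. The name censor_substring and the sample sentence are made up for illustration.

```python
def censor_substring(text, word):
    # str.replace() works on substrings, so "class" is also found inside "classic".
    return text.replace(word, "*" * len(word))


result = censor_substring("a classic class act", "class")
print(result)  # a *****ic ***** act
```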

To ensure that only entire words are replaced, though, you should use regular expressions:

import re


def censor(text, word):
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, "*" * len(word), text)

Unlike your original code, the regular expression solution also handles words that have punctuation attached.
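
A quick check of that claim, as a minimal sketch with a made-up sample sentence. Note that \b only behaves as a word boundary here when the censored word itself starts and ends with a letter, digit, or underscore, and that the match is case-sensitive.

```python
import re


def censor(text, word):
    # \b matches between a word character and a non-word character,
    # so "spam," and "spam." match while "spammy" does not.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, "*" * len(word), text)


sample = "spam, spam and spammy eggs. More spam."
print(censor(sample, "spam"))  # ****, **** and spammy eggs. More ****.
```

Because the text is never split and rejoined, the original whitespace is preserved as well.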


Some other things to think about:

  • Your else: continue statement can be omitted;
  • Your function does not have a docstring. If you plan on making it a public function (as part of an API, for example), you should explain what arguments it expects and what it returns. If you plan on using it internally, at the very least, provide a single-line docstring explaining what the function does.
  • range() assumes start is 0 if no explicit start is passed, thus you can change it to for i in range(len(text_list)):
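
Applied to the original loop, those three points might look like the sketch below. The docstring wording is only a suggestion.

```python
def censor(text, word):
    """Return text with each whitespace-separated occurrence of word masked by asterisks.

    Runs of whitespace in text are collapsed to single spaces.
    """
    text_list = text.split()
    for i in range(len(text_list)):  # start defaults to 0
        if text_list[i] == word:
            text_list[i] = "*" * len(word)
    # No else branch is needed: non-matching words are left as they are.
    return " ".join(text_list)


print(censor("some string some other string string", "string"))
```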
200_success
answered Aug 27, 2017 at 11:47
\$\endgroup\$
0
4
\$\begingroup\$

First things first: in Python, you can iterate over the words in the text together with their indexes by using enumerate. That is, you might want to do something like this:

def censor(text, word):
    text = text.split()
    for count, part in enumerate(text):
        if part == word:
            text[count] = '*' * len(part)
    return ' '.join(text)

As you can see, the else clause is not necessary anymore (it wasn't necessary before either) as we're only changing a word if the condition is met.

The above piece of code might be better re-written as a list comprehension:

def censor(list_text, word):
    return ' '.join(['*' * len(part)
                     if part == word else part
                     for count, part in enumerate(list_text)])

You can use it like this:

print(censor('some string some other string string'.split(), 'string'))

Output:

some ****** some other ****** ******
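
The comprehension above never uses the index from enumerate, and it expects the caller to pass an already-split list. A possible variant (a sketch, not the only way to write it) drops enumerate and keeps the split inside the function, so it takes a string and returns a string:

```python
def censor(text, word):
    """Mask every whitespace-separated token in text that equals word."""
    return " ".join("*" * len(part) if part == word else part
                    for part in text.split())


print(censor("some string some other string string", "string"))
# some ****** some other ****** ******
```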

Moreover, in Python, indentation should consist of 4 spaces (not 2), and each comma should be followed by a space. I'd also recommend you stick to a constant censored length and avoid recalculating how many * to insert for each match.
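
One way to read that last suggestion is to build the mask once, up front, and optionally let the caller fix its length. The mask parameter below is a hypothetical addition for illustration, not part of the original function.

```python
def censor(text, word, mask=None):
    """Replace each token equal to word with mask (default: one * per letter)."""
    if mask is None:
        # Computed once, outside the loop, instead of once per match.
        mask = "*" * len(word)
    return " ".join(mask if part == word else part for part in text.split())


print(censor("a bb a", "a"))               # default mask, same length as the word
print(censor("a bb a", "bb", mask="****"))  # fixed-length mask chosen by the caller
```

A fixed-length mask has the side effect of not revealing how long the censored word was.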

answered Aug 27, 2017 at 11:46
\$\endgroup\$
1
  • I feel like the .split should happen in the function, not in the calling code. Or you should return an array, rather than a string. Nothing in the function name suggests that it should change the type. Commented Aug 27, 2017 at 12:13
2
\$\begingroup\$

A more robust option would be to properly handle the natural language with nltk - tokenize a sentence into words, make replacements and detokenize back into a string:

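from nltk import word_tokenize
# Note: nltk.tokenize.moses was dropped from later NLTK releases (the Moses
# tools now live in the separate sacremoses package), so this import needs an
# older NLTK version.
from nltk.tokenize.moses import MosesDetokenizer


def censor(text, word):
    words = word_tokenize(text)
    replacement = "*" * len(word)
    words = [replacement if current_word == word else current_word
             for current_word in words]
    detokenizer = MosesDetokenizer()
    return detokenizer.detokenize(words, return_str=True)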

Demo:

In [1]: sentence = 'some string some other string string'
In [2]: censor(sentence, 'string')
Out[2]: 'some ****** some other ****** ******'
In [3]: sentence = 'The following function censors a single word of choice in a sentence with asterisks, regardless of how many times the word appears.'
In [4]: censor(sentence, 'word')
Out[4]: 'The following function censors a single **** of choice in a sentence with asterisks, regardless of how many times the **** appears.'

One of the bonus advantages of solving it with nltk, aside from simplicity of the solution and not worrying about punctuation, is that you can take it a step further and categorize/tag words and replace words based on an assigned tag or category.

answered Aug 27, 2017 at 13:19
\$\endgroup\$
