The following function censors a single word of choice in a sentence with asterisks, regardless of how many times the word appears. How can I further simplify this code?
    def censor(text,word):
      text_list = text.split()
      for i in range(0,len(text_list)):
        if text_list[i] == word:
          text_list[i] = '*' * len(text_list[i])
        else:
          continue
      return" ".join(text_list)
As mentioned in the comments, you could use str.replace() to replace all occurrences of a certain substring:
    def censor(text, word):
        return text.replace(word, "*"*len(word))
To ensure that only entire words are replaced, though, you should use regular expressions:
    import re
    def censor(text, word):
        pattern = r"\b" + re.escape(word) + r"\b"
        return re.sub(pattern, "*"*len(word), text)
Unlike your original code, the regular expression solution also handles words that have punctuation attached.
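To make the difference concrete, here is a small side-by-side sketch of the two approaches. The names censor_replace and censor_regex and the sample sentence are made up for this illustration only.

```python
import re


def censor_replace(text, word):
    # Plain substring replacement: also hits the word inside longer words.
    return text.replace(word, "*" * len(word))


def censor_regex(text, word):
    # \b anchors the match at word boundaries, so only whole words match.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, "*" * len(word), text)


sentence = "A cat, a catalog and another cat."
print(censor_replace(sentence, "cat"))  # A ***, a ***alog and another ***.
print(censor_regex(sentence, "cat"))    # A ***, a catalog and another ***.
```

Note that the original split()-based version would leave this sentence unchanged, because the tokens it compares are "cat," and "cat." rather than "cat".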
Some other things to think about:
- Your else: continue statement can be omitted.
- Your function does not have a docstring. If you plan on making it a public function (as part of an API, for example), you should explain what arguments it expects and what it returns. If you plan on using it internally, at the very least, provide a single-line docstring explaining what the function does.
- range() assumes start is 0 if no explicit start is passed, so you can change the loop header to for i in range(len(text_list)):
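Applied to the original loop, the three points above might look roughly like this. The docstring wording is only a suggestion, not something taken from the question.

```python
def censor(text, word):
    """Return text with each whitespace-separated occurrence of word
    replaced by asterisks of the same length."""
    text_list = text.split()
    for i in range(len(text_list)):  # start defaults to 0
        if text_list[i] == word:
            text_list[i] = "*" * len(word)
        # no else/continue needed: the loop moves on by itself
    return " ".join(text_list)


print(censor("some string some other string string", "string"))
# some ****** some other ****** ******
```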
Well, first things first: you should know that in Python you can iterate over the words in the text and their indexes by using enumerate. That is, you might want to do something like this:
    def censor(text, word):
        text = text.split()
        for count, part in enumerate(text):
            if part == word:
                text[count] = '*' * len(part)
        return ' '.join(text)
As you can see, the else clause is not necessary anymore (it wasn't necessary before either) as we're only changing a word if the condition is met.
The above piece of code might be better re-written as a list comprehension:
    def censor(list_text, word):
        return ' '.join(['*' * len(part)
                         if part == word else part
                         for count, part in enumerate(list_text)])
You can use it like this:
    print(censor('some string some other string string'.split(), 'string'))
Output:
    some ****** some other ****** ******
Moreover, in Python, indentation should consist of 4 spaces (not 2), and you should put a space after each comma. I'd also recommend you stick to a constant censored length and avoid calculating each time how many * you should put in there.
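A minimal sketch of the constant-length idea, assuming a fixed five-asterisk mask (the MASK name and its length are arbitrary choices for illustration). Here the split happens inside the function, so it takes a string and returns a string:

```python
MASK = "*****"  # hypothetical fixed-length replacement


def censor(text, word, mask=MASK):
    """Replace each whitespace-separated occurrence of word with mask."""
    return " ".join(mask if part == word else part for part in text.split())


print(censor("some string some other string string", "string"))
# some ***** some other ***** *****
```

A fixed mask also avoids revealing the length of the censored word.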
- I feel like the .split should happen in the function, not in the calling code. Or you should return an array, rather than a string. Nothing in the function name suggests that it should change the type. (anon, Aug 27, 2017 at 12:13)
A more robust option would be to properly handle the natural language with nltk: tokenize a sentence into words, make replacements and detokenize back into a string:
    from nltk import word_tokenize
    # Note: nltk.tokenize.moses was removed in newer NLTK releases;
    # the Moses detokenizer is now distributed in the separate sacremoses package.
    from nltk.tokenize.moses import MosesDetokenizer
    def censor(text, word):
        words = word_tokenize(text)
        replacement = "*" * len(word)
        words = [replacement if current_word == word else current_word
                 for current_word in words]
        detokenizer = MosesDetokenizer()
        return detokenizer.detokenize(words, return_str=True)
Demo:
    In [1]: sentence = 'some string some other string string'
    In [2]: censor(sentence, 'string')
    Out[2]: 'some ****** some other ****** ******'
    In [3]: sentence = 'The following function censors a single word of choice in a sentence with asterisks, regardless of how many times the word appears.'
    In [4]: censor(sentence, 'word')
    Out[4]: 'The following function censors a single **** of choice in a sentence with asterisks, regardless of how many times the **** appears.'
One of the bonus advantages of solving it with nltk, aside from simplicity of the solution and not worrying about punctuation, is that you can take it a step further and categorize/tag words and replace words based on an assigned tag or category.
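As a rough illustration of category-based replacement, the sketch below uses a hand-written lookup table in place of a real tagger, so that it runs without nltk or any downloaded models. The CATEGORIES table, its labels and the censor_by_category name are all hypothetical; with nltk, the lookup would presumably be replaced by whatever tags the library assigns to each token.

```python
import re

# Hypothetical lookup table standing in for a real tagger or classifier.
CATEGORIES = {
    "darn": "profanity",
    "heck": "profanity",
    "alice": "name",
}


def censor_by_category(text, category, categories=CATEGORIES):
    """Mask every word whose lowercased form maps to the given category."""
    def mask(match):
        token = match.group(0)
        if categories.get(token.lower()) == category:
            return "*" * len(token)
        return token

    # \w+ picks out word tokens; punctuation and spacing are left untouched.
    return re.sub(r"\w+", mask, text)


print(censor_by_category("Well, darn it, Alice. What the heck?", "profanity"))
# Well, **** it, Alice. What the ****?
```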