This is my first Python program in years, it's just a simple string manipulation program. Given a string, determine if it contains repetitions of any substring. For example, testtest
contains two substring test
and nothing else.
def onlySubstrings(stringOrig):
stringCopy = (stringOrig+'.')[:-1] # copy string contents
for i in range(1, len(stringCopy)): # for each character
stringCopy = stringCopy[1:]+stringCopy[0] # shift first char to end
if stringCopy == stringOrig: # if equivalent to original input
return True # it consists of substrings only
else: # otherwise the only substring is the string itself
return False
if __name__ == '__main__':
wordList = ['testtest','testteste','test','tetestst'] # some test words
print('The below words only contain substring repetitions:')
for word in wordList: # for each word
print(onlySubstrings(word), '\t', word)
If you imagine the string as circular unto itself, the string can only contain equal substring(s) if you can move the characters to the left or right and get the same original string in less than len
shifts:
testtest - 0 shifts, original string
esttestt - 1 shift, not original
sttestte - 2 shifts, not original
ttesttes - 3 shifts, not original
testtest - 4 shifts, original again
Since 4 < len
, it contains only substring repetitions.
testteste - 0 shifts, original string
esttestet - 1 shift, not original
...
etesttest - (len-1) shifts, not original
testteste - len shifts, original again
Since we couldn't find the original string again in under len
shifts, it does not only contain repetitions of a substring.
4 Answers 4
The code is relatively concise and easy to read. Here are some points that could be improved:
- A general thing is the amount of commenting, you should generally resort to comments only when it's hard to make the code self-explanatory. You don't really need to comment each line of the code.
stringCopy = (stringOrig+'.')[:-1] # copy string contents
You can make a copy this way insteadstringCopy = stringOrig[:]
.To make the code more readable, you can create a function that shifts a string n characters. This will really help make the code more readable.
The
else
keyword after thefor
loop is not needed since you return anyway if you didn't finish the loop.else
is usually used withbreak
in thefor
loop.The name of the function could be slightly improved, maybe
consists_of_repeated_substrings
. This is longer but is very easy to understand.A note on performance: your algorithm creates a copy of the original string in each iteration. This works fine for shorter strings, but if you wanna make it more efficient you can consider matching the string without making a shifted copy of it.
Incorporating these comments you could rewrite the code as follows:
def shifted_string(str, n):
return str[n:] + str[:n]
def consists_of_repeated_substrings(str):
'''
Checks whether a string consists of a substring repeated multiple times.
Ex. 'testtesttest' is the string 'test' repeated 3 times.
'''
# The string can only contain equal substrings if you can shift it n
# times (0 < n < len) and get the same original string.
for i in range(1, len(str)):
if shifted_string(str, i) == str:
return True
return False
A clever algorithm for doing the same thing is as follows:
double_str = (str + str)[1:-1]
return double_str.find(str) != -1
The idea is similar to yours, so I will leave understanding it as an exercise.
-
\$\begingroup\$ That is quite clever and an impressive truncated-to-one-line. \$\endgroup\$gator– gator2019年05月12日 06:59:18 +00:00Commented May 12, 2019 at 6:59
This can be shortened to a one-liner using regular expressions:
import re
def isMadeFromRepeatedSubstrings(text):
return re.search(r'^(.+?)1円+$', text)
The returned object will evaluate true or false, as in the original, and the substring itself is accessible via .groups(1)
:
>>> isMadeFromRepeatedSubstrings("testtest").groups(1)
('test',)
-
1\$\begingroup\$ This is exactly what I thought of when I read the question title (under HNQ) but I would go with
(.+?)
instead of(.+)
. It will make a difference if the input is something like "TestTestTestTest". \$\endgroup\$41686d6564– 41686d65642019年05月12日 15:46:40 +00:00Commented May 12, 2019 at 15:46 -
\$\begingroup\$ excellent point, edited \$\endgroup\$Oh My Goodness– Oh My Goodness2019年05月12日 16:26:54 +00:00Commented May 12, 2019 at 16:26
-
\$\begingroup\$ Please change your code to
snake_case
so people don't thinkcamelCase
is a widely accepted style in Python. \$\endgroup\$2019年05月12日 17:14:57 +00:00Commented May 12, 2019 at 17:14
My impression is that the code seems good: no suggestions at that level,
other than a very minor one. Python's for-else
structure is a bit of
an oddball: I never use it and a almost never see it used. More to
the point, it adds no clarity in this specific case. Just return False
outside the loop.
Regarding the algorithm, however, I do have a suggestion. It seems fairly low level (in the sense of mucking around with character shifting and copying), not super easy to explain, and it did not seem intuitive at first glance to me.
A different approach is to rely on string multiplication: if the
full string equals exactly N copies of a substring, then return True
.
You just need to loop over all possible substring lengths. For example:
def onlySubstrings(orig):
orig_len = len(orig)
max_len = int(orig_len / 2)
for i in range(1, max_len + 1):
substr = orig[0:i]
n_copies = int(orig_len / i)
if orig == substr * n_copies:
return True
return False
On top of the other great answers, here are a few additional comments.
Style
There is a Style Guide for Python code called PEP 8 and I'd recommend reading it and trying to follow it more or less strictly. It your case, you could change the functions/variables names.
Better tests
Your test suite can be improved with a few simple details:
add the edge cases (strings of length 0 or 1) - also, these would need to be documented in the function docstring
add to your structure the expected output so that you can check the result automatically
You could write something like:
def unit_test_only_substring():
# test cases - list of (input, expected_output)
tests = [
# Edge cases
('', False), # To be confirmed
('t', False), # To be confirmed
# No repetition
('ab', False),
('tetestst', False),
('testteste', False),
# Repetition
('tt', True),
('ttt', True),
('testtest', True),
('testtesttest', True),
]
for str_input, expected_out in tests:
out = onlySubstrings(str_input)
print(out, '\t', expected_out, '\t', str_input)
assert out == expected_out
if __name__ == '__main__':
unit_test_only_substring()
You could go further and use a proper unit-test framework.
More code improvement
(Based on Hesham Attia)
This is great chance to learn about Python builtins all
and any
:
def onlySubstrings(string):
# The string can only contain equal substrings if you can shift it n
# times (0 < n < len) and get the same original string.
return any(shiftedString(string, i) == string for i in range(1, len(string)))
len-1
times, rather I need to shiftceil(len/2)
times. Saves a little time. \$\endgroup\$i
does not evenly dividelen(str)
since doing so would attempt to match partials which we know won't be true \$\endgroup\$