If a string T of length K appears K or more times in a string S, then it is potentially communistic. For example, 10 in 10/10 is potentially communistic, for it appears 2 times and is of length 2. Note that these substrings cannot overlap.
A communistic transformation is one that takes this string T and moves each character ti of T to the i occurrence of T in S. So, for the previous example, the communistic transformation would yield 1/0; the first char of 10 replaces 10 the first time 10 is found, and 0 the second time.
A communistic normalization is a function that takes all such strings T with K ≥ 2 and performs a communistic transformation on them.
Some specifics on the algorithm:
- Perform communistic transformations on the longest valid strings T first. Favor the first occurrences of T.
- Then, perform communistic transformations on the next-longest strings, then the next-next-longest... etc.
- Repeat until no such strings exist in the string.
Note that some strings, such as the "Hello, Hello" example in the test cases, can be interpreted two different ways. You can use ell for T, but you can also use llo. In this case, your code can choose either option. The shown test case uses llo, but you may get a different and equally valid output.
Your task is to implement communistic normalization. The input will only ever consist of printable ASCII characters (0x20 to 0x7E, space to tilde). You may write a program or function to solve this task; the input may be taken as a line from STDIN, string/character array argument, argument from ARGV, etc.
Test cases
'123' -> '123'
'111' -> '111'
'1111' -> '11'
'ABAB' -> 'AB'
'111111111' -> '111'
'asdasdasd' -> 'asd'
'10/10' -> '1/0'
'100/100+100' -> '1/0+0'
' + + ' -> ' + '
'Hello, hello, dear fellow!' -> 'Hel he, dear feow!' OR 'Heo hl, dear flow!'
'11122333/11122333/11122333' -> '112/13' OR '132/23'
'ababab1ababab' -> 'a1bab'
'1ab2ab3ab4ab5ab6' -> '1a2b3a4b5ab6'
Worked out test case
Format is 'string', 'substring', at each step of replacement. Replaced bits are bracketed.
'11[122]333/11[122]333/11[122]333', '122'
'111[333]/112[333]/112[333]', '333'
'1113/11[23]/11[23]', '23'
'11[13]/112/1[13]', '13'
'1[11]/[11]2/13', '11'
'1[/1]12[/1]3', '/1'
'112/13', ''
Another test case:
'Hello, hello, dear fellow!', 'llo'
'Hel, hel, dear feow!', 'l,'
'Hel he, dear feow!', ''
Reference code (Python)
You may find this useful to visualize the algorithm.
#!/usr/bin/env python
import re
def repeater(string):
def repeating_substring(substring):
return (string.count(substring) == len(substring)) and string.count(substring) > 1
return repeating_substring
def get_substrings(string):
j = 1
a = set()
while True:
for i in range(len(string) - j+1):
a.add(string[i:i+j])
if j == len(string):
break
j += 1
return list(a)
def replace_each_instance(string, substring):
assert `string`+',', `substring`
for i in substring:
string = re.sub(re.escape(substring), i, string, 1)
return string
def main(s):
repeats = repeater(s)
repeating_substr = filter(repeater(s), get_substrings(s))
while repeating_substr:
repeating_substr.sort(lambda x,y: cmp(len(y), len(x)))
s = replace_each_instance(s, repeating_substr[0])
repeating_substr = filter(repeater(s), get_substrings(s))
return s
assert main('123') == '123'
assert main('111') == '111'
assert main('1111') == '11'
assert main('ABAB') == 'AB'
assert main('111111111') == '111'
assert main('asdasdasd') == 'asd'
assert main('10/10') == '1/0'
assert main('100/100+100') == '1/0+0'
assert main(' + + ') == ' + '
assert main('Hello, hello, dear fellow!') == 'Hel he, dear feow!'
assert main('11122333/11122333/11122333') == '112/13'
Thanks to @ConorO'Brien for posting the original idea of this challenge.
3 Answers 3
JavaScript (ES6), 121 bytes
f=(s,j=2,o,m=s.match(`(.{${j}})(.*\1円){${(j-1)}}`))=>m?f(s,j+1,s.split(m[1]).map((e,i)=>e+(m[1][i]||'')).join``):o?f(o):s
Recursively matches the pattern:
(.{2})(.*1円){1} //2 characters, repeated 1 time
(.{3})(.*1円){2} //3 characters, repeated 2 times
(.{4})(.*1円){3} //4 characters, repeated 3 times
etc.
... until the pattern isn't found. (This guarantees that the longest string is handled first.)
It then performs the "communistic transformations" on the last-found pattern, by splitting on the match, and joining on each of the match's characters. (map is used for this purpose. Too bad join doesn't take a callback.)
It finally recurses on this new string until it is no longer communistic.
Test cases:
f=(s,j=2,o,m=s.match(`(.{${j}})(.*\1円){${(j-1)}}`))=>m?f(s,j+1,s.split(m[1]).map((e,i)=>e+(m[1][i]||'')).join``):o?f(o):s
console.log(f('123')); //123
console.log(f('111')); //111
console.log(f('1111')); //11
console.log(f('ABAB')); //AB
console.log(f('111111111')); //111
console.log(f('asdasdasd')); //asd
console.log(f('10/10')); //1/0
console.log(f('100/100+100')); //1/0+0
console.log(f(' + + ')); // +
console.log(f('Hello, hello, dear fellow!')); //Heo hl, dear flow!
console.log(f('11122333/11122333/11122333')); //132/23
Pyth, 22 bytes
u.xse.iLcGdf>cGTlTt#.:
To actually see what the program is doing, check out this:
In particular, the program always uses the final-occuring replacement of the longest replacements.
Explanation:
u.xse.iLcGdf>cGTlTt#.:
u.xse.iLcGdf>cGTlTt#.:G)GQ Implicit
u Q Starting with the input, repeat the following
until a fixed point is reached.
.:G) Construct all substrings of the current value
ordered smallest to largest, front to back.
t# Filter on having more than 1 element.
These are the eligible substrings.
f Filter these substrings on
cGT Chop the current value on the substring,
> lT Then remove the first len(substring) pieces.
The result is nonempty if the substring is
one we're looking for.
Chopping gives nonoverlapping occurrences.
.iL Interlace the substrings with
cGd Chop the current value on that substring
se Take the final result, make it a string.
.x G If there weren't any, the above throws an error,
So keep the current value to halt.
Clean, (削除) 420 (削除ここまで) ... 368 bytes
import StdEnv,StdLib
l=length
%q=any((==)q)
?_[]=[]
?a[(y,x):b]|isPrefixOf a[x:map snd b]=[y: ?a(drop(l a-1)b)]= ?a b
$s=sortBy(\a b=l a>l b)(flatten[[b: $a]\\(a,b)<-map((flip splitAt)s)[0..l s-1]])
f s#i=zip2[0..]s
#r=filter(\b=l(?b i)>=l b&&l b>1)($s)
|r>[]#h=hd r
#t=take(l h)(?h i)
=f[if(%n t)(h!!hd(elemIndices n t))c\\(n,c)<-i|not(%n[u+v\\u<-t,v<-[1..l h-1]])]=s
-
-
\$\begingroup\$ @Riker interesting, since it's a direct port of the reference solution. I'll delete until it's fixed. \$\endgroup\$Οurous– Οurous2018年01月05日 20:41:12 +00:00Commented Jan 5, 2018 at 20:41
-
\$\begingroup\$ @Riker fixed now. \$\endgroup\$Οurous– Οurous2018年01月06日 01:10:01 +00:00Commented Jan 6, 2018 at 1:10
ababab1ababab,1ab2ab3ab4ab5ab6\$\endgroup\$aboccurs at least twice in both strings. \$\endgroup\$