Communistic Substring Normalization

Question 1

If a string T of length K appears K or more times in a string S, then it is potentially communistic. For example, 10 in 10/10 is potentially communistic, for it appears 2 times and is of length 2. Note that these substrings cannot overlap.

A communistic transformation is one that takes this string T and moves each character t_i of T to the i occurrence of T in S. So, for the previous example, the communistic transformation would yield 1/0; the first char of 10 replaces 10 the first time 10 is found, and 0 the second time.

A communistic normalization is a function that takes all such strings T with K ≥ 2 and performs a communistic transformation on them.

Some specifics on the algorithm:

Perform communistic transformations on the longest valid strings T first. Favor the first occurrences of T.
Then, perform communistic transformations on the next-longest strings, then the next-next-longest... etc.
Repeat until no such strings exist in the string.

Note that some strings, such as the "Hello, Hello" example in the test cases, can be interpreted two different ways. You can use ell for T, but you can also use llo. In this case, your code can choose either option. The shown test case uses llo, but you may get a different and equally valid output.

Your task is to implement communistic normalization. The input will only ever consist of printable ASCII characters (0x20 to 0x7E, space to tilde). You may write a program or function to solve this task; the input may be taken as a line from STDIN, string/character array argument, argument from ARGV, etc.

Test cases

'123' -> '123'
'111' -> '111'
'1111' -> '11'
'ABAB' -> 'AB'
'111111111' -> '111'
'asdasdasd' -> 'asd'
'10/10' -> '1/0'
'100/100+100' -> '1/0+0'
' + + ' -> ' + '
'Hello, hello, dear fellow!' -> 'Hel he, dear feow!' OR 'Heo hl, dear flow!'
'11122333/11122333/11122333' -> '112/13' OR '132/23'
'ababab1ababab' -> 'a1bab'
'1ab2ab3ab4ab5ab6' -> '1a2b3a4b5ab6'

Worked out test case

Format is 'string', 'substring', at each step of replacement. Replaced bits are bracketed.

'11[122]333/11[122]333/11[122]333', '122'
'111[333]/112[333]/112[333]', '333'
'1113/11[23]/11[23]', '23'
'11[13]/112/1[13]', '13'
'1[11]/[11]2/13', '11'
'1[/1]12[/1]3', '/1'
'112/13', ''

Another test case:

'Hello, hello, dear fellow!', 'llo'
'Hel, hel, dear feow!', 'l,'
'Hel he, dear feow!', ''

Reference code (Python)

You may find this useful to visualize the algorithm.

#!/usr/bin/env python
import re
def repeater(string):
 def repeating_substring(substring):
 return (string.count(substring) == len(substring)) and string.count(substring) > 1
 return repeating_substring
def get_substrings(string):
 j = 1
 a = set()
 while True:
 for i in range(len(string) - j+1):
 a.add(string[i:i+j])
 if j == len(string):
 break
 j += 1
 return list(a)
def replace_each_instance(string, substring):
 assert `string`+',', `substring`
 for i in substring:
 string = re.sub(re.escape(substring), i, string, 1)
 return string
def main(s):
 repeats = repeater(s)
 repeating_substr = filter(repeater(s), get_substrings(s))
 while repeating_substr:
 repeating_substr.sort(lambda x,y: cmp(len(y), len(x)))
 s = replace_each_instance(s, repeating_substr[0])
 repeating_substr = filter(repeater(s), get_substrings(s))
 return s
assert main('123') == '123'
assert main('111') == '111'
assert main('1111') == '11'
assert main('ABAB') == 'AB'
assert main('111111111') == '111'
assert main('asdasdasd') == 'asd'
assert main('10/10') == '1/0'
assert main('100/100+100') == '1/0+0'
assert main(' + + ') == ' + '
assert main('Hello, hello, dear fellow!') == 'Hel he, dear feow!'
assert main('11122333/11122333/11122333') == '112/13'

Thanks to @ConorO'Brien for posting the original idea of this challenge.

Question 2

Test cases: ababab1ababab, 1ab2ab3ab4ab5ab6

Question 3

Why is there no change? ab occurs at least twice in both strings.

Question 4

@Zgarb looks like my tester is bad, I'll fix it later. Fixing the test cases manually though.

Question 5

JavaScript (ES6), 121 bytes

f=(s,j=2,o,m=s.match(`(.{${j}})(.*\1円){${(j-1)}}`))=>m?f(s,j+1,s.split(m[1]).map((e,i)=>e+(m[1][i]||'')).join``):o?f(o):s

Recursively matches the pattern:

(.{2})(.*1円){1} //2 characters, repeated 1 time 
(.{3})(.*1円){2} //3 characters, repeated 2 times 
(.{4})(.*1円){3} //4 characters, repeated 3 times 
etc.

... until the pattern isn't found. (This guarantees that the longest string is handled first.)

It then performs the "communistic transformations" on the last-found pattern, by splitting on the match, and joining on each of the match's characters. (map is used for this purpose. Too bad join doesn't take a callback.)

It finally recurses on this new string until it is no longer communistic.

Test cases:

f=(s,j=2,o,m=s.match(`(.{${j}})(.*\1円){${(j-1)}}`))=>m?f(s,j+1,s.split(m[1]).map((e,i)=>e+(m[1][i]||'')).join``):o?f(o):s
console.log(f('123')); //123
console.log(f('111')); //111
console.log(f('1111')); //11
console.log(f('ABAB')); //AB
console.log(f('111111111')); //111
console.log(f('asdasdasd')); //asd
console.log(f('10/10')); //1/0
console.log(f('100/100+100')); //1/0+0
console.log(f(' + + ')); // + 
console.log(f('Hello, hello, dear fellow!')); //Heo hl, dear flow!
console.log(f('11122333/11122333/11122333')); //132/23

Question 6

Pyth, 22 bytes

u.xse.iLcGdf>cGTlTt#.:

Test suite

To actually see what the program is doing, check out this:

Internals

In particular, the program always uses the final-occuring replacement of the longest replacements.

Explanation:

u.xse.iLcGdf>cGTlTt#.:
u.xse.iLcGdf>cGTlTt#.:G)GQ Implicit
u Q Starting with the input, repeat the following
 until a fixed point is reached.
 .:G) Construct all substrings of the current value
 ordered smallest to largest, front to back.
 t# Filter on having more than 1 element.
 These are the eligible substrings.
 f Filter these substrings on
 cGT Chop the current value on the substring,
 > lT Then remove the first len(substring) pieces.
 The result is nonempty if the substring is
 one we're looking for. 
 Chopping gives nonoverlapping occurrences.
 .iL Interlace the substrings with
 cGd Chop the current value on that substring
 se Take the final result, make it a string.
 .x G If there weren't any, the above throws an error,
 So keep the current value to halt.

Question 7

Clean, (削除) 420 (削除ここまで) ... 368 bytes

import StdEnv,StdLib
l=length
%q=any((==)q)
?_[]=[]
?a[(y,x):b]|isPrefixOf a[x:map snd b]=[y: ?a(drop(l a-1)b)]= ?a b
$s=sortBy(\a b=l a>l b)(flatten[[b: $a]\\(a,b)<-map((flip splitAt)s)[0..l s-1]])
f s#i=zip2[0..]s
#r=filter(\b=l(?b i)>=l b&&l b>1)($s)
|r>[]#h=hd r
#t=take(l h)(?h i)
=f[if(%n t)(h!!hd(elemIndices n t))c\\(n,c)<-i|not(%n[u+v\\u<-t,v<-[1..l h-1]])]=s

Try it online!

Question 8

This answer is invalid. See here. That should be changed, see the test cases.

Question 9

@Riker interesting, since it's a direct port of the reference solution. I'll delete until it's fixed.

Question 10

@Riker fixed now.

Rick Hitchcock 2,57112 silver badges18 bronze badges · Accepted Answer · 2017-12-18 22:51:52Z

JavaScript (ES6), 121 bytes

f=(s,j=2,o,m=s.match(`(.{${j}})(.*\1円){${(j-1)}}`))=>m?f(s,j+1,s.split(m[1]).map((e,i)=>e+(m[1][i]||'')).join``):o?f(o):s

Recursively matches the pattern:

(.{2})(.*1円){1} //2 characters, repeated 1 time 
(.{3})(.*1円){2} //3 characters, repeated 2 times 
(.{4})(.*1円){3} //4 characters, repeated 3 times 
etc.

... until the pattern isn't found. (This guarantees that the longest string is handled first.)

It then performs the "communistic transformations" on the last-found pattern, by splitting on the match, and joining on each of the match's characters. (map is used for this purpose. Too bad join doesn't take a callback.)

It finally recurses on this new string until it is no longer communistic.

Test cases:

f=(s,j=2,o,m=s.match(`(.{${j}})(.*\1円){${(j-1)}}`))=>m?f(s,j+1,s.split(m[1]).map((e,i)=>e+(m[1][i]||'')).join``):o?f(o):s
console.log(f('123')); //123
console.log(f('111')); //111
console.log(f('1111')); //11
console.log(f('ABAB')); //AB
console.log(f('111111111')); //111
console.log(f('asdasdasd')); //asd
console.log(f('10/10')); //1/0
console.log(f('100/100+100')); //1/0+0
console.log(f(' + + ')); // + 
console.log(f('Hello, hello, dear fellow!')); //Heo hl, dear flow!
console.log(f('11122333/11122333/11122333')); //132/23

Stack Exchange Network

Communistic Substring Normalization

Test cases

Worked out test case

Reference code (Python)

3 Answers 3

JavaScript (ES6), 121 bytes

Pyth, 22 bytes

Clean, (削除) 420 (削除ここまで) ... 368 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Communistic Substring Normalization

Test cases

Worked out test case

Reference code (Python)

3 Answers 3

JavaScript (ES6), 121 bytes

Pyth, 22 bytes

Clean, (削除) 420 (削除ここまで) ... 368 bytes

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions