The goal of a Rosetta Stone Challenge is to write solutions in as many languages as possible. Show off your programming multilingualism!
The Challenge
We've done run-length encoding be fore but only considered runs of a single character. Of course, we can sometimes make even greater savings if we consider runs of multiple characters.
Take aaaxyzxyzxyzdddd for instance. This can be compressed to 3a3{xyz}4d. Your task is to write a program of function which, given a case-sensitive string of letters and spaces, compresses it optimally using run-length encoding for multi-character runs.
You may take input via function argument, STDIN or ARGV and return the result or print it to STDOUT.
Runs must not be nested, so aaabbbaaabbb must be 3a3b3a3b, not 2{3a3b}. I.e. the string should be encoded (or decoded) in a single pass.
Consequences of Optimal Compression
Some examples where naive application of run-length encoding might lead to suboptimal results:
ababmust not be "compressed" to2{ab}aaaabcdeabcdemust not be compressed to4abcdeabcdebut3a2{abcde}instead.
If there are two optimal versions (e.g. aa and 2a or abcabc and 2{abc}) either result is fine.
Examples
Input Output
aa aa -or- 2a
aaaaaAAAAA 5a5A
ababa ababa
abababa a3{ba} -or- 3{ab}a
foo foo bar 2{foo }bar
aaaabcdeabcde 3a2{abcde}
xYzxYz xYzxYz -or- 2{xYz}
abcabcdefcdef abcab2{cdef}
pppqqqpppqqq 3p3q3p3q
pppqqqpppqqqpppqqq 3{pppqqq}
Scoring
Each language is a separate competition as to who can write the shortest entry, but the overall winner would be the person who wins the most of these sub-competitions. This means that a person who answers in many uncommon languages can gain an advantage. Code golf is mostly a tiebreaker for when there is more than one solution in a language: the person with the shortest program gets credit for that language.
If there is a tie, the winner would be the person with the most second-place submissions (and so on).
Rules, Restrictions, and Notes
Please keep all of your different submissions contained within a single answer.
Also, no shenanigans with basically the same answer in slightly different language dialects. I will be the judge as to what submissions are different enough.
Current Leaderboard
This section will be periodically updated to show the number of languages and who is leading in each.
- C# (265) — edc65
- JavaScript (206) — edc65
- Python (214) — Will
- VB.NET (346) — edc65
Current User Rankings
- edc65 (3)
- Will (1)
2 Answers 2
Python 214
def t(s):
b=s;l=len;r=range(1,l(s))
for i in r:
p=s[:i];b=min(b,p+t(s[i:]),key=l);x=1
for j in r:k=i*j+i;x*=p==s[i*j:k];b=min(b,(b,''.join((`j+1`,"{"[i<2:],p,"}"[i<2:],t(s[k:]))))[x],key=l)
return b
(second level indent is tab)
As this is code-golf, this is the naive recursive approach without early-exit, so its really really slow.
JavaScript (E6) 206
Almost sure it is optimal. I start trying to encode the longest string with the max gain, and recursively encode what remains at left and right. After each try I keep the shortest result.
Golf notes (JS specific)
- Pseudo arguments instead of locals (for recursion)
- Compare lengths using out of bounds index that gives 'undefined'
R=(s,n=s,d=s.length>>1,w,i,j,c)=>{
for(;d;--d){
for(i=-1;s[d+d+i++];){
w=s.slice(i,j=i+d);
for(r=1;w==s.substr(j,d);j+=d)++r;
c=d>1?r+'{'+w+'}':r+w,
// c[d*r-1]|| /* uncomment this line (+10 bytes) to have faster response
n[(c=R(s.slice(0,i))+c+R(s.slice(j))).length]&&(n=c)
}
}
return n
}
Test In FireFox/FireBug console
;['aa','aaaaaAAAAA','ababa','abababa','foo foo bar', 'aaaabcdeabcde',
'xYzxYz', 'abcabcdefcdef', 'pppqqqpppqqq','pppqqqpppqqqpppqqq']
.forEach(x=>console.log(x+' -> '+R(x)))
Output
aa -> aa
aaaaaAAAAA -> 5a5A
ababa -> ababa
abababa -> 3{ab}a
foo foo bar -> 2{foo }bar
aaaabcdeabcde -> 3a2{abcde}
xYzxYz -> xYzxYz
abcabcdefcdef -> abcab2{cdef}
pppqqqpppqqq -> 3p3q3p3q
pppqqqpppqqqpppqqq -> 3{pppqqq}
C# (.NET 4.0) 265
Exactly the same algorithm
string R(string s)
{
string n=s,c,w;
int l=s.Length,d=l/2,i,j,r;
for(;d>0;--d)
{
for(i=0;d+d+i<=l;i++)
{
w=s.Substring(i,d);
for(j=i+d,r=1;j+d<=l&&w==s.Substring(j,d);j+=d)++r;
c=d>1?r+"{"+w+"}":r+w;
// if(c.Length<d*r){ // uncomment this line for faster execution
c=R(s.Substring(0,i))+c+R(s.Substring(j));
n=c.Length<n.Length?c:n;
//} // and this one of course
}
}
return n;
}
Test in LinqPad, mode 'C# program', add a main after the R function
void Main()
{
string[] test = {"aa","aaaaaAAAAA","ababa","abababa","foo foo bar","aaaabcdeabcde",
"xYzxYz","abcabcdefcdef","pppqqqpppqqq","pppqqqpppqqqpppqqq"};
test.Select(x => new { x, r=R(x) }).Dump("Result");
}
Same Output
VB.Net (.NET 4) 346 (probably)
Same algorithm and same library, but the language is different enough
I can't be sure about byte count, as Visual Studio keep reformatting the code and adding spaces.
Golf notes: better use something else
function R(s as string)
dim n=s,c,w
dim l=s.Length,i,j,d
for d=l2円 to 1 step-1
for i=0 to l-d-d
w=s.Substring(i,d)
j=i+d
r=1
do while (j+d<=l andalso w=s.Substring(j,d))
r=r+1
j=j+d
loop
c=if(d>1,r & "{" & w & "}",r & w)
if(c.Length<d*r) then
c=R(s.Substring(0,i))+c+R(s.Substring(j))
if c.Length<n.Length then n=c
end if
next
next
return n
end function
-
\$\begingroup\$ You can undo the adding of spaces. If you type a character that makes it reformat hit ctrl-z to just undo the formatting. It means two characters instead of one every so often but it's better than removing all the spaces and linefeeds afterward. \$\endgroup\$Jerry Jeremiah– Jerry Jeremiah2014年09月23日 03:45:21 +00:00Commented Sep 23, 2014 at 3:45
Explore related questions
See similar questions with these tags.