Objective
Write a routine that accepts a string of printable ASCII characters, s, and returns a string containing the same characters as s, reordered so that no two-character substring appears more than once. The program must process all benchmark strings (see below) in under one minute each on a modern computer. I will also award a special bonus of 50 rep to the lowest scoring answer that processes any valid 30-character string in under one minute.
For example, given the input Mississippi
, a valid output would be issiMspiips
(no two-character substrings appear twice), while an invalid output would be ipMsispiiss
(since the substring is
appears twice).
The routine may take the form of:
- a complete program reading from
stdin
(or equivalent) or the command line, and outputting tostdout
(or equivalent) - a function accepting a single string argument and returning a string
You may assume that the input string always admits at least one valid output.
The Challenge
Your routine must comprise 5 or more lines of code separated by newlines. Empty lines (which includes lines containing only whitespace) are ignored in all contexts and do not count towards total line count.
Swapping any two lines in your source code must produce a fatal error. By "fatal error", we refer to any of the following conditions:
- the source code fails to compile, with the compiler/interpreter declaring a fatal error
- the routine aborts with a runtime fatal error or an unhandled runtime exception
- the routine is forced into an abrupt, abnormal program termination that produces no output of any kind except for a possible error message and/or stack dump
Alternatively, contiguous blocks of code containing no newline characters may be used in place of lines. These blocks should each be displayed on their own line in the source file, with the understanding that newlines are stripped out before the source code is compiled/interpreted.
For example, the code
aaaa
bbbb
cccc
would condense to
aaaabbbbcccc
before being evaluated.
In this mode, the fatal error condition applies to swapping any two code blocks (and thus to swapping lines in the source code before newlines are stripped out). Hence in the above example the routines aaaaccccbbbb
, bbbbaaaacccc
, and ccccbbbbaaaa
must all produce fatal errors, either at compiletime or runtime.
Submissions using this alternative mode should declare its use.
Scoring
Let n be the number of non-empty textlines in your source file, with n ≥ 5. Let c be the number of bytes comprised by the longest textline (by byte length) in your source file, not counting any trailing newline.
A submission's score is given by c(n + 10).
The submission with the lowest score is the winner.
Best of luck. ;)
Benchmark Strings
Abracadabra Alacazam
Is Miss. Mississauga Missing?
Ask Alaska's Alaskans
GGGGAAAATTTTCCCCgggaaatttccc
A Man A Plan A Canal Panama
4 Answers 4
PHP, score = 289 (×ばつ(7+10))
PHP's built-in functions make it quite easy to do this badly. The following code just shuffles the string until a valid result is obtained:
function f($s){
while(preg_match(
'/(..).*?1円/',$s)
+preg_match('/(.'
.')1円1円/',$s))$s=
str_shuffle(
$s);return $s;}
Benchmarks
Average and maximum execution times calculated using the following code:
$s = 'Mississippi';
$t_total = 0;
$t_max = 0;
for ($i=0; $i<10; $i++) {
$t0 = microtime(true);
echo f($s);
$t = microtime(true) - $t0;
printf(" %10.7f\n",$t);
$t_total += $t;
if ($t > $t_max) $t_max = $t;
}
printf("Avg: %10.7f secs; Max: %10.7f secs\n",$t_total/$i, $t_max);
Results:
- Mississippi: Avg: 0.0002460 secs; Max: 0.0005491 secs
- Anticonstitutionnellement: Avg: 0.0001470 secs; Max: 0.0002971 secs
- Pneumonoultramicroscopicsilicovolcanoconiosis: Avg: 0.0587177 secs; Max: 0.1668079 secs
- Donaudampfschiffahrtselektrizitatenhauptbetriebswerkbauunterbeamtengesellschaft*: Avg: 9.5642390 secs; Max: 34.9904099 secs
- baaacadaeafbbcbdbebfccdcecfdde†: Avg: 5.0858626 secs; Max: 9.8927171 secs
* I changed ä
to a
to avoid multi-byte issues
† This was the hardest 30-character string I could come up with. It's actually the first 30 characters of the De Bruijn sequence for k='abcdef' and n=2, with the first 'b' moved to avoid an instant match.
-
5\$\begingroup\$ This doesn't really satisfy > The program must process any valid 30-character string in under one minute on a modern computer., considering the potentially infinite runtime. \$\endgroup\$Bob– Bob2014年11月10日 05:05:58 +00:00Commented Nov 10, 2014 at 5:05
-
\$\begingroup\$ @Bob I've added some benchmarks to the answer. The code may be inefficient, but the likelihood of it taking more than one minute to process a 30-character string is, I think, very small indeed. \$\endgroup\$r3mainer– r3mainer2014年11月10日 12:08:24 +00:00Commented Nov 10, 2014 at 12:08
Dyalog APL (11(5+10)=165)
f←{a←{⍴⍵}
b←a⌸2,/⍵
c←b⊢⍵[?⍨a⍵]
1∨.≠b:∇c⋄⍵
}
Proof:
- Lines 1 and 5 bound the function. Swapping any line for those would result in
⍵
occurring outside of a function, which is aSYNTAX ERROR
. - Line 2 defines
b
, so it cannot be swapped for lines3
or4
, which depend onb
. There would be aVALUE ERROR
. (And it obviously can't be swapped with1
or5
either.) - Line 3 defines
c
, so it cannot be swapped for line4
, which depends onc
. (And we've already proven no other line can be swapped with line3
.) - Line 4 depends on the variables from lines 2 and 3 and must therefore be last.
APL(Dyalog), 6(5+10) = 90
{1∧.=
+/∘.≡⍨
2,/⍵:⍵
⋄∇⍵[
?⍨⍴⍵]}
I'm using the alternative, so:
{1∧.=+/∘.≡⍨2,/⍵:⍵⋄∇⍵[?⍨⍴⍵]}
This is the same old algorithm.
Explanation
2,/⍵
gives an array of character pairs in the input string
+/∘.≡⍨
generates a numerical array of how many pairs does each pair equal (including itself)
1∧.=
checks if each element of that array equals 1, and logical AND the results together
:⍵
If that is true (1
), return the input string
∇⍵[?⍨⍴⍵]
else scramble the input string and do a recursive call
Swapping
If line 1 is swapped with line 2, then you end up with +/∘.≡⍨{...}
which is just a mess of functions and operators that gives SYNTAX ERROR
.
If line 1 is swapped with line 3 or 4, then you have ⍵
outside of a function definition, and that's a SYNTAX ERROR
.
If line 1 is swapped with line 5, unbalanced braces alone would cause SYNTAX ERROR
, so don't worry about the other 4 syntax errors.
If line 5 is swapped with line 2/3/4, again you have ⍵
outside of a function definition.(SYNTAX ERROR
)
If line 2 is swapped with line 3, you end up with 1∧.=2,/⍵:⍵
. This syntax is called a guard (think of it as a conditional). The guard condition must evaluate to 0
or 1
or a 1-element array of 0
or 1
. Here, it evaluates to something more complex than that (a scalar containing a 2-element array). So this is a DOMAIN ERROR
.
If line 2 is swapped with line 4, you get the statement 1∧.=
, which attempts to apply the function ∧.=
without the required left argument. (SYNTAX ERROR
).
If line 3 is swapped with line 4, again you get a mess of functions and operators (1∧.=+/∘.≡⍨
) so you get SYNTAX ERROR
.
Benchmarks
(numbers in milliseconds)
Abracadabra Alacazam
11 1 3 5 2
Avg: 4.4
Is Miss. Mississauga Missing?
1260 2000 222 117 111
Avg: 742
Ask Alaska's Alaskans
7 2 3 3 4
Avg: 3.8
GGGGAAAATTTTCCCCgggaaatttccc
31 15 24 13 11
Avg: 18.8
A Man A Plan A Canal Panama
377 2562 23 301 49
Avg: 662.4
I am still thinking about different splittings. Also I have a deterministic, systematic way of doing the task. I just can't make it into an algorithm (take away the creative part of "making the numbers right") and can't make sure it works every time.
Haskell, 129 = 3x(33 + 10)
this uses the alternative mode.
imp
ort
Da
ta.
Lis
t;a
%[]
=[[
]];
a%s
=[x
:j|
x<-
s,n
ot$
isI
nfi
xOf
[la
st
a,x
]a,
j<-
(a+
+[x
])%
(s\
\[x
])]
;g
s=[
]%s
!!0
or in a readable form:
import Data.List
a%[]=[[]]
a%s=[x:j|x<-s,not$isInfixOf[last a,x]a,j<-(a++[x])%(s\\[x])]
g s=[]%s!!0
Haskell is a very strict language. for example, the import
must come first; the definitions of s
must come together; all the types must agree and there is no way to cast between them, and so forth. this leads to having a non-fatal error almost impossible. in fact, having a runtime fatal error is too almost near impossible.
note that if g
is a valid function but has an the wrong type (any type different then [a]->[a]
or String -> String
and the like) than this is a fatal error because it's impossible to apply g
to the inputs.
outputs:
Abracadabar Alaazcma
Is Miss. iMsiasusgsa sMniig?s
Ask Alasak's lAaankss
GGAGTGCAATACTTCCggagtaacttcc
A Man AP la nAC aalnnaPama
Explore related questions
See similar questions with these tags.
CooliO
, outputoOoCli
? \$\endgroup\$Mspiisiipss
valid since the only repetition isii
which does not occur inMississippi
? \$\endgroup\$