This is a model of a forgiving HTML parser. Instead of parsing HTML and extracting attributes, in this code golf, the tag parser will be simple.
Write a function that parses a tag structure and returns its parenthized form.
An opening tag consists of one lowercase letter, and a closing tag consists of one uppercase letter. For example, aAbaAB
parses into (a)(b(a))
, or in HTML, <a></a><b><a></a></b>
.
Of course, tags can be in juxtaposition and nest.
"Prematurely" closed tags must be handled. For example, in abcA
, the A
closes the outermost a
, so it parses into (a(b(c)))
.
Extra closing tags are simply ignored: aAB
parses into (a)
.
Overlapping tags are NOT handled. For example, abAB
parses into (a(b))
, not (a(b))(b)
, by the previous rule of extra closing tags (abAB
-> abA
((a(b))
) + B
(extra)).
Assuming no whitespaces and other illegal characters in the input.
You are not allowed to use any library.
Here is a reference implementation and a list of test cases:
#!/usr/bin/python
def pars(inpu):
outp = ""
stac = []
i = 0
for x in inpu:
lowr = x.lower()
if x == lowr:
stac.append(x)
outp += "(" + x
i = i + 1
else:
while len(stac) > 1 and stac[len(stac) - 1] != lowr:
outp += ")"
stac.pop()
i = i - 1
if len(stac) > 0:
outp += ")"
stac.pop()
i = i - 1
outp += ")" * i
return outp
tests = [
("aAaAbB", "(a)(a)(b)"),
("abBcdDCA", "(a(b)(c(d)))"),
("bisSsIB", "(b(i(s)(s)))"),
("aAabc", "(a)(a(b(c)))"),
("abcdDA", "(a(b(c(d))))"),
("abcAaA", "(a(b(c)))(a)"),
("acAC", "(a(c))"),
("ABCDEFG", ""),
("AbcBCabA", "(b(c))(a(b))")
]
for case, expe in tests:
actu = pars(case)
print "%s: C: [%s] E: [%s] A: [%s]" % (["FAIL", "PASS"][expe == actu], case, expe, actu)
Shortest code wins.
5 Answers 5
Haskell, 111 characters
s@(d:z)§c|c>'^'=toEnum(fromEnum c-32):s++'(':[c]|d<'='=s|d==c=z++")"|1<3=(z++")")§c
p=tail.foldl(§)"$".(++"$")
This one's pretty golf'd for Haskell. Fun feature: The stack and the accumulating output are kept in the same string!
Test cases:
> runTests
Pass: aAbaAB parsed correctly as (a)(b(a))
Pass: abcA parsed correctly as (a(b(c)))
Pass: aAB parsed correctly as (a)
Pass: abAB parsed correctly as (a(b))
Pass: aAaAbB parsed correctly as (a)(a)(b)
Pass: abBcdDCA parsed correctly as (a(b)(c(d)))
Pass: bisSsIB parsed correctly as (b(i(s)(s)))
Pass: aAabc parsed correctly as (a)(a(b(c)))
Pass: abcdDA parsed correctly as (a(b(c(d))))
Pass: abcAaA parsed correctly as (a(b(c)))(a)
Pass: acAC parsed correctly as (a(c))
Pass: AbcBCabA parsed correctly as (b(c))(a(b))
- Edit: (113 → 111) used an
@
pattern as suggested by FUZxxl
-
\$\begingroup\$ Using an @-pattern for d:z might save two chars. \$\endgroup\$FUZxxl– FUZxxl2011年05月09日 12:07:11 +00:00Commented May 9, 2011 at 12:07
Z80 Machine Code for TI-83+, 41 bytes
This is an implementation in hexadecimal machine code for a z80 cpu running on a TI-83+.
11XXXX131AFE61380F6FE53E28CD9DB47DCD9DB4188EE1BDC03E29CD9DB4189BEF4504E5214CE1C9
The XXXX (3 - 6 inclusive) is the 16-bit address of the string you are parsing, minus 1 byte.
Encoded in Z80-ASCII:
1XX≤ ̄•⟙8Fo↥>(xïÑ}xïÑ≠á↑γ∊>)xïÑ≠Ì⬆︎Ew↥!4L↑Φ
(Approximate, because TI calculators have their own character set.)
NOTE THAT THE AsmPrgm
IS NOT INCLUDED IN THE ABOVE
Windows PowerShell, 142 (削除) 146 (削除ここまで) (削除) 147 (削除ここまで) (削除) 152 (削除ここまで) (削除) 156 (削除ここまで) (削除) 169 (削除ここまで)
{$s=''
-join([char[]]"$args "|%{if(90-ge$_){')'*(($x=$s.indexOf("$_".ToLower())+1)+$s.Length*!$x)
$s=$s.substring($x)}else{"($_"
$s="$_$s"}})}
Some things to note: This is just a script block. It can be assigned to a variable or given a function name, if necessary. You can also run it by putting .
or &
in front of it and the arguments at the end. Uses a final space to terminate unclosed tags.
Passes all tests. Test script:
$tests = ("aAaAbB","(a)(a)(b)"),("abBcdDCA","(a(b)(c(d)))"),("bisSsIB","(b(i(s)(s)))"),("aAabc","(a)(a(b(c)))"),("abcdDA","(a(b(c(d))))"),("abcAaA", "(a(b(c)))(a)"),("acAC","(a(c))")
"function f " + ((gc ./tags.ps1)-join"`n") | iex
$tests | %{
$result = f $_[0]
("FAIL: $($_[0]):$($_[1]) - $result", 'PASS')[$result -ceq $_[1]]
}
Python - (削除) 114 (削除ここまで) (削除) 113 (削除ここまで) (削除) 153 (削除ここまで) (削除) 192 (削除ここまで) (削除) 174 (削除ここまで) 159 characters
from sys import *
s="";c=a=argv[1]
for f in a:
o=c.find;p=f.lower
if '@'<f<'\\':
\td=o(f)-o(p())
\ts+=")"*d
\tc=(c[:o(p())]+c[o(f)+1:])
else:s+=("("+f)
print s
Abuses python's indentation parser to use one space for a full tab, five for two tabs.
Edit 1 - saved an unneeded space in the range() function
Edit 2 - fixed to deal with improper parse grammars, unterminated tags.
Edit 3 - fixed a bug whereby "incorrect" parses could be generated by ambiguity in the tag tree. Implemented a stack-based strategy, rather than a counter.
Edit 4 - renamed s.find to o to prevent save the chars used to repeatedly call it. did the same for f.lower.
Edit 5 - added the space/tab hack, saving three chars.
Edit 6 - ditched the loop in favor of ")"*d.
-
1\$\begingroup\$ instead of
ord(f)...
you can use'@'<f<'\\'
If you don't need to check for'\\'
you can use']'
instead \$\endgroup\$gnibbler– gnibbler2011年05月09日 06:15:32 +00:00Commented May 9, 2011 at 6:15 -
1\$\begingroup\$ you can use a single tab instead of 5 spaces. SO code markup can't handle it though :(. In your case just put leave out the newline and the spaces altogether. eg
if ...:s+=")";c-=1
andelse:s+="("+f;c+=1
\$\endgroup\$gnibbler– gnibbler2011年05月09日 06:18:35 +00:00Commented May 9, 2011 at 6:18 -
1\$\begingroup\$
for i in range(d):s+=")"
can be rewritten ass+=")"*d
. And you have 174 chars. \$\endgroup\$cemper93– cemper932011年05月09日 18:06:37 +00:00Commented May 9, 2011 at 18:06 -
\$\begingroup\$ @cemper - good point that. I do "_"*80 all day long and forget about it when golfing.... Also, thanks to @gnibbler for the suggestions! \$\endgroup\$arrdem– arrdem2011年05月09日 20:10:54 +00:00Commented May 9, 2011 at 20:10
-
\$\begingroup\$ Actually, I meant that you had had 174 chars before. So you are at 159 now. \$\endgroup\$cemper93– cemper932011年05月09日 21:01:17 +00:00Commented May 9, 2011 at 21:01
Golfscript, 54 chars
{[]:|\{.96>{.|+:|;40\}{32+|?).')'*\|>:|;}if}%|,')'*}:$
Tests
;["aAaAbB" "abBcdDCA" "bisSsIB" "aAabc" "abcdDA" "abcAaA" "acAC" "aAB" "abAB" "AbcBCabA"]{.' '\$n}%
aAaAbBaAaAbB (a)(a)(b)
abBcdDCA (a(b)(c(d)))
bisSsIB (b(i(s)(s)))
aAabc (a)(a(b(c)))
abcdDA (a(b(c(d))))
abcAaA (a(b(c)))(a)
acAC (a(c))
aAB (a)
abAB (a(b))
AbcBCabA (b(c))(a(b))
AbcBCabA
(should parse as(b(c))(a(b))
. My code could have been shorter except for this case. \$\endgroup\$