Simple Tag Parser

Question 1

This is a model of a forgiving HTML parser. Instead of parsing HTML and extracting attributes, in this code golf, the tag parser will be simple.

Write a function that parses a tag structure and returns its parenthized form. An opening tag consists of one lowercase letter, and a closing tag consists of one uppercase letter. For example, aAbaAB parses into (a)(b(a)), or in HTML, <a></a><b><a></a></b>. Of course, tags can be in juxtaposition and nest.

"Prematurely" closed tags must be handled. For example, in abcA, the A closes the outermost a, so it parses into (a(b(c))).

Extra closing tags are simply ignored: aAB parses into (a).

Overlapping tags are NOT handled. For example, abAB parses into (a(b)), not (a(b))(b), by the previous rule of extra closing tags (abAB -> abA ((a(b))) + B (extra)).

Assuming no whitespaces and other illegal characters in the input.

You are not allowed to use any library.

Here is a reference implementation and a list of test cases:

#!/usr/bin/python
def pars(inpu):
 outp = ""
 stac = []
 i = 0
 for x in inpu:
 lowr = x.lower()
 if x == lowr:
 stac.append(x)
 outp += "(" + x
 i = i + 1
 else:
 while len(stac) > 1 and stac[len(stac) - 1] != lowr:
 outp += ")"
 stac.pop()
 i = i - 1
 if len(stac) > 0:
 outp += ")"
 stac.pop()
 i = i - 1
 outp += ")" * i
 return outp
tests = [
 ("aAaAbB", "(a)(a)(b)"),
 ("abBcdDCA", "(a(b)(c(d)))"),
 ("bisSsIB", "(b(i(s)(s)))"),
 ("aAabc", "(a)(a(b(c)))"),
 ("abcdDA", "(a(b(c(d))))"),
 ("abcAaA", "(a(b(c)))(a)"),
 ("acAC", "(a(c))"),
 ("ABCDEFG", ""),
 ("AbcBCabA", "(b(c))(a(b))")
]
for case, expe in tests:
 actu = pars(case)
 print "%s: C: [%s] E: [%s] A: [%s]" % (["FAIL", "PASS"][expe == actu], case, expe, actu)

Shortest code wins.

Question 2

like any other code golfs, standard library allowed

Question 3

no limit on length nor nesting level

Question 4

You should add a test case for input that leads with a closing tag, such as AbcBCabA (should parse as (b(c))(a(b)). My code could have been shorter except for this case.

Question 5

Haskell, 111 characters

s@(d:z)§c|c>'^'=toEnum(fromEnum c-32):s++'(':[c]|d<'='=s|d==c=z++")"|1<3=(z++")")§c
p=tail.foldl(§)"$".(++"$")

This one's pretty golf'd for Haskell. Fun feature: The stack and the accumulating output are kept in the same string!

Test cases:

> runTests 
Pass: aAbaAB parsed correctly as (a)(b(a))
Pass: abcA parsed correctly as (a(b(c)))
Pass: aAB parsed correctly as (a)
Pass: abAB parsed correctly as (a(b))
Pass: aAaAbB parsed correctly as (a)(a)(b)
Pass: abBcdDCA parsed correctly as (a(b)(c(d)))
Pass: bisSsIB parsed correctly as (b(i(s)(s)))
Pass: aAabc parsed correctly as (a)(a(b(c)))
Pass: abcdDA parsed correctly as (a(b(c(d))))
Pass: abcAaA parsed correctly as (a(b(c)))(a)
Pass: acAC parsed correctly as (a(c))
Pass: AbcBCabA parsed correctly as (b(c))(a(b))

Edit: (113 → 111) used an @ pattern as suggested by FUZxxl

Question 6

Using an @-pattern for d:z might save two chars.

Question 7

Z80 Machine Code for TI-83+, 41 bytes

This is an implementation in hexadecimal machine code for a z80 cpu running on a TI-83+.

11XXXX131AFE61380F6FE53E28CD9DB47DCD9DB4188EE1BDC03E29CD9DB4189BEF4504E5214CE1C9

The XXXX (3 - 6 inclusive) is the 16-bit address of the string you are parsing, minus 1 byte.

Encoded in Z80-ASCII:

1XX≤ ̄•⟙8Fo↥>(xïÑ}xïÑ≠á↑γ∊>)xïÑ≠Ì⬆︎Ew↥!4L↑Φ

(Approximate, because TI calculators have their own character set.)

NOTE THAT THE AsmPrgm IS NOT INCLUDED IN THE ABOVE

Question 8

Windows PowerShell, 142 (削除) 146 (削除ここまで) (削除) 147 (削除ここまで) (削除) 152 (削除ここまで) (削除) 156 (削除ここまで) (削除) 169 (削除ここまで)

{$s=''
-join([char[]]"$args "|%{if(90-ge$_){')'*(($x=$s.indexOf("$_".ToLower())+1)+$s.Length*!$x)
$s=$s.substring($x)}else{"($_"
$s="$_$s"}})}

Some things to note: This is just a script block. It can be assigned to a variable or given a function name, if necessary. You can also run it by putting . or & in front of it and the arguments at the end. Uses a final space to terminate unclosed tags.

Passes all tests. Test script:

$tests = ("aAaAbB","(a)(a)(b)"),("abBcdDCA","(a(b)(c(d)))"),("bisSsIB","(b(i(s)(s)))"),("aAabc","(a)(a(b(c)))"),("abcdDA","(a(b(c(d))))"),("abcAaA", "(a(b(c)))(a)"),("acAC","(a(c))")
"function f " + ((gc ./tags.ps1)-join"`n") | iex
$tests | %{
 $result = f $_[0]
 ("FAIL: $($_[0]):$($_[1]) - $result", 'PASS')[$result -ceq $_[1]]
}

Question 9

Python - (削除) 114 (削除ここまで) (削除) 113 (削除ここまで) (削除) 153 (削除ここまで) (削除) 192 (削除ここまで) (削除) 174 (削除ここまで) 159 characters

from sys import *
s="";c=a=argv[1]
for f in a:
 o=c.find;p=f.lower
 if '@'<f<'\\':
\td=o(f)-o(p())
\ts+=")"*d
\tc=(c[:o(p())]+c[o(f)+1:])
 else:s+=("("+f)
print s

Abuses python's indentation parser to use one space for a full tab, five for two tabs.

Edit 1 - saved an unneeded space in the range() function

Edit 2 - fixed to deal with improper parse grammars, unterminated tags.

Edit 3 - fixed a bug whereby "incorrect" parses could be generated by ambiguity in the tag tree. Implemented a stack-based strategy, rather than a counter.

Edit 4 - renamed s.find to o to prevent save the chars used to repeatedly call it. did the same for f.lower.

Edit 5 - added the space/tab hack, saving three chars.

Edit 6 - ditched the loop in favor of ")"*d.

Question 10

instead of ord(f)... you can use '@'<f<'\\' If you don't need to check for '\\' you can use ']' instead

Question 11

you can use a single tab instead of 5 spaces. SO code markup can't handle it though :(. In your case just put leave out the newline and the spaces altogether. eg if ...:s+=")";c-=1 and else:s+="("+f;c+=1

Question 12

for i in range(d):s+=")" can be rewritten as s+=")"*d. And you have 174 chars.

Question 13

@cemper - good point that. I do "_"*80 all day long and forget about it when golfing.... Also, thanks to @gnibbler for the suggestions!

Question 14

Actually, I meant that you had had 174 chars before. So you are at 159 now.

Question 15

Golfscript, 54 chars

{[]:|\{.96>{.|+:|;40\}{32+|?).')'*\|>:|;}if}%|,')'*}:$

Tests

;["aAaAbB" "abBcdDCA" "bisSsIB" "aAabc" "abcdDA" "abcAaA" "acAC" "aAB" "abAB" "AbcBCabA"]{.' '\$n}%
aAaAbBaAaAbB (a)(a)(b)
abBcdDCA (a(b)(c(d)))
bisSsIB (b(i(s)(s)))
aAabc (a)(a(b(c)))
abcdDA (a(b(c(d))))
abcAaA (a(b(c)))(a)
acAC (a(c))
aAB (a)
abAB (a(b))
AbcBCabA (b(c))(a(b))

MtnViewMark MtnViewMark 5,01919 silver badges26 bronze badges · Accepted Answer · 2011-05-09 07:06:42Z

Haskell, 111 characters

s@(d:z)§c|c>'^'=toEnum(fromEnum c-32):s++'(':[c]|d<'='=s|d==c=z++")"|1<3=(z++")")§c
p=tail.foldl(§)"$".(++"$")

This one's pretty golf'd for Haskell. Fun feature: The stack and the accumulating output are kept in the same string!

Test cases:

> runTests 
Pass: aAbaAB parsed correctly as (a)(b(a))
Pass: abcA parsed correctly as (a(b(c)))
Pass: aAB parsed correctly as (a)
Pass: abAB parsed correctly as (a(b))
Pass: aAaAbB parsed correctly as (a)(a)(b)
Pass: abBcdDCA parsed correctly as (a(b)(c(d)))
Pass: bisSsIB parsed correctly as (b(i(s)(s)))
Pass: aAabc parsed correctly as (a)(a(b(c)))
Pass: abcdDA parsed correctly as (a(b(c(d))))
Pass: abcAaA parsed correctly as (a(b(c)))(a)
Pass: acAC parsed correctly as (a(c))
Pass: AbcBCabA parsed correctly as (b(c))(a(b))

Edit: (113 → 111) used an @ pattern as suggested by FUZxxl

\$\begingroup\$ Using an @-pattern for d:z might save two chars. \$\endgroup\$

FUZxxl
– FUZxxl

2011年05月09日 12:07:11 +00:00
Commented May 9, 2011 at 12:07

Stack Exchange Network

Simple Tag Parser

5 Answers 5

Haskell, 111 characters

Z80 Machine Code for TI-83+, 41 bytes

Windows PowerShell, 142 (削除) 146 (削除ここまで) (削除) 147 (削除ここまで) (削除) 152 (削除ここまで) (削除) 156 (削除ここまで) (削除) 169 (削除ここまで)

Python - (削除) 114 (削除ここまで) (削除) 113 (削除ここまで) (削除) 153 (削除ここまで) (削除) 192 (削除ここまで) (削除) 174 (削除ここまで) 159 characters

Golfscript, 54 chars

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Simple Tag Parser

5 Answers 5

Haskell, 111 characters

Z80 Machine Code for TI-83+, 41 bytes

Windows PowerShell, 142 (削除) 146 (削除ここまで) (削除) 147 (削除ここまで) (削除) 152 (削除ここまで) (削除) 156 (削除ここまで) (削除) 169 (削除ここまで)

Python - (削除) 114 (削除ここまで) (削除) 113 (削除ここまで) (削除) 153 (削除ここまで) (削除) 192 (削除ここまで) (削除) 174 (削除ここまで) 159 characters

Golfscript, 54 chars

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions