Parse nested absolute values

Question 1

The absolute value of a number \$x\$ is normally written as \$|x|\$. The left and right side of the absolute value uses the same symbol, so it is not immediately obvious how to parse nested absolute values e.g. \$||1-2|+|3-|4-5|||\$

Your goal is to parse such an expression containing nested absolute values:

The expression will be given as a string of characters.
For simplicity the expression will only contain single-digit numbers (or letters if that is easier in your language), the operators + and - (you can use any two distinct characters to represent these operations), and the symbol | for the left and right side of an absolute value.
You do not need to handle the case where a number is directly adjacent to an absolute value (e.g. 2|3| or |2|3)

Your output should be the same expression in a form that allows you to determine how the absolute values are bracketed.

The output has to satisfy the following rules:

The expression within an absolute value must not end with an operator ( + or - )
The expression within an absolute value cannot be empty
Each | has to be part of exactly one absolute value

You may assume there is a valid way to parse the given input.

Examples:

|2| -> (2)
|2|+|3| -> (2)+(3)
||2|| -> ((2))
||2|-|3|| -> ((2)-(3))
|-|-2+3|| -> (-(-2+3))
|-|-2+3|+|4|-5| -> (-(-2+3)+(4)-5)
|-|-2+|-3|+4|-5| -> (-(-2+(-3)+4)-5)
||1-2|+|3-|4-5||| -> ((1-2)+(3-(4-5)))

This is code-golf the shortest solution wins.

Optional additional requirement:

Also support expressions that allow a number direct before or after a bracket. If the result is not unique, you may return any valid solution.

test-cases (for optional requirement):

|2|3|4| -> (2(3)4)
|2|3|4| -> (2)3(4)
||3|4| -> ((3)4)
|2|3-|4| -> (2)3-(4)
|1+|2|3| -> (1+(2)3)
|1+2|3|| -> (1+2(3))

Question 2

The expression within a absolute value must not end with an operator. Do you have a testcase in mind where this could be a problem? I have a solution which doesn't check for this, but I'm having a hard time finding an example where this is actually an issue

Question 3

@ovs one possible example would be parsing the second test case as (2(+)3)

Question 4

Would |1+2|3|| be a valid test case for the optional requirement?

Question 5

(Your optional requirement test cases seem to test for a number after a bracket, not before.)

Question 6

JavaScript (ES6), 52 bytes

f=s=>s>(s=s.replace(/\|(.*?[\d)])\|/,"(1ドル)"))?f(s):s

Try it online!

Method

At each iteration, we look for the first substring consisting of:

\| a leading pipe
.*? the shortest possible string
[\d)] either a digit or a closing parenthesis
\| a trailing pipe

and replace the pipes with ( and ) respectively.

If a replacement occurs, the new string is less than the original string in lexicographical order, because the leading pipe is turned into an opening parenthesis. Hence the test s > (s = s.replace(...)) which triggers a recursive call if true.

NB: A safer way would be to use [^|]* instead of .*?. If you find a test case where .*? fails, please let me know.

Question 7

The pattern works wonderfully with JavaScript regexes, but cannot make it work the same way with RE2, possibly because of differences in empty string matching.

Question 8

[^|]* wouldn't need a ?, would it?

Question 9

Retina 0.8.2, (削除) 22 (削除ここまで) 16 bytes

T`|`)`\d\|+
\|
(

Try it online! Link includes test cases. Explanation:

T`|`)`\d\|+

Replace all |s that appear after digits with )s.

\|
(

Replace the remaining |s with (s.

Question 10

I think you can split this up into two separate replace to save a byte: TIO

Question 11

@ovs I took your idea to its logical conclusion...

Question 12

K (ngn/k), (削除) 27 (削除ここまで) 22 bytes

Based on Arnauld's answer, but without regex.

{`c$(84-0=\"0|"'x)!'x}

Try it online!

Any closing parenthesis in the output directly follows a digit or another closing parenthesis, and none of the opening ones do. This means we can replace all consecutive |'s following a digit by closing parentheses. The remaining |'s are replaced by opening ones.

K (ngn/k), 36 bytes

My first idea, tests all ways of replacing | with parentheses using try/eval.

{c@*<.[.:;;`]','c:+`c$(83+!2&x)!''x}

Try it online!

It is not very easy to show this is correct, as (1+) is a valid expression in K. However after toying around with this for quite some time, I'm convinced that you can't construct a valid input where this actually becomes a problem

Question 13

Charcoal, 16 bytes

FS≡ι|§)(‹ψ0«ι≔ιψ

Try it online! Link is to verbose version of code. Explanation:

FS≡ι

Loop over the input characters.

|§)(‹ψ0

If the current characters is a | then output a ) or ( depending on whether the last non-| was less than 0.

«ι≔ιψ

Otherwise output the character and save it as the last non-|.

Question 14

JavaScript (Node.js), 44 bytes

s=>s.replace(d=/./g,c=>c>{}?++d?')':'(':d=c)

Try it online!

Look last non-pipe character before each pipe. Converting pipe to ) only when last non-pipe is a digit, and converting to ( otherwise. Does not work with the optional additional requirement.

Question 15

05AB1E, 13 bytes

ε'|Qižu®dèëy©

Port of @Neil's Charcoal answer, so make sure to upvote that answer as well!

Outputs as a list of characters.

Try it online or verify all test cases.

Explanation:

ε # Map over the characters of the (implicit) input:
 '|Qi '# If the current character is a "|":
 žu # Push builtin string "()<>[]{}"
 ® # Push variable `®` (which is -1 by default)
 d # Check whether it's a non-negative (>=0) number
 è # Use that check (0 or 1) to (0-based) index into string "()<>[]{}"
 ë # Else:
 y # Simply push the current character
 © # And store it in variable `®` (without popping)
 # (after which the mapped list of characters is output implicitly as result)

Question 16

Haskell + hgl, 26 bytes

gkx"(/d(/|{)_)+|/|{(_|.)+"

This solution uses a "regex" to replace each of the |s with either ( or ).

A closing paren must be preceded by a valid statement, and valid statements can only end in either digits, or a closing paren. Meanwhile a open paren can never come after a digit or a closing paren, so this classification is complete.

We translate these rules to the "regex":

( )+ -- at least once
 /d -- a digit
 ( )+ -- at least one of
 /| -- a |
 {)_ -- replaced with )
 | -- or
 /| -- a |
 {(_ -- replaced with (
 | -- or 
 . -- any character

Without "regex", 36 bytes

gkY$dgS<>sO(hh"|"")")++hh"|""("++hdS

This is the same idea as the algorithm above, but implemented using parser combinators.

This is considerably longer, as parser combinators tend to be at a disadvantage to the "regex" when it comes to challenges involving the specific behavior of specific characters.

36 bytes

rA"|""("<gkY(dgS<>sO(hh"|"")")++hdS)

This is another solution which is also the same length, but which uses a mix of parser and non-parser functions.

Reflection

Although these solutions don't exactly feel good, there is remarkably little I see in the way of improvement.

Regex

There's very little that can be done to improve the "regex" based solution. I have no plans to change the syntax of "regex" in the near future, and to be honest I don't even know what I would do.

The only thing I can think of is to make a combined version of gkx and gkY. This would pull the (...)+ out and save 3 bytes.

Non-regex

I have more thoughts here, but even with all my suggestions this doesn't come close to catching up with the regex.

The time has probably come for a combined version of <> and sO.
hh"|"")" is annoyingly long. There are two ways to solve this:
- Simplest is that like other characters we could just add a shortcut to hh"|". Of course next time we are going to want to do something like hh"@""*" and this will have been completely useless, but if we keep adding these eventually we will get there.
- A more principled solution would be to implement something like xx but taking a single string argument and unconsing it to get two arguments. e.g. xxh(x:xs)=xx x xs. This would allow it to be xxh"|)" which saves only a byte.

Arnauld Arnauld 204k21 gold badges186 silver badges667 bronze badges · Accepted Answer · 2023-12-05 13:17:33Z

JavaScript (ES6), 52 bytes

f=s=>s>(s=s.replace(/\|(.*?[\d)])\|/,"(1ドル)"))?f(s):s

Try it online!

Method

At each iteration, we look for the first substring consisting of:

\| a leading pipe
.*? the shortest possible string
[\d)] either a digit or a closing parenthesis
\| a trailing pipe

and replace the pipes with ( and ) respectively.

If a replacement occurs, the new string is less than the original string in lexicographical order, because the leading pipe is turned into an opening parenthesis. Hence the test s > (s = s.replace(...)) which triggers a recursive call if true.

NB: A safer way would be to use [^|]* instead of .*?. If you find a test case where .*? fails, please let me know.

The pattern works wonderfully with JavaScript regexes, but cannot make it work the same way with RE2, possibly because of differences in empty string matching.

Stack Exchange Network

Parse nested absolute values

Optional additional requirement:

7 Answers 7

JavaScript (ES6), 52 bytes

Method

Retina 0.8.2, (削除) 22 (削除ここまで) 16 bytes

K (ngn/k), (削除) 27 (削除ここまで) 22 bytes

K (ngn/k), 36 bytes

Charcoal, 16 bytes

JavaScript (Node.js), 44 bytes

05AB1E, 13 bytes

Haskell + hgl, 26 bytes

Without "regex", 36 bytes

36 bytes

Reflection

Regex

Non-regex

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Parse nested absolute values

Optional additional requirement:

7 Answers 7

JavaScript (ES6), 52 bytes

Method

Retina 0.8.2, (削除) 22 (削除ここまで) 16 bytes

K (ngn/k), (削除) 27 (削除ここまで) 22 bytes

K (ngn/k), 36 bytes

Charcoal, 16 bytes

JavaScript (Node.js), 44 bytes

05AB1E, 13 bytes

Haskell + hgl, 26 bytes

Without "regex", 36 bytes

36 bytes

Reflection

Regex

Non-regex

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions