Return to Revisions

26 of 47

Use an atom ⚛ symbol to denote solutions that used non-atomic lookahead

edited Mar 13, 2021 at 11:58

12.9k
2
71
55

ECMAScript Regex, (削除) 733^{^⚛} (削除ここまで) (削除) 690^{^⚛} (削除ここまで) (削除) 158 (削除ここまで) (削除) 119 (削除ここまで) (削除) 118 (削除ここまで) 92 bytes

My interest in regex has been sparked with renewed vigor after over 41⁄2 years of inactivity. As such, I went in search of more natural number sets and functions to match with unary ECMAScript regexes, resumed improving my regex engine, and started brushing up on PCRE as well.

I'm fascinated by the alienness of constructing mathematical functions in ECMAScript regex. Problems must be approached from an entirely different perspective, and until the arrival of a key insight, it's unknown whether they're solvable at all. It forces casting a much wider net in finding which mathematical properties might be able to be used to make a particular problem solvable.

Matching factorial numbers was a problem I didn't even consider tackling in 2014 – or if I did, only momentarily, dismissing it as too unlikely to be possible. But last month, I realized that it could be done.

As with my other ECMA regex posts, I'll give a warning: I highly recommend learning how to solve unary mathematical problems in ECMAScript regex. It's been a fascinating journey for me, and I don't want to spoil it for anybody who might potentially want to try it themselves, especially those with an interest in number theory. See this earlier post for a list of consecutively spoiler-tagged recommended problems to solve one by one.

So do not read any further if you don't want some advanced unary regex magic spoiled for you. If you do want to take a shot at figuring out this magic yourself, I highly recommend starting by solving some problems in ECMAScript regex as outlined in that post linked above.

This was my idea:

The problem with matching this number set, as with most others, is that in ECMA it's usually not possible to keep track of two changing numbers in a loop. Sometimes they can be multiplexed (e.g. powers of the same base can be added together unambiguously), but it depends on their properties. So I couldn't just start with the input number, and divide it by an incrementally increasing dividend until reaching 1 (or so I thought, at least).

Then I did some research on the multiplicities of prime factors in factorial numbers, and learned that there's a formula for this – and it's one that I could probably implement in an ECMA regex!

After stewing on it for a while, and constructing some other regexes in the meantime, I took up the task of writing the factorial regex. It took a number of hours, but ended up working nicely. As an added bonus, the algorithm could return the inverse factorial as a match. There was no avoiding it, even; by the very nature of how it must be implemented in ECMA, it's necessary to take a guess as to what the inverse factorial is before doing anything else.

The downside was that this algorithm made for a very long regex... but I was pleased that it ended up requiring a technique used in my 651 byte multiplication regex (the one that ended up being obsolete, because a different method made for a 50 byte regex). I had been hoping a problem would pop up that required this trick: Operating on two numbers, which are both powers of the same base, in a loop, by adding them together unambiguously and separating them at each iteration.

But because of the difficulty and length of this algorithm, I used molecular lookaheads (of the form (?*...)) to implement it. That is a feature not in ECMAScript or any other mainstream regex engine, but one that I had implemented in my engine. Without any captures inside a molecular lookahead, it's functionally equivalent to an atomic lookahead, but with captures it can be very powerful. The engine will backtrack into the lookahead, and this can be used to conjecture a value which cycles through all possibilities (for later testing) without consuming characters of the input. Using them can make for a much cleaner implementation. (Variable-length lookbehind is at the very least equal in power to molecular lookahead, but the latter tends to make for more straightforward and elegant implementations.)

So the 733 and 690 byte lengths do not actually represent ECMAScript-compatible incarnations of the solution – hence the "+" after them; it's surely possible to port that algorithm to pure ECMAScript (which would increase its length quite a bit) but I didn't get around to it... because I thought of a much simpler and more compact algorithm! One that could easily be implemented without molecular lookaheads. It's also significantly faster.

This new one, like the previous, must take a guess at the inverse factorial, cycling through all possibilities and testing them for a match. It divides N by 2 to make room for the work it needs to do, and then seeds a loop in which it will repeatedly divide the input by a divisor that starts at 3 and increments each time. (As such, 1! and 2! can't be matched by the main algorithm, and must be dealt with separately.) The divisor is kept track of by adding it to the running quotient; these two numbers can be unambiguously separated because, assuming M! == N, the running quotient will continue to be divisible by M until it equals M.

This regex does division-by-a-variable in the innermost portion of the loop. The division algorithm is the same as in my other regexes (and similar to the multiplication algorithm): for A≤B, A*B=C if any only if C%A=0 and B is the largest number which satisfies B≤C and C%B=0 and (C-B-(A-1))%(B-1)=0, where C is the dividend, A is the divisor, and B is the quotient. (A similar algorithm can be used for the case that A≥B, and if it is not known how A compares to B, one extra divisibility test is all that is needed.)

So I love that the problem was able to be reduced to even less complexity than my golf-optimized Fibonacci regex, but I do sigh with disappointment that my multiplexing-powers-of-the-same-base technique will have to wait for another problem that actually requires it, because this one doesn't. It's the story of my 651 byte multiplication algorithm being supplanted by a 50 byte one, all over again!

dropped 1 byte (119 → 118) using a trick found by Grimy that can futher shorten division in the case that the quotient is guaranteed to be greater than or equal to the divisor
dropped 26 bytes (118 ~~(削除) (117🐌) (削除ここまで)~~ → 92) thanks to Grimy in collaboration with H.PWiz

With no further ado, here's the regex:

True/false version (92 bytes):

^(((x+)3円+)(?=3円3円2円$)(x(?=3円+(x*))(?=5円+$)(?=((x+)(?=5円7円*$)x)6円*$)3円+(?=5円7円$))*3円xx|x?)x$

Try it online!

Return inverse factorial or no-match (112 bytes):

^(?=((x*)x*)(?=1円$)(?=(xxx2円)+$)((?=2円3円*(x(?!3円)xx(x*)))6円(?=5円+$)(?=((x*)(?=5円(8円*$))x)7円*$)x9円(?=x6円3円+$))*2円3円$)3円|^xx?$

Try it online!

Return inverse factorial or no-match, in ECMAScript + \K (106 bytes):

^(((x+)3円+)(?=3円3円2円$)(x(?=3円+(x*))(?=5円+$)(?=((x+)(?=5円7円*$)x)6円*$)3円+(?=5円7円$))*x\Kxx((?!x$)\K|)3円|xx?)$

And the free-spaced version with comments:

# Match factorial numbers in the domain ^x*$
^ # N = tail = input
(
 ((x+)3円+)(?=3円3円2円$) # 3円 = conjectured inverse factorial of N if N >= 4! = 24;
 # 2円 = N/2 - 3円; assert tail is divisible by 3円 and by 2;
 # assert tail is divisible by a number >= 6;
 # tail = N/2 + 3円;
 # Note that if N = 3! = 6, 3円 will behave differently,
 # because sticking to the pattern, we'd need 2円=0, which
 # is of course impossible, because we must have 2円 >= 3円.
 # However, when N=6, 3円=1 happens to result in a match after
 # zero loop iterations.
 # The loop is seeded: X = N/2; I = 3円 - 1; tail = X + I + 1
 # The loop will divide X by the numbers in the range 3円-1 downward to 3, inclusive.
 # Since we don't divide by 3円 itself, and already divided by 2, if N is factorial,
 # the end result of these divisions will be X = 3円.
 ( # Main loop
 x # tail = tail-1 = X + I
 (?=3円+(x*)) # 5円 = tail % 3円 = I; assert X >= 3円
 (?=5円+$) # assert X is divisible by I; if 3円=1, then 5円=0,
 # and the main loop will exit after zero iterations
 (?=
 ( # 6円 = tail / 5円 = (X+I)/I = X/I + 1
 (x+) # 7円 = 6円-1 = X/I; assert X/I >= 1
 (?=5円7円*$)
 x
 )
 6円*$
 )
 3円+(?=5円7円$) # Prepare the next iteration of the loop: X = 7円; I -= 1;
 # tail = X + I + 1;
 # Assert that the tail difference is divisible by 3,円 i.e.
 # that, with our previous values of X and I:
 # X + I - (X/I + I-1 + 1)
 # = X/I*(I-1) is divisible by 3円
 # Since I is coprime with I-1, if X was divisible by 3円
 # but X/I is not, then X/I*(I-1) will not be either, so this
 # indirectly asserts that X/I, our new X, is divisible by 3円.
 # The initial value of X was asserted to be divisible by 3円
 # before the loop started, so by induction, every new value
 # of X will be asserted to be divisible by 3円. This is
 # important because we use modulo by 3円 to recover the value
 # of I at the beginning of each loop iteration.
 # And if N = 3円! and the previous of I=2 and X was able to be
 # divided by 2, this will force this iteration of the loop to
 # fail, and the loop will exit with the values of X and I
 # backtracked to their values before this iteration, i.e.
 # with X = 3円 and I = 2.
 )*
 3円xx # assert tail = 3円 + 2 + 1, meaning when the loop finished
 # we ended up with X = 3円 and I = 2
|
 x? # match N=1 and N=2, which the main algorithm can't
)
x$

The full history of my golf optimizations of these regexes is on github:

regex for matching factorial numbers - multiplicity-comparing method, with molecular lookahead.txt
regex for matching factorial numbers.txt (the one shown above)

answered Jan 22, 2019 at 4:01

Deadcode

12.9k
2
71
55

Return to Revisions

ECMAScript Regex, (削除) 733⚛ (削除ここまで) (削除) 690⚛ (削除ここまで) (削除) 158 (削除ここまで) (削除) 119 (削除ここまで) (削除) 118 (削除ここまで) 92 bytes

ECMAScript Regex, (削除) 733^{^⚛} (削除ここまで) (削除) 690^{^⚛} (削除ここまで) (削除) 158 (削除ここまで) (削除) 119 (削除ここまで) (削除) 118 (削除ここまで) 92 bytes