An application I've been working on for a while needs a manner to find a missing element from a sorted array, the problem with the application is that these values aren't numeric. They have a formula, but they aren't inherently numbers.
This means that using a basic binary search is out of the question, and it's also not applicable because we don't know the element we want. We only know what elements we have. So to find a missing element, we need to be less concerned about what value is there, and more concerned about what value should be there.
To do this I wrote an algorithm (see below) that's similar in nature to a binary-search, but it's not concerned about a missing value, or x
, perse, but instead is concerned if f(x) != g(x)
, of which f
and g
are arbitrary functions.
This algorithm requires three functions to be provided: a getItem
, which is g(x)
, where g
is a function to transform a BigInteger
to the actual value; a formula
, which is f(x)
, where f
is a function to transform a BigInteger
to the expected value; and an equals
, which provides true
if the f(x)
is equal to g(x)
.
static readonly BigInteger Zero = new BigInteger(0);
static readonly BigInteger One = new BigInteger(1);
static readonly BigInteger Two = new BigInteger(2);
static T SearchForOpening<T>(Func<BigInteger, T> getItem, Func<BigInteger, T> formula, Func<T, T, bool> equals, BigInteger l, BigInteger r)
{
if (r >= One)
{
if (l + One == r || l == r)
{
if (!equals(getItem(l), formula(l)))
{
return formula(l);
}
else if (!equals(getItem(r), formula(r)))
{
return formula(r);
}
return formula(r + One);
}
var mid = (l + r) >> 1;
if (!equals(getItem(mid), formula(mid)))
{
return SearchForOpening(getItem, formula, equals, l, mid);
}
return SearchForOpening(getItem, formula, equals, mid + One, r);
}
return formula(Zero);
}
This method runs in \$O(\log n)\$ time, which is entirely necessary as I need to be able to call it frequently, and have many concurrent sessions running.
I used BigInteger
for the index tracking because I absolutely need to be able to use this with astronomically huge values. This, again, threw a monkey-wrench into my original design, but the new one seems to perform well for large arrays, quite likely because of the \$O(\log n)\$ nature. (Do note that the \$n\$ in all cases is the total element count.)
I have a few test cases written into a console application, I added a calls++;
to the actual method to prove to myself it was \$O(\log n)\$ which have all succeeded, so the following code works as a decent test:
static int calls = 0; static string Formula(BigInteger i) { return i.ToString("X"); } static void Main(string[] args) { var rangeEnd = 2048; var items = Enumerable.Range(0, rangeEnd).Select(x => new BigInteger(x)).Select(Formula).ToList(); var enumeration = Enumerable.Range(0, rangeEnd).Select(x => new BigInteger(x)).Select(Formula).ToList(); var value = Formula(0); foreach (var j in enumeration) { calls = 0; items.Remove(j); value = SearchForOpening(i => items[(int)i], Formula, (x, y) => x == y, 0, items.Count - 1); Console.WriteLine($"{(value == j ? "PASS" : "FAIL")} {calls} {value}"); if (value != j) Console.ReadLine(); items = Enumerable.Range(0, rangeEnd).Select(x => new BigInteger(x)).Select(Formula).ToList(); } var expected = String.Empty; calls = 0; expected = "4"; items = Enumerable.Range(0, 4).Select(x => new BigInteger(x)).Select(Formula).ToList(); items.AddRange(Enumerable.Range(rangeEnd * 4, 6).Select(x => new BigInteger(x)).Select(Formula)); value = SearchForOpening(i => items[(int)i], Formula, (x, y) => x == y, 0, items.Count - 1); Console.WriteLine($"{(value == expected ? "PASS" : "FAIL")} {calls} {value}"); calls = 0; expected = "0800"; items = Enumerable.Range(0, rangeEnd).Select(x => new BigInteger(x)).Select(Formula).ToList(); items.AddRange(Enumerable.Range(rangeEnd * 4, rangeEnd * 2).Select(x => new BigInteger(x)).Select(Formula)); value = SearchForOpening(i => items[(int)i], Formula, (x, y) => x == y, 0, items.Count - 1); Console.WriteLine($"{(value == expected ? "PASS" : "FAIL")} {calls} {value}"); calls = 0; expected = "1"; items = new List<BigInteger> { new BigInteger(0), new BigInteger(999), new BigInteger(1000) }.Select(Formula).ToList(); value = SearchForOpening(i => items[(int)i], Formula, (x, y) => x == y, 0, items.Count - 1); Console.WriteLine($"{(value == expected ? "PASS" : "FAIL")} {calls} {value}"); Console.ReadLine(); }
1 Answer 1
- Personally I think using recursion here makes the progam a little harder to understand and read. Rather than using recursion you can easily change to wrapping your code in a
while (true)
loop. - Rather than using the arrow anti-pattern, I'd
return formula(Zero)
straight away ifr < One
. Which is known as the guard pattern. - Rather than making your code work when there are two indexes left in the array, you can instead loop again, to force there to be one item left. Allowing you to remove the
else if (!equals(getItem(r), formula(r)))
andl + One == r
checks. - I bruteforce checked if you needed
!equals(getItem(l), formula(l))
, as it seemed a bit fishy. And found it wasn't needed. - Finally, The check
r < One
is actually not needed, as the only time thatr
is less than 1, is when it's the same asl
- When they're both zero. This means you can merge that and the followingif
s into one.
This left me with:
static readonly BigInteger One = new BigInteger(1);
static T SearchForOpening<T>(Func<BigInteger, T> getItem, Func<BigInteger, T> formula, Func<T, T, bool> equals, BigInteger r)
{
BigInteger l = 0;
while (true) {
if (l == r)
{
return formula(l);
}
var mid = (l + r) >> 1;
if (!equals(getItem(mid), formula(mid)))
{
r = mid;
}
else
{
l = mid + One;
}
}
}
I don't currently have an install of C#, so I couldn't test the C# code above. However I checked the code works with the following Python code, which doesn't error or print an index:
def search_for_opening(get_item, formula, equals, r):
l = 0
while True:
if l == r:
return l
mid = (l + r) >> 1
if not equals(get_item(mid), formula(mid)):
r = mid
else:
l = mid + 1
for size in range(2, 50):
for index in range(size):
l = list(map(str, range(size)))
del l[index]
if index != search_for_opening(lambda i: l[i], str, lambda a, b: a == b, len(l)):
print(index)
-
\$\begingroup\$
var mid = (l + r) >> 1;
? Why? Any recent (and decent) compiler will compile a division by two to a bitshift if the underlying architecture supports it. There's no need for the developer to do that. Edit: my bad. EBrown did that already. \$\endgroup\$Zeta– Zeta2017年09月01日 05:55:53 +00:00Commented Sep 1, 2017 at 5:55 -
\$\begingroup\$ @Zeta Did a basic benchmark of
/ 2
vs.>> 1
(just to get an order-of-magnitutde on what the difference might be), the>> 1
is ~20% faster than/ 2
onBigInteger
. (As I suspected.) Ordinarily I would agree that>> 1
is more obscure than/ 2
, butBigInteger
isn't a native type, so it's very likely much harder to optimize in that respect. \$\endgroup\$Der Kommissar– Der Kommissar2017年09月01日 11:25:11 +00:00Commented Sep 1, 2017 at 11:25
Explore related questions
See similar questions with these tags.
T
isstring
and it's sorted by a specific algorithm, thus when anyx
is piped tof
org
it will always return the same value because it's going to guarantee it's sorting. That's entirely on the caller, there doesn't have to be any sorting whatsoever, really, because thegetItem
andformula
don't know about sorting, they're only provided a value. \$\endgroup\$