https://leetcode.com/problems/implement-strstr/
I implemented Rabin-Karp algorithm https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
please review for performance, also if you were in an interview what do you think about using functions like GetHashCode()? would you like the interviewee to implement the hashing on their own?
Implement strStr().
Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
Example 1: Input: haystack = "hello", needle = "ll" Output: 2 Example 2: Input: haystack = "aaaaa", needle = "bba" Output: -1 Clarification:
What should we return when needle is an empty string? This is a great question to ask during an interview.
For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C's strstr() and Java's indexOf().
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace StringQuestions
{
/// <summary>
///https://leetcode.com/problems/implement-strstr/
/// </summary>
[TestClass]
public class StrStrLeetCode
{
[TestMethod]
public void ValidNeedleTest()
{
string haystack = "hello";
string needle = "ll";
Assert.AreEqual(2, StrStr(haystack, needle));
}
[TestMethod]
public void NotValidNeedleTest()
{
string haystack = "aaaaa";
string needle = "ba";
Assert.AreEqual(-1, StrStr(haystack, needle));
}
public int StrStr(string haystack, string needle)
{
if (string.IsNullOrEmpty(needle))
{
return 0;
}
int n = haystack.Length;
int m = needle.Length;
var hash = needle.GetHashCode();
for (int i = 0; i < n - m + 1; i++)
{
string tempStr = haystack.Substring(i, m);
var hashTemp = tempStr.GetHashCode();
if (hash == hashTemp)
{
return i;
}
}
return -1;
}
}
}
3 Answers 3
As dfhwze and Roland already pointed out, a hash alone is not sufficient to determine whether two things are equal, so you still need to do a string comparison afterwards if the hashes match. Otherwise you will get wrong results from time to time. Not to mention the effect of hash randomization between different application runs...
The idea behind Rabin-Karp's use of hashes is to replace costly string comparisons with cheap hash comparisons. But in your case, the cost of creating a substring and calculating its hash (which involves some calculations for every character) is often greater than doing a direct string comparison (which can bail out at the first difference).
As the Wikipedia article that you linked to says, you'll want to use a rolling hash, a hashing algorithm that allows you to calculate the hash of the next substring with just a few operations, regardless of how long that substring is.
Also, as far as I can tell, storing string.Length
in a local variable doesn't offer any performance improvements. It does make the code slightly less readable though, in my opinion.
Wouldn't it be a minor optimization if you check haystack[i] == needle[0]
before you call Substring()
and calculate the hash?:
if (haystack[i] == needle[0])
{
string tempStr = haystack.Substring(i, m);
var hashTemp = tempStr.GetHashCode();
if (hash == hashTemp)
{
return i;
}
}
Your code is both buggy and inefficient.
You should never call String.Substring since that method allocates a new string. In a programming language like Go, where a string is implemented as a view to a simple byte array, that would be ok since getting the substring involves only 3 memory operations and no object allocations. But not so in C# or Java.
If String.GetHashCode had a fixed and documented hashing algorithm like in Java, I could provide you with a reliable way of finding a counterexample. But since the exact algorithm is not specified, you'd have to try several random strings until you find a counterexample. Using a fuzzer is a good way of finding this bug:
- Generate two random strings
- Ensure that
StrStr(haystack, needle) == haystack.IndexOf(needle)
- goto 1, until the test fails
I don't see any point in allowing null
as an argument. Your code should just throw an exception in such a case. And if you allow needle
to be null
, why don't you allow haystack
to be null
as well? And where are the unit tests corresponding to these edge cases? Especially for simple utility functions like this one, it's trivial to reach 100% test coverage, therefore you should do that.
Assert.AreEqual(0, StrStr("anything", ""))
? \$\endgroup\$