I need to know how many times a substring occurs in a given string. I figured that I might as well create an extension method:
public static int Occurences(this string str, string val)
{
string copy = str;
int instancesOf = 0;
int indexOfVal;
while ((indexOfVal = copy.IndexOf(val)) != -1)
{
copy = copy.Remove(indexOfVal, val.Count());
instancesOf++;
}
return instancesOf;
}
I'm not sure that the assignment in the while
loop is good practice. Should I change this to compute the value once outside the loop, and once in the loop, like this?
int indexOfVal = copy.IndexOf(val);
while (indexOfVal != -1)
{
copy = copy.Remove(indexOfVal, val.Count());
instancesOf++;
indexOfVal = copy.IndexOf(val);
}
Any and all comments appreciated, the more the better.
1 Answer 1
Your current implementation could be more efficient. Copying the passed in string and chopping it up is probably pretty expensive (I'm not a master of .NET internals though).
val.Count()
Why not val.Length
? Count()
is very inefficient compared to Length
.
copy = copy.Remove(indexOfVal, val.Count());
We don't need to remove part of the string, just search again starting at indexOfVal + val.Length
.
Here is an example using an overload of String.IndexOf
:
public static int Occurences(this string str, string val)
{
int occurrences = 0;
int startingIndex = 0;
while ((startingIndex = str.IndexOf(val, startingIndex)) >= 0)
{
++occurrences;
++startingIndex;
}
return occurrences;
}
This implementation will count overlapping occurrences (so "bbb".Occurences("bb")
will return 2.
If you don't want to count overlapping occurences, you can replace ++startingIndex;
with:
startingIndex += val.Length
On why an exception isn't thrown in the case of "foo".Occurrences("o")
, from MSDN:
If startIndex equals the length of the string instance, the method returns -1.
-
\$\begingroup\$ Calling
.IndexOf()
with astartingIndex
greater than the length of the string crashes, but it actually works in this statement because the assignment fails. \$\endgroup\$user34073– user340732015年03月30日 00:34:24 +00:00Commented Mar 30, 2015 at 0:34 -
\$\begingroup\$ "probably pretty expensive" and "very inefficient" are not terms I would use to describe
Count()
and splitting strings. These are micro-optimizations and most of the time won't make a licking difference. Almost certainly, the implementation ofCount()
just returnsLength
for a string. \$\endgroup\$craftworkgames– craftworkgames2015年03月31日 00:23:21 +00:00Commented Mar 31, 2015 at 0:23 -
\$\begingroup\$ @craftworkgames It's a community wiki answer. You're welcome to improve it as you see fit. \$\endgroup\$nhgrif– nhgrif2015年03月31日 00:24:16 +00:00Commented Mar 31, 2015 at 0:24
-
\$\begingroup\$ @craftworkgames Probably true, but using
.Length
is probably the more correct version, and my code needs to be fast as it will be running on mobile devices as well as desktops. \$\endgroup\$user34073– user340732015年03月31日 00:25:38 +00:00Commented Mar 31, 2015 at 0:25 -
\$\begingroup\$ Knowing different ways to code something is great. That's what code reviews are all about. It's just that "The biggest problems with 'premature optimization' are that it can introduce unexpected bugs and can be a huge time waster." programmers.stackexchange.com/questions/80084/… \$\endgroup\$craftworkgames– craftworkgames2015年03月31日 06:26:40 +00:00Commented Mar 31, 2015 at 6:26
"bbb".Occurences("bb")
returns 1. \$\endgroup\$"bbb".Occurrences("bb")
return 1 or 2? \$\endgroup\$val
is long, and yourstr
is pathological, the running time could be O(n^2). To avoid that, you could use a smarter string-searching algorithm, such as the Z Algorithm, which works in O(n) time. \$\endgroup\$