Counting occurrences of substring in string

Question 1

I need to know how many times a substring occurs in a given string. I figured that I might as well create an extension method:

public static int Occurences(this string str, string val)
{
    string copy = str;
    int instancesOf = 0;
    int indexOfVal;
    while ((indexOfVal = copy.IndexOf(val)) != -1)
    {
        copy = copy.Remove(indexOfVal, val.Count());
        instancesOf++;
    }
    return instancesOf;
}

I'm not sure that the assignment in the while loop is good practice. Should I change this to compute the value once outside the loop, and once in the loop, like this?

int indexOfVal = copy.IndexOf(val);
while (indexOfVal != -1)
{
    copy = copy.Remove(indexOfVal, val.Count());
    instancesOf++;
    indexOfVal = copy.IndexOf(val);
}

Any and all comments appreciated, the more the better.

Question 2

Do you want to find overlapping instances? "bbb".Occurences("bb") returns 1.

Question 3

@mjolka That might be nice, but is not necessary.

Question 4

@Hosch250 You need to document expected behavior. Should "bbb".Occurrences("bb") return 1 or 2?

Question 5

Either will be fine, but I might as well have it return 2 for the sake of a thorough implementation.

Question 6

If your val is long, and your str is pathological, the running time could be O(n^2). To avoid that, you could use a smarter string-searching algorithm, such as the Z Algorithm, which works in O(n) time.
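For reference, here is a rough sketch of how a Z-algorithm count could look in C#. The names ZSearch, ZArray and CountOccurrencesZ are made up for illustration, and the empty-pattern check is my own assumption about the desired behavior. It compares characters ordinally (unlike the culture-sensitive default of IndexOf) and counts overlapping matches.

```csharp
using System;

public static class ZSearch
{
    // z[i] = length of the longest substring starting at i that is also a prefix of s.
    private static int[] ZArray(string s)
    {
        int n = s.Length;
        var z = new int[n];
        int left = 0, right = 0; // [left, right) is the rightmost prefix match found so far
        for (int i = 1; i < n; i++)
        {
            // Reuse what is already known if i lies inside the current match window.
            if (i < right)
                z[i] = Math.Min(right - i, z[i - left]);

            // Extend the match by direct comparison.
            while (i + z[i] < n && s[z[i]] == s[i + z[i]])
                z[i]++;

            if (i + z[i] > right)
            {
                left = i;
                right = i + z[i];
            }
        }
        return z;
    }

    // Counts (possibly overlapping) occurrences of pattern in text in linear time.
    public static int CountOccurrencesZ(string text, string pattern)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("Pattern must be non-empty.", nameof(pattern));

        int m = pattern.Length;
        int[] z = ZArray(pattern + text);

        int count = 0;
        // Index i in the combined string corresponds to index i - m in text.
        for (int i = m; i < z.Length; i++)
        {
            if (z[i] >= m)
                count++;
        }
        return count;
    }
}
```

With this sketch, ZSearch.CountOccurrencesZ("bbb", "bb") gives 2. For short search strings the plain IndexOf loop is likely simpler and fast enough.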

user34073 – user34073 · Accepted Answer · 2015-03-30 00:08:18Z

Your current implementation could be more efficient. .NET strings are immutable, so every call to Remove allocates a new string and copies the remaining characters into it, which gets expensive when the string is long or there are many matches.

val.Count()

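Why not val.Length? Count() here is the LINQ extension method. Since string does not implement ICollection<char>, it has to enumerate every character, whereas Length is a property that is read in constant time.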

copy = copy.Remove(indexOfVal, val.Count());

We don't need to remove part of the string; we can just search again starting at indexOfVal + val.Length.

Here is an example using an overload of String.IndexOf:

public static int Occurences(this string str, string val)
{
    int occurrences = 0;
    int startingIndex = 0;
    while ((startingIndex = str.IndexOf(val, startingIndex)) >= 0)
    {
        ++occurrences;
        ++startingIndex;
    }
    return occurrences;
}

This implementation will count overlapping occurrences (so "bbb".Occurences("bb") will return 2).

If you don't want to count overlapping occurrences, you can replace ++startingIndex; with:

startingIndex += val.Length;
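Putting both variants together, a minimal sketch might look like the following. The name CountOccurrences, the allowOverlap parameter, the null and empty-value guards, and the use of StringComparison.Ordinal are my own assumptions rather than part of the original code. IndexOf(string, int) without a comparison argument is culture-sensitive, which may or may not be what you want.

```csharp
using System;

public static class StringCountExtensions
{
    // Counts occurrences of val in str.
    // allowOverlap = true  -> "bbb" contains "bb" twice
    // allowOverlap = false -> "bbb" contains "bb" once
    public static int CountOccurrences(this string str, string val, bool allowOverlap = false)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));
        // Assumption: an empty search value is treated as a caller error,
        // since it would match at every position.
        if (string.IsNullOrEmpty(val))
            throw new ArgumentException("Search value must be non-empty.", nameof(val));

        int occurrences = 0;
        int startingIndex = 0;
        // Step by one character to allow overlaps, or past the whole match to forbid them.
        int step = allowOverlap ? 1 : val.Length;

        while ((startingIndex = str.IndexOf(val, startingIndex, StringComparison.Ordinal)) >= 0)
        {
            occurrences++;
            startingIndex += step;
        }
        return occurrences;
    }
}
```

Because val is non-empty, startingIndex is at most str.Length after each step, so the next IndexOf call stays within its valid range.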

On why an exception isn't thrown in the case of "foo".Occurrences("o"), where the final search starts at index 3 (the length of the string), from MSDN:

If startIndex equals the length of the string instance, the method returns -1.
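If you want to check that boundary behavior yourself, something like this small sketch should do. IndexOfBoundaryDemo, Throws and Demo are illustrative names of mine, and I pass StringComparison.Ordinal explicitly to keep the result independent of the current culture.

```csharp
using System;

public static class IndexOfBoundaryDemo
{
    // Returns true if IndexOf throws ArgumentOutOfRangeException for the given start index.
    public static bool Throws(string s, string value, int startIndex)
    {
        try
        {
            s.IndexOf(value, startIndex, StringComparison.Ordinal);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Demo()
    {
        string s = "foo";
        // Search starting at the last character: finds the final "o".
        Console.WriteLine(s.IndexOf("o", 2, StringComparison.Ordinal)); // 2
        // startIndex == s.Length: no exception, just "not found".
        Console.WriteLine(s.IndexOf("o", 3, StringComparison.Ordinal)); // -1
        // startIndex > s.Length: this is the case that throws.
        Console.WriteLine(Throws(s, "o", 4)); // True
    }
}
```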

Calling .IndexOf() with a startingIndex greater than the length of the string throws an exception, but the loop above is safe because, for a non-empty val, startingIndex never exceeds the length of the string.

"probably pretty expensive" and "very inefficient" are not terms I would use to describe Count() and splitting strings. These are micro-optimizations and most of the time won't make a licking difference. Almost certainly, the implementation of Count() just returns Length for a string.

@craftworkgames It's a community wiki answer. You're welcome to improve it as you see fit.

@craftworkgames Probably true, but using .Length is probably the more correct version, and my code needs to be fast as it will be running on mobile devices as well as desktops.

Knowing different ways to code something is great. That's what code reviews are all about. It's just that "The biggest problems with 'premature optimization' are that it can introduce unexpected bugs and can be a huge time waster." programmers.stackexchange.com/questions/80084/…
