I have written a program, in which I have to find the number of occurrences (overlapping + non-overlapping) of a string in another string.
// variable to keep track of word
int i=0;
for(vector<string>::iterator it = p.begin(); it!=p.end(); it++){
frequency[i]=0;
int start=0;
while(true){
start = s.find(*it, start);
if(start==-1){ // not found
break;
} else{
frequency[i]++;
start++;
}
}
i++;
}
I am finding the starting index start
of the match, and then again searching for the string starting from 'start+1' this time, and so on. The length of the string s
, in which searching has to be performed is 50,00,000. And total number of keywords (stored in vector p
here) is 500 (each having length of 5000).
-
\$\begingroup\$ What exactly are you looking to improve? Or are you looking for a general review? \$\endgroup\$svick– svick2013年08月11日 12:35:38 +00:00Commented Aug 11, 2013 at 12:35
1 Answer 1
I don't like nested loops. They are generally unnecessry and are often confusing. So instead of your two loops, I would extrac the inner loop to a function, for example:
int key_search(const std::string& s, const std::string& key)
{
int count = 0;
size_t pos=0;
while ((pos = s.find(key, pos)) != std::string::npos) {
++count;
++pos;
}
return count;
}
This searches the inputs string s
for occurances of string key
and returns
the number of matches. A few things are worth noting:
The parameters are passed as const references -
const
because you don't change them and references for efficiency.I used
string::npos
as the idicator of failure fromstring
sfind
method, instead of -1 (-1 would give a warning from the compiler if you have sign conversion warnings enabled).I didn't use a
while(true)
loop but instead used the return froms.find
directly to control the loop.I pre-incremented variables instead of post-increment (++pos, not pos++) which can be more efficient (although with built in types it makes no difference).
pos
(yourstart
) is of typesize_t
, not int. Or you could usestd::string::size_type
, which is the alsosize_t
.It is normal to prefix your use of standard library things with
std::
, sostring
becomesstd::string
etc.
The main loop now just needs to call key_search
int i = 0;
for (auto it = keys.begin(); it != keys.end(); ++it){
frequency[i] = key_search(s, *it);
++i;
}
With C++11 you can use auto
to simplify the iterator definition. You don't
show what frequency
is, but I'm assuming it is an array (which are generally
best avoided in C++).