I'm trying to find subsequences of vector from a vector.
For example, if I want to find {2, 3} from {1, 2, 3, 1, 2, 3, 4}, the result should be {1, 4}.
Here's what I wrote so far.
vector<int> v1 = {1, 2, 3, 1, 2, 3, 4}; //source vector
vector<int> v2 = {2, 3}; //the subsequence to find
vector<size_t> founds; //positions found
for (size_t i=0; i<v1.size(); ++i) {
auto it = search(v1.begin()+i, v1.end(), v2.begin(), v2.end());
if (it != v1.end()) {
size_t pos = distance(v1.begin(), it);
founds.push_back(pos);
i += pos;
}
}
for (size_t i=0; i<founds.size(); ++i) {
cout << founds[i] << endl; //prints 1, 4
}
While this seems to work, I would appreciate any advice to make the code simpler and efficient.
EDIT : I updated the code based on @user1118321's advice.
Here's an updated code.
vector<int> source = {5, 8, 1, 2, 1, 2, 1, 2, 1}; //source vector
vector<int> toSearch = {1, 2, 1}; //the subsequence to find
vector<size_t> results; //positions found
const auto srcBegin = source.begin();
const auto srcEnd = source.end();
for (auto nextIndex = srcBegin; nextIndex < srcEnd; nextIndex++)
{
const auto it = search(nextIndex, srcEnd, toSearch.begin(), toSearch.end());
if (it != srcEnd) {
const size_t pos = distance(srcBegin, it);
results.push_back(pos);
nextIndex = srcBegin + pos;
}
}
for (size_t i=0; i<results.size(); ++i) {
cout << results[i] << endl; //prints 2, 4, 6
}
And if I want to ignore the overlapping copies of a subsequence, I can change nextIndex = srcBegin + pos;
to nextIndex = srcBegin + pos + toSearch.size();
Then it prints 2, 6
2 Answers 2
Instead of converting the found iterator to a position, then adding that on to the beginning iterator to get to the new search position, it's easier to keep it as an iterator:
#include <algorithm>
#include <vector>
std::vector<std::size_t> find_all_indexes(const std::vector<int>& haystack,
const std::vector<int>& needle)
{
std::vector<std::size_t> indexes{};
auto it{haystack.begin()};
while ((it = std::search(it, haystack.end(), needle.begin(), needle.end())) != haystack.end())
indexes.push_back(std::distance(haystack.begin(), it++));
return indexes;
}
The calling code can remain unchanged:
#include <iostream>
int main()
{
static const std::vector<int> v1 = {1, 2, 3, 1, 2, 3, 4}; //source vector
static const std::vector<int> v2 = {2, 3}; //the subsequence to find
const std::vector<std::size_t> founds = find_all_indexes(v1, v2);
for (auto const element: founds) {
std::cout << element << std::endl; //prints 1, 4
}
}
This is a really straightforward and simple solution that's easy to understand! I don't know that you could make it much simpler than it is. But here are some things I would change:
Variable Names
I think the naming could be improved. The names v1
and v2
don't exactly tell us what they're for. I recommend something like source
or toSearch
for v1
and subsequence
or searchFor
for v2
. founds
is also a really weird sounding name. I would name it results
or subsequenceIndexes
or something a little more descriptive.
Loops
You have 2 for
loops that use old-school indexing. It would be better to use an iterator here since you need to pass one to search()
anyway. Something like this:
for(auto nextIndex = v1.begin(); nextIndex < v1.end(); nextIndex++)
{
auto it = search(nextIndex, v1.end(), v2.begin(), v2.end());
// ... etc.
}
Likewise for the second for
loop.
Errors
Also, is the code in the if
correct? It seems wrong to me. It looks like you're trying to skip over the contents of the subsequence you just found and start at the element one past the end of the subsequence. There are 2 problems here. The first is that you're skipping too much. For example, if you subsequence is found at position 7, but the subsequence is only 3 characters long, you're adding 7 to the current position, so you're missing 4 characters that could contain another copy of the subsequence.
But beyond that, there's also the issue of overlapping copies of a subsequence. Let's say the sequence to search for is "1, 2, 1" and the string to search is "5, 8, 1, 2, 1, 2, 1, 2, 1". How many copies of the subsequence are there? If the answer is 2, then you were on the right track. But if the answer is 3, then your solution will miss one. (I'm seeing 3 at index 2, 4, and 6, but you could argue that you only want the ones at indexes 2 and 6.)
-
\$\begingroup\$ Thanks for your answer! I just updated the code based on your advice and indications. Could you please tell me if the errors you mentioned are fixed in my updated code? \$\endgroup\$Zack Lee– Zack Lee2017年10月31日 09:59:29 +00:00Commented Oct 31, 2017 at 9:59
#include
lines, and amain()
that shows how to call your function. It's not mandatory, but it really helps! \$\endgroup\$