Finding subsequences of vector from a vector

Question 1

I'm trying to find subsequences of vector from a vector.

For example, if I want to find {2, 3} from {1, 2, 3, 1, 2, 3, 4}, the result should be {1, 4}.

Here's what I wrote so far.

vector<int> v1 = {1, 2, 3, 1, 2, 3, 4}; //source vector
vector<int> v2 = {2, 3}; //the subsequence to find
vector<size_t> founds; //positions found
for (size_t i=0; i<v1.size(); ++i) {
 auto it = search(v1.begin()+i, v1.end(), v2.begin(), v2.end());
 if (it != v1.end()) {
 size_t pos = distance(v1.begin(), it);
 founds.push_back(pos);
 i += pos;
 }
}
for (size_t i=0; i<founds.size(); ++i) {
 cout << founds[i] << endl; //prints 1, 4
}

While this seems to work, I would appreciate any advice to make the code simpler and efficient.

EDIT : I updated the code based on @user1118321's advice.

Here's an updated code.

vector<int> source = {5, 8, 1, 2, 1, 2, 1, 2, 1}; //source vector
vector<int> toSearch = {1, 2, 1}; //the subsequence to find
vector<size_t> results; //positions found
const auto srcBegin = source.begin();
const auto srcEnd = source.end();
for (auto nextIndex = srcBegin; nextIndex < srcEnd; nextIndex++)
{
 const auto it = search(nextIndex, srcEnd, toSearch.begin(), toSearch.end());
 if (it != srcEnd) {
 const size_t pos = distance(srcBegin, it);
 results.push_back(pos);
 nextIndex = srcBegin + pos;
 }
}
for (size_t i=0; i<results.size(); ++i) {
 cout << results[i] << endl; //prints 2, 4, 6
}

And if I want to ignore the overlapping copies of a subsequence, I can change nextIndex = srcBegin + pos; to nextIndex = srcBegin + pos + toSearch.size(); Then it prints 2, 6

Question 2

Welcome to Code Review! You'll receive better reviews if you show a complete example. For example, I recommend that you edit to show the necessary #include lines, and a main() that shows how to call your function. It's not mandatory, but it really helps!

Question 3

Instead of converting the found iterator to a position, then adding that on to the beginning iterator to get to the new search position, it's easier to keep it as an iterator:

#include <algorithm>
#include <vector>
std::vector<std::size_t> find_all_indexes(const std::vector<int>& haystack,
 const std::vector<int>& needle)
{
 std::vector<std::size_t> indexes{};
 auto it{haystack.begin()};
 while ((it = std::search(it, haystack.end(), needle.begin(), needle.end())) != haystack.end())
 indexes.push_back(std::distance(haystack.begin(), it++));
 return indexes;
}

The calling code can remain unchanged:

#include <iostream>
int main()
{
 static const std::vector<int> v1 = {1, 2, 3, 1, 2, 3, 4}; //source vector
 static const std::vector<int> v2 = {2, 3}; //the subsequence to find
 const std::vector<std::size_t> founds = find_all_indexes(v1, v2);
 for (auto const element: founds) {
 std::cout << element << std::endl; //prints 1, 4
 }
}

Question 4

This is a really straightforward and simple solution that's easy to understand! I don't know that you could make it much simpler than it is. But here are some things I would change:

Variable Names

I think the naming could be improved. The names v1 and v2 don't exactly tell us what they're for. I recommend something like source or toSearch for v1 and subsequence or searchFor for v2. founds is also a really weird sounding name. I would name it results or subsequenceIndexes or something a little more descriptive.

Loops

You have 2 for loops that use old-school indexing. It would be better to use an iterator here since you need to pass one to search() anyway. Something like this:

for(auto nextIndex = v1.begin(); nextIndex < v1.end(); nextIndex++)
{
 auto it = search(nextIndex, v1.end(), v2.begin(), v2.end());
 // ... etc.
}

Likewise for the second for loop.

Errors

Also, is the code in the if correct? It seems wrong to me. It looks like you're trying to skip over the contents of the subsequence you just found and start at the element one past the end of the subsequence. There are 2 problems here. The first is that you're skipping too much. For example, if you subsequence is found at position 7, but the subsequence is only 3 characters long, you're adding 7 to the current position, so you're missing 4 characters that could contain another copy of the subsequence.

But beyond that, there's also the issue of overlapping copies of a subsequence. Let's say the sequence to search for is "1, 2, 1" and the string to search is "5, 8, 1, 2, 1, 2, 1, 2, 1". How many copies of the subsequence are there? If the answer is 2, then you were on the right track. But if the answer is 3, then your solution will miss one. (I'm seeing 3 at index 2, 4, and 6, but you could argue that you only want the ones at indexes 2 and 6.)

Question 5

Thanks for your answer! I just updated the code based on your advice and indications. Could you please tell me if the errors you mentioned are fixed in my updated code?

Toby Speight Toby Speight 87.1k14 gold badges104 silver badges322 bronze badges · Accepted Answer · 2017-10-31 13:13:17Z

Instead of converting the found iterator to a position, then adding that on to the beginning iterator to get to the new search position, it's easier to keep it as an iterator:

#include <algorithm>
#include <vector>
std::vector<std::size_t> find_all_indexes(const std::vector<int>& haystack,
 const std::vector<int>& needle)
{
 std::vector<std::size_t> indexes{};
 auto it{haystack.begin()};
 while ((it = std::search(it, haystack.end(), needle.begin(), needle.end())) != haystack.end())
 indexes.push_back(std::distance(haystack.begin(), it++));
 return indexes;
}

The calling code can remain unchanged:

#include <iostream>
int main()
{
 static const std::vector<int> v1 = {1, 2, 3, 1, 2, 3, 4}; //source vector
 static const std::vector<int> v2 = {2, 3}; //the subsequence to find
 const std::vector<std::size_t> founds = find_all_indexes(v1, v2);
 for (auto const element: founds) {
 std::cout << element << std::endl; //prints 1, 4
 }
}

Stack Exchange Network

Finding subsequences of vector from a vector

2 Answers 2

Variable Names

Loops

Errors

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Finding subsequences of vector from a vector

2 Answers 2

Variable Names

Loops

Errors

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions