I've made a little function to eliminate continuous duplicate from a std::vector
. I have to use C++03.
For example, if a vector of ints is composed of: 1,1,2,3,,3,1,1,2 my function should return 1,2,3,1,2. I've tried to use templates (I've just began to use c++) and made it as fast as possible!
template<class T>
vector<T> remove_duplicate(vector<T>& vec) {
int length = vec.size();
vector<T> result(length);
result[0] = vec[0];
int last = 0;
for(int i=0; i<length; i++)
if(vec[i] != result[last]){
last++;
result[last] = vec[i];
}
result.resize(last+1);
return result;
}
Here's a simple test case:
static
void test_remove_duplicate() {
vector<int> v;
v.push_back(1); //123131
v.push_back(1);
v.push_back(2);
v.push_back(2);
v.push_back(3);
v.push_back(1);
v.push_back(3);
v.push_back(3);
v.push_back(1);
v.push_back(1);
vector<int> v1;
v1 = remove_duplicate(v);
for(int i=0; i<v1.size(); i++) {
cout << v1[i];
} cout << endl;
}
What do you think about it?
2 Answers 2
Let's go through mechanical errors:
use
size_t
instead ofint
int length = vec.size();
what if there is no zero element?
result[0] = vec[0];
the same as first:
int last = 0;
the same as 1, 3:
for(int i=0; i<length; i++)
Optimization errors:
Is there any reason why do you need to return a COPY of vector? If you need return copy, so why do you pass it by reference?
Extra
resize
of vector is extremely heavy and slow operation.result.resize(last+1);
Prefer pre-increment to post-increment
use reserve and push_back, instead of resize and []. In you case result.size() <= v.size(). So, make following:
std::vector<T> result; result.reserve(v.size); result.push_back( vec[0] );
In the loop:
result.push_back( vec[i] );
So, including all comments above:
template<class T>
std::vector<T> remove_duplicate(const std::vector<T>& vec)
{
std::vector<T> result;
if(!vec.empty())
{
result.reserve(vec.size());
result.push_back(vec.front());
for(size_t i = 0; i< vec.size(); ++i)
if(vec[i] != result.back())
result.push_back( vec[i] );
}
return result;
}
-
\$\begingroup\$ whithout last resize if i try to use the vector with a for loop ended in size() it continue with some trail 0.. \$\endgroup\$nkint– nkint2012年02月25日 17:57:36 +00:00Commented Feb 25, 2012 at 17:57
-
\$\begingroup\$ resize() to a smaller size than capacity does not have any nasty side-effects only that current size will be reduced. But I agree that I would prefer to use reserve() and push_back() instead. \$\endgroup\$Loki Astari– Loki Astari2012年02月26日 04:31:00 +00:00Commented Feb 26, 2012 at 4:31
Know your Standard Library
The standard unique
algorithm performs exactly this task (and is available in C++03). If we take the argument by value, we can modify it in place, and use the erase-remove idiom:
#include <algorithm>
#include <vector>
template<class T>
std::vector<T> remove_duplicate(std::vector<T> vec) {
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
return vec;
}
Note that the result vector has the same capacity as the input vector. If this is likely to be a problem, then std::unique_copy()
could be used to copy values into a new vector, or a move to C++11 would gain the shrink_to_fit()
method on your vector.