My special case: home project, automatic downloader for podcasts.
Overall algorithm is:
- Download a list of available podcasts
- hash the podcasts
- load metadata + hash of downloaded podcasts from sqlite db
- Algorithm this questions is about - throw out all already downloaded.
- Download new podcasts
- save metadata to sqlite db
I'm hoping for comments on using std::rotate
, performance, and the implementation.
For downloaded data, I currently use a vector. For data from DB, I also use a vector.
I merge both together and sort them according to the hashes (obviously).
And algorithm doubleEraser
on the merged vector.
#include <algorithm>
#include <vector>
#include <iostream>
#include <iterator>
#include <string>
template <typename ForwardItr>
ForwardItr doubleEraser(ForwardItr first, ForwardItr last)
{
auto itr = first;
typename std::iterator_traits<ForwardItr>::value_type firstMatch = *first;
bool hasFirstMatch(false);
while(itr != last)
{
auto next = std::next(itr);
if(next != last && *itr == *next)
{
if(!hasFirstMatch)
{
hasFirstMatch = true;
firstMatch = *itr;
}
else
{
if(*itr == firstMatch) // again at first match
{
return itr;
}
}
std::rotate(itr, std::next(itr, 2), last); // throw matched elements to the end of container
}
else
++itr;
}
return last;
}
template <typename T>
void print(T& c)
{
for(auto & element : c)
std::cout << element << " ";
std::cout << "\n";
}
template <class T>
void process(std::vector<T>& t)
{
std::string formating(" \t");
std::cout << "input: " << formating;
print(t);
std::sort(t.begin(), t.end());
std::cout << "sorted:" << formating;
print(t);
auto itr_begin = doubleEraser(t.begin(), t.end());
std::cout << "dEraser:" << formating;
print(t);
t.erase(itr_begin, t.end());
std::cout << "output:" << formating;
print(t);
}
int main()
{
std::vector<int> vec {1,2,3,4,5,6,7,8,9,3,5,6,7,2};
std::vector<char> vec2 {'A', 'C', 'D', 'D', 'G', 'A' };
std::vector<std::string> vec3 {"Hello", "World", "that", "be", "that", "Hello"};
process(vec);
process(vec2);
process(vec3);
}
2 Answers 2
Comment by @T.C.:
Don't merge them; sort them separately and then use std::set_difference
:
#include <algorithm>
#include <vector>
#include <iostream>
template <typename T>
void print(T& c)
{
for(auto & element : c)
std::cout << element << " ";
std::cout << "\n";
}
int main()
{
std::vector<int> vec {1,2,3,4,6,7,8};
std::vector<int> vec4 {3,4,5,6};
std::sort(vec.begin(), vec.end());
std::sort(vec4.begin(), vec4.end());
std::set_difference(vec.begin(), vec.end(), vec4.begin(), vec4.end(), std::inserter(vec5, vec5.begin()));
print(vec5);
}
- Input: {1,2,3,4,6,7,8} and {3,4,5,6}
- Output wanted: 1,2,5,7,8
- Output real: 1,2,7,8
You cannot use std::set_difference
because of the hashes used as comparison criteria. The Hash Distribution of new hashes is not sortable in a way that new items will be the last ones, which is a requirement for std::set_difference
.
-
2\$\begingroup\$ I'm not sure I understand; you have a set of "available" (
{1,2,3,4,6,7,8}
) and a set of "downloaded" ({3,4,5,6}
), right? And you want the set of "available but not downloaded", which is, correctly,{1,2,7,8}
.{1,2,5,7,8}
would be both "available but not downloaded" and "downloaded but not available". If you really want that, there'sstd::set_symmetric_difference
. \$\endgroup\$T.C.– T.C.2015年02月13日 10:28:36 +00:00Commented Feb 13, 2015 at 10:28
If
std::set_difference
orstd::set_symmetric_difference
work for you, then just use those.Else if you have to merge the vectors for some reason, I would recommend replacing
doubleEraser()
with something like this:template <typename ForwardItr> ForwardItr remove_adjacent (ForwardItr first, ForwardItr last) { while ((first = std::adjacent_find (first, last)) != last) { auto value = *first ; last = std::remove (first, last, value) ; } return last ; }
You already sort
t
before you calldoubleEraser()
, so this should always work. It also handles the case when you have more than just two duplicates.If the above for some reason does not fit your requirements, then below is a small review of your
doubleEraser()
function.typename std::iterator_traits<ForwardItr>::value_type firstMatch = *first;
Could just be:
auto firstMatch = *first;
The performance of
std::rotate()
should be fine.
-
\$\begingroup\$ Thank you for your answer. I'll use std::set_ algorithms. Ad 2.) Check your algorithm Given Input: 1 2 3 3 4 4 5 5 6 6 7 8 --sort--- remove_adjacent output: 1 2 4 6 7 8 <--- 4, 6 are illegal numbers. It jumps after it found one dup pair. \$\endgroup\$eul3r– eul3r2015年02月15日 21:18:14 +00:00Commented Feb 15, 2015 at 21:18
-
1\$\begingroup\$ Needs to make a copy of
*first
forremove()
-auto val = *first; last = std::remove (first, last, val);
. Otherwise weird things happens sinceremove
takes the value by const reference, which changes during the process of removal. \$\endgroup\$T.C.– T.C.2015年02月16日 07:22:04 +00:00Commented Feb 16, 2015 at 7:22
std::set_difference
. \$\endgroup\$