
I need to find the algorithm with the best possible complexity for my task.

Input: indexes first and last, and an array

Output: the sum of the integers that would lie between positions first and last (inclusive) if the array were sorted.

The numbers in the array are distinct (and can be negative)!

For example: Input: first = 3, last = 7, array = {5,4,2,6,8,9,0,-1,3}

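Output: 26 (3+4+5+6+8), because the sorted array is {-1,0,2,3,4,5,6,8,9} and positions 3 through 7 hold 3, 4, 5, 6 and 8.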

What I've tried:

  1. Sort the array and sum the requested range directly, which is O(n log n).

  2. Compute the sum of the whole array, then subtract the elements that fall outside the range, that is, the smallest and largest elements on either side of it.

For example: sum the (n - 1 - last) largest integers and the (first) smallest integers, then subtract both from the sum of the whole array. This is not always a good idea, because finding that many largest or smallest integers in the array can be expensive. Of course I can add refinements, such as deciding for the given indexes whether it is cheaper to work from the largest elements or from the smallest ones.
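A minimal sketch of idea 2, assuming 0-based inclusive indexes. It uses std::partial_sort to pull out the extremes, which costs roughly O(n log k) per side for k extracted elements; the function name sumByTrimming is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Sum of sorted positions [first, last], computed as
// total - (the `first` smallest values) - (the n-1-last largest values).
// The vector is taken by value because partial_sort reorders it.
long long sumByTrimming(std::vector<int> v, std::size_t first, std::size_t last) {
    assert(first <= last && last < v.size());
    const long long total = std::accumulate(v.begin(), v.end(), 0LL);
    const std::size_t high = v.size() - 1 - last;  // how many values lie above `last`

    // Bring the `first` smallest values to the front.
    std::partial_sort(v.begin(), v.begin() + first, v.end());
    const long long lowSum = std::accumulate(v.begin(), v.begin() + first, 0LL);

    // Among the remaining values, bring the `high` largest to the front.
    std::partial_sort(v.begin() + first, v.begin() + first + high, v.end(),
                      std::greater<int>());
    const long long highSum =
        std::accumulate(v.begin() + first, v.begin() + first + high, 0LL);

    return total - lowSum - highSum;
}
```

For the example in the question, the total is 36, the three smallest values sum to 1, and the single value above position 7 is 9, which leaves 26. The cost grows when first or n - 1 - last is large, which is the expense the paragraph above worries about.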

What I am asking is whether there is a better solution to this problem than solving some equations and adding a huge number of ifs to tune it.

asked Apr 1, 2018 at 14:54
  • Why not a single pass? Just check if the element is between your bounds. Commented Apr 1, 2018 at 14:57
  • But I have indexes, not numbers in array Commented Apr 1, 2018 at 14:59
  • I see what you're saying now. Your example threw me off a little with the first one being 3. Commented Apr 1, 2018 at 15:01
  • Since you are dealing with integers, you could essentially create your own type of ordered set. I think this would be more efficient. Commented Apr 1, 2018 at 15:05
  • @MartinBonner: It depends on whether the range of values is restricted. If it is, you can build a histogram in linear time, and then evaluate the histogram in O(M) time, linear in the range of values. Commented Apr 1, 2018 at 15:18

2 Answers

Take a look at the std::nth_element algorithm, which separates "first N" from "elements past N" without doing the extra work of sorting inside the two partitions.

For your purposes, you'll need to call nth_element twice. The second call will be on one of the partitions created in the first step, not the whole array. At the end, you'll have three partitions:

  • Elements less than those you need
  • Elements you need
  • Elements greater than those you need

and it typically does this in linear time, although the worst case is still O(N log N).
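A minimal sketch of the two calls, assuming 0-based inclusive indexes as in the question's example; the name sumSortedRange is only illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sum of the elements that would occupy positions [first, last] (inclusive)
// if the vector were sorted. The vector is taken by value because
// nth_element reorders its input.
long long sumSortedRange(std::vector<int> v, std::size_t first, std::size_t last) {
    assert(first <= last && last < v.size());

    // After this call v[last] holds its sorted value and every element
    // before it is <= v[last].
    std::nth_element(v.begin(), v.begin() + last, v.end());

    // The second call only looks at the left partition [0, last):
    // afterwards everything in [first, last) is >= v[first].
    std::nth_element(v.begin(), v.begin() + first, v.begin() + last);

    // Positions first..last now hold the wanted values, in unspecified order.
    return std::accumulate(v.begin() + first, v.begin() + last + 1, 0LL);
}
```

The order inside the middle partition does not matter for a sum, which is why no sorting is needed there.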

answered Apr 1, 2018 at 15:15
  • This is an awesome approach. The benchmarks back it up as well. Commented Apr 1, 2018 at 23:55

Here is an approach that is faster than the proposed solution by the OP. Though not as elegant or general as the excellent solution provided by @BenVoigt, it is almost as fast.

#include <algorithm>
#include <cstdlib>
#include <vector>

double boundedSumJoe(std::vector<int> x, int lower, int upper) {
    int myMax = *std::max_element(x.begin(), x.end());
    int offSet = std::abs(*std::min_element(x.begin(), x.end())) + 1;
    unsigned long int myRange;
    if (myMax > 0)
        myRange = myMax + offSet; // E.g. if myMax = 10 & myMin = -2, then myRange = 13
    else
        myRange = offSet;
    offSet--; // offSet is now |myMin|; value v is stored at index v + offSet
    std::vector<int> frequency(myRange, 0);
    std::vector<int> values(myRange, 0);
    std::vector<int>::iterator it, itEnd = x.end();
    int myIndex;
    double mySum = 0;
    // Build the histogram: one slot per possible value
    for (it = x.begin(); it < itEnd; it++) {
        myIndex = *it + offSet;
        frequency[myIndex]++;
        values[myIndex] = *it;
    }
    int count = 0; // number of elements seen so far, in sorted order
    bool firstHit = true;
    for (std::size_t j = 0; j < myRange; j++) {
        if (frequency[j]) {
            if (count >= lower) {
                if (count <= upper) {
                    firstHit = false;
                    mySum += values[j] * frequency[j];
                } else {
                    // We overshot the upper bound: remove the surplus copies
                    // of the previous value
                    if ((count - upper) > 1) {
                        int k = j - 1;
                        while (!frequency[k]) {k--;}
                        mySum -= (values[k] * (count - upper - 1));
                    }
                    break;
                }
            }
            count += frequency[j];
            // A run of duplicates straddles the lower bound: add only the
            // copies that fall inside the range
            if ((count - lower) >= 1 && firstHit) {
                firstHit = false;
                mySum += (values[j] * (count - lower));
            }
        }
    }
    return mySum;
}

We first create two vectors large enough to span the entire range of input values. One of them keeps the values from the input vector and the other keeps a tally of that value (frequency vector above). Elements are added in order as the index is made from the value itself.

We then loop over the frequency vector and sum up the resulting values between our two bounds. An earlier version of this method returned incorrect results when the input vector contained duplicate values. Thanks to the suggestions by @BenVoigt, the above can now handle input vectors with duplicate values. As you can see, some care is needed on the edges (hence the additional if ((count - upper) > 1) as well as the lines following if ((count - lower) >= 1 && firstHit)).
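The edge handling can also be written as an interval overlap, which may be easier to check by hand. This is a sketch of the same counting idea rather than a rewrite of the function above; the name sumByCounting is hypothetical, and it assumes the value range is small enough to allocate one counter per possible value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of sorted positions [first, last] (inclusive) using a histogram.
// Runs in O(n + M) time, where M is the span between min and max value.
long long sumByCounting(const std::vector<int>& v, std::size_t first, std::size_t last) {
    assert(!v.empty() && first <= last && last < v.size());
    const auto mm = std::minmax_element(v.begin(), v.end());
    const long long lo = *mm.first;
    const std::size_t range =
        static_cast<std::size_t>(static_cast<long long>(*mm.second) - lo) + 1;

    // frequency[j] counts how often the value lo + j occurs.
    std::vector<std::size_t> frequency(range, 0);
    for (int x : v) ++frequency[static_cast<std::size_t>(x - lo)];

    long long sum = 0;
    std::size_t count = 0;  // sorted position of the first copy of the current value
    for (std::size_t j = 0; j < range && count <= last; ++j) {
        const std::size_t f = frequency[j];
        if (f == 0) continue;
        // Copies of this value occupy sorted positions [count, count + f);
        // keep only the part that overlaps [first, last].
        const std::size_t from = std::max(count, first);
        const std::size_t to = std::min(count + f, last + 1);  // exclusive
        if (from < to)
            sum += (lo + static_cast<long long>(j)) * static_cast<long long>(to - from);
        count += f;
    }
    return sum;
}
```

With the input {1,1,1,2,2,3} and bounds 1 and 4, the three copies of 1 cover positions 0 to 2, so only two of them count, and both copies of 2 count, giving 6.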

Here are some very basic benchmarks that truly show the power of the solution provided by @BenVoigt. First off, here is an implementation of the OP and an implementation using std::nth_element.

double boundedSumOP(std::vector<int> x, int lower, int upper) {
    double mySum = 0;
    std::sort(x.begin(), x.end());
    std::vector<int>::iterator it, itEnd = x.begin() + upper;
    for (it = x.begin() + lower; it <= itEnd; it++)
        mySum += *it;
    return mySum;
}

double boundedSumBen(std::vector<int> x, int lower, int upper) {
    double mySum = 0;
    // First partition vector at larger bound
    std::nth_element(x.begin(), x.begin() + upper, x.end());
    // Now create partition of above at lower bound
    std::nth_element(x.begin(), x.begin() + lower, x.begin() + upper);
    std::vector<int>::iterator it, itEnd = x.begin() + upper;
    for (it = x.begin() + lower; it <= itEnd; it++)
        mySum += *it;
    return mySum;
}

Here is the main function that is used for testing, somewhat crude I might add:

#include <ctime>
#include <iostream>
#include <numeric>
#include <random>
#include <set>
#include <vector>

int main() {
    std::vector<int> v(200001);
    std::random_device rd;
    std::mt19937 gen(rd());
    std::iota(v.begin(), v.end(), -100000);
    std::shuffle(v.begin(), v.end(), gen);
    // random-sample without replacement
    std::vector<int> randVec(v.begin(), v.begin() + 100000);
    int val1, val2, val3;
    std::clock_t start_time, end_time;
    start_time = std::clock();
    for (std::size_t i = 0; i < 100; i++)
        val1 = boundedSumBen(randVec, 49900, 50100);
    end_time = std::clock();
    std::cout << "time taken on sample w/o rep std::nth_element : " <<
        end_time - start_time << std::endl;
    start_time = std::clock();
    for (std::size_t i = 0; i < 100; i++)
        val2 = boundedSumJoe(randVec, 49900, 50100);
    end_time = std::clock();
    std::cout << "time taken on sample w/o rep indexing method by Joe : " <<
        end_time - start_time << std::endl;
    start_time = std::clock();
    for (std::size_t i = 0; i < 100; i++)
        val3 = boundedSumOP(randVec, 49900, 50100);
    end_time = std::clock();
    std::cout << "time taken on sample w/o rep naive approach with std::sort : " <<
        end_time - start_time << std::endl;
    std::cout << "All functions on sample w/o rep return the same value of : " <<
        val1 << ", " << val2 << ", and " << val3 << std::endl;
    // Now we test a random sample with replacement
    std::uniform_int_distribution<int> distribution(-100000, 100000);
    for (std::size_t i = 0; i < 100000; i++)
        randVec[i] = distribution(gen);
    start_time = std::clock();
    for (std::size_t i = 0; i < 100; i++)
        val1 = boundedSumBen(randVec, 9900, 10100);
    end_time = std::clock();
    std::cout << "time taken on sample with rep std::nth_element : " <<
        end_time - start_time << std::endl;
    start_time = std::clock();
    for (std::size_t i = 0; i < 100; i++)
        val2 = boundedSumJoe(randVec, 9900, 10100);
    end_time = std::clock();
    std::cout << "time taken on sample with rep indexing method by Joe : " <<
        end_time - start_time << std::endl;
    start_time = std::clock();
    for (std::size_t i = 0; i < 100; i++)
        val3 = boundedSumOP(randVec, 9900, 10100);
    end_time = std::clock();
    std::cout << "time taken on sample with rep naive approach with std::sort : " <<
        end_time - start_time << std::endl;
    std::cout << "All functions on sample with rep return the same value of : " <<
        val1 << ", " << val2 << ", and " << val3 << std::endl;
    std::cout << "Number of unique elements in vector with replacement "
        << std::set<int>(randVec.begin(), randVec.end()).size()
        << std::endl;
    return 0;
}
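If you want to repeat the measurements with less copy-and-paste, the timing loops could be folded into a small helper. The following is only a sketch under my own assumptions: the name meanMicroseconds is made up, and it measures wall-clock time with std::chrono::steady_clock, whereas clock() above reports processor time, so the numbers are not directly comparable.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Calls `fn` `repetitions` times and returns the mean wall-clock time
// per call, in microseconds.
template <typename Fn>
double meanMicroseconds(Fn fn, std::size_t repetitions) {
    assert(repetitions > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < repetitions; ++i)
        fn();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(repetitions);
}
```

A call would then look like meanMicroseconds([&] { val1 = boundedSumBen(randVec, 49900, 50100); }, 100), with one such line per function under test.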

And the results on my computer* (I'm using clang++; times are raw clock() ticks for the 100 calls of each loop):

time taken on sample w/o rep std::nth_element : 109925
time taken on sample w/o rep indexing method by Joe : 110162
time taken on sample w/o rep naive approach with std::sort : 581368
All functions on sample w/o rep return the same value of : 38849, 38849, and 38849
time taken on sample with rep std::nth_element : 93542
time taken on sample with rep indexing method by Joe : 102780
time taken on sample with rep naive approach with std::sort : 517273
All functions on sample with rep return the same value of : -16069147, -16069147, and -16069147
Number of unique elements in vector with replacement 78605

As you can see, the std::nth_element approach suggested by @BenVoigt is superior in terms of both speed and generality, while the indexing method is still roughly five times faster than the naive approach in these runs.

* Here are the results from ideone (running gcc).
answered Apr 1, 2018 at 15:38
  • Note that the bitmask method can't handle repeated entries. Building a histogram that can represent repeats is not difficult (replace myBools[myIndex] = 1; with myBools[myIndex]++; and probably rename the variable to be more accurate) but the summation code becomes more complicated. Commented Apr 2, 2018 at 2:25
  • Also your use of nth_element appears to be wrong. It certainly differs from what I answered, that "The second call will be on one of the partitions created in the first step, not the whole array." Since you got the correct result, I'm suspicious that your test case might not be a valid test. Commented Apr 2, 2018 at 2:33
  • @BenVoigt, thanks for your suggestion of building a histogram. I will definitely keep that in mind. Also, sorry for mucking up the nth_element call. I have corrected it. Interestingly enough, before I corrected it and ran it on ideone, I was not getting the same values as the other two methods, but when I ran it on my own machine, it was working fine. I'm guessing the difference in compilers has something to do with it. Anywho, I really appreciate your comments and all of your great answers. I have learned so much reading your posts on C++. Commented Apr 2, 2018 at 7:21
