In a sorted array, I am trying to find pairs that sum up to a certain value. I was wondering if anyone could help me improve my code in performance or memory. I believe my code is O(n). If there are other method for finding pairs the sum up to a certain value that is much more efficient, I am open to these methods!
static int[] intArray = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
public static ArrayList<String> pairSum(int[] j, int sum) {
int len = j.length;
int pointer1;
int pointer2;
ArrayList<String> storage = new ArrayList<String>();
for (int i = 0; i < len; i++) {
pointer1 = j[i];
for (int k = len - 1; k >= 0; k--) {
pointer2 = j[k];
if (pointer1 + pointer2 == sum && pointer1 != pointer2) {
i++;
String pair = Integer.toString(pointer1) + " and " + Integer.toString(pointer2) ;
storage.add(pair);
}
}
}
return storage;
}
If I ran this code, say pairSum(intArray,16)
, I would get:
[0 and 16, 2 and 14, 4 and 12, 6 and 10, 9 and 7, 11 and 5, 13 and 3, 15 and 1]
If I remove the i++
, the output would be:
[0 and 16, 1 and 15, 2 and 14, 3 and 13, 4 and 12, 5 and 11, 6 and 10, 7 and 9, 9 and 7, 10 and 6, 11 and 5, 12 and 4, 13 and 3, 14 and 2, 15 and 1, 16 and 0]
3 Answers 3
EDIT: Please see update at end because there is an improved solution that is inspired by Mike Chamberlain's answer...
There are two solutions which come to mind for your problem, but first, let's look through your code:
- you have the variables
pointer1
andpointer2
. These are very descriptive names, which is great, but unfortunately they are not pointers. They are the values at thei
andk
positions in the array. In other words,i
andk
are the actual pointers. I would callpointer1
andpointer2
something likeval1
andval2
. But, in fact, I would drop the pointers entirely, and just use an index in to the data (why copy the data when you don't need to....? ) - In most languages the variable name
i
is 'reserved' for indexed looping through data, just like you havefor (int i = 0; i < len; i++) {...}
. It is standard to usej
andk
as the variable names for any inner/nested loops you need.i
is the top level,j
is the first nested level,k
is the subsequent level, and, if you need more than that, you should be calling a function/method (Never usel
as a variable name, it is too easy to confuse with1
). You are breaking this convention by usingj
as an array, and going immediately tok
inside your loop. My suggestion is to renamej
to bedata
, and to renamek
to bej
. - as a general rule, when you have a loop constrained by an index, like you have
for (int i = 0; i < len; i++) { ...
, you should never modify the index inside the loop. I cannot think of a reason why it should ever be necessary, and, certainly, it should never be conditionally modified (inside an if-statement inside another loop). This will lead to bugs. In fact, the simple fact that you have that inneri++
tells me you do not understand your own code's behaviour. - Your method should return the interface type
List<String>
instead of the concrete classArrayList<String>
since there is no reason for outsiders to know the physical implementation. storage
is not a great name for a variable.... everything could be calledstorage
since all variables store things... I would choose something like:results
.- Your inner
if
condition makes sure that the sums are a match, but also checkspointer1 != pointer2
. This second check is a problem, because it should not be testing the values, but the indexes.... For example what if the data is[4,4]
and the target sum is8
? - To get your result pair as a String you do:
String pair = Integer.toString(pointer1) + " and " + Integer.toString(pointer2)
.... this is serious overkill, try this instead:String pair = pointer1 + " and " + pointer2
. In Java the+
String operator will implicitly convert the integers to a String... no need to do it yourself.
So, you have some style problems, and some concerns about your inner workings.
Now, let's talk about your algorithm..... It loops through each value. For each value, it then compares it against every other value (except itself). So, you loop n
times, and, for each n
you do m
checks. Your complexity is O(n x m), but, in reality, n
and m
are the same, so your complexity is O(n2)
Now, about that i++
inside the k
loop.... the reason you need it is because your k
loop is starting from the wrong place. your algorithm is wrong. Let me explain.
If you have the data [ 0, 1, 2, 3, 4, 5 ]
, you are using your i
index to loop through all the data. Your first data will be 0
. You sum this with 5
(Note, you have summed 0 + 5
now), and check the result, then with 4
, and so on. When you are done, you move i
to the value 1
, and you keep moving i
until you get to the value 5
for i
. You then compare **this value 5
to all the other values, including 0
. This is **the second time you sum the items 0 + 5
(but it is now 5 + 0
) **.
You need to change the algorithm to only scan those values that have not yet been scanned. The easiest way to do this is to modify your inner/nested loop. Consider your code:
for (int i = 0; i < len; i++) { ... for (int k = len - 1; k >= 0; k--) { ...
Now, we change the inner loop as follows:
for (int i = 0; i < len; i++) {
....
for (int j = i + 1; j < len; j++) {
(note, I have renamed k
to be j
). We start our j
index at i + 1
, because we don't need to sum values before i
since that sum was done already when i
was smaller.....
Now, our loops only calculate the sum
value once for each pair of values.
Unfortunately, our complexity is still O(n2) because even though our inner loop is only testing half (on average) the values, we have not actually changed the way the algorithm scales (if you double the input data, you with quadruple the execution time).
Still, even with this algorithm change, and some variable renaming, and some bug fixing, your code will look like:
public static List<String> pairSum(int[] data, int sum) {
int len = data.length;
int val1;
int val2;
List<String> results = new ArrayList<String>();
for (int i = 0; i < len; i++) {
val1 = data[i];
for (int j = i + 1; j < len; j++) {
val2 = data[j];
if (val1 + val2 == sum) {
String pair = val1 + " and " + val2;
results.add(pair);
}
}
}
return results;
}
This is OK, but my preference would be to strip all the val1
variables entirely, and you can simplify some other things too. Consider the following more 'compact' code:
public static List<String> pairSum(int[] data, int sum) {
List<String> results = new ArrayList<String>();
for (int i = 0; i < data.length; i++) {
for (int j = i + 1; j < data.length; j++) {
if (data[i] + data[j] == sum) {
results.add(data[i] + " and " + data[j]);
}
}
}
return results;
}
That is what I would recommend for a simple, easy-to-read option that is O(n2). For small data sets (say less than 50 or so), this solution will be great.
But, if you have more data, a better solution would be more complicated, but faster.
In your example code, you have an input test array that is all positive, sorted, and the values are unique. Let's assume that all the data will be this way (we can modify this assumption later).
Consider the algorithm that does:
take all values that are less than half the target `sum`.
find a value that is the difference
It would look like:
public static List<String> pairSumFast(int[] data, int sum) {
List<String> results = new ArrayList<String>();
int limit = sum / 2;
for (int i = 0; i < data.length && data[i] <= limit; i++) {
int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
if (match >= 0) {
results.add(data[i] + " and " + data[match]);
}
}
return results;
}
This has the complexity of O( nlog(n) ) for the loop, and the binary-search respectively.
If your data set is large, and the values are unique, but the data is unsorted, then I would recommend the change:
public static List<String> pairSumFast(int[] indata, int sum) {
List<String> results = new ArrayList<String>();
int[] data = Arrays.copyOf(indata, indata.length);
Arrays.sort(data);
int limit = sum / 2;
for (int i = 0; i < data.length && data[i] <= limit; i++) {
int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
if (match >= 0) {
results.add(data[i] + " and " + data[match]);
}
}
return results;
}
This is O(n log(n) ) for the complexity still... (it scales this way, even through it does more computation....)
Finally, if the data is not unique, then you will get duplicate output values. To accommodate that, I would add an inner loop to the match....:
public static List<String> pairSumFast(int[] data, int sum) {
List<String> results = new ArrayList<String>();
int limit = sum / 2;
for (int i = 0; i < data.length && data[i] <= limit; i++) {
int match = Arrays.binarySearch(data, i + 1, data.length, sum - data[i]);
if (match >= 0) {
String pair = data[i] + " and " + data[match];
results.add(pair);
// binary search will return the index of one match, look around to find others.
int more = match - 1;
while (more >=0 && data[more] == data[match]) {
results.add(pair);
more--;
}
more = match + 1;
while (more < data.length && data[more] == data[match]) {
results.add(pair);
more++;
}
}
}
return results;
}
EDIT: A better solution...
Mike Chamberlain suggests using a Map to preserve the values you have seen in order to match the pairs in a single O(n) iteration of the list.
This single-iteration made me realize that there is a much simpler solution to this problem... iterate from both sides of the data.... consider the following:
public static List<String> pairSumFastest(int[] data, int sum) {
ArrayList<String> results = new ArrayList<String>();
int low = 0;
int high = data.length - 1;
while (low < high) {
while (low < high && data[low] + data[high] < sum) {
low++;
}
while (low < high && data[low] + data[high] > sum) {
high--;
}
if (low < high) {
results.add( data[low] + " and " + data[high]);
low++;
high--;
}
}
return results;
}
This has true O(n) complexity, and has no memory overhead either.
-
\$\begingroup\$ Much more comprehensive than my answer. I hope the OP gives this one the checkmark. \$\endgroup\$Wayne Conrad– Wayne Conrad2014年01月10日 14:08:16 +00:00Commented Jan 10, 2014 at 14:08
-
\$\begingroup\$ Yes thank you so much for taking the time to write this ! I really appreciate it! I learn a lot!! @WayneConrad thank you for your answer as well! \$\endgroup\$Liondancer– Liondancer2014年01月10日 14:10:25 +00:00Commented Jan 10, 2014 at 14:10
-
\$\begingroup\$ Thanks! I was 2-thirds through the answer when I saw your's pop up .... and I figured I covered things you had not, so I posted.... That's why there's a lot of duplication between our answers, sorry ;-) \$\endgroup\$rolfl– rolfl2014年01月10日 14:10:40 +00:00Commented Jan 10, 2014 at 14:10
-
\$\begingroup\$ @rolfi, One of the things I like about codereview is that the fastest gun doesn't always win. I've been out of the Java game for a long time, so I'm glad someone who knows it well was able to cover what I missed. \$\endgroup\$Wayne Conrad– Wayne Conrad2014年01月10日 14:16:46 +00:00Commented Jan 10, 2014 at 14:16
-
\$\begingroup\$ A few things I saw, the
results
declaration should beList<String>
,int more
can be directly set and there's a stray space atmore --
. \$\endgroup\$Bobby– Bobby2014年01月10日 14:30:48 +00:00Commented Jan 10, 2014 at 14:30
Code improvements
Before addressing efficiency, let's see if there are any improvements that could be made to the code.
I suspect some of the names could be improved.
pointer1
->value1
pointer2
->value2
j
->values
storage
->value_pairs
pairSum
->pairsThatAddUpToASum
I would consider getting rid of this temporary:
int len = j.length;
as it's not carrying its weight. Using j.length
inline is not burdensome, and the compiler/runtime are likely to produce adequately fast code without it.
Lastly, I wonder about the decision to return the pairs as strings. This depends upon how the function is being used, but I would often rather return the pairs of numbers, and let the caller worry about formatting. This decouples the function from I/O.
A bug?
This term in the if:
pointer1 != pointer2
Might be a bug. If the list is guaranteed to contain no duplicates, it will work fine (but might be more communicative as i != k
). If, however, the list may contain duplicates, and it is the intent of the code to return the resulting duplicate pairs, then this must be i != k
.
Efficiency.
In order to be O(n), the algorithm must visit each element a constant number of times. We typically think of a O(n) algorithm as one that visits each element once. In this algorithm, both the outer and inner loop visit each element. That makes this an O(n^2) algorithm.
A google search for "efficiently finding pairs that add up to a sum" finds, not surprisingly, several Stack Overflow questions. Here's one of them that may be useful to you: https://stackoverflow.com/questions/1494130/design-an-algorithm-to-find-all-pairs-of-integers-within-an-array-which-sum-to-a
-
\$\begingroup\$ Thanks for this analysis! My code is O(n^2) didnt realize this! \$\endgroup\$Liondancer– Liondancer2014年01月10日 13:54:17 +00:00Commented Jan 10, 2014 at 13:54
-
\$\begingroup\$ Make that
i != k
, asj
currently refers to the array. \$\endgroup\$nmclean– nmclean2014年01月10日 13:55:03 +00:00Commented Jan 10, 2014 at 13:55
I believe you can do this in O(n).
The algorithm below loops through the list only once. It keeps track of the numbers it's seen using a HashSet. For each number, it works out its required complement. If it's seen the complement already then it's found a match. If not, it adds the number to the set - we may see its complement later.
This actually works for unordered as well as ordered data, with the same performance characteristics. It also works for non-unique values (though we get n-1 repeated pairs of x for n repeated values of x - possibly not what you want, but a final stage could be to remove the duplicates if you so wished).
Note that the order of the pairs returned, and indeed the order of two numbers in a pair, may not be the same as your original algorithm. It wasn't specified if this was important, but I'm guessing not.
This reduced complexity comes at the expense of memory, as it must keep track of the numbers its seen in a separate data structure.
One thing I would point out is that the algorithm is more flexible if it returns raw data, rather than formatted strings. Pretty-printing can be done by the caller.
I've done this in C# because that's what I'm familiar with. I guess it should be trivial to convert to Java.
using System;
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
namespace FindSummingPairs
{
public static class Algorithm
{
public static Tuple<int, int>[] FindSummingPairs(int[] input, int sumTarget)
{
var result = new List<Tuple<int, int>>();
// holds the numbers we have seen so far
var seenValues = new HashSet<int>();
// loop through our input
foreach(var number in input)
{
// work out the required value to make our target sum
var required = sumTarget - number;
// have we seen our required value in the list already?
if (seenValues.Contains(required))
{
// score!
result.Add(new Tuple<int, int>(number, required));
}
else
{
// not found, so add it to the set - we might see its complement later
seenValues.Add(number);
}
}
return result.ToArray();
}
}
[TestFixture]
internal class Tests
{
[Test]
public void EmptyInput()
{
var input = new int[0];
var result = Algorithm.FindSummingPairs(input, 0);
Assert.False(result.Any());
}
[Test]
public void SingleNumberAsInput()
{
var input = new[] {1};
var result = Algorithm.FindSummingPairs(input, 2);
Assert.IsTrue(result.Length == 0);
}
[Test]
public void TwoNumbersThatDontSum()
{
var input = new[] {1, 2};
var result = Algorithm.FindSummingPairs(input, 2);
Assert.IsTrue(result.Length == 0);
}
[Test]
public void TwoNumbersThatDoSum()
{
var input = new[] {1, 2};
var result = Algorithm.FindSummingPairs(input, 3);
var expected = new[] { new Tuple<int, int>(1, 2) };
Assert.IsTrue(AreEquivalent(expected, result));
}
[Test]
public void LotsOfNumbers()
{
var input = new[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
var result = Algorithm.FindSummingPairs(input, 10);
var expected = new[]
{
new Tuple<int, int>(1, 9),
new Tuple<int, int>(2, 8),
new Tuple<int, int>(3, 7),
new Tuple<int, int>(4, 6)
};
Assert.IsTrue(AreEquivalent(expected, result));
}
[Test]
public void LotsOfNumbersThatDontSum()
{
var input = new[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
var result = Algorithm.FindSummingPairs(input, 20);
Assert.IsTrue(result.Length == 0);
}
[Test]
public void TestCaseFromQuestion()
{
var input = new[] {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
var result = Algorithm.FindSummingPairs(input, 16);
var expected = new[]
{
new Tuple<int, int>(0, 16),
new Tuple<int, int>(1, 15),
new Tuple<int, int>(2, 14),
new Tuple<int, int>(3, 13),
new Tuple<int, int>(4, 12),
new Tuple<int, int>(5, 11),
new Tuple<int, int>(6, 10),
new Tuple<int, int>(7, 9)
};
Assert.IsTrue(AreEquivalent(expected, result));
}
// calculates whether two pair arrays are equivalent, without respect to the order of the pairs
private bool AreEquivalent(Tuple<int, int>[] e1, Tuple<int, int>[] e2)
{
if (e1.Length != e2.Length)
return false;
foreach(var pair1 in e1)
{
foreach (var pair2 in e2)
{
if (AreEquivalent(pair1, pair2))
return true;
}
}
return false;
}
// calculates whether 2 pairs are equivalent, without respect to the order of the numbers
private bool AreEquivalent(Tuple<int, int> pair1, Tuple<int, int> pair2)
{
return (pair1.Item1 == pair2.Item1 && pair1.Item2 == pair2.Item2)
|| (pair1.Item2 == pair2.Item1 && pair1.Item2 == pair2.Item1);
}
}
}
i++;
produces the wrong result. I'd recommend you remove it from your question, since the code under review out to run and produce correct results. Why is it there? \$\endgroup\$