I'm working on a project to compute pairs of complex numbers and need to delete the duplicates (within some tolerance, 10^-5 below). I was wondering if the performance of my compiled version can be improved? I'll be working with up to a million or so number pairs so any improvement would be helpful. Thanks.
compiledDeleteDuplicates =
Compile[{{list, _Complex, 2}, {tolerance, _Real}},
Module[{unique = Table[{0. I, 0. I}, {Length@list}], n = 0, i = 0,
j = 0},
Do[j = 1;
While[j <= n && Max@Abs[unique[[j]] - list[[i]]] >= tolerance, j++];
If[j > n, n++;
unique[[n]] = list[[i]]
];, {i, 1, Length[list]}];
unique[[1 ;; n]]], Parallelization -> True];
Now generate a random list of pairs of complex numbers {{a,b},{c,d},...{e,f}} for demo and compare built-in to compiled version:
testList1 = RandomComplex[{-1 - I, 1 + I}, {5000}];
testList2 = RandomComplex[{-1 - I, 1 + I}, {5000}];
theValues = MapThread[{#1, #2} &, {testList1, testList2}];
AbsoluteTiming[
theVals=DeleteDuplicates[theValues,Max@Abs[#1-#2]<10^-5&];
]
AbsoluteTiming[
theVals=compiledDeleteDuplicates[theValues,10^-5];
]
{38.0958, Null}
{5.79203, Null}
I'm running ver. 12.1.
2 Answers 2
It appears that your desired distance function corresponds to the ChessboardDistance (defined as Max[Abs[u-v]])
For most of the built-in distance functions, there are some sophisticated data structures that Nearest can take advantage of to avoid having to check the entire list on each pass.
In the attempt below, we use Nearest to compute the distances to the two nearest points (the nearest point is the point itself) using the ChessboardDistance. We then see if the last of these two points (the non-self point) is within the tolerance. This defines a list of points that satisfies the criterion. We then use Pick to pull out those satisfactory elements.
alt[list_, tolerance_] := With[
{satisfies = (Last[#] > tolerance) & /@
Nearest[list -> "Distance", list, 2, DistanceFunction -> ChessboardDistance]},
Pick[list, satisfies]
]
(*demonstration using your example inputs*)
AbsoluteTiming[
theVals2 = alt[theValues, 10^-5];]. (*{0.044979, Null}*)
This appears to achieve the same output in a fraction of the time of your compiled example.
N.B. You could generate theValues more concisely and directly as theValues = RandomComplex[{-1 - I, 1 + I}, {5000, 2}], avoiding the creation of the intermediate lists.
-
1$\begingroup$
Nearestis definitely the function to use for this. Upvoted. $\endgroup$Daniel Lichtblau– Daniel Lichtblau2020年12月23日 18:34:40 +00:00Commented Dec 23, 2020 at 18:34 -
1$\begingroup$ +1. Note that you have provided
altwith a tolerance argument but you aren't using it. $\endgroup$Simon Woods– Simon Woods2020年12月24日 18:32:33 +00:00Commented Dec 24, 2020 at 18:32 -
$\begingroup$ good catch @SimonW $\endgroup$Joshua Schrier– Joshua Schrier2021年01月02日 21:02:16 +00:00Commented Jan 2, 2021 at 21:02
The "alt" routine seems to be deleting both duplicate values. First I correct the syntax error above replacing 10^-5 with the variable tolerance:
alt[list_,tolerance_]:=
With[{satisfies=(Last[#]>tolerance)&/@Nearest[list->"Distance",
list,2,DistanceFunction->ChessboardDistance]},Pick[list,satisfies]]
Then use the following list of 10 values saved as "theList":
$$ \left( \begin{array}{cc} -1.58459-0.0424956 i & 0.00861865,円 +0.593453 i \\ -1.54092+0.00728155 i & 0.0462379,円 -0.597649 i \\ -1.5379+0.0344158 i & -0.0618265-0.871121 i \\ -1.5379+0.0344158 i & -0.0618265-0.871121 i \\ -1.46272-0.020357 i & 0.0714506,円 -0.530534 i \\ -1.40138+0.0815851 i & 0.13912,円 +0.420419 i \\ -1.40056+0.0689702 i & 0.125141,円 +0.420454 i \\ -1.08562+0.366505 i & 0.530368,円 +0.376754 i \\ -0.0618265-0.871121 i & -1.08562+0.366505 i \\ 0.00861865,円 +0.593453 i & 1.16239,円 +0.263626 i \\ \end{array} \right) $$ Note the duplicate value {-1.53+0.0344,-0.016-0.87}. DeleteDuplicates does what is expected: removing all but one of the duplicate values:
AbsoluteTiming[
val1=DeleteDuplicates[theList,Max@Abs[#1-#2]<10^-5&];
]
val1//N//MatrixForm
$$ \left( \begin{array}{cc} -1.58459-0.0424956 i & 0.00861865,円 +0.593453 i \\ -1.54092+0.00728155 i & 0.0462379,円 -0.597649 i \\ -1.5379+0.0344158 i & -0.0618265-0.871121 i \\ -1.46272-0.020357 i & 0.0714506,円 -0.530534 i \\ -1.40138+0.0815851 i & 0.13912,円 +0.420419 i \\ -1.40056+0.0689702 i & 0.125141,円 +0.420454 i \\ -1.08562+0.366505 i & 0.530368,円 +0.376754 i \\ -0.0618265-0.871121 i & -1.08562+0.366505 i \\ 0.00861865,円 +0.593453 i & 1.16239,円 +0.263626 i \\ \end{array} \right) $$
but the alt routine deletes both values:
AbsoluteTiming[
val2 = alt[theList, 10^-5];
]
val2 // N // MatrixForm
$$ \left( \begin{array}{cc} -1.58459-0.0424956 i & 0.00861865,円 +0.593453 i \\ -1.54092+0.00728155 i & 0.0462379,円 -0.597649 i \\ -1.46272-0.020357 i & 0.0714506,円 -0.530534 i \\ -1.40138+0.0815851 i & 0.13912,円 +0.420419 i \\ -1.40056+0.0689702 i & 0.125141,円 +0.420454 i \\ -1.08562+0.366505 i & 0.530368,円 +0.376754 i \\ -0.0618265-0.871121 i & -1.08562+0.366505 i \\ 0.00861865,円 +0.593453 i & 1.16239,円 +0.263626 i \\ \end{array} \right) $$ And the code, albeit short, is a little difficult for me to understand so I haven't figured out how to change it to behave like the built-in DeleteDuplicates. Sorry if I didn't state that objective in the description.
Also, I noticed in the description of Nearest, that the "2" in the call represents the two nearest values. However, in my list, there may be many, many duplicates of the same value.
Update:
Sorry, I should have just posted this behavior:
In[162]:= testData = {{1, 1}, {2, 2}, {3, 3}, {2, 2}}
alt[testData, 10^-5]
Out[162]= {{1, 1}, {2, 2}, {3, 3}, {2, 2}}
Out[163]= {{1, 1}, {3, 3}}
The desired output should be {{1,1},{2,2},{3,3}}
Explore related questions
See similar questions with these tags.
theValues = RandomComplex[{-1 - I, 1 + I}, {5000, 2}]? $\endgroup$