I have two large lists:
list1 = {{a,v},{a,h},{b,v},{b,k},{c,t},{c,r},{d,r}};
list2 = {{v,dc},{v,rt},{h,kl},{h,oi},{h,po},{k,ö},{k,dc},{t,re},{r,qw},{r,ay},{r,ül}};
Now I want a list that matches both lists so that the first element of each row in list1 is matched with the second element of each row in list 2 based on the same values of second element of each row in list1 and the first element of each row in list2. For example list3 should yield
{{a,dc},{a,rt},{a,kl},{a,oi},{a,po},...}
I have developed a very clumsy (and very very slow) solution with nested tables so I am sure there is a faster one? I have also tried something with Associations but this will either lead to a kernel crash or loss of data (as duplicate keys will be removed).
Thank you
9 Answers 9
Using a connection with graphs:
gr = Graph[Join[DirectedEdge @@@ list1, DirectedEdge @@@ list2],
VertexLabels -> "Name"]
Build a list using VertexOutComponent:
Flatten[Cases[VertexOutComponent[gr, #, {2}], val_ -> {#, val}] & /@ VertexList[gr], 1]
(* {{a, rt}, {a, kl}, {a, dc}, {a, oi}, {a, po}, {b, ö}, {b, rt}, {b, dc},
{c, qw}, {c, ül}, {c, re}, {c, ay}, {d, qw}, {d, ül}, {d, ay}} *)
Faster version would be
Cases[VertexOutComponent[gr, #, {2}], val_ -> {#, val}] & /@ DeleteDuplicates[list1[[All, 1]]] // Catenate
Timing study
Below I report timing results for all codes presented here. The time-measuring routine:
timingTest[n_, p_, merge_] := Module[{a, l, m = Round[p*n], res},
Do[
l[k] =
Transpose[{n*(k - 1) + RandomInteger[{1, n}, m],
n*k + RandomInteger[{1, n}, m]}], {k, 2}];
Timing[res = merge[l[1], l[2]];] // First
]
Collection of computational routines:
mergeY[list1_, list2_] := Module[{gr, ivert},
ivert = DeleteDuplicates[list1[[All, 1]]];
gr = Graph[Join[DirectedEdge @@@ list1, DirectedEdge @@@ list2]];
Catenate[
Cases[VertexOutComponent[gr, #, {2}], val_ -> {#, val}] & /@
ivert]
]
mergeL[list1_, list2_] := Module[{},
Cases[Tuples[{list1, list2}], {{a_, x_}, {x_, b_}} :> {a, b}]
]
mergeK[list1_, list2_] := Module[{},
Catenate[
Values[Merge[{GroupBy[list1, Last -> First],
GroupBy[list2, First -> Last]}, Tuples]]]
]
mergeM[list1_, list2_] := Module[{},
Outer[If[#1[[2]] === #2[[1]], {#1[[1]], #2[[2]]}, Nothing] &, list1,
list2, 1]~Flatten~1 // Sort
]
mergeU[list1_, list2_] := Module[{},
Thread[{#1, ReplaceList[#2, Rule @@@ list2]}] & @@@ list1 // Catenate
]
mergeV[list1_, list2_] := Module[{la1, la2},
la1 = Map[AssociationThread]@Tuples[{{1, 0}} -> list1];
la2 = Map[AssociationThread]@Tuples[{{0, 2}} -> list2];
Map[Values@*KeyDrop[0]]@JoinAcross[la1, la2, Key[0], "Inner"]
]
mergeR[list1_, list2_] := Module[{},
Flatten[Map[ReplaceList[{f_, #} :> {f, #2} & @@@ list2], list1], 1]
]
mergeLL[list1_, list2_] := Module[{},
Catenate[
Thread /@
MapAt[Lookup[GroupBy[list2, First -> Last], #, {}] &,
list1, {All, 2}]]
]
Running tests:
nmn = 6;
nmx = 13;
pval = 0.999;
tY = Table[{2^n, timingTest[2^n, pval, mergeY]}, {n, nmn, nmx}]
tL = Table[{2^n, timingTest[2^n, pval, mergeL]}, {n, nmn, nmx}]
tM = Table[{2^n, timingTest[2^n, pval, mergeM]}, {n, nmn, nmx}]
tK = Table[{2^n, timingTest[2^n, pval, mergeK]}, {n, nmn, nmx}]
tU = Table[{2^n, timingTest[2^n, pval, mergeU]}, {n, nmn, nmx}]
tV = Table[{2^n, timingTest[2^n, pval, mergeV]}, {n, nmn, nmx}]
tR = Table[{2^n, timingTest[2^n, pval, mergeR]}, {n, nmn, nmx}]
tLL = Table[{2^n, timingTest[2^n, pval, mergeLL]}, {n, nmn, nmx}]
Plot:
ListLogLogPlot[{tY, tL, tM, tK, tU, tV, tR, tLL}, Joined -> True,
PlotLabels -> {"yarchik", "lericr", "march", "Vitaliy Kaurov",
"user1066", "vindobona", "Roma Lee", "Lukas Lang"}]
Conclusion
It seems, @VitaliyKaurov code is the fastest, however, it sometimes returns incorrect results. Would be nice if it can be fixed and compared. Next in speed is the code of @vindobona using JoinAcross and the 3rd place goes to the present code using VertexOutComponent. The code of vindobona seems to demonstrate a consistent linear scaling, whereas the majority of codes demonstrate quadratic scaling. According to the documentation, VertexOutComponent "works with large graphs", but probably this is not enough as the following larger study shows.
nmn = 6;
nmx = 16;
pval = 0.999;
tY = Table[{2^n, timingTest[2^n, pval, mergeY]}, {n, nmn, nmx, 2}]
tV = Table[{2^n, timingTest[2^n, pval, mergeV]}, {n, nmn, nmx, 2}]
ListLogLogPlot[{tY, tV}, Joined -> True,
PlotLabels -> {"yarchik", "vindobona"}]
Function with progress indicator
mergeY[list1_, list2_] := Module[{gr, ivert, res},
ivert = DeleteDuplicates[list1[[All, 1]]];
gr = Graph[Join[DirectedEdge @@@ list1, DirectedEdge @@@ list2]];
res = Monitor[
Table[Cases[VertexOutComponent[gr, ivert[[i]], {2}],
val_ -> {ivert[[i]], val}], {i, Length[ivert]}],
Row[{ProgressIndicator[i, {1, Length[ivert]}], i}, " "]];
Catenate[res]
]
-
1$\begingroup$ Actually this is very nice as my data is basically from an edgelist but I want to "drop" the middle level $\endgroup$M.A.– M.A.2025年03月08日 13:06:10 +00:00Commented Mar 8 at 13:06
-
$\begingroup$ While this may not pertain to the original poster’s specific use case, in general, given
list1 = {{0, 1}}andlist2 = {{0, 42}, {1, 42}}, I would have expected the output to be{{0, 42}}rather than{}. $\endgroup$vindobona– vindobona2025年03月09日 17:41:08 +00:00Commented Mar 9 at 17:41 -
1$\begingroup$ @vindobona Indeed, my assumption was that
list1[[;;,1]]andlist1[[;;,2]]are distinct. If this is not the case, it is easy to fix:Cases[Intersection[VertexOutComponent[gr, #, 2], list2[[All, 2]]], val_ -> {#, val}] & /@ DeleteDuplicates[list1[[All, 1]]] // Catenate$\endgroup$yarchik– yarchik2025年03月09日 19:08:36 +00:00Commented Mar 9 at 19:08 -
1$\begingroup$ @M.A. I compared, and it is one of the fastest solutions. I will add timing study to my post. $\endgroup$yarchik– yarchik2025年03月12日 06:49:20 +00:00Commented Mar 12 at 6:49
-
1$\begingroup$ @M.A. Yes, please see my edit $\endgroup$yarchik– yarchik2025年03月14日 09:54:19 +00:00Commented Mar 14 at 9:54
Perhaps this:
list1 = {{a, v}, {a, h}, {b, v}, {b, k}, {c, t}, {c, r}, {d, r}};
list2 = {{v, dc}, {v, rt}, {h, kl}, {h, oi}, {h, po}, {k, ö}, {k,
dc}, {t, re}, {r, qw}, {r, ay}, {r, ül}};
list3 = Catenate[Values[
Merge[{GroupBy[list1, Last -> First], GroupBy[list2, First -> Last]},
Tuples]
]]
Out[]= {{a, dc}, {a, rt}, {b, dc}, {b, rt}, {a, kl}, {a, oi}, {a, po}, {b, ö}, {b, dc}, {c, re}, {c, qw}, {c, ay}, {c, ül}, {d, qw}, {d, ay}, {d, ül}}
-
$\begingroup$ Consider
list1={{1, 12}, {2, 13}}andlist2={{13, 24}}. Your code returns{{1}, {2, 24}}. I think it is not correct. $\endgroup$yarchik– yarchik2025年03月07日 20:13:46 +00:00Commented Mar 7 at 20:13 -
$\begingroup$ I included your code in the timing study. Your code is the winner. But do you know how to amend it to work on corner cases? $\endgroup$yarchik– yarchik2025年03月12日 08:39:33 +00:00Commented Mar 12 at 8:39
-
$\begingroup$ @yarchik i should find a moment to look into it. hope to do it soon. $\endgroup$Vitaliy Kaurov– Vitaliy Kaurov2025年03月13日 05:57:55 +00:00Commented Mar 13 at 5:57
Maybe something like
Tuples[{list1, list2}] /. {{{a_, x_}, {x_, b_}} :> {a, b}, {_List, _List} -> Nothing}
(* {{a, dc}, {a, rt}, {a, kl}, {a, oi}, {a, po}, {b, dc}, {b, rt}, {b, ö},
{b, dc}, {c, re}, {c, qw}, {c, ay}, {c, ül}, {d, qw}, {d, ay}, {d, ül}} *)
Alternatively (based on comments):
Cases[Tuples[{list1, list2}], {{a_, x_}, {x_, b_}} :> {a, b}]
-
$\begingroup$ Could also use
Cases, then you don't need the second rule $\endgroup$Lukas Lang– Lukas Lang2025年03月06日 08:38:33 +00:00Commented Mar 6 at 8:38 -
$\begingroup$ @LukasLang oh yeah, i should have thought of that! $\endgroup$lericr– lericr2025年03月06日 15:04:19 +00:00Commented Mar 6 at 15:04
-
$\begingroup$ Consider
list1 = {{1, 12}, {2, 13}}; list2 = {{13, 24}}. Your code returnsNothing, which is not correct. $\endgroup$yarchik– yarchik2025年03月07日 20:18:53 +00:00Commented Mar 7 at 20:18
Comparing every element of list1 to every element of list sounds like a job for Outer:
Outer[If[#1[[2]] === #2[[1]], {#1[[1]], #2[[2]]}, Nothing] &, list1, list2, 1]~Flatten~1 // Sort
Using JoinAcross :
list1 = {{a, v}, {a, h}, {b, v}, {b, k}, {c, t}, {c, r}, {d, r}};
list2 = {{v, dc}, {v, rt}, {h, kl}, {h, oi}, {h, po}, {k, ö}, {k, dc}, {t, re}, {r, qw}, {r, ay}, {r, ül}};
la1 = Map[AssociationThread]@Tuples[{{1, 0}} -> list1];
la2 = Map[AssociationThread]@Tuples[{{0, 2}} -> list2];
Map[Values@*KeyDrop[0]]@JoinAcross[la1, la2, Key[0], "Inner"]
(*
{ {a, dc}, {a, rt}, {a, kl}, {a, oi}, {a, po}
, {b, dc}, {b, rt}, {b, ö}, {b, dc}
, {c, re}, {c, qw}, {c, ay}, {c, ül}
, {d, qw}, {d, ay}, {d, ül} }
*)
Thread[{#1, ReplaceList[#2, Rule @@@ list2]}] & @@@ list1 // Catenate
(* {{a, dc}, {a, rt}, {a, kl}, {a, oi}, {a, po}, {b, dc}, {b, rt},
{b, ö}, {b, dc}, {c, re}, {c, qw}, {c, ay}, {c, ül}, {d, qw},
{d, ay}, {d, ül}} *)
I have developed a very clumsy (and very very slow) solution with nested tables so
You must have done something wrong then. Do loop is very fast in Mathematica. Here it is. Comparing to other solutions it did OK. It is n^2 order where n is length of of the list (assuming roughly equal size lists).
If you have much larger lists to try it on, feel free to do that and compare or post them. The timings below is not really representative as your lists are too small, just gives one a rough idea.
Also, A Do loop is clear as any one can read a loop :)
L1 = {{a, v}, {a, h}, {b, v}, {b, k}, {c, t}, {c, r}, {d, r}};
L2 = {{v, dc}, {v, rt}, {h, kl}, {h, oi}, {h, po}, {k, ö}, {k,
dc}, {t, re}, {r, qw}, {r, ay}, {r, ül}};
Now
RepeatedTiming@First@Last@Reap@Do[
currentL1=L1[[n]];
Do[ currentL2=L2[[m]];
If[Last@currentL1===First@currentL2,
Sow[{First@currentL1,Last@currentL2}]
]
,{m,1,Length[L2]}
]
,{n,1,Length[L1]}
]
RepeatedTiming@
Outer[If[#1[[2]] === #2[[1]], {#1[[1]], #2[[2]]}, Nothing] &, L1,
L2, 1]~Flatten~1 // Sort
RepeatedTiming@
Tuples[{L1, L2}] /. {{{a_, x_}, {x_, b_}} :> {a,
b}, {_List, _List} -> Nothing}
RepeatedTiming@
Catenate[
Values[Merge[{GroupBy[L1, Last -> First], GroupBy[L2, First -> Last]},
Tuples]]]
RepeatedTiming[
Thread /@ MapAt[GroupBy[L2, First -> Last], L1, {All, 2}]]
-
$\begingroup$ I used simply a combination of
Tablecombined withMemberQandIf. I would feel somewhat uncomfortable posting such a dilettantish solution.... ;) $\endgroup$M.A.– M.A.2025年03月06日 14:19:27 +00:00Commented Mar 6 at 14:19 -
$\begingroup$ I believe it is decisive to compare scaling with the list size, one point does not say much about the performance of the methods. $\endgroup$yarchik– yarchik2025年03月07日 16:49:11 +00:00Commented Mar 7 at 16:49
-
$\begingroup$ Consider
list1 = {{1, 12}}; list2 = {{13, 24}}. Your code returnsFirst[{}], which is not correct. $\endgroup$yarchik– yarchik2025年03月07日 20:26:18 +00:00Commented Mar 7 at 20:26
list1 = {{a, v}, {a, h}, {b, v}, {b, k}, {c, t}, {c, r}, {d, r}};
list2 = {{v, dc}, {v, rt}, {h, kl}, {h, oi}, {h, po}, {k, ö}, {k,
dc}, {t, re}, {r, qw}, {r, ay}, {r, ül}};
Catenate[
Thread /@
MapAt[
Lookup[GroupBy[list2, First -> Last], #, {}] &,
list1,
{All, 2}
]
]
This builds a lookup table using GroupBy (same as @Vitaliy Kaurov's answer), and then uses MapAt and Lookup (with {} as the default, thanks @yarchik!) to get the items corresponding to the second elements of the items in list1. Finally, Thread is used to expand the elements into separate entries in the result. Finally, Catenate is used to flatten the outermost level of the list.
-
$\begingroup$ Consider
list1={{1, 12}, {2, 13}}andlist2={{13, 24}}. Your code returns{{1, Missing["KeyAbsent", 12]}, {{2, 24}}}, which is not correct. $\endgroup$yarchik– yarchik2025年03月07日 20:15:24 +00:00Commented Mar 7 at 20:15 -
$\begingroup$ @yarchik Good catch! Should be fixed now $\endgroup$Lukas Lang– Lukas Lang2025年03月08日 09:09:08 +00:00Commented Mar 8 at 9:09
Updated: as user1066 commented, my previous solution was based on incorrect understanding of the question. Here is the corrected solution:
Flatten[Map[ReplaceList[{f_, #} :> {f, #2} & @@@ list2], list1], 1]
(*
{{a, dc}, {a, rt}, {a, kl}, {a, oi}, {a, po}, {b, dc}, {b, rt}, {b,
ö}, {b, dc}, {c, re}, {c, qw}, {c, ay}, {c, ül}, {d, qw}, {d,
ay}, {d, ül}}
*)
-
$\begingroup$ @user1066 indeed, you are right. Completely rewritten the answer to hopefully meet the requirement. $\endgroup$Roma Lee– Roma Lee2025年03月08日 15:36:22 +00:00Commented Mar 8 at 15:36
Explore related questions
See similar questions with these tags.
{b, dc}? $\endgroup$