8
$\begingroup$

I have two large lists:

list1 = {{a,v},{a,h},{b,v},{b,k},{c,t},{c,r},{d,r}};
list2 = {{v,dc},{v,rt},{h,kl},{h,oi},{h,po},{k,ö},{k,dc},{t,re},{r,qw},{r,ay},{r,ül}};

Now I want a list that matches both lists so that the first element of each row in list1 is matched with the second element of each row in list 2 based on the same values of second element of each row in list1 and the first element of each row in list2. For example list3 should yield

{{a,dc},{a,rt},{a,kl},{a,oi},{a,po},...}

I have developed a very clumsy (and very very slow) solution with nested tables so I am sure there is a faster one? I have also tried something with Associations but this will either lead to a kernel crash or loss of data (as duplicate keys will be removed). Thank you

creidhne
5,8407 gold badges26 silver badges34 bronze badges
asked Mar 5 at 17:31
$\endgroup$
3
  • 4
    $\begingroup$ Purely for reference, this is called composition of relations or relative multiplication. $\endgroup$ Commented Mar 7 at 10:28
  • $\begingroup$ What is your position on duplicate values, such as {b, dc}? $\endgroup$ Commented Mar 7 at 16:05
  • $\begingroup$ @user1066 in this case this is not duplicate - it should be interpreted as an edge list (e.g. {a, dc} and {b, dc}) Or what do you mean? see also the interesting solution by yarchik below $\endgroup$ Commented Mar 8 at 13:03

9 Answers 9

6
$\begingroup$

Using a connection with graphs:

gr = Graph[Join[DirectedEdge @@@ list1, DirectedEdge @@@ list2], 
 VertexLabels -> "Name"]

Graph from relations

Build a list using VertexOutComponent:

 Flatten[Cases[VertexOutComponent[gr, #, {2}], val_ -> {#, val}] & /@ VertexList[gr], 1]
 (* {{a, rt}, {a, kl}, {a, dc}, {a, oi}, {a, po}, {b, ö}, {b, rt}, {b, dc}, 
 {c, qw}, {c, ül}, {c, re}, {c, ay}, {d, qw}, {d, ül}, {d, ay}} *)

Faster version would be

Cases[VertexOutComponent[gr, #, {2}], val_ -> {#, val}] & /@ DeleteDuplicates[list1[[All, 1]]] // Catenate

Timing study

Below I report timing results for all codes presented here. The time-measuring routine:

timingTest[n_, p_, merge_] := Module[{a, l, m = Round[p*n], res},
 Do[
 l[k] = 
 Transpose[{n*(k - 1) + RandomInteger[{1, n}, m], 
 n*k + RandomInteger[{1, n}, m]}], {k, 2}];
 Timing[res = merge[l[1], l[2]];] // First
 ]

Collection of computational routines:

mergeY[list1_, list2_] := Module[{gr, ivert},
 ivert = DeleteDuplicates[list1[[All, 1]]];
 gr = Graph[Join[DirectedEdge @@@ list1, DirectedEdge @@@ list2]];
 Catenate[
 Cases[VertexOutComponent[gr, #, {2}], val_ -> {#, val}] & /@ 
 ivert]
 ]
mergeL[list1_, list2_] := Module[{},
 Cases[Tuples[{list1, list2}], {{a_, x_}, {x_, b_}} :> {a, b}]
 ]
mergeK[list1_, list2_] := Module[{},
 Catenate[
 Values[Merge[{GroupBy[list1, Last -> First], 
 GroupBy[list2, First -> Last]}, Tuples]]]
 ]
mergeM[list1_, list2_] := Module[{},
 Outer[If[#1[[2]] === #2[[1]], {#1[[1]], #2[[2]]}, Nothing] &, list1,
 list2, 1]~Flatten~1 // Sort
 ]
mergeU[list1_, list2_] := Module[{},
 Thread[{#1, ReplaceList[#2, Rule @@@ list2]}] & @@@ list1 // Catenate
 ]
mergeV[list1_, list2_] := Module[{la1, la2},
 la1 = Map[AssociationThread]@Tuples[{{1, 0}} -> list1];
 la2 = Map[AssociationThread]@Tuples[{{0, 2}} -> list2];
 Map[Values@*KeyDrop[0]]@JoinAcross[la1, la2, Key[0], "Inner"]
 ]
mergeR[list1_, list2_] := Module[{},
 Flatten[Map[ReplaceList[{f_, #} :> {f, #2} & @@@ list2], list1], 1]
 ]
mergeLL[list1_, list2_] := Module[{},
 Catenate[
 Thread /@ 
 MapAt[Lookup[GroupBy[list2, First -> Last], #, {}] &, 
 list1, {All, 2}]]
 ]

Running tests:

nmn = 6;
nmx = 13;
pval = 0.999;
tY = Table[{2^n, timingTest[2^n, pval, mergeY]}, {n, nmn, nmx}]
tL = Table[{2^n, timingTest[2^n, pval, mergeL]}, {n, nmn, nmx}]
tM = Table[{2^n, timingTest[2^n, pval, mergeM]}, {n, nmn, nmx}]
tK = Table[{2^n, timingTest[2^n, pval, mergeK]}, {n, nmn, nmx}]
tU = Table[{2^n, timingTest[2^n, pval, mergeU]}, {n, nmn, nmx}]
tV = Table[{2^n, timingTest[2^n, pval, mergeV]}, {n, nmn, nmx}]
tR = Table[{2^n, timingTest[2^n, pval, mergeR]}, {n, nmn, nmx}]
tLL = Table[{2^n, timingTest[2^n, pval, mergeLL]}, {n, nmn, nmx}]

Plot:

ListLogLogPlot[{tY, tL, tM, tK, tU, tV, tR, tLL}, Joined -> True, 
 PlotLabels -> {"yarchik", "lericr", "march", "Vitaliy Kaurov", 
 "user1066", "vindobona", "Roma Lee", "Lukas Lang"}]

Timing results

Conclusion

It seems, @VitaliyKaurov code is the fastest, however, it sometimes returns incorrect results. Would be nice if it can be fixed and compared. Next in speed is the code of @vindobona using JoinAcross and the 3rd place goes to the present code using VertexOutComponent. The code of vindobona seems to demonstrate a consistent linear scaling, whereas the majority of codes demonstrate quadratic scaling. According to the documentation, VertexOutComponent "works with large graphs", but probably this is not enough as the following larger study shows.

nmn = 6;
nmx = 16;
pval = 0.999;
tY = Table[{2^n, timingTest[2^n, pval, mergeY]}, {n, nmn, nmx, 2}]
tV = Table[{2^n, timingTest[2^n, pval, mergeV]}, {n, nmn, nmx, 2}]
ListLogLogPlot[{tY, tV}, Joined -> True, 
 PlotLabels -> {"yarchik", "vindobona"}]

enter image description here

Function with progress indicator

mergeY[list1_, list2_] := Module[{gr, ivert, res},
 ivert = DeleteDuplicates[list1[[All, 1]]];
 gr = Graph[Join[DirectedEdge @@@ list1, DirectedEdge @@@ list2]];
 res = Monitor[
 Table[Cases[VertexOutComponent[gr, ivert[[i]], {2}], 
 val_ -> {ivert[[i]], val}], {i, Length[ivert]}], 
 Row[{ProgressIndicator[i, {1, Length[ivert]}], i}, " "]];
 Catenate[res]
 ]
answered Mar 7 at 12:18
$\endgroup$
8
  • 1
    $\begingroup$ Actually this is very nice as my data is basically from an edgelist but I want to "drop" the middle level $\endgroup$ Commented Mar 8 at 13:06
  • $\begingroup$ While this may not pertain to the original poster’s specific use case, in general, given list1 = {{0, 1}} and list2 = {{0, 42}, {1, 42}}, I would have expected the output to be {{0, 42}} rather than {}. $\endgroup$ Commented Mar 9 at 17:41
  • 1
    $\begingroup$ @vindobona Indeed, my assumption was that list1[[;;,1]] and list1[[;;,2]] are distinct. If this is not the case, it is easy to fix: Cases[Intersection[VertexOutComponent[gr, #, 2], list2[[All, 2]]], val_ -> {#, val}] & /@ DeleteDuplicates[list1[[All, 1]]] // Catenate $\endgroup$ Commented Mar 9 at 19:08
  • 1
    $\begingroup$ @M.A. I compared, and it is one of the fastest solutions. I will add timing study to my post. $\endgroup$ Commented Mar 12 at 6:49
  • 1
    $\begingroup$ @M.A. Yes, please see my edit $\endgroup$ Commented Mar 14 at 9:54
6
$\begingroup$

Perhaps this:

list1 = {{a, v}, {a, h}, {b, v}, {b, k}, {c, t}, {c, r}, {d, r}};
list2 = {{v, dc}, {v, rt}, {h, kl}, {h, oi}, {h, po}, {k, ö}, {k, 
 dc}, {t, re}, {r, qw}, {r, ay}, {r, ül}};
list3 = Catenate[Values[
 Merge[{GroupBy[list1, Last -> First], GroupBy[list2, First -> Last]},
 Tuples]
 ]]

Out[]= {{a, dc}, {a, rt}, {b, dc}, {b, rt}, {a, kl}, {a, oi}, {a, po}, {b, ö}, {b, dc}, {c, re}, {c, qw}, {c, ay}, {c, ül}, {d, qw}, {d, ay}, {d, ül}}

answered Mar 5 at 17:45
$\endgroup$
3
  • $\begingroup$ Consider list1={{1, 12}, {2, 13}} and list2={{13, 24}}. Your code returns {{1}, {2, 24}}. I think it is not correct. $\endgroup$ Commented Mar 7 at 20:13
  • $\begingroup$ I included your code in the timing study. Your code is the winner. But do you know how to amend it to work on corner cases? $\endgroup$ Commented Mar 12 at 8:39
  • $\begingroup$ @yarchik i should find a moment to look into it. hope to do it soon. $\endgroup$ Commented Mar 13 at 5:57
6
$\begingroup$

Maybe something like

Tuples[{list1, list2}] /. {{{a_, x_}, {x_, b_}} :> {a, b}, {_List, _List} -> Nothing}
(* {{a, dc}, {a, rt}, {a, kl}, {a, oi}, {a, po}, {b, dc}, {b, rt}, {b, ö}, 
 {b, dc}, {c, re}, {c, qw}, {c, ay}, {c, ül}, {d, qw}, {d, ay}, {d, ül}} *)

Alternatively (based on comments):

Cases[Tuples[{list1, list2}], {{a_, x_}, {x_, b_}} :> {a, b}]
answered Mar 5 at 18:22
$\endgroup$
3
  • $\begingroup$ Could also use Cases, then you don't need the second rule $\endgroup$ Commented Mar 6 at 8:38
  • $\begingroup$ @LukasLang oh yeah, i should have thought of that! $\endgroup$ Commented Mar 6 at 15:04
  • $\begingroup$ Consider list1 = {{1, 12}, {2, 13}}; list2 = {{13, 24}}. Your code returns Nothing, which is not correct. $\endgroup$ Commented Mar 7 at 20:18
4
$\begingroup$

Comparing every element of list1 to every element of list sounds like a job for Outer:

Outer[If[#1[[2]] === #2[[1]], {#1[[1]], #2[[2]]}, Nothing] &, list1, list2, 1]~Flatten~1 // Sort
answered Mar 6 at 5:25
$\endgroup$
3
$\begingroup$

Using JoinAcross :

list1 = {{a, v}, {a, h}, {b, v}, {b, k}, {c, t}, {c, r}, {d, r}};
list2 = {{v, dc}, {v, rt}, {h, kl}, {h, oi}, {h, po}, {k, ö}, {k, dc}, {t, re}, {r, qw}, {r, ay}, {r, ül}};
la1 = Map[AssociationThread]@Tuples[{{1, 0}} -> list1];
la2 = Map[AssociationThread]@Tuples[{{0, 2}} -> list2];
Map[Values@*KeyDrop[0]]@JoinAcross[la1, la2, Key[0], "Inner"] 
(*
{ {a, dc}, {a, rt}, {a, kl}, {a, oi}, {a, po}
, {b, dc}, {b, rt}, {b, ö}, {b, dc}
, {c, re}, {c, qw}, {c, ay}, {c, ül}
, {d, qw}, {d, ay}, {d, ül} }
*)
answered Mar 8 at 1:42
$\endgroup$
2
$\begingroup$
Thread[{#1, ReplaceList[#2, Rule @@@ list2]}] & @@@ list1 // Catenate
(* {{a, dc}, {a, rt}, {a, kl}, {a, oi}, {a, po}, {b, dc}, {b, rt}, 
 {b, ö}, {b, dc}, {c, re}, {c, qw}, {c, ay}, {c, ül}, {d, qw}, 
 {d, ay}, {d, ül}} *)
answered Mar 6 at 19:56
$\endgroup$
1
$\begingroup$

I have developed a very clumsy (and very very slow) solution with nested tables so

You must have done something wrong then. Do loop is very fast in Mathematica. Here it is. Comparing to other solutions it did OK. It is n^2 order where n is length of of the list (assuming roughly equal size lists).

If you have much larger lists to try it on, feel free to do that and compare or post them. The timings below is not really representative as your lists are too small, just gives one a rough idea.

Also, A Do loop is clear as any one can read a loop :)

L1 = {{a, v}, {a, h}, {b, v}, {b, k}, {c, t}, {c, r}, {d, r}};
L2 = {{v, dc}, {v, rt}, {h, kl}, {h, oi}, {h, po}, {k, ö}, {k, 
 dc}, {t, re}, {r, qw}, {r, ay}, {r, ül}};

Now

RepeatedTiming@First@Last@Reap@Do[
 currentL1=L1[[n]];
 Do[ currentL2=L2[[m]];
 If[Last@currentL1===First@currentL2,
 Sow[{First@currentL1,Last@currentL2}]
 ]
 ,{m,1,Length[L2]}
 ]
 ,{n,1,Length[L1]}
]

enter image description here


RepeatedTiming@
 Outer[If[#1[[2]] === #2[[1]], {#1[[1]], #2[[2]]}, Nothing] &, L1, 
 L2, 1]~Flatten~1 // Sort

enter image description here


RepeatedTiming@
 Tuples[{L1, L2}] /. {{{a_, x_}, {x_, b_}} :> {a, 
 b}, {_List, _List} -> Nothing}

enter image description here


RepeatedTiming@
 Catenate[
 Values[Merge[{GroupBy[L1, Last -> First], GroupBy[L2, First -> Last]},
 Tuples]]]

enter image description here


RepeatedTiming[
 Thread /@ MapAt[GroupBy[L2, First -> Last], L1, {All, 2}]]

enter image description here

answered Mar 6 at 13:38
$\endgroup$
3
  • $\begingroup$ I used simply a combination of Tablecombined with MemberQ and If. I would feel somewhat uncomfortable posting such a dilettantish solution.... ;) $\endgroup$ Commented Mar 6 at 14:19
  • $\begingroup$ I believe it is decisive to compare scaling with the list size, one point does not say much about the performance of the methods. $\endgroup$ Commented Mar 7 at 16:49
  • $\begingroup$ Consider list1 = {{1, 12}}; list2 = {{13, 24}}. Your code returns First[{}], which is not correct. $\endgroup$ Commented Mar 7 at 20:26
1
$\begingroup$
list1 = {{a, v}, {a, h}, {b, v}, {b, k}, {c, t}, {c, r}, {d, r}};
list2 = {{v, dc}, {v, rt}, {h, kl}, {h, oi}, {h, po}, {k, ö}, {k, 
 dc}, {t, re}, {r, qw}, {r, ay}, {r, ül}};
Catenate[
 Thread /@ 
 MapAt[
 Lookup[GroupBy[list2, First -> Last], #, {}] &, 
 list1, 
 {All, 2}
 ]
]

This builds a lookup table using GroupBy (same as @Vitaliy Kaurov's answer), and then uses MapAt and Lookup (with {} as the default, thanks @yarchik!) to get the items corresponding to the second elements of the items in list1. Finally, Thread is used to expand the elements into separate entries in the result. Finally, Catenate is used to flatten the outermost level of the list.

answered Mar 6 at 6:22
$\endgroup$
2
  • $\begingroup$ Consider list1={{1, 12}, {2, 13}} and list2={{13, 24}}. Your code returns {{1, Missing["KeyAbsent", 12]}, {{2, 24}}}, which is not correct. $\endgroup$ Commented Mar 7 at 20:15
  • $\begingroup$ @yarchik Good catch! Should be fixed now $\endgroup$ Commented Mar 8 at 9:09
1
$\begingroup$

Updated: as user1066 commented, my previous solution was based on incorrect understanding of the question. Here is the corrected solution:

Flatten[Map[ReplaceList[{f_, #} :> {f, #2} & @@@ list2], list1], 1]
(*
{{a, dc}, {a, rt}, {a, kl}, {a, oi}, {a, po}, {b, dc}, {b, rt}, {b, 
 ö}, {b, dc}, {c, re}, {c, qw}, {c, ay}, {c, ül}, {d, qw}, {d, 
 ay}, {d, ül}}
*)
answered Mar 8 at 14:21
$\endgroup$
1
  • $\begingroup$ @user1066 indeed, you are right. Completely rewritten the answer to hopefully meet the requirement. $\endgroup$ Commented Mar 8 at 15:36

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.