0
$\begingroup$

I'm working with multiple ordered lists, each containing tuples of the form (key, value), where each list is sorted in ascending order by key. I want to merge these lists into a single ordered list that includes every distinct key from the input lists. The merged output should be a list of tuples in the form:

(key, value_from_list1, value_from_list2, ..., value_from_listK)

with the following behavior:

  • For each key present in the union of all keys, the output tuple should contain the "current" value from each list.
  • If an input list does not have an entry at that key, it should carry forward the most recent value from that list (i.e., use the value from the latest key less than or equal to the current key).

For example, given two lists:

A = [(1, a1), (2, a2), (10, a3)]
B = [(1, b1), (8, b2), (10, b3)]

the desired output is:

C = [
 (1, a1, b1), // At key 1, both lists provide values.
 (2, a2, b1), // At key 2, list A updates to a2; list B still uses b1.
 (8, a2, b2), // At key 8, list B updates to b2; list A remains at a2.
 (10, a3, b3) // At key 10, both lists update.
]

This merging behavior should extend to more than two lists as well (e.g., merging 5 lists into tuples with 6 elements, where the first element is the key).

All input lists are assumed to contain the element (0, null), but we omit this element (i.e. (0, null, null, ..., null)) from the final output. This ensures that, when merging, if a list doesn't have an entry for a given key, it carries forward the most recent value (initially null). For example, merging A = [(1, a1)] and B = [(2, b1)] yields [(1, a1, null), (2, a1, b1)].

My Questions:

  1. Algorithm Efficiency: What is the most efficient algorithm for performing this merge?

I've considered a solution using a sweeping line algorithm with multiple pointers—one for each list—to keep track of the most recent value as I iterate through the union of keys. However, I'm looking for insights or more efficient methods, especially in the general case with k lists.

Edit 2025年03月13日

Pseudocode

I ́ve created pseudocode based on the suggestion from @Smylic.

function mergeLists(lists):
 k = number of lists
 pointers = array of size k, all initialized to 0
 lastValues = array of size k, initially set to None (or some default)
 output = empty list
 while true:
 minKey = None
 // Find the minimum key among the current pointers in all lists.
 for i from 0 to k-1:
 if pointers[i] < length(lists[i]):
 currentKey = lists[i][pointers[i]].key
 if minKey is None or currentKey < minKey:
 minKey = currentKey
 // If no minimum key was found, all lists are exhausted.
 if minKey is None:
 break
 // Update lastValues for each list that has the current minKey
 for i from 0 to k-1:
 if pointers[i] < length(lists[i]) and lists[i][pointers[i]].key == minKey:
 lastValues[i] = lists[i][pointers[i]].value
 pointers[i] = pointers[i] + 1
 // Append the merged tuple (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1])
 output.append( (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1]) )
 return output

asked Mar 12 at 14:55
$\endgroup$
5
  • $\begingroup$ I do not see a significant difference from a "standard $k$-way merge"(en.wikipedia). $\endgroup$ Commented Mar 12 at 18:39
  • $\begingroup$ @greybeard This is similar to a k-way merge but not quite the same. A standard k-way merge only merges existing elements, while here, I need to carry forward the most recent value from each list when a key is missing. This adds a state-tracking requirement, making it a bit more complex. $\endgroup$ Commented Mar 12 at 21:10
  • $\begingroup$ Keeping $k$ values, one for each list? Insignificant difference! A more useful approach may characterise the operations required of the union of lists, and find a way how to support that without needing up to $k$ times the memory. And possibly save time ($o(kn)$). $\endgroup$ Commented Mar 13 at 7:56
  • $\begingroup$ @greybeard Thanks, I've updated with post with some pseudocode $\endgroup$ Commented Mar 13 at 12:11
  • $\begingroup$ Please don't edit your post to include an answer (or implementation of an answer) in the post. We're not a forum, and we work differently from other sites you might be used to. For example, we're not looking for you to edit your post and add "answered: ..." with the answer. Instead, you could edit the answer to add your proposed pseudocode, so it is organized together with the description of the algorithm, in the answer "zone". $\endgroup$ Commented Mar 14 at 20:54

1 Answer 1

2
$\begingroup$

You can't do it faster than linear of the output size. I. e., if you need to output $k$ values for every key, the most efficient algorithm takes $\Theta(k \cdot u)$ of time, where $u$ is the number of unique keys. Supposing that key comparison takes $\mathrm O(1)$ of time, you can even avoid using a heap for getting the next key: just look at every pointer once to find minimum key, then look at every pointer one more time either to get the current value and move pointer, or to get the previous value. However heap of keys may decrease the underlying constant.

answered Mar 13 at 5:10
$\endgroup$
2
  • $\begingroup$ Thanks a lot for your insights! I made pseudocode—please let me know if it's what you had in mind. Its added to my original post. I am not using a heap for it. $\endgroup$ Commented Mar 13 at 11:29
  • $\begingroup$ @Isak Yes, it is what I'm talking about $\endgroup$ Commented Mar 14 at 4:37

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.