I have a list of lists like this: [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7], [2, 3], [3, 4]]
. How can I count the lists which are sublists of more than two lists? For example, here [2, 3] and [3, 4]
would be the lists that are sublists of first 3 lists. I want to get rid of them.
-
What have you tried, and what problems did you encounter?Thierry Lathuille– Thierry Lathuille2018年02月23日 11:34:38 +00:00Commented Feb 23, 2018 at 11:34
3 Answers 3
This comprehension should do it:
data = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7], [2, 3], [3, 4]]
solution = [i for i in data if sum([1 for j in data if set(i).issubset(set(j))]) < 3]
Comments
set_list = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7], [2, 3], [3, 4]]
check_list = [[2, 3], [3, 4]]
sublist_to_list = {}
for set in set_list:
for i, sublist in enumerate(check_list):
count = 0
for element in sublist:
if element in set:
count += 1
if count == len(sublist):
if i not in sublist_to_list:
sublist_to_list[i] = [set]
else:
sublist_to_list[i].append(set)
print(sublist_to_list)
Output: {0: [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7], [2, 3]], 1: [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7], [3, 4]]}
- which means [2, 3] is subset of [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7], [2, 3]]
- and [3, 4] is subset of [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7], [3, 4]]
Comments
You can first make a function that gets sub lists of a list:
def sublists(lst):
length = len(lst)
for size in range(1, length + 1):
for start in range(length - size + 1):
yield lst[start:start+size]
Which works as follows:
>>> list(sublists([1, 2, 3, 4, 5]))
[[1], [2], [3], [4], [5], [1, 2], [2, 3], [3, 4], [4, 5], [1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 4, 5]]
Then you can use this to collect all the sublists list indices into a collections.defaultdict
:
from collections import defaultdict
lsts = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7], [2, 3], [3, 4]]
d = defaultdict(list)
for i, lst in enumerate(lsts):
subs = sublists(lst)
while True:
try:
curr = tuple(next(subs))
d[curr].append(i)
except StopIteration:
break
Which will have tuple keys for the sublists, and the list indices as the values.
Then to determine sub lists that occur more than twice in all the lists, you can check if the set of all the indices has a length of more than two:
print([list(k) for k, v in d.items() if len(set(v)) > 2])
Which will give the following sublists:
[[2], [3], [4], [5], [2, 3], [3, 4], [4, 5], [2, 3, 4], [3, 4, 5], [2, 3, 4, 5]]