I have a array contains file names like below:
['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png', ....]
I want to quickly group these files into multiple arrays like this:
[['001_1.png', '001_2.png', '001_3.png'], ['002_1.png', '002_2.png'], ['003_1.png', '003_2.png', '003_3.png', '003_4.png'], ...]
Could anyone tell me how to do it in few lines in python?
-
2In your desired output, should the third element be 001_3.png?tda– tda2018年05月04日 07:37:15 +00:00Commented May 4, 2018 at 7:37
-
Is it always like this, I mean ordered ?BcK– BcK2018年05月04日 07:37:20 +00:00Commented May 4, 2018 at 7:37
-
The third one should be 001_3.png, right?Vafliik– Vafliik2018年05月04日 07:37:42 +00:00Commented May 4, 2018 at 7:37
6 Answers 6
If your data is already sorted by the file name, you can use itertools.groupby:
files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
'003_1.png', '003_2.png', '003_3.png']
import itertools
keyfunc = lambda filename: filename[:3]
# this creates an iterator that yields `(group, filenames)` tuples,
# but `filenames` is another iterator
grouper = itertools.groupby(files, keyfunc)
# to get the result as a nested list, we iterate over the grouper to
# discard the groups and turn the `filenames` iterators into lists
result = [list(files) for _, files in grouper]
print(list(result))
# [['001_1.png', '001_2.png', '001_3.png'],
# ['002_1.png', '002_2.png'],
# ['003_1.png', '003_2.png', '003_3.png']]
Otherwise, you can base your code on this recipe, which is more efficient than sorting the list and then using groupby.
Input: Your input is a flat list, so use a regular ol' loop to iterate over it:
for filename in files:Group identifier: The files are grouped by the first 3 letters:
group = filename[:3]Output: The output should be a nested list rather than a dict, which can be done with
result = list(groupdict.values())
Putting it together:
files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
'003_1.png', '003_2.png', '003_3.png']
import collections
groupdict = collections.defaultdict(list)
for filename in files:
group = filename[:3]
groupdict[group].append(filename)
result = list(groupdict.values())
print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
# ['002_1.png', '002_2.png'],
# ['003_1.png', '003_2.png', '003_3.png']]
Read the recipe answer for more details.
2 Comments
Something like that should work:
import itertools
mylist = [...]
[list(v) for k,v in itertools.groupby(mylist, key=lambda x: x[:3])]
If input list isn't sorted, than use something like that:
import itertools
mylist = [...]
keyfunc = lambda x:x[:3]
mylist = sorted(mylist, key=keyfunc)
[list(v) for k,v in itertools.groupby(mylist, key=keyfunc)]
Comments
You can do it using a dictionary.
list = ['001_1.png', '001_2.png', '003_3.png', '002_1.png', '002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']
dict = {}
for item in list:
if item[:3] not in dict:
dict[item[:3]] = []
dict[item[:3]].append(item)
Then you have to sort the dictionary by key value.
dict = {k:v for k,v in sorted(dict.items())}
The last step is to use a list comprehension in order to achieve your requirement.
list = [v for k,v in dict.items()]
print(list)
Output
[['001_1.png', '001_2.png'], ['002_1.png', '002_2.png'], ['003_3.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']]
Comments
Using a simple iteration and dictionary.
Ex:
l = ['001_1.png', ' 001_2.png', ' 003_3.png', ' 002_1.png', ' 002_2.png', ' 003_1.png', ' 003_2.png', ' 003_3.png', ' 003_4.png']
r = {}
for i in l:
v = i.split("_")[0][-1]
if v not in r:
r[v] = []
r[v].append(i)
print(r.values())
Output:
[['001_1.png', ' 001_2.png'], [' 003_3.png', ' 003_1.png', ' 003_2.png', ' 003_3.png', ' 003_4.png'], [' 002_1.png', ' 002_2.png']]
Comments
If your list is ordered like this here is a short script for this task.
myList = []
for i in a:
if i[:-4].endswith('1'):
myList.append([i])
else:
myList[-1].append(i)
# [['001_1.png', '001_2.png', '003_3.png'], ['002_1.png', '002_2.png'], ...]
Comments
#IYN
mini_list = []
p = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']
new_p = []
for index, element in enumerate(p):
if index == len(p)-1:
mini_list.append(element)
new_p.append(mini_list)
break
if element[0:3]==p[index+1][0:3]:
mini_list.append(element)
else:
mini_list.append(element)
new_p.append(mini_list)
mini_list = []
print (new_p)
The code above will cut the initial list into sub lists and append them as individual lists into a resulting, larger list. Note: not a few lines, but you can convert this to a function.
def list_cutter(ls):
mini_list = []
new_list = []
for index, element in enumerate(ls):
if index == len(ls)-1:
mini_list.append(element)
new_list.append(mini_list)
break
if element[0:3]==ls[index+1][0:3]:
mini_list.append(element)
else:
mini_list.append(element)
new_list.append(mini_list)
mini_list = []
return new_list