2

I have the following python list:

['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv', 'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv', 'daman_and_diu_2002_aa.csv']

How do I separate it into 2 lists:

['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv'] and ['daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv', 'daman_and_diu_2002_aa.csv']

The lists are split based on the words preceding the year (e.g. 2000).

I know I should use regex in Python, but I'm not sure how to do it. Also, the solution needs to be extensible and must not depend on the actual names, e.g. chhattisgarh.

Blorgbeard
asked Jun 19, 2016 at 22:50
  • thanks @RoryDaulton, the elements are strings. Updated my question to reflect that Commented Jun 19, 2016 at 22:56
  • Could you do it based on the text before the first _? like using name.partition("_")[0] to compare titles? This wouldn't work if you had titles like 'foo_bar_2000' vs 'foo_foo_2000' though. Commented Jun 19, 2016 at 22:57
  • doesn't work since different list elements can have different number of _s Commented Jun 19, 2016 at 22:58
  • Are you sure the year contains the first numeric character in each list? Commented Jun 19, 2016 at 22:59
  • yes, the year contains the first and only numeric character in the list Commented Jun 19, 2016 at 22:59

3 Answers

5

You can use itertools.groupby here:

import itertools
import re

# Named "files" rather than "list" to avoid shadowing the built-in
files = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
         'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
         'daman_and_diu_2002_aa.csv']
# groupby only groups adjacent items, so the input is sorted first
grouped = itertools.groupby(sorted(files), lambda x: re.match(r'(.+)_\d{4}', x).group(1))
for key, values in grouped:
    print(key)
    print(list(values))

The regex (.+)_\d{4} matches a group of at least one character (which is what we group by) followed by an underscore and 4 digits.
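Since the question asks for separate lists rather than printed output, the same idea can be wrapped in a small function. This is only a sketch: the names split_by_prefix and YEAR_PREFIX are made up for illustration, and it assumes every filename contains an underscore followed by a 4-digit year (otherwise re.match returns None and the code fails).

```python
import itertools
import re

# Hypothetical names; assumes each filename has "<prefix>_<4-digit year>"
YEAR_PREFIX = re.compile(r'(.+)_\d{4}')

def split_by_prefix(filenames):
    """Return a list of lists, one per distinct prefix before the year."""
    def key(name):
        return YEAR_PREFIX.match(name).group(1)
    # groupby only merges adjacent items, so sort by the same key first
    ordered = sorted(filenames, key=key)
    return [list(group) for _, group in itertools.groupby(ordered, key)]

files = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
         'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
         'daman_and_diu_2002_aa.csv']
groups = split_by_prefix(files)
print(groups)
```

Sorting by the extracted prefix instead of the whole string keeps files with the same prefix adjacent, and because sorted is stable, files within a group stay in their original order.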

answered Jun 19, 2016 at 23:16


4

Here is one way to get a dictionary where, for each "name" key, the value is a list of the strings starting with that name, in the order of the original list. This does not use regex and in fact uses no modules at all. You can easily modify this to make it a function, remove the trailing underscore from each name, check for various errors in the data list, get the resulting lists out of the dictionary, and so on.

If you allow other modules, or allow changes in the order, I'm sure there are other ways.

a = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
     'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
     'daman_and_diu_2002_aa.csv']
names_dict = {}
for item in a:
    # Find the first numeric character in the item
    for i, c in enumerate(item):
        if c.isdigit():
            break
    # Store the string in the dictionary according to its preceding characters
    name = item[:i]
    if names_dict.get(name, None):
        names_dict[name].append(item)
    else:
        names_dict[name] = [item]
print(names_dict)

The result of this code (prettified) is

{'daman_and_diu_': [
 'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
 'daman_and_diu_2002_aa.csv'],
 'chhattisgarh_': [
 'chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv']
}
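The modifications mentioned above (making it a function, removing the trailing underscore, checking for bad data, and pulling the lists out of the dictionary) might look roughly like the following. This is a sketch under assumptions: the function name group_by_name is hypothetical, raising ValueError for a string without a digit is just one possible way to handle that case, and the order of the resulting lists relies on dictionaries preserving insertion order (Python 3.7+).

```python
def group_by_name(items):
    """Group strings by the text before their first digit, keeping input order."""
    groups = {}
    for item in items:
        # Index of the first digit, or None if the string has no digit
        first_digit = next((i for i, c in enumerate(item) if c.isdigit()), None)
        if first_digit is None:
            # Assumed policy: reject items that have no year at all
            raise ValueError("no digit found in %r" % item)
        # Drop the trailing underscore(s) so the key is just the name
        name = item[:first_digit].rstrip('_')
        groups.setdefault(name, []).append(item)
    return groups

a = ['chhattisgarh_2015_aa.csv', 'chhattisgarh_2016_aa.csv',
     'daman_and_diu_2000_aa.csv', 'daman_and_diu_2001_aa.csv',
     'daman_and_diu_2002_aa.csv']
result = group_by_name(a)
lists = list(result.values())  # the separate lists, in first-seen order
print(lists)
```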
answered Jun 19, 2016 at 23:16


2

Another option is to use a regular expression combined with a dictionary:

import re
from collections import defaultdict

files = ["chhattisgarh_2015_aa.csv", "chhattisgarh_2016_aa.csv", "daman_and_diu_2000_aa.csv", "daman_and_diu_2001_aa.csv", "daman_and_diu_2002_aa.csv"]
groupedFiles = defaultdict(list)
for fileName in files:
    # Everything before the 4-digit year is the grouping key
    pattern = re.findall(r"(.*)\d{4}", fileName)[0]
    groupedFiles[pattern].append(fileName)
print(dict(groupedFiles))
{'chhattisgarh_': ['chhattisgarh_2015_aa.csv',
 'chhattisgarh_2016_aa.csv'],
 'daman_and_diu_': ['daman_and_diu_2000_aa.csv',
 'daman_and_diu_2001_aa.csv',
 'daman_and_diu_2002_aa.csv']}
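One thing to watch for with this approach: re.findall returns an empty list when a filename has no 4-digit run, so indexing with [0] raises IndexError. A possible variation that sets such names aside instead is sketched below. The names group_files and PREFIX_BEFORE_YEAR are hypothetical, and 'readme.txt' is an invented entry added only to show the unmatched case.

```python
import re
from collections import defaultdict

# Hypothetical names; same pattern as above, compiled once
PREFIX_BEFORE_YEAR = re.compile(r'(.*)\d{4}')

def group_files(file_names):
    """Return (groups keyed by prefix, names that contain no 4-digit year)."""
    grouped = defaultdict(list)
    unmatched = []
    for file_name in file_names:
        match = PREFIX_BEFORE_YEAR.match(file_name)
        if match is None:
            unmatched.append(file_name)
        else:
            grouped[match.group(1)].append(file_name)
    return dict(grouped), unmatched

files = ["chhattisgarh_2015_aa.csv", "chhattisgarh_2016_aa.csv",
         "daman_and_diu_2000_aa.csv", "daman_and_diu_2001_aa.csv",
         "daman_and_diu_2002_aa.csv"]
# 'readme.txt' is a made-up example of a name without a year
grouped, unmatched = group_files(files + ["readme.txt"])
print(grouped)
print(unmatched)
```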
answered Jun 19, 2016 at 23:20

