Grouping comma-separated lines together

Question 1

I have comma-delimited files like these, where the first field is sorted in increasing order:

Case 1 (first file):

abcd,1
abcd,21
abcd,122
abce,12
abcf,13
abcf,21

Case 2 (another file):

abcd,1
abcd,21
abcd,122

What I want to do is convert the first file to this:

abcd 1,21,122
abce 12
abcf 13,21

And similarly, convert the second file to this:

abcd 1,21,122

I wrote some very ugly code with a lot of ifs that checks whether the string before the comma on the next line is the same as on the current line and, if it is, then do ...

It's so badly written that, although I wrote it myself about 6 months ago, it took me 3-4 minutes to understand why I did what I did. In short, it's ugly. In case you would like to see it, here it is. (There is also a bug in it that I haven't fixed, since I want a better approach than this whole code anyway. For the curious: it doesn't print anything for the second case mentioned above, and I know why.)

def clean_file(filePath, destination):
    f = open(filePath, 'r')
    data = f.read()
    f.close()
    curr_string = current_number = next_string = next_number = ""
    current_numbers = ""
    final_payload = ""
    lines = data.split('\n')[:-1]
    for i in range(len(lines)-1):
        print(lines[i])
        curr_line = lines[i]
        next_line = lines[i+1]
        curr_string, current_number = curr_line.split(',')
        next_string, next_number = next_line.split(',')
        if curr_string == next_string:
            current_numbers += current_number + ","
        else:
            current_numbers += current_number # check to avoid ',' in the end
            final_payload += curr_string + " " + current_numbers + "\n"
            current_numbers = ""
        print(final_payload)
    # For last line
    if curr_string != next_string:
        # Directly add it to the final_payload
        final_payload += next_line + "\n"
    else:
        # Remove the newline, add a comma and then finally add a newline
        final_payload = final_payload[:-1] + ","+next_number+"\n"
    with open(destination, 'a') as f:
        f.write(final_payload)

Any better solutions?

Comment

Please do not update the code in your question to incorporate feedback from answers. Doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.

Answer

To solve the grouping problem, use itertools.groupby.
To read files with comma-separated fields, use the csv module.
In almost all cases, open() should be called using a with block, so that the files will be automatically closed for you, even if an exception occurs within the block:
```
import csv

with open(file_path) as in_f, open(destination, 'w') as out_f:
    data = csv.reader(in_f)  # iterates over rows, each a list of strings
    # code goes here
```
The parameter name filePath violates PEP 8, Python's official style guide, which recommends lowercase names with underscores, like your curr_line.
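The three suggestions above could be combined roughly as follows. This is a minimal sketch, not the answerer's own code: the helper name group_rows is hypothetical, clean_file keeps the question's two-argument signature, and the approach assumes (as the question states) that equal first fields are already adjacent, since itertools.groupby only merges consecutive items.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def group_rows(rows):
    """Yield 'key v1,v2,...' for each run of adjacent rows sharing a first field.

    Hypothetical helper; assumes every non-empty row has at least two fields.
    """
    rows = (row for row in rows if row)  # csv.reader yields [] for blank lines
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key + ' ' + ','.join(row[1] for row in group)


def clean_file(file_path, destination):
    # newline='' is what the csv module documentation recommends for input files
    with open(file_path, newline='') as in_f, open(destination, 'w') as out_f:
        for line in group_rows(csv.reader(in_f)):
            out_f.write(line + '\n')


# Demo on the question's first sample, using an in-memory file
sample = "abcd,1\nabcd,21\nabcd,122\nabce,12\nabcf,13\nabcf,21\n"
print(list(group_rows(csv.reader(io.StringIO(sample)))))
# ['abcd 1,21,122', 'abce 12', 'abcf 13,21']
```

Note that this sketch opens the destination with 'w' (overwrite), as in the snippet above, whereas the question's code appends with 'a'. If the input were not sorted by its first field, the same key could show up in several output lines.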

Answer

While @200_success's answer is very good (always use libraries that solve your problem), I'm going to give an answer that illustrates how to think about more general problems in case there isn't a perfect library.

Use `with` to automatically close files when you're done

You risk leaving a file open if an exception is raised and file.close() is never called.

with open(input_file) as in_file:
    ...  # read from in_file here; it is closed automatically on exit

Use the object to iterate, not indices

Most collections and objects can be iterated over directly, so you don't need indices:

with open(input_file) as in_file:
    for line in in_file:
        line = line.strip() # get rid of '\n' at end of line

Use data structures to organize your data

In the end, you want to associate each letter-string with a list of numbers. In Python, a dict maps keys (any hashable value, such as a string) to arbitrary values, so we'll use one to associate each letter-string with its list of numbers.

with open(input_file) as in_file:
    data = dict()
    for line in in_file:
        line = line.strip() # get rid of '\n' at end of line
        letters, number = line.split(',')
        data[letters].append(number)

Now, this doesn't quite work: if a letters entry hasn't been seen yet, the lookup data[letters] has nothing to return and raises a KeyError exception. So we have to account for that:

with open(input_file) as in_file:
    data = dict()
    for line in in_file:
        line = line.strip() # get rid of '\n' at end of line
        letters, number = line.split(',')
        try: # there might be an error
            data[letters].append(number) # append new number if letters has been seen before
        except KeyError:
            data[letters] = [number] # create new list with one number for a new letter-string

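Now the whole file is stored in a convenient form in the data object. Because dicts preserve insertion order (guaranteed since Python 3.7), the groups come out in the order they first appeared in the input. To write the output, just loop through the data: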

with open(input_file) as in_file:
    data = dict()
    for line in in_file:
        line = line.strip() # get rid of '\n' at end of line
        letters, number = line.split(',')
        try: # there might be an error
            data[letters].append(number) # append new number if letters has been seen before
        except KeyError:
            data[letters] = [number] # create new list with one number for a new letter-string
with open(output_file, 'w') as out_file:
    for letters, number_list in data.items(): # iterate over all entries
        out_file.write(letters + ' ' + ','.join(number_list) + '\n')

The .join() method creates a string from a list such that the entries of the list are separated by the string on which it is called (',' in this case).
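As a quick illustration of how str.join builds one output line, here is a small sketch using values from the question's first file (the variable names are only for illustration):

```python
number_list = ['1', '21', '122']

# The separator is the string join() is called on; the list supplies the pieces.
line = 'abcd' + ' ' + ','.join(number_list)
print(line)  # abcd 1,21,122

# join() only accepts strings, so other types must be converted first.
as_text = ','.join(str(n) for n in [1, 21, 122])
print(as_text)  # 1,21,122
```

In the answer's code no conversion is needed, because line.split(',') already returns strings.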

Comment

Instead of trying to append and catching the error, you can use setdefault: data.setdefault(letters, []).append(number)
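A minimal sketch of how that one-liner might replace the try/except block. The function name group_lines is hypothetical, and the sketch assumes each non-blank line contains exactly one comma:

```python
def group_lines(lines):
    """Map each first field to the list of its second fields (hypothetical helper)."""
    data = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        letters, number = line.split(',')
        # setdefault returns data[letters], inserting [] first if the key is new
        data.setdefault(letters, []).append(number)
    return data


print(group_lines(['abcd,1\n', 'abcd,21\n', 'abce,12\n']))
# {'abcd': ['1', '21'], 'abce': ['12']}
```

Because it accepts any iterable of lines, an open file object can be passed in directly.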

Comment

@ToddSewell Neat! That'll be useful in the future.

Comment

Or use collections.defaultdict of course.
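For comparison, a sketch of the collections.defaultdict variant. The function name group_lines_defaultdict is hypothetical, and the same one-comma-per-line assumption applies:

```python
from collections import defaultdict


def group_lines_defaultdict(lines):
    """Group second fields by first field using defaultdict (hypothetical helper)."""
    data = defaultdict(list)  # missing keys are created as empty lists on first access
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        letters, number = line.split(',')
        data[letters].append(number)  # no KeyError handling needed
    return data


grouped = group_lines_defaultdict(['abcd,1\n', 'abcd,21\n', 'abce,12\n'])
for letters, number_list in grouped.items():
    print(letters + ' ' + ','.join(number_list))
# abcd 1,21
# abce 12
```

One thing to keep in mind with this design: merely looking up a missing key in a defaultdict inserts it, so use the in operator or .get() when you only want to check for a key.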

Accepted answer by 200_success, 2019-01-13 18:00:33Z
