condensing a list of lists with 'description' lines in Python

Question 1

I have a long list of data pulled in from a CSV file that has comments/descriptions scattered throughout it. I'd like to collapse each description into the last element of the data row it follows; my current solution seems slow and overly complex. Data rows have a value in the first position, while description rows have an empty string there. This gives me a list of lists like so:

data = [
 ['a', 'this', 'is', 'data', 1, 2, 3, ''],
 ['', '', '', 'this is a description', '', '', '', ''],
 ['', '', '', 'that carries onto two lines', '', '', '', ''],
 ['another', 'row', 'with', 'data', 0, 3, 1, ''],
 ['', '', '', 'this is a description', '', '', '', ''],
 ['', '', '', 'that carries onto three lines', '', '', '', ''],
 ['', '', '', 'like so', '', '', '', ''],
 ['data', 'with', 'no', 'description', 9, 2, 0, ''],
 ['b', 'this', 'is', 'data', 1, 2, 3, ''],
 ['', '', '', '', 'sometimes the description', 'is offset', '', '']
]

Here's what I'd like to see:

desired_data = [
 ['a', 'this', 'is', 'data', 1, 2, 3,
 'this is a description that carries onto two lines'],
 ['another', 'row', 'with', 'data', 0, 3, 1,
 'this is a description that carries onto three lines like so'],
 ['data', 'with', 'no', 'description', 9, 2, 0, None],
 ['b', 'this', 'is', 'data', 1, 2, 3, 'sometimes the description is offset']
]

My current solution requires creating whole new lists, one with data and one with descriptions, and then iterating over the data and searching through the descriptions each iteration:

# get numbered rows with data (first element is not '')
actual_data = [(i, x) for i, x in enumerate(data) if x[0] != '']
# get numbered rows with descriptions (first element is '')
descriptions = [(i, ' '.join([j for j in x if j != '']))
                for i, x in enumerate(data) if x[0] == '']
# get just the indices of the rows with descriptions
description_indices = {i[0] for i in descriptions}
desired_data_attempt = []
for d in actual_data:
    # get data to insert
    x = d[1][:]
    # get first index to check
    n = d[0] + 1
    description = []
    while n in description_indices:
        # keep adding consecutive descriptions
        description.append([i[1] for i in descriptions if i[0] == n][0])
        n += 1
    # set empty descriptions to None; otherwise create one string from list
    if description == []:
        x[-1] = None
    else:
        x[-1] = ' '.join(description)
    # insert data with description
    desired_data_attempt.append(x)
assert desired_data_attempt == desired_data

Ideally, I'd love to be able to construct this new object in one pass through the original data. Any suggestions would be appreciated!

Answer

You can do this in one pass: hold on to the data row you are going to yield, along with a list of comment items. When you encounter a comment row, append its items to the list; when you encounter a data row, yield the row you were holding together with all of its comments, clear the comments, and hold the new data row instead. Like this:

def is_comment(item):
    return item[0] == ''
def nonempty(item):
    return [x for x in item if x]
def fuse_comments(data):
    result = data[0]
    comments = []
    for element in data[1:]:
        if is_comment(element):
            comments += nonempty(element)
        else:
            yield result + [' '.join(str(comment) for comment in comments)]
            comments = []
            result = nonempty(element)
    yield result + [' '.join(str(comment) for comment in comments)]
from pprint import pprint
pprint(list(fuse_comments(data)))

Comment

That's much better, thanks! I made a few tweaks so as to match the desired output exactly (values of 0 in the data were being omitted by nonempty() and empty comments weren't returning None).
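The tweaked version was not posted, so the following is only a sketch of what such tweaks might look like, not the asker's actual code. It assumes that the last column of every data row is the description placeholder, and that any description rows appearing before the first data row can be dropped. The `sample` list is a shortened copy of the question's data, and the rewritten `fuse_comments` is illustrative:

```python
def is_comment(row):
    # Description rows have an empty string in the first column.
    return row[0] == ''


def fuse_comments(data):
    """Yield each data row with its following description rows
    joined into the last column (None if there are none)."""
    current = None   # data row held until its descriptions are collected
    comments = []
    for row in data:
        if is_comment(row):
            # Keep only the non-empty cells of a description row.
            comments.extend(str(cell) for cell in row if cell != '')
        else:
            if current is not None:
                # Replace the placeholder column; an empty join becomes None.
                yield current[:-1] + [' '.join(comments) or None]
            current, comments = row, []
    if current is not None:
        yield current[:-1] + [' '.join(comments) or None]


# Shortened sample in the same shape as the question's data.
sample = [
    ['a', 'this', 'is', 'data', 1, 2, 3, ''],
    ['', '', '', 'this is a description', '', '', '', ''],
    ['', '', '', 'that carries onto two lines', '', '', '', ''],
    ['data', 'with', 'no', 'description', 9, 2, 0, ''],
    ['b', 'this', 'is', 'data', 1, 2, 3, ''],
    ['', '', '', '', 'sometimes the description', 'is offset', '', ''],
]

fused = list(fuse_comments(sample))
```

Data rows are passed through untouched apart from the last column, so values such as 0 survive, and comparing cells against '' rather than testing truthiness avoids dropping falsy values in description rows as well.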

Daerdemandt · Accepted Answer · 2016-07-14 15:05:41Z
