I have a list of CSV files (all have the same fields), some of which are empty. My goal is to find the first nonempty file and use that as the "base" file for an aggregation. After finding the base file, the way I will consume the rest of the files changes, so I need to maintain the index of the file. Here is what I currently have:
def f(list_of_files):
    aggregated_file = ""
    nonempty_index = -1
    for index, file in enumerate(list_of_files):
        aggregated_file = extract_header_and_body(file)
        if aggregated_file:
            nonempty_index = index
            break
    if nonempty_index != -1:
        # Skip the base file itself; its body is already in aggregated_file.
        rest_of_files = list_of_files[nonempty_index + 1:]
        for file in rest_of_files:
            aggregated_file += extract_body(file)
    return aggregated_file
Both extract_header_and_body and extract_body return string representations (with and without column names, respectively) of the CSV files; if the file is empty, the empty string is returned. This seems like a clunky solution, especially for Python. Is there a more concise/readable way to accomplish this?
1 Answer
If we reformulate this as extracting the header and body of the first non-empty file, and only the body of every subsequent file, this comes to mind:
aggregated_file = ''
for file in list_of_files:
    if aggregated_file:
        aggregated_file += extract_body(file)
    else:
        aggregated_file += extract_header_and_body(file)
(Consider posting the code for extract_header_and_body() and extract_body() as well: we may be able to give you better advice in that case.)
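To try the loop end to end, here is a minimal sketch. The two helper bodies are hypothetical stand-ins, since the real ones were not posted: they assume each "file" is already an in-memory CSV string whose first line is the header. The function name aggregate is likewise just for illustration.

```python
def extract_header_and_body(text):
    """Hypothetical stand-in: return the CSV text unchanged ('' if empty)."""
    return text


def extract_body(text):
    """Hypothetical stand-in: drop the first (header) line, keep the rest."""
    _header, _sep, body = text.partition("\n")
    return body


def aggregate(list_of_files):
    aggregated_file = ""
    for file in list_of_files:
        if aggregated_file:
            # A base file has already been found: append data rows only.
            aggregated_file += extract_body(file)
        else:
            # Still looking for the first non-empty file: keep its header.
            aggregated_file += extract_header_and_body(file)
    return aggregated_file


# Assumed sample data: two empty files and two files sharing the header "a,b".
files = ["", "a,b\n1,2\n", "", "a,b\n3,4\n"]
print(aggregate(files), end="")
```

Because an empty file contributes the empty string either way, no index bookkeeping is needed: the truthiness of aggregated_file already records whether the base file has been seen.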