How may I optimize the following function to write a xls file from a list of dictionaries using the openpyxl library in Python?
The current logic behind feels very very wrong
The dict list:
things = [
{
"Fruit": "Orange",
"Flavour": "Good",
"Expiration": "21May20"
},
{
"Fruit": "Apple",
"Flavour": "Good",
"Expiration": "19May20"
},
{
"Fruit": "Banana",
"Flavour": "Regular",
"Expiration": "16May20"
}
]
The functions that I have:
from openpyxl import Workbook, load_workbook
def create_xls(filepath):
wb = Workbook()
wb.save(filepath)
def write_xls(filepath, dictionary):
wb = load_workbook(filepath)
sheet = wb.active
headers = [x for x in dictionary[0]]
for index, value in enumerate(headers):
sheet.cell(row=1, column=index+1).value = value
for i, x in enumerate(dictionary):
for idx,value in enumerate(x.values()):
sheet.cell(row=i+2, column=idx+1).value = value
wb.save(filepath)
Thank you!
-
1\$\begingroup\$ do you want to have Fruit, Flavor, Expiration as titles and respective values below? or is there something else intended? \$\endgroup\$user203258– user2032582019年09月09日 04:26:10 +00:00Commented Sep 9, 2019 at 4:26
1 Answer 1
From the documentation, you can see that a worksheet object has an .append
method that let you write a row from an iterable at the bottom of said sheet. Documentation from the builtin help
is reproduced here:
Help on method append in module openpyxl.worksheet.worksheet:
append(iterable) method of openpyxl.worksheet.worksheet.Worksheet instance
Appends a group of values at the bottom of the current sheet.
* If it's a list: all values are added in order, starting from the first column
* If it's a dict: values are assigned to the columns indicated by the keys (numbers or letters)
:param iterable: list, range or generator, or dict containing values to append
:type iterable: list|tuple|range|generator or dict
Usage:
* append(['This is A1', 'This is B1', 'This is C1'])
* **or** append({'A' : 'This is A1', 'C' : 'This is C1'})
* **or** append({1 : 'This is A1', 3 : 'This is C1'})
:raise: TypeError when iterable is neither a list/tuple nor a dict
This means that your can sheet.append(headers)
instead of your ugly loop. Similarly, using .values()
on your dictionnaries, you can simplify your write_xls
function to:
def write_xls(filepath, dictionary):
wb = load_workbook(filepath)
sheet = wb.active
headers = list(dictionary[0])
sheet.append(headers)
for x in dictionary:
sheet.append(list(x.values()))
wb.save(filepath)
Now, a few more things to consider.
First off, since you are only interested in creating the file and writing in it, you may be interested in the write-only mode provided by openpyxl
. This mean you will simplify your code to a single function:
def write_xls(filepath, dictionary):
wb = Workbook(write_only=True)
sheet = wb.create_sheet()
headers = list(dictionary[0])
sheet.append(headers)
for x in dictionary:
sheet.append(list(x.values()))
wb.save(filepath)
Second, you relly very much on your data being presented well ordered and without flaws. This might bite you at some point. I would:
- find all possible headers in your dictionnaries and order them;
- use them to recreate each row using the same ordering each time.
This will allow you to have a coherent output, even with inputs such as:
things = [
{
"Fruit": "Orange",
"Flavour": "Good",
"Expiration": "21May20",
},
{
"Flavour": "Good",
"Fruit": "Apple",
"Expiration": "19May20",
},
{
"Flavour": "Regular",
"Expiration": "16May20",
"Fruit": "Banana",
}
]
or even:
things = [
{
"Fruit": "Orange",
"Flavour": "Good",
"Expiration": "21May20"
},
{
"Fruit": "Apple",
"Flavour": "Good",
"Junk": "Spam",
"Expiration": "19May20"
},
{
"Fruit": "Banana",
"Flavour": "Regular",
"Expiration": "16May20"
}
]
Proposed improvements:
import itertools
from openpyxl import Workbook
def write_xls(filename, data):
wb = Workbook(write_only=True)
ws = wb.create_sheet()
headers = list(set(itertools.chain.from_iterable(data)))
ws.append(headers)
for elements in data:
ws.append([elements.get(h) for h in headers])
wb.save(filename)