I have written a Python programme that takes data from Excel sheets and adds this data to new .DAT files.
I use win32com.client
because the Excel sheets will already be opened and other modules need to open a file before they can process it.
As the win32com.client
outputs the range as a Component Object Model, it changes the content in some minor but destructive ways. Numbers, such as 2
, are outputted as 2.0
. Moreover, empty cells are outputted as None
.
For this reason I cannot immediately put the cells' values into a file, and instead I use a loop to fill a list during which I edit the data.
This loop, however, is relatively very slow. I'd like a faster method, but I am unsure how to achieve this.
from win32com.client import GetObject
def convertdata(r, l):
newr = []
for c in r:
# Some codes start with leading zeroes and need to be left as-is
# The try...except clause would convert this code into integers, stripping
# the leading zeroes. str(c) == '0.0' makes sure single 0 values do get
# processed.
if str(c) == '0.0' or not str(c)[:1] == '0':
try:
float(str(c))
newr.append(str(int(c)))
except ValueError:
if str(c) == 'None':
newr.append('')
else:
newr.append(str(c))
else:
newr.append(str(c))
if c.Column == l and not c.Row - 1 == r.Rows.Count:
newr.append('\n')
else:
newr.append('\t')
return newr
# ... Code omitted
exl = GetObject(None, "Excel.Application")
exl_wbs = exl.Workbooks
sht = exlwbs[0].Worksheets(2)
rng = sht.Range(sht.Cells(2, 1), sht.Cells(lrow, lcol))
newrng = convertdata(rng, lcol)
dataf = ''.join(newrng)
with open(fpath, "w") as newfile:
newfile.write(dataf)
lrow
and lcol
are integers. fpath
is a string to the new file. sht
is a Worksheet object.
Input example
enter image description here
Output example (not the same data as the input, but you get the gist of it)
I 08.5070 Plate assembly Plate assembly 5 5070 VIRTUAL 1 1 1 0
I 0070_01.01 Plate D10 (SA) Plate D10 (SA) 08.58070 101 VIRTUAL 1 1 1 0
I 001170 Support Support 6 1170 VIRTUAL 1 1 1 0
I 0010.1170_01.01 conveyor (SA) conveyor (SA) 0090.1170 101 VIRTUAL
1 Answer 1
variable names
1-letter variable names are difficult to understand. Correct names for variables, functions and classes goes already a long way to comment the intent of the code.
r
is a data range, c
is a cell, l
is the last row in the range, then call em thus
functions
I would extract the code that is needed to parse 1 cell:
def convert_cell(data):
"""converts a single cell"""
data_string = str(data)
if data_string == "0.0":
return "0"
if data_string[0] == 0:
return data_string
if data_string == "None":
return ""
try:
float(data_string)
return str(int(data))
except ValueError:
return data_string
The advantage is that you cal skip some levels of nesting, and it is really clear what each type of input get transformed to. I also refactored the multiple calls to str(c)
to a variable
generator
Instead of starting a list, and appending to it, you can better yield
the results.
def convertdata(data_range, last_column):
for cell in data_range:
yield convert_cell(cell)
is_last_column = cell.Column == last_column
is_last_row = cell.Row - 1 == data_range.Rows.Count
yield "\n" if is_last_column and not is_last_row else "\t"
data_range = sheet.Range(sheet.Cells(2, 1), sheet.Cells(lrow, lcol))
dataf = ''.join(convertdata(data_range, lcol))
lcol
/l
is \$\endgroup\$sht
, and the imports won't help define it. Unless there's some magicfrom win32com.client import sht
that pick the sheet OP wants magically. \$\endgroup\$