This method takes a text file of my work schedule and parses it. It works, but is kind of kludgey, and I'm looking for feedback and suggestions. My first attempt at a file parser, and my first look at Python (I have dabbled in Java/Android and JavaScript). I'm hoping that between the pseudocode and the comments it will be clear what is going on.
Pseudocode:
def read_body(): f = open('file') tempList = [] d = custom data object() for line in f: if line not empty line: if begins with a macron: #denotes new day in the file prepare new/empty data object for line in f: split line for whitespace if line not empty line if line is work log it in data object else its another activity if pay value, also log it elif begins with an equal sign: #denotes end of day else: clean up f.close()
Full Code (it works):
macron = ' ̄'
equalSign = '='
underscore = '_'
def read_body(path, month_year):
'''Reads schedule detail text file and generates pairing data objects for each work day.
path: file to open 'monthYEAR.txt'
date: MonthYEAREMPNO (month, year, employee number)
from file header
'''
f = open(path)
tempList = []
d = WorkDay()
for line in f:
tempList = line.rsplit()
if len(tempList) > 0:
if macron in tempList[0]: #begin work day
#prepare new data object
d.clear()
d.month_year(month_year)
for line in f:
tempList = line.rsplit()
if len(tempList) > 0:
if isWorkingDay(tempList[0]): #parse tempList[1] date block MM/dd/YYYY
d.activity_name(tempList[0])
d.date_of_month(tempList[1])
line = f.next()
tempList = line.rsplit()
d.hours_worked(tempList[1])
d.pay(tempList[3])
break
else: #is other activity (sick, off, vacation, etc), log date/time/pay
d.activity_name(tempList[0]) #tempList[0] failed isWorkingDay(), so just print. Human readable string
for line in f:
if underscore in line:
break
elif 'Credit:' in line:
d.pay(line[8:12])
break
elif equalSign in tempList[0]: #end work day
#drop activity object in "write to db" queue
add_to_db(d)
else:
'''end of file, clean up partial data objects. necessary because schedule text file ends with a macron'''
#discard d if empty, raise error if non-complete, write to db if complete
pass
f.close()
2 Answers 2
As a recommendation for readability i suggest you replace
if len(tempList) > 0:
with
if tempList:
as the Empty List resolves to False.
It's also recommended to use the "with statement" when dealing with files to avoid errors.
with open('file') as f:
...
This will free any locks on the filesystem once this block is exited.
Those constants (
macron
,equalSign
,underscore
) are a bit uselessly named. It's no use havingconst int TEN = 10
; use names that reflect what the constants are for. (This also helps avoid the analogic case ofconst int TEN = 11
when you need to change the constant in a new version...) Better yet, put the constants into predicate functions that encapsulate all the checking in one easy-to-use package.The nested loops over your file handle may work, but they're confusing as heck. I recommend an outer
while True
loop and manual calls tof.next()
.Your code is very nested. Try splitting it into multiple functions.
It's often preferable to return early (
if not tempList: break
) instead of nesting (if tempList: ...
).Your inner loop could be improved to handle an entire "record" (begin day, stuff happens, end day) at a time, rather than taking each line and clumsily maintaining state. (All this really entails is changing the structure to remove the loops and clearly define a sequence of processing; moving the code into its own function is just bonus points.)
Some bits of code common to all branches of execution (
d.month_year(month_year)
is one spot,d.activity_name(tempList[0])
is another) can be moved out of the branching constructs.If a day is started without ending the previous day, the code throws the previous day away. You might want to change that.
You're not awfully clear about whether the macron to start a new day gets its own line; your code acts as if it does (by throwing away the rest of that line), so I'm going with that.
Your terminating characters (macron, equals, underscore) - you say they have to be at the beginning of the line, but then test if they're anywhere in the line! Combined with the above leap of logic, you can simplify those tests to checking if the character is the only one on its line.
Other-activity sections (terminated with
_
) don't terminate a record when they end. I'l assume it's impermissible to have anything between the end of an other-section and the end of the record.
Also taking into account Christoph Hegemann's pointers, your revised code:
def isStartWorkDay(line):
return line == " ̄\n"
def isEndWorkDay(line):
return line == "=\n"
def isEndOtherActivitySection(line):
return line == "_\n"
def processRecord(f, d):
# This line should just start a record
line = f.next()
if line == '\n':
return False
# Start of record should be start of record
if not isStartWorkDay(line):
raise Exception("You missed the ̄ at the start of the day")
# First line always starts with a day or a special identifier ("sick" etc.)
lineList = f.next().rsplit()
d.activity_name(lineList[0])
if isWorkingDay(lineList[0]):
# Line 1 is day of week, day of month, [boring]
d.date_of_month(lineList[1])
# Line 2 is [boring], hours worked, [boring], pay, [boring]
lineList = f.next().rsplit()
d.hours_worked(lineList[1])
d.pay(lineList[3])
else:
# Something special happened; go through lines describing it
# I know this is potentially confusing (as I said above), but it's the "least bad" way
for line in f:
if isEndOtherActivitySection(line):
# That's it for this section
break
elseif "Credit:" in line:
# It's a pay credit, we need to record it
d.pay(line[8:12])
else:
pass # Move along, nothing to see here
# All right, this record had better be ending now
line = f.next()
if isEndWorkDay(line):
return True
raise Exception("You missed the = at the end of the day")
def readBodyIntoDb(path, month_year):
'''Reads schedule detail text file and generates pairing data objects for each work day.
path: file to open 'monthYEAR.txt'
date: MonthYEAREMPNO (month, year, employee number)
from file header'''
d = None # WorkDay object; required in outer scope
with open(path) as f:
try:
while True:
d = WorkDay()
d.month_year(month_year)
if processRecord(f, d): # d modified by side effect
dbWrite(d)
except StopIteration:
# EOF. I filled this pseudocode in for you
if isEmptyRecord(d):
pass # Nothing to see here
elseif isCompleteRecord(d):
dbWrite(d) # Someone left off the last `=` or something
else:
raise Exception("incomplete record")
return True
(My Python's a bit rusty, so there may be latent syntax or logic errors in there.)
-
\$\begingroup\$ Thanks for the thorough look! A couple things: flags like macron etc are only present on lines by themselves, 40 at a time (horizontal line). I originally checked for
if tempList[0][0:1] == macron:
because its 2 bytes, but that didn't work for the equal or underscore, which neededtempList[0][0]
(1 byte), so it got cluttered and I discovered thein
statement which simplified it. Thoughts? I understand about the constant names, thanks. I'll rework and post a new question in a few days. \$\endgroup\$nexus_2006– nexus_20062014年08月05日 01:40:25 +00:00Commented Aug 5, 2014 at 1:40 -
\$\begingroup\$
if line == '\n'
doesn't work with my particular file, company IT uses windows line endings, butline.isspace()
(built-in function) works. \$\endgroup\$nexus_2006– nexus_20062014年08月05日 02:56:42 +00:00Commented Aug 5, 2014 at 2:56
Explore related questions
See similar questions with these tags.