So, I was working on a dashboard for a potential customer and needed a fake dataset with employee information for the demo. Mainly, I needed to know when each employee arrived at the company (first swipe), when they left (last swipe), and how many hours they spent in each of several other areas of the company (ORC is one room, DSP is another, and so on).
Basically, I generate a random time between 8am and 10am and assign it to the first swipe. I do the same for the last swipe, but in the range 5pm to 7pm. Then I calculate how many hours the employee worked by subtracting the first swipe from the last.
With this information, I then calculate how many hours the employee spent in each area of the company. The ORC is the working room, so I want 50% to 80% of the hours worked to be spent there, and the rest to be randomly assigned to the other areas.
I spent a lot of time on this code, and it's been a while since I wrote it. It's not the most pythonic code you will ever see, but it worked :D
import calendar
import datetime
import random


def random_hour(start, end):
    hour_rand = random.randint(start, end)
    minutes_rand = random.randint(0, 59)
    return datetime.timedelta(hours=hour_rand, minutes=minutes_rand)


def random_weight(total_working_hours):
    working_hours_per_area = {
        'in_orc': 0,
        'in_cafe': 0,
        'in_dsp': 0,
        'in_kiosk': 0,
        'in_training': 0,
    }
    whole_time = 100
    total_time = datetime.timedelta()
    for i, area in enumerate(working_hours_per_area):
        if whole_time < 0:
            whole_time = 0
        if i == 0:
            rand_time = random.randint(50, 80)
        else:
            rand_time = random.randint(0, whole_time)
        whole_time -= rand_time
        working_hours_per_area[area] = rand_time / 100
    total_aux = sum(working_hours_per_area.values())
    if total_aux < 1.0:
        diff = 1.0 - total_aux
        min_hour = min(working_hours_per_area.keys(), key=(lambda k: working_hours_per_area[k]))
        working_hours_per_area[area] += diff
    for area in working_hours_per_area:
        working_hours_per_area[area] = working_hours_per_area[area] * total_working_hours
    return working_hours_per_area


employees = [
    ['CHI-123', 'CLOVIS TONELADA'],
    ['CHI-456', 'JOSE DA COVA'],
    ['CHI-789', 'EMERSON PEDREIRA'],
    ['CHI-321', 'GREYCE CROQUETE'],
    ['CHI-654', 'ROBERTO PINGA'],
    ['CHI-987', 'CAROLINA DOZE AVOS'],
]

days = []
cal = calendar.Calendar()
for week in cal.monthdatescalendar(2020,9):
    for day in week:
        if day.weekday() < 5:
            days.append(day)

f = open('dataset.csv', 'w+')
f.write('Date;Employee_Name;Employee_Code;First_Swipe;Last_Swipe;Total_Working_Hours;In_ORC;In_Cafe;In_DSP;In_Kiosk;In_Training\n')
for day in days:
    for i in range(0, 6):
        date = day
        employee_name = employees[i][1]
        employee_code = employees[i][0]
        first_swipe = random_hour(8, 10)
        last_swipe = random_hour(17, 19)
        total_working_hours = last_swipe - first_swipe
        working_hours_per_area = random_weight(total_working_hours)
        total_working_hours_per_area = datetime.timedelta(hours=0, minutes=0)
        locals().update(working_hours_per_area)
        write = ';'.join([str(date), employee_name, employee_code, str(first_swipe), str(last_swipe), str(total_working_hours), \
                          str(in_orc), str(in_cafe), str(in_dsp), str(in_kiosk), str(in_training)])
        f.write(write + '\n')
f.close()
Answer
You didn't include any specific request, so here are some general comments.
comments / documentation
You say it's been a while since you wrote it. When you look at the code now, are there places you ask yourself "why did I do that?" or where it takes time to figure out what is going on? If so, those are good places to add comments.
Docstrings could also be added to the module and to each function.
random_hour(start, end)
The write-up says it creates a random time between start and end. However, it actually returns a random timedelta between start hours and end hours + 59 minutes, so random_hour(8, 10) can return anything up to 10:59. Also, similar Python functions such as randrange() tend to include the start and exclude the end, so it would be good to document that both bounds are inclusive here.
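A minimal sketch of what such a docstring could say. The function body is unchanged from the question; only the docstring is new, and its wording is just one suggestion.

```python
import datetime
import random


def random_hour(start, end):
    """Return a random datetime.timedelta between start:00 and end:59.

    Unlike random.randrange(), both bounds are inclusive: random.randint()
    picks the hour from start..end (inclusive) and the minutes from 0..59,
    so random_hour(8, 10) can return anything from 8:00:00 to 10:59:00.
    """
    hour_rand = random.randint(start, end)
    minutes_rand = random.randint(0, 59)
    return datetime.timedelta(hours=hour_rand, minutes=minutes_rand)


if __name__ == '__main__':
    print(random_hour(8, 10))  # some value between 8:00:00 and 10:59:00
```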
random_weight(total_working_hours)
dicts are not guaranteed to preserve insertion order before Python 3.7, so i == 0 may not correspond to 'in_orc'. It would be better to iterate over the keys and check whether key == 'in_orc'.
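One way to follow that advice is sketched below, assuming the area names from the question. AREAS and random_fractions are hypothetical names. The sketch handles 'in_orc' by name before looping over the other areas, so ORC always gets its 50% to 80% share regardless of key order. It returns only the fractions and leaves the top-up and the multiplication by total_working_hours to the caller.

```python
import random

# Area names come from the question; AREAS itself is a hypothetical constant.
AREAS = ('in_orc', 'in_cafe', 'in_dsp', 'in_kiosk', 'in_training')


def random_fractions(areas=AREAS):
    """Return {area: fraction of the day}, with in_orc between 0.50 and 0.80."""
    # Assign in_orc by key first, so its share does not depend on where it
    # appears in `areas` or on the order in which a dict yields its keys.
    orc_share = random.randint(50, 80)
    fractions = {'in_orc': orc_share / 100}
    remaining = 100 - orc_share  # percentage points left for the other areas

    for area in areas:
        if area == 'in_orc':
            continue  # already assigned above
        share = random.randint(0, remaining)
        fractions[area] = share / 100
        remaining -= share
    return fractions
```

Because the remaining percentage is never allowed to go below zero, the whole_time < 0 guard from the original loop is not needed in this version.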
min_hour
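is calculated but never used. I think it was meant to be used in place of area on the next line (working_hours_per_area[min_hour] += diff). As written, area still holds whichever key the for loop ended on ('in_training' on Python 3.7+), so the leftover time always goes to that area.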
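The sketch below assumes the intent was to give the leftover to the area with the smallest share. top_up_smallest is a hypothetical helper name. Using min() with key=fractions.get behaves the same as the lambda in the question. The sample dictionary is made up for illustration.

```python
def top_up_smallest(fractions):
    """Add whatever is missing from 1.0 to the area with the smallest share."""
    total = sum(fractions.values())
    if total < 1.0:
        # Iterating a dict yields its keys; key=fractions.get compares them by value.
        smallest = min(fractions, key=fractions.get)
        # Use the key min() found, not a leftover loop variable.
        fractions[smallest] += 1.0 - total
    return fractions


# Hypothetical shares that add up to 0.95; in_dsp (0.0) receives the missing 0.05.
shares = {'in_orc': 0.6, 'in_cafe': 0.1, 'in_dsp': 0.0, 'in_kiosk': 0.2, 'in_training': 0.05}
print(top_up_smallest(shares))
```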
module level code
It is common to put the top-level code in a function such as main(), and then call main() from code such as:
if __name__ == '__main__':
    main()
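Here is a sketch of how the module-level code might be split up. weekdays_in_month is a hypothetical helper name. In this sketch, main() only builds the list of dates, and the row generation from the question would go where the comment indicates. One detail is worth noting: calendar.Calendar.monthdatescalendar() returns complete weeks, so the question's loop also picks up weekdays from the neighbouring months (31 August and 1 to 2 October 2020). The sketch filters those out with day.month.

```python
import calendar


def weekdays_in_month(year, month):
    """Return the Monday-to-Friday dates that fall inside the given month."""
    cal = calendar.Calendar()  # weeks start on Monday by default
    return [
        day
        for week in cal.monthdatescalendar(year, month)
        for day in week
        # monthdatescalendar() pads to whole weeks, so drop other months' days
        if day.month == month and day.weekday() < 5
    ]


def main():
    days = weekdays_in_month(2020, 9)
    # ... generate and write one row per employee per day here ...
    print(len(days), days[0], days[-1])


if __name__ == '__main__':
    main()
```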
csv module
The standard library includes the csv module for reading and writing CSV and other kinds of delimited text files. It takes care of escaping characters or enclosing fields in quotes when needed.
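Below is a minimal sketch using csv.writer that keeps the semicolon delimiter and column names from the question. write_rows is a hypothetical helper. It writes to any file-like object, so it can be tried with io.StringIO. The sample row with a ';' in the name is made up to show the quoting.

```python
import csv
import io

HEADER = ['Date', 'Employee_Name', 'Employee_Code', 'First_Swipe', 'Last_Swipe',
          'Total_Working_Hours', 'In_ORC', 'In_Cafe', 'In_DSP', 'In_Kiosk', 'In_Training']


def write_rows(out, rows):
    """Write HEADER and then each row to `out`, using ';' as the delimiter."""
    writer = csv.writer(out, delimiter=';')
    writer.writerow(HEADER)
    # Non-string values are converted with str(); fields containing the
    # delimiter or a quote character are enclosed in quotes automatically.
    writer.writerows(rows)


buf = io.StringIO()
write_rows(buf, [['2020-09-01', 'NAME;WITH;SEMICOLONS', 'CHI-000']])  # hypothetical row
print(buf.getvalue())
```

For the real file, the csv documentation recommends opening it with newline='', for example with open('dataset.csv', 'w', newline='') as f: write_rows(f, rows).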
unpacking
Instead of using for i in range(0, 6) to iterate over the employees, use something like:
for employee_code, employee_name in employees:
    ...
locals()
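The Python documentation says the dictionary returned by locals() should not be modified; changes may not affect the values of local variables used by the interpreter. At module level, locals() happens to be the same dictionary as globals(), which is why the code works here, but it would break as soon as the loop is moved into a function such as main().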
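Instead of injecting names with locals().update(), the values can be read straight from the dictionary that random_weight() returns. The sketch below assumes the area keys appear in the same order as the header. AREA_KEYS and build_row are hypothetical names, and the sample dictionary only stands in for random_weight()'s output.

```python
import datetime

AREA_KEYS = ('in_orc', 'in_cafe', 'in_dsp', 'in_kiosk', 'in_training')


def build_row(date, employee_code, employee_name, first_swipe, last_swipe, hours_per_area):
    """Return one output row as a list of strings, in the question's column order."""
    total = last_swipe - first_swipe
    row = [str(date), employee_name, employee_code,
           str(first_swipe), str(last_swipe), str(total)]
    # Look each area up explicitly by key instead of relying on injected names.
    row += [str(hours_per_area[key]) for key in AREA_KEYS]
    return row


first = datetime.timedelta(hours=8, minutes=30)
last = datetime.timedelta(hours=17, minutes=45)
sample = {key: (last - first) / 5 for key in AREA_KEYS}  # stand-in for random_weight()
row = build_row(datetime.date(2020, 9, 1), 'CHI-123', 'CLOVIS TONELADA', first, last, sample)
print(';'.join(row))
```

This also works unchanged inside main(), because it never depends on which names exist in the local scope.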
That's enough for now.