I have an array containing 5 other arrays, each of which consists of arrays of 2 values (first a date, then a count):
[
    [
        [dt, cnt], [dt, cnt], ....
    ],
    [
        [dt, cnt], [dt, cnt], ....
    ],
    [
        [dt, cnt], [dt, cnt], ....
    ],
    [
        [dt, cnt], [dt, cnt], ....
    ],
    [
        [dt, cnt], [dt, cnt], ....
    ],
]
These are click statistics for different websites. The data needs to be converted into a consistent shape for charting with Google visualisations. So the conversion is done in Python first, and the result is then converted to JSON to pass to the Google libs.
My current code is this:
# determine min date
mindate = datetime.date.max
for dataSet in sets:
    if (dataSet[0][0] < mindate):
        mindate = dataSet[0][0];
# fill a dictionary with all dates involved
datedict = {}
for dat in daterange(mindate, today):
    datedict[dat] = [dat];
# fill dictionary with rest of data
arrlen = 2
for dataSet in sets:
    # first the values
    for value in dataSet:
        datedict[value[0]] = datedict[value[0]] + [value[1]];
    # don't forget the missing values (use 0)
    for dat in daterange(mindate, today):
        if len(datedict[dat]) < arrlen:
            datedict[dat] = datedict[dat] + [0]
    arrlen = arrlen + 1
# convert to an array
datearr = []
for dat in daterange(mindate, today):
    datearr = datearr + [datedict[dat]]
# convert to json
result = json.dumps(datearr, cls=DateTimeEncoder)
(The DateTimeEncoder just gives a JavaScript-friendly datetime in the JSON.)
The output looks like this:
[ ["new Date(2008, 7, 27)", 0, 5371, 1042, 69, 0], ["new Date(2008, 7, 28)", 0, 5665, 1100, 89, 0], ... ]
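The encoder itself is not shown in the question. A minimal sketch of what such a class might look like, assuming it subclasses json.JSONEncoder and writes the month zero-based, as JavaScript's Date constructor expects (both are assumptions, not the original code):

```python
import datetime
import json


class DateTimeEncoder(json.JSONEncoder):
    """Hypothetical encoder: turns dates into JavaScript Date constructor strings."""

    def default(self, obj):
        # datetime.datetime is a subclass of datetime.date, so this covers both.
        if isinstance(obj, datetime.date):
            # Assumption: JavaScript months are zero-based, so subtract 1.
            return "new Date(%d, %d, %d)" % (obj.year, obj.month - 1, obj.day)
        # Defer to the base class, which raises TypeError for unknown types.
        return json.JSONEncoder.default(self, obj)


row = [datetime.date(2008, 8, 27), 0, 5371]
encoded = json.dumps([row], cls=DateTimeEncoder)
```

Because the value is emitted as a JSON string, the JavaScript side still has to turn it into a real Date object before charting.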
This is one of my first Python adventures, so I expect this piece of code can be improved upon easily. I want this to be shorter and more elegant. Show me the awesomeness of Python because I'm still a bit disappointed.
I'm using Django, by the way.
3 Answers
List comprehensions are definitely the way to go here. They allow you to say what you mean without getting mired in the details of looping.
To find the first date, make use of the built-in min() function.
defaultdict is useful for handling lookups where there may be missing keys. For example, defaultdict(int) gives you a dictionary that reports 0 whenever you look up a date that it doesn't know about.
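A small sketch of that behaviour, using made-up string keys in place of dates. Note that looking up a missing key also stores it in the dictionary, which is harmless here but worth knowing:

```python
from collections import defaultdict

# Build the dictionary from (key, count) pairs; the keys are illustrative only.
counts = defaultdict(int, [("2015-01-01", 3)])

known = counts["2015-01-01"]    # value taken from the initial pairs
missing = counts["2015-01-02"]  # no such key, so int() supplies 0

# Side effect: the lookup above inserted the missing key with value 0.
keys_after_lookup = sorted(counts)
```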
I assume that each of the lists within sets represents the data for one of your sites. I recommend using site_data as a more meaningful alternative to dataSet.
from collections import defaultdict
import json

def first_date(series):
    return series[0][0]

start_date = min(first_date(site_data) for site_data in sets)

# counts_by_site_and_date is just a "more convenient" version of sets, where
# the [dt, cnt] pairs have been transformed into a dictionary where you can
# look up the count by date.
counts_by_site_and_date = [defaultdict(int, site_data) for site_data in sets]

result = json.dumps([
    [date] + [site_counts[date] for site_counts in counts_by_site_and_date]
    for date in daterange(start_date, today)
], cls=DateTimeEncoder)
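To see the approach run end to end, here is a self-contained sketch with made-up sample data for two sites. The daterange helper follows the definition the asker gives in the comments further down; default=str merely stands in for DateTimeEncoder, which is not shown, so the date format in the output is an assumption:

```python
import datetime
import json
from collections import defaultdict


def daterange(start_date, end_date):
    """Yield each date from start_date up to, but not including, end_date."""
    for n in range((end_date - start_date).days):
        yield start_date + datetime.timedelta(n)


# Made-up sample data: two sites, each a list of [date, count] pairs.
d = datetime.date
sets = [
    [[d(2015, 1, 1), 5], [d(2015, 1, 3), 7]],
    [[d(2015, 1, 2), 2]],
]
today = d(2015, 1, 4)  # assumed end of the charted range

start_date = min(site_data[0][0] for site_data in sets)
counts_by_site_and_date = [defaultdict(int, site_data) for site_data in sets]

rows = [
    [date] + [site_counts[date] for site_counts in counts_by_site_and_date]
    for date in daterange(start_date, today)
]

# default=str is a placeholder for the asker's DateTimeEncoder.
result = json.dumps(rows, default=str)
```

Each row holds the date followed by one count per site, with 0 wherever a site has no entry for that date.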
Thanks! I really like the defaultdict. Didn't know that one yet. – Jacco, Jan 21, 2015 at 22:11
Trailing semicolons
In many places there is a ; at the end of the line.
It's completely unnecessary in Python and you should remove those.
Appending to arrays
At several places you are appending values to arrays in a strange way, for example:
datedict[value[0]] = datedict[value[0]] + [value[1]];
The common and shorter way to do this is using .append():
datedict[value[0]].append(value[1])
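Besides being shorter, .append() modifies the existing list in place instead of building a new list on every iteration. A small sketch with made-up values shows the difference:

```python
original = [1, 2]
alias = original

# Concatenation builds a brand-new list and rebinds the name;
# the alias still refers to the old, unchanged list.
original = original + [3]
alias_after_concat = list(alias)

# append() mutates the existing list, so every reference sees the change.
shared = [1, 2]
other_ref = shared
shared.append(3)
```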
Augmented assignment operator +=
Instead of:
l = l + 1
This is shorter and better:
l += 1
Use list comprehensions
Instead of:
datearr = []
for dat in daterange(mindate, today):
    datearr = datearr + [datedict[dat]]
You can use a list comprehension, a powerful feature of Python:
datearr = [datedict[dat] for dat in daterange(mindate, today)]
In Python 3.3 or earlier you can get the mindate using min() with a generator expression:
mindate = datetime.date.max
if sets:
    mindate = min(d[0][0] for d in sets)
For Python 3.4+ we can pass a default value to min() in case the iterable passed to it is empty:
mindate = min((d[0][0] for d in sets), default=datetime.date.max)
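A short sketch of why the fallback matters, using an empty sets list as a made-up edge case:

```python
import datetime

sets = []  # made-up case: no site data at all

# Without default=, min() raises ValueError on an empty iterable.
try:
    min(d[0][0] for d in sets)
    raised = False
except ValueError:
    raised = True

# With default= (Python 3.4+), the fallback value is returned instead.
mindate = min((d[0][0] for d in sets), default=datetime.date.max)
```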
Since we are going to use the result of daterange several times, it is better to compute it once and store it in a variable. Wrap the call in list(), because if daterange is a generator it can only be iterated once:
dates_list = list(daterange(mindate, today))
Now, when initializing datedict, it is better to start each value as [0, 0, 0, 0, 0], or [0]*len(sets), because that lets us remove the inner loop that fills in the missing values in the next piece of code. You can initialize the dict using a dict comprehension:
datedict = {dat: [0]*len(sets) for dat in dates_list}
For Python 2.6 or earlier use dict() with a generator expression:
datedict = dict((dat, [0]*len(sets)) for dat in dates_list)
Now it is time to update the values that are known to us. The trick here is to use enumerate to get the index of the current row from sets, so that we can update the count at that particular index:
for i, d in enumerate(sets):
    # use tuple unpacking
    for dat, count in d:
        datedict[dat][i] += count
Lastly, we can create the desired datearr using a list comprehension:
datearr = [[dat] + datedict[dat] for dat in dates_list]
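Putting these steps together, a self-contained sketch with made-up sample data might look like this. The daterange helper follows the definition the asker posts in the comments, and list() is applied to its result so the dates can be iterated more than once:

```python
import datetime


def daterange(start_date, end_date):
    """Yield each date from start_date up to, but not including, end_date."""
    for n in range((end_date - start_date).days):
        yield start_date + datetime.timedelta(n)


# Made-up sample data: two sites, each a list of (date, count) pairs.
d = datetime.date
sets = [
    [(d(2015, 1, 1), 5), (d(2015, 1, 3), 7)],
    [(d(2015, 1, 2), 2)],
]
today = d(2015, 1, 4)  # assumed end of the charted range

mindate = min((s[0][0] for s in sets), default=datetime.date.max)
dates_list = list(daterange(mindate, today))

# One zero-filled slot per site for every date in the range.
datedict = {dat: [0] * len(sets) for dat in dates_list}

# Overwrite the slots for which a count is known.
for i, site in enumerate(sets):
    for dat, count in site:
        datedict[dat][i] += count

datearr = [[dat] + datedict[dat] for dat in dates_list]
```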
Great, that step works now, but I think the last step renders the array empty. (I checked that the datedict is correct before that step.) – Jacco, Jan 21, 2015 at 22:01
@Jacco I guess daterange is a generator then? – Ashwini Chaudhary, Jan 21, 2015 at 22:02
def daterange(start_date, end_date):
    for n in range(int((end_date - start_date).days)):
        yield start_date + datetime.timedelta(n)
– Jacco, Jan 21, 2015 at 22:05
So, yes :-) I fixed this by doing: dates_list = list(daterange(mindate, today)) – Jacco, Jan 21, 2015 at 22:07
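The empty result discussed in these comments comes from generator exhaustion: a generator can be consumed only once. A minimal sketch, using a hypothetical count_up function as a stand-in for daterange:

```python
def count_up(n):
    """Hypothetical stand-in for daterange: yields 0, 1, ..., n - 1."""
    for i in range(n):
        yield i


gen = count_up(3)
first_pass = [x for x in gen]   # consumes the generator
second_pass = [x for x in gen]  # nothing left to yield

# Materialising the values in a list allows repeated iteration.
as_list = list(count_up(3))
list_first = [x for x in as_list]
list_second = [x for x in as_list]
```

Here second_pass ends up empty while list_second still holds every value, which is why wrapping daterange in list() fixes the problem.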