2
\$\begingroup\$

I have a working solution for the given problem of A/B Optimization, but you as an experienced Python (3) developer will agree with me, that this solution is not very pythonic at all.

My Question here is: How do I make this code more pythonic?

It's a very procedural process that one would expect from a Basic program, but not from a Python program. Please help me to become a better coder and make pythonic code by working through this example with me.

Here's the problem and my solution.

Problem

In this test you will be given a CSV containing the results of a website's A/B homepage test. The site is simultaneously testing 5 design variables (A, B, C, D, and E) and tracking if users clicked on the signup button (Z). Each line of the CSV represents a visitor to the homepage, what they saw, and if they clicked or not. The A-E variables will have values of 1-5, representing which version of that design element was shown to the user. The Z variable will contain either a 1 to represent that the user clicked signup, or a 0 to represent that the user did not.

A B C D E Z
1 3 1 2 4 1
3 2 1 5 5 0
4 3 2 2 5 1
5 5 2 3 4 0
2 4 1 3 4 0
...

Your task will be to determine the single combination of A-E values that will make users most likely to click the signup button. Assume the effects of each A-E value are mutually exclusive. If two values of a variable are equally optimal, choose the lesser value. You will enter the answer in the form of a 5-digit number, the first digit being the optimal A value, the second being the optimal B value, the third being the optimal C value, etc.

Solution

from urllib.request import urlopen
def getBestDesign(url):
 data = urlopen(url).readlines()
 A = {}; B = {}; C = {}; D = {}; E = {};
 A['1'] = 0
 A['2'] = 0
 A['3'] = 0
 A['4'] = 0
 A['5'] = 0
 B['1'] = 0
 B['2'] = 0
 B['3'] = 0
 B['4'] = 0
 B['5'] = 0
 C['1'] = 0
 C['2'] = 0
 C['3'] = 0
 C['4'] = 0
 C['5'] = 0
 D['1'] = 0
 D['2'] = 0
 D['3'] = 0
 D['4'] = 0
 D['5'] = 0
 E['1'] = 0
 E['2'] = 0
 E['3'] = 0
 E['4'] = 0
 E['5'] = 0
 for row in data:
 row = row.decode(encoding='UTF-8').split('\t')
 if row[5].startswith('1'):
 A[str(row[0])] += 1
 B[str(row[1])] += 1
 C[str(row[2])] += 1
 D[str(row[3])] += 1
 E[str(row[4])] += 1
 maxclicks_A = max(A.values())
 maxclicks_B = max(B.values())
 maxclicks_C = max(C.values())
 maxclicks_D = max(D.values())
 maxclicks_E = max(E.values())
 for k, v in A.items():
 if v == maxclicks_A:
 print('A: ', k)
 for k, v in B.items():
 if v == maxclicks_B:
 print('B: ', k)
 for k, v in C.items():
 if v == maxclicks_C:
 print('C: ', k)
 for k, v in D.items():
 if v == maxclicks_D:
 print('D: ', k)
 for k, v in E.items():
 if v == maxclicks_E:
 print('E: ', k)
if __name__ == "__main__":
 url = input('Url: ')
 getBestDesign(url)
asked Nov 2, 2013 at 0:31
\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

There is massive code duplication in your script. Basically you are doing the same things for A through E, so you could just put them in a list of dictionaries and use loops. Also, by using defaultdict you do not have to initialize them per hand (and even if not you could use a loop for this, too). Also, I'd recommend removing the code for transforming the URL to a 2D array from the function, and using just split() without '\t' the code works as well, and is somewhat more robust to other input formats.

from urllib.request import urlopen
from collections import defaultdict
def getBestDesign(data):
 # this list holds all your A, B, C, ... dicts, with defaultvalues
 ABCDE = [defaultdict(int) for i in range(5)]
 for row in data:
 if row[5] == '1':
 for i in range(5):
 ABCDE[i][row[i]] += 1
 for i in range(5):
 maxclicks = max(ABCDE[i].values())
 for k, v in ABCDE[i].items():
 if v == maxclicks:
 print('ABCDE'[i], ": ", k) # note the ''!
 # break # optional, stop after first "max" value
if __name__ == "__main__":
 url = input('Url: ')
 lines = urlopen(url).readlines()
 data = [line.decode(encoding='UTF-8').split() for line in lines]
 getBestDesign(data)

The algorithms is still basically the same as yours, and I think that it can still be improved. Probably you could use numpy to treat the input data as a matrix and do all sorts of useful matrix operations on the data. Further, you could probably use collections.Counter or itertools.groupby for counting the good combinations, reducing the whole thing to two or three lines.


Here's another version, even shorter, doing some preprocessing and using collections.Counter:

def getBestDesign(data):
 filtered = [d[:-1] for d in data if d[5] == '1']
 for i, d in enumerate(zip(*filtered)):
 key, maxclicks = Counter(d).most_common(1)[0]
 print("{0}: {1} ({2})".format("ABCDE"[i], key, maxclicks))

Let's explain this line by line:

  • the filtered list comprehension holds just the lines from data where the user has clicked; as this is already filtered, we can drop the last column with [:-1]
  • in the loop, we iterate the data columns -- zip(*...) basically transposes the matrix -- using enumerate, which yields the index i and the actual data d at that index
  • the Counter class is used to count each variant and to return the most_common one
  • finally, we print the result using string.format, using the index i to get the respective letter from the string "ABCDE"
answered Nov 2, 2013 at 13:04
\$\endgroup\$
2
  • \$\begingroup\$ +1, but it seems wrong to me that you have one loop over ABCDE[i].values() to find maxclicks and then a second loop over ABCDE[i].items() to find the key. Better to do this in a single loop: k, maxclicks = max(ABCDE[i].items(), key=operator.itemgetter(1)) \$\endgroup\$ Commented Nov 4, 2013 at 10:16
  • \$\begingroup\$ @GarethRees You are right! When I wrote this, I thought "this seems wrong, there has to be a better way" all the time, but did not catch it. Thanks! (Then again, this way there will always be only one max value, whereas there may be multiple.) \$\endgroup\$ Commented Nov 4, 2013 at 10:49

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.