3
\$\begingroup\$

What the following code does is:

The key field is column0 (where sometimes there could be single key, and sometimes keys separated by comma).

The rest of the columns are also either single or comma separated. But based on the single key value from column0, the goal is to collect all the values as a set in the rest of the columns.

import sys
import csv
dict={}
def getVal(k):
 try:
 v = dict[k]
 except:
 v= None
 return v
# First read as a line and first transformation separates first column
# and stores into a table with key,value (where value is the remaining)
for line in sys.stdin:
 line = line.strip()
 row = line.split('\t')
 l = len(row)
 keyvalues = row[0]
 items = keyvalues.split(",")
 for item in items:
 key = item
 value=[]
 for i in range(1, l):
 value.append(row[i].split(","))
 if getVal(key) == None:
 dict[key] = value
 else:
 prevval = dict[key]
 cols = len(prevval)
 newvalue=[]
 newlist = []
 for j in range(0,cols):
 newlist = prevval[j]+value[j]
 newset = set(newlist)
 newvalue.append(list(newset))
 dict[key] = newvalue
for k,v in dict.items():
 rowstr = k+"\t"
 ncols = len(v)
 for items in v:
 cols=""
 for item in items:
 cols +=item+","
 cols = cols[0:len(cols)-1]
 rowstr += cols+"\t"
 print rowstr 

Sample input

3,15,75 1635,1762 878 3425 121,122,123
15 1762 871 3475 121,125,126
3 1585,1590,5192 882,832,841 3200,3211 120,121,122,123,124

I'm getting the results correctly as expected, but I would like to know any improvements to make on the code.

200_success
145k22 gold badges190 silver badges478 bronze badges
asked Apr 29, 2016 at 0:04
\$\endgroup\$
0

2 Answers 2

3
\$\begingroup\$

Indentation matters a lot in Python. Please follow the PEP 8 standard, which is 4 spaces.

This code feels very procedural. Python can be much more expressive than that, if you take advantage of features like list comprehensions.

You've used a lot of variables: line, row, l, keyvalues, items, item, key, value, i, prevval, cols, newvalue, newlist, j, newset — too many for me to keep track of what each represents. You could reduce the mental load by eliminating variables for ephemeral expressions. For example, you could eliminate items by writing for item in keyvalues.split(",").

I see that you considered using the csv module, but didn't. If you did, you could have simplified

for line in sys.stdin:
 line = line.strip()
 row = line.split('\t')

... to for row in csv.reader(fileinput.input(), delimiter='\t'). I recommend fileinput instead of sys.stdin so that the user has the option of passing a filename on the command line.

The getVal(k) function can just be replaced by dict.get(k, None). It would be better to avoid choosing dict as the name of a variable, since it shadows the dict() constructor.

Suggested solution

import csv
import fileinput
data = {}
for row in csv.reader(fileinput.input(), delimiter='\t'):
 values = [set(col.split(',')) for col in row]
 for key in values.pop(0): # Numbers in column 0
 data[key] = [
 new_col.union(old_col)
 for new_col, old_col in zip(values, data.get(key, values))
 ]
for key, values in data.items():
 print '\t'.join([key] + [','.join(col) for col in values])
answered Apr 29, 2016 at 8:34
\$\endgroup\$
0
1
\$\begingroup\$

Dictionary usage

Your getVal function is the get method of standard dictionaries. Use dict.get(key) instead of getVal(key). However, 3 things that are wrong with it:

  • It uses the global variable dict to retrieve values from, you should have parametrize it with the dictionary instead:

    def getVal(dict_, key):
     try:
     v = dict_[key]
     except:
     v = None
     return v
    
  • It uses dict as a variable name, shadowing the builtin function.

  • It uses a raw except, catching the expected KeyError but potentially catching more serious issues. Always specify the kind of exception you are expecting.

Now for getVal usage, you set a value into dict[key] if getVal returned None otherwise you process further. In such case, you will set the value because the flow got into the except in your getVal function, meaning key is not in dict. It's quite simple to test in python:

if key not in dict:
 dict[key] = value
else:
 prevval = ...

No need for getVal anymore...

answered Apr 29, 2016 at 8:35
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.