Dealing with a dirty table

Question 1

I have to deal with a really ugly looking response I get back from a particular device. The string represents a table. Each section of the string represents the form:

["id", "date_time", "path", "type","duration","count", "rate"]

Which as an example looks like:

"1,2014年4月3日 10:1:10,somerandompath,some/screwy!stringwithstufflike[0,0,0,0]inIt,0,0;"

all table entries as you can see are terminated by ; and have their elements separated by ,

But some entries like in some/screwy!stringwithstufflike[0,0,0,0]inIt have commas within their data,.. and times/dates 10:1:10 aren't padded with 0's to be consistent

This is what I have so far, but it doesn't feel very optimized...

def GetEventTable(self):
 # After I split by ',' commas can be inside of [..], 
 # so I combine entries until their number of starting/end braces match
 def split_entry(entry):
 splitentry = entry.split(',')
 for ii in range(len(splitentry))[::-1]:
 start_bracket = splitentry[ii].count('[')
 end_bracket = splitentry[ii].count(']')
 if not start_bracket == end_bracket:
 splitentry[ii-1:ii+1] = [splitentry[ii-1] + ',' + splitentry[ii]]
 return splitentry
 event_lable = ["id", "date_time", "path", "type","duration","count", "rate"]
 ## This section just talks to the remote device and gets the unformatted string ###
 print "Sending: GET EVENT TABLE"
 stat, raw_data = SendRecieve(self, "GET EVENT TABLE")
 if IsError(stat):
 raise Exception("Error type '{}': {}".format(stat, raw_data))
 print raw_data
 ##################################################################################
 events = raw_data.split(';')[:-1]
 event_table = [{key:value for key,value in zip(event_lable,split_entry(entry) )} for entry in events ]
 return event_table

How can I improve the efficiency and also the readability?

It returns a list of dictionaries with keys ["id", "date_time", "path", "type","duration","count", "rate"]

Question 2

Can you add information about how this function is supposed to be used? At the moment, this is just the definition of a function taking no parameter and calling a SendRecieve function that probably does not exist on a standard installation (the typo is a pretty good hint).

Question 3

@Josay I already explained that I'm getting the data in a string where "entries as you can see are terminated by ; and have their elements separated by ,".. I did however forget to add self :p

Question 4

Well then you'll have to excuse me for being a pain but it's quite hard to review anything out of its context (a method out of its class). I'll try to give it a try if no one else has later on.

Question 5

Disclaimer

Because a lot of context is missing, here's the code I will review :

def split_entry(entry):
 splitentry = entry.split(',')
 for ii in range(len(splitentry))[::-1]:
 start_bracket = splitentry[ii].count('[')
 end_bracket = splitentry[ii].count(']')
 if not start_bracket == end_bracket:
 splitentry[ii-1:ii+1] = [splitentry[ii-1] + ',' + splitentry[ii]]
 return splitentry
event_lable = ["id", "date_time", "path", "type","duration","count", "rate"]
raw_data = "1,2014年4月3日 10:1:10,somerandompath,some/screwy!stringwithstufflike[0,0,0,0]inIt,0,0;"
events = raw_data.split(';')[:-1]
event_table = [{key:value for key,value in zip(event_lable,split_entry(entry) )} for entry in events ]
print event_table

ii is a pretty bad name. i is twice shorter and so many times easier to understand.

Instead of creating a new list by reversing the return of range, you could just use all the potential of range (or xrange):

Instead of using not A == B, you could write a more natural A != B

Looping over an array using range(len(a)) is an anti-pattern in python. To loop on an array while keeping track of the index, the usual solution is to use enumerate(). In your case, because you are updating the array anyway, it is quite tricky to loop on it.

Final point : I had trouble understanding your list slicing stuff but as I was messing a bit with it, I obtained a result different from yours ('rate' appears in the final result so it might actually be a good thing) :

def split_entry(entry):
 splitentry = entry.split(',')
 for i in xrange(len(splitentry)-1,0,-1):
 chunk = splitentry[i]
 if chunk.count('[') != chunk.count(']'):
 splitentry[i-1] += ',' + chunk
 splitentry[i] = '0'
 return splitentry
event_lable = ["id", "date_time", "path", "type","duration","count", "rate"]
raw_data = "1,2014年4月3日 10:1:10,somerandompath,some/screwy!stringwithstufflike[0,0,0,0]inIt,0,0;"
events = raw_data.split(';')[:-1]
event_table = [{key:value for key,value in zip(event_lable,split_entry(entry) )} for entry in events]
print event_table

Question 6

hmmm.. now that I think of it, I wouldn't need to clear the old value in splitentry[i] = '0' seeing as the zip function will only return up to max len(event_lable) anyway!

SylvainD SylvainD 29.8k1 gold badge49 silver badges93 bronze badges · Answer 1 · 2014-04-06 16:59:37Z

Disclaimer

Because a lot of context is missing, here's the code I will review :

def split_entry(entry):
 splitentry = entry.split(',')
 for ii in range(len(splitentry))[::-1]:
 start_bracket = splitentry[ii].count('[')
 end_bracket = splitentry[ii].count(']')
 if not start_bracket == end_bracket:
 splitentry[ii-1:ii+1] = [splitentry[ii-1] + ',' + splitentry[ii]]
 return splitentry
event_lable = ["id", "date_time", "path", "type","duration","count", "rate"]
raw_data = "1,2014年4月3日 10:1:10,somerandompath,some/screwy!stringwithstufflike[0,0,0,0]inIt,0,0;"
events = raw_data.split(';')[:-1]
event_table = [{key:value for key,value in zip(event_lable,split_entry(entry) )} for entry in events ]
print event_table

ii is a pretty bad name. i is twice shorter and so many times easier to understand.

Instead of creating a new list by reversing the return of range, you could just use all the potential of range (or xrange):

Instead of using not A == B, you could write a more natural A != B

Looping over an array using range(len(a)) is an anti-pattern in python. To loop on an array while keeping track of the index, the usual solution is to use enumerate(). In your case, because you are updating the array anyway, it is quite tricky to loop on it.

Final point : I had trouble understanding your list slicing stuff but as I was messing a bit with it, I obtained a result different from yours ('rate' appears in the final result so it might actually be a good thing) :

def split_entry(entry):
 splitentry = entry.split(',')
 for i in xrange(len(splitentry)-1,0,-1):
 chunk = splitentry[i]
 if chunk.count('[') != chunk.count(']'):
 splitentry[i-1] += ',' + chunk
 splitentry[i] = '0'
 return splitentry
event_lable = ["id", "date_time", "path", "type","duration","count", "rate"]
raw_data = "1,2014年4月3日 10:1:10,somerandompath,some/screwy!stringwithstufflike[0,0,0,0]inIt,0,0;"
events = raw_data.split(';')[:-1]
event_table = [{key:value for key,value in zip(event_lable,split_entry(entry) )} for entry in events]
print event_table

hmmm.. now that I think of it, I wouldn't need to clear the old value in splitentry[i] = '0' seeing as the zip function will only return up to max len(event_lable) anyway!

Stack Exchange Network

Dealing with a dirty table

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Dealing with a dirty table

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions