I'm trying to parse a CSV file, but it seems I'm missing something basic and can't get it right. Each row of the CSV contains a string in {} with several parameters in random order, as in the example below.
Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,
"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},
"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},
"Dec 03, 2014 01:30 AM",3,UserRecipeOverview,How many users go to Overview of a recipe?,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ RecipeID : 1488; UserID : 36875},
My code is the following, but I get an error that I don't understand:
counter = 0
mappedLines = {}
import csv
with open ('test.csv', 'r') as f:
    reader = csv.reader (f)
    for line in reader:
        counter = counter + 1
        lineDict = {}
        line = line.replace("{","")
        line = line.replace("}","")
        line = line.strip()
        fieldPairs = line.split(";")
        for pair in fieldPairs:
            fields = pair.split(":")
            key = fields[0].strip()
            value = fields[1].strip()
            lineDict[key] = value
        mappedLines[counter] = lineDict
def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line = output_line + lineSets[key] + ","
        else:
            output_line += ","
    print output_line[0:len(output_line) - 1]
fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]
for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printFields(fields,lineSets)
Here's the Traceback:
Traceback (most recent call last):
File "testV3.py", line 14, in <module>
line = line.replace("{","")
AttributeError: 'list' object has no attribute 'replace'
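The traceback comes from the shape of what csv.reader returns: each row is already a list of cell strings (split on commas, with the quoted timestamp kept as a single cell), so str methods such as replace don't exist on line itself. A quick Python 3 check against the question's sample rows, inlined with io.StringIO in place of test.csv, makes that visible:

```python
import csv
import io

# Two of the question's sample rows, inlined so the snippet runs on its own.
SAMPLE = '''Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,
"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},
"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},
'''

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)   # the header row is a list as well
rows = list(reader)

for line in rows:
    # Each row is a list; the trailing comma in the file adds an empty 10th cell.
    print(type(line).__name__, len(line), repr(line[8]))
# list 10 '{}'
# list 10 '{ UserID : 36875; tabName : QuickAndEasy}'
```

Params is the ninth column, so its text sits at line[8], and that is the only cell that needs the braces stripped.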
EDIT:
I'm now trying to add a write function that saves the output to a new CSV file, using the code below. The resulting CSV contains only the headers, written in a single column.
import csv
def printfields(keys, linesets):
    output_line = ""
    for key in keys:
        if key in linesets:
            output_line += linesets[key] + ","
        else:
            output_line += ","
    print output_line
def csv_writer(reader, path):
    """
    write reader to a csv file path
    """
    with open(path, "wd") as csv_file:
        writer = csv.writer(csv_file, delimiter=",")
        for line1 in line:
            if line1 in path
                writer.writerow(line1)
if __name__ == "__main__":
    fields = [
        "UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel", "targetUID"
    ]
    mappedLines = {}
    with open('test.csv', 'r') as f:
        reader = csv.DictReader(f)
        for line in reader:
            fieldPairs = [
                p for p in
                line['Params'].strip().strip('}').strip('{').strip().split(';')
                if p
            ]
            lineDict = {
                pair.split()[0].strip(): pair.split(':')[1].strip()
                for pair in fieldPairs
            }
            mappedLines[reader.line_num] = lineDict
    path = "output.csv"
    csv_writer(reader, path)
    for key in sorted(mappedLines.keys()):
        linesets = mappedLines[key]
        printfields(fields, linesets)
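A sketch of one way to do the writing step in Python 3, assuming the goal is one output column per name in fields and one row per input line. It parses everything first and then writes a header plus one dict per line in a single pass. Note that "wd" is not one of the documented open() modes; "w" is the mode for writing. csv.DictWriter fills keys missing from a line with restval and, with extrasaction="ignore", skips keys not listed in fields, which replaces the manual comma bookkeeping in printfields. parse_params and write_params are hypothetical helper names, and io.StringIO stands in for test.csv and output.csv so the sketch runs on its own:

```python
import csv
import io

FIELDS = ["UserID", "tabName", "RecipeID", "type", "searchWord",
          "isFromLabel", "targetUID"]

# The question's sample rows, inlined in place of test.csv.
SAMPLE = '''Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,
"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},
"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},
"Dec 03, 2014 01:30 AM",3,UserRecipeOverview,How many users go to Overview of a recipe?,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ RecipeID : 1488; UserID : 36875},
'''


def parse_params(cell):
    """Turn '{ UserID : 36875; tabName : QuickAndEasy}' into a dict."""
    result = {}
    for pair in cell.strip().strip("{}").split(";"):
        if pair.strip():  # "{}" leaves a single empty piece; skip it
            key, _, value = pair.partition(":")
            result[key.strip()] = value.strip()
    return result


def write_params(mapped_lines, fields, out):
    """Write a header plus one row per parsed line to the file object out."""
    writer = csv.DictWriter(out, fieldnames=fields, restval="",
                            extrasaction="ignore")
    writer.writeheader()
    for key in sorted(mapped_lines):
        writer.writerow(mapped_lines[key])


# Parse first, while the input is still open.
mappedLines = {}
reader = csv.DictReader(io.StringIO(SAMPLE))
for line in reader:
    mappedLines[reader.line_num] = parse_params(line["Params"])

# Then write everything in one pass; io.StringIO stands in for output.csv.
out = io.StringIO()
write_params(mappedLines, FIELDS, out)
print(out.getvalue())
```

To write a real file, pass open("output.csv", "w", newline="") as out; the csv module documentation recommends newline="" for files used with csv writers in Python 3.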
-
I have solved your original question, and IMHO your EDIT qualifies as a standalone question. If you agree, would you move the additional edit part of your question into a new question and mark this question as answered? – dopstar, Jan 4, 2015 at 14:50
-
Hi @dopstar, thanks for your help, comments and recommendations on using Stack Overflow. As you probably noticed, I'm still learning good practices for getting help from the community. Your help helps a lot! I have now created a new post (stackoverflow.com/questions/27815100/…) including my edits so you can answer it. Thanks! – mmarboeuf, Jan 7, 2015 at 8:24
3 Answers
line is a list containing the cells of the current row. To process each of them, use a loop:
for cell in line:
    cell.replace(...)
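One detail the loop needs: str.replace returns a new string and leaves cell unchanged, so the results have to be stored somewhere. A minimal Python 3 sketch, assuming the goal is a cleaned copy of the row (here one sample row from the question, hard-coded as csv.reader would return it), collects them in a new list and then splits the Params cell at index 8 into key/value pairs:

```python
# One row as csv.reader would return it (taken from the question's sample data).
line = ["Dec 03, 2014 01:30 AM", "2", "HomeTab", "Which tab the user viewed ",
        "1.4.0", "iPhone", "Apple iPhone 5s (GSM)", "",
        "{ UserID : 36875; tabName : QuickAndEasy}", ""]

# str.replace returns a new string, so collect the results in a new list.
cleaned = []
for cell in line:
    cleaned.append(cell.replace("{", "").replace("}", ""))

# Only the Params cell (index 8) had braces; split it into key/value pairs.
lineDict = {}
for pair in cleaned[8].split(";"):
    if pair.strip():  # skip empty pieces, e.g. from "{}"
        key, _, value = pair.partition(":")
        lineDict[key.strip()] = value.strip()

print(lineDict)  # {'UserID': '36875', 'tabName': 'QuickAndEasy'}
```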
-
line[8] is the only cell/field with the { and } characters in it, so you could also do something like line[8].replace(...) rather than loop over each of them.
-
cell.replace :)

I have rearranged and modified your code. The changes: it uses csv.DictReader, the counter variable is no longer used, and the for loop no longer uses the range function.
import csv

def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line += lineSets[key] + ","
        else:
            output_line += ","
    print output_line

if __name__ == "__main__":
    fields = [
        "UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"
    ]
    mappedLines = {}
    with open('test.csv', 'r') as f:
        reader = csv.DictReader(f)
        for line in reader:
            # "{ UserID : 36875; tabName : QuickAndEasy}" -> ["UserID : 36875", " tabName : QuickAndEasy"]
            fieldPairs = [
                p for p in
                line['Params'].strip().strip('}').strip('{').strip().split(';')
                if p.strip()
            ]
            # Split each pair on ":" so keys parse even without spaces around the colon
            lineDict = {
                pair.split(':')[0].strip(): pair.split(':')[1].strip()
                for pair in fieldPairs
            }
            mappedLines[reader.line_num] = lineDict
    for key in sorted(mappedLines.keys()):
        lineSets = mappedLines[key]
        printFields(fields, lineSets)
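printFields above leaves a trailing comma on every line (the question's version sliced it off with output_line[0:len(output_line) - 1]). A Python 3 sketch of the same output using str.join and dict.get avoids the slicing; format_fields is a hypothetical name:

```python
def format_fields(keys, line_sets):
    """Return the values for keys, comma-separated; missing keys give empty fields."""
    return ",".join(line_sets.get(key, "") for key in keys)


fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]
print(format_fields(fields, {"UserID": "36875", "tabName": "QuickAndEasy"}))
# 36875,QuickAndEasy,,,,
```

Because join only places separators between items, the line always has exactly len(keys) - 1 commas.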
You can use the following statement to remove the "{" and "}" characters from a list of strings:
line = ",".join(line).replace("{","").replace("}","").split(",")
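One caveat with joining the row into a single string and splitting it on commas again: a cell that itself contains a comma, like the quoted timestamp "Dec 03, 2014 01:30 AM" in the question's data, comes back as two cells, which shifts every later index (Params is no longer at line[8]). A quick Python 3 check on one sample row, compared with cleaning each cell separately:

```python
row = ["Dec 03, 2014 01:30 AM", "2", "HomeTab", "Which tab the user viewed ",
       "1.4.0", "iPhone", "Apple iPhone 5s (GSM)", "",
       "{ UserID : 36875; tabName : QuickAndEasy}", ""]

# Join-and-split: the comma inside the timestamp creates an extra cell.
rejoined = ",".join(row).replace("{", "").replace("}", "").split(",")
print(len(row), len(rejoined))  # 10 11

# Cleaning each cell separately keeps the row shape intact.
per_cell = [cell.replace("{", "").replace("}", "") for cell in row]
print(len(per_cell), repr(per_cell[8]))
```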