parsing csv file in python and csv module

Question 1

I'm trying to parse a CSV file, but it seems that I'm missing something basic and can't get it right. Each row of the CSV contains a string in {} holding several parameters in no fixed order, as in the example below.

Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,
"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},
"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},
"Dec 03, 2014 01:30 AM",3,UserRecipeOverview,How many users go to Overview of a recipe?,1.4.0,iPhone,Apple iPhone 5s (GSM),,{ RecipeID : 1488; UserID : 36875},

My code is the following but I get an error that I don't understand:

counter = 0
mappedLines = {}
import csv
with open ('test.csv', 'r') as f:
    reader = csv.reader (f)
    for line in reader:
        counter = counter + 1
        lineDict = {}
        line = line.replace("{","")
        line = line.replace("}","")
        line = line.strip()
        fieldPairs = line.split(";")
        for pair in fieldPairs:
            fields = pair.split(":")
            key = fields[0].strip()
            value = fields[1].strip()
            lineDict[key] = value
        mappedLines[counter] = lineDict
def printFields(keys, lineSets):
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line = output_line + lineSets[key] + ","
        else:
            output_line += ","
    print output_line[0:len(output_line) - 1]
fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]
for key in range(1,len(mappedLines) + 1):
    lineSets = mappedLines[key]
    printFields(fields,lineSets)

Here's the Traceback:

Traceback (most recent call last):
 File "testV3.py", line 14, in <module>
 line = line.replace("{","")
AttributeError: 'list' object has no attribute 'replace'

EDIT:

I'm now trying to add a write function that saves the output to a new CSV file, using the code below. The resulting CSV contains only the headers, and they are written in a column.

import csv
def printfields(keys, linesets):
    output_line = ""
    for key in keys:
        if key in linesets:
            output_line += linesets[key] + ","
        else:
            output_line += ","
    print output_line
def csv_writer(reader, path):
    """
    write reader to a csv file path
    """
    with open(path, "wd") as csv_file:
        writer = csv.writer(csv_file, delimiter=",")
        for line1 in line:
            if line1 in path
                writer.writerow(line1)
if __name__ == "__main__":
    fields = [
        "UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel", "targetUID"
    ]
    mappedLines = {}
    with open('test.csv', 'r') as f:
        reader = csv.DictReader(f)
        for line in reader:
            fieldPairs = [
                p for p in
                line['Params'].strip().strip('}').strip('{').strip().split(';')
                if p
            ]
            lineDict = {
                pair.split()[0].strip(): pair.split(':')[1].strip()
                for pair in fieldPairs
            }
            mappedLines[reader.line_num] = lineDict
        path = "output.csv"
        csv_writer(reader, path)
    for key in sorted(mappedLines.keys()):
        linesets = mappedLines[key]
        printfields(fields, linesets)

Question 2

I have solved your original question, and IMHO your EDIT qualifies as a standalone question. If you agree, would you move the EDIT part into a new question and mark this one as answered?

Question 3

Hi @dopstar, thanks for your help, comments and recommendations on using Stack Overflow. As you probably noticed, I'm still learning good practices for getting help from the community. Your help means a lot! I have now created a new post, stackoverflow.com/questions/27815100/…, that includes my edits so you can answer it. Thanks!

Question 4

line is a list containing the cells of the current row. To access one of them, use a loop:

for cell in line:
    cell.replace(...)
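
To make the two-loop idea concrete, here is a minimal sketch. It assumes Python 3 and reads two rows from the question out of an in-memory string instead of test.csv; the names sample and cleaned_rows are made up for this illustration. Note that str.replace returns a new string, so the result has to be stored somewhere.

```python
import csv
import io

# Header plus two data rows from the question, held in memory
# (an assumption for the demo; the real data lives in test.csv).
sample = (
    'Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,\n'
    '"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},\n'
    '"Dec 03, 2014 01:30 AM",2,HomeTab,Which tab the user viewed ,1.4.0,iPhone,'
    'Apple iPhone 5s (GSM),,{ UserID : 36875; tabName : QuickAndEasy},\n'
)

cleaned_rows = []
for row in csv.reader(io.StringIO(sample)):   # outer loop: each row is a list
    cleaned = []
    for cell in row:                          # inner loop: each cell is a string
        # str.replace returns a new string, so keep the result.
        cleaned.append(cell.replace("{", "").replace("}", "").strip())
    cleaned_rows.append(cleaned)

print(cleaned_rows[2][8])
```

Calling replace on the row itself fails because a list has no such method; calling it on each cell works because every cell is a string.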

Question 5

Hi Thanks for your feedback but I'm sorry I still don't get it. I already use a loop and I don't know what to do with yours. Could you explain more and detail how I should add/replace another loop? Tks! M.

Question 6

As I wrote: line is an array, not a string. You can't use replace on it. If you want to change the cell content, you must use two loops: one for the rows, and one for the cells in a row.

Question 7

@mmarboeuf: line[8] is the only cell/field with the { and } characters in it -- so you could also do something like line[8].replace(...) rather than loop over each of them.
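
A rough sketch of that index-based approach, assuming Python 3 and an in-memory copy of two rows from the question. The helper parse_params and the variable names are hypothetical, not part of the csv module; the index is looked up from the header rather than hard-coded, and it comes out as 8 for this file.

```python
import csv
import io

def parse_params(cell):
    """Turn '{ a : 1; b : 2}' into {'a': '1', 'b': '2'} (hypothetical helper)."""
    inner = cell.strip().strip("{}").strip()
    params = {}
    for pair in inner.split(";"):
        if ":" not in pair:
            continue                      # skips the empty '{}' case
        key, value = pair.split(":", 1)
        params[key.strip()] = value.strip()
    return params

# In-memory stand-in for test.csv (assumption for the demo).
sample = (
    'Timestamp,Session Index,Event,Description,Version,Platform,Device,User ID,Params,\n'
    '"Dec 03, 2014 01:30 AM",1,NoRegister,,1.4.0,iPhone,Apple iPhone 5s (GSM),,{},\n'
    '"Dec 03, 2014 01:30 AM",3,UserRecipeOverview,How many users go to Overview of a recipe?,'
    '1.4.0,iPhone,Apple iPhone 5s (GSM),,{ RecipeID : 1488; UserID : 36875},\n'
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)                     # first row holds the column names
params_index = header.index("Params")     # position of the Params column
parsed = [parse_params(row[params_index]) for row in reader]
print(parsed)
```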

Question 8

That should be cell.replace :)

Question 9

I have rearranged and modified your code. The changes: it now uses csv.DictReader, the counter variable is gone, and the final for loop no longer uses range.

import csv

def printFields(keys, lineSets):
    # Build one comma-separated line; keys missing from lineSets give empty fields.
    output_line = ""
    for key in keys:
        if key in lineSets:
            output_line += lineSets[key] + ","
        else:
            output_line += ","
    print(output_line)

if __name__ == "__main__":
    fields = [
        "UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"
    ]
    mappedLines = {}
    with open('test.csv', 'r') as f:
        # DictReader maps each row to a dict keyed by the header names.
        reader = csv.DictReader(f)
        for line in reader:
            # Strip the braces from the Params cell, then split into "key : value" pairs.
            fieldPairs = [
                p for p in
                line['Params'].strip().strip('}').strip('{').strip().split(';')
                if p
            ]
            # Split each pair on ':' for both the key and the value.
            lineDict = {
                pair.split(':')[0].strip(): pair.split(':')[1].strip()
                for pair in fieldPairs
            }
            mappedLines[reader.line_num] = lineDict
    for key in sorted(mappedLines.keys()):
        lineSets = mappedLines[key]
        printFields(fields, lineSets)

Question 10

Thanks! That worked like a charm. I'm still struggling to write the output to a CSV file, though.

Question 11

So now I'm trying to write the output to a CSV file, but I only get the headers in the file.
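
For what it's worth, one possible way to write the parsed parameters is csv.DictWriter. The sketch below is not taken from the thread: it assumes Python 3, uses hand-written dicts in place of the parsed rows, and writes to an in-memory buffer so it runs without touching the disk. The names parsed_rows and out are illustrative.

```python
import csv
import io

fields = ["UserID", "tabName", "RecipeID", "type", "searchWord", "isFromLabel"]

# Stand-in for the dicts built from the Params column (assumed data).
parsed_rows = [
    {},
    {"UserID": "36875", "tabName": "QuickAndEasy"},
    {"RecipeID": "1488", "UserID": "36875"},
]

# In a real script this would be: open("output.csv", "w", newline="")
out = io.StringIO()

# restval fills in missing keys; extrasaction="ignore" drops keys not in fields.
writer = csv.DictWriter(out, fieldnames=fields, restval="", extrasaction="ignore")
writer.writeheader()
writer.writerows(parsed_rows)   # one output row per dict

print(out.getvalue())
```

The point of the sketch is that each row passed to the writer is a whole dict. Passing a single string to writerow makes the csv module treat it as a sequence of characters, which produces one character per column.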

Question 12

Would you then upvote this answer as useful at the very least and thereafter create a separate question for your additional part?

Question 13

I think I've done what you suggested. Let me know if not. Thanks!

Question 14

You can use the following statement to remove the "{" and "}" from a list of strings:

line = ",".join(line).replace("{","").replace("}","").split(",")

Question 15

Why parse the file as CSV when you then lump all the cells together again?
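
To illustrate the concern, here is a small sketch (Python 3 assumed, with a shortened row made up from the question's data). Joining the cells with a comma and splitting again breaks any cell that itself contains a comma, such as the timestamp, while cleaning each cell separately keeps the boundaries that the csv module already worked out.

```python
# A shortened row as csv.reader would return it (illustrative data).
row = [
    "Dec 03, 2014 01:30 AM",
    "2",
    "HomeTab",
    "{ UserID : 36875; tabName : QuickAndEasy}",
]

# Joining and re-splitting loses the original cell boundaries:
# the comma inside the timestamp becomes a separator.
rejoined = ",".join(row).replace("{", "").replace("}", "").split(",")

# Cleaning each cell on its own keeps one output cell per input cell.
per_cell = [cell.replace("{", "").replace("}", "") for cell in row]

print(len(row), len(rejoined), len(per_cell))
```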

user1907906 · Accepted Answer · 2014-12-18 07:43:15Z

line is a list containing the cells of the current row. To access one of them, use a loop:

for cell in line:
    cell.replace(...)

edited Jan 4, 2015 at 8:49

Burhan Khalid

answered Dec 18, 2014 at 7:43

user1907906

4 Comments

mmarboeuf · Comment · 2014-12-18 09:01:12Z

Hi Thanks for your feedback but I'm sorry I still don't get it. I already use a loop and I don't know what to do with yours. Could you explain more and detail how I should add/replace another loop? Tks! M.

user1907906 · Comment · 2014-12-18 09:26:30Z

As I wrote: line is an array, not a string. You can't use replace on it. If you want to change the cell content, you must use two loops: one for the rows, and one for the cells in a row.

martineau · Comment · 2014-12-18 09:40:06Z

@mmarboeuf: line[8] is the only cell/field with the { and } characters in it -- so you could also do something like line[8].replace(...) rather than loop over each of them.

Burhan Khalid · Comment · 2015-01-04 08:48:57Z

That should be cell.replace :)
