I have a file that I wish to parse. It has data in JSON format, but the file is not a JSON file. I want to loop through the file and pull out the ID of every record whose totalReplyCount is greater than 0.
{ "totalReplyCount": 0,
"newLevel":{
"main":{
"url":"http://www.someURL.com",
"name":"Ronald Whitlock",
"timestamp":"2016-07-26T01:22:03.000Z",
"text":"something great"
},
"id":"z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"
}
},
{ "totalReplyCount": 4,
"newLevel":{
"main":{
"url":"http://www.someUR2L.com",
"name":"other name",
"timestamp":"2016-07-26T01:22:03.000Z",
"text":"something else great"
},
"id":"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
}
},
My initial attempt was to do the following:
def readCsv(filename):
    with open(filename, 'r') as csvFile:
        for row in csvFile["totalReplyCount"]:
            print row
but I get an error stating:
TypeError: 'file' object has no attribute '__getitem__'
I know this is just an attempt at printing and not doing what I want to do, but I am a novice at Python and lost as to what I am doing wrong. What is the correct way to do this? My end result should look like this for the IDs:
['insdisndiwneien23e2es', 'lsndion2ei2esdsd',....]
EDIT 1 - 7/26/16
I saw that I made a mistake in my formatting when I copied the code (it was late, I was tired...). I switched it to a proper format that is more like JSON, and it now matches the file I am parsing. I then tried to parse it with json and got ValueError: Extra data: line 2 column 1 - line X column 1, where line X is the last line of the file.
import json
from pprint import pprint
def readCsv(filename):
    with open(filename, 'r') as file:
        data = json.load(file)
        pprint(data)
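The "Extra data" error appears because json.load expects the whole file to be exactly one JSON value: it parses the first object and then finds the second one left over. A minimal Python 3 sketch, assuming the file is just a sequence of objects separated by commas (as in the sample above, possibly with a trailing comma), wraps the text in square brackets so it becomes a single JSON array. The function name load_comma_separated_objects is hypothetical.

```python
import json


def load_comma_separated_objects(text):
    """Parse text shaped like '{...}, {...},' into a list of dicts.

    Assumes the objects are separated only by commas and whitespace.
    """
    # JSON does not allow a trailing comma inside an array, so drop one.
    body = text.strip()
    if body.endswith(","):
        body = body[:-1]
    # Wrapping the sequence in [...] turns it into one JSON array.
    return json.loads("[" + body + "]")


sample = '''{ "totalReplyCount": 0, "newLevel": {"id": "a1"} },
{ "totalReplyCount": 4, "newLevel": {"id": "b2"} },'''

# Parsing the raw text directly reproduces the "Extra data" error.
try:
    json.loads(sample)
except ValueError as err:
    print("json.loads failed:", err)

records = load_comma_separated_objects(sample)
print(len(records))  # 2
```

If the objects were not separated by commas at all, this bracket trick would not produce valid JSON and a different approach would be needed.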
I also tried csv.DictReader and got KeyError: 'totalReplyCount'. Is the dictionary unordered?
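The KeyError is not about ordering. csv.DictReader takes its field names from the first line of the file and splits each line on commas, so on this data the keys are fragments of the first line rather than 'totalReplyCount'. A quick Python 3 check, using an in-memory copy of the first three lines of the sample instead of a real file:

```python
import csv
import io

# First three lines of the sample, held in memory instead of on disk.
sample = io.StringIO(
    '{ "totalReplyCount": 0,\n'
    '"newLevel":{\n'
    '"main":{\n'
)

reader = csv.DictReader(sample)
# The header row is the first line, split on the comma.
print(reader.fieldnames)
print("totalReplyCount" in reader.fieldnames)  # False
```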
EDIT 2 - 7/27/16
After taking a break, coming back to it, and thinking it over, I realized that what I have (after proper massaging of the data) is a CSV file that contains a proper JSON object on each line. So I have to parse the CSV file, then parse each line, which is a top-level, whole and complete JSON object. The code I used to try to parse this is below, but all I get is the first character of each string, an opening curly brace '{':
import csv
def readCsv(filename):
    with open(filename, 'r') as csvfile:
        for row in csv.DictReader(csvfile):
            for item in row:
                print item[0]
I am guessing that DictReader is converting the JSON object to a string, and that is why I am only getting a curly brace as opposed to the first key. If I were to do print item[0:5], I would get a mishmash of the first five characters in an unordered fashion on each line, which I assume is because the format has turned into an unordered list? I think I understand my problem a little better, but I am still wrapping my head around the data structures and the methods used to parse them. What am I missing?
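Two things are going on here. Iterating over a dict (each DictReader row is one) yields its keys, which are strings, so item[0] is just the first character of a key. And if every line really is one complete JSON object, the csv module is not needed at all: each line can go straight to json.loads. A minimal Python 3 sketch, assuming one object per line with an optional trailing comma; the helper name ids_with_replies is hypothetical.

```python
import json


def ids_with_replies(lines):
    """Return newLevel.id for every object whose totalReplyCount > 0.

    Assumes each non-blank line is one complete JSON object,
    optionally followed by a comma.
    """
    ids = []
    for line in lines:
        line = line.strip().rstrip(",")
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record["totalReplyCount"] > 0:
            ids.append(record["newLevel"]["id"])
    return ids


lines = [
    '{"totalReplyCount": 0, "newLevel": {"main": {}, "id": "z12w"}},\n',
    '{"totalReplyCount": 4, "newLevel": {"main": {}, "id": "kjsd"}},\n',
]
print(ids_with_replies(lines))  # ['kjsd']
```

With a real file, an open file object can be passed directly, since iterating over it yields lines: with open(filename) as f: print(ids_with_replies(f)).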
5 Answers
After reading the question and all the answers above, please check whether this is useful to you.
I have treated the input file as a plain text file, not as a CSV or JSON file.
The flow of the code is as follows:
- Open the file and read its lines in reverse order.
- Search each line for an id. If one is found, extract it and store it in a temporary variable.
- Keep reading the file line by line and search for totalReplyCount.
- Once you find totalReplyCount, check whether it is greater than 0.
- If it is, append the temporary id to id_list and reset the temporary variable.
import re
tmp_id_to_store = ''
id_list = []
with open("a.txt") as f:
    lines = f.readlines()
# Walk the lines bottom-up, so each record's "id" line is seen
# before its "totalReplyCount" line.
for line in reversed(lines):
    m = re.search(r'"id":"(\w+)"', line.rstrip())
    if m:
        tmp_id_to_store = m.group(1)
    n = re.search(r'{ "totalReplyCount": (\d+),', line.rstrip())
    if n:
        fou = n.group(1)
        if int(fou) > 0:
            id_list.append(tmp_id_to_store)
            tmp_id_to_store = ''
print id_list
More checks can be added as needed.
As the error states, your csvFile is a file object, not a dict object, so you can't get an item out of it by key.
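To see what a file object does support: it can be iterated line by line, but it cannot be indexed with a key. A small Python 3 demonstration, using io.StringIO as a stand-in for an open file (under Python 3 the message says the object is not subscriptable rather than mentioning __getitem__):

```python
import io

# io.StringIO behaves like a file opened in text mode.
fake_file = io.StringIO('{ "totalReplyCount": 0,\n"newLevel":{\n')

try:
    fake_file["totalReplyCount"]
except TypeError as err:
    print("TypeError:", err)

# Iterating, on the other hand, yields one line (a str) at a time.
lines = list(fake_file)
print(lines[0])
```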
If your csvFile is in CSV format, you can use the csv module to read each line of the CSV into a dict:
import csv
with open(filename) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print row['totalReplyCount']
Note the DictReader class from the csv module: it reads each line of your CSV and parses it into a dict object, using the first line as the field names.
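For reference, row['totalReplyCount'] only works when the first line of the file is a header naming the columns. A minimal Python 3 sketch with a hypothetical two-column CSV (this layout is an assumption for illustration; the question's file is not shaped like this). Every value comes back as a string, so the count is converted with int() before comparing.

```python
import csv
import io

# Hypothetical CSV with a header row; not the format of the question's file.
csv_text = io.StringIO(
    "totalReplyCount,id\n"
    "0,z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04\n"
    "4,kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd\n"
)

ids = []
for row in csv.DictReader(csv_text):
    # Values are strings such as "4", so convert before comparing.
    if int(row["totalReplyCount"]) > 0:
        ids.append(row["id"])

print(ids)  # ['kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd']
```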
- His input file, however, looks more like badly formed JSON. – Paul Rooney, Jul 26, 2016 at 4:16
- @PaulRooney Yeah, I just assumed the file is in proper CSV format, since he mentioned the file is not in JSON format. – Chrim, Jul 26, 2016 at 4:17
If your input file is JSON, why not just use the json library to parse it and then loop over the data? Then it is just a matter of iterating over the keys and extracting the values.
import json
from pprint import pprint
with open('data.json') as data_file:
    data = json.load(data_file)
pprint(data)
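Once json.load succeeds, extracting the IDs is a matter of walking the nested dicts. A sketch, assuming the parsed data is a list of records shaped like the sample in the question; an inline list stands in for the loaded file here.

```python
# Stand-in for what json.load would return if the file were a JSON array
# of records like the ones in the question.
data = [
    {"totalReplyCount": 0,
     "newLevel": {"main": {"name": "Ronald Whitlock"},
                  "id": "z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"}},
    {"totalReplyCount": 4,
     "newLevel": {"main": {"name": "other name"},
                  "id": "kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"}},
]

# The count is a top-level key; the id sits one level down under "newLevel".
ids = [rec["newLevel"]["id"] for rec in data if rec["totalReplyCount"] > 0]
print(ids)  # ['kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd']
```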
See the Stack Overflow question "Parsing values from a JSON file using Python?" and look at Justin Peel's answer there. It should help.
- I did try iterating over the data with json, but got errors because it is not in proper JSON format. I tried to parse the data and write it to a file, but the error I got was TypeError: Expected a character buffer object... – unseen_damage, Jul 26, 2016 at 14:12
- @unseen_damage Are you sure your JSON is formatted correctly? Try checking it first with jsonformatter.curiousconcept.com – user5228393, Jul 26, 2016 at 17:02
- The JSON is not formatted properly, although it is close to JSON. The file output is basically like this:
{ item: 0, { item 2: {item 3: xxx, item4: xxx} item5: xxx } }, { item: 0, { item 2: {item 3: xxx, item4: xxx} item5: xxx } }
– unseen_damage, Jul 27, 2016 at 18:46
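When a file holds several JSON objects one after another, separated by commas instead of being wrapped in an array, the standard library's json.JSONDecoder.raw_decode can pull them out one at a time: it parses a single value starting at a given index (its optional second argument) and returns the value together with the index where it stopped. A Python 3 sketch, assuming each object is itself valid JSON, as in the sample at the top of the question, and that only whitespace and commas appear between objects; iter_objects is a hypothetical helper name.

```python
import json


def iter_objects(text):
    """Yield each JSON value from text holding values separated by commas.

    Assumes only whitespace and commas appear between the values.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip separators here.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos == len(text):
            return
        # Parse one value starting at pos; the returned index is just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


text = '''{ "totalReplyCount": 0, "newLevel": {"id": "a1"} },
{ "totalReplyCount": 4, "newLevel": {"id": "b2"} },'''

ids = [o["newLevel"]["id"] for o in iter_objects(text) if o["totalReplyCount"] > 0]
print(ids)  # ['b2']
```

Unlike wrapping the text in brackets, this also copes with objects that have no comma between them, though it still cannot parse the unquoted keys shown in the comment above.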
For parsing values from a JSON file in Python, this Stack Overflow question has it all: "Parsing values from a JSON file using Python?"
Here is a shell one-liner that should solve your problem, though it's not Python.
grep -oE '"(totalReplyCount|id)":.*$' filename | awk '/totalReplyCount/ {if ($2+0 > 0) {getline; print}}' | cut -d: -f2
output:
"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
You are using [<string>] on a file object, which doesn't support it. Also, the data you are reading does not look like CSV.
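For comparison with the shell one-liner earlier on this page, here is the same idea in Python 3, as a rough equivalent rather than a literal translation: scan line by line, and when a totalReplyCount line has a value above 0, take the next id line. Like the one-liner, it assumes each record's id line comes after its totalReplyCount line; the function name and regular expressions are illustrative.

```python
import re

COUNT_RE = re.compile(r'"totalReplyCount":\s*(\d+)')
ID_RE = re.compile(r'"id":"([^"]+)"')


def ids_after_positive_counts(lines):
    """Collect the first id that follows each totalReplyCount > 0."""
    ids = []
    want_id = False
    for line in lines:
        count = COUNT_RE.search(line)
        if count:
            # Remember whether the id that follows should be kept.
            want_id = int(count.group(1)) > 0
            continue
        match = ID_RE.search(line)
        if match and want_id:
            ids.append(match.group(1))
            want_id = False
    return ids


sample = '''{ "totalReplyCount": 0,
"newLevel":{
"id":"z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"
}
},
{ "totalReplyCount": 4,
"newLevel":{
"id":"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
}
},'''

print(ids_after_positive_counts(sample.splitlines()))
# ['kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd']
```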