JSON data from Twitter to Edgelist?

Question 1

Following on from another code review (Printing out JSON data from Twitter as a CSV), I would like to submit slightly adapted code for review.

This code imports JSON data obtained from Twitter and prints out just the tweet author and any users the author mentions in the tweet. user_mentions is a field provided in the JSON output. The difficulty I've encountered is that the number of mentions varies: an author sometimes mentions no one and sometimes mentions five other users. I'm not sure of the best way to account for this, other than what I've cobbled together below.
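One way to cope with a varying number of mentions is to collect however many a tweet has into a list, rather than giving each position its own column. A minimal Python 3 sketch, assuming each tweet is a dict shaped like the JSON described above; the helper name `mention_names` and the sample tweets are made up for illustration:

```python
def mention_names(tweet):
    """Return the screen names mentioned in a tweet (possibly an empty list)."""
    # .get() with defaults tolerates tweets that lack 'entities' or 'user_mentions'
    mentions = tweet.get('entities', {}).get('user_mentions', [])
    return [m['screen_name'] for m in mentions]

# Hypothetical sample data, not real tweets
no_mentions = {'user': {'screen_name': 'author2'},
               'entities': {'user_mentions': []}}
two_mentions = {
    'user': {'screen_name': 'author4'},
    'entities': {'user_mentions': [{'screen_name': 'mention3'},
                                   {'screen_name': 'mention6'}]},
}

print(mention_names(no_mentions))   # []
print(mention_names(two_mentions))  # ['mention3', 'mention6']
```

Because the result is a plain list, the same loop works whether a tweet has zero mentions or ten.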

My ultimate goal: Create an edgelist from this data to then put into a network visualization tool. I've been converting the output (example below) from this code using a UNIX command (also below) I wrote, but if anyone has a better way to do this within this code, please do let me know.

Format after code:

author1,mention1,mention2,mention3
author1,mention4
author2
author3,mention5
author4,mention3,mention6

Ultimate format in CSV:

author1,mention1
author1,mention2
author1,mention3
author1,mention4
author3,mention5
author4,mention3
author4,mention6

Python Code:

import json
import sys
from csv import writer

tweets = []

# import tweets from JSON (one JSON object per line)
for line in open(sys.argv[1]):
    try:
        tweets.append(json.loads(line))
    except ValueError:
        # skip lines that are not valid JSON
        pass

# create a new variable for a single tweet
tweet = tweets[0]

# pull out various data from the tweets
tweet_author = [tweet['user']['screen_name'] for tweet in tweets]
tweet_mention1 = [(tweet['entities']['user_mentions'][0]['screen_name'] if len(tweet['entities']['user_mentions']) >= 1 else None) for tweet in tweets]
tweet_mention2 = [(tweet['entities']['user_mentions'][1]['screen_name'] if len(tweet['entities']['user_mentions']) >= 2 else None) for tweet in tweets]
tweet_mention3 = [(tweet['entities']['user_mentions'][2]['screen_name'] if len(tweet['entities']['user_mentions']) >= 3 else None) for tweet in tweets]
tweet_mention4 = [(tweet['entities']['user_mentions'][3]['screen_name'] if len(tweet['entities']['user_mentions']) >= 4 else None) for tweet in tweets]
tweet_mention5 = [(tweet['entities']['user_mentions'][4]['screen_name'] if len(tweet['entities']['user_mentions']) >= 5 else None) for tweet in tweets]
tweet_mention6 = [(tweet['entities']['user_mentions'][5]['screen_name'] if len(tweet['entities']['user_mentions']) >= 6 else None) for tweet in tweets]
tweet_mention7 = [(tweet['entities']['user_mentions'][6]['screen_name'] if len(tweet['entities']['user_mentions']) >= 7 else None) for tweet in tweets]
tweet_mention8 = [(tweet['entities']['user_mentions'][7]['screen_name'] if len(tweet['entities']['user_mentions']) >= 8 else None) for tweet in tweets]
tweet_mention9 = [(tweet['entities']['user_mentions'][8]['screen_name'] if len(tweet['entities']['user_mentions']) >= 9 else None) for tweet in tweets]
tweet_mention10 = [(tweet['entities']['user_mentions'][9]['screen_name'] if len(tweet['entities']['user_mentions']) >= 10 else None) for tweet in tweets]

# outputting to CSV
out = open(sys.argv[2], 'w')
# include all ten mention columns, not just the first six
rows = zip(tweet_author, tweet_mention1, tweet_mention2, tweet_mention3, tweet_mention4, tweet_mention5,
           tweet_mention6, tweet_mention7, tweet_mention8, tweet_mention9, tweet_mention10)
csv = writer(out)
for row in rows:
    values = [(value.encode('utf8') if hasattr(value, 'encode') else value) for value in row]
    csv.writerow(values)
out.close()

UNIX command: Used to take the output of this code and format as an edgelist. If this part can be worked into the above code, it would be much appreciated!

cat [file.txt] | sed 's/,/ /g' | awk '{print $1, $2 "##" $1, $3 "##" $1, $4 "##" $1, $5 "##" $1, $6 "##" $1, $7 "##" $1, $8 "##" $1, $9 "##" $1, $10}' | sed 's/##/\n/g' | sed 's/ /,/g'
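The same reshaping could also be done in Python instead of with sed and awk. A minimal Python 3 sketch, assuming each wide row holds the author in the first column and the mentions (possibly empty cells) in the remaining columns; the function name `wide_to_edges` is hypothetical:

```python
import csv
import io

def wide_to_edges(rows):
    """Yield (author, mention) pairs from rows of [author, mention1, mention2, ...]."""
    for row in rows:
        if not row:
            continue  # skip blank lines
        author, mentions = row[0], row[1:]
        for mention in mentions:
            if mention:  # ignore empty cells left by missing mentions
                yield (author, mention)

# Sample rows in the wide format
wide = io.StringIO(
    "author1,mention1,mention2,mention3\n"
    "author1,mention4\n"
    "author2\n"
    "author3,mention5\n"
    "author4,mention3,mention6\n"
)
edges = list(wide_to_edges(csv.reader(wide)))

# Write the edges as CSV, one author,mention pair per line
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(edges)
print(out.getvalue(), end="")
```

Unlike the awk version, this does not hard-code the number of columns, and authors with no mentions produce no rows at all.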

CurtLH · Answer 1 · 2014-03-17 19:55:07Z

I've found another answer on https://github.com/alexhanna to this question that I've been able to adapt and enrich to somewhat meet my needs:

import json
import sys
from csv import writer
from datetime import datetime

startTime = datetime.now()

with open(sys.argv[1]) as in_file, \
     open(sys.argv[2], 'w') as out_file:
    csv = writer(out_file)
    tweet_count = 0
    for line in in_file:
        tweet_count += 1
        try:
            tweet = json.loads(line)
        except ValueError:
            # skip lines that are not valid JSON instead of reusing the previous tweet
            continue
        # skip anything that is not a regular tweet
        if not (isinstance(tweet, dict)):
            pass
        elif 'delete' in tweet:
            pass
        elif 'user' not in tweet:
            pass
        else:
            if 'entities' in tweet and len(tweet['entities']['user_mentions']) > 0:
                user = tweet['user']
                user_mentions = tweet['entities']['user_mentions']
                # write one author,mention row (edge) per mentioned user
                for u2 in user_mentions:
                    csv.writerow([
                        user['screen_name'].encode('utf8'),
                        u2['screen_name'].encode('utf8')
                    ])

print "File Imported:", str(sys.argv[1])
print "# Tweets Imported:", tweet_count
print "File Exported:", str(sys.argv[2])
print "Time Elapsed:", (datetime.now()-startTime)
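For readers on Python 3, the same approach can be sketched as a generator that yields one edge per mention. This is an illustration under stated assumptions, not the code from the repository linked above: the name `iter_edges` and the sample input lines are made up, and the input is assumed to be one JSON object per line.

```python
import csv
import io
import json

def iter_edges(lines):
    """Yield (author, mentioned_user) pairs from an iterable of JSON lines."""
    for line in lines:
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # not valid JSON (json.JSONDecodeError subclasses ValueError)
        # Skip anything that is not a regular tweet, e.g. delete notices
        if not isinstance(tweet, dict) or 'delete' in tweet or 'user' not in tweet:
            continue
        author = tweet['user']['screen_name']
        for mention in tweet.get('entities', {}).get('user_mentions', []):
            yield (author, mention['screen_name'])

# Hypothetical input lines, for illustration only
sample = [
    '{"user": {"screen_name": "author1"}, "entities": {"user_mentions": '
    '[{"screen_name": "mention1"}, {"screen_name": "mention2"}]}}',
    'not json',
    '{"delete": {"status": {"id": 1}}}',
    '{"user": {"screen_name": "author2"}, "entities": {"user_mentions": []}}',
]
edges = list(iter_edges(sample))

# Write the edge list as CSV; a real script would pass an open file instead
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(edges)
print(buf.getvalue(), end="")
```

Keeping the parsing in a generator means the whole file never has to be held in memory, and the edges can be sent to a CSV writer or straight into a graph library.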
