\$\begingroup\$

Following on from another code review (Printing out JSON data from Twitter as a CSV), I would like to submit slightly adapted code for review.

This code imports JSON data obtained from Twitter and prints out just the tweet author and any users the author mentions in the tweet. user_mentions is a field provided in the JSON output. The difficulty I've encountered is that an author sometimes mentions no one and sometimes mentions five other users. I'm not sure of the best way to account for this, besides what I've cobbled together below.
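One way to handle a varying number of mentions is to loop over the user_mentions list instead of indexing fixed positions. The following is a minimal sketch, assuming each tweet is a dict shaped like the Twitter JSON described above; the helper name mention_names and the sample tweets are hypothetical:

```python
def mention_names(tweet):
    """Return the screen names mentioned in one tweet (possibly none)."""
    # .get() with defaults tolerates tweets lacking 'entities' or 'user_mentions'.
    mentions = tweet.get('entities', {}).get('user_mentions', [])
    return [m['screen_name'] for m in mentions]


# Hypothetical sample tweets, trimmed to the fields used here.
no_mentions = {'user': {'screen_name': 'author2'},
               'entities': {'user_mentions': []}}
two_mentions = {'user': {'screen_name': 'author4'},
                'entities': {'user_mentions': [{'screen_name': 'mention3'},
                                               {'screen_name': 'mention6'}]}}

print(mention_names(no_mentions))
print(mention_names(two_mentions))
```

Because the result is a list of any length, no upper bound on the number of mentions has to be chosen in advance.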

My ultimate goal: Create an edgelist from this data to then put into a network visualization tool. I've been converting the output (example below) from this code using a UNIX command (also below) I wrote, but if anyone has a better way to do this within this code, please do let me know.

Format after code:

  • author1,mention1,mention2,mention3
  • author1,mention4
  • author2
  • author3,mention5
  • author4,mention3,mention6

Ultimate format in CSV:

  • author1,mention1
  • author1,mention2
  • author1,mention3
  • author1,mention4
  • author3,mention5
  • author4,mention3
  • author4,mention6

Python Code:

import json
import sys
tweets=[]
# import tweets from JSON
for line in open(sys.argv[1]):
    try:
        tweets.append(json.loads(line))
    except:
        pass
# create a new variable for a single tweet
tweet=tweets[0]
# pull out various data from the tweets
tweet_author = [tweet['user']['screen_name'] for tweet in tweets]
tweet_mention1 = [(tweet['entities']['user_mentions'][0]['screen_name'] if len(tweet['entities']['user_mentions']) >= 1 else None) for tweet in tweets]
tweet_mention2 = [(tweet['entities']['user_mentions'][1]['screen_name'] if len(tweet['entities']['user_mentions']) >= 2 else None) for tweet in tweets]
tweet_mention3 = [(tweet['entities']['user_mentions'][2]['screen_name'] if len(tweet['entities']['user_mentions']) >= 3 else None) for tweet in tweets]
tweet_mention4 = [(tweet['entities']['user_mentions'][3]['screen_name'] if len(tweet['entities']['user_mentions']) >= 4 else None) for tweet in tweets]
tweet_mention5 = [(tweet['entities']['user_mentions'][4]['screen_name'] if len(tweet['entities']['user_mentions']) >= 5 else None) for tweet in tweets]
tweet_mention6 = [(tweet['entities']['user_mentions'][5]['screen_name'] if len(tweet['entities']['user_mentions']) >= 6 else None) for tweet in tweets]
tweet_mention7 = [(tweet['entities']['user_mentions'][6]['screen_name'] if len(tweet['entities']['user_mentions']) >= 7 else None) for tweet in tweets]
tweet_mention8 = [(tweet['entities']['user_mentions'][7]['screen_name'] if len(tweet['entities']['user_mentions']) >= 8 else None) for tweet in tweets]
tweet_mention9 = [(tweet['entities']['user_mentions'][8]['screen_name'] if len(tweet['entities']['user_mentions']) >= 9 else None) for tweet in tweets]
tweet_mention10 = [(tweet['entities']['user_mentions'][9]['screen_name'] if len(tweet['entities']['user_mentions']) >= 10 else None) for tweet in tweets]
#outputting to CSV
out = open(sys.argv[2], 'w')
rows = zip(tweet_author, tweet_mention1, tweet_mention2, tweet_mention3, tweet_mention4, tweet_mention5, tweet_mention6)
from csv import writer
csv = writer(out)
for row in rows:
    values = [(value.encode('utf8') if hasattr(value, 'encode') else value) for value in row]
    csv.writerow(values)
out.close()

UNIX command: Used to take the output of this code and format as an edgelist. If this part can be worked into the above code, it would be much appreciated!

cat [file.txt] | sed 's/,/ /g' | awk '{print $1, $2 "##" $1, $3 "##" $1, $4 "##" $1, $5 "##" $1, $6 "##" $1, $7 "##" $1, $8 "##" $1, $9 "##" $1, $10}' | sed 's/##/\n/g' | sed 's/ /,/g'
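The same reshaping can be sketched in Python with the csv module rather than a shell pipeline. This assumes the intermediate file has the layout shown under "Format after code" (author first, then zero or more mentions, with empty cells where a mention is missing); the function name wide_to_edges is hypothetical:

```python
import csv
import io


def wide_to_edges(in_file, out_file):
    """Write one author,mention row per non-empty mention cell."""
    reader = csv.reader(in_file)
    writer = csv.writer(out_file, lineterminator='\n')
    for row in reader:
        if not row:
            continue  # skip blank lines
        author, mentions = row[0], row[1:]
        for mention in mentions:
            if mention:  # empty cells come from None placeholders
                writer.writerow([author, mention])


# Small in-memory demonstration using sample rows like those in the question.
wide = io.StringIO("author1,mention1,mention2,mention3\n"
                   "author1,mention4\n"
                   "author2,,\n"
                   "author3,mention5\n")
edges = io.StringIO()
wide_to_edges(wide, edges)
print(edges.getvalue())
```

Authors with no mentions, such as author2 here, produce no rows, which matches the "Ultimate format" list. With real files under Python 3, the two arguments would be file objects opened with newline=''.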
asked Mar 17, 2014 at 16:00
\$\endgroup\$

1 Answer

\$\begingroup\$

I found code at https://github.com/alexhanna that addresses this question, and I've been able to adapt and extend it to somewhat meet my needs:

import json
import sys
from csv import writer
from datetime import datetime

startTime = datetime.now()

with open(sys.argv[1]) as in_file, \
     open(sys.argv[2], 'w') as out_file:
    csv = writer(out_file)
    tweet_count = 0
    for line in in_file:
        tweet_count += 1
        try:
            tweet = json.loads(line)
        except ValueError:
            # skip lines that are not valid JSON
            continue
        if not isinstance(tweet, dict):
            pass
        elif 'delete' in tweet:
            pass
        elif 'user' not in tweet:
            pass
        else:
            if 'entities' in tweet and len(tweet['entities']['user_mentions']) > 0:
                user = tweet['user']
                user_mentions = tweet['entities']['user_mentions']
                # write one (author, mentioned user) edge per row
                for u2 in user_mentions:
                    row = [user['screen_name'], u2['screen_name']]
                    values = [(value.encode('utf8') if hasattr(value, 'encode') else value) for value in row]
                    csv.writerow(values)

print "File Imported:", str(sys.argv[1])
print "# Tweets Imported:", tweet_count
print "File Exported:", str(sys.argv[2])
print "Time Elapsed:", (datetime.now()-startTime)
answered Mar 17, 2014 at 19:55
\$\endgroup\$
