I have written a simple script that searches Twitter for keywords and saves tweets to a CSV file if they contain those words. It can be found on my GitHub here.
How can I improve this code to be more efficient and bring it up to coding standards?
"""
Script that goes through english tweets that are filtered by security words and posted in the last one hour and stores the polarity, id, date time, query, username and text into a csv file.
"""
import tweepy
import datetime, time, csv, codecs
from textblob import TextBlob
import cleanit
## Twitter API credentials ##
# NOTE(review): fill in real values locally; never commit secrets to GitHub.
consumer_key = "xxx"
consumer_secret = "xxx"
access_token = "xxx"
access_token_secret = "xxx"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

## Accumulators shared by storing_data() and the main loop ##
# (the unused ``big_list`` from the original has been removed)
text_list = []     # cleaned, utf-8 encoded tweet text
id_list = []       # tweet ids, as strings
name_list = []     # author screen names
created_list = []  # creation timestamps, '%c' formatted
query_list = []    # reserved for the query term per tweet (not filled yet)
polarityy = []     # sentiment labels ("0" / "4")
t = 0              # index of the next collected row not yet written to CSV

# Words used as search terms for the tweepy.Cursor search.
security_words = ['phishing','dos','botnet','xss','smb','wannacry','heartbleed','ransomware','trojan','spyware','exploit','virus','malware','mitm']
# Ambiguous terms: a tweet found via one of these is kept only when its text
# also contains a word from gen_words; otherwise it is discarded.
double_meaning_words = ['petya','smb','dos','infosec','hacker','backdoor']
gen_words = ["attack","security","hit","detected","protected","injection","data","exploit", "router", 'ransomware', 'phishing', 'wannacry', 'security']
def storing_data(stat):
    """Record one accepted tweet into the module-level result lists.

    Appends the cleaned text, id, screen name, creation time and a
    sentiment label for *stat* (a tweepy Status object).

    Bug fix: the original body read the global ``status`` instead of the
    ``stat`` parameter, so the function only worked when called from the
    main loop; it now uses its argument throughout.
    """
    text_list.append(str(cleanit.tweet_cleaner_updated(stat.text)).encode("utf-8"))
    id_list.append(str(stat.id))
    name_list.append(str(stat.user.screen_name))
    created_list.append(stat.created_at.strftime('%c'))
    # Sentiment score of the raw tweet text, in [-1, 1].
    polarity = TextBlob(stat.text).sentiment.polarity
    # NOTE(review): polarity <= 0 is labelled "4" and positive polarity "0",
    # which looks inverted relative to the usual Sentiment140 convention
    # (0 = negative, 4 = positive) — confirm the intended mapping.
    if -1 <= polarity <= 0:
        polarityy.append("4")
    else:
        polarityy.append("0")
def rejects(stat):
    """Append the text of a tweet that failed the filters to rejects.csv.

    stat -- any object with a ``text`` attribute (a tweepy Status).

    Bug fix: the original wrote the global ``status`` rather than the
    ``stat`` argument it was given.
    """
    with open('rejects.csv', "a", newline='', encoding='utf-8') as rejectfile:
        csv.writer(rejectfile).writerow([stat.text])
# Main polling loop: every 30 minutes, search Twitter for each security word,
# filter the results and append accepted rows to the dataset CSV.
while True:
    print('running', datetime.datetime.now())
    with open('sec_tweet_dataset_5.csv', "a", newline='', encoding='utf-8') as logfile:
        logger = csv.writer(logfile)
        for word in security_words:
            # Search Twitter (English only) for the current security word.
            for status in tweepy.Cursor(api.search, word, lang="en").items(40):
                # Skip retweets.  NOTE(review): ``or`` keeps a tweet when
                # EITHER condition holds, so a retweet without "RT @" in its
                # text still passes; ``and`` may be what was intended —
                # confirm before changing behaviour.
                if (status.retweeted == False) or ('RT @' not in status.text):
                    if word in double_meaning_words and word in status.text:
                        # Ambiguous search term: keep the tweet only if a
                        # general security word also appears in the text.
                        for gen in gen_words:
                            if gen in status.text:
                                storing_data(status)
                                break
                        else:
                            # No gen_words hit: discard as ambiguous.
                            rejects(status)
                    else:
                        storing_data(status)
                else:
                    rejects(status)
            # Flush newly collected rows to the CSV.  ``t`` is a module-level
            # cursor, so only rows added since the last flush are written.
            # Bug fix: the original called logger.writerow once, AFTER this
            # while loop, so only the last assembled row was ever written
            # (and stale data on later passes).  The dead ``alex = []``
            # initialization and the redundant int(t) index are also gone.
            while t < len(polarityy):
                logger.writerow([polarityy[t], id_list[t], created_list[t],
                                 name_list[t], text_list[t]])
                t += 1
    # Sleep 30 minutes between polling passes.
    time.sleep(1800)
1 Answer
These following rules are pretty general and take time to internalize. I hope you can apply some of them to your code anyways.
Global variables (variables you don't declare in functions, but at the top level) should be avoided. Constants (variables which you never change) are okay. Instead of changing/mutating global variables in your functions, try to rewrite them so that they take input and return something.
Try to break your code up into more functions.
Give descriptive variable names (What does "t" do in your code?).
Read through PEP 8 (https://www.python.org/dev/peps/pep-0008/) and try to apply it to your code.