
Commit 55b4307

Merge pull request avinashkranjan#903 from RohiniRG/RohiniRG-twitterb

Twitter Scraper using snscrape

2 parents 5d5af7b + 16c5ce2

4 files changed: +167 −0 lines changed

Twitter_Scraper_without_API/README.md

Lines changed: 41 additions & 0 deletions
# Tweet hashtag based scraper without Twitter API

- Here, we make use of snscrape to scrape tweets associated with a particular hashtag. snscrape is a Python library that scrapes Twitter without the use of API keys.

- This project has 2 scripts: one fetches tweets with snscrape and stores them in the database (we use SQLite3), and the other displays the tweets from the database.

- Using snscrape, we store the hashtag, the tweet content, the username, and the URL of each tweet in the database.

## Requirements

The associated packages can be installed with:

```sh
$ pip install -r requirements.txt
```

## Running the script

To fetch the tweets and other info associated with a hashtag and store them in the database:

```sh
$ python fetch_hashtags.py
```

To display the tweet info stored in the database:

```sh
$ python display_hashtags.py
```

## Working

`fetch_hashtags.py` works as follows:

![image](https://imgur.com/8YFK4OV.png)

`display_hashtags.py` works as follows:

![image](https://i.imgur.com/1uNEEMw.png)

## Author

[Rohini Rao](https://github.com/RohiniRG)
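The stored fields listed above map onto a single SQLite table. A minimal sketch of that schema (table and column names taken from the scripts in this commit; the in-memory database and sample row are for illustration only):

```python
import sqlite3

# In-memory database for illustration; the scripts use TwitterDatabase.db
con = sqlite3.connect(':memory:')
cur = con.cursor()

# One row per scraped tweet, matching the four stored fields
cur.execute("CREATE TABLE IF NOT EXISTS tweets(HASHTAG text, USERNAME text,"
            " CONTENT text, URL text)")
cur.execute('INSERT INTO tweets VALUES(?, ?, ?, ?)',
            ('#python', 'alice', 'Hello #python', 'https://example.com/1'))
con.commit()

columns = [d[0] for d in cur.execute('SELECT * FROM tweets').description]
print(columns)  # → ['HASHTAG', 'USERNAME', 'CONTENT', 'URL']
```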
Twitter_Scraper_without_API/display_hashtags.py

Lines changed: 50 additions & 0 deletions
import sqlite3
import os


def sql_connection():
    """
    Establishes a connection to the SQLite database file
    :return: connection object
    """
    path = os.path.abspath('./Twitter_Scraper_without_API/TwitterDatabase.db')
    con = sqlite3.connect(path)
    return con


def sql_fetcher(con):
    """
    Fetches all the tweets with the given hashtag from our database
    :param con: connection object
    :return: None
    """
    hashtag = input("\nEnter hashtag to search: #")
    hashtag = '#' + hashtag
    count = 0
    cur = con.cursor()
    cur.execute('SELECT * FROM tweets')  # fetch every stored tweet
    rows = cur.fetchall()

    for r in rows:
        if hashtag in r:  # exact match against the HASHTAG column
            count += 1
            print(f'USERNAME: {r[1]}\nTWEET CONTENT: {r[2]}\nURL: {r[3]}\n')

    if count:
        print(f'{count} tweets fetched from database')
    else:
        print('No tweets available for this hashtag')


con = sql_connection()

while True:
    sql_fetcher(con)

    ans = input('Press (y) to continue or any other key to exit: ').lower()
    if ans != 'y':
        print('Exiting..')
        break
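The script above pulls every row and filters in Python. The same lookup can be pushed into SQL with a parameterized WHERE clause; a sketch against an in-memory copy of the schema (the real script queries TwitterDatabase.db, and `fetch_by_hashtag` is a hypothetical helper, not part of this commit):

```python
import sqlite3

# In-memory stand-in for TwitterDatabase.db, with the same schema
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE tweets(HASHTAG text, USERNAME text,"
            " CONTENT text, URL text)")
con.executemany('INSERT INTO tweets VALUES(?, ?, ?, ?)', [
    ('#python', 'alice', 'Loving #python', 'https://example.com/1'),
    ('#rust', 'bob', 'Hello #rust', 'https://example.com/2'),
])


def fetch_by_hashtag(con, hashtag):
    # Let SQLite do the filtering instead of scanning rows in Python
    cur = con.execute('SELECT * FROM tweets WHERE HASHTAG = ?', (hashtag,))
    return cur.fetchall()


rows = fetch_by_hashtag(con, '#python')
print(len(rows))  # → 1
```

Parameterized queries also avoid string interpolation into SQL, which matters once the hashtag comes from user input.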
Twitter_Scraper_without_API/fetch_hashtags.py

Lines changed: 66 additions & 0 deletions
import snscrape.modules.twitter as sntweets
import sqlite3


def sql_connection():
    """
    Establishes a connection to the SQLite database file
    :return: connection object
    """
    con = sqlite3.connect('./Twitter_Scraper_without_API/TwitterDatabase.db')
    return con


def sql_table(con):
    """
    Creates a table in the database (if it does not already exist)
    to store the tweet info
    :param con: connection object
    :return: None
    """
    cur = con.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS tweets(HASHTAG text, USERNAME text,"
                " CONTENT text, URL text)")
    con.commit()


def sql_insert_table(con, entities):
    """
    Inserts one tweet's data into the table
    :param con: connection object
    :param entities: tuple of (hashtag, username, content, url)
    :return: None
    """
    cur = con.cursor()
    cur.execute('INSERT INTO tweets(HASHTAG, USERNAME, CONTENT, '
                'URL) VALUES(?, ?, ?, ?)', entities)
    con.commit()


con = sql_connection()
sql_table(con)

while True:
    tag = input('\n\nEnter a hashtag: #')
    max_count = int(input('Enter maximum number of tweets to be listed: '))

    count = 0
    # snscrape searches the given hashtag and yields tweets one by one;
    # we stop once max_count tweets have been stored
    for i in sntweets.TwitterSearchScraper('#' + tag).get_items():
        count += 1
        entities = ('#' + tag, i.username, i.content, i.url)
        sql_insert_table(con, entities)

        if count == max_count:
            break

    print('Done!')

    ans = input('Press (y) to continue or any other key to exit: ').lower()
    if ans != 'y':
        print('Exiting..')
        break
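The count/break pattern above caps how many tweets are taken from snscrape's generator. `itertools.islice` expresses the same cap more directly; sketched here with a stand-in generator, since `get_items()` needs network access:

```python
from itertools import islice


def fake_tweet_stream():
    # Stand-in for sntweets.TwitterSearchScraper(...).get_items(),
    # which yields tweet objects lazily
    n = 0
    while True:
        yield f'tweet {n}'
        n += 1


max_count = 5
# islice stops pulling from the generator after max_count items
taken = list(islice(fake_tweet_stream(), max_count))
print(len(taken))  # → 5
```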
Twitter_Scraper_without_API/requirements.txt

Lines changed: 10 additions & 0 deletions
beautifulsoup4==4.9.3
certifi==2020.12.5
chardet==4.0.0
idna==2.10
lxml==4.6.2
PySocks==1.7.1
requests==2.25.1
snscrape==0.3.4
soupsieve==2.2
urllib3==1.26.4

0 commit comments