
Commit 55b4307

Merge pull request avinashkranjan#903 from RohiniRG/RohiniRG-twitterb

Twitter Scraper using snscrape

2 parents 5d5af7b + 16c5ce2

4 files changed: +167 −0 lines changed

Twitter_Scraper_without_API/README.md

Lines changed: 41 additions & 0 deletions
# Tweet hashtag based scraper without Twitter API

- Here, we make use of snscrape to scrape tweets associated with a particular hashtag. snscrape is a Python library that scrapes Twitter without the use of API keys.

- This project has 2 scripts: one fetches tweets with snscrape and stores them in the database (we use SQLite3), and the other displays the tweets from the database.

- Using snscrape, we store the hashtag, the tweet content, the username, and the URL of each tweet in the database.

## Requirements

The associated packages can be installed with:

```sh
$ pip install -r requirements.txt
```

## Running the script

To fetch the tweets and other info associated with a hashtag and store them in the database:

```sh
$ python fetch_hashtags.py
```

To display the tweet info stored in the database:

```sh
$ python display_hashtags.py
```

## Working

`fetch_hashtags.py` works as follows:

![image](https://imgur.com/8YFK4OV.png)

`display_hashtags.py` works as follows:

![image](https://i.imgur.com/1uNEEMw.png)

## Author

[Rohini Rao](https://github.com/RohiniRG)
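The stored fields listed above map onto a single SQLite table. A minimal sketch of that schema (table and column names taken from the scripts in this commit; the in-memory database and sample row are for illustration only):

```python
import sqlite3

# In-memory database for illustration; the scripts use TwitterDatabase.db
con = sqlite3.connect(':memory:')
cur = con.cursor()

# One row per scraped tweet, matching the four stored fields
cur.execute("CREATE TABLE IF NOT EXISTS tweets(HASHTAG text, USERNAME text,"
            " CONTENT text, URL text)")
cur.execute('INSERT INTO tweets VALUES(?, ?, ?, ?)',
            ('#python', 'alice', 'Hello #python', 'https://example.com/1'))
con.commit()

columns = [d[0] for d in cur.execute('SELECT * FROM tweets').description]
print(columns)  # → ['HASHTAG', 'USERNAME', 'CONTENT', 'URL']
```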
Twitter_Scraper_without_API/display_hashtags.py

Lines changed: 50 additions & 0 deletions
import sqlite3
import os


def sql_connection():
    """
    Establishes a connection to the SQLite database file
    :return: connection object
    """
    path = os.path.abspath('./Twitter_Scraper_without_API/TwitterDatabase.db')
    con = sqlite3.connect(path)
    return con


def sql_fetcher(con):
    """
    Fetches all the tweets with the given hashtag from our database
    :param con: connection object
    :return: None
    """
    hashtag = input("\nEnter hashtag to search: #")
    hashtag = '#' + hashtag
    count = 0
    cur = con.cursor()
    cur.execute('SELECT * FROM tweets')  # fetch every stored tweet
    rows = cur.fetchall()

    for r in rows:
        if hashtag in r:  # exact match against the HASHTAG column
            count += 1
            print(f'USERNAME: {r[1]}\nTWEET CONTENT: {r[2]}\nURL: {r[3]}\n')

    if count:
        print(f'{count} tweets fetched from database')
    else:
        print('No tweets available for this hashtag')


con = sql_connection()

while True:
    sql_fetcher(con)

    ans = input('Press (y) to continue or any other key to exit: ').lower()
    if ans != 'y':
        print('Exiting..')
        break
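The script above pulls every row and filters in Python. The same lookup can be pushed into SQL with a parameterized WHERE clause; a sketch against an in-memory copy of the schema (the real script queries TwitterDatabase.db, and `fetch_by_hashtag` is a hypothetical helper, not part of this commit):

```python
import sqlite3

# In-memory stand-in for TwitterDatabase.db, with the same schema
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE tweets(HASHTAG text, USERNAME text,"
            " CONTENT text, URL text)")
con.executemany('INSERT INTO tweets VALUES(?, ?, ?, ?)', [
    ('#python', 'alice', 'Loving #python', 'https://example.com/1'),
    ('#rust', 'bob', 'Hello #rust', 'https://example.com/2'),
])


def fetch_by_hashtag(con, hashtag):
    # Let SQLite do the filtering instead of scanning rows in Python
    cur = con.execute('SELECT * FROM tweets WHERE HASHTAG = ?', (hashtag,))
    return cur.fetchall()


rows = fetch_by_hashtag(con, '#python')
print(len(rows))  # → 1
```

Parameterized queries also avoid string interpolation into SQL, which matters once the hashtag comes from user input.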
Twitter_Scraper_without_API/fetch_hashtags.py

Lines changed: 66 additions & 0 deletions
import snscrape.modules.twitter as sntweets
import sqlite3


def sql_connection():
    """
    Establishes a connection to the SQLite database file
    :return: connection object
    """
    con = sqlite3.connect('./Twitter_Scraper_without_API/TwitterDatabase.db')
    return con


def sql_table(con):
    """
    Creates a table in the database (if it does not already exist)
    to store the tweet info
    :param con: connection object
    :return: None
    """
    cur = con.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS tweets(HASHTAG text, USERNAME text,"
                " CONTENT text, URL text)")
    con.commit()


def sql_insert_table(con, entities):
    """
    Inserts one tweet's data into the table
    :param con: connection object
    :param entities: tuple of (hashtag, username, content, url)
    :return: None
    """
    cur = con.cursor()
    cur.execute('INSERT INTO tweets(HASHTAG, USERNAME, CONTENT, '
                'URL) VALUES(?, ?, ?, ?)', entities)
    con.commit()


con = sql_connection()
sql_table(con)

while True:
    tag = input('\n\nEnter a hashtag: #')
    max_count = int(input('Enter maximum number of tweets to be listed: '))

    count = 0
    # snscrape searches the given hashtag and yields tweets one by one;
    # we stop once max_count tweets have been stored
    for i in sntweets.TwitterSearchScraper('#' + tag).get_items():
        count += 1
        entities = ('#' + tag, i.username, i.content, i.url)
        sql_insert_table(con, entities)

        if count == max_count:
            break

    print('Done!')

    ans = input('Press (y) to continue or any other key to exit: ').lower()
    if ans != 'y':
        print('Exiting..')
        break
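The count/break pattern above caps how many tweets are taken from snscrape's generator. `itertools.islice` expresses the same cap more directly; sketched here with a stand-in generator, since `get_items()` needs network access:

```python
from itertools import islice


def fake_tweet_stream():
    # Stand-in for sntweets.TwitterSearchScraper(...).get_items(),
    # which yields tweet objects lazily
    n = 0
    while True:
        yield f'tweet {n}'
        n += 1


max_count = 5
# islice stops pulling from the generator after max_count items
taken = list(islice(fake_tweet_stream(), max_count))
print(len(taken))  # → 5
```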
Twitter_Scraper_without_API/requirements.txt

Lines changed: 10 additions & 0 deletions
beautifulsoup4==4.9.3
certifi==2020.12.5
chardet==4.0.0
idna==2.10
lxml==4.6.2
PySocks==1.7.1
requests==2.25.1
snscrape==0.3.4
soupsieve==2.2
urllib3==1.26.4

0 commit comments