Commit 708b1d7

Stack Overflow scraper added
1 parent 7d76916 commit 708b1d7

3 files changed: +97 -0 lines changed

‎StackOverflow-Scraper/README.md‎

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
## Stack Overflow

### Scrape questions, views, votes, answer counts, and descriptions from the Stack Overflow website for a given topic

Create an instance of the `StackOverflow` class.

```python
questions = StackOverflow("topic")
```

| Methods           | Details                                                                               |
| ----------------- | ------------------------------------------------------------------------------------- |
| `.getQuestions()` | Returns the questions, views, votes, answer counts, and descriptions as a JSON string |

**Example**

```python
import json

que = StackOverflow("github")
scrape = que.getQuestions()
# Use a name other than `json` so the module is not shadowed.
data = json.loads(scrape)
questions = data["questions"]
for q in questions:
    print("\nQuestion: ", q["question"])
    print("Views: ", q["views"])
    print("Votes: ", q["vote_count"])
    print("Answers: ", q["answer_count"])
    print("Description: ", q["description"])
```
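On failure, `getQuestions()` in this commit returns a JSON object with a `message` key and no `questions` key, so indexing `["questions"]` directly would raise a `KeyError`. A minimal sketch of a defensive reader follows; `extract_questions` is a hypothetical helper that is not part of the commit, and the sample payloads are made up for illustration.

```python
import json


def extract_questions(payload: str) -> list:
    """Return the list of question dicts, or an empty list for the error payload."""
    data = json.loads(payload)
    # The scraper's error path returns {"message": ...} with no "questions" key.
    return data.get("questions", [])


# Hypothetical payloads shaped like the two return values of getQuestions().
ok = json.dumps({"questions": [{"question": "How do I fork a repo?", "views": "12"}]})
failed = json.dumps({"message": "No questions related to the topic found"})

print(extract_questions(ok))
print(extract_questions(failed))
```

Returning an empty list keeps the calling loop unchanged; a caller that needs to tell "no results" apart from "request failed" could inspect the `message` key instead.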

‎StackOverflow-Scraper/questions.py‎

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
from bs4 import BeautifulSoup
import requests
import json


class StackOverflow:
    def __init__(self, topic):
        self.topic = topic

    def getQuestions(self):
        """
        Returns the questions, views, votes, answer counts, and descriptions
        as a JSON string.

        Class - `StackOverflow`
        Example:
        ```
        que = StackOverflow(topic="github")
        scrape = que.getQuestions()
        ```
        Returns:
        A JSON string of the form {"questions": [...]}, where each entry is
        {
            "question": question title
            "views": view count of question
            "vote_count": vote count of question
            "answer_count": no. of answers to the question
            "description": description of the question
        }
        On failure, a JSON string of the form {"message": ...} is returned.
        """
        url = "https://stackoverflow.com/questions/tagged/" + self.topic
        try:
            res = requests.get(url)
            soup = BeautifulSoup(res.text, "html.parser")

            questions_data = {"questions": []}

            # Select every question summary block on the tag page.
            questions = soup.select(".s-post-summary")
            for que in questions:
                title = que.select_one(".s-link").getText()
                # Assumes the stats appear in the order votes, answers, views.
                stats = que.select(".s-post-summary--stats-item-number")
                vote = stats[0].getText()
                ans = stats[1].getText()
                views = stats[2].getText()
                # Drop non-ASCII characters and collapse runs of whitespace.
                desc = " ".join(
                    que.select_one(".s-post-summary--content-excerpt")
                    .getText()
                    .encode("ascii", "ignore")
                    .decode()
                    .split()
                )
                questions_data["questions"].append(
                    {
                        "question": title,
                        "views": views,
                        "vote_count": vote,
                        "answer_count": ans,
                        "description": desc,
                    }
                )
            json_data = json.dumps(questions_data)
            return json_data
        except Exception:
            # Network errors and unexpected page markup both end up here.
            error_message = {"message": "No questions related to the topic found"}

            ejson = json.dumps(error_message)
            return ejson
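
The tag URL in `getQuestions()` is built by plain string concatenation. For a topic such as `c#`, everything from `#` onward would be read as a URL fragment and not sent to the server. A minimal sketch of percent-encoding the topic first, using only the standard library; `build_tag_url` and `BASE_URL` are hypothetical names that are not part of the commit.

```python
from urllib.parse import quote

BASE_URL = "https://stackoverflow.com/questions/tagged/"


def build_tag_url(topic: str) -> str:
    """Percent-encode the topic so characters like '#' and '+' stay in the path."""
    # safe="" also encodes '/', so the topic cannot add extra path segments.
    return BASE_URL + quote(topic, safe="")


print(build_tag_url("github"))
print(build_tag_url("c#"))
print(build_tag_url("c++"))
```

Plain alphanumeric topics such as `github` pass through unchanged, so the helper could replace the concatenation without affecting the README example.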
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
beautifulsoup4
requests
json
