Commit 70c9016

Added files for infobox scraper (avinashkranjan#982)
1 parent 8fc1a92 commit 70c9016

File tree

3 files changed: 128 additions & 0 deletions


‎Wiki_Infobox_Scraper/README.md‎

Lines changed: 20 additions & 0 deletions
# Wikipedia infobox scraper

- The given Python script uses BeautifulSoup to scrape the Wikipedia page matching the given user query and obtain data from its Wikipedia infobox.
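
For reference, here is a minimal sketch of the core scraping step, shown without the Tkinter GUI that main.py builds around it; the sample query "Alan Turing" is arbitrary and only for illustration.

```
# Minimal sketch of the scraping step (no GUI); the query is only an example.
import requests
from bs4 import BeautifulSoup

query = '_'.join(word.capitalize() for word in "Alan Turing".split())
req = requests.get('https://en.wikipedia.org/wiki/' + query)

if req.status_code == 200:
    soup = BeautifulSoup(req.text, 'html.parser')
    info_table = soup.find('table', {'class': 'infobox'})
    if info_table is not None:
        # Each infobox row pairs a header cell (th) with a data cell (td)
        for tr in info_table.find_all('tr'):
            th, td = tr.find('th'), tr.find('td')
            if th and td:
                print(th.get_text(strip=True), ':', td.get_text(strip=True))
```
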
## Requirements:

```
$ pip install -r requirements.txt
```

## Working screenshots:

![Image](https://i.imgur.com/cyLKmYL.png)

![Image](https://i.imgur.com/s2XGW95.png)

![Image](https://i.imgur.com/afWQSW9.png)

## Author:

[Rohini Rao](https://www.github.com/RohiniRG)

‎Wiki_Infobox_Scraper/main.py‎

Lines changed: 101 additions & 0 deletions
from bs4 import BeautifulSoup
import requests
from tkinter import *

info_dict = {}


def error_box():
    """
    A function to create a pop-up, in case the code errors out
    """
    global mini_pop

    mini_pop = Toplevel()
    mini_pop.title('Error screen')

    mini_l = Label(mini_pop, text=" !!!\nERROR FETCHING DATA", fg='red', font=('Arial', 10, 'bold'))
    mini_l.grid(row=1, column=1, sticky='nsew')
    entry_str.set("")


def wikiScraper():
    """
    Function scrapes the infobox lying under the right tags and displays
    the data obtained from it in a new window
    """
    global info_dict

    # Modifying the user input to make it suitable for the URL
    entry = entry_str.get()
    entry = entry.split()
    query = '_'.join([i.capitalize() for i in entry])
    req = requests.get('https://en.wikipedia.org/wiki/' + query)

    # To check for a valid URL
    if req.status_code == 200:
        # For parsing through the HTML text
        soup = BeautifulSoup(req.text, 'html.parser')

        # Finding text within the infobox and storing it in a dictionary
        info_table = soup.find('table', {'class': 'infobox'})

        try:
            for tr in info_table.find_all('tr'):
                try:
                    if tr.find('th'):
                        info_dict[tr.find('th').text] = tr.find('td').text
                except AttributeError:
                    # Row has a header cell but no matching data cell
                    pass

        except AttributeError:
            # No infobox table was found on the page
            error_box()

        # Creating a pop-up window to show the results
        global popup
        popup = Toplevel()
        popup.title(query)

        r = 1

        for k, v in info_dict.items():
            e1 = Label(popup, text=k + " : ", bg='cyan4', font=('Arial', 10, 'bold'))
            e1.grid(row=r, column=1, sticky='nsew')

            e2 = Label(popup, text=v, bg="cyan2", font=('Arial', 10, 'bold'))
            e2.grid(row=r, column=2, sticky='nsew')

            r += 1
            e3 = Label(popup, text='', font=('Arial', 10, 'bold'))
            e3.grid(row=r, sticky='s')
            r += 1

        entry_str.set("")
        info_dict = {}

    else:
        print('Invalid URL')
        error_box()


# Creating a window to take user search queries
root = Tk()
root.title('Wikipedia Infobox')

entry_str = StringVar()

search_label = LabelFrame(root, text="Search: ", font=('Century Schoolbook L', 17))
search_label.pack(pady=10, padx=10)

user_entry = Entry(search_label, textvariable=entry_str, font=('Century Schoolbook L', 17))
user_entry.pack(pady=10, padx=10)

button_frame = Frame(root)
button_frame.pack(pady=10)

submit_bt = Button(button_frame, text='Submit', command=wikiScraper, font=('Century Schoolbook L', 17))
submit_bt.grid(row=0, column=0)

root.mainloop()

Wiki_Infobox_Scraper/requirements.txt

Lines changed: 7 additions & 0 deletions
beautifulsoup4==4.9.3
certifi==2020.12.5
chardet==4.0.0
idna==2.10
requests==2.25.1
soupsieve==2.2.1
urllib3==1.26.4

0 commit comments
