
Commit 49fa1fd

Create Get3LinksOfCards.py
1 parent 2c8dca5 commit 49fa1fd

File tree

1 file changed: +87 −0 lines changed


Code/Get3LinksOfCards.py

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
#################################################
###   3. GET THE LINKS OF THE CARDS          ###
###   OF EACH PAGE                           ###
#################################################

# NOTE: This code takes around 15 mins of runtime due to
# the wait time in the getCardUrl function, which is
# necessary to scrape data from each location card.

# Authors of Code: Noam Shmuel & Lasha Gochiashvili
# Load main packages and libraries
from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Webdriver settings
gecko_path = 'C:/Users/Lasha/anaconda3/geckodriver.exe'

options = webdriver.firefox.options.Options()
options.headless = True
driver = webdriver.Firefox(options=options, executable_path=gecko_path)

driver.implicitly_wait(5)

'''
We created this function to take the link of each country page that
we stored at the previous stage and scrape the URLs of the cards
inside the country page. The card URLs are then saved so that at the
next stage we can open each card and get the data about pollution
in each location.
'''
def getCardUrl(country, url):
    driver.get(url)
    local_df = pd.DataFrame(columns=['country', 'country_url', 'cardURL'])

    try:
        wait = WebDriverWait(driver, 5)
        wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'card__title')))
        titles = driver.find_elements_by_css_selector('.card__title [href]')
        # Scraping the URLs of the location cards by CSS selector

        for title in titles:
            time.sleep(1)
            try:
                card_link = title.get_attribute('href')
                d = {'country': country, 'country_url': url, 'cardURL': card_link}
                local_df = local_df.append(d, ignore_index=True)
                # Saving the scraped URL into the Data Frame
            except:
                d = {'country': country, 'country_url': url, 'cardURL': None}
                local_df = local_df.append(d, ignore_index=True)
    except:
        d = {'country': country, 'country_url': url, 'cardURL': None}
        local_df = local_df.append(d, ignore_index=True)

    return local_df

time.sleep(2)

'''
Loading the Data Frame of country & country_url that we created
at the previous stage. We now add the scraped URL of each
location card that we get from each country page.
'''
df = pd.read_csv('2Links_Of_Countries.csv')
df2 = pd.DataFrame(columns=['country', 'country_url', 'cardURL'])

for index, row in df.iterrows():
    myDf = pd.DataFrame(columns=['country', 'country_url', 'cardURL'])
    country_url = row['country_url']
    country = row['country']
    myDf = getCardUrl(country, country_url)
    df2 = df2.append(myDf, ignore_index=True)

# Printing the Data Frame of country, country_url and location card URL
print(df2)
time.sleep(2)

# Saving the created Data Frame into a .csv file
df2.to_csv('3Links_Of_Cards.csv', index=False, header=True)

# Closing the web browser
time.sleep(2)
driver.quit()
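
Note: the file targets Selenium 3 and a pre-2.0 pandas, so it relies on executable_path, Options.headless, find_elements_by_css_selector and DataFrame.append, all of which were later removed. Below is a minimal sketch of the equivalent calls, assuming Selenium 4+ and pandas 2+; it is not part of this commit, the geckodriver path, selector and column names are carried over from the script above, and the country/url values are placeholders for illustration only.

# Sketch only (not in this commit): the same driver setup and row collection,
# assuming Selenium 4+ and pandas 2+, where the APIs used above were removed.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
import pandas as pd

options = Options()
options.add_argument('-headless')                               # replaces options.headless = True
service = Service('C:/Users/Lasha/anaconda3/geckodriver.exe')   # replaces executable_path=gecko_path
driver = webdriver.Firefox(service=service, options=options)

country, url = 'placeholder-country', 'https://example.org/placeholder'  # placeholders only
driver.get(url)
# replaces driver.find_elements_by_css_selector('.card__title [href]')
titles = driver.find_elements(By.CSS_SELECTOR, '.card__title [href]')
rows = [{'country': country, 'country_url': url, 'cardURL': t.get_attribute('href')}
        for t in titles]
# DataFrame.append was removed in pandas 2.0; build the frame from collected rows instead
local_df = pd.DataFrame(rows, columns=['country', 'country_url', 'cardURL'])
driver.quit()

Under the same assumptions, the per-country loop at the end of the script would switch from df2 = df2.append(myDf, ignore_index=True) to df2 = pd.concat([df2, myDf], ignore_index=True).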
