Commit 49fa1fd

authored

Create Get3LinksOfCards.py

1 parent 2c8dca5 commit 49fa1fd

1 file changed: +87 -0 lines changed


`Code/Get3LinksOfCards.py`

Lines changed: 87 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,87 @@`
	`1`	`+#################################################`
	`2`	`+### 3. GET THE LINKS OF THE CARDS ###`
	`3`	`+### OF EACH PAGE ###`
	`4`	`+#################################################`
	`5`	`+`
	`6`	`+# NOTE: This code takes around 15 minutes to run due to`
	`7`	`+# the wait time in the getCardUrl function, which is`
	`8`	`+# necessary to scrape data from each location card.`
	`9`	`+`
	`10`	`+# Authors of Code: Noam Shmuel & Lasha Gochiashvili`
	`11`	`+# Load main packages and libraries`
	`12`	`+from selenium import webdriver`
	`13`	`+import pandas as pd`
	`14`	`+from selenium.webdriver.common.by import By`
	`15`	`+import time`
	`16`	`+from selenium.webdriver.support.ui import WebDriverWait`
	`17`	`+from selenium.webdriver.support import expected_conditions as EC`
	`18`	`+`
	`19`	`+# Webdriver settings`
	`20`	`+gecko_path = 'C:/Users/Lasha/anaconda3/geckodriver.exe'`
	`21`	`+`
	`22`	`+options = webdriver.firefox.options.Options()`
	`23`	`+options.headless = True`
	`24`	`+driver = webdriver.Firefox(options = options, executable_path = gecko_path)`
	`25`	`+`
	`26`	`+driver.implicitly_wait(5)`
	`27`	`+`
	`28`	`+'''`
	`29`	`+We created this function to take the link of each country page that`
	`30`	`+we stored in the previous stage and scrape the URLs of the cards`
	`31`	`+inside that country page. It returns the card URLs, which we will`
	`32`	`+use in the next stage to open each card and get the data about`
	`33`	`+pollution in each location.`
	`34`	`+'''`
	`35`	`+def getCardUrl(country, url):`
	`36`	`+    driver.get(url)`
	`37`	`+    local_df = pd.DataFrame(columns=['country','country_url', 'cardURL'])`
	`38`	`+`
	`39`	`+    try:`
	`40`	`+        wait = WebDriverWait(driver, 5)`
	`41`	`+        wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'card__title')))`
	`42`	`+        titles = driver.find_elements_by_css_selector('.card__title [href]')`
	`43`	`+        # scraping urls of location cards by "css_selector"`
	`44`	`+`
	`45`	`+        for title in titles:`
	`46`	`+            time.sleep(1)`
	`47`	`+            try:`
	`48`	`+                card_link = (title.get_attribute('href'))`
	`49`	`+                d = {'country':country, 'country_url':url, 'cardURL':card_link}`
	`50`	`+                local_df = local_df.append(d, ignore_index=True)`
	`51`	`+                # Saving scraped urls into the Data Frame`
	`52`	`+            except:`
	`53`	`+                d = {'country':country, 'country_url':url, 'cardURL':None}`
	`54`	`+                local_df = local_df.append(d, ignore_index=True)`
	`55`	`+    except:`
	`56`	`+        d = {'country':country, 'country_url':url, 'cardURL':None}`
	`57`	`+        local_df = local_df.append(d, ignore_index=True)`
	`58`	`+`
	`59`	`+    return (local_df)`
	`60`	`+`
	`61`	`+time.sleep(2)`
	`62`	`+`
	`63`	`+'''`
	`64`	`+Loading the Data Frame of country & country_url that we created`
	`65`	`+in the previous stage. We now add the scraped URLs of each`
	`66`	`+location card that we get from each country page.`
	`67`	`+'''`
	`68`	`+df = pd.read_csv('2Links_Of_Countries.csv')`
	`69`	`+df2 = pd.DataFrame(columns=['country','country_url', 'cardURL'])`
	`70`	`+`
	`71`	`+for index, row in df.iterrows():`
	`72`	`+    myDf = pd.DataFrame(columns=['country','country_url', 'cardURL'])`
	`73`	`+    country_url = (row['country_url'])`
	`74`	`+    country = (row['country'])`
	`75`	`+    myDf = getCardUrl(country, country_url)`
	`76`	`+    df2 = df2.append(myDf, ignore_index=True)`
	`77`	`+`
	`78`	`+# Printing Data Frame of country, country_url and location card url.`
	`79`	`+print(df2)`
	`80`	`+time.sleep(2)`
	`81`	`+`
	`82`	`+# Saving created Data Frame into .csv file`
	`83`	`+df2.to_csv('3Links_Of_Cards.csv', index=False, header=True)`
	`84`	`+`
	`85`	`+# Closing web browser`
	`86`	`+time.sleep(2)`
	`87`	`+driver.quit()`
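
Note on the row-building logic in `getCardUrl`: the committed script grows a DataFrame one row at a time with `DataFrame.append`, which was removed in pandas 2.0, and it uses the `find_elements_by_css_selector` helper, which recent Selenium 4 releases no longer provide (`find_elements(By.CSS_SELECTOR, ...)` is the current form). The sketch below is not the authors' code. It only illustrates the same fallback behaviour (one row per card link, a single row with an empty `cardURL` when a page fails) by collecting plain dicts in a list and writing them once at the end. To stay dependency-free it swaps pandas for the standard-library `csv` module and replaces the browser with an injected callable. The names `card_rows`, `rows_to_csv`, `fake_fetch` and the `example.test` URLs are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, Optional

FIELDS = ["country", "country_url", "cardURL"]


def card_rows(
    country: str,
    url: str,
    fetch_links: Callable[[str], Iterable[Optional[str]]],
) -> List[Dict[str, Optional[str]]]:
    """Return one row per card link found on a country page.

    `fetch_links` is a hypothetical stand-in for the Selenium lookup.
    If it raises, a single row with cardURL=None is returned, which
    mirrors the outer `except` branch of the committed function.
    """
    try:
        links = list(fetch_links(url))
    except Exception:
        return [{"country": country, "country_url": url, "cardURL": None}]
    return [
        {"country": country, "country_url": url, "cardURL": link}
        for link in links
    ]


def rows_to_csv(rows: List[Dict[str, Optional[str]]]) -> str:
    """Serialise all rows in one pass; None is written as an empty field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def fake_fetch(url: str) -> List[str]:
    """Hypothetical fetcher used only for this demonstration."""
    if "broken" in url:
        raise TimeoutError("no cards found")
    return [url + "/card-a", url + "/card-b"]


countries = [
    ("Georgia", "https://example.test/georgia"),
    ("Nowhere", "https://example.test/broken"),
]

# Accumulate every row in a list, then write the file content once.
all_rows: List[Dict[str, Optional[str]]] = []
for country, url in countries:
    all_rows.extend(card_rows(country, url, fake_fetch))

csv_text = rows_to_csv(all_rows)
print(csv_text)
```

With pandas available, the same list of dicts could be passed to `pd.DataFrame(all_rows, columns=FIELDS)` once after the loop, which avoids the repeated copying that row-by-row appending causes.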

0 commit comments