Commit 316695d

authored

Add files via upload

0 parents commit 316695d

+26 -0 lines changed

Lines changed: 10 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,10 @@`
	`1`	`+Web Scraping in Python`
	`2`	`+======================`
	`3`	`+`
	`4`	`+extract.py:`
	`5`	`+`
	`6`	`+- This code uses the BeautifulSoup library to extract the links from a web page.`
	`7`	`+`
	`8`	`+- The user enters the website from which the links should be extracted.`
	`9`	`+`
	`10`	`+- The code finds every "a" tag in the page's HTML and prints its href attribute, which yields all the links embedded in the page.`
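
The "a" tag approach described in the README can be illustrated without any third-party dependency. The sketch below swaps BeautifulSoup for Python 3's standard-library `html.parser`, purely to show the underlying idea; `LinkCollector`, `collect_links`, and the sample HTML are hypothetical names and data that are not part of this commit. Unlike the committed script, it skips anchors that have no `href` instead of reporting `None` for them.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it parses."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors without an href attribute
                self.links.append(href)


def collect_links(html):
    """Return the href values of all <a> tags in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample page, used only for illustration.
sample = (
    '<p><a href="https://example.com/">Home</a> '
    '<a name="top">no href</a> '
    '<A HREF="/about">About</A></p>'
)
print(collect_links(sample))  # prints ['https://example.com/', '/about']
```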

Lines changed: 16 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,16 @@`
	`1`	`+# Taken from http://www.pythonforbeginners.com/python-on-the-web/web-scraping-with-beautifulsoup/`
	`2`	`+`
	`3`	`+from bs4 import BeautifulSoup`
	`4`	`+`
	`5`	`+import requests`
	`6`	`+`
	`7`	`+url = input("Enter a website to extract the URLs from: ")`
	`8`	`+`
	`9`	`+r = requests.get("http://" + url)`
	`10`	`+`
	`11`	`+data = r.text`
	`12`	`+`
	`13`	`+soup = BeautifulSoup(data, "html.parser")`
	`14`	`+`
	`15`	`+for link in soup.find_all('a'):`
	`16`	`+ print(link.get('href'))`
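
The extraction step in the script above could also be wrapped in a reusable function. This is a minimal Python 3 sketch, not the commit's own code, and it assumes the `beautifulsoup4` package is installed. The function name `extract_links`, the `base_url` parameter, and the sample HTML are assumptions introduced for illustration. It names the parser explicitly, skips anchors without an `href`, and resolves relative links against the page address with `urllib.parse.urljoin`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a href=...> found in an HTML string.

    Hypothetical helper: `base_url` is the address the HTML was fetched from
    and is used only to resolve relative links.
    """
    # Naming the parser explicitly avoids BeautifulSoup's "no parser" warning.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True restricts the search to anchors that carry an href attribute.
    for anchor in soup.find_all("a", href=True):
        links.append(urljoin(base_url, anchor["href"]))
    return links


if __name__ == "__main__":
    # Made-up page content, so the example runs without network access.
    page = (
        '<a href="/about">About</a>'
        '<a>no href here</a>'
        '<a href="https://other.org/x">External</a>'
    )
    for link in extract_links(page, "http://example.com/"):
        print(link)
```

To use it on a live page, the text of a `requests.get(...)` response could be passed as `html`, with the requested address as `base_url`.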
