Commit 9f56e71

Add files via upload
1 parent 8b02525 commit 9f56e71

File tree

4 files changed: 99 additions & 0 deletions

python_toc.csv

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
heading_number,heading_text
1,History
2,Design philosophy and features
3,Syntax and semantics
3.1,Indentation
3.2,Statements and control flow
3.3,Expressions
3.4,Methods
3.5,Typing
3.6,Arithmetic operations
4,Programming examples
5,Libraries
6,Development environments
7,Implementations
7.1,Reference implementation
7.2,Other implementations
7.3,Unsupported implementations
7.4,Cross-compilers to other languages
7.5,Performance
8,Development
9,API documentation generators
10,Naming
11,Uses
12,Languages influenced by Python
13,See also
14,References
14.1,Sources
15,Further reading
16,External links

web_scraping_toc.csv

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
heading_number,heading_text
1,History
2,Techniques
2.1,Human copy-and-paste
2.2,Text pattern matching
2.3,HTTP programming
2.4,HTML parsing
2.5,DOM parsing
2.6,Vertical aggregation
2.7,Semantic annotation recognizing
2.8,Computer vision web-page analysis
3,Software
4,Legal issues
4.1,United States
4.2,The EU
4.3,Australia
4.4,India
5,Methods to prevent web scraping
6,See also
7,References

webscraping_5lines.py

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
import requests
from bs4 import BeautifulSoup
response = requests.get("https://en.wikipedia.org/wiki/Web_scraping")
bs = BeautifulSoup(response.text, "lxml")
print(bs.find("p").text)
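
For reference, a slightly more defensive version of the same five-line idea is sketched below. It is not part of the commit: the timeout value, the raise_for_status() check, and the guard for a missing paragraph element are assumptions added here, on top of the same requests + BeautifulSoup calls used in webscraping_5lines.py.

import requests
from bs4 import BeautifulSoup

# Sketch only: same request as webscraping_5lines.py, with basic error handling added.
response = requests.get("https://en.wikipedia.org/wiki/Web_scraping", timeout=10)  # timeout is an assumption
response.raise_for_status()  # stop early on HTTP errors instead of parsing an error page

soup = BeautifulSoup(response.text, "lxml")
first_paragraph = soup.find("p")  # find() returns None if the page has no <p> element
print(first_paragraph.text if first_paragraph else "No paragraph found")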

wiki_toc.py

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
import csv

import requests
from bs4 import BeautifulSoup


def get_data(url):
    # Fetch the article and collect its table of contents entries from the "toc" div.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    table_of_contents = soup.find("div", id="toc")
    headings = table_of_contents.find_all("li")
    data = []
    for heading in headings:
        heading_text = heading.find("span", class_="toctext").text
        heading_number = heading.find("span", class_="tocnumber").text
        data.append({
            'heading_number': heading_number,
            'heading_text': heading_text,
        })
    return data


def export_data(data, file_name):
    # Write the collected headings to a CSV file with a header row.
    with open(file_name, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=['heading_number', 'heading_text'])
        writer.writeheader()
        writer.writerows(data)


def main():
    url_to_parse = "https://en.wikipedia.org/wiki/Python_(programming_language)"
    file_name = "python_toc.csv"
    data = get_data(url_to_parse)
    export_data(data, file_name)

    url_to_parse = "https://en.wikipedia.org/wiki/Web_scraping"
    file_name = "web_scraping_toc.csv"
    data = get_data(url_to_parse)
    export_data(data, file_name)

    print('Done')


if __name__ == '__main__':
    main()
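
As a usage sketch (not part of the commit), the two helpers defined in wiki_toc.py could be reused for arbitrary article URLs passed on the command line. The wrapper module name, the scrape_toc() helper, and the file-name derivation below are assumptions for illustration; get_data() and export_data() are imported exactly as defined above.

# scrape_any_toc.py - hypothetical wrapper; assumes it sits next to wiki_toc.py
import sys
from urllib.parse import urlparse

from wiki_toc import get_data, export_data


def scrape_toc(url):
    # Derive an output name such as "Web_scraping_toc.csv" from the article URL.
    article = urlparse(url).path.rsplit("/", 1)[-1]
    file_name = f"{article}_toc.csv"
    export_data(get_data(url), file_name)
    return file_name


if __name__ == '__main__':
    for url in sys.argv[1:]:
        print("Wrote", scrape_toc(url))

For example, running "python scrape_any_toc.py https://en.wikipedia.org/wiki/Web_scraping" would write Web_scraping_toc.csv in the current directory.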

0 commit comments

