Commit 9f56e71

Add files via upload
1 parent 8b02525 commit 9f56e71

File tree

4 files changed: 99 additions & 0 deletions

python_toc.csv

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
heading_number,heading_text
1,History
2,Design philosophy and features
3,Syntax and semantics
3.1,Indentation
3.2,Statements and control flow
3.3,Expressions
3.4,Methods
3.5,Typing
3.6,Arithmetic operations
4,Programming examples
5,Libraries
6,Development environments
7,Implementations
7.1,Reference implementation
7.2,Other implementations
7.3,Unsupported implementations
7.4,Cross-compilers to other languages
7.5,Performance
8,Development
9,API documentation generators
10,Naming
11,Uses
12,Languages influenced by Python
13,See also
14,References
14.1,Sources
15,Further reading
16,External links

web_scraping_toc.csv

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
heading_number,heading_text
1,History
2,Techniques
2.1,Human copy-and-paste
2.2,Text pattern matching
2.3,HTTP programming
2.4,HTML parsing
2.5,DOM parsing
2.6,Vertical aggregation
2.7,Semantic annotation recognizing
2.8,Computer vision web-page analysis
3,Software
4,Legal issues
4.1,United States
4.2,The EU
4.3,Australia
4.4,India
5,Methods to prevent web scraping
6,See also
7,References

webscraping_5lines.py

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
import requests
from bs4 import BeautifulSoup
response = requests.get("https://en.wikipedia.org/wiki/Web_scraping")
bs = BeautifulSoup(response.text, "lxml")
print(bs.find("p").text)
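
For reference, a slightly more defensive version of the same five-line idea is sketched below. It is not part of the commit: the timeout value, the raise_for_status() check, and the guard for a missing paragraph element are assumptions added here, on top of the same requests + BeautifulSoup calls used in webscraping_5lines.py.

import requests
from bs4 import BeautifulSoup

# Sketch only: same request as webscraping_5lines.py, with basic error handling added.
response = requests.get("https://en.wikipedia.org/wiki/Web_scraping", timeout=10)  # timeout is an assumption
response.raise_for_status()  # stop early on HTTP errors instead of parsing an error page

soup = BeautifulSoup(response.text, "lxml")
first_paragraph = soup.find("p")  # find() returns None if the page has no <p> element
print(first_paragraph.text if first_paragraph else "No paragraph found")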

wiki_toc.py

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
import csv

import requests
from bs4 import BeautifulSoup


def get_data(url):
    # Fetch the article and collect its table of contents entries from the "toc" div.
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    table_of_contents = soup.find("div", id="toc")
    headings = table_of_contents.find_all("li")
    data = []
    for heading in headings:
        heading_text = heading.find("span", class_="toctext").text
        heading_number = heading.find("span", class_="tocnumber").text
        data.append({
            'heading_number': heading_number,
            'heading_text': heading_text,
        })
    return data


def export_data(data, file_name):
    # Write the collected headings to a CSV file with a header row.
    with open(file_name, "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=['heading_number', 'heading_text'])
        writer.writeheader()
        writer.writerows(data)


def main():
    url_to_parse = "https://en.wikipedia.org/wiki/Python_(programming_language)"
    file_name = "python_toc.csv"
    data = get_data(url_to_parse)
    export_data(data, file_name)

    url_to_parse = "https://en.wikipedia.org/wiki/Web_scraping"
    file_name = "web_scraping_toc.csv"
    data = get_data(url_to_parse)
    export_data(data, file_name)

    print('Done')


if __name__ == '__main__':
    main()
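
As a usage sketch (not part of the commit), the two helpers defined in wiki_toc.py could be reused for arbitrary article URLs passed on the command line. The wrapper module name, the scrape_toc() helper, and the file-name derivation below are assumptions for illustration; get_data() and export_data() are imported exactly as defined above.

# scrape_any_toc.py - hypothetical wrapper; assumes it sits next to wiki_toc.py
import sys
from urllib.parse import urlparse

from wiki_toc import get_data, export_data


def scrape_toc(url):
    # Derive an output name such as "Web_scraping_toc.csv" from the article URL.
    article = urlparse(url).path.rsplit("/", 1)[-1]
    file_name = f"{article}_toc.csv"
    export_data(get_data(url), file_name)
    return file_name


if __name__ == '__main__':
    for url in sys.argv[1:]:
        print("Wrote", scrape_toc(url))

For example, running "python scrape_any_toc.py https://en.wikipedia.org/wiki/Web_scraping" would write Web_scraping_toc.csv in the current directory.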

0 commit comments

