README.md: 8 additions & 3 deletions
--- a/README.md
+++ b/README.md
@@ -22,8 +22,9 @@ To run our example scraper, you are going to need these libraries:
 ## List of contents
 
 - [Introduction](#introduction)
-- [Be polite](#be_polite)
-- [Let's get to it](#lets_get_to_it)
+- [Be polite](#be-polite)
+- [Let's get to it](#lets-get-to-it)
+-- [Inspecting the site](#inspecting-the-site)
 
 ## Introduction
 
@@ -35,12 +36,16 @@ Just as you are polite and caring in the real world, you should be such online a
 
 ## Let’s get to it
 
-### Inspecting the site
 In the following tutorial, you will not only see how a basic scraper is written but will also learn how to adjust it to your own needs. Moreover, you will learn how to do it via a proxy!
 
 As mentioned, we will be using these libraries:
 Requests
 BeautifulSoup 4
 The page we’re going to scrape is http://books.toscrape.com/. It doesn’t have robots.txt, but I think we can agree that the name of the site is asking you to scrape it. But before we carry on with the coding part, let's inspect the website first.
 
+### Inspecting the site
+
 So, this is what the main page of the website looks like. We can see it contains books, their titles, prices, ratings, availability information, and a list of genres in the sidebar.
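
The table-of-contents change in this diff swaps underscores for hyphens in the link targets (`#be_polite` becomes `#be-polite`), which matches how GitHub generally derives heading anchors: lowercase the heading, drop punctuation, and turn spaces into hyphens. The sketch below only approximates that rule so the new links can be checked against the headings. The function name `heading_anchor` is hypothetical, and GitHub's real slug logic has extra cases (duplicate headings, for example) that are not covered here.

```python
import re


def heading_anchor(heading: str) -> str:
    """Approximate the anchor GitHub derives from a Markdown heading.

    This is a simplified assumption, not GitHub's actual implementation.
    """
    slug = heading.strip().lower()
    # Drop everything except word characters, spaces, and hyphens.
    # This removes both the straight (') and the curly (’) apostrophe.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Each remaining space becomes a hyphen.
    return slug.replace(" ", "-")


# Headings taken from the README sections touched by this diff.
for heading in ("Be polite", "Let’s get to it", "Inspecting the site"):
    print(f"#{heading_anchor(heading)}")
```

Under this approximation, the three headings map to the hyphenated targets used on the added lines, while the removed underscore targets would not match any heading.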