I want to make requests with the Python requests module. I have a large database of URLs I want to download. The URLs are stored in the database in the form page.be/something/something.html.
I get a lot of ConnectionErrors. If I open the same URL in my browser, the page exists.
My code:
    import requests

    if not webpage.url.startswith('http://www.'):
        new_html = requests.get(webpage.url, verify=True, timeout=10).text
An example of a page I'm trying to download is carlier.be/categorie/jobs.html. This gives me a ConnectionError, logged as below:
Connection error, Webpage not available for "carlier.be/categorie/jobs.html" with webpage_id "229998"
What seems to be the problem here? Why can't requests make the connection, while I can find the page in the browser?
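Here is a quick check with the standard library's urllib.parse. It is a minimal sketch, and only the example URL comes from this question. It shows how such a stored URL is parsed: it has no scheme and no host, only a path.

```python
from urllib.parse import urlsplit

url = "carlier.be/categorie/jobs.html"  # example URL from the question

# The guard in the question tests only for the literal prefix "http://www.".
# This is False, so the "if not ..." branch calls requests.get() on the bare URL.
print(url.startswith("http://www."))

# Without a "scheme:" prefix (and without a leading "//"), urlsplit treats
# the whole string as a path: scheme and netloc are both empty.
parts = urlsplit(url)
print(repr(parts.scheme), repr(parts.netloc), parts.path)
```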
Answer
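The Requests library requires every URL to include a scheme (the "http://" part of the URL). Your check only skips URLs that already start with "http://www.", so URLs stored as carlier.be/categorie/jobs.html are passed to requests.get() without a scheme. Make sure every URL starts with http:// or https://. You can prepend the scheme before making the request. Alternatively, wrap the call in a try/except block that catches requests.exceptions.MissingSchema (the exception requests raises for a URL with no scheme) and retries with "http://" prepended to the URL.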
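The sketch below shows both options. It assumes the stored URLs should default to http://. The helper names with_scheme and fetch_html are hypothetical and not part of requests; requests.get and requests.exceptions.MissingSchema are the library's own.

```python
import requests


def with_scheme(url: str, scheme: str = "http") -> str:
    """Hypothetical helper: return url unchanged if it already has a scheme,
    otherwise prepend one (defaulting to http, an assumption)."""
    return url if "://" in url else f"{scheme}://{url}"


def fetch_html(url: str, timeout: float = 10) -> str:
    """Hypothetical helper: download a page, retrying once with http://
    prepended if requests rejects the URL for lacking a scheme."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.MissingSchema:
        # Raised while the request is being prepared, before any network I/O,
        # for a URL such as "carlier.be/categorie/jobs.html".
        response = requests.get(with_scheme(url), timeout=timeout)
    return response.text
```

In the code from the question, new_html = fetch_html(webpage.url) would replace the direct requests.get() call. Calling with_scheme() on every URL up front avoids relying on the exception at all. Requests follows redirects for GET by default, so a site that redirects http:// to https:// should still be fetched.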