Python - Requests module HTTP and HTTPS requests

Question 1

I want to make requests with the Python requests module. I have a large database of URLs that I want to download. The URLs are stored in the database in the form page.be/something/something.html.

I get a lot of ConnectionErrors. If I open the same URL in my browser, the page exists.

My Code:

if not webpage.url.startswith('http://www.'):
    new_html = requests.get(webpage.url, verify=True, timeout=10).text

An example of a page I'm trying to download is carlier.be/categorie/jobs.html. This gives me a ConnectionError, logged as below:

Connection error, Webpage not available for "carlier.be/categorie/jobs.html" with webpage_id "229998"

What seems to be the problem here? Why can't requests make the connection, while I can find the page in the browser?

Answer

The Requests library requires that you supply a scheme for it to connect with (the 'http://' part of the URL). Make sure that every URL has http:// or https:// in front of it. You may want a try/except block where you catch requests.exceptions.MissingSchema and try again with "http://" prepended to the URL.
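A minimal sketch of the try/except approach described above. The function name `fetch_html` and its `get` parameter are hypothetical, introduced only for illustration; `get` defaults to `requests.get` and is there so the network call can be swapped out. It assumes that plain "http://" is an acceptable fallback for every URL in the database.

```python
import requests


def fetch_html(url, get=requests.get, timeout=10):
    """Fetch url, retrying once with 'http://' prepended if the scheme is missing.

    Hypothetical helper: `get` defaults to requests.get and is a parameter
    only so that the call can be replaced, for example in a test.
    """
    try:
        return get(url, timeout=timeout).text
    except requests.exceptions.MissingSchema:
        # requests rejects URLs that have no scheme, so retry with one prepended.
        return get("http://" + url, timeout=timeout).text
```

With this in place, the call in the question would become something like `new_html = fetch_html(webpage.url)`.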

Comment 1

So what would be a good code snippet for trying with http and https? Catching ConnectionError and retrying with https doesn't seem like a good way to do it.

Comment 2

@SandervanDorsten I would process the URL string before attempting to make the request. Are all of the URLs supposed to go over http[s]? If that's the case, then you could even just check the first 4 characters of the string, and if they're not "http", prepend http:// or https:// to the URL before making the request. The other obvious answer is a regular expression to determine whether the string already contains a scheme.
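The preprocessing idea from this comment could look roughly like the sketch below. The helper name `ensure_scheme` and its `default_scheme` parameter are hypothetical, and the sketch assumes every URL in the database is meant to be fetched over http or https.

```python
import re

# Matches a leading "http://" or "https://", case-insensitively.
_SCHEME_RE = re.compile(r"^https?://", re.IGNORECASE)


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already starts with http(s)://, else prepend a scheme.

    Hypothetical helper; default_scheme is an assumption about which
    protocol the stored URLs should use.
    """
    url = url.strip()
    if _SCHEME_RE.match(url):
        return url
    return f"{default_scheme}://{url}"
```

The request would then be made on the normalized string, for example `requests.get(ensure_scheme(webpage.url), timeout=10)`. Normalizing first avoids relying on an exception for control flow, at the cost of having to pick one default scheme up front.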

Danny Dyla · Accepted Answer · 2016-02-09 13:29:51Z


Sander van Dorsten · Comment · 2016-02-09 14:21:11Z

Danny Dyla · Comment · 2016-02-10 03:22:08Z
