Web crawler class

Question 1

I'm using my crawler class in the following manner and I'm beginning to think it's bad practice:

crawler.py

import requests
class Crawler():
    def __init__(self, url):
        self.url = url
    def web_crawler(self):
        requests.get(self.url)
        return requests.text

main.py

for url in urls:
    crawler = Crawler(url)
    results = crawler.web_crawler()

Would it be better to move the url parameter outside of Crawler's __init__ and move it into the web_crawler function? That way the class won't have to be reinitialized multiple times in main.py.
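For illustration, the change described here might look like the sketch below: the URL becomes an argument of web_crawler, so one Crawler instance can serve every URL. The fetch parameter and the fetch_text helper are hypothetical names introduced for this sketch and are not part of the original code; the timeout value is likewise an arbitrary assumption.

```python
def fetch_text(url):
    """Download a page and return its body as text (hypothetical helper)."""
    import requests  # third-party; imported here so a stub can replace this helper

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # raise on 4xx/5xx responses
    return response.text


class Crawler:
    def __init__(self, fetch=fetch_text):
        # Only state shared by every request lives on the instance.
        self.fetch = fetch

    def web_crawler(self, url):
        # The URL is an argument, so the same instance can crawl many pages.
        return self.fetch(url)
```

With this shape, main.py would create crawler = Crawler() once before the loop and call crawler.web_crawler(url) inside it.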

Question 2

return requests.text? Did you at least try to run this code?

Answer 1 · Caridorc · 2016-02-06 14:10:48Z

As the Crawler class has just one method besides __init__, you can avoid a class altogether and write:

import requests

def web_crawler(url):
    response = requests.get(url)
    return response.text

You now have to initialize exactly zero times, which removes the problem at the root:

for url in urls:
    results = web_crawler(url)

The code is also simpler, in both definition and usage.

Answer 2 · Planet_Earth · 2016-02-06 14:02:19Z

You can also create a field named url and use a getter and setter to read or change its value from outside the class.
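In Python, a getter and setter of this kind are usually written as a property. The sketch below is one possible reading of the suggestion, not the answerer's own code: the scheme check in the setter is an illustrative assumption, and web_crawler is omitted because it would read self.url exactly as before.

```python
class Crawler:
    def __init__(self, url):
        self.url = url  # assignment goes through the setter below

    @property
    def url(self):
        """Return the URL this crawler currently points at."""
        return self._url

    @url.setter
    def url(self, value):
        # Validation is an assumption added for illustration only.
        if not value.startswith(("http://", "https://")):
            raise ValueError(f"unsupported URL: {value!r}")
        self._url = value
```

Callers still write crawler.url = new_url, so the attribute syntax stays the same while the class keeps control over what gets stored.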

This may be the start of a good review, but in its current form it doesn't provide much value. Would you care to expand a bit?
– Mast ♦, 2016-02-06 14:08:16 +00:00

It would have been a good idea, but I don't think it's good practice. This reminds me of Java.
– Jonathan, 2016-02-06 14:46:35 +00:00

Well, I just noted that you can take the variable out and provide a mechanism for changing it; that way you only need to change a variable. I must also note that you may have problems with multithreading if it isn't implemented properly.
– Planet_Earth, 2016-02-06 15:05:12 +00:00
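To make the multithreading concern concrete: if several threads share one crawler and each reassigns its url before fetching, one thread can overwrite the value another is about to use. A minimal sketch of one way to sidestep this is to pass the URL as an argument so the threads share no mutable state. The names crawl_all, fetch, and max_workers are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(urls, fetch, max_workers=4):
    """Fetch every URL concurrently and return the results in input order."""
    # Each call receives its own url argument, so no state is shared
    # between worker threads and nothing needs a lock.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Here fetch would be a function such as the web_crawler(url) from Answer 1; Executor.map yields results in the same order as the input URLs.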
