I am using the Python module urllib3
. It occurs to me that there is a better way to design a good class when I rebuild my last site crawler.
class Global:
host = 'http://xxx.org/'
proxy=True
proxyHost='http://127.0.0.1:8087/'
opts=AttrDict(
method='GET',
headers={'Host':'xxxx.org',
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0',
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language':'en-us;q=0.5,en;q=0.3',
'Accept-Encoding':'gzip, deflate',
'Connection':'keep-alive',
'Cookie':'xxxxxxx',
'Cache-Control':'max-age=0'
},
assert_same_host=False
)
def getPool(self,proxy=None):
if proxy is None:
proxy = self.proxy
if(self.proxy):
http_pool = urllib3.proxy_from_url(self.proxyHost)
else:
http_pool = urllib3.connection_from_url(self.host)
return http_pool
class Conn:
def __init__(self, proxy):
self.proxy= proxy
self.pool = Global().getPool(self.proxy)
def swith(self):
self.pool = Global().getPool(not self.proxy)
def get(url, opts=Global.opts):
try:
self.pool.urlopen(
method=opts.method,
url= url,
headers= opts.headers,
assert_same_host=opts.assert_same_host
)
except TimeoutError, e:
print()
except MaxRetryError, e:
# ..
I try to make class contains all global configs and data for others to call, just like I do in JavaScript. But, are these classes too tightly coupled? Should I just merge them together?
2 Answers 2
Class design
A good class in general is one that corresponds to an Abstract Data Type, which is a collection of data and operations that work on that data. A good ADT should have a single, clear responsibility.
The Global
class is not a good ADT:
- It does two unrelated things:
- Contain configuration information
- Manage a proxy pool
- Possibly as a consequence of the previous point, it is poorly named
It would be better to split this class into two:
Configuration
orConfig
: contain configuration informationProxyPoolManager
: manage a proxy pool
Note that ProxyPoolManager
should not reference Configuration
directly:
it will be best to pass to it the proxyHost
and proxy
as constructor parameters.
As the poorly named Conn
class already does half of the proxy pool management,
it would be better to rename it to ProxyPoolManager
and move getPool
method into it and rename Global
to Configuration
.
Coding style
You have several coding style violations, not conforming to PEP8:
- Use spaces around
=
in variable assignments, for example:proxy = True
instead ofproxy=True
opts = AttrDict(...)
instead ofopts=AttrDict(...)
- No need to put parentheses in
if(self.proxy):
- There should be one blank line in front of method definitions of a class: put a blank line in front of
get
andswitch
methods in theConn
class - You mistyped
switch
asswith
Other issues
This code doesn't compile:
class Conn: def get(url, opts=Global.opts): try: self.pool.urlopen(...)
Because it makes a reference to self
which was not passed in as method parameter.
This is a bit hard to understand, because of reusing the name "proxy" both as method parameter and as attribute:
def getPool(self,proxy=None): if proxy is None: proxy = self.proxy if(self.proxy): http_pool = urllib3.proxy_from_url(self.proxyHost) else: http_pool = urllib3.connection_from_url(self.host) return http_pool
The purpose of this method would be more clear if you used a different name for the method parameter.
getPool()
doesn't use the proxy
argument. The method starts by using the class value to set a default value. However, the if statement just uses the class value and ignores the argument it just assigned a default value to.
Conn
? If not, it probably belongs inConn
. If you're worried about creating a separate copy of all of these constants in everyConn
instance, you can make them class variables instance of instance variables. (And you may want to makegetPool
a@classmethod
, too.) \$\endgroup\$