I am working on a project right now, where we need to search different APIs, then clean & persists data to database and finally pushing to Elasticsearch.
The high level code looks like this:
class BaseParser(object):
def __init__(self, source):
self.url = source.url
self.api_key = source.api_key
def _prepare_auth_header(self):
headers = {
"Authorization": "Basic {}".format(self.api_key)
}
return headers
def get_response(self):
raise NotImplementedError()
class UnitedexchangeParser(BaseParser):
def get_response(self):
headers = self._prepare_auth_header()
response = requests.get(self.url, headers=headers)
return response.json()
class DubaiParser(BaseParser):
def get_response(self):
headers = self._prepare_auth_header()
response = requests.get(self.url, headers=headers)
return response.json()["data"]
class ParserFactory(object):
def parse(self, source):
if source.name == "Unitedexchange":
parser = Unitedexchange(source)
elif source.name == "Dubai":
parser = DubaiParser(source)
return parser.get_response()
The application is working fine, but we are adding more and more new sources which leads to change ParserFactory
with same block of code.
How can I improve the layout and design such a way so we do minimal change for adding new parser? Performance is also an issue, and right now we are loading lots of individual classes which basically doing the same thing. It would be great if you can give some hint on that, too.
-
\$\begingroup\$ @Graipher headers missing was intentional, rather than going through implementation details, I wanted to focus on class design. I updated the code. Would be great if you can provide some feedback. \$\endgroup\$Alex benz– Alex benz2016年11月26日 20:57:08 +00:00Commented Nov 26, 2016 at 20:57
2 Answers 2
A few small comments:
- When not supplying any argument to an exception, the
()
can be skipped:raise NotImplementedError
. - I'm assuming
ParseFactory
is actually longer, otherwise there would be no need for it to be a class. Aparse
function on its own would be perfectly fine. - It would be better to make a dictionary of parsers, especially if the number of parsers grows:
class ParserFactory(object):
parsers = {"Unitedexchange": UnitedexchangeParser,
"Dubai": DubaiParser}
def parse(self, source):
parser = ParserFactory.parsers[source.name]
return parser.get_response()
This does not solve the problem of having to implement every parser.
If they are sufficiently similar, namely almost all have the same get_response
, you could just put a standard implementation in BaseParser
and use
class DubaiParser(BaseParser):
pass
Or, you could use type
to create those classes that do not need any modifications dynamically and put them in a dictionary:
parsers = {"DubaiParser": None, "OtherParser": None, ...}
for name in parsers:
parser = type(name, (BaseParser,), {})
parser.__module__ = __name__
parsers[name] = parser
There is even the possibility to override some methods, but this only makes sense if there are two different implementations of get_response
def get_response_json(self):
headers = self._prepare_auth_header()
response = requests.get(self.url, headers=headers)
return response.json()["data"]
other_methods = {"get_response": get_response_json}
parsers = {"DubaiParser": None, "OtherParser": None, ...}
different_get_response = {"OtherParser", ...}
for name in parsers:
if name in different_get_response:
parser = type(name, (BaseParser,), {}, lambda ns: ns.update(other_methods)
else:
parser = type(name, (BaseParser,), {})
parser.__module__ = __name__
parsers[name] = parser
-
\$\begingroup\$ Interesting, I also had this idea to keep the parsers name in a dictionary. But it does not directly solve the issue of "implementing" a new parser for every new Source. \$\endgroup\$Alex benz– Alex benz2016年11月26日 21:21:20 +00:00Commented Nov 26, 2016 at 21:21
-
\$\begingroup\$ @Alexbenz Have a look at my updated answer for some possible code to automatically generate those parsers. \$\endgroup\$Graipher– Graipher2016年11月26日 21:44:16 +00:00Commented Nov 26, 2016 at 21:44
The code you've posted does not look like some real code example so I don't see much of reasons to comment it.
To answer your question how to register parsers automatically you can use either metaclasses
or decorators
for it. I would not go for metaclass
while I find it interesting to use, this makes you code hard to understand so I will show you how to do it using decorators
.
PARSERS_MAP = {}
def register_parser(inner):
PARSERS_MAP[inner.__name__] = inner
return inner
@register_parser
class FooParser:
pass
class ParserFactory(object):
def parse(self, source):
parser = PARSERS_MAP[source.name]
return parser(source).get_response()
As for your second question about performance, to be honest, I didn't get it. please show the code you are having problems with.
-
1\$\begingroup\$ If you think the question is hypothetical (thus off-topic), you can flag it as such. If you don't understand (parts of) it, you can flag it as unclear what you're asking. Either way you can leave a comment for clarification. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2016年11月27日 11:02:03 +00:00Commented Nov 27, 2016 at 11:02
Explore related questions
See similar questions with these tags.