I'm working as a system administrator (sort of), but learning Python on the side. Life gave me a chance to apply that knowledge: our organization is being provided with a cloud email service that has a pretty terrible GUI (without even an option to download a list of user accounts), so the only way to do some things is through its API (which is also not very advanced). I happily jumped at the chance to write something actually useful in Python and wrote the following script, which makes a request to their "users" endpoint, parses the JSON for relevant fields and writes it all to a CSV.
It's a very simple script that does one thing, it works, and I'm the only person who has to use it, so I'm my only critic; still, I tried to make it as self-describing as possible. But I couldn't help wondering what my colleagues would say if I were a software developer working in a team and they had to use it, whether the result bears any resemblance to "production-worthy" code, and what could be improved in terms of performance, security, readability or extensibility.
So I'd very much appreciate it if someone could glance over it and offer some advice. Thanks!
import csv
import os
import sys
from collections import namedtuple
from typing import Iterable, List, Optional

from requests_cache import CachedSession
from prettytable import PrettyTable

from constants import (
    SAVE_DIR, USERS_INPUT_FIELDS, USERS_OUTPUT_FIELDS,
    CACHE_EXPIRATION, USERS_URL, USER_STR_TEMPLATE,
    USERS_PER_PAGE, TOKEN)

OUTPUT_FILE_NAME = 'user.csv'
OUTPUT_FILE_PATH = os.path.join(SAVE_DIR, 'users.csv')


class User(namedtuple('User', USERS_INPUT_FIELDS)):

    def __new__(cls, *args, **kwargs) -> 'User':
        kwargs = {k: v for k, v in kwargs.items() if k in cls._fields}
        return super().__new__(cls, *args, **kwargs)

    @property
    def first_name(self) -> Optional[str]:
        if self.name:
            return self.name.get('first')
        return None

    @property
    def last_name(self) -> Optional[str]:
        if self.name:
            return self.name.get('last')
        return None

    def __repr__(self) -> str:
        return str(self._asdict())

    def __str__(self) -> str:
        return USER_STR_TEMPLATE.format(
            class_name=self.__class__.__name__,
            fields=(self.id, self.nickname))


def get_users(session: CachedSession) -> Iterable[User]:
    headers = {
        'Authorization': f'OAuth {TOKEN}',
        'Content-Type': 'text/plain; charset=utf-8'
    }
    params = {
        'perPage': USERS_PER_PAGE
    }
    response = session.get(
        url=USERS_URL,
        headers=headers,
        params=params)
    response.raise_for_status()
    if 'users' not in response.json():
        raise KeyError('Couldn\'t extract users from server response')
    for json_object in response.json()['users']:
        yield User(**json_object)


def user_to_row(user: User) -> Iterable[List[str]]:
    yield [getattr(user, field) for field in USERS_OUTPUT_FIELDS]


def get_user_rows(session: CachedSession) -> Iterable[List[str]]:
    for user in get_users(session):
        yield from user_to_row(user)


def write_to_csv(rows) -> None:
    with open(OUTPUT_FILE_PATH, 'wt', encoding='utf8') as file:
        writer = csv.writer(
            file,
            dialect=csv.unix_dialect,
            quoting=csv.QUOTE_MINIMAL)
        writer.writerows(
            [USERS_OUTPUT_FIELDS, *rows])


def print_users_table(rows) -> None:
    users_table = PrettyTable()
    users_table.field_names = USERS_OUTPUT_FIELDS
    users_table.add_rows(rows)
    print(users_table)


if __name__ == '__main__':
    with CachedSession(expire_after=CACHE_EXPIRATION) as session:
        rows = list(get_user_rows(session))
        write_to_csv(rows)
        if '-v' in sys.argv:
            print_users_table(rows)
- What sub-version of Python 3 are you using? – Reinderien, Feb 20, 2023 at 15:37
- Hi there! It's Python 3.9. – Gavin Greenhorn, Feb 20, 2023 at 16:12
2 Answers
This is pretty good!
For the purpose of your job, I would suggest leaving this alone unless or until you decide it's broken in some way. It's certainly "good enough".
As a learning experience, there are some things I might raise in code review, but I have to preface this by saying that I'm not experienced with CachedSessions or PrettyTable.
- The indirection of importing the structure of the User class from constants is neat, but it's not good. I assume that flexibility is an important feature of the script, in which case any solution will be as ugly as what you're doing with namedtuple.
- There are still some issues with the User class though. What guarantee do you have that id, name or nickname will have been included in USERS_INPUT_FIELDS? Why are you comfortable leaving name as a dictionary (or, conversely, why bother making User a convoluted class if a dictionary is ok)?
- I'm assuming constants is a file full of settings you expect to change often, and may maintain different copies of for different clients. (If not, then just pull all those constants into this same file.) I'm not fond of implementing config settings as importable code; I've had co-workers start adding complex (or even side-effect-ful!) code in settings files, and other solutions are actually more flexible. I think my suggestion would have been to implement all the settings as CLI arguments, set up parsing/validation/defaults with argparse, and then finally allow loading of settings from a file using fromfile_prefix_chars. This would probably also require that these values get passed around as function arguments instead of file-level globals, which is good. (See the first sketch after this list.)
- OUTPUT_FILE_NAME isn't used, you just hard-coded the same string on the next line. At the same time, why isn't this configurable like the other constants?
- For typing: a return type is a promise, so don't promise more than you need to. Specifically, annotating something as a List is an invitation for the consumer to mutate it. (In the case of an argument annotation, it would mean the function was insisting on the right to mutate its arg.) Use Sequence to denote an indexable, re-usable iterable, or Iterable if you want to be really conservative. The same logic applies for Mapping vs dict, and usually applies for AbstractSet vs Set.
- Consider using csv.DictWriter. You can judge from the doc and example if it's a good fit; I think it usually is.
- You've written parts of this in "streaming" style (using iterators/generators instead of just lists). That's usually good! But then you break it when you get to main, I assume because you want to use rows twice. I suggest thinking about it in terms of failure.
  - Right now there doesn't seem to be a strategy for what should happen if something goes wrong, but you're pretty close to one sensible strategy: if something goes wrong, nothing happens and an error is thrown. In that paradigm I think the only things I would change would be moving print_users_table before write_to_csv, and possibly adding a "Success!" message. It would also be possible to build the entire CSV as a str and write it to the file at the very end, but that might be more trouble than it's worth.
  - Two other strategies would be: records are written to the file streaming-style until an error occurs, or records are written streaming-style with an effort to log errors and keep going if an error occurs. In a lot of contexts that last option is best; it'd just require refactoring most of your functions! (See the second sketch after this list.)
- Thanks a lot for taking time to respond, I'll try to implement what you suggested. It would be a nice option to use csv.DictWriter, I guess, if I could just pass User._asdict() to it, but since it has a nested dictionary in the .name field, I can't do that without implementing additional logic, so I just went for a simpler option. Although it's certainly something to think about: some uniform way to flatten all nested JSON object fields upon initialization of the class. Right now I can't come up with such a solution. The class itself was built around the idea of getting nested fields. – Gavin Greenhorn, Feb 20, 2023 at 17:44
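For what it's worth, one possible way to flatten the nested name field at write time, so that csv.DictWriter could be fed plain dictionaries, is sketched below; it is based only on the fields visible in the code above and is not part of the original script:

def user_to_flat_dict(user: 'User') -> dict:
    # Start from the namedtuple's own fields, then replace the nested
    # 'name' dictionary with flat first/last columns.
    row = user._asdict()
    row.pop('name', None)
    row['first_name'] = user.first_name
    row['last_name'] = user.last_name
    return row

The resulting dictionaries could then go straight into csv.DictWriter(file, fieldnames=...), provided the chosen field names match the output columns.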
Consider making SAVE_DIR a pathlib.Path so that you can write
OUTPUT_FILE_PATH = SAVE_DIR / 'users.csv'
Consider changing User to use the new NamedTuple superclass, and move its fields from USERS_INPUT_FIELDS to member declarations:
class User(NamedTuple):
    field1: str
    field2: int
    ...

# remove __new__()
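If the API can return keys that aren't declared on the class, the filtering that __new__() used to do could move into a small constructor instead. A minimal sketch, assuming the declared fields are id, nickname and name (from_json is a hypothetical helper, not part of the suggestion above):

from typing import NamedTuple, Optional

class User(NamedTuple):
    id: str
    nickname: str
    name: Optional[dict] = None

    @classmethod
    def from_json(cls, data: dict) -> 'User':
        # Keep only declared fields, mirroring the old __new__() filtering.
        return cls(**{k: v for k, v in data.items() if k in cls._fields})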
This header seems misplaced:
'Content-Type': 'text/plain; charset=utf-8'
because you don't actually send any body. More sensible would be to ask for a content type via Accept. It seems like you would ask for application/json.
Don't call response.json() twice. Put it in a variable.
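Putting those two points together, the request part of get_users() might look something like this (a sketch reusing the script's own constants):

headers = {
    'Authorization': f'OAuth {TOKEN}',
    'Accept': 'application/json',
}
response = session.get(USERS_URL, headers=headers, params={'perPage': USERS_PER_PAGE})
response.raise_for_status()
payload = response.json()  # parse once, reuse below
if 'users' not in payload:
    raise KeyError("Couldn't extract users from server response")
for json_object in payload['users']:
    yield User(**json_object)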
Consider simplifying the quotes of 'Couldn\'t extract users from server response' with
"Couldn't extract users from server response"
Add a typehint for the argument in write_to_csv(rows) and print_users_table(rows).
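For example (Iterable and Sequence are used here in line with the typing advice from the first answer; adjust if the row values aren't all strings):

from typing import Iterable, Sequence

def write_to_csv(rows: Iterable[Sequence[str]]) -> None: ...
def print_users_table(rows: Iterable[Sequence[str]]) -> None: ...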
It's good that you have a __main__ guard, but it isn't enough. The symbols session and rows will be global. Move that code to a function.
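A minimal version of that refactor, keeping the existing behaviour:

def main() -> None:
    with CachedSession(expire_after=CACHE_EXPIRATION) as session:
        rows = list(get_user_rows(session))
        write_to_csv(rows)
        if '-v' in sys.argv:
            print_users_table(rows)

if __name__ == '__main__':
    main()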
- Thanks for taking time to respond! Could you please expand on what the benefits are of using a NamedTuple class instead of a factory function? I decided to use the latter simply because it gives me an option to change the set of fields that I want extracted from a JSON object (get 'age', 'gender', 'job title' etc. when there is a need for them) pretty quickly and without ever touching the existing implementation of my class (that's the idea, at least... probably a naive one). Thanks for the other suggestions, I'll try to implement a main() function to shield variables from exposure to the global scope. – Gavin Greenhorn, Feb 20, 2023 at 17:51
- NamedTuple is simpler, more self-contained and has much better type safety. Being able to "change the set of fields that you want extracted without ever touching the implementation of your class" seems like a solution in need of a problem. It's adding complexity where that isn't warranted. In both cases you would need to modify code. – Reinderien, Feb 20, 2023 at 18:49
- There actually is a problem that warrants a solution like that, which is a boss saying "Give me a list of employees from department X wearing red shoes, you have 5 minutes, go". :) But I hope that I understand your point about type safety and unnecessary complexity. It's just that I'm willing to compromise for the speed payoff in that very specific case. If that were an actual production application, I wouldn't. Thank you for your perspective. – Gavin Greenhorn, Feb 20, 2023 at 19:11