This is the entire code segment. All the functions for file operations are working perfectly. I just need a professional to help me clarify that am following good standards, documentation, etc., because I am going to start doing professional work very soon and I need my code to be at a professional level.
import os
class FileHandle:
def __init__(self, file_path):
self.file_path = file_path
def file_exists(self) -> bool:
"""Determine if a file exists"""
if os.path.exists(self.file_path):
return True
return False
def read_string(self) -> str:
"""Read the string format of a file"""
with open(self.file_path, "r") as current_file_handle:
return current_file_handle.read()
def read_bytes(self) -> bytes:
"""Read the byte format of a file"""
with open(self.file_path, "rb") as current_file_handle:
return current_file_handle.read()
def read_string_lines(self) -> list:
"""Get a list of lines in string format from file"""
with open(self.file_path, "r") as current_file_handle:
return current_file_handle.readlines()
def read_byte_lines(self) -> list:
"""Get a list of lines in byte format from file"""
with open(self.file_path, "rb") as current_file_handle:
return current_file_handle.readlines()
def write_string(self, string: str) -> int:
"""Overwrite the file with a string"""
with open(self.file_path, "w") as current_file_handle:
current_file_handle.write(string)
return 0
def write_bytes(self, string: bytes) -> int:
"""Overwrite the file with bytes"""
with open(self.file_path, "wb") as current_file_handle:
current_file_handle.write(string)
return 0
def append_string(self, string: str) -> int:
"""Add a string to a file"""
with open(self.file_path, "a") as current_file_handle:
current_file_handle.write(string)
return 0
def append_bytes(self, string: bytes) -> int:
"""Add a byte string to a file"""
with open(self.file_path, "ab") as current_file_handle:
current_file_handle.write(string)
return 0
def copy(self, dist: str) -> int:
"""Copy the file to the destination"""
os.system(f'copy "{self.file_path}" {dist}')
if os.path.exists(dist):
return 0
return 1
def move(self, dist: str):
"""Move the file to the destination"""
os.system(f'move "{self.file_path}" {dist}')
if os.path.exists(dist):
return 0
return 1
def delete(self) -> int:
"""Delete the file"""
os.remove(self.file_path)
if self.file_exists():
return 1
return 0
def add_attribute(self, attribute: str):
"""Add attributes to the file
Possible attributes:
1.H - for hidden file
2.A - for archive file
3.R - for readonly file
4.S - for system file
"""
os.system(f"attrib +{attribute.upper()} {self.file_path}")
def remove_attribute(self, attribute: str):
"""Removes attributes from the file
Possible attributes:
1.H - for hidden file
2.A - for archive file
3.R - for readonly file
4.S - for system file
"""
os.system(f"attrib -{attribute.upper()} {self.file_path}")
def rename(self, new_name: str) -> int:
"""Renames the file"""
os.system(f"ren {self.file_path} {new_name}")
if not os.path.exists(new_name):
return 1
return 0
6 Answers 6
I'm about to say a lot of fairly unkind-sounding things. I'll say upfront, though, that this is generally good, simple, well-documented code. Unfortunately, it makes a pretty serious mistake that I think you would greatly benefit from understanding. If it were to be presented to me in a professional setting, I'd block any pull request including it as hard as I possibly could.
This code is dangerous
Suppose another developer were to take your class and use it like this:
fh = FileHandle(some_user_input_file)
fh.move(f"backup/{some_user_input_file}")
They'd expect that this would cause the file input by the user to be moved to the backup directory. All fine.
Now, I can't test this out myself right now so take it with a little grain of salt, but I bet that if a user were to input the filename a" b & calc & echo "
, something quite surprising would happen. Take a look at this code:
some_user_input_file = """ a" b & calc & echo " """
fh = FileHandle(some_user_input_file)
fh.move(f"backup/{some_user_input_file}")
Think for a bit about what you expect to happen before running. Stop reading this answer and run it before you carry on.
You should have seen the Windows calculator pop up. If you didn't, I've screwed up that filename somewhere, which makes my point a little weaker, but fairly modest tweaks would still cause this vulnerability. In brief, your code would ultimately execute:
os.system(f'move " a" b & calc & echo " " backup/...')
This shell command will move some random file to somewhere else, then it'll run calc
, then it'll echo some stuff to the terminal. Now suppose the command wasn't calc
and was instead some command that wiped your hard drive, or downloaded some ransomware, or some other horrid thing.
I don't think anyone using a FileHandle
class would expect it to be at all capable of doing that under any circumstances. If any developer were to use this class, they'd have to know that, if they failed to properly validate user data, your class could start executing arbitrary code.
This sort of surprising behaviour is exactly what caused the absolutely terrifying log4shell vulnerability - a logging library was able to execute arbitrary code, developers quite reasonably weren't expecting this so didn't validate the input themselves, and now millions of systems are vulnerable.
I can come up with three mitigations, in descending order of preference:
Remove your reliance on scary functions like os.system
You almost never need to execute shell commands - shell commands are usually just wrappers for much simpler and cleaner APIs. If you're interfacing with some third-party application, just type "name of application python API" into your favourite search engine and use that where possible. It'll be much easier and much safer.
In your case, there's no need to use os.system
at all. The commands you're using can be replaced with shutil.copy
, shutil.move
, and win32api.SetFileAttributes
.
Given the risks associated with os.system
and the fairly straightforward alternatives, this is by far the best idea
If you absolutely have to shell out, use subprocess.run
with a list of arguments
In the incredibly unusual event that you absolutely have to use a shell command, subprocess.run
is far more secure. Rather than separating your commands with spaces, you'd put them in a list. Something like this:
subprocess.run(["move", self.file_path, dist])
This makes sure that arguments are passed directly to the subprocess, without going via a shell that might get confused between what's an argument to a command and what's a separate command entirely.
You should not use subprocess.run
in this case, since you do not need to shell out at all. Use my first suggestion, but know that this is an option for the rare case where you do need to shell out.
Validate input as though your users are attackers
This is very hard to get right. Sometimes, though, you really really need to accept user input and pass it to something dangerous.
In this case, have strict rules about what characters are valid. On windows, for instance, a filename cannot contain certain characters. At the very least, all methods that expect to take a file path should check that the input it's been given doesn't contain those characters and raise ValueError
if it does. It's worth doing that validation anyway, since catching errors early makes your code easier to use.
Test your validation rigorously, and look up other characters that might have a special meaning and how to handle them safely, and be distinctly aware that your validation may not necessarily be complete, and that safer alternatives (like my other two solutions!) may also exist and be easier to use.
-
3\$\begingroup\$ Great answer! . \$\endgroup\$N3buchadnezzar– N3buchadnezzar2022年02月02日 11:53:23 +00:00Commented Feb 2, 2022 at 11:53
-
1\$\begingroup\$ I get your drift, but on (some) Linux systems the result is "Command 'calc' not found, but can be installed with: sudo apt install calc". \$\endgroup\$Peter Mortensen– Peter Mortensen2022年02月03日 13:32:31 +00:00Commented Feb 3, 2022 at 13:32
-
9\$\begingroup\$ @PeterMortensen The use of
copy
,move
,ren
andattrib
strongly implies that this code is designed only to run on Windows, so I provided a Windows-specific PoC. Other answers have touched on cross-platform compatibility, which is worth talking about, but getting into that in my answer would have detracted from the main point that I'm trying to make. \$\endgroup\$ymbirtt– ymbirtt2022年02月03日 13:35:20 +00:00Commented Feb 3, 2022 at 13:35 -
\$\begingroup\$ Thanks a lot I appreciate your time \$\endgroup\$jamie_0101– jamie_01012022年02月04日 12:41:26 +00:00Commented Feb 4, 2022 at 12:41
Private members
You should mark file_path
as not part of the public API for the class. You accomplish this with a leading underscore in the member name. eg)
class FileHandle:
def __init__(self, file_path):
self._file_path = file_path
# Replace all remaining occurances of `file_path` with `_file_path`
Pathlib
The os.path
library is more-or-less deprecated. Look in pathlib
for replacements.
Return values
.write_string()
, .write_bytes()
, .append_string()
, and so on return an int
, but the code actually only ever returns 0
. Why bother with a return value at all? Remove it.
Success/failure
.copy()
, .move()
, .delete()
and .rename()
return 1
/0
for success/failure. Boolean values are more professional. As in, use the bool
as the return type hint, and explicitly return True
or False
.
Note: .move()
doesn't even have a return type indicated.
Spelling
dist
is a typical abbreviation for distance
. You probably want dest
as an abbreviation for destination
.
Security
os.system(f'copy "{self.file_path}" {dist}')
Why have you protected self.file_path
in quotation marks? Because you might have spaces in the filename? What about the target? Shouldn't it be quoted as well?
You are not checking for any invalid characters in either path. You should ensure file_path
doesn't contain invalid characters during the constructor, and also ensure the dist
targets doesn't either. Depending on the shell os.system()
uses, this it a ripe for malicious code injection!
Representation
You should implement a __repr__
function. The standard is to return a string which looks like the constructor for the object, i.e.,
class FileHandle:
...
def __repr__(self) -> str:
return f"FileHandle({self.file_path!r})"
You might also consider implementing __str__
.
Documentation
Your FileHandle
class should have a """docstring"""
as well. The module (file) should also have one.
All of the """docstring"""
should be expanded to include more than one terse sentence. Document the parameters they accept, and what is returned.
Efficiency
Class objects by default contains a dictionary for the members. Your objects have exactly one member: file_path
. You should declare this to reduce the memory footprint:
class FileHandle:
__slots__ = ('_file_path', )
def __init__(self, file_path: str):
self._file_path = file_path
...
Naming
string
is a module which can be imported. Using it as a parameter name will hide that module, preventing easy access to it. Consider using a different name for the parameter.
Attributes
Consider creating an Enum
or a Flag
for possible attributes
from enum import Enum
class Attribute(Enum):
HIDDEN = 'H'
ARCHIVE = 'A'
READ_ONLY = 'R'
SYSTEM = 'S'
-
\$\begingroup\$
The os.path library is more-or-less deprecated.
I was not aware of this. Do you have a source? \$\endgroup\$AbrahamCoding– AbrahamCoding2022年02月03日 08:00:22 +00:00Commented Feb 3, 2022 at 8:00 -
3\$\begingroup\$ @AbrahamCoding By "more or less" I think he means "not officially". But PathLib provides all the same features in a more modern way, and is clearly intended to replace it. \$\endgroup\$Barmar– Barmar2022年02月03日 15:24:12 +00:00Commented Feb 3, 2022 at 15:24
-
\$\begingroup\$ Thanks a lot. I appreciate your time \$\endgroup\$jamie_0101– jamie_01012022年02月04日 12:42:30 +00:00Commented Feb 4, 2022 at 12:42
One more warning bell here. You should not perform any shell invocation unless absolutely necessary, and that comes with pitfalls and security considerations that have to be fully understood.
You should certainly not be doing shell just to copy a file when the built-in shutil does that in a safer way.
The problem with this class is that it's just a wrapper around the existing functions. I can't see the added value here. What is the problem you are trying to solve? How does that class simplify file handling in your day to day work? In other words, how would you justify its existence?
The class could be useful if it were cross-platform and providing some kind of unifying interface or abstraction to the underlying OS and file system.
In fact the code is bloated if you consider this example:
def file_exists(self) -> bool:
"""Determine if a file exists"""
if os.path.exists(self.file_path):
return True
return False
Since os.path.exists
returns a boolean result, your code can be shortened as:
def file_exists(self) -> bool:
"""Determine if a file exists"""
return os.path.exists(self.file_path):
At this point it is obvious that the function does not perform any extra useful work and you could just as well do without it.
The class is probably not complete, for instance you can set and clear attributes, but what if you want to query attributes?
My suggestion is to read the documentation for os
, os.path
and shutil
to get a good grasp of what is already available to you as a programmer. Don't reinvent the wheel. Focus your energy on implementing new functions or special needs that are not already covered and good luck with your career.
I don't want to discourage you. I think you have made an effort to implement certain good practices and write modern code. There are docstrings, type hinting, f-strings, etc. The code is otherwise clean and easy to understand. But it won't make a strong impression in a job interview. You will be expected to perform slightly more advanced stuff.
-
7\$\begingroup\$ They did tag it "reinventing the wheel". They're practicing class design, not implementing something that's actually needed. \$\endgroup\$Barmar– Barmar2022年02月03日 15:25:25 +00:00Commented Feb 3, 2022 at 15:25
-
1\$\begingroup\$ @Barmar I would say that "reinventing the wheel" would still need to be adding something of value, which isn't wrappers around existing methods. For example, reimplementing aspects of
os
,os.path
, orshutil
. This isn't reinventing the wheel so much as covering over an existing wheel. (Also, OP didn't originally tag it as such, for what that's worth.) \$\endgroup\$Teepeemm– Teepeemm2022年02月04日 15:16:43 +00:00Commented Feb 4, 2022 at 15:16
At the moment, your class only fully works on Windows. On Mac OS and Linux all methods that call a command line tool with os.system
will fail. You should make this limitation clear (eg in the module docstring) and add a guard for it:
import os
if os.name != "nt":
raise ValueError("This class only fully works under Windows")
or actually implement it for the other operating systems:
if os.name == "nt":
os_copy = 'copy "{source}" "{target}"'
...
elif os.name == "posix":
os_copy = 'cp "{source}" "{target}"'
...
else:
raise NotImplementedError()
class File:
...
def copy(self, target):
os.system(os_copy.format(source=self.file_path, target=target))
...
Although you should probably use the subprocess
module to call external tools, it give you a lot more control.
For moving (and renaming), there's actually os.rename
.
-
3\$\begingroup\$ "or actually implement it for the other operating systems" - my favorite option. Windows shmindows. \$\endgroup\$Noah Broyles– Noah Broyles2022年02月02日 18:24:41 +00:00Commented Feb 2, 2022 at 18:24
Welcome to Code Review!
Your code is mostly just lightly wrapping standard functions, so it is pretty much good as it is in my opinion, the only improvement that I can think of is in the add_attribute
function:
def add_attribute(self, attribute: str):
"""Add attributes to a file
Possible attributes:
1.H - for hidden file
2.A - for archive file
3.R - for readonly file
4.S - for system file
"""
os.system(f"attrib +{attribute.upper()} {self.file_path}")
If only some attributes are permissible then you should also implement this in the code:
PERMISSIBLE_ATTRIBUTES = ["H", "A", "R", "S"]
def add_attribute(self, attribute: str):
"""Add attributes to a file
Possible attributes:
1.H - for hidden file
2.A - for archive file
3.R - for readonly file
4.S - for system file
"""
if attribute not in self.PERMISSIBLE_ATTRIBUTES:
raise ValueError(f"Attribute must be one of {self.PERMISSIBLE_ATTRIBUTES}")
os.system(f"attrib +{attribute.upper()} {self.file_path}")
And the same for attribute removal.
-
1\$\begingroup\$ The permissible attributes are just
H
,A
,R
,S
. The numbers come from an ordered list and are not part of the API surface \$\endgroup\$Ben Voigt– Ben Voigt2022年02月04日 19:44:55 +00:00Commented Feb 4, 2022 at 19:44
Use pathlib
, a well-designed module in the standard library (3.4+) that wraps a file path string into an object and provides convenient methods just like your design.
There are a number of problems with types:
def __init__(self, file_path):
should bedef __init__(self, file_path: str):
.def read_string_lines(self) -> list:
should betyping.List[str]
orlist[str]
(3.9+).All of the functions returning
int
values of0
or1
should instead returnbool
values ofFalse
orTrue
- unless you think there is a reason in the future to return other values like-1
or2
, in which case you should change fromint
to an enum.
You can catch these type errors using mypy
--strict yourfile.py
.
The methods write_string()
, copy()
, etc. seem to return 0
for success and 1
for failure. This matches the shell scripting convention but is the opposite of conventions within programming languages.
Even though the methods copy()
and move()
check if the destination exists after performing the operation, this doesn't necessarily mean success - the destination file could have existed before the operation and the operation was not permitted for whatever reason.
The rename()
method doesn't quote file paths, unlike the implementation copy()
etc; this is inconsistent.
Explore related questions
See similar questions with these tags.
shutil
) for moving and copying files. Thecopy
command doesn't work on Linux, for example, aside from the fact that it is a security risk to run un-validated arguments in a subshell. \$\endgroup\$Args
,Returns
andRaises
sections which tell users what to expect from your function/method/class. \$\endgroup\$