I am writing a Python package that depends on a number of large data files. These files are not included with the package. Instead, users are required to have these files on disk already (at an arbitrary path). What is the best way to make my package aware of the location of these files?
I have been reading about setup.py and setup.cfg, but I am still not sure how to do this. It seems to me that a user-editable option in setup.cfg would be a good way to go, but I don't know whether it is, whether it can be done at all, or how I would do it if so...
I did see this almost identical question, Python Packaging: Ask user for variable values when (pip) installing, which focuses on asking for user input during pip installation (an approach discouraged in its comments). If that would actually be a good solution, I'm interested in how to do that too.
In my private development of the package, I have used module constants, as in
DEFAULT_PATH_FILE1 = "my/path/to/file1.csv"
DEFAULT_PATH_FILE2 = "my/path/to/file2.csv"
etc. and properties initialized with these constants. This doesn't seem viable at all for distribution.
- Important for context: does your package provide a command-line interface? (Arne, Jun 1, 2021)
- No, that is not the intention. It (mainly) defines a class that facilitates data analysis across the different data files. The intended usage is to import and use this class at will. (Stalpotaten, Jun 1, 2021)
2 Answers
What you want is not a one-time setup during installation (which is also impossible with modern .whl installs, since installing a wheel runs no setup code), but a way for clients to configure your library at any point during runtime. Given that you don't provide a CLI, you can either use environment variables to provide that configuration, or look for a user-defined config file.
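The two options can also be combined. Below is a minimal sketch, assuming a hypothetical naming scheme in which an environment variable such as `MY_PACKAGE_FILE_FOO` overrides the `file_foo` option of a `[paths]` section in the config file. The `resolve_path` helper and the `MY_PACKAGE_` prefix are illustrative only, not part of any library.

```python
import os
from configparser import ConfigParser
from typing import Optional


def resolve_path(key: str, config: ConfigParser, env_prefix: str = "MY_PACKAGE_") -> Optional[str]:
    """Return the path for `key`, preferring an environment variable over the config file.

    Hypothetical convention: key "file_foo" maps to the env var "MY_PACKAGE_FILE_FOO"
    and to the option "file_foo" in the "[paths]" section of the config file.
    """
    env_value = os.environ.get(env_prefix + key.upper())
    if env_value:
        return env_value
    # fallback=None returns None instead of raising NoSectionError/NoOptionError
    return config.get("paths", key, fallback=None)
```

The environment variable wins here because it is the easier of the two to change for a single session or script; reversing the order is equally valid if the config file should be authoritative.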
Here is a simple recipe that uses appdirs to find out where the config file should be located. It loads the config when your package is imported and handles a missing config file in whatever way suits your package. Usually, that'd be:
- write a log message
- use default settings
- throw some kind of exception
- a combination of the above
from logging import getLogger
from pathlib import Path
from configparser import ConfigParser  # loads .ini format files easily, just to have an example to go with

import appdirs  # needs to be pip-installed

log = getLogger(__name__)
config = ConfigParser(interpolation=None)

# load config, substitute "my_package" with the actual name of your package
config_path = Path(appdirs.user_config_dir("my_package")) / "user.ini"
try:
    with open(config_path) as f:
        config.read_file(f, source="user")
except FileNotFoundError:
    # the three lines below are alternatives: keep only whatever makes sense
    log.info(f"User config expected at '{config_path}', but not found.")
    config.read_string("[paths]\nfile_foo=foo\nfile_bar=bar")  # dubious fallback defaults
    raise ImportError(f"Can't use this module; create a config at '{config_path}'.")


class Foo:
    def __init__(self):
        # open the data file whose location the user configured
        with open(config["paths"]["file_foo"]) as f:
            self.data = f.read()
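The recipe above needs appdirs installed. To show what a matching user.ini might contain and how `ConfigParser` hands the paths back, here is a stdlib-only sketch; the file contents and the `load_paths` helper are hypothetical illustrations, not part of the recipe itself.

```python
from configparser import ConfigParser
from pathlib import Path

# Example contents of a user's config file (hypothetical paths).
EXAMPLE_INI = """\
[paths]
file_foo = /data/large/foo.csv
file_bar = /data/large/bar.csv
"""


def load_paths(config_path: Path) -> dict:
    """Read the [paths] section of an .ini file into a dict of Path objects."""
    config = ConfigParser(interpolation=None)
    with open(config_path) as f:  # raises FileNotFoundError if the file is missing
        config.read_file(f, source=str(config_path))
    return {key: Path(value) for key, value in config["paths"].items()}
```

Converting the values to `Path` objects up front means the rest of the package never has to deal with raw strings from the config file.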
This sounds like runtime configuration. It's none of setup.py's business; setup.py is only concerned with installing your package.
For app configuration, it is common to specify this resource location via a command-line argument, an environment variable, or a configuration file. You will usually want to either hard-code a sensible default path for when the user does not specify any configuration, or raise an exception if the resources do not exist or cannot be found.
Example using an environment variable:
import os
DEFAULT_PATH_FILE1 = "/default/path/to/file1.csv"
PATH_FILE1 = os.environ.get("PATH_FILE1", DEFAULT_PATH_FILE1)
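The environment-variable example can be extended to cover the other option mentioned, raising an exception when the file is missing. A minimal sketch, reusing the `PATH_FILE1` variable name; the `get_file1_path` function and the default path are placeholders.

```python
import os
from pathlib import Path

# Placeholder default location; adjust to wherever the data usually lives.
DEFAULT_PATH_FILE1 = Path("/default/path/to/file1.csv")


def get_file1_path() -> Path:
    """Resolve the location of file1: env var first, then the hard-coded default.

    Raises FileNotFoundError if the resolved path is not an existing file, so the
    user gets a clear message instead of a confusing error deep inside the analysis.
    """
    path = Path(os.environ.get("PATH_FILE1", DEFAULT_PATH_FILE1))
    if not path.is_file():
        raise FileNotFoundError(
            f"Data file not found at '{path}'. Set the PATH_FILE1 environment "
            "variable to the file's location."
        )
    return path
```

Resolving the path inside a function rather than at import time means users can still set the environment variable after importing the package, and a missing file only causes an error when the data is actually needed.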
~/.config/myapp.conf or wherever. Document the default config file path and provide a way to override this location (e.g. another env var MYAPP_CONF_FILE). The existence of a config file at all should generally be optional.
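A minimal sketch of that suggestion, using the `MYAPP_CONF_FILE` variable and the ~/.config/myapp.conf default named in the comment; the `load_config` function and the `[paths]` section layout are assumptions for illustration.

```python
import os
from configparser import ConfigParser
from pathlib import Path

# Default location from the comment above; MYAPP_CONF_FILE overrides it.
DEFAULT_CONF_FILE = Path.home() / ".config" / "myapp.conf"


def load_config() -> ConfigParser:
    """Load the config file if it exists; a missing file is not an error."""
    conf_file = Path(os.environ.get("MYAPP_CONF_FILE", DEFAULT_CONF_FILE))
    config = ConfigParser(interpolation=None)
    # ConfigParser.read() silently skips files that cannot be opened,
    # which keeps the config file optional.
    config.read(conf_file)
    return config
```

Callers can then combine this with hard-coded defaults, for example via `config.get("paths", "file1", fallback=...)`.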