I am trying to demonstrate the value of a certain package/library by collecting usage statistics. The package is written in Python and is imported by users on remote machines and clusters.
We want to set up a tracking mechanism that records every time someone imports the package and, ideally, how long they used it.
I can think of a plain logger that writes a record when the package is imported and another when the process closes/dies/exits, using the atexit module.
# let's import the package here
import package
# at this stage the package automatically logs something like the following to a certain file
# os.getpid() | getpass.getuser() | os.uname() | importing package time.time()
# when the process or session is closed, atexit will execute the same function, which will log the following
# os.getpid() | getpass.getuser() | os.uname() | atexit process time.time()
This way, the log file can be analyzed later for information about package usage.
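The idea above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: `LOG_PATH`, `format_record`, `log_event` and `install` are hypothetical names, the log location is a placeholder, and `platform.node()` is swapped in for `os.uname()` so the record stays short and works on non-Unix systems too.

```python
import atexit
import getpass
import os
import platform
import time

# Hypothetical log location; a real deployment would point at a shared path.
LOG_PATH = os.path.join(os.path.expanduser("~"), "package_usage.log")


def current_user():
    """Return the login name, or 'unknown' if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def format_record(event, now=None):
    """Build one pipe-separated usage record for the given event name."""
    now = time.time() if now is None else now
    fields = [str(os.getpid()), current_user(), platform.node(), event, "%.3f" % now]
    return " | ".join(fields)


def log_event(event, path=LOG_PATH):
    """Append a single record; never let tracking break the importing program."""
    try:
        with open(path, "a") as handle:
            handle.write(format_record(event) + "\n")
    except OSError:
        pass


def install(path=LOG_PATH):
    """Log the import now and register the matching exit record."""
    log_event("import", path)
    atexit.register(log_event, "exit", path)
```

Calling `install()` from the package's `__init__.py` would write the "import" record immediately and the "exit" record on a normal interpreter shutdown. Pairing the two records by PID and host then gives an approximate usage duration.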
Now I wonder whether there is a more elegant or standard way to do all of this.
- Is this an internal only package? I'm pretty sure that no one will use a package which starts reporting data to another external source like this. – enderland, Jun 8, 2017 at 15:51
- Oh yeah, absolutely. I agree with you, this is for internal usage only. The BU needs to put a value on the implementation by counting internal users and time of usage. – Cobry, Jun 8, 2017 at 16:13
- This doesn't so much "give value" as measure usage. The idea makes me wonder if the package loader has a verbose enough switch to do this for you. – candied_orange, Jun 8, 2017 at 17:33
- Read about import hooks. They exist in both Python 2 and 3 but differ somewhat. – 9000, Jun 8, 2017 at 19:58
- As a warning: What about devs working with this package? Every import while running unit tests could inflate your count. – Christian Sauer, Jun 8, 2017 at 20:17
1 Answer
If you want centralized logging, you should probably send the usage statistics to a remote server, e.g. with an HTTP logger to Logstash. That could be easier to maintain than trying to collect dozens of log files.
I would be loath to rely on atexit: there are many scenarios in which the function will never run (yanking the power cord, abnormal interpreter shutdown, etc.). I would probably start a separate thread that sends a regular ping for as long as your module is imported. Use log aggregation on your collection server to compute the total run time. You would need to generate a unique ID at import time, but that is easy.
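A rough sketch of the ping approach might look like this. The `Heartbeat` class, the `send` callable and `session_durations` are hypothetical names introduced for illustration; the transport is left abstract, and the aggregation function only shows the kind of calculation the collection server would do (last ping minus first ping per session).

```python
import threading
import time
import uuid


class Heartbeat:
    """Emit a periodic ping tagged with a per-import session ID."""

    def __init__(self, send, interval=60.0):
        self.session_id = uuid.uuid4().hex  # unique ID generated once per import
        self._send = send                   # hypothetical transport: callable taking one dict
        self._interval = interval
        self._stop = threading.Event()
        # Daemon thread, so it never keeps the interpreter alive on exit.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Ping immediately, then once per interval until stopped.
        while True:
            try:
                self._send({"session": self.session_id, "time": time.time()})
            except Exception:
                pass  # usage tracking must never crash the host program
            if self._stop.wait(self._interval):
                break


def session_durations(pings):
    """Collector-side aggregation: last ping minus first ping per session."""
    first, last = {}, {}
    for ping in pings:
        sid, t = ping["session"], ping["time"]
        first[sid] = min(t, first.get(sid, t))
        last[sid] = max(t, last.get(sid, t))
    return {sid: last[sid] - first[sid] for sid in first}
```

Because the duration is derived from pings that did arrive, a killed process only loses at most one interval of usage time, rather than the whole session as with a missing atexit record.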
Lastly: the Python logging module is ... not really good. It could be easier to just write a small function that sends your logs to the desired server. I spent several hours getting a log sent to Logstash, so I am pretty biased against the logging module.
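Such a small function could be sketched with only the standard library, for example as below. The names `build_request` and `send_record` are hypothetical, and the sketch assumes the collection server exposes an HTTP endpoint that accepts a JSON body via POST; the URL in the usage note is a placeholder, not a real service.

```python
import json
import urllib.request


def build_request(url, record):
    """Encode a record (a plain dict) as a JSON POST request."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_record(url, record, timeout=2.0):
    """POST a record; return True on a 2xx response, False on any failure."""
    try:
        with urllib.request.urlopen(build_request(url, record), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Deliberately broad: tracking must never break the host program.
        return False
```

It might be called as `send_record("http://collector.example/usage", {"event": "import"})`, with the short timeout keeping a slow or unreachable server from stalling the user's program for long.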