
I am trying to give value to a certain package/library by building some usage stats. The package is written in Python and is invoked/imported by users on remote machines in a cluster.

What we are trying to do is set up a tracking mechanism that registers or logs every time someone imports the package and, ideally, how long they used it.

I can think of a normal logger that logs at import time of the package and logs again when the process closes/dies/exits, using an atexit function.

# let's import the package here
import package
# at this stage the package automatically logs something like the following in a certain file
# os.getpid() | getpass.getuser() | os.uname() | importing package time.time()
# when the process or session is closed, atexit will execute the same function that will log the following
# os.getpid() | getpass.getuser() | os.uname() | atexit process time.time()
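
For illustration, a minimal sketch of what the package's __init__.py could do along those lines (the log path and record layout are assumptions, not an existing feature of the package):

# hypothetical contents of package/__init__.py
import atexit
import getpass
import os
import time

_LOG_PATH = "/shared/logs/package_usage.log"  # assumed shared location

def _log_event(event):
    # append one pipe-separated record per event
    with open(_LOG_PATH, "a") as fh:
        fh.write("%s | %s | %s | %s %s\n" % (
            os.getpid(), getpass.getuser(), os.uname(), event, time.time()))

_log_event("importing package")                # runs once, at import time
atexit.register(_log_event, "atexit process")  # runs at normal interpreter exit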

This way the log file can be analyzed later for info about package usage.

Now, I wonder if there is a more elegant or standard solution for all of this.

asked Jun 8, 2017 at 15:36
  • Is this an internal only package? I'm pretty sure that no one will use a package which starts reporting data to another external source like this. Commented Jun 8, 2017 at 15:51
  • Oh yeah, absolutely, I agree with you; this is for internal usage only. The BU needs to put a valuation on the implementation by counting internal users and time of usage. Commented Jun 8, 2017 at 16:13
  • This doesn't so much "give value" as measure usage. The idea makes me wonder if the package loader has a verbose enough switch to do this for you. Commented Jun 8, 2017 at 17:33
  • Read about import hooks. They exist in both Python 2 and 3 but differ somewhat between the versions. Commented Jun 8, 2017 at 19:58
  • As a warning: what about devs working with this package? Every import while running unit tests could inflate your count. Commented Jun 8, 2017 at 20:17

1 Answer


If you want centralized logging, you should probably send the usage statistics to a remote server, e.g. via an HTTP logging handler to Logstash. That could be easier to maintain than trying to collect dozens of log files.
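
If you go that route, a rough sketch with the standard library's HTTPHandler (the host, port and path are assumptions about your Logstash http input):

import getpass
import logging
import logging.handlers

# hypothetical Logstash http input at logstash.internal:8080 -- adjust to your setup
handler = logging.handlers.HTTPHandler("logstash.internal:8080", "/usage", method="POST")
usage_log = logging.getLogger("package.usage")
usage_log.setLevel(logging.INFO)
usage_log.addHandler(handler)

usage_log.info("package imported", extra={"user": getpass.getuser()})  # record is sent form-encoded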

I would be loath to use something like atexit; there are lots of scenarios where that function will never run (yanking the cord, abnormal interpreter shutdown, etc.). I would probably start a separate thread which sends a regular ping while your module is imported, and use some log aggregation at your collection server to compute the total run time. You would need to generate a unique ID at import time, but that is easy.
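
A rough sketch of that idea; send_ping() here is a hypothetical function that forwards the record to your collection server:

import threading
import time
import uuid

SESSION_ID = uuid.uuid4().hex  # unique per import/process
PING_INTERVAL = 60             # seconds; tune to taste

def _heartbeat():
    while True:
        send_ping({"session": SESSION_ID, "ts": time.time()})  # hypothetical sender
        time.sleep(PING_INTERVAL)

_t = threading.Thread(target=_heartbeat)
_t.daemon = True  # never keep the interpreter alive just for the heartbeat
_t.start()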

Lastly: the Python logging module is... not really good. It could be easier just to build a small function which sends your logs to the desired server. I spent several hours getting a log into Logstash, so I am pretty biased against the logging module.
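
Such a function could be as small as this (Python 3 urllib; the collector URL is an assumption, and errors are swallowed so logging can never break the package):

import json
import urllib.request

COLLECTOR_URL = "http://logcollector.internal:8080/usage"  # assumed endpoint

def send_usage(record):
    try:
        req = urllib.request.Request(
            COLLECTOR_URL,
            data=json.dumps(record).encode("utf-8"),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=2)
    except Exception:
        pass  # never let usage logging break the importing process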

answered Jun 8, 2017 at 20:22
