I've created a script that executes arbitrary user code in Python, written in a .py file, and returns the execution result (contents of stdout, stderr, exit code) plus metrics acquired by /usr/bin/time, in particular: elapsed time (seconds), CPU clock ticks, CPU usage (millicores), and peak memory usage (MiB).
Limitations:
- User code is always in a .py file.
- User code may contain raised errors, variations of rm -rf and such, but capturing metrics is needed regardless -> using subprocess.
- This script is executed in a file system mounted with the read-only option (as a security step against rm -rf).
- Maximum exec time is 5 seconds.
- Target OS: Linux
Code:
import argparse
import os
import re
import subprocess
import sys


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--script",
        default="./test.py",
        help="Path to the Python script to execute."
    )
    parser.add_argument(
        "--timeout", type=float, default=5.0,
        help="Maximum time allowed for script execution (in seconds)."
    )
    args = parser.parse_args()

    time_cmd = "/usr/bin/time"
    if not os.path.exists(time_cmd):
        print(f"Error: {time_cmd} not found. Please install GNU time package.")
        sys.exit(1)

    script_cmd = [sys.executable, args.script]

    # Create a temporary file for time command's output
    time_output_file = "/tmp/time_output.txt"

    # Prepare GNU time command with detailed format
    # %e: elapsed real time (wall clock) in seconds
    # %U: CPU time spent in user mode in seconds
    # %S: CPU time spent in kernel mode in seconds
    # %M: Maximum resident set size in Kilobytes
    # %x: Exit status
    time_format = "%e %U %S %M %x"

    # Full command: /usr/bin/time -f "format" -o output_file python script.py
    full_cmd = [
        time_cmd,
        "-f", time_format,
        "-o", time_output_file
    ] + script_cmd

    try:
        process = subprocess.Popen(
            full_cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE
        )
        try:
            stdout, stderr = process.communicate(timeout=args.timeout)
        except subprocess.TimeoutExpired:
            process.kill()
            stdout, stderr = process.communicate()

        # Get the detailed stats from GNU time
        try:
            with open(time_output_file, 'r') as f:
                time_output = f.read().strip()

            # Finding sequence of 5 numbers
            # If script exec failed - first line will be "Command exited with .. status <num>",
            # Hence sequence of 5 numbers
            numbers = re.findall(r'[-+]?\d*\.\d+|\d+', time_output)
            if len(numbers) >= 5:
                # Take the last 5 numbers, as they're likely our metrics
                real_time = float(numbers[-5])
                user_time = float(numbers[-4])
                sys_time = float(numbers[-3])
                max_rss_kb = int(numbers[-2])
                exit_status = int(numbers[-1])
            else:
                raise ValueError("Could not extract time metrics from output")
        except (IOError, ValueError) as e:
            print(f"Warning: Failed to read time statistics: {e}")
            # Fallback
            sys_time = user_time = real_time = 0.001  # Minimal value
            max_rss_kb = 4096  # 4MB minimum
            exit_status = process.returncode

        # Calculate total CPU time
        cpu_time_sec = user_time + sys_time
        # Calculate clock ticks
        clk_tck = os.sysconf("SC_CLK_TCK")
        cpu_time_ticks = cpu_time_sec * clk_tck
        # Calculate millicores (1000 = 1 core)
        avg_cpu_millicores = (cpu_time_sec / real_time * 1000.0) if real_time > 0 else 0

        peak_memory_mi = max_rss_kb / 1024.0

        if stdout:
            print("Script stdout:")
            print(stdout.decode('utf-8', errors='replace'))
        if stderr:
            print("Script stderr:")
            print(stderr.decode('utf-8', errors='replace'))

        print(f"Exit code: {exit_status}")
        print(f"Elapsed CPU time: {cpu_time_sec:.6f} seconds")
        print(f"Elapsed CPU clock ticks: {cpu_time_ticks:.0f}")
        print(f"Average CPU usage: {avg_cpu_millicores:.6f} millicores")
        print(f"Peak Memory usage: {peak_memory_mi:.6f} MiB")
    finally:
        # Clean up the temporary file
        if os.path.exists(time_output_file):
            try:
                os.unlink(time_output_file)
            except OSError:
                pass


if __name__ == "__main__":
    main()
First of all, how could I improve the current implementation? Maybe I'm missing some trivial details?
Also, here is what still bothers me. For the following user scripts as input - script #1:
print(2)
raise ValueError("test")
and script #2:

import time
import math


def heavy_computation():
    """Perform a CPU-intensive calculation."""
    s = 0.0
    # Loop over a large range to generate a heavy load
    for i in range(1, 100000):
        s += math.sqrt(i) * math.log(i + 1)
    return s


def main():
    target_duration = 2.5  # Target duration in seconds (adjust between 2 and 3 seconds)
    start_time = time.time()
    total = 0.0
    iterations = 0
    # Run the heavy computation until the target time is reached
    while time.time() - start_time < target_duration:
        total += heavy_computation()
        iterations += 1
    elapsed_time = time.time() - start_time
    print("Final result:", total)
    print("Total iterations:", iterations)
    print("Elapsed time: {:.2f} seconds".format(elapsed_time))


if __name__ == "__main__":
    main()
The acquired CPU usage value (the one in millicores) is equal for both: each got 1000 millicores = 1 CPU core. For the latter script, with its CPU-intensive calculation, I can understand it, but for the former one, with just 2 lines inside, the value seems a bit too high. Am I missing something, or is this OK?
CPU: Intel i5-10300H
Output for script #1:
$ python3 main.py --script t1.py
Script stdout:
2
Script stderr:
Traceback (most recent call last):
File "/home/user/test/./test.py", line 2, in <module>
raise ValueError("test")
ValueError: test
Exit code: 1
Elapsed CPU time: 0.060000 seconds
Elapsed CPU clock ticks: 6
Average CPU usage: 1000.000000 millicores
Peak Memory usage: 24.773438 MiB
Output for script #2:
$ python3 main.py --script t2.py
Script stdout:
Final result: 24237750396.83276
Total iterations: 106
Elapsed time: 2.50 seconds
Exit code: 0
Elapsed CPU time: 2.510000 seconds
Elapsed CPU clock ticks: 251
Average CPU usage: 1000.000000 millicores
Peak Memory usage: 8.121094 MiB
3 Answers
Let's start with the positive side: this code is readable and easy to understand. Good job with alphabetically sorted imports and overall code style - I don't see any concerning linter reports in the most basic configuration. Consider also using ruff format or black to format your code to some fixed style: your formatting preferences are a bit non-standard but reasonable (esp. looking at multi-line list literal indentation).
This is also written in quite a defensive style, great!
I won't address your design decisions - let's assume that you really need to run /usr/bin/time on a Python script. Other answers have also mentioned some methodological problems with this, but if you're really interested in a self-contained invocation of a script, interpreter overhead should indeed be included in your measurements, so this approach is sound. Let's also stick to a "small script" philosophy where you don't want to define custom exceptions or use other high-level features that pay off in larger codebases.
Temp file
At least from a quick glance, it seems like you did manage to handle the temp file correctly. However, the logic is quite verbose (the whole finally handler) and uses a fixed filename. What if I already have /tmp/time_output.txt and don't want to silently overwrite it? What if I want to run several instances of your script concurrently?
Let me introduce another corner of the Python standard library: the tempfile module. You don't care about the actual name, right? Let's just ask for a temporary file without pretending to need anything else.
import tempfile

# snip...

with tempfile.NamedTemporaryFile('r', encoding='utf-8') as time_file:
    # Full command: /usr/bin/time -f "format" -o output_file python script.py
    full_cmd = [
        time_cmd,
        "-f", time_format,
        "-o", time_file.name,
        *script_cmd,
    ]
    ...

    # Get the detailed stats from GNU time
    try:
        time_output = time_file.read().strip()
I think it's also fine to stop catching IOError here. Either NamedTemporaryFile failed to create it (and then the system is likely broken or misconfigured - that isn't a problem for your app to handle), or it exists and is readable. Even if the file is deleted by a foreign actor, your handle remains open. If we get rid of IOError, the second try/except can also go away completely, together with the questionable (though sometimes appropriate) practice of raising an exception only to immediately catch it:
numbers = re.findall(r'[-+]?\d*\.\d+|\d+', time_output)
if len(numbers) >= 5:
    # Take the last 5 numbers, as they're likely our metrics
    real_time = float(numbers[-5])
    user_time = float(numbers[-4])
    sys_time = float(numbers[-3])
    max_rss_kb = int(numbers[-2])
    exit_status = int(numbers[-1])
else:
    print("Warning: Failed to read time statistics, using fallback values.")
    # Fallback
    sys_time = user_time = real_time = 0.001  # Minimal value
    max_rss_kb = 4096  # 4MB minimum
    exit_status = process.returncode
Use your tools
Popen can handle the stdout/stderr encoding just fine! Instead of decoding them manually later, you can
process = subprocess.Popen(
    full_cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
    encoding='utf-8',  # Not really necessary?
    errors='replace',
)
...
if stdout:
    print("Script stdout:")
    print(stdout)
if stderr:
    print("Script stderr:")
    print(stderr)
Control the output format
Since you already decided to use the non-POSIX -f option for time, why not go further and set up some format that's easier to parse?
time_format = "|%e|%U|%S|%M|%x"
...
# Finding sequence of 5 numbers
# If script exec failed - first line will be "Command exited with .. status <num>",
# Hence sequence of 5 numbers.
# First part is either empty or "command exited..." message
_, *numbers = time_output.split("|")
if len(numbers) == 5:
    ...
else:
    print("Warning: Failed to read time statistics, using fallback values.")
    # Fallback
    ...
CLI interface
First of all, the default for --script is suspicious. Consider removing it and making this argument required (and perhaps positional). It's rather unusual to assume that a user wants to run some file test.py in CWD by default.
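For instance, a possible shape of the argument setup (the description text is only a placeholder):

parser = argparse.ArgumentParser(
    description="Run a Python script under /usr/bin/time and report metrics."
)
# Positional, therefore required - no surprising ./test.py default
parser.add_argument("script", help="Path to the Python script to execute.")
parser.add_argument(
    "--timeout", type=float, default=5.0,
    help="Maximum time allowed for script execution (in seconds)."
)
args = parser.parse_args()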
Now the hard part: you're designing a CLI tool. At the very least CLI tools have two streams available: stdout and stderr. Usually results go to stdout, while other messages, warnings and errors go to stderr. This is helpful when you want to pipe the output of your tool to something else. "Warning: failed to read..." should go to stderr.
There's one more nit here: you exit with 1 if /usr/bin/time is not found and with 0 otherwise. That might be enough for you now, but it is rather useless for someone who already knows that they have time. Consider a different exit code if the script failed - for example, exit with 2 if the invoked script errored, and with 3 if it exceeded the timeout.
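A rough sketch of both suggestions together (timed_out is a hypothetical flag you would set where TimeoutExpired is caught, and the codes 2/3 are just one possible convention):

import sys

# Diagnostics go to stderr so they don't pollute piped results
print("Warning: Failed to read time statistics, using fallback values.", file=sys.stderr)

# Distinct exit codes for distinct failure modes
if timed_out:              # hypothetical flag, set where TimeoutExpired is caught
    sys.exit(3)
elif exit_status != 0:
    sys.exit(2)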
Code organization
You just dumped 100+ lines of code into a single main function. This may work well in a small script, but is borderline impossible to test. Consider splitting into several functions: at least you can make main only responsible for argparse and calling another "api" function with script and timeout arguments. Consider also using a collections.namedtuple (or typed typing.NamedTuple equivalent) for timing results to reduce the amount of local variables - and perhaps extracting this parsing to a helper function then?
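A rough sketch of that shape - TimeStats, parse_time_output, parse_args and report are all made-up names here, and the parsing assumes the "|"-separated format from above:

from typing import NamedTuple

class TimeStats(NamedTuple):
    real_time: float
    user_time: float
    sys_time: float
    max_rss_kb: int
    exit_status: int

def parse_time_output(time_output: str, fallback_status: int) -> TimeStats:
    """Parse the GNU time report into a TimeStats tuple."""
    _, *fields = time_output.split("|")
    if len(fields) == 5:
        return TimeStats(float(fields[0]), float(fields[1]), float(fields[2]),
                         int(fields[3]), int(fields[4]))
    return TimeStats(0.001, 0.001, 0.001, 4096, fallback_status)

def main():
    args = parse_args()                    # hypothetical helper wrapping argparse
    report(args.script, args.timeout)      # hypothetical "api" function doing the real work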
measurement methodology
Consider using wait4() and the rusage statistics it returns. Then you don't need to worry about parsing /usr/bin/time output, and you get access to other fun statistics, such as memory consumption and faulting. ru_maxrss gives us the resident set size, that is, the memory footprint.
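A minimal sketch of that idea, assuming Linux and Python 3.9+ (test.py is a placeholder and timeout handling is left out for brevity):

import os
import subprocess
import sys
import tempfile

# Send the child's output to temp files rather than pipes, so blocking in
# wait4() cannot deadlock on a full pipe buffer
with tempfile.TemporaryFile() as out, tempfile.TemporaryFile() as err:
    process = subprocess.Popen([sys.executable, "test.py"], stdout=out, stderr=err)
    # wait4() both reaps the child and returns a struct_rusage for it,
    # so don't also call process.wait()/communicate() afterwards
    _pid, status, rusage = os.wait4(process.pid, 0)
    out.seek(0)
    err.seek(0)
    stdout, stderr = out.read(), err.read()

cpu_time_sec = rusage.ru_utime + rusage.ru_stime   # user + system CPU time
peak_memory_mib = rusage.ru_maxrss / 1024.0        # ru_maxrss is in KiB on Linux
exit_status = os.waitstatus_to_exitcode(status)    # Python 3.9+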
cracking argv
Consider doing "import typer". Your implementation is perfectly nice, it could just be a little shorter is all.
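A hedged sketch of how short that can get - report and its parameters are placeholders here, not your existing function:

import typer

def report(script: str, timeout: float = 5.0) -> None:
    """Run SCRIPT under /usr/bin/time and print the collected metrics."""
    ...  # existing measurement logic would live here

if __name__ == "__main__":
    typer.run(report)

typer derives the CLI - including --help and the --timeout option - from the signature and type hints alone.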
The timing anomaly you noticed appears to be due to interpreter start up overhead.
Remember that we need to fork and exec an interpreter process, and also spend time performing some import statements.
To investigate further, it would make sense to instrument the target code with a few calls to time.time().
Then, at the end, you can print out some timestamp differences.
This would, for example, let you identify how much elapsed time certain import statements take.
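Something along these lines dropped into the target script is enough (purely illustrative):

import time

t_start = time.time()
import math                      # how long does this import take?
t_after_import = time.time()

# ... rest of the script ...

print("import math took {:.6f} s".format(t_after_import - t_start))
print("total elapsed {:.6f} s".format(time.time() - t_start))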
- Why replace a simple builtin argparse-based parser with third-party typer if you don't need any of its extra features? – STerliakov, Mar 5 at 16:32
- It's just habitual. I already wrote a type hinted function, perhaps called report(), with informative parameter names. Now I want a CLI. It's just one line of extra effort, very DRY: typer.run(report). Contrast that with the visual clutter of building up an ArgumentParser(). And now I change an arg from int to float, or I add another arg, and I have to remember to do it in both places, and to test both places. With typer, things can't get out of sync in that way. – J_H, Mar 5 at 16:39
Temp file
There is overhead associated with creating, opening and writing to the temporary file, although it is unclear how significant that is. If you can find a way to convince subprocess to capture the output instead of writing it to a file, that may save some time. Or, as the previous answer suggests, use another mechanism entirely.
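One way that could work: when -o is not given, GNU time writes its report to stderr after the child exits, so it ends up at the tail of the captured stderr and can be split off there. A rough sketch, reusing the "|"-prefixed format idea from the first answer (test.py and the 5-second timeout are placeholders):

import subprocess
import sys

time_format = "|%e|%U|%S|%M|%x"     # leading "|" makes the report line easy to spot
process = subprocess.Popen(
    ["/usr/bin/time", "-f", time_format, sys.executable, "test.py"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True, errors="replace",
)
stdout, stderr = process.communicate(timeout=5.0)

# The report is written last, so it is normally the final line of stderr
lines = stderr.splitlines()
report = lines[-1] if lines and lines[-1].startswith("|") else ""
script_stderr = "\n".join(lines[:-1]) if report else stderr

This does assume the script's own stderr ends with a newline and doesn't itself end with a "|"-prefixed line, so it is more fragile than a separate file or wait4().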
Partitioning
All of the code is in a single function. It might be worth factoring out the temporary file parsing into its own function, returning a dictionary.
Documentation
The comments in the code are helpful. It would also be nice to
add a docstring at the top of the code to summarize its purpose
and to mention using -h to get usage details.
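A small sketch of such a docstring (wording is only a suggestion):

"""Run a user-supplied Python script under /usr/bin/time and report its
stdout, stderr, exit code and resource usage metrics.

Use -h / --help for the full CLI usage details.
"""

As a bonus, argparse can reuse it via ArgumentParser(description=__doc__).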
Simpler
These lines:
clk_tck = os.sysconf("SC_CLK_TCK")
cpu_time_ticks = cpu_time_sec * clk_tck
can be combined into one, eliminating an intermediate variable:
cpu_time_ticks = cpu_time_sec * os.sysconf("SC_CLK_TCK")