I was working on a cache in Bash and found this link, but the author generates new Bash functions by introspecting a function's source and emitting a memoized wrapper. That seems like overkill to me. Here is my implementation, and it works just fine:
cache() {
  # serve the cached copy from /tmp if it is under 30 minutes old,
  # otherwise run the command and cache its stdout
  local file
  file="/tmp/$(printf "%s" ${@})"
  if [[ -f "${file}" && $(($(date +%s) - $(date -r "$file" +%s))) -le 1800 ]]; then
    cat "${file}"
  else
    ${1} ${@:2} | tee "${file}"
  fi
}
mkurls() {
  # emit one paged API URL per batch of results
  local url="${1:?You must supply a URL}"
  local batchSize=${2:-100}
  local repoCount
  repoCount=$(curl -fsnL "$url?pagelen=0&page=1" | jq -re .size)
  local pages=$(((repoCount / batchSize) + 1))
  printf "$url?pagelen=${batchSize}&page=%s\n" $(seq 1 "$pages")
}
bbusers() {
  _users() {
    mkurls https://api.bitbucket.org/2.0/teams/twengg/members/ |
      xargs -n 30 -P 20 curl -fsnL |
      jq -ser '.[].values[] | "\(.display_name), \(.links.self.href)"' | sort
  }
  cache _users $@
}
and here are the results:
$ time bbusers &> /dev/null

real 0m5.670s
user 0m0.184s
sys 0m0.099s
$ time bbusers &> /dev/null # seconds later
real 0m0.053s
user 0m0.010s
sys 0m0.017s
As you can see, it's roughly 100x faster (5.670s vs. 0.053s). If I delete the file behind the cache:
$ rm -fR /tmp/_users
$ time bbusers &> /dev/null
real 0m4.924s
user 0m0.170s
sys 0m0.082s
It goes right back to normal. So, how can this be improved, and what am I missing that warrants the other author's far more complicated approach?
1 Answer
As the "guy" in question, I'm happy to discuss my approach :) feel free also to file issues on GitHub if you have feedback.
You are certainly correct that decorating functions as bash-cache does isn't necessary, but the intent is to simplify the development experience: to memoize a function all you have to do is make a call to `bc::cache`, and bash-cache (generally) handles the rest. No need to keep separate functions in sync or wrestle with naming schemes. Whether that's a feature or overkill is in the eye of the beholder, but I find it very flexible and expressive :)
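For example, the decorator pattern looks roughly like this (a hedged sketch; `slow_lookup` is a made-up stand-in, and `bc::cache`'s exact arguments may vary by version, so check the README):

source bash-cache.sh

slow_lookup() {
  sleep 2                 # stand-in for an expensive call
  echo "result for: $*"
}
bc::cache slow_lookup     # replaces slow_lookup with a memoized wrapper

slow_lookup foo           # slow the first time...
slow_lookup foo           # ...served from cache thereafter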
As you've shown it's not too complicated to implement a reasonable caching mechanism in Bash, but the devil is really in the details. With bash-cache I've addressed a number of very subtle issues that most caching implementations fail to handle, including:
- stdout and stderr are both cached and preserved separately (your approach only caches stdout; other approaches merge the two streams with `>&`)
- the return code of the function is also preserved; few other implementations I've seen do this (correctly)
- avoids a common gotcha with command substitutions (`$(...)`), which are actually lossy: command substitution strips all trailing newlines, so `$(printf 'a\n\n')` yields just `a`
- stale cache data is regularly cleaned up in the background
- caches are separated by user and `chmod`-ed to only be readable by that user; a global cache risks leaking sensitive data
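To make the first two points concrete, here is a minimal sketch (my own illustration, not bash-cache's actual code, and it assumes md5sum is available) of caching stdout, stderr, and the exit code in separate per-user files:

cache3() {
  # Key the cache on a NUL-delimited hash of the arguments so
  # "foo 12 3" and "foo 1 23" get different keys.
  local key dir
  key=$(printf '%s0円' "$@" | md5sum | awk '{print 1ドル}')
  dir="/tmp/cache3.$(id -u).$key"
  if [[ ! -d "$dir" ]]; then
    mkdir -p "$dir" && chmod 700 "$dir"  # per-user, not world-readable
    "$@" > "$dir/out" 2> "$dir/err"      # capture the two streams separately
    echo "$?" > "$dir/status"            # and the exit code
  fi
  cat "$dir/out"                         # replay stdout...
  cat "$dir/err" >&2                     # ...stderr on the right stream...
  return "$(< "$dir/status")"            # ...and the original return code
}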
Since this is CodeReview.SE, some tips on your approach:
- Be sure to consistently quote all variables, including arrays like `"$@"`; without quotes Bash will expand variables and arrays with word-splitting, which is rarely desirable and often leads to bugs. In your code it's likely that whitespace arguments will break things. It's a great idea to run any shell code through ShellCheck and address the issues it flags.
- `file="/tmp/$(printf "%s" ${@})"` is unsafe, as it munges arguments together without a delimiter, e.g. it will treat `foo 12 3` and `foo 1 23` as the same set of arguments. It will also fail if any arguments contain invalid filename characters (notably `/`), which is (part of) why bash-cache hashes the arguments.
- It's probably a good idea to asynchronously delete the cache file when it's too old, instead of making multiple `date` calls in subshells to determine whether the file is still valid. I'd have to benchmark to be sure, but I suspect it will be faster.
- `${1} ${@:2}` isn't necessary; just say `"$@"` (note the quotes).
- Bash doesn't support nested/anonymous functions, so what you're doing in `bbusers` is actually just re-defining `_users` in the top-level namespace on every invocation, which isn't necessary or helpful. Just define `_users` as a normal function and then: `bbusers() { cache _users "$@"; }`. A sketch applying all of these tips follows this list.
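Putting those tips together, a revised version might look something like this (a sketch, assuming GNU find and md5sum are available; the hash scheme and per-user prefix are my illustrative choices):

cache() {
  # Hash the arguments NUL-delimited so whitespace and "/" are safe,
  # and "foo 12 3" / "foo 1 23" get distinct cache files.
  local file
  file="/tmp/cache.$(id -u).$(printf '%s0円' "$@" | md5sum | awk '{print 1ドル}')"
  # Expire via mtime with find instead of date arithmetic in subshells;
  # append & to run it in the background if the extra fork matters.
  find "$file" -maxdepth 0 -mmin +30 -delete 2>/dev/null
  if [[ -f "$file" ]]; then
    cat "$file"
  else
    "$@" | tee "$file"
  fi
}

_users() {
  mkurls https://api.bitbucket.org/2.0/teams/twengg/members/ |
    xargs -n 30 -P 20 curl -fsnL |
    jq -ser '.[].values[] | "\(.display_name), \(.links.self.href)"' | sort
}
bbusers() { cache _users "$@"; }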
- All very good tips! As for your `"$@"` - I misunderstood what it meant by splitting. My intent is "give it to me, as is", so if someone had `ab "c e" d`, that is exactly what I wanted passed in. I will address these all. And, if you're interested, I have this in a repo: github.com/chb0github/bashful - I'd welcome more input. – Christian Bongiorno, Apr 14, 2020 at 8:01
- BTW: I am not sure of the wisdom of caching errors - errors usually respond fast and don't need caching, and frequently they are transient (so you'd want to try again). So, if I run a command with the same args, it would have to know, per function implementation, when to invalidate the cache and when not to. – Christian Bongiorno, Apr 14, 2020 at 8:04
- You're right that caching errors may not be desirable in certain circumstances, but in the shell many commands use exit codes to communicate more than just errors. For example, `grep` returns a non-zero exit code if no match is found. My goal with bash-cache is to provide cached results that are as high-fidelity as possible, and work for all commands, rather than make an opinionated decision about how errors should be handled that may or may not be desirable in certain circumstances. But feel free to file a FR or send a PR if you'd like to have a `bc::cache_success` variant :) – dimo414, Apr 14, 2020 at 8:56
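To illustrate that last point about exit codes (my own example, not from the thread), grep distinguishes "no match" from an actual failure:

$ echo foo | grep -q foo; echo $? # match found: exit 0
0
$ echo foo | grep -q bar; echo $? # no match: exit 1, information rather than an error
1
$ grep -q foo /nonexistent; echo $? # an actual error: exit 2
grep: /nonexistent: No such file or directory
2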