I was working on a cache in Bash and found this link, but the author generates new Bash functions by introspecting a function's source and emitting a memoized wrapper. That seems like overkill to me. Here is my implementation, and it works just fine:
cache() {
  # serve the cached copy from /tmp if it is under 30 minutes old,
  # otherwise run the command and cache its stdout
  local file
  file="/tmp/$(printf "%s" ${@})"
  if [[ -f "${file}" && $(($(date +%s) - $(date -r "$file" +%s))) -le 1800 ]]; then
    cat "${file}"
  else
    ${1} ${@:2} | tee "${file}"
  fi
}
mkurls() {
  # emit one paged API URL per batch of results
  local url="${1:?You must supply a URL}"
  local batchSize=${2:-100}
  local repoCount
  repoCount=$(curl -fsnL "$url?pagelen=0&page=1" | jq -re .size)
  local pages=$(((repoCount / batchSize) + 1))
  printf "$url?pagelen=${batchSize}&page=%s\n" $(seq 1 "$pages")
}
bbusers() {
  _users() {
    mkurls https://api.bitbucket.org/2.0/teams/twengg/members/ |
      xargs -n 30 -P 20 curl -fsnL |
      jq -ser '.[].values[] | "\(.display_name), \(.links.self.href)"' | sort
  }
  cache _users $@
}
and here are the results:
$ time bbusers &> /dev/null

real 0m5.670s
user 0m0.184s
sys 0m0.099s
$ time bbusers &> /dev/null # seconds later
real 0m0.053s
user 0m0.010s
sys 0m0.017s
As you can see, it's roughly 100x faster (5.670s vs. 0.053s). If I delete the file behind the cache:
$ rm -fR /tmp/_users
$ time bbusers &> /dev/null
real 0m4.924s
user 0m0.170s
sys 0m0.082s
It goes right back to normal. So, how can this be improved, and what am I missing that warrants the other author's far more complicated approach?
1 Answer
As the "guy" in question, I'm happy to discuss my approach :) feel free also to file issues on GitHub if you have feedback.
You are certainly correct that decorating functions as bash-cache does isn't necessary, but the intent is to simplify the development experience: to memoize a function all you have to do is make a call to `bc::cache`, and bash-cache (generally) handles the rest. No need to keep separate functions in sync or wrestle with naming schemes. Whether that's a feature or overkill is in the eye of the beholder, but I find it very flexible and expressive :)
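For example, the decorator pattern looks roughly like this (a hedged sketch; `slow_lookup` is a made-up stand-in, and `bc::cache`'s exact arguments may vary by version, so check the README):

source bash-cache.sh

slow_lookup() {
  sleep 2                 # stand-in for an expensive call
  echo "result for: $*"
}
bc::cache slow_lookup     # replaces slow_lookup with a memoized wrapper

slow_lookup foo           # slow the first time...
slow_lookup foo           # ...served from cache thereafter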
As you've shown it's not too complicated to implement a reasonable caching mechanism in Bash, but the devil is really in the details. With bash-cache I've addressed a number of very subtle issues that most caching implementations fail to handle, including:
- stdout and stderr are both cached and preserved separately (your approach only caches stdout; other approaches merge the two streams with `>&`)
- the return code of the function is also preserved; few other implementations I've seen do this (correctly)
- avoids a common gotcha with command substitutions (`$(...)`), which are actually lossy: command substitution strips all trailing newlines, so `$(printf 'a\n\n')` yields just `a`
- stale cache data is regularly cleaned up in the background
- caches are separated by user and `chmod`-ed to only be readable by that user; a global cache risks leaking sensitive data
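To make the first two points concrete, here is a minimal sketch (my own illustration, not bash-cache's actual code, and it assumes md5sum is available) of caching stdout, stderr, and the exit code in separate per-user files:

cache3() {
  # Key the cache on a NUL-delimited hash of the arguments so
  # "foo 12 3" and "foo 1 23" get different keys.
  local key dir
  key=$(printf '%s0円' "$@" | md5sum | awk '{print 1ドル}')
  dir="/tmp/cache3.$(id -u).$key"
  if [[ ! -d "$dir" ]]; then
    mkdir -p "$dir" && chmod 700 "$dir"  # per-user, not world-readable
    "$@" > "$dir/out" 2> "$dir/err"      # capture the two streams separately
    echo "$?" > "$dir/status"            # and the exit code
  fi
  cat "$dir/out"                         # replay stdout...
  cat "$dir/err" >&2                     # ...stderr on the right stream...
  return "$(< "$dir/status")"            # ...and the original return code
}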
Since this is CodeReview.SE, some tips on your approach:
- Be sure to consistently quote all variables, including arrays like `"$@"`; without quotes Bash will expand variables and arrays with word-splitting, which is rarely desirable and often leads to bugs. In your code it's likely that whitespace arguments will break things. It's a great idea to run any shell code through ShellCheck and address the issues it flags.
- `file="/tmp/$(printf "%s" ${@})"` is unsafe, as it munges arguments together without a delimiter, e.g. it will treat `foo 12 3` and `foo 1 23` as the same set of arguments. It will also fail if any arguments contain invalid filename characters (notably `/`), which is (part of) why bash-cache hashes the arguments.
- It's probably a good idea to asynchronously delete the cache file when it's too old, instead of making multiple `date` calls in subshells to determine whether the file is still valid. I'd have to benchmark to be sure, but I suspect it will be faster.
- `${1} ${@:2}` isn't necessary; just say `"$@"` (note the quotes).
- Bash doesn't support nested/anonymous functions, so what you're doing in `bbusers` is actually just re-defining `_users` in the top-level namespace on every invocation, which isn't necessary or helpful. Just define `_users` as a normal function and then: `bbusers() { cache _users "$@"; }`. A sketch applying all of these tips follows this list.
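Putting those tips together, a revised version might look something like this (a sketch, assuming GNU find and md5sum are available; the hash scheme and per-user prefix are my illustrative choices):

cache() {
  # Hash the arguments NUL-delimited so whitespace and "/" are safe,
  # and "foo 12 3" / "foo 1 23" get distinct cache files.
  local file
  file="/tmp/cache.$(id -u).$(printf '%s0円' "$@" | md5sum | awk '{print 1ドル}')"
  # Expire via mtime with find instead of date arithmetic in subshells;
  # append & to run it in the background if the extra fork matters.
  find "$file" -maxdepth 0 -mmin +30 -delete 2>/dev/null
  if [[ -f "$file" ]]; then
    cat "$file"
  else
    "$@" | tee "$file"
  fi
}

_users() {
  mkurls https://api.bitbucket.org/2.0/teams/twengg/members/ |
    xargs -n 30 -P 20 curl -fsnL |
    jq -ser '.[].values[] | "\(.display_name), \(.links.self.href)"' | sort
}
bbusers() { cache _users "$@"; }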
- All very good tips! As for your `"$@"` - I misunderstood what it meant by splitting. My intent is "give it to me, as is", so if someone had `ab "c e" d`, that is exactly what I wanted passed in. I will address these all. And, if you're interested, I have this in a repo: github.com/chb0github/bashful - I'd welcome more input. – Christian Bongiorno, Apr 14, 2020 at 8:01
- BTW: I am not sure of the wisdom of caching errors - errors usually respond fast and don't need caching, and frequently they are transient (so you'd want to try again). So, if I run a command with the same args, it would have to know, per function implementation, when to invalidate the cache and when not to. – Christian Bongiorno, Apr 14, 2020 at 8:04
- You're right that caching errors may not be desirable in certain circumstances, but in the shell many commands use exit codes to communicate more than just errors. For example, `grep` returns a non-zero exit code if no match is found. My goal with bash-cache is to provide cached results that are as high-fidelity as possible, and work for all commands, rather than make an opinionated decision about how errors should be handled that may or may not be desirable in certain circumstances. But feel free to file a FR or send a PR if you'd like to have a `bc::cache_success` variant :) – dimo414, Apr 14, 2020 at 8:56
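To illustrate that last point about exit codes (my own example, not from the thread), grep distinguishes "no match" from an actual failure:

$ echo foo | grep -q foo; echo $? # match found: exit 0
0
$ echo foo | grep -q bar; echo $? # no match: exit 1, information rather than an error
1
$ grep -q foo /nonexistent; echo $? # an actual error: exit 2
grep: /nonexistent: No such file or directory
2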