The discussion in the comments of this answer made me wonder how hard it would be to write a built-in only version of dirname
.
This is what I ended up with but I'm reasonably confident it should be able to be improved (I haven't spent much time thinking about it yet).
Any improvements/suggestions are welcome.
#!/bin/sh
p=1ドル
alt=2ドル
skip78=
stripslashes() {
i=1ドル
while [ "$i" != "$o" ]; do
o=$i
i=${i%/}
done
eval "2ドル=\$i"
}
[ "$p" = '//' ] || {
case "$p" in
*[!/]*|'')
stripslashes "$p" p
case "$p" in
*/*)
p=${p%/*}
;;
*)
p=.
skip78=skip78
;;
esac
;;
*)
p=/
skip78=skip78
;;
esac
}
[ -n "$skip78" ] || {
{ [ "$p" != '//' ] || [ -z "$alt" ]; } && {
stripslashes "$p" p
[ -z "$p" ] && p=/
}
}
printf -- %s\\n "$p"
The POSIX spec for dirname
is:
If string is
//
, skip steps 2 to 5.If string consists entirely of <slash> characters, string shall be set to a single <slash> character. In this case, skip steps 3 to 8.
If there are any trailing <slash> characters in string, they shall be removed.
If there are no <slash> characters remaining in string, string shall be set to a single <period> character. In this case, skip steps 5 to 8.
If there are any trailing non- <slash> characters in string, they shall be removed.
If the remaining string is
//
, it is implementation-defined whether steps 7 and 8 are skipped or processed.If there are any trailing <slash> characters in string, they shall be removed.
If the remaining string is empty, string shall be set to a single <slash> character.
Update: Made a gist.
2 Answers 2
I don't think there's a really compelling reason for this eval
in stripslashes
:
eval "2ドル=\$i"
It's a dirty hack to update the parameter variable in-place.
But this is not a common practice, I don't know other commands or built-ins that work that way, and I would find such usage unexpected.
I would find it more usable if stripslashes
behaved more like other commands do:
- print the result on
stdout
- set the result in a designated variable (like
REPLY
)
That would be more in-line with common UNIX practices, and unsurprising, familiar.
As a minor nitpick, I avoid quoting when it's unnecessary.
For example instead of [ "$p" != '//' ]
, I would write [ "$p" != // ]
Instead of this:
[ -z "$p" ] && p=/
This form is shorter:
[ "$p" ] || p=/
This might be subjective, but I would rewrite this part using regular if-else:
[ -n "$skip78" ] || { { [ "$p" != '//' ] || [ -z "$alt" ]; } && { stripslashes "$p" p [ -z "$p" ] && p=/ } }
-
\$\begingroup\$ We can't use stdout for stripslashes because that requires a sub-shell and at that point we are back to "just use dirname". We could just write into
p
directly by name but that makesstripslashes
a one-use helper function as opposed to being generally useful written this way. Withbash
as opposed tosh
I'd useprintf -v
there instead ofeval
. As to the quoting and the-z
test those were to avoid possibly confusing people less familiar with the shell though[ -n "$p" ] || p=/
might be better for that purpose as I imagine-n
is much more commonly used than-z
. \$\endgroup\$Etan Reisner– Etan Reisner2015年10月16日 20:57:12 +00:00Commented Oct 16, 2015 at 20:57 -
1\$\begingroup\$ But actually I just realized that
: "${p:=/}"
is clearly the right way to do that bit. \$\endgroup\$Etan Reisner– Etan Reisner2015年10月16日 20:58:45 +00:00Commented Oct 16, 2015 at 20:58
Whenever one sets out to reimplement some existing functions, it is wise to prepare a test suite to ensure the old behaviors have been faithfully preserved. I begin my review by presenting my test script that generates test cases using fuzzing and hand-picked edge cases:
#!/bin/sh
for dir in '' '/' '//' $(tr -dc abc/ </dev/urandom | head -c800 | split -b8 --filter='cat; echo' -)
do
printf '%s' "Testing with input '$dir'..."
my_output=$(./dirname.sh "$dir")
reference_output=$(dirname "$dir")
if [ "$my_output" != "$reference_output" ]
then
printf '%s\n' "my output ($my_output) differs from the reference output ($reference_output)!"
exit 1
else
printf '%s\n' 'OK!'
fi
done
It assumes the reimplemented function is named dirname.sh
with the read and execute bits set, and is placed in and run from the same directory as the test script.
Congratulations!
Your solution passes my tests!
This gives us confidence in the correctness of your implementation.
However, it turns out my test suite wasn't perfect, as it missed out on an extreme edge case that may or may not be important for you.
If the user has set the nounset
shell option, which may be inherited by scripts in the case of bash at least, or from the current environment if you decide to turn this into a function, your implementation will fail with an error message that may look like this:
./dirname.sh: line 4: 2ドル: unbound variable
# or
./dirname.sh: line 9: o: unbound variable
while the original dirname
utility will not be affected by the presence or absence of this shell option, unless an undefined variable is passed directly as an argument to it.
One little nitpick: nowhere is it mentioned on the printf
page that the utility shall conform to XBD Utility Syntax Guidelines, so it is unclear if --
works as intended on any POSIX-conforming printf
implementation.
As the other answer has mentioned, eval
should be seen as a last resort hack because it is extremely easy to shoot yourself in the foot with this potentially dangerous command, and you're even using it on unsanitized user input!
I gave the challenge a try and came up with my own solution that, in my opinion, is more readable and follows the outlined steps more closely.
It does not spawn any subshells, staying true to your intentions.
It also doesn't use any eval
, and uses only one local variable that I promptly clean up to not clutter the environment so that it can readily be turned into a litter-free POSIX function.
The environment variables SKIP_STEP_7
and SKIP_STEP_8
can be set to mimick implementation-dependent behaviors that are allowed by the spec.
Here is my solution presented as an overall improvement (no function call cost and fewer lines of code, etc.):
#!/bin/sh
dir=1ドル
case $dir in
'//') # Step 1
;;
*[!/]* | '')
dir=${dir%"${dir##*[!/]}"} # Step 3
case $dir in
*/*)
dir=${dir%"${dir##*/}"} # Step 5
if [ "$dir" != '//' ]
then
# The following steps are NOT implementation defined!
dir=${dir%"${dir##*[!/]}"} # Step 7
dir=${dir:-/} # Step 8
fi
;;
*)
dir=. # Step 4
;;
esac
;;
*)
dir=/ # Step 2
;;
esac
if [ "$dir" = '//' ] # Step 6
then
# The following steps ARE implementation defined!
[ "${SKIP_STEP_7+skip}" ] || dir=${dir%"${dir##*[!/]}"} # Step 7
[ "${SKIP_STEP_8+skip}" ] || dir=${dir:-/} # Step 8
fi
printf '%s\n' "$dir"
unset -v dir
dirname
. So anything that needs a sub-shell mostly negates the purpose. In theory this script would be turned into a function likestripslashes
that operates on its arguments. \$\endgroup\$