My POSIX is_integer () function has looked like this for a long time:
#!/bin/sh
is_integer ()
{
    [ "$1" -eq "$1" ] 2> /dev/null
}
However, today I found that it is broken: if there are spaces around the number, it surprisingly still evaluates to true, and I have no idea how to fix that.
Example of correct (expected) behavior: is_integer 123 evaluates to true.
Example of incorrect (unexpected) behavior: is_integer ' 123' also evaluates to true; however, it obviously contains a leading space, so the function is expected to evaluate to false in such cases.
POSIX-compliant suggestions only, please. Thank you.
3 Answers
#!/bin/sh
is_integer ()
{
    case "${1#[+-]}" in
        (*[!0123456789]*) return 1 ;;
        ('')              return 1 ;;
        (*)               return 0 ;;
    esac
}
Uses only POSIX builtins.
It is not clear from the spec whether +1 is supposed to be an integer; if not, remove the + from the case line.
It works as follows. The ${1#[+-]} expansion removes the optional leading sign. If what is left contains a non-digit, it is not an integer; likewise if nothing is left. Otherwise, it is an integer.
Edit: change ^ to ! to negate the character class - thanks @LinuxSecurityFreak
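The function above can be exercised with a small loop. This is only a sketch: the sample inputs are illustrative assumptions of mine, not taken from the original post.

```shell
#!/bin/sh
# Same function as in the answer above.
is_integer ()
{
    case "${1#[+-]}" in
        (*[!0123456789]*) return 1 ;;  # something other than a digit remains
        ('')              return 1 ;;  # nothing remains (empty, or only a sign)
        (*)               return 0 ;;  # only digits remain
    esac
}

# Illustrative inputs (assumed for this sketch).
for candidate in 123 -123 +123 ' 123' '123 ' 12a3 '' - --1
do
    if is_integer "$candidate"
    then printf '[%s] is an integer\n' "$candidate"
    else printf '[%s] is not an integer\n' "$candidate"
    fi
done
```

With these inputs, only 123, -123 and +123 should be reported as integers; the strings with surrounding spaces are rejected because a space is not in the digit list.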
- I don't get why ^ was changed to !. Isn't ^ used to negate the class? See RE bracket expressions. – Quasímodo, Jul 12, 2020 at 11:34
- True! Just for the record, the exception is documented in Shell Command Language, Pattern Matching Notation. – Quasímodo, Jul 12, 2020 at 11:43
- Your tests should have some multi-digit strings in them. You might want to extend the patterns to reject leading zeros (accept '0', reject '0'*). – icarus, Jul 12, 2020 at 21:57
- @fpmurphy It is true that you can use [:digit:], but I would rather not have 123๔ as an integer because the last character is a digit 4 in Thai. In the original version of this post I used [^0-9], but changed it to explicitly list the characters I wanted to use in the definition of an integer. – icarus, Jul 12, 2020 at 22:54
- @fpmurphy, on some BSDs, [[:digit:]] will match all decimal digits, not just the Arabic / ASCII ones, even in US English locales. – Stéphane Chazelas, Apr 25, 2022 at 5:44
Not the most efficient (due to the external command), but quite simple:
is_integer () {
    expr "X$1" : "X-\{0,1\}[0-9][0-9]*$" > /dev/null
}
At least in the implementation I am testing, a leading - in the first argument is treated not as part of a matching operation, but apparently as part of an invalid arithmetic expression; the X ensures expr parses its arguments as a valid match operation.
- Don't use [0-9] for input validation; it often matches thousands of characters, some of which can cause nasty problems if not filtered out. Use [0123456789] instead. – Stéphane Chazelas, Apr 25, 2022 at 5:46
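Taking the comment above into account, the expr approach could be written with an explicit digit list. This is a sketch only; it assumes an expr that supports BRE interval expressions (\{0,1\}), as the original one-liner already does, and I have not tried it on every implementation.

```shell
#!/bin/sh
is_integer () {
    # The X prefix keeps expr from misparsing a first argument that starts with '-'.
    # The digits are listed explicitly instead of using the range [0-9].
    expr "X$1" : 'X-\{0,1\}[0123456789][0123456789]*$' > /dev/null
}
```

Like the original, this variant accepts an optional leading - but not a leading +.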
A more complete solution would be as follows:
is_integer() (
    export LC_ALL=C
    local n=${1#[-+]}
    case "$n" in
        0[0-7]*) case "$n" in 0*[!0-7]*) return 1;; esac;;
        0[xX]*) case "$n" in 0[xX]|0[xX]*[!0-9a-fA-F]*) return 1;; esac;;
        *) case "$n" in ''|*[!0-9]*) return 1;; esac;;
    esac
)
This strips any leading sign and then parses the string depending on whether or not it has a prefix of 0, 0x or 0X. Thus, one should take care not to have arbitrary leading zeros on a value that will be used as a decimal number.
$ echo $((01))
1
$ echo $((08))
-ash: arithmetic syntax error
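The same prefix dispatch can be sketched without local (which POSIX does not specify) and with explicit character lists instead of ranges. This is my own rearrangement, not the answer author's code, and it differs in one deliberate way: it also rejects strings such as 08, which the patterns above let fall through to the decimal branch.

```shell
#!/bin/sh
is_integer() (
    # The function body is a subshell, so these assignments do not leak to the caller.
    LC_ALL=C
    export LC_ALL
    n=${1#[-+]}
    case "$n" in
        0[xX]*)
            # Hexadecimal: at least one hex digit must follow the prefix.
            case "${n#0[xX]}" in
                ''|*[!0123456789abcdefABCDEF]*) exit 1 ;;
            esac ;;
        0*)
            # Octal (or a lone 0): only the digits 0 to 7 are allowed.
            case "$n" in
                *[!01234567]*) exit 1 ;;
            esac ;;
        *)
            # Decimal: non-empty and digits only.
            case "$n" in
                ''|*[!0123456789]*) exit 1 ;;
            esac ;;
    esac
    exit 0
)
```

Because the body runs in a subshell, exit is used to set the function's status. Whether octal and hexadecimal forms should be accepted at all depends on where the validated value is used afterwards.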
- Don't use ranges like [0-7] for input validation! They often match thousands of characters, some of which can cause nasty problems if not ruled out. Use [01234567]. Note that local is not POSIX. – Stéphane Chazelas, Apr 25, 2022 at 5:41
- Octal and hexadecimal are allowed in POSIX sh arithmetic expressions or in POSIX printf %d arguments, not in ['s -eq operands, but then again leading blanks are allowed in most of those as well. The OP didn't specify where the numbers were going to be used after being sanitised. – Stéphane Chazelas, Apr 25, 2022 at 5:49
- See for instance How to ensure user input consists of exactly 6 digits, or Rename special characters in filenames to underscore, or regex pattern issue for digit validation in ksh. – Stéphane Chazelas, Apr 27, 2022 at 5:16
- @StéphaneChazelas Those aren't sources, those are links to you making similar comments. I've tried to reproduce the incorrect matching, but of the ~1000 characters that would supposedly match incorrectly, I cannot get a single match in any shell or regex implementation I've tried. Do you have a specific reproducible example in which the matching goes wrong? – FWDekker, Feb 15, 2024 at 21:44
- @FWDekker try for instance LC_ALL=en_US.UTF-8 bash -c '[[ ۸ = [0-9] ]] && echo yes' on Ubuntu 22.04 (same with the 1000+ other characters mentioned in those other questions). YMMV with the tool, libc, OSes and versions thereof. – Stéphane Chazelas, Feb 18, 2024 at 17:35
-eq quite a lot wider than to just integers. Stuff like abc (the value of variable abc), 12.345 (floating point), 1+1 (arithmetic expression) get accepted.