I wanted to use 2.6.2 Parameter Expansion to remove leading characters from a string, but was surprised to find out that "Remove Largest Prefix Pattern" doesn't automatically repeat the pattern.
$ x=aaaaabc
$ printf %s\\n "${x##a}"
aaaabc
As you can see, only the first a has been removed. Expected output was bc for any of x=bc, x=abc, x=aabc, x=aaabc or x=aaaabc.
I'm struggling to figure out how to write the pattern if I want to remove as many a characters as possible from the beginning of $x. I had no luck searching for other threads either, because many answers use bash, but I'm looking for a POSIX shell solution.
3 Answers
For certain patterns you might be able to "reverse" the pattern by matching the part of the variable that you want to keep:
$ for x in "" a aa abc aabc aaabc aaabca aaabcabc bc bcaa
> do
> printf %s\\n "${x#"${x%%[!a]*}"}"
> done
bc
bc
bc
bca
bcabc
bc
bcaa
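Split into two steps, the nested expansion above may be easier to follow. This is only a sketch: the function name strip_leading_a and the variable lead are made up for illustration.

```sh
# Hypothetical helper: print $1 with all leading "a" characters removed.
strip_leading_a() {
  # Step 1: drop the longest suffix that starts with a non-"a" character,
  # which leaves only the leading run of "a"s (possibly empty).
  lead=${1%%[!a]*}
  # Step 2: remove that run as a prefix; the inner quotes make the shell
  # treat its content literally rather than as a pattern.
  printf '%s\n' "${1#"$lead"}"
}

strip_leading_a aaabca   # prints: bca
```

When the string consists only of a characters, step 1 finds nothing to drop, so lead is the whole string and the result is empty.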
- Ah yes, keep everything starting with the first character that's not "a"! – Stephen Kitt, Oct 7, 2022 at 14:23
- For which patterns will this fail? Do you mean repeated multi-character strings? – QuartzCristal, Oct 7, 2022 at 17:47
- @QuartzCristal this isn't a generalized solution, so it's easier to describe the patterns where it can succeed than those where it can fail. This approach only works if you can write a pattern to match the part of the string you want to keep. In this case, you want to keep the longest possible match that starts with "not a" followed by anything (i.e. *). (For cases where it can work, it's very clever, +1 from me!) – Wildcard, Oct 7, 2022 at 18:28
- @Wildcard Yes, I understand, it is not easy to describe, and yes, +1 from me as well. But an example of what "certain patterns" means, I mean, where it fails, would make this answer a lot more clear. – QuartzCristal, Oct 7, 2022 at 19:13
- Quite nice. reasonedExit () { pattern="${1:-}" ; status="${1:-}" ; shift ; pattern="${pattern#"${pattern%%[![:digit:]]*}"}" ; status="${status%"${pattern?}"}" ; printf "${pattern?}" ${1+"${@?}"} ; exit "${status?}" ; } ; reasonY="12Reason over 9000! So much reason." ; reasonedExit "${reasonY?}" ; – immeëmosol, Jul 21, 2025 at 22:17
I don’t think you can do this in a generic fashion (i.e. ignoring specific features of the pattern), using only POSIX shell constructs, without using a loop:
until [ "${x#a}" = "$x" ]; do x="${x#a}"; done
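The same loop can be wrapped in a function so that the pattern becomes a parameter. A minimal sketch, assuming the pattern should be treated as a shell pattern and not as a literal string; the name strip_repeated_prefix is hypothetical.

```sh
# Hypothetical helper: strip_repeated_prefix STRING PATTERN
# Repeatedly removes the shortest prefix matching PATTERN until nothing
# more can be removed, then prints the result.
strip_repeated_prefix() {
  s=$1
  # $2 is deliberately left unquoted inside the expansion so that it
  # acts as a pattern rather than a literal string.
  until [ "${s#$2}" = "$s" ]; do
    s=${s#$2}
  done
  printf '%s\n' "$s"
}

strip_repeated_prefix aaaaabc a      # prints: bc
strip_repeated_prefix ababab-x ab    # prints: -x
```

Each pass through the loop shortens the string, so it always terminates, even when the pattern matches nothing at all.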
As a pattern, a matches a single a; there's no way it can match aaa.
While the POSIX sh specification is based on a subset of the Korn shell, and the Korn shell has the *(foo) operator (matches a sequence of zero or more foos) and the +(foo) operator (matches a sequence of one or more foos, same as foo*(foo)), those were not specified by POSIX as they're not backward compatible with the patterns of the Bourne shell, which means there are a number of contexts where they couldn't be used, such as in:
find . -name '*(x)'
which is currently required to match on filenames that end in (x), or:
pattern='*(x)'; case $file in ($pattern) ...; esac
or ${file##$pattern}, where the same applies. You'll notice that ksh88 or pdksh do not recognise those operators in those cases.
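To see the backward-compatibility issue concretely, here is a small sketch. It assumes a shell that treats the parentheses literally when the pattern comes from a variable (dash, or bash without extglob, for instance); the helper name ends_in_paren_x is made up.

```sh
# Hypothetical helper: succeeds if $1 ends in the literal text "(x)".
ends_in_paren_x() {
  pattern='*(x)'
  # Unquoted $pattern is used as a pattern: "*" is a wildcard, while
  # "(", "x" and ")" are assumed to match themselves literally.
  case $1 in
    ($pattern) return 0 ;;
    (*)        return 1 ;;
  esac
}

ends_in_paren_x 'report(x)' && echo 'report(x): match'
ends_in_paren_x 'xxx'       || echo 'xxx: no match'
```

A shell that applied the Korn meaning of *(x) here would give different answers for both strings, which is the incompatibility described above.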
Repetition is supported in regular expressions. POSIX specifies a number of utilities that can match regular expressions (expr, grep, sed, awk...). Some shells have or have had some of those builtin. expr was builtin (or could be made builtin) in the Almquist shell. ksh93 can be built with expr, grep and sed builtin and can get their output without forking. Some ash-based shells can also get the output of command substitutions without forking when it's made of one invocation of a builtin command. The busybox shell is another example of a shell where all those utilities can be invoked without forking or executing.
On the other hand, printf, which you use in your question, is not builtin in ksh88 nor in most pdksh derivatives. Apart from the special builtins and builtins such as export/getopts/read... which can only reasonably be builtin, POSIX gives you no guarantee as to whether a command is builtin or not.
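Whether a given utility is builtin in the shell at hand can be checked rather than assumed. A small sketch using command -V, which dash and bash both support; the wording of its output differs between shells, so it is meant for reading, not for parsing.

```sh
# Report how the current shell would resolve a few utility names.
# The exact wording of each line differs between shells.
for cmd in printf expr sed awk; do
  command -V "$cmd"
done
```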
So:
x=$( expr "x$x" : 'xa*\(.*\)' )
for instance, could strip the leading a characters internally in the shell, with a few caveats though:
- that returns with a failure exit status if the result is an empty string or some representations of 0
- that also strips trailing newline characters.
- you'll have noticed the x prefix, which we also need to add to prevent this from failing if $x happens to contain an expr operator (see the Application Usage section of the POSIX expr specification for more details on that).
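One way to sidestep the first two caveats, offered as a sketch, not a definitive recipe, is to append a sentinel character to the input. The captured text then is never empty, never looks like 0 and never ends in a newline, and the sentinel is stripped afterwards. The function name strip_a_expr and the variable result are made up, and the sketch assumes an expr in which . also matches newline characters.

```sh
# Hypothetical helper: strip leading "a"s from $1 using expr and store
# the outcome in the variable "result".
strip_a_expr() {
  # "x" prefix: keeps expr from mistaking the string for an operator.
  # "." suffix: sentinel, so the captured text is never empty or a form
  # of 0, and command substitution only strips the newline expr adds.
  result=$(expr "x$1." : 'xa*\(.*\)')
  result=${result%.}   # drop the sentinel again
}

strip_a_expr aaabc; printf '[%s]\n' "$result"   # prints: [bc]
strip_a_expr aaa;   printf '[%s]\n' "$result"   # prints: []
```

The result goes into a variable instead of being printed so that callers don't need another command substitution, which would strip trailing newlines again.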
Or with awk:
x=$(awk 'BEGIN {sub(/^a*/, "", ARGV[1]); print ARGV[1]}' "$x")
Or sed:
x=$(printf '%s\n' "$x" | sed '1s/^a*//')
(sed being the least appropriate here, as it works line by line and needs to be fed its input via stdin or a file).
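As a quick sanity check, the awk and sed variants can be compared against the loop approach on a few sample values. This is a sketch with made-up wrapper names (strip_loop, strip_awk, strip_sed); it reports only disagreements.

```sh
# Hypothetical wrappers around the approaches discussed above.
strip_loop() {
  x=$1
  until [ "${x#a}" = "$x" ]; do x=${x#a}; done
  printf '%s\n' "$x"
}
strip_awk() { awk 'BEGIN {sub(/^a*/, "", ARGV[1]); print ARGV[1]}' "$1"; }
strip_sed() { printf '%s\n' "$1" | sed '1s/^a*//'; }

for sample in "" a abc aaabca bcaa; do
  want=$(strip_loop "$sample")
  for f in strip_awk strip_sed; do
    # Print a line only when a variant disagrees with the loop.
    [ "$("$f" "$sample")" = "$want" ] || echo "$f differs for '$sample'"
  done
done
```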
- Why is x$x needed? In which conditions could it fail? – QuartzCristal, Oct 7, 2022 at 17:46
- @QuartzCristal, see edit. You can try with x='(', x=+, x=index with various implementations. – Stéphane Chazelas, Oct 7, 2022 at 18:20
- I see, thanks @StéphaneChazelas – QuartzCristal, Oct 7, 2022 at 19:10
shopt -s extglob; printf '%s\n' "${x##+(a)}"
-- ref 3.5.8.1 Pattern Matching