I have a Makefile with this command, which converts folder names in ./cmd/ from snake_case to PascalCase:
test:
@for f in $(shell ls ./cmd/); do \
echo $${f}; \
echo $${f} | sed -r 's/(^|_)([a-z])/\U\2/g'; \
done
What I get when I run it is the following, with an uppercase U prefixed to each match:
api_get_manual
UapiUgetUmanual
And what I expect to get:
ApiGetManual
\U, like -r (for which -E is now the standard equivalent), is a non-standard extension of the GNU implementation of sed, inspired from ex/vi, also found in perl, and not found in many other implementations.
Here, instead, you could do:
SHELL = zsh
test:
@for f (cmd/*(N:t)) print -rl -- $$f $${$${(C)f}//_}
Using:
- cmd/*(N:t) to expand the glob in a nullglob fashion (N), getting the tail (t) of every expansion.
- ${(C)var} to capitalise words in the variable.
- ${var//_} à la ksh to remove _ characters afterwards.
- print -rl -- to print raw (r) on separate lines (l).
Note that file names are decoded into text and converted to uppercase as per the user's locale (LC_CTYPE category).
The above, for every sequence of one or more alphanumerical characters, converts the first character to uppercase, and all the rest to lower case and removes all underscores.
A closer match to your approach, which only removes the underscores that are followed by a lowercase letter (and converts only that letter to uppercase, leaving the rest alone):
SHELL = zsh
test:
@set -o extendedglob; for f (cmd/*(N:t)) \
print -rl -- $$f $${f//(#b)((#s)|_)([[:lower:]])/$$match[2]:u}
Where:
- (#b) is to activate back references, so capture groups can be referenced in the $match array in the replacement.
- (#s) is for start, the equivalent of regex ^.
- [[:lower:]] matches characters classified as lowercase like in regexps. Use [a-z] instead to restrict to those between a and z, which in zsh is done based on codepoint value so limited to abcdefghijklmnopqrstuvwxyz.
- $var:u is to convert to uppercase like in csh, honouring the locale.
Without zsh:
test:
@CDPATH= cd cmd && \
perl -le 'for (<*>) {print; s/[[:alnum:]]+/\u\L$$&/g; s/_//g; print}'
Assumes ASCII only letters (stéphane would be changed to StéPhane for instance, as é is not recognised as a letter).
Or like in your approach:
test:
@CDPATH= cd cmd && \
perl -le 'for (<*>) {print; s/(^|_)([a-z])/\U$$2/g; print}'
If limited to POSIX utilities, you could use awk to do the capitalising:
test:
@CDPATH= cd cmd && awk -- ' \
BEGIN {for (i = 1; i < ARGC; i++) { \
arg = ARGV[i]; out = ""; \
print arg; \
while (match(arg, /[[:alnum:]]+/)) { \
out = out \
substr(arg, 1, RSTART - 1) \
toupper(substr(arg, RSTART, 1)) \
tolower(substr(arg, RSTART+1, RLENGTH - 1)); \
arg = substr(arg, RSTART+RLENGTH)}; \
out = out arg; \
gsub("_", "", out); \
print out \
} \
}' *
Like with zsh, it will honour the locale for decoding filenames as text, classifying characters as alnum and converting to uppercase.
To match your approach:
test:
@CDPATH= cd cmd && awk -- ' \
BEGIN {for (i = 1; i < ARGC; i++) { \
arg = ARGV[i]; out = ""; x = 0; \
print arg; \
while (match(arg, (x++ ? "_" : "(^|_)") "[[:lower:]]")) { \
out = out \
substr(arg, 1, RSTART-1) \
toupper(substr(arg, RSTART+RLENGTH-1, 1)); \
arg = substr(arg, RSTART+RLENGTH)}; \
out = out arg; \
gsub("_", "", out); \
print out \
} \
}' *
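Outside make, the same logic needs neither the backslash-newlines nor any $ doubling, which makes it easier to try on sample names. A minimal sketch of the first awk variant (capitalise every alphanumeric run, then drop underscores), assuming a POSIX sh and an awk that supports [[:alnum:]]; the function name capitalise_runs is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: capitalise each alphanumeric run of every argument,
# then delete all underscores. Mirrors the first awk recipe, without make escaping.
capitalise_runs() {
  awk -- '
    BEGIN {
      for (i = 1; i < ARGC; i++) {
        arg = ARGV[i]; out = ""
        while (match(arg, /[[:alnum:]]+/)) {
          head  = substr(arg, 1, RSTART - 1)                    # text before the run
          first = toupper(substr(arg, RSTART, 1))               # first character of the run
          rest  = tolower(substr(arg, RSTART + 1, RLENGTH - 1)) # remainder of the run
          out = out head first rest
          arg = substr(arg, RSTART + RLENGTH)                   # continue after the run
        }
        out = out arg
        gsub("_", "", out)
        print out
      }
    }' "$@"
}

capitalise_runs api_get_manual this_7 foo.pdf
# prints:
# ApiGetManual
# This7
# Foo.Pdf
```

Because it capitalises after any non-alphanumeric character, foo.pdf comes out as Foo.Pdf, which may or may not be what you want for file names.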
A few other notes:
- Your $(shell ...) is expanded by make into the code that is passed to the shell without any form of sanitisation, so that won't work for file names that have characters that are special in the syntax of the shell such as space, ;, *, ', etc. In fact that's a typical case of arbitrary code execution vulnerability. But then again, when using make you have to give up any hope of doing anything safely or reliably. It should really only be used with strictly controlled data (here it may be fine if you can guarantee that the cmd directory will only contain the files that you expect it to).
- echo can't be used for arbitrary data.
- In shells other than zsh, including sh, the default shell for make, parameter expansions must be quoted to prevent split+glob, so $${f} should be "$$f" (or "$${f}" if you prefer).
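As a sketch of the last two points, assuming a POSIX sh, the loop can take its names from a glob instead of the output of ls, quote every expansion and print with printf. The helper name list_dir_names is hypothetical, it only lists directories, and inside a make recipe every $ would have to be doubled and the lines joined with backslashes:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every sub-directory of "$1",
# one per line, without parsing the output of ls.
list_dir_names() {
  for path in "$1"/*/; do
    [ -d "$path" ] || continue   # skip the unexpanded pattern when nothing matches
    name=${path%/}               # strip the trailing slash
    name=${name##*/}             # keep only the last path component
    printf '%s\n' "$name"        # printf outputs the name as-is, whatever it contains
  done
}

list_dir_names ./cmd
```

A name containing a space or a * stays a single item here, because no expansion is left unquoted.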
- The awk script at least would remove underscores that don't precede lower case letters and would upper case letters that follow other non-alphanumeric chars than underscore. I don't know if the OP can have those cases nor, if so, how they'd want them handled so I asked in a comment. – Ed Morton, Apr 22, 2025 at 10:52
- @EdMorton, yes, all three would do that to mimic the Capitalisation (C) parameter expansion flag of zsh and delete _ afterwards. Usual way to do snake to camel case, though for file names capitalising the extension may be undesirable. – Stéphane Chazelas, Apr 22, 2025 at 11:38
Your comment tells us that:
- You aren't using GNU sed, which is required for \U.
- Your problem has nothing to do with calling sed from a Makefile since you get the same behavior just calling sed directly on the command line.
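One way to check the first point on a given machine is to probe sed directly from the shell. A rough sketch, where the function name sed_has_upper_escape is invented for the example; it only tests whether \U uppercases the match and does not identify which sed is installed:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if this sed turns \U& into the
# uppercased match, as GNU sed does.
sed_has_upper_escape() {
  [ "$(printf 'a\n' | sed 's/a/\U&/' 2>/dev/null)" = "A" ]
}

if sed_has_upper_escape; then
  printf '%s\n' 'this sed supports \U'
else
  printf '%s\n' 'this sed does not support \U'
fi
```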
Instead of relying on GNU sed you could do this using any awk in any shell on every Unix box:
$ echo 'api_get_manual' |
awk '{
r = "_" $0
while ( match(r, /_[a-z]/) ) {
r = substr(r,1,RSTART-1) toupper(substr(r,RSTART+1,1)) substr(r,RSTART+RLENGTH)
}
sub(/^_/, "", r)
print r
}'
ApiGetManual
Here's the above running on some input that's not covered by the example in the question so you can decide if the output is desirable or not:
$ cat file
api_get_manual
this_7
_That
foo:bar
foo.pdf
bar.c
$ awk '{
r = "_" $0
while ( match(r, /_[a-z]/) ) {
r = substr(r,1,RSTART-1) toupper(substr(r,RSTART+1,1)) substr(r,RSTART+RLENGTH)
}
sub(/^_/, "", r)
print $0 "\t-> " r
}' file
api_get_manual -> ApiGetManual
this_7 -> This_7
_That -> _That
foo:bar -> Foo:bar
foo.pdf -> Foo.pdf
bar.c -> Bar.c
To use either of the above in a Makefile, $0 needs to become $$0 and the awk script has to logically all be on 1 line, so you need to add a couple of ;s and escape the newlines within the script, e.g. (untested):
awk '{ \
r = "_" $$0; \
while ( match(r, /_[a-z]/) ) { \
r = substr(r,1,RSTART-1) toupper(substr(r,RSTART+1,1)) substr(r,RSTART+RLENGTH) \
} \
sub(/^_/, "", r); \
print r \
}'
- Beware on most systems, [a-z] matches hundreds of characters besides the 26 letters without diacritics as used in ASCII, some of which don't have an uppercase form, so you could run into an infinite loop here. Recent versions of GNU awk have switched back to [a-z] being the same as [abcdefghijklmnopqrstuvwxyz] regardless of the locale (but still honour the locale for conversion from lower to upper case). See info gawk 'Ranges and Locales' for details. – Stéphane Chazelas, Apr 22, 2025 at 12:46
- @StéphaneChazelas I understand the [a-z] matching issue but I'm just trying to re-use the OP's code where I don't NEED to change it to address their specific problem, thereby hopefully making what NEEDS to change more obvious, which is why I didn't use [[:lower:]] instead. Meanwhile the match() is searching for an underscore while the loop body is removing each underscore found by the match() so I'm not seeing how it could be an infinite loop. – Ed Morton, Apr 22, 2025 at 13:46
- You're right, I had missed it removed the underscores. At worst, it would remove some consecutive underscores. – Stéphane Chazelas, Apr 22, 2025 at 14:22
- […] sed, but the default sed on your system is not GNU. What Unix are you running this on?
- […] -r GNU extension. Maybe that's toybox (Android) or busybox which both copied GNU's -r but not \U.
- […] echo 'api_get_manual' | sed -r 's/(^|_)([a-z])/\U\2/g' outside of the Makefile. Also add the output of sed --version.
- […] MacOS with zsh. The output of that is exactly the same as the one shown in the example. On MacOS it is not possible to check the sed --version: stackoverflow.com/a/37639736/4886775
- […] this_7, _That, and foo:bar become this7, That, and FooBar or something else?