I've been writing with C for a couple days, so this is a relatively advanced function compared to the simple things I've been doing previously. It's written for the Lua API, but it's still C.
#include "lua.h"
#include "lauxlib.h"
#include <stdlib.h>
#include <string.h>
static int l_rtrim(lua_State *L) {
luaL_checktype(L, 1, LUA_TSTRING);
luaL_checktype(L, 2, LUA_TSTRING);
const char *string = lua_tostring(L, 1);
const char *delimiter = lua_tostring(L, 2);
if (*string == '0円') {
lua_rotate(L, -2, 1);
return 1;
}
size_t len = strlen(string);
size_t dlCnt = 0;
size_t tmpLen = len - 1;
if (tmpLen <= 0) {
lua_rotate(L, -2, 1);
return 1;
}
char *copy = malloc(len + 1);
if (copy == NULL) {
lua_pushstring(L, "out of memory");
lua_error(L);
return 1;
}
memcpy(copy, string, len + 1);
while (copy[tmpLen--] == *delimiter) dlCnt++;
if (dlCnt > 0) {
size_t location = len - dlCnt;
if (location > 0) {
copy[location] = '0円';
lua_pushstring(L, copy);
}
else { /* the entire string is composed of the delimiter */
lua_pushliteral(L, "");
}
}
free(copy);
return 1;
}
static const struct luaL_Reg exports[] = {
{"rtrim", l_rtrim},
{NULL, NULL}
};
extern __declspec(dllexport) int luaopen_MyModule(lua_State *L) {
luaL_newlib(L, exports);
return 1;
}
My function counts how many delimiters trail on the right side, so I can insert a null-terminator before any of those delimiters arrive. It's very inefficient (2.5x slower) compared to my implementation of left-trimming (leading spaces) because in that, I just perform a pointer increment.
My main interest is avoiding UB (because I'm not familiar for when it's invoked) and if there's a method I can use to avoid creating a new string & performing so many allocations & copying during benchmarking. I played around a lot with pointer decrementing but it's not working the way I hoped.
Some example inputs and outputs:
("??hello world??", "?") -> "??hello world"
(" hello world ", " ") -> " hello world"
-
2\$\begingroup\$ Welcome to Code Review! Could you also include a few example inputs and outputs to illustrate what the desired behavior is? \$\endgroup\$200_success– 200_success2022年03月15日 04:53:01 +00:00Commented Mar 15, 2022 at 4:53
-
\$\begingroup\$ You seem to be missing some necessary header includes. It will help reviewers if they can see a complete source file. \$\endgroup\$Toby Speight– Toby Speight2022年03月15日 08:02:27 +00:00Commented Mar 15, 2022 at 8:02
-
\$\begingroup\$ @TobySpeight Sure, I can edit the necessary headers, but the complete file is several hundred lines and I only want a review on this function because my instinct tells me it can be greatly improved :) \$\endgroup\$wellinthatcase– wellinthatcase2022年03月15日 14:17:30 +00:00Commented Mar 15, 2022 at 14:17
-
1\$\begingroup\$ @200_success I've added a couple. I added the module registration code too. \$\endgroup\$wellinthatcase– wellinthatcase2022年03月15日 14:21:09 +00:00Commented Mar 15, 2022 at 14:21
1 Answer 1
Unexpected output
Your code returns the original string if it consists of a single delimiter character:
rtrim(".", ".") --> "."
I find that illogical. If it is the wanted behavior then it should probably be documented. Otherwise remove the
size_t tmpLen = len - 1;
if (tmpLen <= 0) {
lua_rotate(L, -2, 1);
return 1;
}
part.
Undefined behavior
while (copy[tmpLen--] == *delimiter) dlCnt++;
may access memory "before" the start of the string if the string consists only of the delimiter character.
I would always enclose if/while/for blocks in curly parentheses, even if they consist of a single statement.
A bug
If the string does not end with the delimiter character then the delimiter is returned instead of the original string:
rtrim("abc", ".") --> "."
The reason is that if dlCnt == 0
then no result string is pushed onto the stack.
Embedded zeros
Lua strings can contain arbitrary bytes, including the null byte. But
size_t len = strlen(string);
treats the null byte as a delimiter, as it would be appropriate for C strings:
rtrim("abc.def.0円def...", ".") --> "abc.def"
The fix is to use
size_t len = lua_rawlen(L, 1);
instead, or
size_t len;
const char *string = lua_tolstring(L, 1, &len);
Return the original string if possible
Your code returns the original string if that is empty, thereby avoiding the copy and possible allocation in lua_pushstring()
:
if (*string == '0円') {
lua_rotate(L, -2, 1);
return 1;
}
Well, I had to think twice in order to understand how it works, I would perhaps write it as
if (*string == '0円') {
// Pop delimiter from stack and return original string:
lua_pop(L, 1);
return 1;
}
In addition, the same logic could be applied not only for empty string, but generally if the string does not end with the delimiter string.
Avoiding the copy
Pushing the truncated string onto the stack does not require null termination if one uses lua_pushlstring
instead, so that the source string need not be copied and modified.
Putting it all together
... we get the following function, which is simpler, less memory-consuming, and does not suffer from the mentioned bugs:
static int l_rtrim(lua_State *L) {
luaL_checktype(L, 1, LUA_TSTRING);
luaL_checktype(L, 2, LUA_TSTRING);
size_t len;
const char *string = lua_tolstring(L, 1, &len);
const char *delimiter = lua_tostring(L, 2);
size_t trimmed_len = len;
while (trimmed_len > 0 && string[trimmed_len - 1] == *delimiter) {
trimmed_len--;
}
if (trimmed_len < len) {
// Push and return trimmed string:
lua_pushlstring(L, string, len);
} else {
// Pop delimiter from stack and return original string:
lua_pop(L, 1);
}
return 1;
}
-
\$\begingroup\$ Thank you for the criticism and advice. I've taken it well and reimplemented a much more readable, smaller, and faster solution. Performance went up 16% & readability is through the roof.
lua_pushlstring
does internally perform a copy however, I don't think I can avoid that without messing up the internalized string Lua stores. \$\endgroup\$wellinthatcase– wellinthatcase2022年03月16日 11:02:57 +00:00Commented Mar 16, 2022 at 11:02