How to iterate individual characters in Lua string?

Question 1

I have a string in Lua and want to iterate over the individual characters in it. But no code I've tried works, and the official manual only shows how to find and replace substrings :(

str = "abcd"
for char in str do -- error
 print( char )
end
for i = 1, str:len() do
 print( str[ i ] ) -- nil
end

Question 2

In Lua 5.1, you can iterate over the characters of a string in a couple of ways.

The basic loop would be:

for i = 1, #str do
 local c = str:sub(i,i)
 -- do something with c
end

But it may be more efficient to use a pattern with string.gmatch() to get an iterator over the characters:

for c in str:gmatch"." do
 -- do something with c
end

Or even to use string.gsub() to call a function for each char:

str:gsub(".", function(c)
 -- do something with c
end)

In all of the above, I've taken advantage of the fact that the string module is set as the __index of the metatable shared by all string values, so its functions can be called as members using the : notation. I've also used the (new to 5.1, IIRC) # to get the string length.
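To make the metatable point concrete, here is a minimal sketch, assuming a stock interpreter where nothing has replaced the string metatable. It also suggests why str[ i ] in the question gives nil: the index is looked up as a key in the string library table, not as a position in the string.

```lua
local str = "abcd"

-- All strings share one metatable; its __index field is the string library.
local mt = getmetatable("")
print(mt.__index == string)          -- true in a stock interpreter

-- So the method-call form and the plain function call are the same thing.
print(str:len() == string.len(str))

-- Indexing with a number looks up string[1], which does not exist.
print(str[1])                        -- nil

-- To get a single character, ask for a one-character substring instead.
local first = str:sub(1, 1)
print(first)
```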

The best answer for your application depends on a lot of factors, and benchmarks are your friend if performance is going to matter.

You might want to evaluate why you need to iterate over the characters, and to look at one of the regular expression modules that have been bound to Lua, or for a modern approach look into Roberto's lpeg module, which implements Parsing Expression Grammars for Lua.

Question 3

Thanks. About the lpeg module you mentioned: does it save token positions in the original text after tokenization? The task I need to perform is to syntax-highlight a specific simple language in SciTE via Lua (with no compiled C++ parser). Also, how do I install lpeg? It seems to have .c source in the distribution; does it need to be compiled alongside Lua?

Question 4

Building lpeg will produce a DLL (or .so) that should be stored where require can find it (i.e. somewhere identified by the content of the global package.cpath in your Lua installation). You also need to install its companion module re.lua if you want to use its simplified syntax. From an lpeg grammar, you can get callbacks and capture text in a number of ways, and it is certainly possible to use captures to simply store the location of a match for later use. If syntax highlighting is the goal, then a PEG is not a bad choice of tool.
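As a rough illustration of storing match locations with captures, here is a small sketch. It assumes LPeg is installed where require can find it (the pcall guard lets the snippet load even when it is not), and tokenize_words is a hypothetical helper name, not part of LPeg.

```lua
-- Load LPeg if it is available; ok is false when require cannot find it.
local ok, lpeg = pcall(require, "lpeg")

-- Hypothetical helper: returns a list of {pos = start, text = word} tables,
-- or nil when LPeg is not installed.
local function tokenize_words(text)
  if not ok then return nil end
  local P, R, C, Cp, Ct, Cg = lpeg.P, lpeg.R, lpeg.C, lpeg.Cp, lpeg.Ct, lpeg.Cg
  local letter = R("az", "AZ")
  local word = letter^1
  -- Cp() captures the current position, C() captures the matched text.
  local token = Ct(Cg(Cp(), "pos") * Cg(C(word), "text"))
  local skip = (P(1) - letter)^0      -- anything that is not a letter
  local grammar = Ct((skip * token)^0)
  return grammar:match(text)
end

local tokens = tokenize_words("for idx do")
if tokens then
  for _, tok in ipairs(tokens) do
    print(tok.pos, tok.text)
  end
end
```

The positions are byte offsets into the original text, which is the kind of information a highlighter needs in order to style a range.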

Question 5

Not to mention the latest releases of SciTE (since 2.22) include Scintillua, an LPEG-based lexer, meaning it can work right out of the box, no re-compilation required.

Question 6

None of these work with non-ASCII characters: they all iterate over bytes, so a multi-byte UTF-8 character gets split.
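For UTF-8 text, one possible workaround is to match whole encoded characters with a pattern instead of single bytes. This is only a sketch: it assumes the input is valid UTF-8, utf8_chars is a hypothetical helper name, and the fallback pattern literal is my own assumption for interpreters without the utf8 library (which was added in Lua 5.3).

```lua
-- Pattern for one UTF-8 encoded character: a lead byte followed by any
-- continuation bytes. Lua 5.3+ provides it as utf8.charpattern; the
-- literal fallback is an assumption for Lua 5.1/5.2/LuaJIT.
local charpattern = (utf8 and utf8.charpattern)
  or "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical helper: collect the characters of a valid UTF-8 string.
local function utf8_chars(str)
  local chars = {}
  for ch in str:gmatch(charpattern) do
    chars[#chars + 1] = ch
  end
  return chars
end

local sample = "h\195\169llo"   -- "héllo", with é written as its two bytes
local chars = utf8_chars(sample)
print(#sample, #chars)          -- 6 bytes, 5 characters
```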

Question 7

Depending on the task at hand, it might be easier to use string.byte. It is also the fastest way, because it avoids creating new substrings, which happens to be pretty expensive in Lua thanks to the hashing of each new string and the check for whether it is already known. You can pre-calculate the codes of the symbols you are looking for with the same string.byte to maintain readability and portability.

local str = "ab/cd/ef"
local target = string.byte("/")
for idx = 1, #str do
  if str:byte(idx) == target then
    print("Target found at:", idx)
  end
end

Question 8

If you're using Lua 5, try:

for i = 1, string.len(str) do
 print( string.sub(str, i, i) )
end

Question 9

There are already a lot of good approaches in the provided answers. If speed is what you are primarily looking for, you should definitely consider doing the job through Lua's C API, which is many times faster than raw Lua code. When working with preloaded chunks (e.g. the load function), the difference is not that big, but still considerable.

As for the pure Lua solutions, let me share this small benchmark I've made. It covers every answer provided to date and adds a few optimizations. Still, the basic thing to consider is:

How many times will you need to iterate over the characters in the string?

If the answer is "once", then you should look at the first part of the benchmark ("raw speed").
Otherwise, the second part will provide a more precise estimate, because it parses the string into a table, which is much faster to iterate over. You should also consider writing a simple function for this, like @Jarriz suggested.

Here is full code:

-- Setup locals
local str = "Hello World!"
local attempts = 5000000
local reuses = 10 -- For the second part of benchmark: Table values are reused 10 times. Change this according to your needs.
local x, c, elapsed, tbl
-- "Localize" funcs to minimize lookup overhead
local stringbyte, stringchar, stringsub, stringgsub, stringgmatch = string.byte, string.char, string.sub, string.gsub, string.gmatch
print("-----------------------")
print("Raw speed:")
print("-----------------------")
-- Version 1 - string.sub in loop
x = os.clock()
for j = 1, attempts do
 for i = 1, #str do
 c = stringsub(str, i, i)
 end
end
elapsed = os.clock() - x
print(string.format("V1: elapsed time: %.3f", elapsed))
-- Version 2 - string.gmatch loop
x = os.clock()
for j = 1, attempts do
 for c in stringgmatch(str, ".") do end
end
elapsed = os.clock() - x
print(string.format("V2: elapsed time: %.3f", elapsed))
-- Version 3 - string.gsub callback
x = os.clock()
for j = 1, attempts do
 stringgsub(str, ".", function(c) end)
end
elapsed = os.clock() - x
print(string.format("V3: elapsed time: %.3f", elapsed))
-- For version 4
local str2table = function(str)
 local ret = {}
 for i = 1, #str do
 ret[i] = stringsub(str, i, i) -- Note: This is a lot faster than using table.insert
 end
 return ret
end
-- Version 4 - function str2table
x = os.clock()
for j = 1, attempts do
 tbl = str2table(str)
 for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
 c = tbl[i]
 end
end
elapsed = os.clock() - x
print(string.format("V4: elapsed time: %.3f", elapsed))
-- Version 5 - string.byte
x = os.clock()
for j = 1, attempts do
 tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
 for i = 1, #tbl do
 c = tbl[i] -- Note: produces char codes instead of chars.
 end
end
elapsed = os.clock() - x
print(string.format("V5: elapsed time: %.3f", elapsed))
-- Version 5b - string.byte + conversion back to chars
x = os.clock()
for j = 1, attempts do
 tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
 for i = 1, #tbl do
 c = stringchar(tbl[i])
 end
end
elapsed = os.clock() - x
print(string.format("V5b: elapsed time: %.3f", elapsed))
print("-----------------------")
print("Creating cache table ("..reuses.." reuses):")
print("-----------------------")
-- Version 1 - string.sub in loop
x = os.clock()
for k = 1, attempts do
 tbl = {}
 for i = 1, #str do
 tbl[i] = stringsub(str, i, i) -- Note: This is a lot faster than using table.insert
 end
 for j = 1, reuses do
 for i = 1, #tbl do
 c = tbl[i]
 end
 end
end
elapsed = os.clock() - x
print(string.format("V1: elapsed time: %.3f", elapsed))
-- Version 2 - string.gmatch loop
x = os.clock()
for k = 1, attempts do
 tbl = {}
 local tblc = 1 -- Note: This is faster than table.insert
 for c in stringgmatch(str, ".") do
 tbl[tblc] = c
 tblc = tblc + 1
 end
 for j = 1, reuses do
 for i = 1, #tbl do
 c = tbl[i]
 end
 end
end
elapsed = os.clock() - x
print(string.format("V2: elapsed time: %.3f", elapsed))
-- Version 3 - string.gsub callback
x = os.clock()
for k = 1, attempts do
 tbl = {}
 local tblc = 1 -- Note: This is faster than table.insert
 stringgsub(str, ".", function(c)
 tbl[tblc] = c
 tblc = tblc + 1
 end)
 for j = 1, reuses do
 for i = 1, #tbl do
 c = tbl[i]
 end
 end
end
elapsed = os.clock() - x
print(string.format("V3: elapsed time: %.3f", elapsed))
-- Version 4 - str2table func before loop
x = os.clock()
for k = 1, attempts do
 tbl = str2table(str)
 for j = 1, reuses do
 for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
 c = tbl[i]
 end
 end
end
elapsed = os.clock() - x
print(string.format("V4: elapsed time: %.3f", elapsed))
-- Version 5 - string.byte to create table
x = os.clock()
for k = 1, attempts do
 tbl = {stringbyte(str,1,#str)}
 for j = 1, reuses do
 for i = 1, #tbl do
 c = tbl[i]
 end
 end
end
elapsed = os.clock() - x
print(string.format("V5: elapsed time: %.3f", elapsed))
-- Version 5b - string.byte to create table + string.char loop to convert bytes to chars
x = os.clock()
for k = 1, attempts do
 tbl = {stringbyte(str, 1, #str)}
 for i = 1, #tbl do
 tbl[i] = stringchar(tbl[i])
 end
 for j = 1, reuses do
 for i = 1, #tbl do
 c = tbl[i]
 end
 end
end
elapsed = os.clock() - x
print(string.format("V5b: elapsed time: %.3f", elapsed))

Example output (Lua 5.3.4, Windows):

-----------------------
Raw speed:
-----------------------
V1: elapsed time: 3.713
V2: elapsed time: 5.089
V3: elapsed time: 5.222
V4: elapsed time: 4.066
V5: elapsed time: 2.627
V5b: elapsed time: 3.627
-----------------------
Creating cache table (10 reuses):
-----------------------
V1: elapsed time: 20.381
V2: elapsed time: 23.913
V3: elapsed time: 25.221
V4: elapsed time: 20.551
V5: elapsed time: 13.473
V5b: elapsed time: 18.046

Result:

In my case, the string.byte and string.sub were fastest in terms of raw speed. When using cache table and reusing it 10 times per loop, the string.byte version was fastest even when converting charcodes back to chars (which isn't always necessary and depends on usage).

As you have probably noticed, I've made some assumptions based on my previous benchmarks and applied them to the code:

Library functions should always be localized if used inside loops, because it is a lot faster.
Inserting a new element into a Lua table is much faster using tbl[idx] = value than table.insert(tbl, value).
Looping through a table using for i = 1, #tbl is a bit faster than for k, v in pairs(tbl).
Always prefer the version with fewer function calls, because the call itself adds a little bit to the execution time.

Hope it helps.

Question 10

The elapsed = os.clock() - x line adds one global table fetch into the mix; I'd recommend storing os.clock in a local variable. The - x subtraction is also part of what gets measured, which may or may not affect the time. Noticeable? Probably not until you run these tests hundreds of times to get the average, min, and max run times.
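A minimal sketch of that suggestion, assuming a hypothetical helper named time_it (it is not part of the benchmark above): os.clock is kept in a local, and each candidate is timed over several runs so that min, max and average can be compared.

```lua
-- Keep os.clock in a local so the timed region has no global table lookup.
local clock = os.clock

-- Hypothetical helper: run fn `runs` times and return min, max and average
-- elapsed CPU time in seconds.
local function time_it(fn, runs)
  local min, max, total = math.huge, 0, 0
  for _ = 1, runs do
    local start = clock()
    fn()
    local elapsed = clock() - start
    if elapsed < min then min = elapsed end
    if elapsed > max then max = elapsed end
    total = total + elapsed
  end
  return min, max, total / runs
end

-- Usage: time the string.sub loop from V1 on a short string.
local str = "Hello World!"
local min, max, avg = time_it(function()
  for i = 1, #str do
    local c = str:sub(i, i)
  end
end, 100)
print(string.format("min %.6f  max %.6f  avg %.6f", min, max, avg))
```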

Question 11

I did some testing with long strings; these were the results, ordered by speed: (Strlen 160k V4: 41.544) (Strlen 400k V1: 9.121) (Strlen 800k V3: 0.064, V2: 0.048, V5b: 0.048, V5: 0.018). Notes: [V4] runs out of memory on strings longer than around 160k, at least on my machine. [V5/V5b] byte() only handles slices up to 7997 chars long, so I had to wrap both in an extra loop to process the string in chunks of that size.
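The extra loop could look roughly like the sketch below. The helper name each_byte and the default chunk size of 4096 are my own assumptions, chosen only to stay under the slice limit reported above; the actual limit may differ between Lua builds.

```lua
-- Hypothetical helper: call fn(index, byte) for every byte of str, fetching
-- the bytes in slices so string.byte never returns too many values at once.
local function each_byte(str, fn, chunk)
  chunk = chunk or 4096   -- assumed chunk size, below the reported limit
  for first = 1, #str, chunk do
    local last = math.min(first + chunk - 1, #str)
    local bytes = { string.byte(str, first, last) }
    for i = 1, #bytes do
      fn(first + i - 1, bytes[i])
    end
  end
end

-- Usage: count the slashes in a long string.
local long = string.rep("ab/", 10000)
local slash, count = string.byte("/"), 0
each_byte(long, function(_, b)
  if b == slash then count = count + 1 end
end)
print(count)
```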

Question 12

Thank you very much for creating this benchmark. I can reproduce your results with Lua 5.1 on Linux; not much different. However, I've run them with my LuaJIT version, and the results for V5 are catastrophically worse! Every other version is clearly faster with LuaJIT (V1 takes half a second instead of 3!), but V5 took 47 seconds instead of the 2 seconds for Lua 5.1. I recommend everyone be very careful about which version they choose if they care about performance, and test with every Lua implementation they support!

Question 13

PS: if the trick you used to optimize V5 is removed and replaced with a double loop, like in V1, it yields the expected fast run time: it drops from 47 seconds to just 0.078 seconds.
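For reference, the variant described here might look like the following sketch: one string.byte call per position instead of building a temporary table. The function name sum_bytes is hypothetical, and summing the codes is only there to give the loop something to do.

```lua
-- Localized as in the benchmark.
local stringbyte = string.byte

-- V5 without the temporary table: one string.byte call per position,
-- in the same style as V1's inner loop.
local function sum_bytes(str)
  local total = 0
  for i = 1, #str do
    total = total + stringbyte(str, i)
  end
  return total
end

print(sum_bytes("Hello World!"))
```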

Question 14

Iterating to construct a string and returning this string as a table with load()...

-- Build the source of a table constructor, then compile it with load().
-- load() accepts a string chunk in Lua 5.2+; use loadstring() on Lua 5.1.
itab = function(str)
  local parts = {}
  for i = 1, #str do
    -- %q quotes each character safely, including quotes and backslashes
    parts[i] = string.format('%q', str:sub(i, i))
  end
  return load('return {' .. table.concat(parts, ',') .. '}')()
end
-- Print index, type and value of every entry, in order
dump = function(tbl)
  for key, value in ipairs(tbl) do
    io.write(string.format("%s=%s=%s\n", key, type(value), value))
  end
end
res = itab('KOYAANISQATSI')
dump(res)

Puts out...

1=string=K
2=string=O
3=string=Y
4=string=A
5=string=A
6=string=N
7=string=I
8=string=S
9=string=Q
10=string=A
11=string=T
12=string=S
13=string=I

Question 15

Everyone here suggests a less optimal method.

This would be best:

function chars(str)
  local strc = {} -- local, so the helper does not leak a global
  for i = 1, #str do
    table.insert(strc, string.sub(str, i, i))
  end
  return strc
end
str = "Hello world!"
char = chars(str)
print("Char 2: "..char[2]) -- prints the char 'e'
print("-------------------\n")
for i = 1, #str do -- testing printing all the chars
  if (char[i] == " ") then
    print("Char "..i..": [[space]]")
  else
    print("Char "..i..": "..char[i])
  end
end

Question 16

"Less optimal" for what task? "Best" for what task?

Question 17

This is the less optimal method.

RBerteig · Accepted Answer · 2009-05-07 00:31:47Z
