Introduction
Valve recently launched the Dota 2 Workshop Tools, which allow players to create their own maps with custom game modes, similar in capability to Warcraft 3's map editor.
Ability and unit definitions are stored in .txt files in a specific format, very similar to JSON:
{
"key" "value"
"otherKey" { //C-style comments
"moreKey" "moreValue"
} //no block comments!
}
It's essentially JSON with string values only, no punctuation between entries (no colons or commas), and with line comments.
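For illustration, the sample above would presumably map to the Lua table below once parsed. This is only a sketch of the intended shape: the variable name `expected` is made up for this example, and it assumes nested blocks become nested tables while every leaf value stays a string.

```lua
-- Hypothetical parse result for the sample above (illustration only):
-- nested blocks become nested tables, and every leaf value is a string.
local expected = {
    key = "value",
    otherKey = {
        moreKey = "moreValue",
    },
}

print(expected.key)               -- value
print(expected.otherKey.moreKey)  -- moreValue
```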
I've been taking my time writing a serializer/deserializer for such files in Lua, so that fellow map creators can use it to save arbitrary data to files. It feels a bit clunky and not very readable, since I'm new to Lua, which is why I've joined this site. I'd like to know which parts of the code can be improved, either through convenience functions or through plain logic changes. Even style changes are welcome, since I'm completely unaware of Lua's coding standards.
The code
Everything is contained in a 150-line Lua file, available on GitHub. Below is the mentioned file. I've wrapped everything in the KV == nil condition so that I can safely have local variables and functions, essentially creating a closure.
if KV == nil then
    local function contains(table, element)
        for _, value in pairs(table) do
            if value == element then
                return true
            end
        end
        return false
    end
    local chars = {
        indent = "\t",
        escape = "\\",
        delim = '"',
        commentStart = "/",
        commentEnd = {"\n", "\r"},
        blockStart = "{",
        blockEnd = "}"
    }
    -- iterate_chars doc
    -- receives string to operate and a iterator func
    -- iterator is called for each character in the string (unless skip)
    -- iterator is called with current, previous and next chars, and also
    -- the remaining uniterated string
    -- iterator can return false to abort iteration, or return a number
    -- if iterator returns a number N, skips the next N chars.
    -- otherwise proceed normally
    function iterate_chars(str, iterator)
        local remaining = ""
        local jump = 0
        for i = 1, #str do
            if i > jump then -- skips characters when under the jump
                local prev = str:sub(i-1,i-1)
                local peek = str:sub(i+1,i+1)
                local char = str:sub(i,i)
                local remain = str:sub(i, #str) -- everything still unprocessed, including the current char
                local iterresult = iterator(char, prev, peek, remain, i) -- calls the iterating function
                if type(iterresult) == "number" then
                    jump = i + iterresult -- if the function returns a number, skip that many characters
                elseif iterresult == false then
                    return remain -- if the function returns false, stop iteration and return the remaining
                end
            end
        end
    end
    function parse_string(str)
        local stringmode = false
        local result = ""
        local remain = iterate_chars(str, function(char, prev, peek)
            if char == chars.escape then -- separate check so escape backslashes don't get processed
                if peek == chars.delim then
                    result = result .. peek -- escape for \", allowing quotes inside strings
                end
            elseif char == chars.delim then
                if prev ~= chars.escape then
                    -- if it's a quote that is not escaped...
                    if stringmode then
                        return false -- stops iteration when the closing quote is found
                    else
                        stringmode = true
                    end
                end
            elseif stringmode then
                result = result .. char
            end
        end)
        -- result, remain, processed length
        return {result, remain, #result+2}
    end
    local function parse_block(str)
        -- parses a string, inserts the content into tbl and returns processed length
        local function parse_add_string(tbl, toparse)
            local data = parse_string(toparse)
            table.insert(tbl, data[1])
            return data[3]
        end
        local blockmode = false
        local commentmode = false
        local isKey = true
        local result = {}
        local keyslist = {}
        local valuelist = {}
        local remain = iterate_chars(str, function(char, prev, peek, remain)
            if char == chars.commentStart and peek == chars.commentStart then -- start comment
                commentmode = true
            elseif commentmode and contains(chars.commentEnd, char) then -- end comment
                commentmode = false
            elseif commentmode then
                -- just ignore...
            elseif not blockmode and char == chars.blockStart then -- block start detected
                blockmode = true
            elseif not char:match("%s") and char ~= chars.indent then -- non space char detected
                if blockmode then
                    if char == chars.blockEnd then -- end of block detected
                        blockmode = false
                        return false
                    elseif isKey then -- when on key mode, parse strings only
                        isKey = false
                        return parse_add_string(keyslist, remain)
                    else
                        isKey = true
                        if char == chars.blockStart then -- values can also be blocks
                            local data = parse_block(remain)
                            table.insert(valuelist, data[1])
                            return data[3]
                        else -- in addition to strings
                            return parse_add_string(valuelist, remain)
                        end
                    end
                end
            end
        end)
        -- joins the keys and values into one table
        for n, key in pairs(keyslist) do
            result[key] = valuelist[n]
        end
        -- result, remain, processed length
        return {result, remain, #str-#remain}
    end
    local function serialize_block(tbl, indent)
        indent = indent or 0
        print(indent)
        local indentation = indent and indent ~= 0 and chars.indent:rep(indent) or "" -- for first level
        local result = chars.blockStart
        for k,v in pairs(tbl) do
            result = table.concat({result, '\n', indentation, chars.indent, chars.delim, k, chars.delim, " "}, "") -- Joins whitespace and key to the result
            if type(v) == "table" then -- Joins the value onto the result
                result = result .. serialize_block(v, indent + 1)
            else
                result = table.concat({result, chars.delim, tostring(v), chars.delim}, "")
            end
        end
        return table.concat({result, "\n", indentation, chars.blockEnd}, "") -- finally joins the closing character to the result
    end
    KV = {}
    function KV:Parse(str)
        return parse_block(str)[1] -- returns the parsed content
    end
    function KV:Dump(tbl)
        return serialize_block(tbl) -- returns the stringified table
    end
end
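As a side note on the wrapper described earlier, the scoping it relies on can be sketched in isolation. The snippet below is a minimal illustration, not part of the posted file: `Example`, `shout` and `Greet` are hypothetical names, and a plain `do ... end` block is used in place of the `if KV == nil then` guard. Only names declared with `local` stay private; in the listing above, `iterate_chars` and `parse_string` are declared without `local`, so they would presumably end up as globals.

```lua
-- `Example` is a hypothetical stand-in for the KV table.
Example = Example or {}

do
    -- Declared with `local`, so it is visible only inside this block.
    local function shout(s)
        return s:upper() .. "!"
    end

    -- Captures `shout` as an upvalue, so it keeps working after the block ends.
    function Example.Greet(name)
        return shout("hello " .. name)
    end
end

print(Example.Greet("dota"))  -- HELLO DOTA!
print(shout)                  -- nil: the helper did not leak into the globals
```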
1 Answer
Essentially, the data format is closer to Lua's table syntax than to JSON. It is far easier to rewrite the text (or a file, which can be loaded into a string variable) into table syntax and let Lua load it as a table. Here's the little code I wrote (and tried).
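-- Strips a trailing // comment from a line, unless the // sits inside a quoted string.
local function comment(w)
    local r, m = w, w:match '(.+)//'  -- r: whole line, m: text before the last //
    local t, _ = m:gsub( '"', '' )    -- _ receives the number of quotes found in m
    -- an odd number of quotes means the // is inside a string, so keep the whole line
    if _ % 2 == 1 then return r else return m end
end
-- Rewrites the text line by line into Lua table syntax, then evaluates it.
local function parser(x)
    local r = x:gsub( "(.-)[\n\r]+",
        function(w)
            if w:match '//' then w = comment(w) end
            w = w:gsub( '("[^"]+") {', '[%1] = {' )            -- "key" {       becomes ["key"] = {
            w = w:gsub( '("[^"]+") ("[^"]+")', '[%1] = %2,' )  -- "key" "value" becomes ["key"] = "value",
            return w..'\n'
        end
    )
    -- add a comma after every }, strip a comma at the very end of the text, then load it
    return loadstring( "return "..r:gsub('}', '},'):gsub(',$', '') )()
end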
Now, to execute (or test), you pass your string to the function parser, which will return the key-value pairs as a table. For my testing purposes, I used the following data string:
{
"key" "value"
"otherKey" { //C-style "comments" "to break?"
"moreKey" "moreValue"
{"let's see" "if // breaks" } //should it?}
} //no block comments!
"let's see" "if // breaks"
}
You can use any pretty printer to print the returned Lua table back. Do comment if you don't understand any of the steps in my program.
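To make the rewriting step easier to follow, here is a small sketch that applies the same two substitutions to individual lines and then loads the result. The helper names `rewrite` and `load_chunk` are introduced only for this illustration; the sketch also assumes that `loadstring` is available (Lua 5.1 or LuaJIT) and falls back to `load` on newer Lua versions.

```lua
-- loadstring exists in Lua 5.1/LuaJIT; Lua 5.2+ loads strings with load.
local load_chunk = loadstring or load

-- Same two substitutions as in parser, applied to a single line.
local function rewrite(line)
    line = line:gsub('("[^"]+") {', '[%1] = {')
    line = line:gsub('("[^"]+") ("[^"]+")', '[%1] = %2,')
    return line
end

print(rewrite('"key" "value"'))  -- ["key"] = "value",
print(rewrite('"otherKey" {'))   -- ["otherKey"] = {

-- Assemble a tiny document by hand and evaluate it as a Lua table constructor.
local source = table.concat({
    "{",
    rewrite('"key" "value"'),
    rewrite('"otherKey" {'),
    rewrite('"moreKey" "moreValue"'),
    "},",
    "}",
}, "\n")

local tbl = load_chunk("return " .. source)()
print(tbl.key, tbl.otherKey.moreKey)
```

Because the rewritten text is executed as Lua code, this approach is only reasonable for files you trust.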
That's awesome, would you mind explaining some points? 1. I don't really understand the `comment` function: it's returning the whole string if the number of quotes is even, and just what is before the `//` if it's not? (What happens with an escaped quote `\"`?) 2. I imagine that `(.-)` is the equivalent of the regex `(.*?)`? 3. What are the `[%1]`s (specifically, the square brackets)? I also forgot to mention that any number of whitespace characters is allowed between keys and values (how could that be done with this solution?). 4. Finally, are those patterns going to degrade performance? ...
– Kroltan, Aug 27, 2014 at 9:50

(It's for a game. Performance is a very good thing to have. I suppose the scan methods are linear? If so, it would take about `(n^2)*3n+2m` (n = string length, m = result length), no? I know Lua isn't really performant either, but still...)
– Kroltan, Aug 27, 2014 at 9:54

@Kroltan Yes. 1. The `comment` function would fail for an escaped quote, but that was a corner case and I didn't think of that. 2. Yes. 3. In a lua-table, the keys should be enclosed in square brackets if they are strings, which is what I'm doing here. `%1` and `%2` are the matched groups. For varying number of spaces, you can use ` *` or ` +` as per your needs.
– hjpotter92, Aug 28, 2014 at 6:22
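Two of the points from this exchange can be checked quickly in isolation. The sketch below is illustrative only: `rewrite_pair` is a hypothetical helper, and it uses the `%s*` character class (a variation on the suggested ` *`) so that tabs are accepted as well. Note that `%s` also matches newlines, so this assumes the input has already been split into lines.

```lua
-- (.-) is the lazy counterpart of (.*): it stops at the first possible match.
print(("a//b//c"):match("(.-)//"))  -- a
print(("a//b//c"):match("(.*)//"))  -- a//b

-- %s* accepts any run of whitespace (including none) between key and value.
-- The outer parentheses drop gsub's second return value (the match count).
local function rewrite_pair(line)
    return (line:gsub('("[^"]+")%s*("[^"]+")', '[%1] = %2,'))
end

print(rewrite_pair('"key"\t \t"value"'))  -- ["key"] = "value",
print(rewrite_pair('"key""value"'))       -- ["key"] = "value",
```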