\$\begingroup\$

Introduction

Valve recently launched the Dota 2 Workshop Tools, which allow players to create their own maps with custom game modes, with capabilities similar to Warcraft 3's.

Ability and unit definitions are stored in txt files in a specific format that closely resembles JSON:

{
 "key" "value"
 "otherKey" { //C-style comments
 "moreKey" "moreValue"
 } //no block comments!
}

It's essentially JSON with only strings, no punctuation (no colons or commas), and C-style line comments.
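
To make the mapping concrete, the sample above would correspond to roughly the following Lua table. This is only an illustration of the intended parse result; the name `expected` is hypothetical and does not appear in the original file.

```lua
-- Hypothetical illustration: the Lua table the sample above is expected to parse to.
-- Every value is either a string or a nested table; comments are discarded.
local expected = {
    key = "value",
    otherKey = {
        moreKey = "moreValue",
    },
}

print(expected.key)
print(expected.otherKey.moreKey)
```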

I've been writing a serializer/deserializer for these files in Lua, so that fellow map creators can use it to save arbitrary data to files. Since I'm new to Lua, the result feels a bit clunky and not very readable, which is why I've joined this site. I'd like to know which parts of the code can be improved, either with convenience functions or with changes to the logic. Style changes are welcome too, since I'm completely unaware of Lua's coding standards.

The code

Everything is contained in a 150-line Lua file, available at GitHub. Below is the mentioned file. I've wrapped everything in the KV == nil condition so that I can safely use local variables and functions, essentially creating a closure.

if KV == nil then
    local function contains(table, element)
        for _, value in pairs(table) do
            if value == element then
                return true
            end
        end
        return false
    end
    local chars = {
        indent = "\t",
        escape = "\\",
        delim = '"',
        commentStart = "/",
        commentEnd = {"\n", "\r"},
        blockStart = "{",
        blockEnd = "}"
    }
    -- iterate_chars doc
    -- receives string to operate and a iterator func
    -- iterator is called for each character in the string (unless skip)
    -- iterator is called with current, previous and next chars, and also
    -- the remaining uniterated string
    -- iterator can return false to abort iteration, or return a number
    -- if iterator returns a number N, skips the next N chars.
    -- otherwise proceed normally
    function iterate_chars(str, iterator)
        local remaining = ""
        local jump = 0
        for i = 1, #str do
            if i > jump then -- skips characters when under the jump
                local prev = str:sub(i-1,i-1)
                local peek = str:sub(i+1,i+1)
                local char = str:sub(i,i)
                local remain = str:sub(i, #str) -- everything still unprocessed, including the current char
                local iterresult = iterator(char, prev, peek, remain, i) -- calls the iterating function
                if type(iterresult) == "number" then
                    jump = i + iterresult -- if the function returns a number, skip that many characters
                elseif iterresult == false then
                    return remain -- if the function returns false, stop iteration and return the remaining
                end
            end
        end
    end
    function parse_string(str)
        local stringmode = false
        local result = ""
        local remain = iterate_chars(str, function(char, prev, peek)
            if char == chars.escape then -- separate check so escape backslashes don't get processed
                if peek == chars.delim then
                    result = result .. peek -- escape for \", allowing quotes inside strings
                end
            elseif char == chars.delim then
                if prev ~= chars.escape then
                    -- if it's a quote that is not escaped...
                    if stringmode then
                        return false -- stops iteration when the closing quote is found
                    else
                        stringmode = true
                    end
                end
            elseif stringmode then
                result = result .. char
            end
        end)
        -- result, remain, processed length
        return {result, remain, #result+2}
    end
    local function parse_block(str)
        -- parses a string, inserts the content into tbl and returns processed length
        local function parse_add_string(tbl, toparse)
            local data = parse_string(toparse)
            table.insert(tbl, data[1])
            return data[3]
        end
        local blockmode = false
        local commentmode = false
        local isKey = true
        local result = {}
        local keyslist = {}
        local valuelist = {}
        local remain = iterate_chars(str, function(char, prev, peek, remain)
            if char == chars.commentStart and peek == chars.commentStart then -- start comment
                commentmode = true
            elseif commentmode and contains(chars.commentEnd, char) then -- end comment
                commentmode = false
            elseif commentmode then
                -- just ignore...
            elseif not blockmode and char == chars.blockStart then -- block start detected
                blockmode = true
            elseif not char:match("%s") and char ~= chars.indent then -- non space char detected
                if blockmode then
                    if char == chars.blockEnd then -- end of block detected
                        blockmode = false
                        return false
                    elseif isKey then -- when on key mode, parse strings only
                        isKey = false
                        return parse_add_string(keyslist, remain)
                    else
                        isKey = true
                        if char == chars.blockStart then -- values can also be blocks
                            local data = parse_block(remain)
                            table.insert(valuelist, data[1])
                            return data[3]
                        else -- in addition to strings
                            return parse_add_string(valuelist, remain)
                        end
                    end
                end
            end
        end)
        -- joins the keys and values into one table
        for n, key in pairs(keyslist) do
            result[key] = valuelist[n]
        end
        -- result, remain, processed length
        return {result, remain, #str-#remain}
    end
    local function serialize_block(tbl, indent)
        indent = indent or 0
        print(indent)
        local indentation = indent and indent ~= 0 and chars.indent:rep(indent) or "" -- for first level
        local result = chars.blockStart
        for k,v in pairs(tbl) do
            result = table.concat({result, '\n', indentation, chars.indent, chars.delim, k, chars.delim, " "}, "") -- Joins whitespace and key to the result
            if type(v) == "table" then -- Joins the value onto the result
                result = result .. serialize_block(v, indent + 1)
            else
                result = table.concat({result, chars.delim, tostring(v), chars.delim}, "")
            end
        end
        return table.concat({result, "\n", indentation, chars.blockEnd}, "") -- finally joins the closing character to the result
    end
    KV = {}
    function KV:Parse(str)
        return parse_block(str)[1] -- returns the parsed content
    end
    function KV:Dump(tbl)
        return serialize_block(tbl) -- returns the stringified table
    end
end
asked Aug 27, 2014 at 1:19
\$\endgroup\$

1 Answer

\$\begingroup\$

Essentially, the data format is closer to Lua table syntax than to JSON. It is far easier to rewrite the text (or a file, which can be loaded into a string variable) into a Lua table constructor and let Lua load it as a table. Here's a little code I wrote (and tried).

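-- Strips a trailing // comment from a line, unless the // sits inside a quoted string.
local function comment(w)
    local r, m = w, w:match '(.+)//'   -- m: everything before the last //
    local t, _ = m:gsub( '"', '' )     -- _: number of quotes before that //
    -- an odd quote count means the // is inside a string, so keep the whole line
    if _ % 2 == 1 then return r else return m end
end
-- Rewrites the KV text line by line into Lua table syntax, then loads it.
local function parser(x)
    local r = x:gsub( "(.-)[\n\r]+",
        function(w)
            if w:match '//' then w = comment(w) end
            w = w:gsub( '("[^"]+") {', '[%1] = {' )            -- "key" {        ->  ["key"] = {
            w = w:gsub( '("[^"]+") ("[^"]+")', '[%1] = %2,' )  -- "key" "value"  ->  ["key"] = "value",
            return w..'\n'
        end
    )
    -- loadstring is the Lua 5.1 name; newer Lua versions use load instead
    return loadstring( "return "..r:gsub('}', '},'):gsub(',$', '') )()
end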

Now, to execute (or test) it, pass your string to the function parser, which returns the key-value pairs as a table. For testing, I used the following data string:

{
 "key" "value"
 "otherKey" { //C-style "comments" "to break?"
 "moreKey" "moreValue"
 {"let's see" "if // breaks" } //should it?}
 } //no block comments!
 "let's see" "if // breaks"
}

You can use any pretty printer to print the returned Lua table back out. Do comment if you don't understand any of the steps in my program.
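
For the printing step, a minimal sketch of such a pretty printer could look like the following. The names `write_block` and `dump_kv` are hypothetical helpers, not part of the code above, and the sketch assumes that keys and values contain no double quotes or backslashes that would need escaping.

```lua
-- Hypothetical helper: appends the KV text for tbl to the buffer `out`.
-- Assumes keys and leaf values need no escaping.
local function write_block(tbl, depth, out)
    local pad = string.rep("\t", depth)

    -- pairs() has no defined order, so sort the keys for deterministic output.
    local keys = {}
    for k in pairs(tbl) do
        keys[#keys + 1] = k
    end
    table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)

    out[#out + 1] = "{\n"
    for _, k in ipairs(keys) do
        local v = tbl[k]
        out[#out + 1] = pad .. '\t"' .. tostring(k) .. '" '
        if type(v) == "table" then
            write_block(v, depth + 1, out)   -- nested block
        else
            out[#out + 1] = '"' .. tostring(v) .. '"'
        end
        out[#out + 1] = "\n"
    end
    out[#out + 1] = pad .. "}"
end

-- Hypothetical entry point: returns the table as KV-formatted text.
local function dump_kv(tbl)
    local out = {}
    write_block(tbl, 0, out)
    return table.concat(out)   -- join all pieces once at the end
end

print(dump_kv({ key = "value", otherKey = { moreKey = "moreValue" } }))
```

The pieces are collected in a table and joined once with table.concat, which avoids building a new string on every loop iteration.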

answered Aug 27, 2014 at 2:58
\$\endgroup\$
  • 1
    \$\begingroup\$ That's awesome, would you mind explaining some points? 1. I don't really understand the comment function: is it returning the whole string if the number of quotes is even, and just what is before the // if it's not? (What happens with an escaped quote \"?) 2. I imagine that (.-) is the equivalent of the regex (.*?)? 3. What are the [%1]s (specifically, the square brackets)? I also forgot to mention that any number of whitespace characters is allowed between keys and values; how could that be handled with this solution? 4. Finally, are those patterns going to degrade performance? ... \$\endgroup\$ Commented Aug 27, 2014 at 9:50
  • \$\begingroup\$ (It's for a game, so performance is a very good thing to have. I suppose the scan methods are linear? If so, it would take about (n^2)*3n+2m (n = string length, m = result length), no? I know Lua isn't really performant either, but still...) \$\endgroup\$ Commented Aug 27, 2014 at 9:54
  • 1
    \$\begingroup\$ @Kroltan Yes. 1. The comment function would fail for an escaped quote; that is a corner case I didn't think of. 2. Yes. 3. In a Lua table constructor, keys written as quoted strings must be enclosed in square brackets, which is what I'm doing here. %1 and %2 are the matched groups. For a varying number of spaces, you can use ` *` or ` +` as per your needs. \$\endgroup\$ Commented Aug 28, 2014 at 6:22
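
The pattern features discussed in these comments can be tried in isolation. The snippet below is only a sketch with made-up sample strings; it uses `%s+` (any run of whitespace) rather than a literal space, which is one possible way to allow arbitrary whitespace between a key and its value.

```lua
-- 1. "(.-)" is lazy and stops at the first possible match,
--    while "(.+)" is greedy and runs to the last one.
local line = 'a // b // c'
local lazy = line:match('(.-)//')
local greedy = line:match('(.+)//')
print('[' .. lazy .. ']', '[' .. greedy .. ']')

-- 2. %1 and %2 in the replacement refer to the captured groups.
--    "%s+" accepts spaces, tabs, or any mix of them between key and value.
local pair = '"key" \t  "value"'
local rewritten = pair:gsub('("[^"]+")%s+("[^"]+")', '[%1] = %2,')
print(rewritten)

-- 3. Square brackets are what make a quoted string usable as a key in a
--    table constructor, including keys that are not valid identifiers.
local t = { ["let's see"] = "if // breaks" }
print(t["let's see"])
```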
