
Re: Most succinct way to parse an HTTP header string



It was thus said that the Great Luiz Henrique de Figueiredo once stated:
> > I want to parse a HTTP header string "name:value" pair. In REXX this is 
> 
> This should work just fine:
> 
> local name,value=string.match(line,"(.-):%s*(.-)$")
 Actually, that may fail. According to RFC 2616, section 4.2:
	Header fields can be extended over multiple lines by preceding each
	extra line with at least one SP or HT. 
 So this is a valid header:
User-Agent: The Wizbang Frobulator 1.2p333
	(this is Microsoft Windows compatible. No, really!)
	(It also conforms to the Gecko layout engine)
	(and WebKit)
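The folding rule quoted above can also be handled without LPeg. Below is a minimal plain-Lua sketch, assuming the whole header block arrives as one string. It first unfolds continuation lines by replacing a line break followed by SP or HT with a single space. It then applies the `string.match` call from the quoted reply to each logical line. `parse_headers` is a hypothetical name. Unlike the LPeg grammar below, this sketch keeps only the last value of a repeated header, and it does not collapse other runs of whitespace inside a value.

```lua
-- Hypothetical helper (not from the post): unfold RFC 2616 continuation
-- lines, then split each logical line into name and value.
local function parse_headers(text)
  -- A line break followed by at least one SP or HT continues the previous
  -- header; replace the break and the leading blanks with one space.
  -- (gsub also returns a count; only the first result is kept here.)
  local unfolded = text:gsub("\r?\n[ \t]+", " ")

  local result = {}
  -- Visit each logical line; appending "\n" ensures the last one is terminated.
  for line in (unfolded .. "\n"):gmatch("([^\r\n]*)\r?\n") do
    -- Same pattern as the quoted one-liner; blank lines yield nil.
    local name, value = string.match(line, "(.-):%s*(.-)$")
    if name then
      result[name] = value -- a repeated header overwrites the earlier value
    end
  end
  return result
end

local h = parse_headers(
  "Host: www.example.net\r\n" ..
  "User-Agent: The Wizbang Frobulator 1.2p333\r\n" ..
  "\t(and WebKit)\r\n")
print(h["Host"])       --> www.example.net
print(h["User-Agent"]) --> The Wizbang Frobulator 1.2p333 (and WebKit)
```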
Here's the code I use to parse headers [1]:
local lpeg = require "lpeg"
local P = lpeg.P
local S = lpeg.S
local C = lpeg.C
local Cf = lpeg.Cf
local Ct = lpeg.Ct
local Cg = lpeg.Cg
-- -------------------------------------------------------
-- This function will collapse repeated headers into a table,
-- but otherwise, the value will be a string
-- --------------------------------------------------------
local function doset(t,i,v)
  if t[i] == nil then
    t[i] = v
  elseif type(t[i]) == 'table' then
    t[i][#t[i]+1] = v
  else
    t[i] = { t[i] , v }
  end
  return t
end
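-- line ending: optional CR followed by LF
local crlf = P"\r"^-1 * P"\n"
-- linear whitespace: SP or HT
local lwsp = S" \t"
-- end of header: a line ending followed by the blank line that ends the
-- block, or a line ending NOT followed by SP/HT (i.e., no continuation)
local eoh = (crlf * #crlf) + (crlf - (crlf^-1 * lwsp))
-- optional whitespace, possibly spanning folded lines
local lws = (crlf^-1 * lwsp)^0
-- header value: everything up to the end of the header, with runs of
-- whitespace and control characters (including folds) collapsed to one
-- space; the extra parentheses drop the match count gsub() also returns
local value = C((P(1) - eoh)^0) / function(v)
  return (v:gsub("[%s%c]+"," "))
end
local name = C((P(1) - (P":" + crlf + lwsp))^1)
local header = Cg(name * ":" * lws * value * eoh)
-- fold every header into one table; the trailing crlf is the blank line
local headers = Cf(Ct("") * header^1,doset) * crlf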
Given the following headers (terminated, as in HTTP, by a blank line):
Host: www.example.net
User-Agent: The Wizbang Frobulator 1.2p333
	(this is Microsoft Windows compatible. No, really!)
	(It also conforms to the Gecko layout engine)
	(and WebKit)
Accept: text/html;q=.9, 
	text/plain;q=.5,
	text/*;q=0
Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
"headers:match(text)" will return a table:
{
  ['Host'] = "www.example.net",
  ['User-Agent'] = "The Wizbang Frobulator 1.2p333 (this is Microsoft Windows compatible. No, really!) (It also conforms to the Gecko layout engine) (and WebKit)",
  ['Accept'] = "text/html;q=.9, text/plain;q=.5, text/*;q=0",
  ['Accept-Charset'] = "iso-8859-5, unicode-1-1;q=0.8"
}
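The sample above has no repeated header, so the table branch of `doset` never fires. According to the LPeg documentation, `Cf(patt, f)` produces f(...f(f(C1, C2), C3)..., Cn) from the captures of `patt`. In this grammar C1 is the empty table from `Ct("")`, and each later capture is one header's name/value group. The plain-Lua sketch below replays that fold by hand. It uses a copy of `doset` and a made-up list of captured pairs in which `Accept` repeats, to show how a repeated header turns into a table.

```lua
-- Copy of doset from the post: the first occurrence of a header is stored
-- as a string; a repeat turns the value into a table of all values seen.
local function doset(t, i, v)
  if t[i] == nil then
    t[i] = v
  elseif type(t[i]) == 'table' then
    t[i][#t[i] + 1] = v
  else
    t[i] = { t[i], v }
  end
  return t
end

-- Hypothetical (name, value) captures, in the order the header pattern
-- would produce them for a block where Accept appears three times.
local captures = {
  { "Host",   "www.example.net" },
  { "Accept", "text/html" },
  { "Accept", "text/plain" },
  { "Accept", "text/*" },
}

-- Replay the fold Cf performs: start from the empty table (the Ct("")
-- capture) and thread it through doset once per header.
local acc = {}
for _, c in ipairs(captures) do
  acc = doset(acc, c[1], c[2])
end

print(acc["Host"])                          --> www.example.net
print(table.concat(acc["Accept"], ", "))    --> text/html, text/plain, text/*
```

Code that consumes the result has to handle both shapes, because a header's value is a string when it appears once and a table when it repeats.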
 -spc (man, that real world---it's sooooo messy)
[1]	If I'm parsing email, I'll use:
	https://github.com/spc476/LPeg-Parsers/blob/master/email.lua
