
Re: Stripping HTML tags



On Aug 15, 2005, at 21:44, Florian Berger wrote:
I thought that stripping HTML tags was easy until I saw something like this:
<a href="http://www.example.com"; alt="> example"> example </a>
Argh! Nasty, nasty HTML! 8^)
Perhaps you could try LUXMLInputStream:
http://dev.alt.textdrive.com/file/lu/LUXMLInputStream.lua
Usage example:
local aContent = "<a href=\"http://www.example.com\"; alt=\"> example\"> example </a>"
local anInputStream = LUXMLInputStream( aContent )
for aType, aText, aName, someAttributes in anInputStream:iterator() do
    if aType == LUXMLInputStream.Text then
        print( aType, aText )
    elseif someAttributes ~= nil then
        print( aType, aName, someAttributes )
    else
        print( aType, aName )
    end
end
> 1 a { href = http://www.example.com }
> 3 example"> example
> 2 a
Or if you simply want the textual part:
for aText in anInputStream:iterator( true ) do
    print( aText )
end
>	example"> example
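Note that the text printed above still carries the example"> fragment from the alt attribute. If that matters, here is a minimal sketch in plain Lua (5.1 or later) of a quote-aware scan. It is only an illustration: stripTags is a hypothetical helper, not part of LUXMLInputStream, and it assumes that a '>' inside a single- or double-quoted attribute value should not end the tag. It does not handle comments, CDATA, scripts or entities.

```lua
-- Hypothetical helper, not part of LUXMLInputStream.
-- Drops everything between '<' and the matching '>', treating any '>'
-- inside a quoted attribute value as part of the tag.
local function stripTags( aString )
    local someParts = {}
    local inTag = false
    local aQuote = nil -- quote character currently open inside a tag, if any

    for anIndex = 1, #aString do
        local aChar = aString:sub( anIndex, anIndex )

        if inTag then
            if aQuote then
                -- inside a quoted value: only the same quote closes it
                if aChar == aQuote then aQuote = nil end
            elseif aChar == '"' or aChar == "'" then
                aQuote = aChar
            elseif aChar == ">" then
                inTag = false
            end
        elseif aChar == "<" then
            inTag = true
        else
            someParts[ #someParts + 1 ] = aChar
        end
    end

    return table.concat( someParts )
end

local aContent = "<a href=\"http://www.example.com\"; alt=\"> example\"> example </a>"

-- Naive pattern: stops at the first '>', which sits inside the alt value.
print( ( aContent:gsub( "<[^>]*>", "" ) ) )

-- Quote-aware scan: only the text between the tags is kept.
print( stripTags( aContent ) )
```

The naive gsub leaves example"> example behind, while the scan keeps only the word example with its surrounding spaces. For anything beyond a quick cleanup, a real parser is still the safer choice.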
Cheers
--
PA, Onnay Equitursay
http://alt.textdrive.com/
