[uf-discuss] "Must Ignore vs. Microformats"

Wed Jul 19 19:05:49 PDT 2006

On Jul 19, 2006, at 10:55 AM, Tantek Çelik wrote:
> On 7/19/06 10:34 AM, "Charles Iliya Krempeaux"
> <supercanadian at gmail.com> wrote:
>> One "good" thing about XML, IMO, is that for certain simple markups
>> based on XML, it's easier for a beginner-level or intermediate-level
>> developer to write a parser for it (as compared to writing a parser
>> for Microformats... since HTML is more difficult to parse).
>> (For example, writing a parser in C, C++, PHP, Java, C# or whatever.)
>> [...]
> This is why the supposed "easier to parse" aspect of XML is incredibly
> misleading. It ignores both the need to be easier to publish, and the
> fact that XML, in fact, is *harder* to publish.

Also, the Babel aspect of XML means that you always need to write a
parser anyway: if not for the XML itself, then to transform the
plucked-from-the-air schema, with its arbitrary choices of what is an
attribute and what is an element, into the data structure you are using.
A key part of Microformats is converging on shared schemas, so this
becomes much less necessary.
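
To make that concrete, here is a minimal sketch in Python. Both
vocabularies below are made up for illustration (the element and
attribute names are assumptions, not any real schema). The XML parser
handles the syntax for free, but each vocabulary still needs its own
hand-written mapping to reach the same data structure:

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML vocabularies describing the same contact.
# Schema A puts the data in attributes; schema B puts it in child elements.
DOC_A = '<person name="Ada Lovelace" email="ada@example.org"/>'
DOC_B = (
    "<contact>"
    "<fullName>Ada Lovelace</fullName>"
    "<mail>ada@example.org</mail>"
    "</contact>"
)


def from_schema_a(xml_text):
    """Map the attribute-based vocabulary onto a plain dict."""
    root = ET.fromstring(xml_text)
    return {"name": root.get("name"), "email": root.get("email")}


def from_schema_b(xml_text):
    """Map the element-based vocabulary onto the same dict shape."""
    root = ET.fromstring(xml_text)
    return {"name": root.findtext("fullName"), "email": root.findtext("mail")}


# Same information, two different pieces of glue code to extract it.
card_a = from_schema_a(DOC_A)
card_b = from_schema_b(DOC_B)
```

With one agreed vocabulary, a single mapping function would cover every
publisher, which is the point of converging the schemas.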

>> One example of such a simple format based on XML is RSS.
>
> You're kidding right?
>
> It is certainly *not* pretty easy for someone to write a parser for
> RSS that actually works with real RSS on the Web.

Have a look at the Universal Feed Parser's 3000 test cases...
http://feedparser.org