[uf-dev] python microformats parser
 Phil Dawes 
 phil at phildawes.net
 
 Mon Nov 14 13:26:41 PST 2005
 
 
 
Hi Dev list,
I've put together a simple python microformat parser for use in python 
projects (including my own structured-data-aggregator thingy JAM*VAT[1]).
Download from here:
http://phildawes.net/microformats/microformatparser.html
Unpack and type:
 > python microformatparser.py http://tantek.com/log/2005/10.html
and it prints:
vevent
 dtstart : 2005-11-11
 dtend : 2005-11-14
 summary : Hackers conference
vevent
 dtstart : 2006-02-27
 dtend : 2006-03-04
 summary : W3C Technical Plenary
 location : Sofitel, Mandelieu, France
[..etc..]
vcard
 url : http://tantek.com/
 logo : http://tantek.com/icon80px.jpg
 fn : Tantek Çelik
My original aim was to produce a microformat parser that would have a 
good go at parsing any semantic xhtml (given a starting point - e.g. 
class="vcard").
I actually got quite far with this (the jamvat demo installation[1] 
currently runs this parser). The main problem was deciding when a 
property is a parent of other properties -
e.g. the structure:
 <address class="vcard" id="hcard">
   <a class="url fn" href="http://tantek.com/">
     <img src="/icon80px.jpg" class="logo" alt="">
     Tantek Çelik
   </a>
 </address>
..could easily be interpreted by a naive parser as having the structure:
 vcard {
   url : http://tantek.com
   fn : {
     logo: http://tantek.com/icon80px.jpg
   }
 }
instead of the more correct:
 vcard {
   url : http://tantek.com
   fn : Tantek Çelik
   logo: http://tantek.com/icon80px.jpg
 }
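A minimal sketch of how the naive reading above can arise, assuming a
parser that simply nests every classed element under its nearest classed
ancestor. This is not the code from microformatparser.py: naive_parse,
URL_PROPS and BASE are names made up for the illustration, the snippet
has been made well-formed so that xml.etree can read it, and the base
URL is assumed to be the page parsed earlier.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Well-formed copy of the hCard above (the <img> is self-closed so that
# an XML parser accepts it).
SNIPPET = """<address class="vcard" id="hcard">
  <a class="url fn" href="http://tantek.com/">
    <img src="/icon80px.jpg" class="logo" alt="" />
    Tantek Çelik
  </a>
</address>"""

BASE = "http://tantek.com/log/2005/10.html"  # assumed base URL of the page
URL_PROPS = {"url", "logo", "photo"}         # assumption: read from href/src


def naive_parse(elem):
    """Nest every classed descendant under its nearest classed ancestor."""
    props = {}
    for child in elem:
        names = child.get("class", "").split()
        nested = naive_parse(child)
        if not names:
            props.update(nested)  # unclassed wrapper: pass properties up
            continue
        for name in names:
            if name in URL_PROPS:
                link = child.get("href") or child.get("src") or ""
                props[name] = urljoin(BASE, link)
            elif nested:
                props[name] = nested  # the mistake: fn swallows logo
            else:
                props[name] = " ".join("".join(child.itertext()).split())
    return props


naive_result = naive_parse(ET.fromstring(SNIPPET))
print(naive_result)
```

Because the img sits inside the a element, fn ends up holding logo and
the name text is lost, which is the first of the two structures above.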
Having bashed my head against this a bit, I decided to start from the 
other direction: using a hardcoded schema, and then adding genericity 
where possible. So this is a first stab at the latter approach - it's 
driven by a simple data structure which tells it which properties to look 
out for, and also which ones can be 'parents' of other properties. 
Here's the structure in v0.1:
-----------------
# First list: the properties to look out for.
# Second list: the properties that can be 'parents' of other properties.
vcardprops = MicroformatSchema(
    ['fn', 'family-name', 'given-name', 'additional-name',
     'honorific-prefix', 'honorific-suffix', 'nickname', 'sort-string',
     'url', 'email', 'type', 'tel', 'post-office-box', 'extended-address',
     'street-address', 'locality', 'region', 'postal-code', 'country-name',
     'label', 'latitude', 'longitude', 'tz', 'photo', 'logo', 'sound',
     'bday', 'title', 'role', 'organization-name', 'organization-unit',
     'category', 'note', 'class', 'key', 'mailer', 'uid', 'rev'],
    ['n', 'email', 'adr', 'geo', 'org', 'tel'])
veventprops = MicroformatSchema(
    ["summary", "url", "dtstart", "dtend", "location"], [])
SCHEMAS = {'vcard': vcardprops, 'vevent': veventprops}
-----------------
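As a rough illustration of how a schema like this can drive the parse,
here is a small sketch in Python 3. It is not the v0.1 code: this
MicroformatSchema class is a hypothetical stand-in that only mirrors the
two-list constructor shown above, and parse, text_of, URL_PROPS, the
trimmed schema and the adr/locality markup (with its made-up locality)
are all invented for the example. The idea is that only declared parents
may hold nested properties; anything found inside another element is
hoisted up to the enclosing microformat.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


class MicroformatSchema:
    """Hypothetical stand-in: known properties plus those allowed to nest."""

    def __init__(self, props, parentprops):
        self.props = set(props)
        self.parentprops = set(parentprops)


# Trimmed-down vcard schema, just enough for the example markup below.
vcardprops = MicroformatSchema(
    ["fn", "url", "logo", "photo", "locality"], ["n", "adr", "geo", "org"])

URL_PROPS = {"url", "logo", "photo"}  # assumption: values read from href/src


def text_of(elem):
    """Text content of an element with whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())


def parse(elem, schema, base):
    props = {}
    for child in elem:
        names = child.get("class", "").split()
        inner = parse(child, schema, base)
        nests = False
        for name in names:
            if name in schema.parentprops and inner:
                props[name] = inner  # declared parent: keep the nesting
                nests = True
            elif name in URL_PROPS and name in schema.props:
                link = child.get("href") or child.get("src") or ""
                props[name] = urljoin(base, link)
            elif name in schema.props:
                props[name] = text_of(child)
        if not nests:
            props.update(inner)  # not a parent: hoist inner properties
    return props


# Well-formed hCard; the adr/locality part is invented for illustration.
SNIPPET = """<address class="vcard" id="hcard">
  <a class="url fn" href="http://tantek.com/">
    <img src="/icon80px.jpg" class="logo" alt="" />
    Tantek Çelik
  </a>
  <span class="adr"><span class="locality">Exampleville</span></span>
</address>"""

card = parse(ET.fromstring(SNIPPET), vcardprops,
             "http://tantek.com/log/2005/10.html")
print(card)
```

Here fn keeps its text and logo lands alongside it, while adr, being
listed as a parent, is allowed to hold locality.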
Hope this is of use to somebody!
Cheers,
Phil
[1] http://phildawes.net/jamvat/
 
 