html5lib not thread safe. Is the Python SAX library thread-safe?

Cameron Simpson cs at zip.com.au
Sun Mar 11 17:45:01 EDT 2012


On 11Mar2012 13:30, John Nagle <nagle at animats.com> wrote:
| "html5lib" is apparently not thread safe.
| (see "http://code.google.com/p/html5lib/issues/detail?id=189")
| Looking at the code, I've only found about three problems.
| They're all the usual "cached in a global without locking" bug.
| A few locks would fix that.
|
| But html5lib calls the XML SAX parser. Is that thread-safe?
| Or is there more trouble down at the bottom?
|
| (I run a multi-threaded web crawler, and currently use BeautifulSoup,
| which is thread safe, although dated. I'm looking at converting to
| html5lib.)

IIRC, BeautifulSoup4 may do that for you:
 http://www.crummy.com/software/BeautifulSoup/bs4/doc/
 http://www.crummy.com/software/BeautifulSoup/bs4/doc/#you-need-a-parser
 "Beautiful Soup 4 uses html.parser by default, but you can plug in
 lxml or html5lib and use that instead."

Just for interest, re locking, I wrote a little decorator the other day,
thus:

    @locked_property
    def foo(self):
        compute foo here ...
        return foo value

and am rolling its use out amongst my classes. Code:

    def locked_property(func, lock_name='_lock', prop_name=None, unset_object=None):
        ''' A property whose access is controlled by a lock if unset.
        '''
        if prop_name is None:
            # func.__name__ works on Python 2 and 3; func_name is Python 2 only
            prop_name = '_' + func.__name__
        def getprop(self):
            ''' Attempt lockless fetch of property first.
                Use lock if property is unset.
            '''
            p = getattr(self, prop_name)
            if p is unset_object:
                with getattr(self, lock_name):
                    # re-check under the lock: another thread may have set it
                    p = getattr(self, prop_name)
                    if p is unset_object:
                        p = func(self)
                        setattr(self, prop_name, p)
            return p
        return property(getprop)

It tries to be lockless in the common case. I suspect it is only safe in
CPython where there is a GIL. If raw Python assignments and fetches can
overlap (eg Jython I think?) I probably need a shared "read" lock around
the first "p = getattr(self, prop_name)". Any remarks?
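
For anyone wanting to poke at it, here is a rough self-contained sketch of
how the compute-once behaviour might be checked from several threads. The
decorator is restated so the snippet runs alone; the Expensive class, its
"calls" counter and the run_demo helper are made up for illustration, and
the object is assumed to set up _lock and _foo itself in __init__. It only
demonstrates the CPython case and proves nothing about other interpreters.

```python
import threading

def locked_property(func, lock_name='_lock', prop_name=None, unset_object=None):
    ''' Restatement of the decorator so this sketch is self-contained. '''
    if prop_name is None:
        prop_name = '_' + func.__name__
    def getprop(self):
        p = getattr(self, prop_name)            # lockless fast path
        if p is unset_object:
            with getattr(self, lock_name):
                p = getattr(self, prop_name)    # re-check under the lock
                if p is unset_object:
                    p = func(self)
                    setattr(self, prop_name, p)
        return p
    return property(getprop)

class Expensive(object):
    ''' Hypothetical example class, not from any real library. '''
    def __init__(self):
        self._lock = threading.Lock()   # name matches lock_name default
        self._foo = None                # None means "not computed yet"
        self.calls = 0                  # counts how often foo is computed

    @locked_property
    def foo(self):
        self.calls += 1                 # only ever runs with _lock held
        return 42

def run_demo(nthreads=8):
    ''' Fetch obj.foo from several threads; return (calls, results). '''
    obj = Expensive()
    results = []
    def worker():
        results.append(obj.foo)
    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.calls, results
```

Note that the object has to initialise both the lock and the backing
attribute before the property is first read, otherwise the first getattr
raises AttributeError.
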
Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
Ed Campbell's <ed at Tekelex.Com> pointers for long trips:
1. lay out the bare minimum of stuff that you need to take with you, then
 put at least half of it back.

