BeautifulSoup doesn't work with a threaded input queue?

Christopher Reimer christopher_reimer at yahoo.com
Sun Aug 27 15:35:03 EDT 2017


On 8/27/2017 11:54 AM, Peter Otten wrote:
> The documentation
>> https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
>> says you can make the BeautifulSoup object from a string or file.
> Can you give a few more details where the queue comes into play? A small
> code sample would be ideal.

A worker thread uses a request object to get the page and puts it into
a queue as page.content (HTML).  Another worker thread gets the
page.content from the queue and applies BeautifulSoup, but nothing happens.
    soup = BeautifulSoup(page_content, 'lxml')
    print(soup)
No output whatsoever. If I remove 'lxml', I get the UserWarning that no
parser was explicitly specified, along with a reference to threading.py at
line 80.
I verified that page.content that goes into and out of the queue is the 
same page.content that goes into and out of a list.
I read somewhere that BeautifulSoup may not be thread-safe. I've never 
had a problem with threads storing the output into a queue. Using a 
queue (random order) instead of a list (sequential order) to feed pages 
for the input is making it wonky.
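For reference, a minimal sketch of the setup described above: one thread puts page bytes into a queue and another thread takes them out and parses them. Everything named here is hypothetical (fetch_worker, parse_worker, run, SENTINEL, TitleParser). The fetch step is a stand-in for the real request call, and the standard-library HTMLParser stands in for BeautifulSoup(page_content, 'lxml') so the sketch runs without third-party packages. It does not diagnose the missing output; it only collects any exception raised in the parsing thread and joins both threads before returning, so that nothing is lost silently.

```python
import queue
import threading
from html.parser import HTMLParser

SENTINEL = None  # hypothetical marker telling the parsing thread to stop


class TitleParser(HTMLParser):
    """Stdlib stand-in for BeautifulSoup(page_content, 'lxml')."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def fetch_worker(pages, page_queue):
    # Stand-in for the real fetch: each item plays the role of page.content (bytes).
    for content in pages:
        page_queue.put(content)
    page_queue.put(SENTINEL)


def parse_worker(page_queue, results, errors):
    while True:
        page_content = page_queue.get()
        if page_content is SENTINEL:
            break
        try:
            parser = TitleParser()
            parser.feed(page_content.decode("utf-8"))
            results.append(parser.title)
        except Exception as exc:
            # Record the failure instead of letting it pass unnoticed.
            errors.append(exc)


def run(pages):
    page_queue = queue.Queue()
    results, errors = [], []
    producer = threading.Thread(target=fetch_worker, args=(pages, page_queue))
    consumer = threading.Thread(target=parse_worker,
                                args=(page_queue, results, errors))
    producer.start()
    consumer.start()
    # Wait for both threads so all results exist before they are read.
    producer.join()
    consumer.join()
    return results, errors
```

Calling run() with a list of byte strings returns the parsed titles plus any exceptions caught in the parsing thread; printing the errors list is one way to check whether the parser call itself is failing.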
Chris R.

