BeautifulSoup and Python

Question 1

I am very new to BeautifulSoup, so my question may rest on a misunderstanding, but here goes.

I am currently trying to build a synonym dictionary, since the ones I can find are not very good. I am building on the work of the author of PyDictionary, so I am pulling synonyms from http://www.thesaurus.com/

In this example I am trying to pull only the noun synonyms from the page source of http://www.thesaurus.com/browse/animal?s=t

I have found this piece, which indicates that the synonyms in the next relevancy block are nouns:

 <div class="synonym-description">
 <em class="txt">noun</em>
 <strong class="ttl">animate being; mammal</strong>
 </div>
 <div class="relevancy-block">
 <div class="relevancy-list">

My question is essentially: how do I specify that I only want to look in the class="relevancy-list" block directly after the <em class="txt">noun</em> element?

After this I want to look for lines like:

 <li><a href="http://www.thesaurus.com/browse/pet" class="common-word" data-id="1" data-category="{&quot;name&quot;: &quot;relevant-3&quot;, &quot;color&quot;: &quot;#fcbb45&quot;}" data-complexity="1" data-length="1"><span class="text">pet</span><span class="star inactive">star</span></a></li>

And pull out the text inside the span with class="text" (here, pet).

Currently I am loading the page into an object via:

BeautifulSoup(requests.get(url).text)

Now I am literally at a loss as to where to go next. I have tried googling, but to no real avail.

Question 2

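import requests
import bs4

url = "http://www.thesaurus.com/browse/animal?s=t"
r = requests.get(url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
# In the question's markup, each class="txt" element is a part-of-speech label such as "noun".
for txt in soup.find_all(class_="txt"):
    # find_next returns the first matching element after txt in document order.
    relevancy_list = txt.find_next(class_="relevancy-list")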
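To make the loop above restrict itself to nouns, one option is to check each label's text before following it with find_next. The following is only a sketch under stated assumptions: SAMPLE_HTML is a hypothetical, trimmed-down imitation of the markup quoted in the question (the live page may differ or may have changed), and noun_synonyms is an illustrative helper name, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the snippet quoted in the question.
SAMPLE_HTML = """
<div class="synonym-description">
  <em class="txt">noun</em>
  <strong class="ttl">animate being; mammal</strong>
</div>
<div class="relevancy-block">
  <div class="relevancy-list">
    <ul>
      <li><a class="common-word"><span class="text">pet</span><span class="star inactive">star</span></a></li>
      <li><a class="common-word"><span class="text">beast</span><span class="star inactive">star</span></a></li>
    </ul>
  </div>
</div>
<div class="synonym-description">
  <em class="txt">adj</em>
  <strong class="ttl">beastly</strong>
</div>
<div class="relevancy-block">
  <div class="relevancy-list">
    <ul>
      <li><a class="common-word"><span class="text">wild</span><span class="star inactive">star</span></a></li>
    </ul>
  </div>
</div>
"""


def noun_synonyms(html):
    """Collect the words from each relevancy-list that follows a 'noun' label."""
    soup = BeautifulSoup(html, "html.parser")
    synonyms = []
    for marker in soup.find_all("em", class_="txt"):
        # Skip labels for other parts of speech, e.g. "adj".
        if marker.get_text(strip=True) != "noun":
            continue
        # First relevancy-list that appears after this label in document order.
        relevancy_list = marker.find_next(class_="relevancy-list")
        if relevancy_list is None:
            continue
        # Read only the word spans, so the "star" spans are never included.
        for span in relevancy_list.find_all("span", class_="text"):
            synonyms.append(span.get_text(strip=True))
    return synonyms


print(noun_synonyms(SAMPLE_HTML))  # ['pet', 'beast']
```

Running it against the sample string rather than the live site keeps the example independent of network access and of later changes to the page layout.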

Question 3

Thanks a lot, that helped! I found the answer and will write it up!

Question 4

You can use the find_all function, where the first argument is the tag name ('div', 'a', etc.) and the second argument is a dictionary of attributes to filter on, such as the class.

soup.find_all('em', {'class':"txt"})

This way you will get all 'em' with the class 'txt'.

soup.find_all('div', {'class':"relevancy-block"})

Here you will find all the 'div' with class name 'relevancy-block'
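As a quick check of the two calls above, here is a small sketch that runs them against an inline string instead of the live page. The HTML is a made-up sample, and the class_ keyword form is shown only as an equivalent spelling of the dictionary filter.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; only the tag names and classes matter here.
html = """
<div class="synonym-description"><em class="txt">noun</em></div>
<div class="relevancy-block"><span class="txt">not an em</span></div>
<div class="relevancy-block"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag name plus an attribute dictionary, as in the answer.
by_dict = soup.find_all("em", {"class": "txt"})
# The same filter written with the class_ keyword argument.
by_keyword = soup.find_all("em", class_="txt")
# All div elements whose class is relevancy-block.
blocks = soup.find_all("div", {"class": "relevancy-block"})

print([tag.get_text() for tag in by_dict])  # ['noun']
print(len(blocks))  # 2
```

The span with class "txt" is not returned, because the tag name filter and the class filter must both match.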

Question 5

Awesome. The next step would then be: how do I look in the div class="relevancy-block" that follows the first place where I found "noun"?

Question 6

@nonein, what do you mean by "found noun"?

Question 7

Sorry for not being super clear, I found the answer and will post it momentarily.

Question 8

I found a way of doing this thanks to both comments I received:

The following code loops over each class="filters" block and checks whether its label is an adjective or a noun. If it is a noun, it prints every synonym marked with the common-word class.

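import requests
from bs4 import BeautifulSoup

def _get_soup_object(url):
    # Name the parser explicitly; omitting it makes bs4 guess and emit a warning.
    return BeautifulSoup(requests.get(url).text, "html.parser")

term = "animal"
data = _get_soup_object("http://www.thesaurus.com/browse/{0}".format(term))
# Each class="filters" block holds the synonyms for one part of speech.
for selector_var in data.find_all(class_="filters"):
    word_type = selector_var.find_all(class_="txt")
    if word_type[0].text == "adj":
        print("This is an adjective, which we don't want")
    elif word_type[0].text == "noun":
        print("This is a noun, which we do want")
        word_list = selector_var.find_all(class_="common-word")
        for indv_word in word_list:
            # Drop the trailing "star" contributed by the star <span>.
            print(indv_word.text[:-4])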
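The [:-4] slice works because each link's text ends with the four letters of "star" from the second span. A possibly less fragile alternative, sketched here against a shortened copy of the list item from the question (most data attributes removed), is to read the span with class "text" directly. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# One list item from the question, with most attributes trimmed for brevity.
item_html = (
    '<li><a href="http://www.thesaurus.com/browse/pet" class="common-word">'
    '<span class="text">pet</span><span class="star inactive">star</span></a></li>'
)
link = BeautifulSoup(item_html, "html.parser").find(class_="common-word")

# Approach from the answer: take all the text, then cut off the trailing "star".
sliced = link.text[:-4]

# Alternative: read only the span that holds the word itself.
word_span = link.find("span", class_="text")
direct = word_span.get_text(strip=True) if word_span else link.get_text(strip=True)

print(sliced, direct)  # pet pet
```

Reading the span directly does not depend on the length of the trailing label, so it would keep working if that label's text were ever changed.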

Accepted answer (the code under Question 2) by 宏杰李, posted 2017-02-09 14:20:20Z.


Comment on the accepted answer (quoted under Question 3) by no nein, posted 2017-02-09 14:39:28Z.
