
I am very new to BeautifulSoup, so my question may reflect a misunderstanding on my part, but here goes.

I am currently trying to build a synonym dictionary, because the ones I can find are not great. I am building on someone else's work (the author of PyDictionary), so I am pulling synonyms from http://www.thesaurus.com/

In this example I am trying to pull only the noun synonyms from view-source:http://www.thesaurus.com/browse/animal?s=t

I have found this piece, which indicates that the synonyms in the following relevancy block are nouns:

 <div class="synonym-description">
 <em class="txt">noun</em>
 <strong class="ttl">animate being; mammal</strong>
 </div>
 <div class="relevancy-block">
 <div class="relevancy-list">

My question is essentially: how do I specify that I only want to look in the "relevancy-list" block directly after <em class="txt">noun</em>?

After that, I want to find lines like this one:

 <li><a href="http://www.thesaurus.com/browse/pet" class="common-word" data-id="1" data-category="{&quot;name&quot;: &quot;relevant-3&quot;, &quot;color&quot;: &quot;#fcbb45&quot;}" data-complexity="1" data-length="1"><span class="text">pet</span><span class="star inactive">star</span></a></li>

and pull out the text inside <span class="text">.
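A minimal offline sketch of that last step, using the <li> line quoted above as input (its data-* attributes are trimmed for brevity). It shows why reading the span with class "text" is more precise than reading the whole link: the link's text also includes the "star" label.

```python
from bs4 import BeautifulSoup

# The <li> quoted in the question, with its data-* attributes trimmed
snippet = (
    '<li><a href="http://www.thesaurus.com/browse/pet" class="common-word">'
    '<span class="text">pet</span><span class="star inactive">star</span></a></li>'
)

soup = BeautifulSoup(snippet, "html.parser")
link = soup.find("a", class_="common-word")

# get_text() on the <a> concatenates both spans
whole_link_text = link.get_text()
# The span with class "text" holds just the synonym
word = link.find("span", class_="text").get_text()

print(whole_link_text)  # petstar
print(word)             # pet
```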

Currently I am loading the page into a BeautifulSoup object via:

BeautifulSoup(requests.get(url).text, "html.parser")

However, I am at a loss as to where to go next; I have tried googling, but to no avail.

asked Feb 9, 2017 at 14:10

3 Answers

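import requests
import bs4

url = "http://www.thesaurus.com/browse/animal?s=t"
r = requests.get(url)
soup = bs4.BeautifulSoup(r.text, 'lxml')
# For every part-of-speech label (<em class="txt">), jump forward to the
# next element with class "relevancy-list" in document order
for txt in soup.find_all(class_="txt"):
    relevancy_list = txt.find_next(class_="relevancy-list")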
answered Feb 9, 2017 at 14:20
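To see what find_next does, here is a sketch run against a small hypothetical page modeled on the HTML in the question (the "adj" block and all listed words are invented for illustration, and "html.parser" is used so lxml is not required). find_next searches forward in document order, so each label is paired with the first relevancy-list that follows it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the question's snippet; the words are made up
html = """
<div class="synonym-description"><em class="txt">noun</em></div>
<div class="relevancy-block"><div class="relevancy-list"><ul>
  <li><a class="common-word"><span class="text">pet</span><span class="star">star</span></a></li>
  <li><a class="common-word"><span class="text">beast</span><span class="star">star</span></a></li>
</ul></div></div>
<div class="synonym-description"><em class="txt">adj</em></div>
<div class="relevancy-block"><div class="relevancy-list"><ul>
  <li><a class="common-word"><span class="text">bestial</span><span class="star">star</span></a></li>
</ul></div></div>
"""

soup = BeautifulSoup(html, "html.parser")
synonyms_by_type = {}
for txt in soup.find_all(class_="txt"):
    # First relevancy-list after this label in document order
    relevancy_list = txt.find_next(class_="relevancy-list")
    words = [span.get_text() for span in relevancy_list.find_all("span", class_="text")]
    synonyms_by_type[txt.get_text()] = words

print(synonyms_by_type["noun"])  # ['pet', 'beast']
```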

1 Comment

Thanks a lot, that helped! I found the answer and will write it up.

You can use the find_all function, where the first argument is the tag name ('div', 'a', etc.) and the second argument is a dictionary of attributes, which lets you filter by class.

soup.find_all('em', {'class':"txt"})

This returns all 'em' elements with the class 'txt'.

soup.find_all('div', {'class':"relevancy-block"})

This returns all 'div' elements with the class 'relevancy-block'.

answered Feb 9, 2017 at 14:18
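The dictionary form used above is one of several equivalent ways to filter by class in BeautifulSoup 4: the class_ keyword (with a trailing underscore, because class is a reserved word in Python) and a CSS selector passed to select() return the same elements. A small sketch on a made-up fragment:

```python
from bs4 import BeautifulSoup

# Made-up fragment: two labels, one unrelated <em>, and one relevancy block
html = (
    '<em class="txt">noun</em><em class="txt">adj</em><em class="other">x</em>'
    '<div class="relevancy-block"><div class="relevancy-list"></div></div>'
)
soup = BeautifulSoup(html, "html.parser")

by_dict = soup.find_all("em", {"class": "txt"})   # attrs dictionary, as in the answer
by_kwarg = soup.find_all("em", class_="txt")      # class_ keyword argument
by_css = soup.select("em.txt")                    # CSS selector syntax

blocks = soup.find_all("div", {"class": "relevancy-block"})

print([em.get_text() for em in by_dict])  # ['noun', 'adj']
```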

3 Comments

Awesome. The next step, then: how do I look in the div with class="relevancy-block" that follows the first "noun" I find?
@nonein, what do you mean by "found noun"?
Sorry for not being clearer; I found the answer and will post it momentarily.
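One way to jump straight to the block after the first "noun" label is to combine a class filter with a string filter and then call find_next. A minimal sketch on a hypothetical fragment (labels and words are invented); string= matches the tag's text exactly, and the "adj" block is placed first to show that only the list after "noun" is read.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: an "adj" block precedes the "noun" block
html = """
<em class="txt">adj</em>
<div class="relevancy-list"><span class="text">bestial</span></div>
<em class="txt">noun</em>
<div class="relevancy-list"><span class="text">pet</span><span class="text">creature</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# First <em class="txt"> whose text is exactly "noun"
noun_label = soup.find("em", class_="txt", string="noun")
# First relevancy-list that follows it in the document
noun_list = noun_label.find_next("div", class_="relevancy-list")
nouns = [span.get_text() for span in noun_list.find_all("span", class_="text")]

print(nouns)  # ['pet', 'creature']
```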

I found a way of doing this thanks to the two answers I received.

The following code loops over each element with class "filters", reads its part-of-speech label, and checks whether it is an adjective or a noun. Adjective blocks are reported and skipped; for a noun block, it prints every synonym marked as a common-word.

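import requests
from bs4 import BeautifulSoup

def _get_soup_object(url):
    # Name the parser explicitly so BeautifulSoup does not have to guess one
    return BeautifulSoup(requests.get(url).text, "html.parser")

term = "animal"
data = _get_soup_object("http://www.thesaurus.com/browse/{0}".format(term))
for selector_var in data.find_all(class_="filters"):
    # The <em class="txt"> label holds the part of speech ("noun", "adj", ...)
    word_type = selector_var.find_all(class_="txt")
    if not word_type:
        continue
    if word_type[0].text == "adj":
        print("This is an adjective, which we don't want")
    elif word_type[0].text == "noun":
        print("This is a noun, which we do want")
        word_list = selector_var.find_all(class_="common-word")
        for indv_word in word_list:
            # The link text is the synonym followed by "star"; drop those 4 characters
            print(indv_word.text[:-4])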
answered Feb 9, 2017 at 14:43
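Because thesaurus.com's markup may have changed since 2017, it can help to separate fetching from parsing so the parsing logic can be checked against saved or hand-written HTML. A sketch under that assumption: the function name noun_common_words and the sample markup are hypothetical, and each "filters" container is assumed to hold both the part-of-speech label and its words, as the answer's code implies. Instead of slicing off "star", it reads the span with class "text".

```python
from bs4 import BeautifulSoup

def noun_common_words(html):
    """Return common-word synonyms from every "filters" block labelled as a noun."""
    soup = BeautifulSoup(html, "html.parser")
    words = []
    for block in soup.find_all(class_="filters"):
        label = block.find(class_="txt")
        # Skip blocks without a label and blocks for other parts of speech
        if label is None or label.get_text().strip() != "noun":
            continue
        for link in block.find_all(class_="common-word"):
            span = link.find("span", class_="text")
            # Fall back to the whole link text if the span is missing
            source = span if span is not None else link
            words.append(source.get_text().strip())
    return words

# Hypothetical markup shaped the way the answer's code expects
sample = """
<div class="filters"><em class="txt">adj</em>
  <a class="common-word"><span class="text">bestial</span><span class="star">star</span></a></div>
<div class="filters"><em class="txt">noun</em>
  <a class="common-word"><span class="text">pet</span><span class="star">star</span></a>
  <a class="common-word"><span class="text">beast</span><span class="star">star</span></a></div>
"""
print(noun_common_words(sample))  # ['pet', 'beast']
```

Against the live site you would pass requests.get(url).text instead of sample, assuming the page still uses these class names.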
