Commit 25cb2a6

udpate README
1 parent dc2b9ba commit 25cb2a6

1 file changed

+49
-0
lines changed

‎README.md‎

Lines changed: 49 additions & 0 deletions
@@ -250,3 +250,52 @@ for i in range(40):
How do I know that we have to increment by ```30```? I checked the pattern of the
URLs by visiting the pages. We stop at ```270``` so that we only request 10 pages.
You can use whatever stopping value you want, but it should be a multiple of ```30```.

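The offset pattern described above can be sketched roughly as follows. This is only an illustration: the names `search_urls`, `BASE_URL` and `PAGE_SIZE` are made up here and are not taken from ```formatting_url.py```, and it assumes the first page starts at offset ```0```.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.yelp.com/search"
PAGE_SIZE = 30  # each search page moves the `start` parameter forward by 30


def search_urls(pages=10, location="los angeles"):
    """Return one search URL per page, with start = 0, 30, 60, ..."""
    urls = []
    for start in range(0, pages * PAGE_SIZE, PAGE_SIZE):
        # urlencode turns the space in the location into a "+"
        query = urlencode({
            "find_desc": "Restaurants",
            "find_loc": location,
            "start": start,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls


urls = search_urls()
print(len(urls))
print(urls[-1])
```

With the default of 10 pages the last offset is ```270```, which matches the stopping value mentioned above.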
# Reading Restaurant Title
Now we will build on the code we wrote in ```formatting_url.py``` and extract the
one piece of text we need from the HTML tags: the title of each restaurant on
every search page.

Visit the [url](https://www.yelp.com/search?find_desc=Restaurants&find_loc=los+angeles&start=30) and open the developer tools.
Point at a restaurant block (title, rating, reviews, etc.) and find the
```li``` tag with class ```regular-search-result```.

We will use this class to find those ```li``` tags in the response
using ```BeautifulSoup```.

**reading_name.py**
```
import requests
...
info_block = soup.findAll('li', {'class': 'regular-search-result'})
print(info_block)
```

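To see what this kind of class search returns without hitting the network, here is a small sketch that runs against made-up markup. `SAMPLE_HTML` is invented for illustration and is much simpler than the real page. It uses `find_all`, which is the current Beautiful Soup 4 spelling of `findAll`.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real Yelp page is far larger.
SAMPLE_HTML = """
<ul>
  <li class="regular-search-result">First block</li>
  <li class="regular-search-result featured">Second block</li>
  <li class="ad-block">Not a search result</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
info_block = soup.find_all("li", {"class": "regular-search-result"})

# The result behaves like a list of Tag objects, so it can be counted
# and looped over.
print(len(info_block))
for li in info_block:
    print(li["class"], li.text)
```

Note that the second ```li``` is matched even though it carries an extra class, because Beautiful Soup compares the search value against each class on the tag separately.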
Run the file and you should see every matching ```li``` tag, along with its inner
tags, printed. But we only want the title of the restaurant from each ```li``` tag,
so we have to find the class used on the title.

The title is wrapped inside an **anchor** tag with class ```biz-name```

```
info_block = soup.findAll('a', {'class': 'biz-name'})
print(info_block)

count = 0
for info in info_block:
    print(info.text)
    count += 1

print(count)
```

Printing the ```text``` of each tag gives us the title of the restaurant. This does
not cover every block, because some of them don't have the ```biz-name``` class, but
we have what we need.

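The point about blocks without a ```biz-name``` anchor can be shown with a short sketch on made-up markup. `SAMPLE_HTML` and the restaurant names are invented; only the two class names come from the text above. Searching inside each ```li``` and skipping the ones with no title link is one possible approach, not necessarily the one used in ```reading_name.py```.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the third block has no title link.
SAMPLE_HTML = """
<ul>
  <li class="regular-search-result"><a class="biz-name" href="#">Taco Place</a></li>
  <li class="regular-search-result"><a class="biz-name" href="#">  Noodle House </a></li>
  <li class="regular-search-result"><span>Block without a title link</span></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

titles = []
for block in soup.find_all("li", {"class": "regular-search-result"}):
    link = block.find("a", {"class": "biz-name"})
    if link is None:
        continue  # this block has no biz-name anchor, so skip it
    # strip=True removes the whitespace around the title
    titles.append(link.get_text(strip=True))

print(titles)
print(len(titles))
```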
# Advanced Extraction
In this section we will go a little further and extract the name, address, and
phone number of the restaurant.

This time we will look for the ```div``` tag with class ```biz-listing-large```,
which contains the restaurant details.
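
As a rough sketch of where this is heading, the snippet below pulls three fields out of each ```biz-listing-large``` block in made-up markup. Only ```biz-listing-large``` and ```biz-name``` come from the text above. The `<address>` tag, the `biz-phone` class, the helper `text_or_none` and all sample values are assumptions for illustration, so check the real page in the developer tools before relying on them.

```python
from bs4 import BeautifulSoup

# Made-up markup. The <address> tag and the `biz-phone` class are
# assumptions, not confirmed names from the real page.
SAMPLE_HTML = """
<div class="biz-listing-large">
  <a class="biz-name" href="#">Taco Place</a>
  <address>123 Example St<br>Los Angeles, CA</address>
  <span class="biz-phone">(555) 010-0000</span>
</div>
<div class="biz-listing-large">
  <a class="biz-name" href="#">Noodle House</a>
</div>
"""


def text_or_none(tag):
    """Return the tag's text pieces joined by spaces, or None if the tag is missing."""
    if tag is None:
        return None
    return tag.get_text(" ", strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

restaurants = []
for listing in soup.find_all("div", {"class": "biz-listing-large"}):
    restaurants.append({
        "name": text_or_none(listing.find("a", {"class": "biz-name"})),
        "address": text_or_none(listing.find("address")),
        "phone": text_or_none(listing.find("span", {"class": "biz-phone"})),
    })

for restaurant in restaurants:
    print(restaurant)
```

Returning `None` for a missing field keeps one incomplete listing from stopping the whole loop.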
