Instead of having a `get_content` function, you could add a function that only parses the source passed to it; that source can then be either the main page or one of the suggested apps' pages. Note that you're requesting the main page's content twice even though you already have it.
In addition to the above point, you could make a few more improvements:
- Make the code PEP 8 compliant. Currently you will see a decent number of issues if you run it through http://pep8online.com/.
- Use a `Session` to re-use connections. Since we are making requests to the same host, this will speed up the requests (see the short sketch after this list). From the docs:
  > The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
- Use `__name__ == '__main__'` to prevent your code from running when it is imported as a module (a sketch of the guard is shown after the code below).
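To illustrate the `Session` point on its own, here is a minimal sketch (the host and paths are placeholders, not from your code): both requests go to the same host, so the second one reuses the TCP connection opened by the first.

```python
import requests

session = requests.Session()

# Both requests hit the same host, so urllib3's connection pooling
# lets the second request reuse the already-open TCP connection.
for path in ("/first-page", "/second-page"):
    response = session.get("https://example.com" + path)
    print(response.status_code)
```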
After making the changes, your code may look like this:
```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()


def get_links(url):
    source = session.get(url).text
    main_app = parse_content(source)
    print(main_app)
    for linked_app in get_linked_app_links(source):
        print(linked_app)


def get_linked_app_links(source):
    # Follow each suggested-app link and yield its parsed data.
    soup = BeautifulSoup(source, "html.parser")
    for title in soup.select("a.name"):
        linked_app = get_app_data(title.get("href"))
        yield linked_app


def get_app_data(url):
    source = session.get(url).text
    return parse_content(source)


def parse_content(source):
    # Parses a page's HTML, whether it came from the main page
    # or from one of the suggested apps.
    broth = BeautifulSoup(source, "html.parser")
    item = {
        "app_name": broth.select("h1[itemprop=name]")[0].text,
        "developer": broth.select("div.left h2")[0].text,
        "price": broth.select("div.price")[0].text
    }
    return item
```
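The code above stops short of the `__name__ == '__main__'` guard recommended earlier. A minimal way to wire it in (the URL is a placeholder for whatever app page you are scraping):

```python
if __name__ == '__main__':
    # Runs only when the file is executed directly,
    # not when it is imported as a module.
    get_links("https://example.com/app/some-app")  # placeholder URL
```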