Instead of having a `get_content` function, you could add a function that only parses the source passed to it; that source can then be either the main page or one of the suggested apps' pages. Note that you're requesting the main page's content twice even though you already have it.
In addition to the above point, you could make a few more improvements:
- Make the code PEP 8 compliant. Currently you will see a decent number of issues if you run it through http://pep8online.com/.
- Use a `Session` to re-use connections. Since we are making requests to the same host, this will speed up the requests (see the short sketch after this list). From the docs:
  > The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
- Use `__name__ == '__main__'` to prevent your code from running when it is imported as a module (a sketch of the guard is shown after the code below).
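To illustrate the `Session` point on its own, here is a minimal sketch (the host and paths are placeholders, not from your code): both requests go to the same host, so the second one reuses the TCP connection opened by the first.

```python
import requests

session = requests.Session()

# Both requests hit the same host, so urllib3's connection pooling
# lets the second request reuse the already-open TCP connection.
for path in ("/first-page", "/second-page"):
    response = session.get("https://example.com" + path)
    print(response.status_code)
```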
After making the changes, your code may look like this:
```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()


def get_links(url):
    source = session.get(url).text
    main_app = parse_content(source)
    print(main_app)
    for linked_app in get_linked_app_links(source):
        print(linked_app)


def get_linked_app_links(source):
    # Follow each suggested-app link and yield its parsed data.
    soup = BeautifulSoup(source, "html.parser")
    for title in soup.select("a.name"):
        linked_app = get_app_data(title.get("href"))
        yield linked_app


def get_app_data(url):
    source = session.get(url).text
    return parse_content(source)


def parse_content(source):
    # Parses a page's HTML, whether it came from the main page
    # or from one of the suggested apps.
    broth = BeautifulSoup(source, "html.parser")
    item = {
        "app_name": broth.select("h1[itemprop=name]")[0].text,
        "developer": broth.select("div.left h2")[0].text,
        "price": broth.select("div.price")[0].text
    }
    return item
```
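The code above stops short of the `__name__ == '__main__'` guard recommended earlier. A minimal way to wire it in (the URL is a placeholder for whatever app page you are scraping):

```python
if __name__ == '__main__':
    # Runs only when the file is executed directly,
    # not when it is imported as a module.
    get_links("https://example.com/app/some-app")  # placeholder URL
```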