Code Review

There are at least three things that may help make the code more efficient:

  • switch to lxml instead of html.parser (requires lxml to be installed);
  • use a SoupStrainer to parse only the relevant part of the document;
  • switch to http instead of https. While this reduces security, it avoids the overhead of the SSL handshake, encryption etc. I've noticed the execution time difference locally; try it out.
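The SoupStrainer speed-up can be measured on a synthetic page. This is a minimal sketch, not a benchmark of the actual article page: the HTML and the repetition counts are made up, and html.parser is used so it runs even without lxml installed.

```python
import timeit
from bs4 import BeautifulSoup, SoupStrainer

# Synthetic page with a large <head> that a body-only strainer can skip
# entirely (hypothetical content, used only for measurement)
html = ("<html><head>" + "<meta name='k' content='v'/>" * 500
        + "</head><body><h1>Title</h1>" + "<p>text</p>" * 200
        + "</body></html>")

def parse_full():
    # Parse the whole document
    return BeautifulSoup(html, "html.parser")

def parse_strained():
    # Parse only <body> and its descendants
    return BeautifulSoup(html, "html.parser",
                         parse_only=SoupStrainer("body"))

# Both parses see the same body content...
assert parse_full().h1.text == parse_strained().h1.text

# ...but the strained one does less work
print("full:    ", timeit.timeit(parse_full, number=20))
print("strained:", timeit.timeit(parse_strained, number=20))
```

On my machine the strained parse is noticeably faster; your numbers will vary with page size and how much of it falls outside the strainer.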

Improved code:

import requests
from bs4 import BeautifulSoup, SoupStrainer

page = requests.get('http://www.msn.com/en-sg/money/topstories/10-top-stocks-of-2017/ar-BBGgEyA?li=AA54rX&ocid=spartandhp')

# Parse only the <body> of the document
parse_only = SoupStrainer("body")
soup = BeautifulSoup(page.content, 'lxml', parse_only=parse_only)

with open('filename.txt', mode='wt', encoding='utf-8') as file:
    title = soup.find('h1').text.strip()
    file.write(title + '\n')

    for p_tag in soup.select('p'):
        file.write(p_tag.get_text() + '\n')

Note that I've also removed the unused variables and imports.

By the way, if it weren't for the title, we could have pinpointed the SoupStrainer to p elements only, which might have improved performance even more.
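For illustration, here is a minimal sketch of that p-only strainer on a made-up snippet; note how the h1 (and hence the title) is lost, which is exactly why the answer's code strains on body instead:

```python
from bs4 import BeautifulSoup, SoupStrainer

# Made-up snippet standing in for the article page
html = "<html><body><h1>Headline</h1><p>One</p><p>Two</p></body></html>"

# Parse nothing but the <p> elements
soup = BeautifulSoup(html, "html.parser",
                     parse_only=SoupStrainer("p"))

print([p.get_text() for p in soup.find_all("p")])  # the paragraphs survive
print(soup.h1)  # None - the <h1> was never parsed
```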

alecxe
