
I'm trying to scrape the links for the top 10 articles on Medium each day. By the looks of it, all the article links are in the class "postArticle-content", but when I run this code, I only get the top 3. Is there a way to get all 10?

from bs4 import BeautifulSoup
import requests

r = requests.get("https://medium.com/browse/726a53df8c8b")
soup = BeautifulSoup(r.text, "html.parser")  # specify a parser explicitly

articles = soup.find_all('div', attrs={'class': 'postArticle-content'})
for div in articles:
    for link in div.find_all('a'):
        print(link.get('href'))
asked Feb 27, 2017 at 22:38

1 Answer

requests gave you the entire response the server sent.

That page contains only the first three articles. The site is designed to use JavaScript, running in the browser, to load additional content and add it to the page after the initial request.

You need an entire web browser, with a JavaScript engine, to do what you are trying to do. The requests and Beautiful Soup libraries are not a web browser; they are merely an implementation of the HTTP protocol and an HTML parser, respectively.

answered Feb 27, 2017 at 22:48

2 Comments

That makes sense - would feedparser or selenium be the sort of library to take a look at? stackoverflow.com/questions/28499274/…
If the site provides you with an Atom or RSS feed, then using that (with feedparser) would be suitable. Selenium would also be suitable, as it lets you easily automate the operation of a complete browser.
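To make the division of labor concrete: whether the HTML comes from requests or from a real browser driven by Selenium, the link-extraction step is just parsing. Below is a minimal sketch using only the standard library's html.parser, so it runs without a browser. The Selenium lines are shown as hypothetical comments (they assume a working WebDriver install), and the inline HTML sample is a made-up stand-in for the rendered page source, not real Medium markup:

```python
from html.parser import HTMLParser

# With Selenium, the fully rendered HTML would come from the browser
# (hypothetical sketch; assumes a driver such as chromedriver is installed):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://medium.com/browse/726a53df8c8b")
#   html = driver.page_source
#   driver.quit()

# Stand-in for rendered page source, so the parsing step can be shown
# without a live browser. The class name matches the question.
html = """
<div class="postArticle-content"><a href="https://medium.com/a1">A1</a></div>
<div class="postArticle-content"><a href="https://medium.com/a2">A2</a></div>
<div class="other"><a href="https://medium.com/skip">skip</a></div>
"""

class ArticleLinkParser(HTMLParser):
    """Collect hrefs of <a> tags inside div.postArticle-content."""
    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside a matching div, 0 = outside
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and "postArticle-content" in attrs.get("class", "").split():
            self.depth += 1                       # entering a matching div
        elif self.depth and tag == "div":
            self.depth += 1                       # nested div inside a match
        elif self.depth and tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])      # link inside a matching div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

parser = ArticleLinkParser()
parser.feed(html)
print(parser.links)  # only links from postArticle-content divs
```

The same ArticleLinkParser works unchanged on driver.page_source, which is the whole point: Selenium's job is only to produce HTML that actually contains all ten articles.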
