I'm trying to scrape the links for the top 10 articles on Medium each day. It looks like all the article links are inside elements with the class "postArticle-content", but when I run this code, I only get the top 3. Is there a way to get all 10?
from bs4 import BeautifulSoup
import requests

r = requests.get("https://medium.com/browse/726a53df8c8b")
data = r.text
# Name the parser explicitly instead of letting Beautiful Soup guess one
soup = BeautifulSoup(data, "html.parser")
data = soup.find_all('div', attrs={'class': 'postArticle-content'})
for div in data:
    links = div.find_all('a')
    for link in links:
        print(link.get('href'))
1 Answer
requests did give you the entire page that the server sent.
That page contains only the first three articles. The website is designed to use JavaScript code, running in the browser, to load additional content and add it to the page.
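To illustrate the point, here is a minimal sketch using only the standard library's html.parser. The HTML string is made up for the example (it is not Medium's real markup): one link is present in the served markup and a second one would only exist after the script runs in a browser. A parser never runs that script, so it only sees the first link.

```python
from html.parser import HTMLParser

# Hypothetical served HTML: one article link is in the markup itself,
# the second would only be created by the script, inside a browser.
SERVED_HTML = """
<div class="postArticle-content"><a href="/article-1">One</a></div>
<script>
  var a = document.createElement("a");
  a.href = "/article-2";
  document.body.appendChild(a);
</script>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


collector = LinkCollector()
collector.feed(SERVED_HTML)
# Only the link that is literally in the markup is found;
# the script body is treated as text and never executed.
print(collector.links)
```

Beautiful Soup behaves the same way in this respect: it parses whatever text requests hands it and nothing more.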
You need a full web browser, with a JavaScript engine, to do what you are trying to do. The requests and Beautiful Soup libraries are not a web browser: they are merely an HTTP client and an HTML parser, respectively.
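One way to drive a real browser from Python is Selenium. The following is only a rough sketch, not a tested solution for Medium: the helper name collect_article_links is made up, the CSS selector is taken from the class name in the question, and the idea that scrolling to the bottom triggers loading of the remaining articles is an assumption about how the page behaves. The helper accepts any Selenium WebDriver object.

```python
import time


def collect_article_links(driver, url,
                          selector="div.postArticle-content a",
                          scrolls=5, pause=1.0):
    """Open url in a browser driver, scroll a few times, return unique hrefs.

    driver is expected to be a Selenium WebDriver. The scrolling step is
    an assumption: it only helps if the page loads more articles on scroll.
    """
    driver.get(url)
    for _ in range(scrolls):
        # Ask the browser's JavaScript engine to scroll to the bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Crude wait to give the page's own scripts time to add content
        time.sleep(pause)

    links = []
    # "css selector" is the string value of selenium's By.CSS_SELECTOR
    for anchor in driver.find_elements("css selector", selector):
        href = anchor.get_attribute("href")
        if href and href not in links:
            links.append(href)
    return links
```

With Selenium and a browser driver installed, you would create the driver with `from selenium import webdriver` and `driver = webdriver.Firefox()`, call `collect_article_links(driver, "https://medium.com/browse/726a53df8c8b")`, and finish with `driver.quit()`. The fixed sleep is the simplest possible wait; an explicit wait on the number of matching elements would be more robust.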