Does Python 3 have a scraping library that can handle JavaScript-rendered pages, other than Selenium? I'm trying to scrape https://www.mailinator.com/v2/inbox.jsp?zone=public&query=test, but the inbox is loaded with JavaScript. I don't want to use Selenium because I don't want a browser window to open when I run the script.
Here is my non-working code:
import requests
from bs4 import BeautifulSoup as soup

INBOX = "https://www.mailinator.com/v2/inbox.jsp?zone=public&query={}"

def check_inbox(name):
    stuff = soup(requests.get(INBOX.format(name)).text, "html.parser")
    print(stuff.find("ul", {"class": "single_mail-body"}))

check_inbox("retep")
Do any such libraries exist?
Searching Google for "python 3 javascript scraper" turned up nothing other than Selenium.
- Possible duplicate of Web-scraping JavaScript page with Python – Hum4n01d, Oct 23, 2017 at 21:58
- @Hum4n01d this is python3, not python. – Peter S, Oct 23, 2017 at 21:59
- I don't see why that would make a difference. – Hum4n01d, Oct 23, 2017 at 22:00
- different syntax, libraries aren't compatible – Peter S, Oct 23, 2017 at 22:00
- Ok, but overall the solution is still going to be the same. You need a library that renders the page with JavaScript before you start scraping. – Hum4n01d, Oct 23, 2017 at 22:02
1 Answer
You don't actually need to run JavaScript. The script runs client-side and only fetches data, so you can emulate the requests it makes.
If you inspect the webpage (developer tools > network), you'll see a websocket connection to this URL:
wss://www.mailinator.com/ws/fetchinbox?zone=public&query=test
If you implement a websocket client in Python, you can fetch your mail cleanly (see this example: https://github.com/aaugustin/websockets/blob/master/example/client.py).
EDIT:
As mentioned by John, augustin's ws client repo is dead. Today I'd use this: https://websockets.readthedocs.io/en/stable/
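As a rough illustration of the websocket approach described in this answer, here is a minimal sketch using the third-party `websockets` package. The helper names `inbox_ws_url` and `fetch_first_message` are made up for this example, the `zone` and `query` parameters are copied from the URL above, and it is an assumption that the server pushes the inbox as its first frame. The endpoint may also reject the handshake or expect extra headers, so treat this as a starting point and not a confirmed working client.

```python
import asyncio
from urllib.parse import urlencode

# Endpoint taken from the answer; whether the server still accepts
# connections here is an assumption.
WS_ENDPOINT = "wss://www.mailinator.com/ws/fetchinbox"


def inbox_ws_url(name, zone="public"):
    """Build the websocket URL for one inbox (hypothetical helper)."""
    return WS_ENDPOINT + "?" + urlencode({"zone": zone, "query": name})


async def fetch_first_message(name):
    """Open the websocket and return the first frame the server sends."""
    # Imported here so the URL helper works even without the package.
    import websockets  # third-party: pip install websockets

    async with websockets.connect(inbox_ws_url(name)) as ws:
        return await ws.recv()


# Usage (performs a real network request):
#     print(asyncio.run(fetch_first_message("test")))
```

The payload format is not documented in this thread, so inspect the frames in the browser's network tab before deciding how to parse what `recv()` returns.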
7 Comments
websockets.exceptions.InvalidStatusCode: Status code not 101: 500

    import websockets, asyncio
    from bs4 import BeautifulSoup as soup

    INBOX = "wss://www.mailinator.com/ws/fetchinbox?zone=public&query=test"

    async def hello():
        async with websockets.connect(INBOX) as ws:
            response = await ws.recv()
            print(response)

    asyncio.get_event_loop().run_until_complete(hello())

Sec-WebSocket-Key. How would I go about generating one of these?
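On the Sec-WebSocket-Key question: websocket client libraries such as `websockets` normally generate this header themselves during the handshake, so it rarely needs to be built by hand. For reference, RFC 6455 defines it as 16 random bytes, base64-encoded, and the server answers with a Sec-WebSocket-Accept value derived from it. The sketch below shows both calculations using only the standard library; the function names are invented for this example, and nothing here is specific to Mailinator.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_websocket_key():
    """Return a fresh Sec-WebSocket-Key: 16 random bytes, base64-encoded."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(key):
    """Return the Sec-WebSocket-Accept value a server should send for key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


key = make_websocket_key()
print("Sec-WebSocket-Key:", key)
print("Expected Sec-WebSocket-Accept:", expected_accept(key))
```

A 500 response during the handshake usually means the server rejected the request for some other reason, so a hand-built key is unlikely to fix it on its own.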