I have to scrape a website that uses JavaScript to render its content. I can only use standard libraries, because the script will run on a server where no browser can be installed. I found Selenium, but it requires a browser, which is not an option in my case.
Any ideas or solutions?
- Why don't you rely on Scrapy for the task? Avoid reinventing the wheel. – narko, Sep 18, 2015 at 7:11
- You can use the Requests library. – Vikas Ojha, Sep 18, 2015 at 7:12
- Scrapy and BeautifulSoup are pretty good libraries for this. – Tushar Gupta, Sep 18, 2015 at 7:41
- These modules (Requests, BeautifulSoup) could not do it. – Hafiz Muhammad Shafiq, Sep 18, 2015 at 7:59
- @Shafiq Do you mind if I ask why Requests and bs4 couldn't complete the task? These would have been my first go-to solutions. – pmccallum, Sep 18, 2015 at 8:09
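As the comments suggest, if the page fills itself in via AJAX, you can often skip executing the JavaScript entirely: open the browser's developer tools, find the JSON endpoint the page calls, and fetch that endpoint directly. A minimal sketch using only the standard library follows; the endpoint URL and the "items"/"title" field names are hypothetical stand-ins for whatever the real site returns.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url):
    """Fetch a JSON endpoint using only the standard library."""
    # Some endpoints reject clients without a browser-like User-Agent.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# In practice you would copy the real URL from the network tab, e.g.:
#   data = fetch_json("https://example.com/api/items?page=1")   # hypothetical
# Here we parse an inline sample payload to show the extraction step.
sample = '{"items": [{"title": "First post"}, {"title": "Second post"}]}'
data = json.loads(sample)
titles = [item["title"] for item in data["items"]]
print(titles)
```

This only works when the data arrives as a separate HTTP response; if the values are computed client-side by scripts, you need a JavaScript-capable tool like the ones in the answers below.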
2 Answers
Have a look at Ghost.py (http://jeanphix.me/Ghost.py/). It drives a headless WebKit engine, so it doesn't require a full browser to be installed (it does depend on a Qt binding such as PyQt or PySide).
pip install Ghost.py
from ghost import Ghost

ghost = Ghost()
# Loads the page and executes its JavaScript before returning.
page, resources = ghost.open('http://stackoverflow.com/')
You didn't mention how the website uses JavaScript, but if it fires AJAX requests after some kind of user interaction, you will need something like Selenium to automate that behaviour. Here you can find a short tutorial on scraping with Scrapy + Selenium. This, of course, requires a browser to be installed on your machine beforehand.