I have to scrape a website that uses JavaScript to render its content. I can only use standard libraries, because the script will run on a server where no browser can be installed. I found Selenium, but it requires a browser, which is not an option in my case.
Any ideas or solutions?
- Why don't you rely on Scrapy for the task? Avoid reinventing the wheel. – narko, Sep 18, 2015 at 7:11
- You can use the Requests library. – Vikas Ojha, Sep 18, 2015 at 7:12
- Scrapy and BeautifulSoup are pretty good libraries for this. – Tushar Gupta, Sep 18, 2015 at 7:41
- These modules (Requests, BeautifulSoup) could not do it. – Hafiz Muhammad Shafiq, Sep 18, 2015 at 7:59
- @Shafiq Do you mind if I ask why Requests and bs4 couldn't complete the task? These would have been my first go-to solutions. – pmccallum, Sep 18, 2015 at 8:09
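As the comments suggest, if the page fills itself in via AJAX, you can often skip executing the JavaScript entirely: open the browser's developer tools, find the JSON endpoint the page calls, and fetch that endpoint directly. A minimal sketch using only the standard library follows; the endpoint URL and the "items"/"title" field names are hypothetical stand-ins for whatever the real site returns.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url):
    """Fetch a JSON endpoint using only the standard library."""
    # Some endpoints reject clients without a browser-like User-Agent.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# In practice you would copy the real URL from the network tab, e.g.:
#   data = fetch_json("https://example.com/api/items?page=1")   # hypothetical
# Here we parse an inline sample payload to show the extraction step.
sample = '{"items": [{"title": "First post"}, {"title": "Second post"}]}'
data = json.loads(sample)
titles = [item["title"] for item in data["items"]]
print(titles)
```

This only works when the data arrives as a separate HTTP response; if the values are computed client-side by scripts, you need a JavaScript-capable tool like the ones in the answers below.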
2 Answers
Have a look at Ghost.py (http://jeanphix.me/Ghost.py/). It drives a headless WebKit engine, so it doesn't require a full browser to be installed (it does depend on a Qt binding such as PyQt or PySide).
pip install Ghost.py
from ghost import Ghost

ghost = Ghost()
# Loads the page and executes its JavaScript before returning.
page, resources = ghost.open('http://stackoverflow.com/')
You didn't mention how the website uses JavaScript, but if it fires AJAX requests after some kind of user interaction, you will need something like Selenium to automate that behaviour. Here you can find a short tutorial on scraping with Scrapy + Selenium. This, of course, requires a browser to be installed on your machine beforehand.