8

Are there any programming libraries that will parse an HTML document, execute its JavaScript, and then let me navigate the resulting DOM? This needs to happen server side, not client side. Any language will do, but Java, PHP, or Ruby are preferred.

asked Jan 26, 2010 at 20:10

9 Answers

6
answered Jan 26, 2010 at 20:25

2 Comments

+1 Forgot about that one... On my Mac I'd just use Python's AppleScript capability to run the JS straight in Safari, though.
Links for updated community version: envjs.com and github.com/thatcher/env-js
2

In Java: http://lobobrowser.org/cobra/java-html-parser.jsp
This is a JavaScript-aware, CSS-aware HTML parser.
The most important feature in relation to your question: it is JavaScript-aware. DOM modifications that occur during parsing will be reflected in the resulting DOM.

answered Jan 26, 2010 at 20:32

1 Comment

The link is dead; a mirror, please.
2

Java supports JavaScript through Rhino. Also look at this page for server-side JavaScript solutions: http://en.wikipedia.org/wiki/Server-side_JavaScript

answered Jan 26, 2010 at 20:23

1

For Java, be sure to check out HtmlUnit and HttpUnit.

answered Jan 26, 2010 at 21:59

1

PhantomJS does this and can be used from any server-side language. Some integration modules for Node.js and PHP are listed below.

NodeJS

https://npmjs.org/package/node-phantom

https://github.com/sgentle/phantomjs-node

PHP

https://github.com/diggin/php-PhantomjsRunner

answered Mar 29, 2013 at 21:16

0

PHP has DOMDocument for navigating the DOM. I haven't heard of anything for executing JavaScript.
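
For a rough Java counterpart to DOMDocument, here is a minimal sketch using the JDK's built-in XML DOM parser. It assumes the markup is well-formed XHTML (this parser rejects typical tag-soup HTML), and, like DOMDocument, it does not execute any JavaScript. The class name DomNavigationSketch, the helper textOf, and the sample markup are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomNavigationSketch {

    // Hypothetical helper: parses well-formed XHTML and returns the trimmed
    // text content of every element with the given tag name, in document order.
    public static List<String> textOf(String xhtml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup (an assumption for this sketch). The script is kept
        // as plain text in the tree; it is never run.
        String page = "<html><body><ul><li>first</li><li>second</li></ul>"
                + "<script>document.title = 'ignored';</script></body></html>";

        System.out.println(textOf(page, "li")); // prints [first, second]
    }
}
```

Because the script element is only stored as text, any DOM changes the page would make at load time are missing from the tree, which is exactly the gap the question is about.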

answered Jan 26, 2010 at 20:20

0

Start from this post and follow the links, or just search for Rhino.

answered Jan 26, 2010 at 20:48

1 Comment

Oh... same link as Luca Matteis gave... Sorry!
0

There are now several projects that do a really good job of this:

  • PhantomJS is a headless version of WebKit, and there are some helpful wrappers such as CasperJS.

  • Zombie.js is a wrapper over jsdom, written in JavaScript for Node.js.

You need to write JavaScript code to interact with both of these projects. I like Zombie.js better so far, since it is easier to set up, and you can use any Node.js/npm modules in your code.

answered Feb 26, 2014 at 16:58

0

Node.js?

Node can run any JavaScript file from the command line. I would try Node first and see if it can do what you want, as it likely has the largest user base and the most documentation.

answered Sep 10, 2016 at 22:52
