I'm trying to extract, with Python, some JavaScript variables from an HTML page:
<script>
var nData = new Array();
var Data = "5b7b......";
nData = CallInit(Data);
...
...
</script>
I can see the content of "nData" in firebug (DOM Panel) without problem:
[Object { height="532", width="1280", url="https://example.org...8EDA4F3F5F395B9&key=lh1", more...}, Object { height="266", width="640", url="https://example.org...8EDA4F3F5F395B9&key=lh1", more...}]
The content of nData includes a URL. How can I parse/extract the content of nData into Python? Is it possible?
Thanks
- Can you give us a link to the site? – halex, Apr 17, 2015 at 9:23
- Do you have any influence on the source code in a JS context before moving it to Python? For example, open the webpage, insert a JS write statement, and save it as HTML. That way you can write the variable out as HTML first and then parse it with Python. – wenzul, Apr 17, 2015 at 9:24
- If not, you need some kind of JavaScript runtime environment. Maybe check out the answers to stackoverflow.com/questions/2346584/… and stackoverflow.com/questions/2894946/…. – wenzul, Apr 17, 2015 at 9:30
- @wenzul No, I'm only trying to extract the URL from the site and use it in a Python script. – Reat0ide, Apr 17, 2015 at 9:33
1 Answer
With the help of the Python library Ghost.py, it should be possible to get a dynamic variable out of executed JavaScript code.
I just tried it with a small test site and retrieved a JavaScript variable named a, which that page uses, as a Python object. I did the following:
Install Ghost.py with pip install Ghost.py.
Install PySide (it's a prerequisite for Ghost.py) with pip install PySide.
Use the following Python code:
from ghost import Ghost
ghost = Ghost()
ghost.open('https://dl.dropboxusercontent.com/u/13991899/test/index.html')
js_variable, _ = ghost.evaluate('a', expect_loading=True)
print(js_variable)
You should be able to get your variable nData into the Python variable js_variable by opening your site with ghost.open and then calling ghost.evaluate('nData').
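Once nData is available in Python, pulling the URLs out of it might look like the sketch below. It assumes the evaluated value arrives as a list of dictionaries with the height, width and url keys shown in the Firebug output in the question. The sample data and the helper name largest_url are made up for illustration and are not part of Ghost.py.

```python
def largest_url(entries):
    """Return the url of the widest entry, or None if there are no entries."""
    if not entries:
        return None
    # Firebug shows width and height as strings, so convert before comparing.
    best = max(entries, key=lambda entry: int(entry["width"]))
    return best["url"]


# Hypothetical stand-in for the value returned by ghost.evaluate('nData').
sample = [
    {"height": "532", "width": "1280", "url": "https://example.org/large"},
    {"height": "266", "width": "640", "url": "https://example.org/small"},
]

# Collect every URL, then pick the one belonging to the widest entry.
urls = [entry["url"] for entry in sample]
print(urls)
print(largest_url(sample))
```

If the real objects carry additional keys (Firebug shows "more..."), the same pattern applies; only the keys used here need to be present.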
3 Comments
CallInit and build your urls with python with Data as a parameter.
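A rough sketch of that idea, without any JavaScript runtime: read the Data string literal straight out of the page source with a regular expression, then decode it in Python. What CallInit actually does is not shown in the question, so the decoding step here is only a guess. The visible prefix 5b7b happens to be the hex encoding of the characters [{, which hints at (but does not prove) hex-encoded JSON. The function names extract_data and decode_hex_json, and the sample page, are hypothetical.

```python
import binascii
import json
import re

# Matches: var Data = "...";  (assumes a double-quoted literal without escapes)
DATA_RE = re.compile(r'var\s+Data\s*=\s*"([^"]*)"')


def extract_data(html):
    """Return the string assigned to Data in the page source, or None."""
    match = DATA_RE.search(html)
    return match.group(1) if match else None


def decode_hex_json(data):
    """ASSUMPTION: Data is hex-encoded UTF-8 JSON. The real CallInit may differ."""
    raw = binascii.unhexlify(data)
    return json.loads(raw.decode("utf-8"))


# Hypothetical page built for this example; a real page would be downloaded first.
payload = [{"height": "532", "width": "1280", "url": "https://example.org/a"}]
encoded = binascii.hexlify(json.dumps(payload).encode("utf-8")).decode("ascii")
html = (
    "<script>\n"
    "var nData = new Array();\n"
    'var Data = "%s";\n'
    "nData = CallInit(Data);\n"
    "</script>" % encoded
)

data = extract_data(html)
entries = decode_hex_json(data)
print(entries[0]["url"])
```

If the decoded bytes turn out not to be JSON on the real site, the CallInit source in the page's JavaScript would need to be read and ported by hand.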