My task is to get a list of the named functions on any web page using Python.
I have a script written in JavaScript that does what I need.
Once a page has loaded, I can run the script from the JS console (e.g. from dev tools in Google Chrome) and get an array of function names as the result. That works, but I have to open the page and run the script in the browser manually. The question is how to do the same from Python. It could look something like this:
def get_named_functions_list(url):
    myscript = settings.get_js_code()  # the JS script described above
    tool.open(url)
    while not tool.document.READY:  # wait until the page has completely loaded
        pass
    js_result = tool.execute_from_console(myscript)
    return list(js_result.values())
So, is there a tool in Python that helps to solve the problem automatically?
UPDATE: To be clearer, the task can be divided into these subtasks (in Python):
- Send a request to the given URL.
- Wait for document.ready(function...) to finish.
- Execute my JS code (as in a browser).
- Get the result that the JS code returns.
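The subtasks above could be sketched roughly as follows. This is only an illustration, assuming a Selenium-style driver object that provides get and execute_script; the helper names wait_for_document_ready and run_script_on_page are made up for this example. Selenium's get normally waits for the page's load event on its own, so the readyState poll is mostly a safeguard.

```python
import time


def wait_for_document_ready(driver, timeout=10.0, poll_interval=0.1):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    `driver` is assumed to be a Selenium WebDriver (anything with an
    execute_script method works). Returns True if the page became ready.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def run_script_on_page(driver, url, script, timeout=10.0):
    """Open url, wait for the document to load, run script, return its result."""
    driver.get(url)
    if not wait_for_document_ready(driver, timeout=timeout):
        raise TimeoutError(f"{url} did not finish loading in {timeout} s")
    return driver.execute_script(script)
```

The driver is passed in rather than created inside the helpers, so the same code works with whichever browser driver is available.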
- Your question is unclear. Maybe you need scraping. – binu.py, May 16, 2017 at 19:39
- For this task, you'll probably need to use an HTML parser and a JavaScript parser. – Anderson Green, May 16, 2017 at 19:53
- @binu.py, I've updated the question to make it clearer; maybe that helps. As for scraping, I don't need to get data from the page. The key task is to execute JS in the browser scope. I think it should work like a simple non-GUI Python browser or something similar. – Bogdan, May 16, 2017 at 19:57
- If you want to do it in the backend, I don't think it is possible. If you want to execute some function on load, you may need to check the templating language you are using. If this is for testing, you will need Python Selenium to do your task. – binu.py, May 16, 2017 at 20:06
- @binu.py, for example, I have google.com and facebook to check. I want to find out which named JS functions each domain uses. So I run my script, it makes requests to the URLs above and gives me two lists of strings. Each list consists of the names of the functions available in the JS scope. – Bogdan, May 16, 2017 at 20:30
2 Answers
I solved the problem using Selenium.
I downloaded the PhantomJS driver, which lets Selenium run without a browser window, and added it to PATH.
Finally, I use the following Python script:
from selenium import webdriver
myscript = settings.get_js_code() # here I get content of *.js file
driver = webdriver.PhantomJS()
driver.get(url)
result = driver.execute_script(myscript)
driver.quit()
Note: your script has to return a value, otherwise there is no result to get.
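To illustrate the note about returning a value: Selenium runs the script text as the body of an anonymous function, so a top-level return is what passes data back to Python. The JavaScript below is a hypothetical stand-in, not the asker's actual script; it collects the names of function-valued properties on window. The names LIST_FUNCTIONS_JS and get_named_functions are assumptions made for this sketch.

```python
# Hypothetical stand-in for the asker's script: collect the names of
# function-valued properties on window. The top-level `return` is what
# hands the array back to Python.
LIST_FUNCTIONS_JS = """
return Object.getOwnPropertyNames(window).filter(function (name) {
    try {
        return typeof window[name] === 'function';
    } catch (err) {
        return false;  // some properties may throw on access
    }
});
"""


def get_named_functions(driver):
    """Run LIST_FUNCTIONS_JS in the loaded page and return the sorted names.

    `driver` is assumed to be a Selenium WebDriver that has already opened
    the page of interest.
    """
    names = driver.execute_script(LIST_FUNCTIONS_JS)
    return sorted(names or [])
```

Without the return statement, execute_script would give back None, which is why the helper falls back to an empty list.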
- with open(path) as file: content = file.read(), where content is a JavaScript function which returns something.
I've developed a function using the selenium library and the Google Chrome driver (the PhantomJS driver is obsolete in recent versions of Selenium).
Here is the code:
import time

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager


def execute_script_on_url(
    page_url,
    js_script,
    load_wait_time=1,
    script_wait_time=0
):
    service = Service(ChromeDriverManager().install())
    options = webdriver.ChromeOptions()
    # Prevent a browser window from popping up.
    options.add_argument('--headless')
    driver = webdriver.Chrome(service=service, options=options)
    try:
        # Open the web page and wait for it to load.
        driver.get(page_url)
        time.sleep(load_wait_time)
        # Execute the JS script and wait for it to finish.
        result = driver.execute_script(js_script)
        time.sleep(script_wait_time)
        return result
    finally:
        # Always close the browser, even if the script raised an error.
        driver.quit()
This function takes the page_url and the JavaScript to execute (js_script) as text. You can read a file into a string like this:
with open('./the-file-route.js', 'r') as f:
    script = f.read()
You can then call the function like this, for example:
result = execute_script_on_url('https://www.google.com/', script)
If the script returns something, it will be stored in the result variable. For example, if your JavaScript code looks like this (a trivial example):
return [1, 2, 3, 4];
Then printing the result in Python gives:
print(result)
# Expected output: [1, 2, 3, 4]
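One caveat about the script_wait_time pause in the function above: it runs after execute_script has already returned, so it cannot change the result. If the JavaScript finishes its work asynchronously (timers, network requests), Selenium's execute_async_script may be a better fit. The sketch below assumes a Selenium WebDriver is passed in; the wrapper name execute_async_on_page and the `done` variable are inventions for this example.

```python
def execute_async_on_page(driver, js_body, timeout=10):
    """Run js_body and wait until it calls the `done` callback.

    Selenium passes a completion callback as the last entry of `arguments`;
    this wrapper exposes it to js_body under the name `done`.
    """
    # Upper bound (in seconds) on how long Selenium waits for `done`.
    driver.set_script_timeout(timeout)
    wrapped = "var done = arguments[arguments.length - 1];\n" + js_body
    return driver.execute_async_script(wrapped)


# Example script: reports its result after a 500 ms timer fires.
DELAYED_JS = "setTimeout(function () { done([1, 2, 3, 4]); }, 500);"
```

Calling execute_async_on_page(driver, DELAYED_JS) on a loaded page should then block until the timer fires and hand back whatever was passed to done.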