I am scraping a pool of URLs with a Python/Selenium framework using Firefox and Geckodriver.
For each URL, all I do is find one element by class name and save it to disk. To speed this up as much as possible, I create a custom Firefox profile with the preferences set via firefox_profile.set_preference below.
The challenge is that testing every combination of Firefox profile preferences to find out which ones are best for speed is laborious. Therefore, I am hoping to get some input.
Are there any preferences beyond the ones listed below that would also speed up Firefox?
Are any of the preferences listed below unsuitable for speeding up Firefox?
    def create_driver():
        firefox_profile = webdriver.FirefoxProfile()
        firefox_profile.set_preference('browser.download.animateNotifications', False)
        firefox_profile.set_preference('browser.fullscreen.animate', False)
        firefox_profile.set_preference('browser.preferences.animateFadeIn', False)
        firefox_profile.set_preference('browser.tabs.animate', False)
        firefox_profile.set_preference('browser.cache.use_new_backend', 1)
        firefox_profile.set_preference('browser.sessionhistory.max_total_viewers', 0)
        firefox_profile.set_preference('browser.safebrowsing.enabled', False)
        firefox_profile.set_preference('browser.shell.checkDefaultBrowser', False)
        firefox_profile.set_preference('browser.startup.page', 0)
        firefox_profile.set_preference('layout.animated-image-layers.enabled', False)
        firefox_profile.set_preference('extensions.checkCompatibility', False)
        firefox_profile.set_preference('extensions.checkUpdateSecurity', False)
        firefox_profile.set_preference('extensions.logging.enabled', False)
        firefox_profile.set_preference('extensions.update.autoUpdateEnabled', False)
        firefox_profile.set_preference('extensions.update.enabled', False)
        firefox_profile.set_preference('print.postscript.enabled', False)
        firefox_profile.set_preference('toolkit.storage.synchronous', 0)
        firefox_profile.set_preference('image.animation_mode', 'none')
        firefox_profile.set_preference('images.dither', False)
        firefox_profile.set_preference('content.notify.interval', 1000000)
        firefox_profile.set_preference('content.switch.treshold', 100000)
        firefox_profile.set_preference('nglayout.initialpaint.delay', 1000000)
        firefox_profile.set_preference('network.dnscacheentries', 200)
        firefox_profile.set_preference('network.dnscacheexpiration', 600)
        firefox_profile.set_preference('network.prefetch-next', False)
        firefox_profile.set_preference('permissions.default.image', False)
        firefox_profile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', False)
        firefox_profile.set_preference('dom.ipc.plugins.flash.disable-protected-mode', False)
        firefox_profile.set_preference('app.update.enabled', False)
        firefox_profile.set_preference('app.update.service.enabled', False)
        firefox_profile.set_preference('app.update.auto', False)
        firefox_profile.set_preference('app.update.staging', False)
        firefox_profile.set_preference('app.update.silent', False)
        firefox_profile.set_preference("javascript.enabled", False)
        driver = webdriver.Firefox(firefox_profile = firefox_profile)
        return driver
1 Answer
Regardless of whether these settings are correct, or whether you could get by without a web browser, this is not a very good way to set all these values. At the very least, extract the key-value pairs into a dictionary, which you would ideally read from a config file.
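    from selenium import webdriver

    # Preference names mapped to their values; ideally loaded from a config file.
    SETTINGS = {
        'browser.download.animateNotifications': False,
        # ...
    }

    def create_driver():
        firefox_profile = webdriver.FirefoxProfile()
        # Apply every preference in one loop instead of one call per line.
        for key, value in SETTINGS.items():
            firefox_profile.set_preference(key, value)
        driver = webdriver.Firefox(firefox_profile=firefox_profile)
        return driver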
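To illustrate the "read from a config file" part, here is a minimal sketch that assumes the preferences live in a flat JSON file. The file name firefox_prefs.json and the helper names load_preferences and apply_preferences are hypothetical and do not come from the original post. JSON is used here only because it keeps booleans, integers and strings distinct; any other config format would work as well.

```python
import json
from pathlib import Path


def load_preferences(path):
    """Read a flat JSON object mapping Firefox preference names to values."""
    with Path(path).open(encoding="utf-8") as handle:
        preferences = json.load(handle)
    if not isinstance(preferences, dict):
        raise ValueError("preference file must contain a JSON object")
    return preferences


def apply_preferences(profile, preferences):
    """Copy every key/value pair onto an object exposing set_preference()."""
    for key, value in preferences.items():
        profile.set_preference(key, value)
    return profile


def create_driver(config_path="firefox_prefs.json"):  # hypothetical file name
    # Imported lazily so the two helpers above can be used and tested
    # without Selenium being installed.
    from selenium import webdriver

    profile = apply_preferences(webdriver.FirefoxProfile(),
                                load_preferences(config_path))
    # Same keyword as in the code above; newer Selenium releases may expect
    # the profile to be passed through an Options object instead.
    return webdriver.Firefox(firefox_profile=profile)
```

With this layout, trying a different set of preferences means editing the JSON file rather than the Python source.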
- true - apologies - just edited the type and thanks for the improved syntax – sudonym, Dec 14, 2017 at 4:14
- I just changed the syntax accordingly and can't confirm that this has any impact on scraping performance speed-wise – sudonym, Dec 14, 2017 at 9:26
- @sudonym And it should not. It was a comment on coding style, not a speed improvement. – Graipher, Dec 14, 2017 at 9:29
requests and specifying a User-Agent header (pretending to be a browser)? Thanks.