Can anyone point me to a good Python screen-scraping library that can handle JavaScript (ideally one with good documentation and tutorials)? I'd like to know what options exist, and above all which one is easiest to learn and gets results fastest. Does anyone have experience with this? I've heard a bit about SpiderMonkey, but perhaps there are better options.

Specifically, I use BeautifulSoup and Mechanize to reach the link below, but I need a way to open the JavaScript popup, submit data, and download and parse the results shown in the popup.

<a href="javascript:openFindItem(12510109)" onclick="s_objectID=&quot;javascript:openFindItem(12510109)_1&quot;;return this.s_oc?this.s_oc(e):true">Find Item</a>
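For reference, the function name and item ID can be pulled out of an href like this one with a plain regular expression. This is only a minimal Python 3 sketch: the pattern assumes every link has the exact `javascript:name(digits)` shape shown above, and `parse_js_href` is a made-up helper name, not part of any library.

```python
import re

# Matches hrefs of the form "javascript:funcName(123)". The pattern is an
# assumption based on the single link shown above.
JS_CALL = re.compile(r"^javascript:(\w+)\((\d+)\)$")


def parse_js_href(href):
    """Return (function_name, numeric_id), or None if href does not match."""
    match = JS_CALL.match(href.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_js_href("javascript:openFindItem(12510109)"))
```

This only identifies which item the link refers to; it does not run the JavaScript or open the popup.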

I'd like to implement this with Google App Engine and Django. Thanks!

asked May 28, 2010 at 2:38

3 Answers

What I usually do is automate an actual browser in these cases, and grab the processed HTML from there.

Edit:

Here's an example of automating Internet Explorer to navigate to a URL and print the page title and location once the page has loaded.

from ctypes import Structure, pointer, windll, wintypes

import win32con
from win32com.client import Dispatch

READYSTATE_COMPLETE = 4  # IE has finished loading the document


class MSG(Structure):
    # Win32 MSG layout; the wintypes aliases keep the field sizes correct
    # on both 32-bit and 64-bit Python.
    _fields_ = [('hwnd', wintypes.HWND),
                ('message', wintypes.UINT),
                ('wParam', wintypes.WPARAM),
                ('lParam', wintypes.LPARAM),
                ('time', wintypes.DWORD),
                ('pt', wintypes.POINT)]


def wait_until_ready(ie):
    """Pump this thread's window messages until IE reports the page is loaded."""
    pMsg = pointer(MSG())
    while True:
        # None is passed as a NULL hwnd: take messages for any window of this thread.
        while windll.user32.PeekMessageW(pMsg, None, 0, 0, win32con.PM_REMOVE) != 0:
            windll.user32.TranslateMessage(pMsg)
            windll.user32.DispatchMessageW(pMsg)
        if ie.ReadyState == READYSTATE_COMPLETE:
            break


ie = Dispatch("InternetExplorer.Application")
ie.Visible = True
ie.Navigate("http://google.com/")
wait_until_ready(ie)
print("title:", ie.Document.Title)
print("location:", ie.Document.location)
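
Once the browser has finished loading, the processed HTML can be handed to any parser to find the JavaScript links that still need to be followed. Here is a minimal Python 3 sketch using only the standard library. It assumes the rendered markup is already available as a string (getting it out of the browser is not shown), and `JsLinkCollector` and `collect_js_links` are names made up for illustration.

```python
from html.parser import HTMLParser


class JsLinkCollector(HTMLParser):
    """Collects the href of every <a> tag whose href starts with 'javascript:'."""

    def __init__(self):
        super().__init__()
        self.js_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("javascript:"):
            self.js_hrefs.append(href)


def collect_js_links(html):
    """Return all javascript: hrefs found in the given HTML string."""
    collector = JsLinkCollector()
    collector.feed(html)
    return collector.js_hrefs


# Stand-in for the HTML grabbed from the automated browser.
sample = ('<a href="javascript:openFindItem(12510109)">Find Item</a>'
          '<a href="/about">About</a>')
print(collect_js_links(sample))
```

Each collected href still has to be triggered in the browser itself, since the parser only reads markup and never executes scripts.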
answered May 28, 2010 at 3:08

3 Comments

Is this similar to Selenium? I've tried automating that way, but I'm having trouble with the generated Python source code. I'd need to follow every JavaScript link of this type and download and parse the data from each one.
I just automate the browser directly. On Windows you can do this with Internet Explorer, or cross-platform with WebKit.
I think Selenium (seleniumhq.org) with Firefox is the way to go on Linux.
I use the Python bindings to WebKit to render basic JavaScript, and Chickenfoot for more advanced interactions. See this WebKit example for more info.

answered May 28, 2010 at 14:37

You can also use a "programmatic web browser" called Spynner. I found it to be the best solution, and it is relatively easy to use.

answered May 30, 2011 at 6:13
