I am trying to scrape data from this page (using Python 2.7):
http://financials.morningstar.com/valuation/earnings-estimates.html?t=AMD
When I right-click and choose "View Page Source" in Chrome, the content I am looking for is not there. For example, I am looking for "Average Rating".
I searched Stack Overflow and found this question and answer:
Python 3, Web-scraping, and Javascript [Oh My]
But when I tried the main answer, I could not find any XMLHttpRequest function.
I would appreciate any help with this.
- In the Firefox network inspector I see 3 AJAX requests (click "XHR" at the bottom). – Martin Tournoij, Feb 28, 2015 at 23:39
- Thanks, Carpetsmoker. I used Firefox and now I see a number of "GET" and "POST" requests. How can I use this information? – TJ1, Feb 28, 2015 at 23:46
- It is similar for the network inspector in Chrome. Click on Network, click the XHR filter, open the M* page, and you'll see 3 XHR items. Click on one of them in the left column (Name) and you'll see a URL; copy that and open it in your browser. – foosion, Feb 28, 2015 at 23:56
- Thanks, foosion. Strangely, I cannot see this in Chrome! – TJ1, Mar 1, 2015 at 0:04
1 Answer
It looks like the data you want is pulled from
http://financials.morningstar.com/valuation/annual-estimate-list.action?&t=XNAS:AMD&region=usa&culture=en-US&cur=&r=1425167484279.9668&_=1425167484280
http://financials.morningstar.com/valuation/analyst-opinion-list.action?&t=XNAS:AMD&region=usa&culture=en-US&cur=&r=1425167484282.3906&_=1425167484282
http://financials.morningstar.com/valuation/forward-comparisons-list.action?&t=XNAS:AMD&region=usa&culture=en-US&cur=&r=1425167484284.5396&_=1425167484284
You should be able to scrape these URLs directly.
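As a rough sketch of what scraping one of these URLs directly might look like: the code below rebuilds the query string and pulls the table cells out of the response using only the standard library. Several things here are assumptions rather than anything confirmed by the site: that the response is an HTML fragment containing a table, that the `r` parameter can be dropped, and that `_` is just a cache-busting timestamp. The names `build_url`, `table_rows`, and `fetch_rows` are made up for illustration.

```python
import time

try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    from urllib import urlencode
    from urllib2 import urlopen

BASE = "http://financials.morningstar.com/valuation/"


def build_url(action, ticker, exchange="XNAS"):
    """Build one of the XHR URLs, e.g. action='analyst-opinion-list'."""
    params = [
        ("t", "%s:%s" % (exchange, ticker)),
        ("region", "usa"),
        ("culture", "en-US"),
        ("cur", ""),
        # Assumption: "_" is only a cache-busting timestamp in milliseconds,
        # and the "r" parameter seen in the browser is not required.
        ("_", str(int(time.time() * 1000))),
    ]
    return BASE + action + ".action?" + urlencode(params)


class _CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # cell outside any <tr>
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_rows(html):
    """Return the table cells found in an HTML string as a list of rows."""
    parser = _CellCollector()
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows if row]


def fetch_rows(action, ticker):
    """Download one endpoint and return its table cells (needs network access)."""
    html = urlopen(build_url(action, ticker)).read().decode("utf-8", "replace")
    return table_rows(html)
```

Calling `fetch_rows("analyst-opinion-list", "AMD")` should then give a list of rows to search for "Average Rating", provided the endpoint still responds the way it did in the browser. If the response turns out not to be an HTML table, only `table_rows` would need to change.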
4 Comments
text/css (formatting), text/javascript (JavaScript, which could include dynamic data but usually does not), and image/gif (pictures). This leaves the main page (which we have already established does not hold the data you want), a favicon file with the wrong data type (it should be image/x-icon), the three files listed above, and an application/json file (data for the commodity quotes ticker at the top of the page).