I'm pretty new to web scraping and was wondering whether it's possible to extract the information I need from a JavaScript app. Currently, I'm using BeautifulSoup in Python and am interested in this output from the HTML parser:
<p><script>
var acct = '488'; var loc = ''; var cat = ''; var stylesheet=''; var hideLastnames = true;
var jsHost = (("https:" == document.location.protocol) ? "https://" : "http://");
document.write("<scr"+"ipt src='"+jsHost+"ajax.googleapis.com/ajax/libs/jquery/1.7/jquery.min.js' type='text/javascript'></scr"+"ipt>");
document.write("<scr"+"ipt>var jQuery = jQuery.noConflict(true);</scr"+"ipt>");
document.write("<scr"+"ipt src='"+jsHost+"www.groupexpro.com/schedule/embed/schedule_embed_responsive.js.php?a="+acct+"' type='text/javascript'></scr"+"ipt>");
</script></p>
On the actual website (https://recreation.gocrimson.com/fitness/schedules), this is rendered as a schedule table. Ideally, I would like to store a JSON file with all the information listed in the table. Has anyone had experience doing something similar?
- What is your code so far? – johnashu, Apr 21, 2020 at 20:45
- You will probably need Selenium if you want what the script actually loads. – johnashu, Apr 21, 2020 at 20:46
- The question is not clear at all. – αԋɱҽԃ αмєяιcαη, Apr 21, 2020 at 22:28
1 Answer
https://recreation.gocrimson.com/fitness/schedules requests a different URL to get the schedule data in JSONP format.
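URL: https://www.groupexpro.com/schedule/embed/json.php?schedule&instructor_id=true&format=jsonp&a=488&location=&category=&start=1587380400&end=1587898800
The a=488 parameter matches the acct value in the embed script from the question. Try to understand the remaining parameters and modify the URL for your purpose; start and end look like Unix timestamps that bound the date range, but that is a guess.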
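If it helps, the query string could be assembled programmatically instead of edited by hand. This is only a sketch: the function name build_schedule_url is made up for illustration, and the meaning of each parameter (in particular that start and end are Unix timestamps in seconds) is an assumption read off the URL above, not documented behaviour of the service.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL the page requests.
BASE_URL = "https://www.groupexpro.com/schedule/embed/json.php"


def build_schedule_url(account, start, end, location="", category=""):
    """Rebuild the schedule URL (hypothetical helper, not part of any library).

    Assumption: `start` and `end` are Unix timestamps in seconds.
    """
    params = [
        ("instructor_id", "true"),
        ("format", "jsonp"),
        ("a", account),
        ("location", location),
        ("category", category),
        ("start", start),
        ("end", end),
    ]
    # "schedule" is a flag without a value, so it is prepended by hand.
    return BASE_URL + "?schedule&" + urlencode(params)
```

Calling build_schedule_url("488", 1587380400, 1587898800) reproduces the URL shown above; other start and end values may or may not select a different date range.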
Example
import json
import requests

# Browser-like User-Agent header sent with the request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0"}
url = "https://www.groupexpro.com/schedule/embed/json.php?schedule&instructor_id=true&format=jsonp&a=488"
page = requests.get(url, headers=headers)
page.raise_for_status()

# Extract the JSON object from the JSONP wrapper.
# Slice from the first "{" to the last "}" so nested objects are not cut off.
text = page.text
jsondata = text[text.index("{"):text.rindex("}") + 1]

# Load the JSON text into a Python dict
data = json.loads(jsondata)
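Since the goal in the question is to store the schedule as JSON, the parsed data could then be written to disk. A minimal sketch, assuming the parsed result is ordinary JSON-serialisable data; save_json is a hypothetical helper name and the output path is up to you.

```python
import json


def save_json(data, path):
    """Write already-parsed data to `path` as pretty-printed UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII names readable in the file.
        json.dump(data, f, ensure_ascii=False, indent=2)
```

For example, save_json(data, "schedule.json") would write the dict from the example above to a file in the current directory.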