Pull values from javascript source in Python BeautifulSoup

Asked 5 years, 8 months ago

Viewed 89 times

I'm pretty new to web scraping and was wondering whether it's possible to extract the information I need from a JavaScript app. I'm currently using BeautifulSoup in Python and am interested in this output from the HTML parser:

<p><script>
var acct = '488'; var loc = ''; var cat = ''; var stylesheet=''; var hideLastnames = true;
var jsHost = (("https:" == document.location.protocol) ? "https://" : "http://");
document.write("<scr"+"ipt src='"+jsHost+"ajax.googleapis.com/ajax/libs/jquery/1.7/jquery.min.js' type='text/javascript'></scr"+"ipt>");
document.write("<scr"+"ipt>var jQuery = jQuery.noConflict(true);</scr"+"ipt>");
document.write("<scr"+"ipt src='"+jsHost+"www.groupexpro.com/schedule/embed/schedule_embed_responsive.js.php?a="+acct+"' type='text/javascript'></scr"+"ipt>");
</script></p>

On the actual website (https://recreation.gocrimson.com/fitness/schedules), this script renders a schedule table. Ideally, I would like to store a JSON file with all the information listed in the table. Has anyone had experience doing something similar?

edited Feb 13, 2021 at 16:22 by DisappointedByUnaccountableMod

asked Apr 21, 2020 at 20:37 by jessica.wu

What is your code so far?
– johnashu, commented Apr 21, 2020 at 20:45

You will probably need Selenium if you want what the script actually loads.
– johnashu, commented Apr 21, 2020 at 20:46

The question is not clear at all.
– αԋɱҽԃ αмєяιcαη, commented Apr 21, 2020 at 22:28


1 Answer

The page at https://recreation.gocrimson.com/fitness/schedules fetches its schedule data from a separate URL, which returns it in JSONP format.

URL: https://www.groupexpro.com/schedule/embed/json.php?schedule&instructor_id=true&format=jsonp&a=488&location=&category=&start=1587380400&end=1587898800

Work out what each query parameter does and modify the URL for your purpose.
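As a rough sketch of how the URL could be adapted: the start and end values look like Unix timestamps in seconds (here they are exactly six days apart), and a appears to be the account number from the embed script. Both readings are assumptions, not documented behaviour. The helper name build_schedule_url is hypothetical and only rebuilds the query string shown above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "https://www.groupexpro.com/schedule/embed/json.php"

def build_schedule_url(account, start, end, location="", category=""):
    """Build the schedule URL for a time window (hypothetical helper)."""
    params = {
        "instructor_id": "true",
        "format": "jsonp",
        "a": account,
        "location": location,
        "category": category,
        # Assumption: start/end are Unix timestamps in seconds
        "start": int(start.timestamp()),
        "end": int(end.timestamp()),
    }
    # "schedule" is a bare flag with no value, so it is added by hand
    return BASE_URL + "?schedule&" + urlencode(params)

# Rebuild the URL from the answer: same start, window of six days
start = datetime.fromtimestamp(1587380400, tz=timezone.utc)
url = build_schedule_url("488", start, start + timedelta(days=6))
print(url)
```

Changing start and end to another window should, if the assumption about timestamps holds, return the schedule for that period.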

Example

import json
import requests

# Browser-like User-Agent header sent with the request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0"}
url = "https://www.groupexpro.com/schedule/embed/json.php?schedule&instructor_id=true&format=jsonp&a=488"
page = requests.get(url, headers=headers)

# Extract the JSON from the JSONP wrapper: keep everything from the first '{'
# to the last '}' so that nested objects are not cut short
jsondata = page.text[page.text.index('{'):page.text.rindex('}') + 1]
# The JSON text can then be loaded into a Python dict
data = json.loads(jsondata)
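Since the question asks about storing the result as JSON, here is a minimal sketch of unwrapping a JSONP response and writing it to a file. It assumes the response has the usual callback(...) shape. The function names, the sample payload, and its field names are all hypothetical and are not taken from the real API.

```python
import json

def unwrap_jsonp(text):
    """Return the parsed JSON found between the first '(' and the last ')'."""
    start = text.index("(")
    end = text.rindex(")")
    return json.loads(text[start + 1:end])

def save_json(data, path):
    """Write the parsed data to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

# Hypothetical payload, only to show the wrapper being removed
sample = 'callback({"rows": [["Mon", "Yoga", {"room": "A"}]]});'
parsed = unwrap_jsonp(sample)
print(parsed["rows"][0][1])
```

With a real response, unwrap_jsonp(page.text) would take the place of the sample string, and save_json(parsed, "schedule.json") would write the result to disk.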

edited Apr 21, 2020 at 21:35

answered Apr 21, 2020 at 21:28 by inxp
