SUBMIT "ACCEPT" button - Python - BeautifulSoup

peter.neumaier at gmail.com peter.neumaier at gmail.com
Wed Jan 28 11:36:45 EST 2015


I am totally new to Python, so please accept my apologies up front for any newbie errors. I am trying to parse a 'simple' web page: http://flow.gassco.no/
When I open the page for the first time in my browser, I have to confirm the T&C by clicking an Accept button. After accepting the T&C, I would like to scrape some data from the follow-up page. Opening http://flow.gassco.no/acceptDisclaimer directly in a browser appears to get around the T&C.
But it does not when I open the same URL from my BeautifulSoup script.
My parsing/scraping tool is implemented with BeautifulSoup (bs4), but I cannot parse the content because I never get past the T&C. When I print the parsed "response.text", I get the HTML below. How do I get around this form for accepting the terms & conditions so that I can parse/scrape data from that page?
Here is what I am doing:
#!/usr/bin/env python
import requests
import bs4

index_url = 'http://flow.gassco.no/acceptDisclaimer'

def get_video_page_urls():
    # Fetch the page and parse the returned HTML.
    response = requests.get(index_url)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    return soup

print(get_video_page_urls())
++++
PRINTOUT from response.text:
 <form action="acceptDisclaimer" method="get">
 <input class="accept" type="submit" value="Accept"/>
 <input class="decline" name="decline" onclick="window.location ='http://www.gassco.no'" type="button" value="Decline"/>
 </form></div></div></div></div></div>
 <script type="text/javascript">
 var _gaq = _gaq || [];
 _gaq.push(['_setAccount', 'UA-30727768-1']);
 _gaq.push(['_trackPageview']);
 (function() {
 var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
 ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
 var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
 })();
</script>
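
One possible approach, sketched under an unverified assumption: the site may record the acceptance in a session cookie, which a bare requests.get() call does not carry from one request to the next. A requests.Session keeps cookies between requests, so the sketch below loads the start page, looks for the form that holds the Accept button in the printout above, and requests that form's action URL in the same session. The names BASE_URL, find_accept_url and fetch_after_accept are made up for illustration, and whether the server really relies on a cookie has not been checked.

```python
from urllib.parse import urljoin

import bs4
import requests

BASE_URL = "http://flow.gassco.no/"  # start page from the post


def find_accept_url(html, page_url):
    """Return the absolute action URL of the form holding the Accept button, or None."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    button = soup.find("input", class_="accept")
    if button is None:
        return None  # no disclaimer form on this page
    form = button.find_parent("form")
    if form is None:
        return None
    # The action in the printout is relative ("acceptDisclaimer"), so resolve it.
    return urljoin(page_url, form.get("action", ""))


def fetch_after_accept(session, page_url=BASE_URL):
    """Load page_url, follow the Accept form if one is present, return the parsed page."""
    response = session.get(page_url)
    response.raise_for_status()
    accept_url = find_accept_url(response.text, response.url)
    if accept_url is not None:
        # Same session, so any cookie set by the first response is sent back
        # (assumption: the site tracks acceptance with such a cookie).
        response = session.get(accept_url)
        response.raise_for_status()
    return bs4.BeautifulSoup(response.text, "html.parser")


# Usage (performs real HTTP requests):
#     with requests.Session() as session:
#         soup = fetch_after_accept(session)
#         print(soup.title)
```

If the page returned after the second request still contains the Accept form, the cookie assumption is probably wrong and the data may be loaded some other way (for example by JavaScript), which this sketch does not handle.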


More information about the Python-list mailing list
