I want to scrape bookmakers' quotes (odds) directly from their web pages. Currently I am trying to get the quotes from a provider called unibet.com. The problem: I need to send a POST request in order to filter the quotes down to the ones I want.

To do this, I go to https://www.unibet.com/betting/grid/all-football/germany/bundesliga/1000094994.odds#, where there are several checkboxes in the upper part of the bets section. I uncheck every box except "Match", click the Update button, and record the resulting POST request with Chrome. The following screenshot shows what is being sent:

[Screenshot: the POST request as recorded in Chrome]

After that, I get a filtered result that contains only the quotes for the "Match" bet type.

I want to retrieve exactly these quotes from a script, so I wrote the following Python code:

 import urllib
 import urllib2

 req = urllib2.Request('https://www.unibet.com/betting/grid/grid.do?eventGroupIds=1000094994')
 req.add_header("Content-type", "application/x-www-form-urlencoded")
 # Form fields copied from the POST request recorded in Chrome
 post_data = [('format', 'iframe'),
              ('filtered', 'true'),
              ('gridSelectedTab', '1'),
              ('_betOfferCategoryTab.filterOptions[1_604139].checked', 'true'),
              ('betOfferCategoryTab.filterOptions[1_604139].checked', 'on'),
              ('_betOfferCategoryTab.filterOptions[1_611318].checked', 'false'),
              ('_betOfferCategoryTab.filterOptions[1_611319].checked', 'false'),
              ('_betOfferCategoryTab.filterOptions[1_611321].checked', 'false'),
              ('_betOfferCategoryTab.filterOptions[1_604144].checked', 'false'),
              ('_betOfferCategoryTab.filterOptions[1_624677].checked', 'false'),
              ('_betOfferCategoryTab.filterOptions[1_604142].checked', 'false'),
              ('_betOfferCategoryTab.filterOptions[1_604145].checked', 'false'),
              ('_betOfferCategoryTab.filterOptions[1_611322].checked', 'false'),
              ('_betOfferCategoryTab.filterOptions[1_604148].checked', 'false'),
              ('gridSelectedTimeframe', '')]
 post_data = urllib.urlencode(post_data)
 req.add_header('Content-Length', len(post_data))
 # Passing data as the second argument makes urlopen send a POST
 resp = urllib2.urlopen(req, post_data)
 html = resp.read()

The problem: instead of a filtered result, I get the full list of all quotes and bet types, as if all checkboxes had been checked. Why does my Python request return the unfiltered data?

asked Sep 16, 2012 at 14:06

1 Answer

The site stores your filter preferences in a session cookie. Because your script neither captures that cookie nor sends it back, the site falls back to its default, unfiltered results when you submit the update.

Try this:

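import cookielib
import urllib2

# The cookie jar keeps every cookie the server sets during this session
cookiejar = cookielib.CookieJar()
opener = urllib2.build_opener(
 urllib2.HTTPRedirectHandler(),
 urllib2.HTTPHandler(debuglevel=0),
 urllib2.HTTPSHandler(debuglevel=0),
 urllib2.HTTPCookieProcessor(cookiejar),  # stores cookies and resends them
)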

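Now, instead of calling urllib2.urlopen(req, post_data), call opener.open(req, post_data) with the same arguments. The opener stores any cookies the server sets in cookiejar and sends them back on every later request made through it.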
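
For reference, here is a minimal sketch of the same cookie-aware approach in Python 3, where urllib2 and cookielib were renamed to urllib.request and http.cookiejar. The helper names make_session_opener, encode_form and fetch_filtered are made up for illustration. The initial GET before the POST is an assumption about how the site hands out its session cookie, and it may turn out to be unnecessary.

```python
import http.cookiejar
import urllib.parse
import urllib.request

GRID_URL = "https://www.unibet.com/betting/grid/grid.do?eventGroupIds=1000094994"


def make_session_opener():
    """Return (opener, cookiejar); the opener stores and resends cookies."""
    cookiejar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookiejar)
    )
    return opener, cookiejar


def encode_form(fields):
    """URL-encode a list of (name, value) pairs into a POST body (bytes)."""
    return urllib.parse.urlencode(fields).encode("ascii")


def fetch_filtered(opener, url, fields):
    """Hypothetical helper: GET once, then POST the filter form via the same opener."""
    # Assumption: the first request lets the server set its session cookie.
    opener.open(url).read()
    # Supplying a body makes this a POST; cookies from the jar are sent along.
    response = opener.open(url, encode_form(fields))
    return response.read()
```

To use it, create the opener once with make_session_opener() and pass it, GRID_URL and the post_data list from the question to fetch_filtered(). Nothing here contacts the network until fetch_filtered() is called.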

answered Nov 1, 2012 at 16:42