edited Jun 10, 2020 at 13:24

asked Jul 14, 2017 at 13:07

Luke

Serializing output of a match result web scraper

This is a follow-up to a previous question.

As part of learning the object-oriented approach and web scraping in Python, I've set out to write a program that gives me the match results of professional Counter-Strike games, in the order they appear on hltv.org. At first I just wanted a simple script that would download the website and print out the results, but I decided I didn't have to stop there.

The program goes through the source code to find today's match results. Then, pieces of information like the winning team and their score are pulled out of that source code so they can be represented in certain ways.
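The "today's results" check boils down to a date comparison. Here is a minimal sketch of that step, assuming each result carries a Unix timestamp in milliseconds (as the `data-zonedgrouping-entry-unix` attribute in the code below appears to). The helper name `played_today` and the `DATE_FORMAT` constant are made up for illustration.

```python
from time import localtime, strftime, time

DATE_FORMAT = '%d %B %Y'


def played_today(unix_ms, today=None):
    """Return True if a millisecond Unix timestamp falls on `today`.

    `today` is a string in DATE_FORMAT and defaults to the current local date.
    """
    if today is None:
        today = strftime(DATE_FORMAT)
    # The timestamp is in milliseconds; localtime() expects seconds.
    return strftime(DATE_FORMAT, localtime(unix_ms / 1000)) == today


now_ms = int(time() * 1000)
print(played_today(now_ms))
print(played_today(0))  # the Unix epoch, long before today
```

Comparing formatted date strings is simple but depends on the local time zone of the machine running the script.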

I'd greatly appreciate feedback, so I know what's good and what isn't about this code. If there are any improvements that could be implemented, I'd be eager to learn about them.

# Changes

First I'd like to thank user alecxe for his useful feedback and directing me on the right track.

### Class

The most obvious change from the code in the previous question is that I have moved everything into a class, so that I can eventually turn this into a handy module.

### `get_results`

The for loop, which pulls the pieces of information from the source code, is now part of a `get_results` method.

In the first versions, I completely omitted the possibility of a match ending in a tie. This can only happen if the match has a best-of-two format. The format is rather uncommon and is usually adopted in the group stages of smaller tournaments.

It came to me when I ran the code and got an unexpected AttributeError. It took me a while to realise the code wasn't suddenly broken; the tags in the source code simply change in a tie, from `team team-won` and `team` to `team` and `team`. As the for loop was looking for `team team-won` specifically, the search would return None, and calling `get_text()` on it would raise the error.

I'm not really comfortable with catching that particular error, but for now it works the way I want it to. If anyone knows a better way, I'd appreciate some feedback on it.
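To illustrate the failure described above, and one alternative I could try instead of catching AttributeError, here is a minimal sketch. The HTML snippets are hypothetical and heavily simplified (the real page has more nesting), and the helper `is_tie` is a name made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one decided match and one tie.
WIN_HTML = ('<div class="result-con">'
            '<div class="team team-won">Team A</div>'
            '<div class="team">Team B</div></div>')
TIE_HTML = ('<div class="result-con">'
            '<div class="team">Team C</div>'
            '<div class="team">Team D</div></div>')


def is_tie(result):
    """Return True when no tag in `result` carries the team-won class."""
    return result.select_one('.team.team-won') is None


win = BeautifulSoup(WIN_HTML, 'html.parser')
tie = BeautifulSoup(TIE_HTML, 'html.parser')

# In a tie, select_one() finds nothing and returns None,
# so calling get_text() on the result raises AttributeError.
try:
    tie.select_one('.team.team-won').get_text()
except AttributeError as error:
    print('Lookup failed:', error)

# Checking for None explicitly avoids relying on the exception.
print(is_tie(win))
print(is_tie(tie))
```

An explicit check like this would make the tie branch visible in the code, rather than hiding it behind an exception that could also be raised by some other missing tag.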

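### OrderedDict & Serializing

To keep the order of the games, I've used an OrderedDict, as regular dictionaries don't preserve key order. The match_results OrderedDict is then dumped into JSON text. The data can be easily represented, as seen in the print_results method.

I'm not sure this is the most efficient way, but I know it works fine for this purpose. I haven't done much with JSON text before.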
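The ordering and serializing step can be sketched on its own as a round trip through JSON. The timestamps, team names and scores here are made-up sample data, not real results.

```python
import json
from collections import OrderedDict

# Made-up sample data, keyed by timestamp in the order the matches appear.
match_results = OrderedDict()
match_results['1500037200000'] = {
    'winning_team': 'Team A',
    'winning_team_score': '16',
    'losing_team': 'Team B',
    'losing_team_score': '9',
}
match_results['1500030000000'] = {
    'winning_team': 'Team C',
    'winning_team_score': '1',
    'losing_team': 'Team D',
    'losing_team_score': '1',
}

# Dump to JSON text; json.dumps writes keys in the mapping's iteration order.
serialized = json.dumps(match_results, indent=4, separators=(',', ':'))

# object_pairs_hook rebuilds every JSON object as an OrderedDict,
# so the key order found in the text is kept when loading.
restored = json.loads(serialized, object_pairs_hook=OrderedDict)

print(list(restored) == list(match_results))
```

Note that `object_pairs_hook` applies to the nested objects as well, so each match comes back as an OrderedDict too.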

# Code

    #!/usr/bin/env python3
    import json
    from collections import OrderedDict
    from time import localtime, strftime

    import requests
    from bs4 import BeautifulSoup


    class ResultScraper:
        MAPS = {
            'mrg': 'Mirage',
            'trn': 'Train',
            'ovp': 'Overpass',
            'inf': 'Inferno',
            'cch': 'Cache',
            'cbl': 'Cobblestone',
            'nuke': 'Nuke',
            'bo2': 'Best-of-two',
            'bo3': 'Best-of-three',
            'bo5': 'Best-of-five',
            '-': 'Default win'
        }

        def __init__(self, stars=0):
            self.url = 'https://www.hltv.org/results'
            self.date = strftime('%d %B %Y')
            if isinstance(stars, int) and 1 <= stars <= 5:
                self.stars = stars
                self.url += '?stars={}'.format(self.stars)

        def scrape(self):
            source = requests.get(self.url).text
            return BeautifulSoup(source, 'lxml')

        def check_match_dates(self, tag):
            result_tag = tag.name == 'div' and 'result-con' in tag.get('class', [])
            if not result_tag:
                return False
            timestamp = int(tag['data-zonedgrouping-entry-unix']) / 1000
            return strftime('%d %B %Y', localtime(timestamp)) == self.date

        def get_results(self):
            match_results = OrderedDict()
            soup = self.scrape()
            for result in soup(self.check_match_dates):
                timestamp = result['data-zonedgrouping-entry-unix']
                event = result.select_one('.event-name').get_text()
                map_played = result.select_one('.map-text').get_text()
                try:
                    winning_team = result.select_one('.team.team-won').get_text()
                    winning_team_score = result.select_one('.score-won').get_text()
                    losing_team = result.select_one('.team.').get_text()
                    losing_team_score = result.select_one('.score-lost').get_text()
                except AttributeError:
                    winning_team = result.select_one('.team1').get_text(strip=True)
                    losing_team = result.select_one('.team2').get_text(strip=True)
                    winning_team_score = result.select_one('.score-tie').get_text()
                    losing_team_score = winning_team_score
                match_results[timestamp] = {
                    'winning_team': winning_team,
                    'winning_team_score': winning_team_score,
                    'losing_team': losing_team,
                    'losing_team_score': losing_team_score,
                    'event': event,
                    'map': self.MAPS[map_played]
                }
            return json.dumps(match_results, indent=4, separators=(',', ':'))

        def print_results(self):
            results = json.loads(self.get_results(), object_pairs_hook=OrderedDict)
            if not results:
                print('No match results for {}'.format(self.date))
            else:
                for match in results.values():
                    print('{winning_team:>20} {winning_team_score:<2} - '
                          '{losing_team_score:>2} {losing_team:<20}'
                          ' {map:<13}'.format(**match))
                print('\nCS:GO match results for {}'.format(self.date))
                print('Powered by HLTV.org')


    if __name__ == '__main__':
        rs = ResultScraper()
        rs.print_results()
