4
\$\begingroup\$

I'm pretty sure the script below can be refactored, but I'm not sure how to go about it. I assume I need to make use of inheritance, but that's a skill I'm still working on. Any thoughts on how this script can be improved would be greatly appreciated. This is just a snippet of my project; the end goal is to retrieve data from multiple external APIs, process/clean the data and export individual datasets to tables via mysqlalchemy.

import pandas as pd
import download

def parse_to_list(dic,k):
    """
    Function parse_to_list takes a dictionary and nested key to return
    portion of dictionary.
    """
    lst = []
    for x in dic['events']:
        if x['type'] == 'MATCH':
            for y in x['markets']:
                if k in y['displayName']:
                    try:
                        lst.append(y)
                    except:
                        pass
    return lst

def gamelist_parse(dic):
    event_id = dic['eventId']
    away_team = dic['selections'][0]['teamData']['teamAbbreviation']
    home_team = dic['selections'][1]['teamData']['teamAbbreviation']
    game_ref = f'{away_team}@{home_team}'
    return [(event_id,game_ref)]

def runline_parse(dic):
    event_id = dic['eventId']
    away_team = dic['selections'][0]['teamData']['teamAbbreviation']
    away_price = dic['selections'][0]['price']['a']
    home_team = dic['selections'][1]['teamData']['teamAbbreviation']
    home_price = dic['selections'][1]['price']['a']
    away_line = dic['line'] if away_price > home_price else dic['line'] * -1
    home_line = dic['line'] if away_price < home_price else dic['line'] * -1
    return [(away_team,event_id,away_price,away_line),(home_team,event_id,home_price,home_line)]

class Gamelist:
    display_name = 'Money Line'
    df_cols = ['event_id','game_ref']
    df_idx = 'event_id'
    parser = gamelist_parse

class Runline:
    display_name = 'Run Line'
    df_cols = ['team','event_id','price','line']
    df_idx = 'team'
    parser = runline_parse

class Parser:
    def __init__(self,dic,clsdata):
        self._dic = dic
        self._clsdata = clsdata
        self.to_list()
        self.parse()
        self.df = self.to_df()

    def to_list(self):
        self.data = parse_to_list(self._dic,self._clsdata.display_name)

    def parse(self):
        self.data = [self._clsdata.parser(x) for x in self.data]
        self.data = [item for sublist in self.data for item in sublist]

    def to_df(self):
        return pd.DataFrame(self.data,columns=self._clsdata.df_cols).set_index(self._clsdata.df_idx)
 
#EXECUTION
games_dic = download.GamesDict('Baseball','MLB').main() #returns dictionary of data
gamelist = Parser(games_dic,Gamelist).df
runline = Parser(games_dic,Runline).df
print(gamelist)
asked Aug 3, 2022 at 2:12
\$\endgroup\$
1

1 Answer 1

4
\$\begingroup\$

Don't name variables x, y and k where more descriptive names are possible.

This exception-handling:

try:
    lst.append(y)
except:
    pass

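first, shouldn't exist, since a bare except followed by pass silently swallows every error; but it also will never actually catch any exceptions, because there's nothing that can fail in a list append here.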
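
As a minimal sketch of the two points above, the filtering step could be written with descriptive names and no exception handling. The function name filter_match_markets and the sample dictionary are made up for illustration; only the keys ('events', 'type', 'markets', 'displayName') come from the question.

```python
from typing import Any


def filter_match_markets(
    games: dict[str, Any], display_name: str,
) -> list[dict[str, Any]]:
    """Return markets of MATCH events whose displayName contains display_name."""
    return [
        market
        for event in games['events']
        if event['type'] == 'MATCH'
        for market in event['markets']
        if display_name in market['displayName']
    ]


# Hypothetical sample data, shaped like the dictionary in the question
sample = {
    'events': [
        {'type': 'MATCH', 'markets': [
            {'displayName': 'Money Line', 'eventId': 'a'},
            {'displayName': 'Run Line', 'eventId': 'a'},
        ]},
        {'type': 'FUTURE', 'markets': [
            {'displayName': 'Money Line', 'eventId': 'b'},
        ]},
    ],
}
money_lines = filter_match_markets(sample, 'Money Line')
```

Only the Money Line market of the MATCH event survives the filter; the market under the non-MATCH event is skipped.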

Add PEP 484 type hints.

This code:

away_team = dic['selections'][0]['teamData']['teamAbbreviation']
home_team = dic['selections'][1]['teamData']['teamAbbreviation']

doesn't seem like it's going to succeed, because there are many rows for which there are no teamData dictionaries.

This:

away_line = dic['line'] if away_price > home_price else dic['line'] * -1
home_line = dic['line'] if away_price < home_price else dic['line'] * -1

can just negate the line (-line) rather than multiplying it by -1. Also, does this really do what you want in the case where the home and away prices are equal to each other? Can you assume that the home line is always the negative of the away line?
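
To make the equal-price question concrete, here is a small sketch comparing the logic from the question with a version that derives the home line by negating the away line. Both function names are hypothetical, and the second version rests on the assumption (not confirmed by the data) that the two lines always mirror each other.

```python
def lines_from_question(
    away_price: int, home_price: int, line: float,
) -> tuple[float, float]:
    """The two independent comparisons used in the question."""
    away_line = line if away_price > home_price else line * -1
    home_line = line if away_price < home_price else line * -1
    return away_line, home_line


def lines_negated(
    away_price: int, home_price: int, line: float,
) -> tuple[float, float]:
    """One comparison; ASSUMES the home line mirrors the away line."""
    away_line = line if away_price > home_price else -line
    return away_line, -away_line


# Unequal prices: both versions agree
unequal_original = lines_from_question(-130, 110, 1.5)
unequal_negated = lines_negated(-130, 110, 1.5)

# Equal prices: the original gives both teams the same sign
equal_original = lines_from_question(-110, -110, 1.5)  # (-1.5, -1.5)
equal_negated = lines_negated(-110, -110, 1.5)         # (-1.5, 1.5)
```

With equal prices neither comparison in the original is true, so both teams end up with -line. Whether the mirrored result is the right one still depends on what the API's line field actually means.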

You may benefit from making Parser an abstract parent of Gamelist and Runline, and varying the record generation through basic polymorphism.

You've omitted a section of your data processing, where you need to iterate over sports etc.

Suggested

from typing import Any, Iterator, Iterable

import pandas as pd
import requests

def get_data() -> list[dict[str, Any]]:
    with requests.get(
        url='https://www.williamhill.com/us/il/bet/api/v3/events/highlights',
        params={'promotedOnly': 'false'},
        headers={'Accept': 'application/json'},
    ) as resp:
        resp.raise_for_status()
        return resp.json()

def get_match_markets(dic: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    for sport in dic:
        for competition in sport.get('competitions', ()):
            for event in competition.get('events', ()):
                if event.get('type') == 'MATCH':
                    yield from event.get('markets', ())

class Parser:
    DISPLAY_NAME: str
    DF_COLS: tuple[str, ...]
    DF_IDX: str

    def to_df(self, markets: Iterable[dict[str, Any]]) -> pd.DataFrame:
        df = pd.DataFrame.from_records(
            self._get_records(markets), columns=self.DF_COLS, index=self.DF_IDX,
        )
        return df

    @classmethod
    def _get_records(cls, markets: Iterable[dict[str, Any]]) -> Iterator[tuple]:
        for market in markets:
            if market['displayName'] == cls.DISPLAY_NAME:
                yield from cls._get_data(market)

    @classmethod
    def _get_data(cls, market: dict[str, Any]) -> tuple:
        raise NotImplementedError()

class Gamelist(Parser):
    DISPLAY_NAME = 'Money Line'
    DF_IDX = 'event_id'
    DF_COLS = (DF_IDX, 'game_ref')

    @staticmethod
    def team_abbrev(team: dict) -> str:
        return team.get('teamData', {}).get('teamAbbreviation', '')

    @classmethod
    def _get_data(cls, market: dict[str, Any]) -> tuple[tuple, ...]:
        event_id = market['eventId']
        away_team, home_team = market['selections'][:2]
        game_ref = (
            f'{cls.team_abbrev(away_team)}@{cls.team_abbrev(home_team)}'
        )
        return (event_id, game_ref),

class Runline(Parser):
    DISPLAY_NAME = 'Run Line'
    DF_IDX = 'team'
    DF_COLS = (DF_IDX, 'event_id', 'price', 'line')

    @classmethod
    def _get_data(cls, market: dict[str, Any]) -> tuple[tuple, ...]:
        event_id = market['eventId']
        away_team, home_team = market['selections'][:2]
        away_abbrev = away_team['teamData']['teamAbbreviation']
        home_abbrev = home_team['teamData']['teamAbbreviation']
        away_price = away_team['price']['a']
        home_price = home_team['price']['a']
        line = market['line']
        away_line = line if away_price > home_price else -line
        return (
            (away_abbrev, event_id, away_price, away_line),
            (home_abbrev, event_id, home_price, -away_line),
        )

def main() -> None:
    games_dic = get_data()
    markets = tuple(get_match_markets(games_dic))
    print(Gamelist().to_df(markets))
    print(Runline().to_df(markets))

if __name__ == '__main__':
    main()

Output

                                       game_ref
event_id
ffb909b1-647e-4e55-8418-08185b91d110    CHC@STL
d4422840-d074-4285-ab8b-0e9a7efe50f4     LAD@SF
582f0d9b-68db-4c00-9cd5-a67b5e158b59     COL@SD
fb5729e1-7c18-4677-bc8e-2945138363ac    OAK@LAA
64ad7ff0-c332-40ff-8b94-1ddbfd49cdbf     LV@JAX
1e5e5964-a63b-4caf-9654-48535d9a61eb     LV@JAX
a52257b5-ab62-4f82-bc7d-348dfc85f3e4          @
c2d17d0e-001f-4ef1-b560-6446830d91ee    BUF@LAR
d3df8dc1-fc2b-48c6-9ce9-27175f0eaf5b     SF@CHI
9e39445d-7c97-48c0-b976-331fdc0ad14c    PHI@DET
22906ec4-07fc-4555-a0a7-fbdf0b72bb2a    BAL@NYJ
cd54c138-261c-4995-8593-3dfd8b292b81  false@ATL
80df769c-227b-4d6e-95b0-0cd8063e1aa7     NE@MIA
dbe5a92b-b3d6-4540-97c5-df467df706f4    CLE@CAR
8cee45a2-8213-4897-a77e-c80f2b633346    PIT@CIN
273c7880-67ee-499f-acd7-9f3bccb96637    JAX@WAS
67653008-cb76-4276-affc-185c630f0ad4    IND@HOU
f41715b2-79e6-48bf-b950-a39294437a81     KC@ARZ
5d5e9a33-80b4-4172-9731-0022763d4d81     GB@MIN
48a31fb4-1f85-4867-9ebe-b1e7970e8080     LV@LAC
35fcb682-9e61-4066-b9b5-a834161336bc    NYG@TEN
bab85113-360f-4fc4-9ff4-6df0fbc8b9e1     TB@DAL
9112dd1b-4e4f-4790-823e-8ae354ff34da    DEN@SEA
7d0709c4-b547-4fa0-95a0-274513a7ee2a          @
c878f903-533a-48dd-a60d-74bcc4fc1c54          @
3f54141a-fdf2-40a1-b6f1-c551f35c560e          @
1b478545-1c07-4bf2-a411-da11fab5c024          @
                                  event_id  price  line
team
CHC   ffb909b1-647e-4e55-8418-08185b91d110   -130   1.5
STL   ffb909b1-647e-4e55-8418-08185b91d110    110  -1.5
LAD   d4422840-d074-4285-ab8b-0e9a7efe50f4    105   1.5
SF    d4422840-d074-4285-ab8b-0e9a7efe50f4   -125  -1.5
COL   582f0d9b-68db-4c00-9cd5-a67b5e158b59    122  -1.5
SD    582f0d9b-68db-4c00-9cd5-a67b5e158b59   -145   1.5
OAK   fb5729e1-7c18-4677-bc8e-2945138363ac   -110   1.5
LAA   fb5729e1-7c18-4677-bc8e-2945138363ac   -110  -1.5
answered Aug 4, 2022 at 0:10
\$\endgroup\$
5
  • 1
    \$\begingroup\$ Thank you so much! This definitely gives me better direction. On first run the code did break on KeyError: 'teamData' (as you spoke about) so my first step is going to be diagnosing that. \$\endgroup\$ Commented Aug 4, 2022 at 3:12
  • \$\begingroup\$ Not much diagnosis needed - the workaround I have shown should suffice. \$\endgroup\$ Commented Aug 4, 2022 at 15:15
  • \$\begingroup\$ I actually forgot to mention that I only want to pull data for specific sports leagues at a time. What I did was create GamesDict class that returns the dict filtered for sport/league. \$\endgroup\$ Commented Aug 5, 2022 at 9:27
  • \$\begingroup\$ I actually ran into another problem. I realized I need the startTime that's in events. I'm assuming I can make a few changes to parse through events (for Gamelist) and then markets (for everything else)? \$\endgroup\$ Commented Aug 5, 2022 at 11:33
  • \$\begingroup\$ @Nick you'd need to modify get_match_markets, where event is iterated. \$\endgroup\$ Commented Aug 8, 2022 at 21:06
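
A rough sketch of that last suggestion: one option is to yield each market together with its parent event, so event-level fields such as startTime remain reachable. The function name get_match_events_and_markets and the sample data are hypothetical, and the startTime value format is an assumption; the traversal mirrors get_match_markets from the answer.

```python
from typing import Any, Iterator


def get_match_events_and_markets(
    sports: list[dict[str, Any]],
) -> Iterator[tuple[dict[str, Any], dict[str, Any]]]:
    """Yield (event, market) pairs so event-level fields stay reachable."""
    for sport in sports:
        for competition in sport.get('competitions', ()):
            for event in competition.get('events', ()):
                if event.get('type') == 'MATCH':
                    for market in event.get('markets', ()):
                        yield event, market


# Hypothetical sample; the startTime value format is an assumption
sample = [{'competitions': [{'events': [{
    'type': 'MATCH',
    'startTime': '2022-08-05T23:05:00Z',
    'markets': [{'displayName': 'Money Line', 'eventId': 'a'}],
}]}]}]

pairs = list(get_match_events_and_markets(sample))
start_times = [event.get('startTime') for event, _ in pairs]
```

The parsers would then need to accept the pair instead of a bare market, reading startTime from the event and everything else from the market as before.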
