My Python program parses songs from a website, corrects song titles and artists with the Last.fm API, searches for each song's Spotify URI using the Spotify API, stores all the information in a SQLite database, and then uploads the songs to a Spotify playlist with the Spotify API.
I would like to make the program object-oriented and need some advice on how to do that. Some general Python advice would also be useful.
I have a separate config.py file with all the needed API variables.
scraper.py
# -*- coding: utf-8 -*-
# import config file
import config
# import libraries
from bs4 import BeautifulSoup
import datetime
import urllib.request as urllib
import sys
import time
import re
import sqlite3
# webdriver libraries
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# spotipy library
import spotipy
import spotipy.util as util
# import pylast
import pylast
# song class holds information about each song
class Song:
    artist = None
    song = None
    spotify_uri = None
    def __init__(self, artist, song, spotify_uri):
        self.artist = artist
        self.song = song
        self.spotify_uri = spotify_uri
    def printSong(self):
        print(self.artist, '-', self.song, ', Uri:', self.spotify_uri)
##------------------------------------------------------------------------------
## Get Date of latest sunday
##
## @return formatted date of last sunday as yyyymmdd
#
def getSundayDate():
    today = datetime.date.today()
    sun_offset = (today.weekday() - 6) % 7
    sunday_of_week = today - datetime.timedelta(days=sun_offset)
    sunday_date = sunday_of_week.strftime('%Y%m%d')
    return sunday_date
##------------------------------------------------------------------------------
## URL Pattern
##
## https://fm4.orf.at/player/20190120/SSU
## URL pattern:
## /yyyymmdd/SSU
## /20190120/SSU
## SSU is just Sunny Side Up the show from 10am till 1pm
## URL pattern changes ever day, we need to change it every week,
## to only get sundays
##
## @return concatenated URL of website
def getURLPattern():
    return 'https://fm4.orf.at/player/' + getSundayDate() + '/SSU'
##------------------------------------------------------------------------------
## Get html source from page specified by page_url
##
## @return html source as beautiful soup object
#
def getHtmlFromPage():
    page_URL = getURLPattern()
    options = Options()
    options.headless = True
    profile = webdriver.FirefoxProfile()
    profile.set_preference("media.volume_scale", "0.0")
    driver = webdriver.Firefox(options=options, firefox_profile=profile)
    driver.get(page_URL)
    wait = WebDriverWait(driver, 3)
    wait.until(EC.presence_of_element_located((By.CLASS_NAME,
                                               'broadcast-items-list')))
    time.sleep(1)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    driver.quit()
    return soup
##------------------------------------------------------------------------------
## remove bad characters from list
##
## @param list, list with elements to check
#
def sanitize(strList):
    regex_remove = r'([^A-z\s\däöüÄÖÜß-][\\\^]?)'
    regex_ft = r'(ft\.?([^\n]\s?\w*)+)'
    # check for bad characters
    for i in range(len(strList)):
        strList[i] = re.sub(regex_remove, "", str(strList[i]))
        strList[i] = re.sub(regex_ft, "", strList[i])
##------------------------------------------------------------------------------
## print music
##
## @param lists to print
#
def printMusic(interpreter_list, title_list):
    for element in range(len(interpreter_list)):
        print(interpreter_list[element] + " : " + title_list[element])
##------------------------------------------------------------------------------
## parse html
##
## @param lists to write results to
#
def parseHtml(interpreter_list, title_list):
    soup = getHtmlFromPage()
    # find all interpreter in playlist
    interpreter = soup.find_all("div", {"class": "interpreter"})
    # find all titles in playlist
    title = soup.find_all("div", {"class": "title"})
    # Check for errors
    if (len(interpreter) != len(title)):
        raise Exception("The amount of interpreters don't correspond" +
                        "to the amount of titles.")
    if (len(interpreter) == 0):
        raise Exception("No FM4 music playlist found in given url")
    for element in range(len(interpreter)):
        interpreter_list.append(interpreter[element].text)
        title_list.append(title[element].text)
##------------------------------------------------------------------------------
## create Token with given credentials
##
## @return authentication token
#
def getToken():
    # authetication token
    token = util.prompt_for_user_token(config.USERNAME, config.SCOPE, config.CLIENT_ID,
                                       config.CLIENT_SECRET, config.REDIRECT_URI)
    if token:
        return token
    else:
        raise Exception("Could not get authentication token from spotify!")
##------------------------------------------------------------------------------
## search track and get spotify uri
##
## @param token, authentication token
## @param interpreter && title, strings containing track info
## @return uri string
#
def getUri(spotify_Obj, interpreter, title):
    result = spotify_Obj.search(q=interpreter + ' ' + title)
    if (result != None):
        if (len(result['tracks']['items']) != 0):
            track_id = result['tracks']['items'][0]['uri']
            uri = str(track_id)
            return uri
##------------------------------------------------------------------------------
## correct artist name and track title with lastFm api
##
## @param1 artist_name, name of artist to correct
## @param2 title_name, title name to correct
## @return track_corrected, corrected Track object
#
def getTrackInfo(artist_name, track_name):
    # network authentication
    last_Fm = getLastFmNetworkAuth()
    # declare artist_name as artist object
    artist = last_Fm.get_artist(artist_name)
    # correct artist name
    artist_corrected_name = artist.get_correction()
    track = last_Fm.get_track(artist_corrected_name, track_name)
    track_corrected_name = track.get_correction()
    trackInfo = pylast.Track(artist_corrected_name, track_corrected_name,
                             last_Fm)
    return trackInfo
##------------------------------------------------------------------------------
## get last fm network authentication
##
## @return network authentication token
#
def getLastFmNetworkAuth():
    network = pylast.LastFMNetwork(config.LASTFM_API_KEY, config.LASTFM_API_SECRET)
    return network
##------------------------------------------------------------------------------
## parse music items from website, put them into a list, sanitize lists,
## correct artist names and song titles with last.fm API and save list in a
## sqlite database for further usage
##
## @return network authentication token
#
def parseTracksIntoSongClassList(song_list):
    # lists containing the Interpreter and title
    interpreter_list = []
    title_list = []
    # fill lists with results
    parseHtml(interpreter_list, title_list)
    print(datetime.datetime.now(), "Done parsing html")
    # remove bad characters from lists
    sanitize(interpreter_list)
    sanitize(title_list)
    # get Token and create spotify object
    sp = spotipy.Spotify(getToken())
    # correct artist and title names
    for element in range(len(interpreter_list)):
        track_info = getTrackInfo(interpreter_list[element],
                                  title_list[element])
        title = str(track_info.get_name())
        artist = str(track_info.get_artist())
        if (title != artist):
            if (title is not None):
                title_list[element] = title
            if (artist is not None):
                interpreter_list[element] = artist
        else:
            title_list[element] = title_list[element]
            interpreter_list[element] = interpreter_list[element]
        # get spotify uri for song
        spotify_uri = getUri(sp, interpreter_list[element], title_list[element])
        if (spotify_uri != None and len(spotify_uri) != 0):
            track_uri = str(spotify_uri)
            song_list.append(Song(interpreter_list[element],
                                  title_list[element], track_uri))
    print(datetime.datetime.now(), "Done parsing songs")
##------------------------------------------------------------------------------
## insert new songs to database, checks for duplicates and ignores them
##
## @param song_list, list containing songs which need to be inserted
## into database
#
def updateDatabase(song_list):
    conn = sqlite3.connect('SongDatabase.db')
    c = conn.cursor()
    # date to insert into table
    today = datetime.date.today()
    today.strftime('%Y-%m-%d')
    c.execute('''CREATE TABLE IF NOT EXISTS songs
                 (SongID INTEGER PRIMARY KEY, artist_name TEXT, song_name TEXT,
                 spotify_uri TEXT, UploadDate TIMESTAMP, Uploaded INTEGER,
                 UNIQUE(artist_name, song_name, spotify_uri) ON CONFLICT IGNORE)''')
    for item in range(len(song_list)):
        c.execute('''INSERT INTO songs
                     (artist_name, song_name, spotify_uri, UploadDate, Uploaded)
                     VALUES (?,?,?,?,?)''', (song_list[item].artist, song_list[item].song,
                     song_list[item].spotify_uri, today, 0))
    conn.commit()
    c.close()
    print(datetime.datetime.now(), "Done updating Database")
##------------------------------------------------------------------------------
## copy Uris from song_list into new list
##
## @param song_list, list containing songs which get copied into new list
## @return track_list, list containing all song uris
#
def getUrisList(song_list):
    uri_list = []
    for song in range(len(song_list)):
        uri_list.append(song_list[song].spotify_uri)
    print(uri_list)
    return uri_list
##------------------------------------------------------------------------------
## Main part of the program
## get html and parse important parts into file
#
if __name__ == '__main__':
    # list to fill with corrected songs
    song_list = []
    # parse songs into song_list
    parseTracksIntoSongClassList(song_list)
    # insert song_list into database
    updateDatabase(song_list)
dataManager.py
# -*- coding: utf-8 -*-
# import config file
import config
import sqlite3
import pandas as pd
# spotipy library
import spotipy
import spotipy.util as util
##------------------------------------------------------------------------------
## create Token with given credentials
##
## @return authentication token
#
def getToken():
    # authetication token
    token = util.prompt_for_user_token(config.USERNAME, config.SCOPE, config.CLIENT_ID,
                                       config.CLIENT_SECRET, config.REDIRECT_URI)
    return token
##------------------------------------------------------------------------------
## insert new songs to database, checks for duplicates and ignores them
##
## @param song_list, list containing songs to be inserted into database
#
def uploadSongsToSpotify():
    # declare db name
    database_name = 'SongDatabase.db'
    # spotify auth token
    sp = spotipy.Spotify(getToken())
    if sp:
        # spotify username
        username = config.USERNAME
        # spotify ide of playlist
        playlist_id = config.PLAYLIST_ID
        conn = sqlite3.connect(database_name)
        c = conn.cursor()
        c.execute("""SELECT spotify_uri FROM songs WHERE (Uploaded = 0)""")
        # save query results in tuple
        data = c.fetchall()
        # save uris in list, for spotipy
        uri_list = []
        for item in range(len(data)):
            uri_list.append(str(data[item][0]))
        print(uri_list)
        # upload uri_list to spotify
        # check for empty list
        if (len(uri_list) != 0):
            sp.user_playlist_add_tracks(username, playlist_id, uri_list)
            # set Uploaded values in database to 1
            c.execute("""UPDATE songs SET Uploaded = ? WHERE Uploaded = ?""", (1, 0))
            conn.commit()
        else:
            raise Exception("There aren't any new songs in database, songs were already uploaded")
        c.close()
    else:
        raise Exception("Could not get token from spotify API")
if __name__ == '__main__':
    uploadSongsToSpotify()
1 Answer
Some simple suggestions after a first look at scraper.py:
class Song defines a method called printSong. Don't do this. Instead, use the dunder method __str__ (and maybe __repr__) to define a mechanism to "stringify" the song, and then let the normal print function handle it:
    print(str(song))
    # or ...
    print(repr(song))
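As a minimal sketch of what that could look like for the Song class from the question (the exact output format and the placeholder values are my own choices, not something the original code prescribes):

```python
class Song:
    """Holds the artist, title, and Spotify URI of one track."""

    def __init__(self, artist, song, spotify_uri):
        self.artist = artist
        self.song = song
        self.spotify_uri = spotify_uri

    def __str__(self):
        # Human-readable form, used by print() and str().
        return f"{self.artist} - {self.song}, Uri: {self.spotify_uri}"

    def __repr__(self):
        # Unambiguous form, used by repr() and when a Song sits inside a list.
        return (f"Song(artist={self.artist!r}, song={self.song!r}, "
                f"spotify_uri={self.spotify_uri!r})")


# Placeholder values, only for illustration.
song = Song("Some Artist", "Some Title", "spotify:track:placeholder")
print(song)        # print() calls str(song) for us
print(repr(song))
```

With __str__ in place, printSong is no longer needed, and a list of songs prints something readable because containers fall back to __repr__ for their elements.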
Your getSundayDate computes the date of the appropriate Sunday, then returns it as a string. Instead, return the date object. Let the caller handle formatting the string, since the caller is getURLPattern, which does nothing but format strings...

Throughout your code you have these giant banner comments introducing your functions. Get rid of them, and put the descriptive text inside a docblock comment (a docstring). This is why docblocks exist in Python:
No!
    ##------------------------------------------------------------------------------
    ## remove bad characters from list
    ##
    ## @param list, list with elements to check
    #
    def sanitize(strList):
Yes.
    def sanitize(strList):
        """Remove bad characters from list.

        @param strList, list with elements to check.
        """
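Putting the last two points together, here is a rough sketch of the date helper returning a date object and carrying its description in a docstring. The snake_case names and the optional today parameter are my own additions for illustration (the parameter just makes the function testable with a fixed date); the offset arithmetic and the URL are taken from the question.

```python
import datetime


def get_sunday_date(today=None):
    """Return the date of the most recent Sunday (today, if today is a Sunday).

    The optional `today` parameter is an addition for this sketch so the
    function can be exercised with a fixed date.
    """
    if today is None:
        today = datetime.date.today()
    # weekday(): Monday is 0 ... Sunday is 6, so this is the days since Sunday.
    days_since_sunday = (today.weekday() - 6) % 7
    return today - datetime.timedelta(days=days_since_sunday)


def get_url_pattern(sunday):
    """Build the player URL for the given Sunday; all formatting happens here."""
    return 'https://fm4.orf.at/player/' + sunday.strftime('%Y%m%d') + '/SSU'


# 2019-01-23 was a Wednesday; the preceding Sunday was 2019-01-20.
print(get_url_pattern(get_sunday_date(datetime.date(2019, 1, 23))))
```

Because the helper returns a date, any other caller that needs a different format (or date arithmetic) can use it without parsing a string back.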
Don't raise Exception objects. Class Exception is the base class of the standard error types. If you have to install a block to catch what you're raising, you are going to have to do except Exception: or maybe just except:, and that's no good. Either create your own exception class, like
    class SongException(Exception):
        pass
or use the standard types (IndexError, ValueError, and TypeError for the most part).

In parseHtml you do this:
    for element in range(len(interpreter)):
        interpreter_list.append(interpreter[element].text)
        title_list.append(title[element].text)
Written like a true Java programmer! But this isn't Java. So watch this video first: Loop Like a Native by Ned Batchelder. There are a couple of ways to rewrite this loop. You could zip the two source lists together, unpack each resulting tuple into two names, and operate on them:
    for interp, ttl in zip(interpreter, title):
        interpreter_list.append(interp.text)
        title_list.append(ttl.text)
Or you could use a generator expression (a close cousin of the comprehension) to iterate over each list separately to generate the text values, and use the list.extend method to implicitly .append each element of a sequence:
    interpreter_list.extend(elt.text for elt in interpreter)
    title_list.extend(elt.text for elt in title)
Have some of this Python-flavored Kool-Aid! It's quite delicious... ;-)
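If you want to try this without firing up Selenium, here is a small self-contained sketch. The SimpleNamespace objects are stand-ins I made up for the BeautifulSoup tags (only their .text attribute matters), and collecting (artist, title) pairs in one list instead of two parallel lists is a variation of mine, not something your code has to adopt.

```python
from types import SimpleNamespace

# Stand-ins for the BeautifulSoup tags: only the .text attribute matters here.
interpreter = [SimpleNamespace(text="Artist A"), SimpleNamespace(text="Artist B")]
title = [SimpleNamespace(text="Title 1"), SimpleNamespace(text="Title 2")]

# Index-based loop, in the style of the original parseHtml.
by_index = []
for element in range(len(interpreter)):
    by_index.append((interpreter[element].text, title[element].text))

# zip-based version: no indices, one (artist, title) pair per iteration.
by_zip = [(interp.text, ttl.text) for interp, ttl in zip(interpreter, title)]

print(by_zip)
print(by_index == by_zip)
```

Both versions build the same list; the zip version simply has no index bookkeeping to get wrong.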
You define getToken in both source files. I'm not sure what that's about...

Looking at your dataManager.py file, it's quite short. I'd suggest that you just roll both files into a single source file.
Your post title asks how you can make your code more object-oriented. I don't think you need to do that, and I don't think you should try. You are writing a program that is very procedural: do this, then do that, next do the other, and finally store things here. That's not a good match for OO code, especially since the elements in question are all different. I encourage you to focus on using simple functions to ensure that you have good separation of concerns and encapsulation. I would also suggest visiting the documentation for Python's "magic methods" (aka dundermethods), and sitting through the Batchelder video I linked. There's a huge amount of Python mastery in that one 30-minute presentation.
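To make the "simple functions" advice a bit more concrete, here is a rough sketch that also uses the custom exception class suggested earlier. The function names check_playlist and pair_tracks are hypothetical, and the inputs are plain strings rather than scraped tags, so treat it as one possible shape and not as the way your program has to be organized.

```python
class SongException(Exception):
    """Raised when the scraped playlist data is unusable."""


def check_playlist(interpreters, titles):
    """Validate the scraped lists before any further processing."""
    if len(interpreters) != len(titles):
        raise SongException("The number of interpreters doesn't match "
                            "the number of titles.")
    if not interpreters:
        raise SongException("No FM4 music playlist found in given url")


def pair_tracks(interpreters, titles):
    """Return (artist, title) pairs after validating the input."""
    check_playlist(interpreters, titles)
    return list(zip(interpreters, titles))


try:
    pair_tracks(["Artist A"], [])
except SongException as error:
    # Only our own error type is caught; unrelated bugs still surface.
    print("Skipping upload:", error)
```

Each function does one job, and the caller can catch SongException specifically instead of a blanket except Exception:.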