\$\begingroup\$

My Python program parses songs from a website, corrects song titles and artists with the Last.fm API, looks up each song's Spotify URI with the Spotify API, stores all the information in a SQLite database, and then uploads it to a Spotify playlist with the Spotify API.

I would like to make the program object-oriented and need some advice on how to do that. Some general Python advice would also be useful.

I have a separate config.py file with all the needed API variables.

scraper.py

# -*- coding: utf-8 -*-
# import config file
import config
# import libraries
from bs4 import BeautifulSoup
import datetime
import urllib.request as urllib
import sys
import time
import re
import sqlite3
# webdriver libraries
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# spotipy library
import spotipy
import spotipy.util as util
# import pylast
import pylast 
# song class holds information about each song
class Song:
    artist = None
    song = None
    spotify_uri = None
    def __init__(self, artist, song, spotify_uri):
        self.artist = artist
        self.song = song
        self.spotify_uri = spotify_uri
    def printSong(self):
        print(self.artist, '-', self.song, ', Uri:', self.spotify_uri)
##------------------------------------------------------------------------------
## Get Date of latest sunday
## 
## @return formatted date of last sunday as yyyymmdd
#
def getSundayDate():
    today = datetime.date.today()
    sun_offset = (today.weekday() - 6) % 7
    sunday_of_week = today - datetime.timedelta(days=sun_offset)
    sunday_date = sunday_of_week.strftime('%Y%m%d')
    return sunday_date
##------------------------------------------------------------------------------
## URL Pattern
##
## https://fm4.orf.at/player/20190120/SSU
## URL pattern:
## /yyyymmdd/SSU
## /20190120/SSU
## SSU is just Sunny Side Up the show from 10am till 1pm
## URL pattern changes every day, we need to change it every week, 
## to only get sundays
## 
## @return concatenated URL of website
def getURLPattern():
    return 'https://fm4.orf.at/player/' + getSundayDate() + '/SSU'
##------------------------------------------------------------------------------
## Get html source from page specified by page_url
## 
## @return html source as beautiful soup object
#
def getHtmlFromPage():
    page_URL = getURLPattern()
    options = Options()
    options.headless = True
    profile = webdriver.FirefoxProfile()
    profile.set_preference("media.volume_scale", "0.0")
    driver = webdriver.Firefox(options=options, firefox_profile=profile)
    driver.get(page_URL)
    wait = WebDriverWait(driver, 3)
    wait.until(EC.presence_of_element_located((By.CLASS_NAME, 
        'broadcast-items-list')))
    time.sleep(1)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    driver.quit()
    return soup
##------------------------------------------------------------------------------
## remove bad characters from list 
## 
## @param list, list with elements to check
#
def sanitize(strList):
    regex_remove = r'([^A-z\s\däöüÄÖÜß-][\\\^]?)'
    regex_ft = r'(ft\.?([^\n]\s?\w*)+)'
    # check for bad characters
    for i in range(len(strList)):
        strList[i] = re.sub(regex_remove, "", str(strList[i]))
        strList[i] = re.sub(regex_ft, "", strList[i])
##------------------------------------------------------------------------------
## print music 
## 
## @param lists to print
#
def printMusic(interpreter_list, title_list):
    for element in range(len(interpreter_list)):
        print(interpreter_list[element] + " : " + title_list[element])
##------------------------------------------------------------------------------
## parse html
## 
## @param lists to write results to
#
def parseHtml(interpreter_list, title_list):
    soup = getHtmlFromPage()
    # find all interpreter in playlist
    interpreter = soup.find_all("div", {"class": "interpreter"})
    # find all titles in playlist
    title = soup.find_all("div", {"class": "title"})
    # Check for errors
    if (len(interpreter) != len(title)):
        raise Exception("The amount of interpreters don't correspond" +
            "to the amount of titles.")
    if (len(interpreter) == 0):
        raise Exception("No FM4 music playlist found in given url") 
    for element in range(len(interpreter)):
        interpreter_list.append(interpreter[element].text)
        title_list.append(title[element].text)
##------------------------------------------------------------------------------
## create Token with given credentials
## 
## @return authentication token
#
def getToken():
    # authentication token
    token = util.prompt_for_user_token(config.USERNAME, config.SCOPE, config.CLIENT_ID, 
        config.CLIENT_SECRET, config.REDIRECT_URI)
    if token:
        return token
    else:
        raise Exception("Could not get authentication token from spotify!")
##------------------------------------------------------------------------------
## search track and get spotify uri
## 
## @param token, authentication token
## @param interpreter && title, strings containing track info
## @return uri string
#
def getUri(spotify_Obj, interpreter, title):
    result = spotify_Obj.search(q=interpreter + ' ' + title)
    if (result != None):
        if (len(result['tracks']['items']) != 0):
            track_id = result['tracks']['items'][0]['uri']
            uri = str(track_id)
            return uri
##------------------------------------------------------------------------------
## correct artist name and track title with lastFm api
## 
## @param1 artist_name, name of artist to correct
## @param2 title_name, title name to correct
## @return track_corrected, corrected Track object
#
def getTrackInfo(artist_name, track_name):
    # network authentication
    last_Fm = getLastFmNetworkAuth()
    # declare artist_name as artist object
    artist = last_Fm.get_artist(artist_name)
    # correct artist name
    artist_corrected_name = artist.get_correction()
    track = last_Fm.get_track(artist_corrected_name, track_name)
    track_corrected_name = track.get_correction()
    trackInfo = pylast.Track(artist_corrected_name, track_corrected_name, 
        last_Fm)
    return trackInfo
##------------------------------------------------------------------------------
## get last fm network authentication
## 
## @return network authentication token
#
def getLastFmNetworkAuth():
    network = pylast.LastFMNetwork(config.LASTFM_API_KEY, config.LASTFM_API_SECRET)
    return network
##------------------------------------------------------------------------------
## parse music items from website, put them into a list, sanitize lists, 
## correct artist names and song titles with last.fm API and save list in a 
## sqlite database for further usage
## 
## @return network authentication token
#
def parseTracksIntoSongClassList(song_list):
    # lists containing the Interpreter and title
    interpreter_list = []
    title_list = []
    # fill lists with results
    parseHtml(interpreter_list, title_list)
    print(datetime.datetime.now(), "Done parsing html")
    # remove bad characters from lists
    sanitize(interpreter_list)
    sanitize(title_list)
    # get Token and create spotify object
    sp = spotipy.Spotify(getToken())
    # correct artist and title names
    for element in range(len(interpreter_list)):
        track_info = getTrackInfo(interpreter_list[element], 
            title_list[element])
        title = str(track_info.get_name())
        artist = str(track_info.get_artist())
        if (title != artist):
            if (title is not None):
                title_list[element] = title
            if (artist is not None):
                interpreter_list[element] = artist
        else:
            title_list[element] = title_list[element]
            interpreter_list[element] = interpreter_list[element]
        # get spotify uri for song
        spotify_uri = getUri(sp, interpreter_list[element], title_list[element]) 
        if (spotify_uri != None and len(spotify_uri) != 0):
            track_uri = str(spotify_uri)
            song_list.append(Song(interpreter_list[element], 
                title_list[element], track_uri))
    print(datetime.datetime.now(), "Done parsing songs")
##------------------------------------------------------------------------------
## insert new songs to database, checks for duplicates and ignores them
## 
## @param song_list, list containing songs which need to be inserted 
## into database
#
def updateDatabase(song_list):
    conn = sqlite3.connect('SongDatabase.db')
    c = conn.cursor()
    # date to insert into table
    today = datetime.date.today() 
    today.strftime('%Y-%m-%d')
    c.execute('''CREATE TABLE IF NOT EXISTS songs 
        (SongID INTEGER PRIMARY KEY, artist_name TEXT, song_name TEXT, 
        spotify_uri TEXT, UploadDate TIMESTAMP, Uploaded INTEGER, 
        UNIQUE(artist_name, song_name, spotify_uri) ON CONFLICT IGNORE)''')
    for item in range(len(song_list)):
        c.execute('''INSERT INTO songs 
            (artist_name, song_name, spotify_uri, UploadDate, Uploaded) 
            VALUES (?,?,?,?,?)''', (song_list[item].artist, song_list[item].song, 
            song_list[item].spotify_uri, today, 0))
    conn.commit()
    c.close()
    print(datetime.datetime.now(), "Done updating Database")
##------------------------------------------------------------------------------
## copy Uris from song_list into new list
##
## @param song_list, list containing songs which get copied into new list
## @return track_list, list containing all song uris
#
def getUrisList(song_list):
    uri_list = []
    for song in range(len(song_list)):
        uri_list.append(song_list[song].spotify_uri)
    print(uri_list)
    return uri_list
##------------------------------------------------------------------------------
## Main part of the program
## get html and parse important parts into file
#
if __name__ == '__main__':
    # list to fill with corrected songs 
    song_list = []
    # parse songs into song_list
    parseTracksIntoSongClassList(song_list)
    # insert song_list into database
    updateDatabase(song_list)

dataManager.py

# -*- coding: utf-8 -*-
# import config file
import config
import sqlite3
import pandas as pd
# spotipy library
import spotipy
import spotipy.util as util
##------------------------------------------------------------------------------
## create Token with given credentials
## 
## @return authentication token
#
def getToken():
    # authentication token
    token = util.prompt_for_user_token(config.USERNAME, config.SCOPE, config.CLIENT_ID, 
        config.CLIENT_SECRET, config.REDIRECT_URI)
    return token
##------------------------------------------------------------------------------
## insert new songs to database, checks for duplicates and ignores them
## 
## @param song_list, list containing songs to be inserted into database
#
def uploadSongsToSpotify():
    # declare db name
    database_name = 'SongDatabase.db'
    # spotify auth token
    sp = spotipy.Spotify(getToken())
    if sp:
        # spotify username
        username = config.USERNAME
        # spotify id of playlist
        playlist_id = config.PLAYLIST_ID
        conn = sqlite3.connect(database_name)
        c = conn.cursor()
        c.execute("""SELECT spotify_uri FROM songs WHERE (Uploaded = 0)""")
        # save query results in tuple
        data = c.fetchall()
        # save uris in list, for spotipy
        uri_list = []
        for item in range(len(data)):
            uri_list.append(str(data[item][0]))
        print(uri_list)
        # upload uri_list to spotify
        # check for empty list
        if (len(uri_list) != 0):
            sp.user_playlist_add_tracks(username, playlist_id, uri_list)
            # set Uploaded values in database to 1
            c.execute("""UPDATE songs SET Uploaded = ? WHERE Uploaded = ?""", (1, 0))
            conn.commit()
        else:
            raise Exception("There aren't any new songs in database, songs were already uploaded")
        c.close()
    else:
        raise Exception("Could not get token from spotify API")
if __name__ == '__main__':
    uploadSongsToSpotify()
asked Jan 26, 2019 at 16:43
\$\endgroup\$

1 Answer

\$\begingroup\$

Some simple suggestions after a first look at scraper.py:

  1. class Song defines a method called printSong. Don't do this. Instead, define the dunder method __str__ (and maybe __repr__) to "stringify" the song, and then let the normal print function handle it:

    print(song) # or ... print(repr(song))

  2. Your getSundayDate computes the date of the appropriate Sunday, then returns it as a string. Instead, return the date object. Let the caller handle formatting the string, since the caller is getURLPattern, which does nothing but format strings.

  3. Throughout your code you have these giant banner comments introducing your functions. Get rid of them, and put the descriptive text inside a docstring. This is why docstrings exist in Python:

    No!

    ##------------------------------------------------------------------------------
    ## remove bad characters from list 
    ## 
    ## @param list, list with elements to check
    #
    def sanitize(strList):
    

    Yes.

    def sanitize(strList):
        """Remove bad characters from list.
        @param strList, list with elements to check.
        """
    
  4. Don't raise Exception objects. Class Exception is the base class of the standard error types. If you ever need a try block to catch what you're raising, you will have to write except Exception: or maybe just except:, and that's no good because it also swallows every unrelated error. Either create your own exception class, like class SongException(Exception): pass, or use the standard types (IndexError, ValueError, and TypeError for the most part).

  5. In parseHtml you do this:

    for element in range(len(interpreter)):
        interpreter_list.append(interpreter[element].text)
        title_list.append(title[element].text)
    

    Written like a true Java programmer! But this isn't Java. So watch this video first: Loop Like a Native by Ned Batchelder. There are a couple of ways to rewrite this loop. You could zip the two source lists together, unpack each pair into two names, and operate on them:

    for interp, ttl in zip(interpreter, title):
        interpreter_list.append(interp.text)
        title_list.append(ttl.text)
    

    Or you could use a generator expression to iterate over each list separately to produce the text values, and use the list.extend method to implicitly .append each element of a sequence:

    interpreter_list.extend(elt.text for elt in interpreter)
    title_list.extend(elt.text for elt in title)
    

    Have some of this Python-flavored Kool-Aid! It's quite delicious... ;-)

  6. You define getToken in both source files. I'm not sure what that's about...
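To make points 1 and 2 concrete, here is a minimal sketch of how they might look. The snake_case names and the optional `today` parameter are assumptions added for illustration (the parameter only makes the date logic easy to check); the offset arithmetic and URL layout are taken from the question.

```python
import datetime


class Song:
    """Holds the information about one song."""

    def __init__(self, artist, song, spotify_uri):
        self.artist = artist
        self.song = song
        self.spotify_uri = spotify_uri

    def __str__(self):
        # Human-readable form, used by print(song) and str(song).
        return f"{self.artist} - {self.song}, Uri: {self.spotify_uri}"

    def __repr__(self):
        # Unambiguous form, useful in the REPL and in debug output.
        return f"Song({self.artist!r}, {self.song!r}, {self.spotify_uri!r})"


def get_sunday_date(today=None):
    """Return the most recent Sunday (today included) as a date object."""
    if today is None:
        today = datetime.date.today()
    sun_offset = (today.weekday() - 6) % 7
    return today - datetime.timedelta(days=sun_offset)


def get_url_pattern(today=None):
    """Format the Sunday date into the player URL; formatting lives here only."""
    return f"https://fm4.orf.at/player/{get_sunday_date(today):%Y%m%d}/SSU"
```

With this in place, `print(song)` replaces `song.printSong()`, and any other caller that needs the Sunday date can do date arithmetic on it instead of parsing a string.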

Looking at your dataManager.py file, it's quite short. I'd suggest that you just roll both files into a single source file.
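Once both files live in one module, a single token helper can serve the scraping step and the upload step alike. The sketch below is one possible shape, not the only one: `SpotifyAuthError`, the `prompt_for_token` parameter, and `fake_prompt` are hypothetical names introduced here so the example runs without network access. In the real program the callable would presumably be `util.prompt_for_user_token`, called with the `config` values shown in the question.

```python
class SpotifyAuthError(Exception):
    """Raised when no Spotify authentication token could be obtained."""


def get_token(prompt_for_token, *credentials):
    """Return the token produced by prompt_for_token(*credentials), or raise.

    Both the scraper and the uploader call this one function, so the
    "did we actually get a token?" check exists in exactly one place.
    """
    token = prompt_for_token(*credentials)
    if not token:
        raise SpotifyAuthError("Could not get authentication token from Spotify")
    return token


def fake_prompt(username, scope):
    # Hypothetical stand-in for the real prompt function, for demonstration only.
    return "token-for-" + username if scope else None
```

Raising a dedicated exception type here also follows point 4: callers can catch `SpotifyAuthError` without catching everything else.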

Your post title asks how you can make your code more object-oriented. I don't think you need to do that, and I don't think you should try. You are writing a program that is very procedural: do this, then do that, next do the other, and finally store things here. That's not a good match for OO code, especially since the elements in question are all different. I encourage you to focus on using simple functions to ensure that you have good separation of concerns and encapsulation. I would also suggest visiting the documentation for Python's "magic methods" (aka dundermethods), and sitting through the Batchelder video I linked. There's a huge amount of Python mastery in that one 30-minute presentation.
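As an illustration of what "simple functions with good separation of concerns" could mean here, the sketch below turns the pairing step of parseHtml into a function that takes plain strings and returns a new list instead of filling lists passed in by the caller. `Track`, `PlaylistParseError`, and `extract_tracks` are hypothetical names, and the whitespace stripping is an added assumption; the two error checks mirror the ones in the question.

```python
from collections import namedtuple

# Hypothetical lightweight record for one playlist entry.
Track = namedtuple("Track", ["artist", "title"])


class PlaylistParseError(ValueError):
    """Raised when the scraped playlist data is empty or inconsistent."""


def extract_tracks(interpreters, titles):
    """Pair up artist and title strings and return them as a list of Track.

    The function does no scraping and mutates nothing, so it can be
    tested with plain lists of strings.
    """
    if len(interpreters) != len(titles):
        raise PlaylistParseError(
            "The number of interpreters does not match the number of titles."
        )
    if not interpreters:
        raise PlaylistParseError("No FM4 music playlist found in given url")
    return [
        Track(artist.strip(), title.strip())
        for artist, title in zip(interpreters, titles)
    ]
```

The scraping code would then pass in the `.text` values it found, and later steps (sanitizing, Last.fm correction, Spotify lookup) could each be a similar function from a list to a list.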

answered Jan 26, 2019 at 17:49
\$\endgroup\$
