I've been working on a tracklist generator in Python that scrapes an Amazon Music album page, given its URL, for albums with a single artist. This is what I have so far; I'm really new to requests and beautifulsoup4. I wonder if I can improve it to make it more efficient.
import requests
from bs4 import BeautifulSoup
Amazon=str(input("Please enter an Amazon music url:"))
r=requests.get(Amazon)
soup = BeautifulSoup(r.text,'html.parser')
name=soup.find_all('a', attrs={'class':'a-link-normal a-color-base TitleLink a-text-bold'}) #find out the names of the track title
time=soup.find_all('td',attrs={'class':'a-text-right a-align-center'}) #find the duration of the track
artist= soup.find('a', attrs={'id':'ProductInfoArtistLink'}) #find the creator of the track, which for now can only take one
for i in range(1,len(name),2):
    print(str(int(i/2+1))+'. '+name[int(i)].text+' - '+ artist.text + ' (' + time[int((i-1)/2)].text[12:16] + ')')
    #first int produces a placeholder number for the track e.g 1., 2.
    #second int produces track name, which len shows twice of number of tracks
    #artist text gives artist name
    #time gives time and puts it in brackets
- 80-character line limit, and comments need to be on their own line above the line they're supposed to be commenting on. I'm not listing this as an answer because these two things are literally the only things I can point out. Check out PEP 8. – user106363, Nov 8, 2018 at 22:08
- Why do you need to do these shenanigans with twice the length and halving the integer? Is each track included twice in the results? – Graipher, Nov 9, 2018 at 10:43
- Yeah, the stuff is doubled. – Durian Jaykin, Nov 10, 2018 at 12:28
1 Answer
Whenever you find yourself writing long (or sometimes even short) comments explaining a single line or a block of lines, you should ask yourself whether that code would be better placed in a function. Functions can be given a meaningful name, and you can add a docstring to them (which can be considerably longer than a comment practically can). It also gives you one obvious place to change if, for example, the Amazon Music website changes at some point.
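To illustrate the docstring point, here is a minimal sketch of the track_titles function from the code in this answer with a docstring added. The docstring wording is an assumption made for illustration and is not part of the original code; the remark about duplicated links is based only on the comments under the question.

```python
def track_titles(soup):
    """Return the track titles found on an Amazon Music album page.

    Each title link shows up twice in the page (per the comments under
    the question), so only every second match is kept.  The class string
    below is tied to the page layout and is the one place to update if
    the markup changes.
    """
    attrs = {'class': 'a-link-normal a-color-base TitleLink a-text-bold'}
    return [a.text for a in soup.find_all('a', attrs=attrs)[::2]]
```

Calling help(track_titles) in an interactive session would then show this text, which a trailing comment cannot do.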
IMO, here the function names should already be self-explanatory enough, so I did not add any docstrings.
import requests
from bs4 import BeautifulSoup
from itertools import count


def get_soup(url):
    r = requests.get(url)
    r.raise_for_status()
    return BeautifulSoup(r.text, 'lxml')


def track_titles(soup):
    attrs = {'class': 'a-link-normal a-color-base TitleLink a-text-bold'}
    return [a.text for a in soup.find_all('a', attrs=attrs)[::2]]


def track_durations(soup):
    attrs = {'class': 'a-text-right a-align-center'}
    return [td.text.strip() for td in soup.find_all('td', attrs=attrs)]


def track_artist(soup):
    return soup.find('a', attrs={'id': 'ProductInfoArtistLink'}).text


if __name__ == "__main__":
    url = input("Please enter an Amazon music url:")
    soup = get_soup(url)
    titles = track_titles(soup)
    durations = track_durations(soup)
    artist = track_artist(soup)
    for i, title, duration in zip(count(1), titles, durations):
        print(f"{i}. {title} - {artist} ({duration})")
Other things I changed:
- Added an `if __name__ == "__main__":` guard to ensure you can import this module from another script.
- Used the fact that `input` already returns a string (in Python 3).
- Added `r.raise_for_status()` so that the code raises an exception if the request does not succeed.
- Consistently followed Python's official style guide, PEP8.
- Used the usually faster `lxml` parser.
- Iterated over the elements instead of the indices.
- Used f-strings (introduced in Python 3.6) to make the formatting easier.
- Used `str.strip` instead of hardcoding indices for the duration.
- Made the functions do all the cleanup.
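Two of these points can be tried without touching the network. The sketch below is only an illustration under stated assumptions: the cell strings are made up (12 characters of leading whitespace, to match the [12:16] slice in the question), and the titles and artist are placeholders. It shows why stripping is more robust than a fixed slice, and that enumerate with start=1 is an equivalent way to number the tracks.

```python
# Assumed cell text: 12 characters of leading whitespace, then the duration.
# The real page's padding may differ; these strings are made up.
short_cell = "\n" + " " * 11 + "4:21" + "\n"
long_cell = "\n" + " " * 11 + "10:05" + "\n"

# A fixed slice silently truncates a duration longer than four characters.
sliced = [cell[12:16] for cell in (short_cell, long_cell)]

# str.strip keeps the whole duration, whatever its length.
stripped = [cell.strip() for cell in (short_cell, long_cell)]

# Placeholder titles and artist, not taken from a real album.
titles = ["First Track", "Second Track"]
artist = "Some Artist"

# enumerate(..., start=1) numbers the pairs, just as zip(count(1), ...) does.
lines = [f"{i}. {title} - {artist} ({duration})"
         for i, (title, duration) in enumerate(zip(titles, stripped), start=1)]

print(sliced)
print(stripped)
print("\n".join(lines))
```

With these inputs the sliced list loses the last digit of the ten-minute track, while the stripped list keeps both durations intact.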
- I wanted to make something that parses Amazon Music and generates a tracklist for MusicBrainz, to make it easier to add tracklists to their database. The format goes like this: – Durian Jaykin, Nov 10, 2018 at 12:26
- No. Title - Artist (mm:ss) (as in 1. Planet Telex - Radiohead (4:21)) – Durian Jaykin, Nov 10, 2018 at 12:27
- @DurianJaykin I have never used Amazon Music. Is it possible to post an example link or would I need an account for it to work properly? – Graipher, Nov 10, 2018 at 13:18
- amazon.com/End-Time-EP-Jim-Yosef/dp/B01DPY73E8 – Durian Jaykin, Nov 11, 2018 at 14:35
- There is no need for an account, just view page source and you should find the html needed. – Durian Jaykin, Nov 11, 2018 at 14:35