Python webscrape Amazon Music, generate tracklist

Question 1

So I've been working on a tracklist generator that scrapes from Amazon Music using a url using Python for albums with one artist. I've made uhhh this, I'm really new to this requests and beautifulsoup4 thing. I wonder if I can improve it to make it more efficient.

import requests
from bs4 import BeautifulSoup
Amazon=str(input("Please enter an Amazon music url:"))
r=requests.get(Amazon)
soup = BeautifulSoup(r.text,'html.parser')
name=soup.find_all('a', attrs={'class':'a-link-normal a-color-base TitleLink a-text-bold'}) #find out the names of the track title
time=soup.find_all('td',attrs={'class':'a-text-right a-align-center'}) #find the duration of the track
artist= soup.find('a', attrs={'id':'ProductInfoArtistLink'}) #find the creator of the track, which for now can only take one
for i in range(1,len(name),2):
 print(str(int(i/2+1))+'. '+name[int(i)].text+' - '+ artist.text + ' (' + time[int((i-1)/2)].text[12:16] + ')') 
#first int produces a placeholder number for the track e.g 1., 2.
#second int produces track name, which len shows twice of number of tracks
#artist text gives artist name
#time gives time and puts it in brackets

Question 2

80 character line limit and comments need to be on their own line about the line they're supposed to be commenting about. I'm not listing this as an answer because these two things are literally the only things I can point out. Check out pep8

Question 3

Why do you need to do these shenanigans with twice the length and halving the integer? Is each track included twice in the results?

Question 4

Yeah, the stuff is doubled.

Question 5

Whenever you find yourself writing long (or sometimes even short) comments explaining a single line/a block of lines, you should ask yourself if this was not better placed in a function. Functions can be given a meaningful name and you can add a docstring to them (which can be considerably longer than a comment practically can). It also give you one obvious place to change if, for example, the Amazon Music website is changed at some point.

IMO, here the function names should already be self-explanatory enough, so I did not add any docstrings.

import requests
from bs4 import BeautifulSoup
from itertools import count
def get_soup(url):
 r = requests.get(url)
 r.raise_for_status()
 return BeautifulSoup(r.text, 'lxml')
def track_titles(soup):
 attrs = {'class': 'a-link-normal a-color-base TitleLink a-text-bold'}
 return [a.text for a in soup.find_all('a', attrs=attrs)[::2]]
def track_durations(soup):
 attrs = {'class': 'a-text-right a-align-center'}
 return [td.text.strip() for td in soup.find_all('td', attrs=attrs)]
def track_artist(soup):
 return soup.find('a', attrs={'id':'ProductInfoArtistLink'}).text
if __name__ == "__main__":
 url = input("Please enter an Amazon music url:")
 soup = get_soup(url)
 titles = track_titles(soup)
 durations = track_durations(soup)
 artist = track_artist(soup)
 for i, title, duration in zip(count(1), titles, durations):
 print(f"{i}. {title} - {artist} ({duration})")

Other things I changed:

Added a if __name__ == "__main__": guard to ensure you can import this module from another script.
Used the fact that input already returns a string (in Python 3).
Added r.raise_for_status() so that the code raises an exception if the request does not succeed.
Consistently followed Python's official style-guide, PEP8.
Used the usually faster lxml parser.
Iterate over the elements instead of the indices.
Used the recently introduced f-string to make the formatting easier.
Used str.strip instead of hardcoding indices for the duration.
Made the functions do all the cleanup.

Question 6

i wanted to make something that parsed from Amazon music and generated a tracklist for musicbrainz to help add to their database tracklists easier. A format that goes like this

Question 7

No. Title - Artist (mm:ss) (as in 1. Planet Telex - Radiohead (4:21))

Question 8

@DurianJaykin I have never used Amazon Music. Is it possible to post an example link or would I need an account for it to work properly?

Question 9

amazon.com/End-Time-EP-Jim-Yosef/dp/B01DPY73E8

Question 10

There is no need for an account, just view page source and you should find the html needed.

Graipher Graipher 41.6k7 gold badges70 silver badges134 bronze badges · Accepted Answer · 2018-11-09 10:49:29Z

Whenever you find yourself writing long (or sometimes even short) comments explaining a single line/a block of lines, you should ask yourself if this was not better placed in a function. Functions can be given a meaningful name and you can add a docstring to them (which can be considerably longer than a comment practically can). It also give you one obvious place to change if, for example, the Amazon Music website is changed at some point.

IMO, here the function names should already be self-explanatory enough, so I did not add any docstrings.

import requests
from bs4 import BeautifulSoup
from itertools import count
def get_soup(url):
 r = requests.get(url)
 r.raise_for_status()
 return BeautifulSoup(r.text, 'lxml')
def track_titles(soup):
 attrs = {'class': 'a-link-normal a-color-base TitleLink a-text-bold'}
 return [a.text for a in soup.find_all('a', attrs=attrs)[::2]]
def track_durations(soup):
 attrs = {'class': 'a-text-right a-align-center'}
 return [td.text.strip() for td in soup.find_all('td', attrs=attrs)]
def track_artist(soup):
 return soup.find('a', attrs={'id':'ProductInfoArtistLink'}).text
if __name__ == "__main__":
 url = input("Please enter an Amazon music url:")
 soup = get_soup(url)
 titles = track_titles(soup)
 durations = track_durations(soup)
 artist = track_artist(soup)
 for i, title, duration in zip(count(1), titles, durations):
 print(f"{i}. {title} - {artist} ({duration})")

Other things I changed:

Added a if __name__ == "__main__": guard to ensure you can import this module from another script.
Used the fact that input already returns a string (in Python 3).
Added r.raise_for_status() so that the code raises an exception if the request does not succeed.
Consistently followed Python's official style-guide, PEP8.
Used the usually faster lxml parser.
Iterate over the elements instead of the indices.
Used the recently introduced f-string to make the formatting easier.
Used str.strip instead of hardcoding indices for the duration.
Made the functions do all the cleanup.

i wanted to make something that parsed from Amazon music and generated a tracklist for musicbrainz to help add to their database tracklists easier. A format that goes like this
No. Title - Artist (mm:ss) (as in 1. Planet Telex - Radiohead (4:21))
@DurianJaykin I have never used Amazon Music. Is it possible to post an example link or would I need an account for it to work properly?
There is no need for an account, just view page source and you should find the html needed.

Stack Exchange Network

Python webscrape Amazon Music, generate tracklist

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Python webscrape Amazon Music, generate tracklist

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions