We all know that the famous Twitch log site OverRustleLogs is getting shut down, so I decided to do some web scraping to download my favourite streamer's logs using BeautifulSoup. How can I make this code run more efficiently?
import requests
import os
import shutil
from bs4 import BeautifulSoup

URL = 'https://overrustlelogs.net/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
names_3 = [i.text.replace('\n ', '/').replace('\n', '') for i in soup[0].findAll('a',{'class':'list-group-item list-group-item-action'})]
for name in names_3:
    url_n = 'https://overrustlelogs.net'+str(name)
    page = requests.get(url_n)
    soup_n = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
    names_3_n = [i.text.replace('\n ', '/').replace('\n', '') for i in soup_n[0].findAll('a', {'class':'list-group-item list-group-item-action'})]
    for file in names_3_n:
        try:
            os.makedirs(f'./files{name}{file}')
        except:
            if FileExistsError:
                shutil.rmtree(f'./files{name}{file}')
                os.makedirs(f'./files{name}{file}')
        url_n1 = 'https://overrustlelogs.net'+str(name)+str(file)
        print(url_n1)
        page = requests.get(url_n1)
        soup_n1 = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
        names_3_n1 = [i.text.replace('\n ', '/').replace('\n', '') for i in soup_n1[0].findAll('a', {'class':'list-group-item list-group-item-action'})]
        for filename in names_3_n1:
            f = open(f'./files{name}{file}{filename}', 'wb')
            r = requests.get(url_n1+str(filename))
            f.write(r.content)
1 Answer
Layout
The long lines make the code harder to understand. The black program can be used to automatically format the code more consistently.
Naming
The variable named names_3 is not very descriptive. For example, if it is a list of websites, something like site_names would be better.
Also, the _3 suffix is a little misleading. It could mean you always have 3 of something, or it is a third version of something. The suffix should be omitted.
The same is true for the variable named names_3_n. From your usage, it looks like a list of file names, in which case file_names would be better.
DRY
This expression is repeated 4 times:
f'./files{name}{file}
You can store it in a variable and use the variable instead.
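As a minimal sketch of that idea, the directory path can be built once and then reused. The name target_dir and the sample values for name, file and filename are assumptions for illustration, and a temporary directory stands in for ./files so the snippet is safe to run:

```python
import os
import shutil
import tempfile

# Sample values shaped like the scraped link text (assumed for illustration).
name = "/Streamer chatlog"
file = "/January 2020"
filename = "/2020-01-01.txt"

# A temporary directory stands in for './files'.
root = tempfile.mkdtemp()

# Build the path once instead of repeating the f-string.
target_dir = f"{root}{name}{file}"

# Same intent as the original try/except: start from an empty directory.
if os.path.isdir(target_dir):
    shutil.rmtree(target_dir)
os.makedirs(target_dir)

# The stored path is reused as the prefix of the file path.
with open(f"{target_dir}{filename}", "wb") as log_file:
    log_file.write(b"example")
```

With the path in one variable, a later change to the directory layout only has to be made in one place.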
You created the URL variable, but you didn't use it everywhere that you could have:
url_n1 = 'https://overrustlelogs.net'+str(name)+str(file)
I think you can also simplify that line above using an f-string instead of concatenation operators.
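A sketch of what that could look like is below. Note that URL ends with a slash while the concatenated string does not, so this assumes the scraped names start with '/', as the original concatenation implies. The name base_url and the sample values are hypothetical:

```python
URL = "https://overrustlelogs.net/"

# Sample values (assumed); the scraped names appear to start with '/'.
name = "/Streamer chatlog"
file = "/January 2020"

# URL ends with '/', so strip it once to avoid a double slash.
base_url = URL.rstrip("/")

# An f-string replaces the '+' and str() calls.
url_n1 = f"{base_url}{name}{file}"
print(url_n1)
```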
Documentation
You could add a docstring at the top of the code to summarize its purpose.
Open
It is good practice to use a with statement to open a file:
with open(f'./files{name}{file}{filename}', 'wb') as input_file:
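The original loop never closes the files it opens. The sketch below shows the pattern in isolation; the sample bytes stand in for r.content, the temporary directory stands in for the real target folder, and the name log_file is only a suggestion:

```python
import os
import tempfile

# Stand-in for r.content from requests.get(...) (assumed sample bytes).
content = b"[2020-01-01 00:00:00 UTC] user: hello\n"

# A temporary directory stands in for f'./files{name}{file}'.
target_dir = tempfile.mkdtemp()
path = os.path.join(target_dir, "2020-01-01.txt")

# The file is closed when the block exits, even if write() raises.
with open(path, "wb") as log_file:
    log_file.write(content)

print(log_file.closed)
```

Because the file is closed as soon as the block ends, the script does not keep one open file handle per downloaded log.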