\$\begingroup\$

We all know that the famous Twitch log site OverRustleLogs is getting shut down, so I decided to do some web scraping with BeautifulSoup to download my favourite streamer's logs. How can I make this code run more efficiently?

import requests
import os
import shutil
from bs4 import BeautifulSoup
URL = 'https://overrustlelogs.net/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
names_3 = [i.text.replace('\n ', '/').replace('\n', '') for i in soup[0].findAll('a',{'class':'list-group-item list-group-item-action'})]
for name in names_3:
    url_n = 'https://overrustlelogs.net'+str(name)
    page = requests.get(url_n)
    soup_n = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
    names_3_n = [i.text.replace('\n ', '/').replace('\n', '') for i in soup_n[0].findAll('a', {'class':'list-group-item list-group-item-action'})]
for file in names_3_n:
    try:
        os.makedirs(f'./files{name}{file}')
    except:
        if FileExistsError:
            shutil.rmtree(f'./files{name}{file}')
            os.makedirs(f'./files{name}{file}')
    url_n1 = 'https://overrustlelogs.net'+str(name)+str(file)
    print(url_n1)
    page = requests.get(url_n1)
    soup_n1 = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
    names_3_n1 = [i.text.replace('\n ', '/').replace('\n', '') for i in soup_n1[0].findAll('a', {'class':'list-group-item list-group-item-action'})]
    for filename in names_3_n1:
        f = open(f'./files{name}{file}{filename}', 'wb')
        r = requests.get(url_n1+str(filename))
        f.write(r.content)
toolic
asked Apr 19, 2020 at 5:05
\$\endgroup\$

1 Answer

\$\begingroup\$

Layout

Long lines make the code harder to read. The black formatter can reformat the code automatically and give it a more consistent layout.

Naming

The variable named names_3 is not very descriptive. For example, if it is a list of websites, something like site_names would be better. Also, the _3 suffix is a little misleading. It could mean you always have 3 of something, or it is a third version of something. The suffix should be omitted.

The same is true for the variable named names_3_n. From your usage, it looks like a list of file names, in which case file_names would be better.

DRY

This expression appears 4 times, three times on its own and once as the start of the output file path:

f'./files{name}{file}'

You can store it in a variable and use the variable instead.
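
A minimal sketch of that idea, assuming name and file each begin with a slash, as they appear to in the question. The make_clean_dir helper and its base_dir parameter are hypothetical names. Note one deliberate change from the original: the bare except with its if FileExistsError check is swapped for except FileExistsError.

```python
import os
import shutil
import tempfile


def make_clean_dir(base_dir, name, file):
    """Create an empty directory for one listing, replacing any old copy."""
    # Build the path once; name and file are assumed to start with '/'.
    target_dir = f'{base_dir}{name}{file}'
    try:
        os.makedirs(target_dir)
    except FileExistsError:
        # As in the question: remove the old directory and recreate it.
        shutil.rmtree(target_dir)
        os.makedirs(target_dir)
    return target_dir


# Usage with a throwaway base directory and made-up listing names.
base_dir = tempfile.mkdtemp()
target_dir = make_clean_dir(base_dir, '/Example chatlog', '/April 2020')
print(target_dir)
```

The returned path can then be reused when building the output file name, so the expression is written only once.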

You created the URL variable, but you didn't use it everywhere that you could have:

url_n1 = 'https://overrustlelogs.net'+str(name)+str(file)

I think you can also simplify that line above using an f-string instead of concatenation operators.
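
As a rough sketch of both suggestions, assuming the scraped names start with a slash (which is why the trailing slash on URL is stripped first). BASE_URL and the sample name and file values are made up for illustration.

```python
URL = 'https://overrustlelogs.net/'

# The scraped names are assumed to start with '/', so drop URL's trailing
# slash to avoid a double slash in the result.
BASE_URL = URL.rstrip('/')

name = '/Example chatlog'  # hypothetical channel listing
file = '/April 2020'       # hypothetical month listing

# An f-string replaces the '+' concatenation and the str() calls.
url_n1 = f'{BASE_URL}{name}{file}'
print(url_n1)
```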

Documentation

You could add a docstring at the top of the code to summarize its purpose.

Open

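It is good practice to open a file with a with statement, which closes the file automatically even if an error occurs while writing:

with open(f'./files{name}{file}{filename}', 'wb') as output_file:
    output_file.write(r.content)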
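
To illustrate, here is a small self-contained sketch. The save_bytes helper is a hypothetical name, and the bytes literal stands in for r.content from the question.

```python
import os
import tempfile


def save_bytes(path, content):
    """Write content to path and return the file object after the block ends."""
    with open(path, 'wb') as output_file:
        output_file.write(content)
    # Leaving the with block has already closed the file.
    return output_file


path = os.path.join(tempfile.mkdtemp(), 'example.txt')
closed_file = save_bytes(path, b'example log line\n')
print(closed_file.closed)
```

By contrast, the loop in the question never calls f.close(), so each file stays open until Python happens to clean it up.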
answered Jan 27 at 11:39
\$\endgroup\$
