Web scraping code to import logs from a website which is about to die

Question 1

We all know that the famous Twitch log site OverRustleLogs is getting shut down, so I decided to do some web scraping with BeautifulSoup to download my favourite streamer's logs. How can I make this code run more efficiently?

import requests
import os
import shutil
from bs4 import BeautifulSoup
URL = 'https://overrustlelogs.net/'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
names_3 = [i.text.replace('\n ', '/').replace('\n', '') for i in soup[0].findAll('a',{'class':'list-group-item list-group-item-action'})]
for name in names_3:
    url_n = 'https://overrustlelogs.net'+str(name)
    page = requests.get(url_n)
    soup_n = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
    names_3_n = [i.text.replace('\n ', '/').replace('\n', '') for i in soup_n[0].findAll('a', {'class':'list-group-item list-group-item-action'})]
for file in names_3_n:
    try:
        os.makedirs(f'./files{name}{file}')
    except:
        if FileExistsError:
            shutil.rmtree(f'./files{name}{file}')
            os.makedirs(f'./files{name}{file}')
    url_n1 = 'https://overrustlelogs.net'+str(name)+str(file)
    print(url_n1)
    page = requests.get(url_n1)
    soup_n1 = BeautifulSoup(page.content, 'html.parser').findAll('div', {'class': 'list-group'})
    names_3_n1 = [i.text.replace('\n ', '/').replace('\n', '') for i in soup_n1[0].findAll('a', {'class':'list-group-item list-group-item-action'})]
    for filename in names_3_n1:
        f = open(f'./files{name}{file}{filename}', 'wb')
        r = requests.get(url_n1+str(filename))
        f.write(r.content)

Answer 1

Layout

The long lines make the code harder to understand. The black program can be used to automatically format the code more consistently.

Naming

The variable named names_3 is not very descriptive. For example, if it is a list of websites, something like site_names would be better. Also, the _3 suffix is a little misleading. It could mean you always have 3 of something, or it is a third version of something. The suffix should be omitted.

The same is true for the variable named names_3_n. From your usage, it looks like a list of file names, in which case file_names would be better.

DRY

This expression is repeated 4 times:

f'./files{name}{file}'

You can store it in a variable and use the variable instead.
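
As a rough sketch of that suggestion, the path can be built once and reused. The helper name make_clean_dir and the root parameter are made up for illustration, and the sketch assumes that name and file each start with "/", as they appear to in the question's code. It also swaps the try/except for shutil.rmtree(..., ignore_errors=True), which is a simplification rather than the original approach.

```python
import os
import shutil


def make_clean_dir(name: str, file: str, root: str = "./files") -> str:
    """Build the target directory path once, then (re)create it empty."""
    # Hypothetical helper: `name` and `file` are assumed to start with "/".
    target_dir = f"{root}{name}{file}"
    # Remove any earlier copy; a missing directory is silently ignored.
    shutil.rmtree(target_dir, ignore_errors=True)
    os.makedirs(target_dir)
    return target_dir
```

The returned target_dir can then be reused when opening each file, for example f'{target_dir}{filename}'.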

You created the URL variable, but you didn't use it everywhere that you could have:

url_n1 = 'https://overrustlelogs.net'+str(name)+str(file)

I think you can also simplify that line above using an f-string instead of concatenation operators.
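
A minimal sketch of the f-string version follows. Note that URL in the question ends with "/", while the scraped names appear to start with "/", so this sketch assumes the trailing slash is dropped from the constant to avoid a double slash. The values of name and file are made-up placeholders, not real entries from the site.

```python
# Assumption: the constant is defined without the trailing slash.
URL = "https://overrustlelogs.net"

# Placeholder values; in the question these come from the scraped link text.
name = "/SomeChannel chatlog"
file = "/January 2020"

# One f-string replaces the chained + and str() calls.
url_n1 = f"{URL}{name}{file}"
```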

Documentation

You could add a docstring at the top of the code to summarize its purpose.

Open

It is good practice to use with to open a file, so that the file is closed automatically even if an error occurs:

with open(f'./files{name}{file}{filename}', 'wb') as input_file:
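
For completeness, here is a small sketch of how the write could look inside a with block. The helper name save_bytes is hypothetical, and the handle is called output_file here because the file is opened for writing.

```python
def save_bytes(path: str, data: bytes) -> None:
    """Write data to path in binary mode."""
    # The with block closes the file when the block exits, even on error.
    with open(path, "wb") as output_file:
        output_file.write(data)
```

In the question's loop this would be called as save_bytes(f'./files{name}{file}{filename}', r.content).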

toolic · Answer 1 · 2025-01-27 11:39:36Z
