Selecting site to browse in python

Question 1

This code is for part of my first major contribution to open-source, for the socli project. The purpose is to allow the user to select a site from the stack exchange network to browse and fill two variables with the appropriate urls (used through the rest of the program). Since the stack exchange network is adding new sites all the time, I wanted to make it decently easy to update the stored list. Also, I'm lazy, and didn't want to manually create a dictionary =) So I use a text file with the list of sites copy-pasted into it and the code takes care of the rest.

Approach

The purpose of the code is to go through a text file named sites.txt filled with urls in a format like

https://codereview.stackexchange.com
https://physics.stackexchange.com

Taking the above, it would then produce a dictionary looking like

{'codereview':'https://codereview.stackexchange.com', 'physics':'https://physics.stackexchange.com'}

It would also print

Sites: codereview, physics

and allow the user to select one of the above, storing the url and a modified version of the url in a variable. I'd like to know if I've done this in the best possible way, and also get general feedback.

Code

import re
siteslst = {}
with open('sites.txt', 'r') as s:
 data = s.read()
 data = data.splitlines()
 for i in data:
 keyVal = re.search(r'(\w+)\.',str(i)).group(1)
 siteslst[str(keyVal)] = str(i)
testInput = siteslst.keys()
print('Sites: ', ', '.join(testInput), '\n')
validInput = False
while validInput == False:
 searchSite = input("which site would you like to search? ")
 if searchSite in testInput:
 url = siteslst[searchSite]
 urlSearch = siteslst[searchSite]+'/search?q='
 validInput = True
 else:
 validInput = False

Question 2

Here is a list of things I would improve:

variable naming - PEP8 recommends to use lower_case_with_underscores style of variable names
instead of a regular expression, you can use urlparse to parse the domain name
you can use a dictionary comprehension to construct a dictionary of sites
you don't actually need a boolean flag for your while loop - instead initialize an endless loop and break it once the input is valid
instead of calling .keys() and passing the "viewkeys" result to str.join() - you can pass your dictionary directly

Improved code:

from urllib.parse import urlparse
with open('sites.txt') as input_file:
 sites = {
 urlparse(url).netloc.split('.')[0]: url
 for url in input_file
 }
print('Sites: ', ', '.join(sites), '\n')
while True:
 search_site = input("Which site would you like to search?\n")
 if search_site in sites:
 url_search = sites[search_site]+'/search?q='
 print(url_search)
 break

Further thoughts:

make the code more modular and extract parts of the code into separate functions - at this moment, it is clear that you have 2 separate high-level things you do in the code - getting the dictionary of sites and getting the user's input
put the execution code to under the if __name__ == '__main__': to avoid the code to be executed on import
you don't actually need to handle the list of sites manually - StackExchange provides an API endpoint with that information

Question 3

I changed two things - one, in the if statement, I removed print(url_search) because I don't need that, and second, I added url = sites[search_site]. Doing this for some reason caused it to ask the question twice. Why might that be?

Question 4

@heather make sure that you definitely break inside the if search_site in sites:. Thanks.

alecxe alecxealecxe 17.5k8 gold badges52 silver badges93 bronze badges · Accepted Answer · 2017-07-10 20:40:35Z

Here is a list of things I would improve:

variable naming - PEP8 recommends to use lower_case_with_underscores style of variable names
instead of a regular expression, you can use urlparse to parse the domain name
you can use a dictionary comprehension to construct a dictionary of sites
you don't actually need a boolean flag for your while loop - instead initialize an endless loop and break it once the input is valid
instead of calling .keys() and passing the "viewkeys" result to str.join() - you can pass your dictionary directly

Improved code:

from urllib.parse import urlparse
with open('sites.txt') as input_file:
 sites = {
 urlparse(url).netloc.split('.')[0]: url
 for url in input_file
 }
print('Sites: ', ', '.join(sites), '\n')
while True:
 search_site = input("Which site would you like to search?\n")
 if search_site in sites:
 url_search = sites[search_site]+'/search?q='
 print(url_search)
 break

Further thoughts:

make the code more modular and extract parts of the code into separate functions - at this moment, it is clear that you have 2 separate high-level things you do in the code - getting the dictionary of sites and getting the user's input
put the execution code to under the if __name__ == '__main__': to avoid the code to be executed on import
you don't actually need to handle the list of sites manually - StackExchange provides an API endpoint with that information

I changed two things - one, in the if statement, I removed print(url_search) because I don't need that, and second, I added url = sites[search_site]. Doing this for some reason caused it to ask the question twice. Why might that be?
@heather make sure that you definitely break inside the if search_site in sites:. Thanks.

Stack Exchange Network

Selecting site to browse in python

Approach

Code

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Selecting site to browse in python

Approach

Code

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions