This code is for part of my first major contribution to open-source, for the socli project. The purpose is to allow the user to select a site from the stack exchange network to browse and fill two variables with the appropriate urls (used through the rest of the program). Since the stack exchange network is adding new sites all the time, I wanted to make it decently easy to update the stored list. Also, I'm lazy, and didn't want to manually create a dictionary =) So I use a text file with the list of sites copy-pasted into it and the code takes care of the rest.
Approach
The purpose of the code is to go through a text file named sites.txt
filled with urls in a format like
https://codereview.stackexchange.com
https://physics.stackexchange.com
Taking the above, it would then produce a dictionary looking like
{'codereview':'https://codereview.stackexchange.com', 'physics':'https://physics.stackexchange.com'}
It would also print
Sites: codereview, physics
and allow the user to select one of the above, storing the url and a modified version of the url in a variable. I'd like to know if I've done this in the best possible way, and also get general feedback.
Code
import re
siteslst = {}
with open('sites.txt', 'r') as s:
data = s.read()
data = data.splitlines()
for i in data:
keyVal = re.search(r'(\w+)\.',str(i)).group(1)
siteslst[str(keyVal)] = str(i)
testInput = siteslst.keys()
print('Sites: ', ', '.join(testInput), '\n')
validInput = False
while validInput == False:
searchSite = input("which site would you like to search? ")
if searchSite in testInput:
url = siteslst[searchSite]
urlSearch = siteslst[searchSite]+'/search?q='
validInput = True
else:
validInput = False
1 Answer 1
Here is a list of things I would improve:
- variable naming - PEP8 recommends to use lower_case_with_underscores style of variable names
- instead of a regular expression, you can use
urlparse
to parse the domain name - you can use a dictionary comprehension to construct a dictionary of sites
- you don't actually need a boolean flag for your
while
loop - instead initialize an endless loop and break it once the input is valid - instead of calling
.keys()
and passing the "viewkeys" result tostr.join()
- you can pass your dictionary directly
Improved code:
from urllib.parse import urlparse
with open('sites.txt') as input_file:
sites = {
urlparse(url).netloc.split('.')[0]: url
for url in input_file
}
print('Sites: ', ', '.join(sites), '\n')
while True:
search_site = input("Which site would you like to search?\n")
if search_site in sites:
url_search = sites[search_site]+'/search?q='
print(url_search)
break
Further thoughts:
- make the code more modular and extract parts of the code into separate functions - at this moment, it is clear that you have 2 separate high-level things you do in the code - getting the dictionary of sites and getting the user's input
- put the execution code to under the
if __name__ == '__main__':
to avoid the code to be executed on import - you don't actually need to handle the list of sites manually -
StackExchange
provides an API endpoint with that information
-
\$\begingroup\$ I changed two things - one, in the
if
statement, I removedprint(url_search)
because I don't need that, and second, I addedurl = sites[search_site]
. Doing this for some reason caused it to ask the question twice. Why might that be? \$\endgroup\$auden– auden2017年07月11日 00:48:35 +00:00Commented Jul 11, 2017 at 0:48 -
1\$\begingroup\$ @heather make sure that you definitely
break
inside theif search_site in sites:
. Thanks. \$\endgroup\$alecxe– alecxe2017年07月11日 00:54:06 +00:00Commented Jul 11, 2017 at 0:54