I have been working on a very simple code where I am I open a given file file which has a format of:
proxies.txt
hello:12345
hello:12345:apple:orange
test:4125
test:53953:hello:world
myserver:1337:usertest:passwordtest
what I want to do with these lines is that I want to return a format of e.g.
{'https': 'http://hello:12345:apple@orange'}
{'https': 'http://test:53953:hello@world'}
{'https': 'http://hello:12345'}
As we can see there is two scenarios where a line can be containing ip:port
and ip:port:username:password
.
import random
from typing import List, Dict
class ProxyManager(object):
def __init__(self, proxy_file_path=None):
self.proxies = self.load_proxies(proxy_file_path) if proxy_file_path else {}
@staticmethod
def load_proxies(proxy_file_path) -> List[Dict[str, str]]:
"""
proxy_file_path contains:
hello:12345 # ip:port
hello:12345:apple:orange # ip:port:username:password
test:4125 # ip:port
test:53 # ip:port
myserver:1337:usertest:passwordtest # ip:port:username:password
"""
with open(proxy_file_path) as proxy_file:
proxies = []
for proxy in proxy_file.readlines():
split_string = proxy.strip('\n').split(':')
proxy_string = f'{split_string[0]}:{split_string[1]}'
# If there is 4 : in the file, that means it contains user:pass
if len(split_string) == 4:
proxy_string = f'{proxy_string}:{split_string[2]}@{split_string[3]}'
proxies.append({'https': f'http://{proxy_string}'})
return proxies
def random_proxy(self) -> Dict[str, str]:
return random.choice(self.proxies)
def full_list(self) -> List[Dict[str, str]]:
return self.proxies
p = ProxyManager("test.txt")
print(p.full_list())
print(p.random_proxy())
The application does work as intended but I wonder if there is anything I could improve, what I am mostly looking at is the function load_proxies
where I believe the splitting could be better and probably more small mistakes I would believe.
Looking forward to get some nice review from you! You guys are awesome!
2 Answers 2
Proxy manager to return proxy format
Good points
The task of your code is very simple, with main good points:
Using of the
typing
moduleUsing static methods
Using
with
-statement to operate on a file
Documentation
Your documentation of load_proxies
does not describe what the function does. You can also use doctest
s:
@staticmethod
def load_proxies(proxy_file_path) -> List[Dict[str, str]]:
"""
Parse proxies into the format `http://ip:port` or `http://ip:port:user@password`.
>>> load_proxies(['hello:12345', 'hello:12345:apple:orange'])
[{'https': 'http://hello:12345'}, {'https': 'http://hello:12345:apple@orange'}]
>>> load_proxies(['test:4125', 'myserver:1337:usertest:passwordtest'])
[{'https': 'http://test:4125'}, {'https': 'http://myserver:1337:usertest@passwordtest'}]
"""
In the class constructor, note that proxies
will be an empty dictionary if proxy_file_path
is not passed else it will be a list of parsed proxies. Therefore, your declaration of full_list
is wrong if proxy_file_path
is not passed in the constructor. Since dictionaries can't be used inside a dictionary, just make the default value a list:
class ProxyManager(object):
def __init__(self, proxy_file_path=None):
self.proxies = self.load_proxies(proxy_file_path) if proxy_file_path else []
If you want to use sets, take a look at frozen sets and change the return type of full_list
accordingly (and also its name!).
Performance
When you do proxy_file.readlines()
, you're loading all the file into the memory as a list. If you have a small file, it'll work well, but it may raise a MemoryError for large files. So you can replace
for proxy in proxy_file.readlines():
with
for proxy in proxy_file:
Logic
Since 1) it's not an action (such as start
or create_proxy
) and 2) it doesn't receive arguments, you can use properties instead of methods for full_list
and random_proxy
:
@property
def random_proxy(self) -> Dict[str, str]:
return random.choice(self.proxies)
@property
def full_list(self) -> List[Dict[str, str]]:
return self.proxies
This allows you to call them
p = ProxyManager("test.txt")
print(p.full_list)
print(p.random_proxy)
Beware the name of full_list
: what if you need to change its implementation to another data structure (e.g. a set or a generator)? Existing code that called full_list
expecting it to return a list will break. If you want, you can simply name it proxies
.
Since proxy_string
will always start with "http://", you can already initialize the string with it, replacing
proxy_string = f'{split_string[0]}:{split_string[1]}'
# ...
proxies.append({'https': f'http://{proxy_string}'})
with
proxy_string = f'http://{split_string[0]}:{split_string[1]}'
# ...
proxies.append({'https': proxy_string})
This allows you to evaluate one f-string instead of two.
Since you will always use the first two elements of split_string
, you can use tuple unpacking to make the code more readable, replacing
split_string = proxy.strip('\n').split(':')
proxy_string = f'http://{split_string[0]}:{split_string[1]}'
with
ip, port, *args = proxy.strip('\n').split(':')
proxy_string = f'http://{ip}:{port}'
With this, you introduce a new args
variable: a list containing all the remaining strings from the split. So, you can check its length and add them to the proxy_string
, replacing
if len(split_string) == 4:
proxy_string = f'{proxy_string}:{split_string[2]}@{split_string[3]}'
with
if len(args) == 2:
user, password = args
proxy_string += f":{user}@{password}"
Note that you're doing two tasks inside one function: creating proxies strings that can be formatted as 1) ip:port
and 2) ip:post:user:password
. It's up to you, but you could extract the logic into two (private) separate functions:
@staticmethod
def load_proxies(proxy_file_path) -> List[Dict[str, str]]:
def _load_ip_proxy(value: str) -> str:
"""
Transform a `{ip}:{port}` string into a `http://{ip}:{port}` string.
"""
ip, port = value.split(":")
return f"http://{ip}:{port}"
def _load_ipup_proxy(value: str) -> str:
"""
Transform a `{ip}:{port}:{user}:{password}` string into a `http://{ip}:{port}:{user}@{password}` string.
"""
ip, port, user, password = value.split(":")
return f"http://{ip}:{port}:{user}@{password}"
with open(proxy_file_path) as proxy_file:
proxies = []
for proxy in proxy_file:
if proxy.count(":") == 1:
proxies.append({'https': _load_ip_proxy(proxy.strip('\n'))})
elif proxy.count(":") == 3:
proxies.append({'https': _load_ipup_proxy(proxy.strip('\n'))})
return proxies
A variation of the above code using Python's "switch statements":
@staticmethod
def load_proxies(proxy_file_path) -> List[Dict[str, str]]:
# ...
# Make a dictionary holding the parsing functions according to their `:` counts
parsing_functions = {1: _load_ip_proxy, 3: _load_ipup_proxy}
with open(proxy_file_path) as proxy_file:
proxies = []
# Use `line` instead of `proxy`
for line in proxy_file:
# Define `proxy` here, defining the strip logic in only one place
proxy = line.strip('\n')
# Get the parsing function by the number of `:` in the proxy
parsing_function = parsing_functions.get(proxy.count(":"))
if parsing_function is None:
# Do nothing/raise an exception if proxy does not contain
# neither one nor three `:` characters
continue
proxies.append({'https': parsing_function(proxy)})
return proxies
Refactored code
import random
from typing import List, Dict, Optional
# `class ClassName(object)` is not needed in Python 3.x
# https://stackoverflow.com/a/1203997/9997212
class ProxyManager:
# What is the type of `proxy_file_path`? You can use Optional
# https://docs.python.org/3/library/typing.html#typing.Optional
def __init__(self, proxy_file_path: Optional[str] = None):
# Make `self.proxies` virtually private by adding a `_` prefix
# https://stackoverflow.com/a/1641236/9997212
self._proxies = self.load_proxies(proxy_file_path) if proxy_file_path else []
# Is this method intended to be available to the outside world?
# If not, make it "private" by adding a `_` prefix.
@staticmethod
# What's the type of `proxy_file_path`?
def load_proxies(proxy_file_path: str) -> List[Dict[str, str]]:
"""
Parse proxies into the format `http://ip:port` or `http://ip:port:user@password`.
>>> load_proxies(['hello:12345', 'hello:12345:apple:orange'])
[{'https': 'http://hello:12345'}, {'https': 'http://hello:12345:apple@orange'}]
>>> load_proxies(['test:4125', 'myserver:1337:usertest:passwordtest'])
[{'https': 'http://test:4125'}, {'https': 'http://myserver:1337:usertest@passwordtest'}]
"""
def _load_ip_proxy(value: str) -> str:
"""
Transform a `{ip}:{port}` string into a `http://{ip}:{port}` string.
"""
ip, port = value.split(":")
return f"http://{ip}:{port}"
def _load_ipup_proxy(value: str) -> str:
"""
Transform a `{ip}:{port}:{user}:{password}` string into a `http://{ip}:{port}:{user}@{password}` string.
"""
ip, port, user, password = value.split(":")
return f"http://{ip}:{port}:{user}@{password}"
parsing_functions = {1: _load_ip_proxy, 3: _load_ipup_proxy}
with open(proxy_file_path) as proxy_file:
proxies = []
for line in proxy_file:
proxy = line.strip('\n')
parsing_function = parsing_functions.get(proxy.count(":"))
if parsing_function is None:
continue
proxies.append({'https': parsing_function(proxy)})
return proxies
@property
def random_proxy(self) -> Dict[str, str]:
return random.choice(self._proxies)
@property
def full_list(self) -> List[Dict[str, str]]:
return self._proxies
# If you import this file, this code will ALWAYS be executed, which
# may not be intended. Add a module condition:
# https://stackoverflow.com/q/419163/9997212
if __name__ == '__main__':
p = ProxyManager("test.txt")
print(p.full_list)
print(p.random_proxy)
```
-
\$\begingroup\$ In your refactored code you are missing the line about
parsing_functions
: For example:parsing_functions = {1: _load_ip_proxy, 3: _load_ipup_proxy}
It will throw a:NameError: name 'parsing_functions' is not defined
\$\endgroup\$Fanis Despoudis– Fanis Despoudis2021年08月22日 13:49:18 +00:00Commented Aug 22, 2021 at 13:49 -
\$\begingroup\$ Oh wow! I did not expected such a well code review. I feel very honored to get such a good review and also it was lots of good knowledge that I would never ever taught about at all! I have read everything multiple times and I do like the way you thought about this. Very well done and thanks for the help as well! \$\endgroup\$PythonNewbie– PythonNewbie2021年08月22日 17:00:46 +00:00Commented Aug 22, 2021 at 17:00
General Comments
I will second some of the @enzo comments regarding performance and logic. Additionally you will find the following observations helpful:
Compilation error if text file contains extra empty lines
When I copy paste your code in my editor and run it with the example file that contains more than 2 extra empty lines, I get an error:
Traceback (most recent call last):
File "/uncategorized/proxies.py", line 43, in <module>
p = ProxyManager("test.txt")
File "/uncategorized/proxies.py", line 7, in __init__
self.proxies = self.load_proxies(proxy_file_path) if proxy_file_path else {}
File "/uncategorized/proxies.py", line 26, in load_proxies
proxy_string = f'{split_string[0]}:{split_string[1]}'
IndexError: list index out of range
Returning a mutable proxy list
In the full_list
method or property or whatever you want to use there, you are returing the original proxies
list which is mutable by the client. This means that the client can take this list and modify it and send you for example another list:
def full_list(self) -> List[Dict[str, str]]:
return self.proxies
...
p = ProxyManager("test.txt")
l = p.full_list()
l.pop()
l.pop()
l.pop()
l.pop()
print(p.full_list()) # prints [{'https': 'http://hello:12345'}]
l.pop()
l.append({'https': 'http://goodbye:12345'})
print(p.full_list()) # prints [{'https': 'http://goodbye:12345'}]
This is not good if you want to just return something that the client cannot interfere with. Instead you should either return a frozen set or a copy of the proxies
list.
Method random_proxy
breaks if proxy_file_path
is empty
This breaks in @enzo answer as well. When you create a new instance of ProxyManager
without specifying the proxy_file_path
then calling the random_proxy
subsequently will throw an error:
p = ProxyManager()
print(p.random_proxy()) # Throws KeyError: 0
You should always check if the proxy list has been initialized before usage:
def random_proxy(self) -> Dict[str, str]:
return random.choice(self.proxies) if len(self.proxies) > 0 else {}
Catch exception when opening file
It doesn't hurt to catch an exception when opening files when they do not exist. I will let this answer explain it better.
-
\$\begingroup\$ Hello Fanis! Thank you so much for the help as well as I saw you commented @enzo as well. Both was very well written and I do see the improvements you have mentioned as well. However I was not able to reproduce the
Compilation error if text file contains extra empty lines
unless I did something wrong but I added 4 new lines after the last one and it never threw any error. Also do appreciate the review you gave and I agree. The catch exception is something I just added while I read ur review. You both are awesome! <3 \$\endgroup\$PythonNewbie– PythonNewbie2021年08月22日 17:03:21 +00:00Commented Aug 22, 2021 at 17:03