I am making a program that parses profiles from a website. Some profiles are "public" and others are "private" with limited information. Public profiles contain a lot more data and have ~15 more attributes.
Should I have one Profile
class and set all unobtainable attributes to None
? Or should I have 2 different classes and simply exclude unobtainable attributes if a profile is private?
Option #1
import requests
from bs4 import BeautifulSoup
from dataclasses import dataclass
@dataclass
class Profile:
url: str
is_private: bool
summary: str = None
level: int = None
class Parser:
def __init__(self):
self.session = requests.Session()
def get_profile(self, url):
resp = self.session.get(url)
soup = BeautifulSoup(resp.text, "html.parser")
profile_is_private = bool(soup.find("div", {"class": "profile_private_info"}))
if profile_is_private:
return Profile(url=url, is_private=profile_is_private)
summary=soup.find("div", {"class": "profile_summary"}).text,
level=int(soup.find("span", {"class": "friendPlayerLevelNum"}).text)
return Profile(url=url, is_private=profile_is_private, summary=summary, level=level)
parser = Parser()
public_profile = parser.get_profile("https://steamcommunity.com/id/afarnsworth")
print(public_profile)
private_profile = parser.get_profile("https://steamcommunity.com/id/private")
print(private_profile)
Option #2
import requests
from bs4 import BeautifulSoup
from dataclasses import dataclass
@dataclass
class BaseProfile:
url: str
is_private: bool
@dataclass
class ExtendedProfile(BaseProfile):
summary: str = None
level: int = None
class Parser:
def __init__(self):
self.session = requests.Session()
def get_profile(self, url):
resp = self.session.get(url)
soup = BeautifulSoup(resp.text, "html.parser")
profile_is_private = bool(soup.find("div", {"class": "profile_private_info"}))
if profile_is_private:
return BaseProfile(url=url, is_private=profile_is_private)
summary=soup.find("div", {"class": "profile_summary"}).text,
level=int(soup.find("span", class_="friendPlayerLevelNum").text)
return ExtendedProfile(url=url, is_private=profile_is_private, summary=summary, level=level)
parser = Parser()
public_profile = parser.get_profile("https://steamcommunity.com/id/afarnsworth")
print(public_profile)
private_profile = parser.get_profile("https://steamcommunity.com/id/private")
print(private_profile)
Both options seem to have downsides.
In option #1, the function returns a Profile
object that might have over 15 attributes set to None
because the profile was private and they couldn't be obtained.
In option #2, the function returns an entirely different object each time. So the user would have to manually check which object the function returned--either BaseProfile
or ExtendedProfile
Which is better practice for a public API? Or are there better alternatives for these kind of designs.
1 Answer 1
Definitely avoid #1. Something something billion dollar mistake: if you don't need nulls, eliminate them.
Option #2 is okay-ish. Why are summary
and level
still nullable? First, as you have them, their typehint is incorrect because it would be Optional[str]
instead; but I don't see how those would ever be None
(that's the whole reason you're using #2).
Don't bool(soup.find)
. Explicitly write soup.find() is not None
.
get_profile
should be hinted as get_profile(self, url: str) -> Union[BaseProfile, ExtendedProfile]
- or use a pipe if you're fancy.
-
\$\begingroup\$ Thank you! The only thing is that some attributes in
ExtendedProfile
might not be always present. For example, maybe a user has an empty summary. But I suppose I could just do an empty string then. \$\endgroup\$angel– angel2023年02月14日 01:53:56 +00:00Commented Feb 14, 2023 at 1:53 -
\$\begingroup\$ For that,
Optional
s withNone
s are probably unavoidable \$\endgroup\$Reinderien– Reinderien2023年02月14日 01:59:25 +00:00Commented Feb 14, 2023 at 1:59
Explore related questions
See similar questions with these tags.
get_friends
andget_comments
but instead decided to separate the profile object from the functions. Also yeah I was going to include the entire code but just tried to make it less verbose. Theis_private
just parses the HTML and checks the profile's privacy status and stores it in a bool. \$\endgroup\$