I'm currently writing my own cached URL shortener. At application start-up I read every stored url:uuid pair from the database into a global dict.
When a user enters a URL, the code checks whether it is already in the dict. If it is, we reuse the existing uuid instead of creating a new one. If it isn't, we insert it into the database and return the generated uuid.
My goal is to cache the shortened URLs so that lookups don't cost any extra "hits" on the database and existing url:uuid pairs are reused.
from lib.database import Stores, Urls

SHORTENED_URLS: dict = {}
DOMAIN = 'https://helloworld.com/'

# Load every url -> uuid pair that is already stored in the database
for i in Urls.get_all_by_store():
    SHORTENED_URLS[i.url] = i.uuid


def generate_url(url):
    # Check if the URL is in the dict
    if url in SHORTENED_URLS:
        # Return the uuid from the "cached" dict
        return f'{DOMAIN}{SHORTENED_URLS[url]}'
    # Otherwise ask the database: it looks the url up first
    # and only inserts a new row (with a new uuid) if it is missing
    generated = Urls.get_uuid(url)
    if not generated:
        # get_uuid returns False on a database error; don't cache it or build a URL from it
        return None
    # Remember the url : uuid pair in the in-memory dict
    SHORTENED_URLS[url] = generated
    return f'{DOMAIN}{generated}'


if __name__ == '__main__':
    get_url = generate_url('https://www.testing.com')
    print(get_url)
DATABASE
import string
from random import choices
from time import sleep

import peewee
from peewee import ForeignKeyField, IntegerField, Model, TextField

# postgres_pool and Stores are defined elsewhere in lib.database (not shown).

# ------------------------------------------------------------------------------- #
# Redirect urls
# ------------------------------------------------------------------------------- #
class Urls(Model):
    store_id = IntegerField(column_name='store_id')
    url = TextField(column_name='url')
    uuid = TextField(column_name='uuid')
    store = ForeignKeyField(Stores, backref='urls')

    class Meta:
        database = postgres_pool
        db_table = "urls"

    @classmethod
    def get_all_by_store(cls):
        try:
            return cls.select().where((cls.store_id == Stores.store_id))
        except peewee.IntegrityError as err:
            # No `url` variable exists in this method, so it is left out of the message
            print(f"{type(err).__name__} at line {err.__traceback__.tb_lineno} of {__file__}: {err}")
            return False

    @classmethod
    def get_uuid(cls, url):
        try:
            return cls.select().where((cls.store_id == Stores.store_id) & (cls.url == url)).get().uuid
        except Urls.DoesNotExist:
            # Not stored yet: generate random 8-character codes until one inserts cleanly
            while True:
                try:
                    gen_uuid = ''.join(choices(string.ascii_letters + string.digits, k=8))
                    cls.insert(
                        store_id=Stores.store_id,
                        url=url,
                        uuid=gen_uuid
                    ).execute()
                    return gen_uuid
                except peewee.IntegrityError as err:
                    # Duplicate key: undo the failed insert and try another code
                    print(f"Duplicated key -> {err}")
                    postgres_pool.rollback()
                    sleep(1)
        except peewee.IntegrityError as err:
            print(f"{type(err).__name__} at line {err.__traceback__.tb_lineno} of {__file__}, {url}: {err}")
            return False
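For comparison, the collision-retry loop in get_uuid does not strictly need the sleep(1): a collision on a random 8-character code is rare, and waiting does not make the next random draw any less likely to collide. The sketch below is a standalone illustration only. It assumes a plain set named taken stands in for the unique index on the uuid column, new_code is a hypothetical helper name, and it swaps random.choices for the secrets module.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # same 62 characters as the original


def new_code(taken: set, length: int = 8, max_attempts: int = 10) -> str:
    """Return a random code not already in `taken` (hypothetical helper).

    `taken` is a stand-in for the database's unique index on the uuid column.
    """
    for _ in range(max_attempts):
        code = ''.join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in taken:
            taken.add(code)  # reserve it so later calls cannot return it again
            return code
    # Bounded retries instead of `while True`, so a full code space fails loudly
    raise RuntimeError(f'no free code after {max_attempts} attempts')
```

With a real table, the unique index (and catching IntegrityError on insert) must stay the final check, because another process can claim the same code between a membership test and the insert.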
My question is: is there anything I can do to improve the shortened-URL cache?
Comment by FMc (Dec 7, 2021 at 17:29): I'm just guessing, but this has the look of code that is part of a web service. Why is that relevant? Most web services are deployed to take advantage of multiple processors on a host. In that context, caching URLs in a simple dict won't work as intended. How do you intend to use this code: on a single process or across many?
1 Answer
Re. for i in Urls.get_all_by_store(): beware of putting significant work like this in the global namespace. It bypasses your if __name__ == '__main__' check and will incur a delay (a full database query) whenever someone imports your module.
As @FMc warns, unless this program is single-process, it cannot scale. Caching is very difficult to get right. As soon as multiple processes are serving requests for your clients, how are you going to keep their separate in-memory caches consistent? There are off-the-shelf solutions for this, but broadly, I suspect that caching should not be your only concern when scaling. There's a whole constellation of decisions you need to make around service architecture that will influence which caching solution you need, or indeed whether you need one at all.
Comment by PythonNewbie (Dec 12, 2021 at 12:41): Thanks for the answer, and I'm sorry for the late response! I want to wish you and everyone who has read this answer a Merry Christmas! :) You are right, as is @FMc. I do feel like I overthought this: I could simply query the database whenever I actually need it, instead of building my own service architecture around it, which is most likely unnecessary since I won't be as big as Bitly or other URL-shortening sites.