Cached shortened urls

Question 1

I'm currently writing my own cache for shortened URLs. At application startup, I read every stored url:uuid pair from the database into a global dict.

When a person enters a URL, the code checks whether it is already in the dict. If it is, we reuse the uuid instead of creating a new one. If it is not, we insert it into the database and return the generated uuid.

My goal is to cache the shortened URLs so that lookups don't take any extra "hits" on the database and the existing url:uuid pairs are reused.

from lib.database import Stores, Urls

SHORTENED_URLS: dict = {}
DOMAIN = 'https://helloworld.com/'

# Preload every url -> uuid pair that is already stored in the db
for i in Urls.get_all_by_store():
    SHORTENED_URLS[i.url] = i.uuid


def generate_url(url):
    # Check if the URL is in the dict
    if url in SHORTENED_URLS:
        # Return the uuid from the "cached" dict
        return f'{DOMAIN}{SHORTENED_URLS[url]}'
    # Else get the uuid from the database
    # Database will try to insert, if duplicated then get the uuid
    generated = Urls.get_uuid(url)
    # Add the url : uuid pair to the cache
    SHORTENED_URLS[url] = generated
    return f'{DOMAIN}{generated}'


if __name__ == '__main__':
    get_url = generate_url('https://www.testing.com')
    print(get_url)

DATABASE

import string
from random import choices
from time import sleep

import peewee
from peewee import ForeignKeyField, IntegerField, Model, TextField

# postgres_pool and Stores are assumed to be defined earlier in this module (not shown)


# ------------------------------------------------------------------------------- #
# Redirect urls
# ------------------------------------------------------------------------------- #
class Urls(Model):
    store_id = IntegerField(column_name='store_id')
    url = TextField(column_name='url')
    uuid = TextField(column_name='uuid')
    store = ForeignKeyField(Stores, backref='urls')

    class Meta:
        database = postgres_pool
        db_table = "urls"

    @classmethod
    def get_all_by_store(cls):
        try:
            return cls.select().where((cls.store_id == Stores.store_id))
        except peewee.IntegrityError as err:
            # No url is in scope here, so it is left out of the message
            print(f"{type(err).__name__} at line {err.__traceback__.tb_lineno} of {__file__}: {err}")
            return False

    @classmethod
    def get_uuid(cls, url):
        try:
            # Reuse the uuid if this url is already stored
            return cls.select().where((cls.store_id == Stores.store_id) & (cls.url == url)).get().uuid
        except Urls.DoesNotExist:
            # Otherwise keep generating uuids until one inserts without a key clash
            while True:
                try:
                    gen_uuid = ''.join(choices(string.ascii_letters + string.digits, k=8))
                    cls.insert(
                        store_id=Stores.store_id,
                        url=url,
                        uuid=gen_uuid
                    ).execute()
                    return gen_uuid
                except peewee.IntegrityError as err:
                    print(f"Duplicated key -> {err}")
                    postgres_pool.rollback()
                    sleep(1)
        except peewee.IntegrityError as err:
            print(f"{type(err).__name__} at line {err.__traceback__.tb_lineno} of {__file__}, {url}: {err}")
            return False

My question is: is there anything I can do to improve the shortened URL cache?

Comment from FMc

I'm just guessing, but this has the look of code that is part of a web service. Why is that relevant? Most web services are deployed to take advantage of multiple processors on a host. In that context, caching URLs in a simple dict won't work as intended. How do you intend to use this code: on a single process or across many?

Answer

Re. for i in Urls.get_all_by_store(): beware of putting significant work like this in the global namespace. It sits outside your __main__ guard, so it runs, and incurs a delay, whenever someone imports your module.
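One way to keep that work out of import time is to fill the cache lazily, on first use. The following is only a minimal sketch: load_pairs_from_db is a hypothetical stand-in for Urls.get_all_by_store(), and the sample url/uuid pair is made up so the snippet runs without a database.

```python
from typing import Callable, Dict, Optional


def load_pairs_from_db() -> Dict[str, str]:
    """Hypothetical stand-in for Urls.get_all_by_store(); returns url -> uuid."""
    return {'https://www.testing.com': 'aB3dE9xZ'}  # made-up sample data


# Nothing is loaded at import time; the cache starts out empty.
_cache: Optional[Dict[str, str]] = None


def get_cache(loader: Callable[[], Dict[str, str]] = load_pairs_from_db) -> Dict[str, str]:
    """Return the url -> uuid cache, filling it on the first call only."""
    global _cache
    if _cache is None:
        _cache = dict(loader())
    return _cache


if __name__ == '__main__':
    # The load happens here, on first use, not when the module is imported.
    print(get_cache()['https://www.testing.com'])
```

Importing this module is now cheap, and the cost of the full read is paid by the first caller of get_cache() instead.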

As @FMc warns, this program - unless it's single-process - cannot scale. Caching is a very difficult and complicated thing to get right. As soon as there are multiple processes serving requests for your clients, how are you going to coordinate in-memory cache between them? There are off-the-shelf solutions for this, but broadly, I suspect that caching should not be your only concern when scaling. There's a whole constellation of decisions you need to make around service architecture that will influence which caching solution you need, or indeed if you need one at all.
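To make the coordination problem concrete, here is a small simulation rather than a real deployment. The Worker class and the plain dict standing in for the database are both assumptions for illustration; each Worker plays the role of one server process with its own private copy of the cache.

```python
from typing import Dict

# Shared source of truth; a plain dict stands in for the database here.
database: Dict[str, str] = {}


class Worker:
    """Simulates one server process with its own private in-memory cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        self.db_hits = 0

    def shorten(self, url: str) -> str:
        # A cache hit is only possible if *this* worker has seen the url before.
        if url in self.cache:
            return self.cache[url]
        self.db_hits += 1
        uuid = database.setdefault(url, f'id{len(database)}')
        self.cache[url] = uuid
        return uuid


if __name__ == '__main__':
    a, b = Worker('a'), Worker('b')
    a.shorten('https://www.testing.com')
    b.shorten('https://www.testing.com')  # b cannot see a's cache entry
    print(a.db_hits, b.db_hits)
```

Both workers end up going to the database for the same URL, because neither can see what the other has cached. The results stay consistent only because the database remains the single source of truth.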

Comment from the asker

Thanks for the answer, and I'm sorry for the late response! I want to wish you and everyone who has read this answer a Merry Christmas! :) You are right, as is @FMc. I do feel like I overthought this and could just call the database whenever I actually need it, instead of creating my own service architecture around it. It's most likely not needed, as I won't be as big as Bitly or any other site that shortens URLs.
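For reference, a cache-free version along those lines might look roughly like the sketch below. It uses the standard-library sqlite3 module with an in-memory database purely so that it runs on its own; the original code targets Postgres through peewee, and the table layout and function names here are assumptions, not the asker's schema.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits


def connect() -> sqlite3.Connection:
    """Create an in-memory database with uniqueness enforced on both columns."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE urls (url TEXT PRIMARY KEY, uuid TEXT UNIQUE NOT NULL)')
    return conn


def get_uuid(conn: sqlite3.Connection, url: str) -> str:
    """Return the stored uuid for url, inserting a fresh one if the url is new."""
    while True:
        candidate = ''.join(secrets.choice(ALPHABET) for _ in range(8))
        with conn:  # commits on success, rolls back on error
            # OR IGNORE skips the insert if the url or the uuid already exists.
            conn.execute(
                'INSERT OR IGNORE INTO urls (url, uuid) VALUES (?, ?)',
                (url, candidate),
            )
        row = conn.execute('SELECT uuid FROM urls WHERE url = ?', (url,)).fetchone()
        if row is not None:
            return row[0]
        # No row means the candidate uuid collided with another url's; try again.


if __name__ == '__main__':
    conn = connect()
    print(get_uuid(conn, 'https://www.testing.com'))
```

Because the database itself enforces uniqueness, any number of processes can call get_uuid without sharing in-memory state.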

Reinderien · Accepted Answer · 2021-12-08 14:42:49Z

