8
\$\begingroup\$

Coding challenge:

Convert a given long URL into a Tiny URL. The length of this Tiny URL should be fixed and the same for all URLs. You should return the full URL if given a shortened URL.

I am aware this code is technically flawed as there will be clashes owing to the limitations of the hash function in Python.

Therefore I am not specifically looking for coding suggestions on this specific library. Instead what I am curious to know is if this can be improved to use an even shorter URL extension (e.g: 6 characters) that is unique.

I considered using a different hashing function from the import hashing library but the output of these are usually 32-64 characters long and therefore certainly not a "Tiny" URL.

class TinyURL:
 def __init__(self):
 self.storedURLS = {}
 def encode(self, long):
 tiny = hex(hash(long))
 self.storedURLS[tiny] = long
 return tiny
 def decode(self, tiny):
 if tiny in self.storedURLS:
 return self.storedURLS[tiny]
 else:
 return "Tiny URL is not in the database"
t = TinyURL()
encoded = t.encode("google.com")
print(encoded)
decoded = t.decode(encoded)
print(decoded)
AlexV
7,3532 gold badges24 silver badges47 bronze badges
asked Jul 3, 2019 at 22:51
\$\endgroup\$

3 Answers 3

5
\$\begingroup\$

If you really want to cut down on the number of characters that you need to guarantee a certain amount of robustness, it would be wise to pick a greater pool of characters to choose from. At the moment you are basically left with 16 valid values (0, 1, ..., 9, A, ..., F). Case does not count here since 0xA is equal to 0xa. For 6 characters that leaves you with \16ドル^6=2^{24}\$ possible short-urls.

If you were to encode them using the base64 alphabet, you would have 64 characters to choose from, which grants you \64ドル^6=2^{36}\$ possible short-urls.

import base64
desired_length = 6
encoded_hash = base64.urlsafe_b64encode(
 str(hash("www.google.com")).encode("ascii")
)
minified_address = encoded_hash[:desired_length]

A character of a base64 string will represent 6bit instead of 4bit as a hex value does. As a consequence you would need to pick a length that is a multiple of two if you ever wanted to revert the encoded bytes, but that should not be necessary in your case. Take note that I opted for the url-safe version of the base64 alphabet which replaces / by _ and + by -.


It might also be worth to think about using a "stable" hash function, e.g. from hashlib or one of the general purpose hash functions presented here on this site, instead of the built-in hash(...) which uses a random seed to mitigate certain attacks (also here in the documentation of object.__hash__). But that depends on whether you want to map www.google.com to the same short-url every time your program is queried or not.

answered Jul 4, 2019 at 10:08
\$\endgroup\$
2
  • \$\begingroup\$ Is there not a risk that encoded_hash[:desired_length] would cause a clash as the first 6 characters may be the same but the last characters may be different? \$\endgroup\$ Commented Jul 4, 2019 at 10:10
  • 2
    \$\begingroup\$ Of course there is a risk. But the risk is also present if you just use a shorter hash. A hash of a given bit-length \$n\$ can only represent \2ドル^n\$ values without collision. If you were to encode them in hex and pick 6 characters you end up with \$n=24\$ (\$n=36\$ for base64). \$\endgroup\$ Commented Jul 4, 2019 at 10:19
6
\$\begingroup\$

I know this isn't really what you're asking for, but I have a couple suggestions about how the code is setup and written.

Coming from languages that use long as a type, a line like:

self.storedURLS[tiny] = long

Is a little jarring initially. No, long is not a type or builtin in Python, but I think it would be neater if a more descriptive name was used, and one that doesn't clash with other languages. I'd probably write it closer to

class TinyURL:
 def __init__(self):
 self.storedURLS = {}
 def encode(self, full_address):
 minified_address = hex(hash(full_address))
 self.storedURLS[minified_address] = full_address
 return minified_address
 def decode(self, minified_address):
 if minified_address in self.storedURLS:
 return self.storedURLS[minified_address]
 else:
 return "Tiny URL is not in the database"

I also disagree with a String message being returned in the result of a lookup failure in decode. What if the caller wants to check for failure? They'd have to check the return against a known error message, which is clunky.

I'd return None in the event of failure, then document that fact.

def decode(self, minified_address):
 """Returns the full address associated with the given minified address,
 or None in the event of an invalid minified_address.
 """
 return self.storedURLS.get(minified_address, None) # None is the default

That way, you can easily handle a failed lookup:

t = TinyURL()
full_address = t.decode("bad address")
if full_address:
 print("valid")
else:
 print("invalid")

And I think with the new assignment expression, you'll be able to write that even more succinctly:

if full_address := t.decode("bad address"):
 print("valid", full_address)
else:
 print("invalid")

Although apparently 3.8 isn't released yet? I think the above should work, although I don't think I can test it quite yet.

answered Jul 3, 2019 at 23:29
\$\endgroup\$
2
  • \$\begingroup\$ Thanks. I remember long being a builtin method in Visual Basic 6 (years and years ago!). I take your point though. Thanks \$\endgroup\$ Commented Jul 3, 2019 at 23:32
  • 2
    \$\begingroup\$ @EML Really, it's not a big deal, I just personally don't like the look of it. I switch back and forth between languages a fair amount, and that line gave me a short "they're assigning a type?" moment. I wouldn't be surprised if other Java/C++ writers feel similarly (although I'm curious if that's the case). \$\endgroup\$ Commented Jul 3, 2019 at 23:34
5
\$\begingroup\$

While I agree with the other answer that you should not return a special string denoting failure, in Python you should refrain from returning any special return value (though if you do, None is an OK choice).

Instead you usually want to raise an informative exception, which can then be handled by the caller. Here the code already raises a KeyError, which you could keep, or you could add a custom exception.

def decode(self, tiny_url):
 """Returns the full address associated with the given tiny url.
 Raises a `KeyError` in the event of an invalid `tiny_url`. """
 return self.urls[tiny_url]
...
try:
 decoded = t.decode(encoded)
except KeyError:
 print("Invalid")
print(decoded)

Note that I also changed the name of TinyURL.storedURLS in order to conform to Python's official style-guide, PEP8, which recommends using lower_case as variable and function names.

answered Jul 4, 2019 at 6:54
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.