I implemented the URL shortener class below, using ConcurrentHashMap to handle concurrency.
The short-URL building logic may not be optimal, but it ensures that only lowercase letters appear in a short URL of the length specified by the client.
Am I missing any important concepts here? Are there any improvements that could be made to the performance and the key-builder logic? And what are some follow-up questions or additional implementations that might be asked after writing this in a live coding interview?
package com.java.learning;

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class UrlShortener {

    private final Map<String, String> urlMap = new ConcurrentHashMap<>();
    private final int shortUrlLength;

    public UrlShortener(int shortUrlLength) {
        this.shortUrlLength = shortUrlLength;
    }

    public String shortenUrl(String longUrl) {
        if (longUrl.isEmpty()) {
            throw new IllegalArgumentException("long URL is empty");
        }
        String longUrlWithoutHttp = longUrl.replaceFirst("http://", "");
        String longUrlWithoutHttps = longUrlWithoutHttp.replaceFirst("https://", "");
        String longUrlWithoutWWW = longUrlWithoutHttps.replaceFirst("www.", "");
        String actualLongUrl = longUrlWithoutWWW.replaceAll("[^a-zA-Z0-9]", "");
        if (actualLongUrl.length() <= shortUrlLength) {
            return actualLongUrl;
        }
        return urlMap.computeIfAbsent(actualLongUrl, s -> {
            StringBuilder shortUrl = new StringBuilder(shortUrlLength);
            IntStream.range(0, shortUrlLength).forEach(i -> {
                int randomInt = ThreadLocalRandom.current().nextInt(62);
                int numericVal = actualLongUrl.charAt(i);
                char ch = (char) (Math.abs(randomInt - numericVal) % (26 - shortUrlLength) + 'a' + i);
                shortUrl.append(ch);
            });
            return shortUrl.toString();
        });
    }

    public static void main(String[] args) {
        UrlShortener urlShortener = new UrlShortener(6);
        System.out.println(urlShortener.shortenUrl("www.linkedin.com"));
        System.out.println(urlShortener.shortenUrl("http://linkedin.com/"));
        System.out.println(urlShortener.shortenUrl("https://linkedin.com/"));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        return o instanceof UrlShortener other
                && Objects.equals(other.shortUrlLength, this.shortUrlLength);
    }

    @Override
    public int hashCode() {
        return Objects.hash(this);
    }
}
- Is it correct that there is no "lengthen" or "unshorten" method? Maybe a copy-and-paste oversight? (Captain Man, Aug 26, 2024)
2 Answers
Your class is not really a URL shortener; it is a component in a URL shortener that creates a hash code for a URL. The class should therefore be renamed to something like UrlHasher. A URL-shortener web application would use this class to create URLs like "https://short.example.com/asdfgh".

However, the class makes several important and blatantly wrong assumptions. It assumes that "https://www.example.com/" is always the same as "https://example.com/". This is often (but not always) the case for the "www" subdomain, and it is not how domain names work. It also assumes that all non-alphanumeric characters in a URL are meaningless, which means it treats URLs like "https://test-11-1.example.com/url/path" and "https://test-1-11.example.com/urlpath" as the same URL, which they are not. All of the attempts to "normalize" the URL are therefore unnecessary and should be removed.
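If the input needs any handling at all, parsing it with java.net.URI instead of stripping substrings avoids mangling meaningful characters. A rough sketch of that idea; the class and method names are illustrative, not part of the original code:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch: validate the URL by parsing it rather than deleting parts of it.
public class UrlValidation {

    static String requireValidUrl(String longUrl) {
        try {
            URI uri = new URI(longUrl);
            if (uri.getHost() == null && !longUrl.contains("://")) {
                // Scheme-less input like "www.example.com" parses as a relative
                // reference; retry with an assumed default scheme.
                return requireValidUrl("https://" + longUrl);
            }
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid URL: " + longUrl, e);
        }
    }

    public static void main(String[] args) {
        // The hyphens and path separators survive intact.
        System.out.println(requireValidUrl("https://test-11-1.example.com/url/path"));
    }
}
```

This keeps "https://test-11-1.example.com/url/path" and "https://test-1-11.example.com/urlpath" distinct, which the character-stripping approach does not.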
Using replaceFirst to remove the "http://" prefix is error prone, as it will remove the substring from anywhere in the URL if the URL does not start with "http://". It is safer to use startsWith to ensure that the string you are removing is located only in the exact place you want it removed from:

if (url.startsWith("http://")) {
    // remove the first 7 characters
    url = url.substring("http://".length());
}
But this too assumes that the server treats HTTP and HTTPS requests identically, which is wrong. So you should not try to remove the protocol prefix at all.

You should probably also check whether the input parameter is null, not just go straight into checking the length.
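A minimal sketch of that validation; the helper name is made up for illustration:

```java
import java.util.Objects;

// Sketch: reject null before touching the string, then reject blank input.
public class InputCheck {

    static String validate(String longUrl) {
        Objects.requireNonNull(longUrl, "longUrl must not be null");
        if (longUrl.isBlank()) {
            throw new IllegalArgumentException("long URL is empty");
        }
        return longUrl;
    }

    public static void main(String[] args) {
        System.out.println(validate("https://example.com"));
    }
}
```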
If this was just an exercise in how to use computeIfAbsent on a ConcurrentHashMap, then yes, that is generally how one would write the code.

Adding equals and hashCode methods to this class seems completely unnecessary, as it is a service, not a data object.
A follow-up question you should be prepared for is "when we put this to production and the hashing method produces an extremely vulgar word, what are you going to tell the customer?"
Hashing
The hash generation does not take into account possible collisions. When you're shortening strings that are possibly hundreds of characters long into six letter hashes, you're going to end up with collisions. There just aren't enough 6 character words to represent the full address space of the internet.
To detect collisions you will have to keep in storage each generated hash and the URL it was generated from. When you generate a hash you would first check if a stored hash exists for the URL and if it does, use it. Otherwise generate a new hash and check that the hash is free. If it is already used, you need to generate a new hash.
The hashing does not have to be connected to the URL in any way. You could just pick 6 random characters. The connection between the URL and the hash is stored in the collision map.
But a URL shortener is not just about making URLs shorter. A short URL is useless if you cannot expand it back to the long URL. So you need another map from which you can retrieve the original long URL given the hash. The exercise seemed easy until now, because now you can't just rely on computeIfAbsent anymore; you need to coordinate access to the two maps.
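One way to sketch the two-map design without explicit locks is to let putIfAbsent on the hash-to-URL map claim each hash atomically; a collision simply means retrying with a new random hash. Everything below is illustrative, not the original code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: hashes are purely random; the hash->URL map detects collisions
// and the URL->hash map lets repeated calls reuse the same short key.
public class TwoMapShortener {

    private final Map<String, String> urlToHash = new ConcurrentHashMap<>();
    private final Map<String, String> hashToUrl = new ConcurrentHashMap<>();
    private final int hashLength;

    public TwoMapShortener(int hashLength) {
        this.hashLength = hashLength;
    }

    public String shorten(String longUrl) {
        // Reuse an existing hash for this URL if one was already generated.
        String existing = urlToHash.get(longUrl);
        if (existing != null) {
            return existing;
        }
        while (true) {
            String candidate = randomHash();
            // putIfAbsent returns null only if the hash was free: the claim is atomic.
            if (hashToUrl.putIfAbsent(candidate, longUrl) == null) {
                String prior = urlToHash.putIfAbsent(longUrl, candidate);
                if (prior != null) {
                    // Another thread shortened the same URL first; release ours.
                    hashToUrl.remove(candidate);
                    return prior;
                }
                return candidate;
            }
            // Collision with a different URL: loop and try a fresh hash.
        }
    }

    public String expand(String shortHash) {
        return hashToUrl.get(shortHash);
    }

    private String randomHash() {
        StringBuilder sb = new StringBuilder(hashLength);
        for (int i = 0; i < hashLength; i++) {
            sb.append((char) ('a' + ThreadLocalRandom.current().nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TwoMapShortener s = new TwoMapShortener(6);
        String h = s.shorten("https://example.com/some/long/path");
        System.out.println(h + " -> " + s.expand(h));
    }
}
```

A real service would also bound the retry loop and persist both mappings, but the sketch shows the core invariant: a hash is never handed out to two different URLs.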
- Thanks for the detailed review, Torben. 1. "The hash generation does not take into account possible collisions." How can I avoid possible collisions? What would you suggest? 2. "When we put this to production and the hashing method produces an extremely vulgar word, what are you going to tell the customer?" Should I create it as a mix of numbers, characters, and special characters? (Jill, Aug 25, 2024)
- To detect collisions you will have to keep in storage each generated hash and the URL it was generated from. When you generate a hash, first check if a stored hash exists for the URL and, if it does, use it. Otherwise generate a new hash and check that the hash is free; if it is already used, you need to generate a new one. The second point was a bit of a joke, but of course if this were a real product, it would be an issue. You should think about how to filter out "ugly" hashes. Adding special characters and numbers would help alleviate the problem. (Torben Putkonen, Aug 26, 2024)
- In that case I will need to implement another concurrent hash map with the short URL as key and the long URL as value. Hopefully this will solve the collisions. (Jill, Aug 26, 2024)
- Regarding the logic to create the hash: is it optimal? I'd appreciate feedback on whether I am over-engineering the logic here. (Jill, Aug 26, 2024)
- I think we're straying a bit too far from the purpose of this site, as we're now teaching you how to implement the task instead of reviewing the code you've produced. You should take the feedback we have provided, improve your code, and once you have it working, submit it for a new review. If you need help implementing any of the ideas, you should consult stackoverflow.com. (Torben Putkonen, Aug 27, 2024)
replaceFirst doesn't guarantee that whatever it replaces is actually at the start of the input string. This may be a problem for an input like example.com/www.info?target=http://help.example.org.
The input to replaceFirst is a regular expression, meaning the . in www. tells it to replace any character. This is fine when that character really is a ., but for an input like www1.example.com it results in .example.com, which is probably not desirable.
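If stripping a leading "www." were kept at all, escaping the dot and anchoring the pattern would fix both problems; a small sketch (the class name is made up):

```java
// Sketch: "^" anchors the match to the start of the string, and "\\." matches
// only a literal dot, so "www1.example.com" is left alone.
public class WwwStrip {

    static String stripWww(String host) {
        return host.replaceFirst("^www\\.", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWww("www.example.com"));
        System.out.println(stripWww("www1.example.com")); // unchanged
    }
}
```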
I see no obvious reason for the forEach to use the characters from actualLongUrl. Since it adjusts them by a random amount and then wraps them into an even smaller interval, it doesn't look like anything is gained that shortUrl.append((char) ('a' + ThreadLocalRandom.current().nextInt(26))) wouldn't also accomplish.
Instead of using an IntStream.range for a single forEach and nothing else, wouldn't a regular for loop be simpler?
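Combining both suggestions, a plain loop over random lowercase letters might look like the sketch below. Note the (char) cast: without it, append('a' + n) would append the int value as digits rather than a letter. The class name is illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: build a key of random lowercase letters with an ordinary for loop.
public class RandomKey {

    static String randomLowercase(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + ThreadLocalRandom.current().nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomLowercase(6));
    }
}
```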
hashCode recurses unconditionally, causing a StackOverflowError whenever it is used.
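Objects.hash(this) calls this object's own hashCode again, hence the unbounded recursion. If the methods are kept at all, they should hash the field instead; a sketch using a stand-in class with the same single field:

```java
import java.util.Objects;

// Sketch: hash the field, not the object itself.
public class FixedUrlShortener {

    private final int shortUrlLength;

    FixedUrlShortener(int shortUrlLength) {
        this.shortUrlLength = shortUrlLength;
    }

    @Override
    public boolean equals(Object o) {
        return this == o
                || (o instanceof FixedUrlShortener other
                    && other.shortUrlLength == this.shortUrlLength);
    }

    @Override
    public int hashCode() {
        return Objects.hash(shortUrlLength); // not Objects.hash(this)
    }

    public static void main(String[] args) {
        System.out.println(new FixedUrlShortener(6).equals(new FixedUrlShortener(6)));
    }
}
```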
- "shortUrl.append('a' + ThreadLocalRandom.current().nextInt(26)) doesn't also accomplish." I agree. What would be the best way to achieve it? (Jill, Aug 25, 2024)