I implemented the URL shortener class below, using ConcurrentHashMap to handle concurrency.
The short-URL building logic may not be optimal, but it ensures that only lowercase letters appear in a short URL of the length specified by the client.
Am I missing any important concepts here? Are there any improvements that could be made to the performance and the key-builder logic? And what are some follow-up questions or additional implementations that might be asked after writing this in a live coding interview?
package com.java.learning;

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class UrlShortener {

    private final Map<String, String> urlMap = new ConcurrentHashMap<>();
    private final int shortUrlLength;

    public UrlShortener(int shortUrlLength) {
        this.shortUrlLength = shortUrlLength;
    }

    public String shortenUrl(String longUrl) {
        if (longUrl.isEmpty()) {
            throw new IllegalArgumentException("long URL is empty");
        }
        String longUrlWithoutHttp = longUrl.replaceFirst("http://", "");
        String longUrlWithoutHttps = longUrlWithoutHttp.replaceFirst("https://", "");
        String longUrlWithoutWWW = longUrlWithoutHttps.replaceFirst("www.", "");
        String actualLongUrl = longUrlWithoutWWW.replaceAll("[^a-zA-Z0-9]", "");
        if (actualLongUrl.length() <= shortUrlLength) {
            return actualLongUrl;
        }
        return urlMap.computeIfAbsent(actualLongUrl, s -> {
            StringBuilder shortUrl = new StringBuilder(shortUrlLength);
            IntStream.range(0, shortUrlLength).forEach(i -> {
                int randomInt = ThreadLocalRandom.current().nextInt(62);
                int numericVal = actualLongUrl.charAt(i);
                char ch = (char) (Math.abs(randomInt - numericVal) % (26 - shortUrlLength) + 'a' + i);
                shortUrl.append(ch);
            });
            return shortUrl.toString();
        });
    }

    public static void main(String[] args) {
        UrlShortener urlShortener = new UrlShortener(6);
        System.out.println(urlShortener.shortenUrl("www.linkedin.com"));
        System.out.println(urlShortener.shortenUrl("http://linkedin.com/"));
        System.out.println(urlShortener.shortenUrl("https://linkedin.com/"));
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        return o instanceof UrlShortener other
                && Objects.equals(other.shortUrlLength, this.shortUrlLength);
    }

    @Override
    public int hashCode() {
        return Objects.hash(this);
    }
}
- Is it correct that there is no "lengthen" or "unshorten" method? Maybe a copy-and-paste oversight? (Captain Man, Aug 26, 2024)
2 Answers
Your class is not really a URL shortener; it is a component in a URL shortener that creates a hash code for a URL. The class should therefore be renamed to something like UrlHasher. A URL-shortener web application would use this class to create URLs like "https://short.example.com/asdfgh".

However, the class makes several important and blatantly wrong assumptions. It assumes that "https://www.example.com/" is always the same as "https://example.com/". This is often (but not always) the case for the "www" subdomain, and it is not how domain names work. It also assumes that all non-alphanumeric characters in a URL are meaningless, which means it treats URLs like "https://test-11-1.example.com/url/path" and "https://test-1-11.example.com/urlpath" as the same URL, which they are not. All of the attempts to "normalize" the URL are therefore unnecessary and should be removed.
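If the input needs any handling at all, parsing it with java.net.URI instead of stripping substrings avoids mangling meaningful characters. A rough sketch of that idea; the class and method names are illustrative, not part of the original code:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch: validate the URL by parsing it rather than deleting parts of it.
public class UrlValidation {

    static String requireValidUrl(String longUrl) {
        try {
            URI uri = new URI(longUrl);
            if (uri.getHost() == null && !longUrl.contains("://")) {
                // Scheme-less input like "www.example.com" parses as a relative
                // reference; retry with an assumed default scheme.
                return requireValidUrl("https://" + longUrl);
            }
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("not a valid URL: " + longUrl, e);
        }
    }

    public static void main(String[] args) {
        // The hyphens and path separators survive intact.
        System.out.println(requireValidUrl("https://test-11-1.example.com/url/path"));
    }
}
```

This keeps "https://test-11-1.example.com/url/path" and "https://test-1-11.example.com/urlpath" distinct, which the character-stripping approach does not.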
Using replaceFirst to remove the "http://" prefix is error prone, as it will remove the substring from anywhere in the URL if the URL does not start with "http://". It is safer to use startsWith to ensure that the string you are removing is located only in the exact place you want it removed from:

if (url.startsWith("http://")) {
    // remove the first 7 characters
    url = url.substring("http://".length());
}
But this too assumes that the server treats HTTP and HTTPS requests identically, which is wrong. So you should not try to remove the protocol prefix at all.

You should probably also check whether the input parameter is null, not just go straight into checking the length.
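A minimal sketch of that validation; the helper name is made up for illustration:

```java
import java.util.Objects;

// Sketch: reject null before touching the string, then reject blank input.
public class InputCheck {

    static String validate(String longUrl) {
        Objects.requireNonNull(longUrl, "longUrl must not be null");
        if (longUrl.isBlank()) {
            throw new IllegalArgumentException("long URL is empty");
        }
        return longUrl;
    }

    public static void main(String[] args) {
        System.out.println(validate("https://example.com"));
    }
}
```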
If this was just an exercise in how to use computeIfAbsent on a ConcurrentHashMap, then yes, that is generally how one would write the code.

Adding equals and hashCode methods to this class seems completely unnecessary, as it is a service, not a data object.
A follow-up question you should be prepared for is "when we put this to production and the hashing method produces an extremely vulgar word, what are you going to tell the customer?"
Hashing
The hash generation does not take into account possible collisions. When you're shortening strings that are possibly hundreds of characters long into six letter hashes, you're going to end up with collisions. There just aren't enough 6 character words to represent the full address space of the internet.
To detect collisions you will have to keep in storage each generated hash and the URL it was generated from. When you generate a hash you would first check if a stored hash exists for the URL and if it does, use it. Otherwise generate a new hash and check that the hash is free. If it is already used, you need to generate a new hash.
The hashing does not have to be connected to the URL in any way. You could just pick 6 random characters. The connection between the URL and the hash is stored in the collision map.
But a URL shortener is not just about making URLs shorter. A short URL is useless if you cannot expand it back to the long URL. So you need another map from which you can retrieve the original long URL given the hash. The exercise seemed easy until now, because now you can't just rely on computeIfAbsent anymore; you need to coordinate access to the two maps.
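One way to sketch the two-map design without explicit locks is to let putIfAbsent on the hash-to-URL map claim each hash atomically; a collision simply means retrying with a new random hash. Everything below is illustrative, not the original code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: hashes are purely random; the hash->URL map detects collisions
// and the URL->hash map lets repeated calls reuse the same short key.
public class TwoMapShortener {

    private final Map<String, String> urlToHash = new ConcurrentHashMap<>();
    private final Map<String, String> hashToUrl = new ConcurrentHashMap<>();
    private final int hashLength;

    public TwoMapShortener(int hashLength) {
        this.hashLength = hashLength;
    }

    public String shorten(String longUrl) {
        // Reuse an existing hash for this URL if one was already generated.
        String existing = urlToHash.get(longUrl);
        if (existing != null) {
            return existing;
        }
        while (true) {
            String candidate = randomHash();
            // putIfAbsent returns null only if the hash was free: the claim is atomic.
            if (hashToUrl.putIfAbsent(candidate, longUrl) == null) {
                String prior = urlToHash.putIfAbsent(longUrl, candidate);
                if (prior != null) {
                    // Another thread shortened the same URL first; release ours.
                    hashToUrl.remove(candidate);
                    return prior;
                }
                return candidate;
            }
            // Collision with a different URL: loop and try a fresh hash.
        }
    }

    public String expand(String shortHash) {
        return hashToUrl.get(shortHash);
    }

    private String randomHash() {
        StringBuilder sb = new StringBuilder(hashLength);
        for (int i = 0; i < hashLength; i++) {
            sb.append((char) ('a' + ThreadLocalRandom.current().nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TwoMapShortener s = new TwoMapShortener(6);
        String h = s.shorten("https://example.com/some/long/path");
        System.out.println(h + " -> " + s.expand(h));
    }
}
```

A real service would also bound the retry loop and persist both mappings, but the sketch shows the core invariant: a hash is never handed out to two different URLs.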
- Thanks for the detailed review, Torben. 1. "The hash generation does not take into account possible collisions." How can I avoid possible collisions? What would you suggest? 2. "When we put this to production and the hashing method produces an extremely vulgar word, what are you going to tell the customer?" Should I create it as a mix of numbers, characters, and special characters? (Jill, Aug 25, 2024)
- To detect collisions you will have to keep in storage each generated hash and the URL it was generated from. When you generate a hash, first check if a stored hash exists for the URL and, if it does, use it. Otherwise generate a new hash and check that the hash is free; if it is already used, you need to generate a new one. The second point was a bit of a joke, but of course if this were a real product, it would be an issue. You should think about how to filter out "ugly" hashes. Adding special characters and numbers would help alleviate the problem. (Torben Putkonen, Aug 26, 2024)
- In that case I will need to implement another concurrent hash map with the short URL as key and the long URL as value. Hopefully this will solve the collisions. (Jill, Aug 26, 2024)
- Regarding the logic to create the hash: is it optimal? I'd appreciate feedback on whether I am over-engineering the logic here. (Jill, Aug 26, 2024)
- I think we're straying a bit too far from the purpose of this site, as we're now teaching you how to implement the task instead of reviewing the code you've produced. You should take the feedback we have provided, improve your code, and once you have it working, submit it for a new review. If you need help implementing any of the ideas, you should consult stackoverflow.com. (Torben Putkonen, Aug 27, 2024)
replaceFirst doesn't guarantee that whatever it replaces is actually at the start of the input string. This may be a problem for an input like example.com/www.info?target=http://help.example.org.
The input to replaceFirst is a regular expression, meaning the . in www. tells it to replace any character. This is fine when that character really is a ., but for an input like www1.example.com it results in .example.com, which is probably not desirable.
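If stripping a leading "www." were kept at all, escaping the dot and anchoring the pattern would fix both problems; a small sketch (the class name is made up):

```java
// Sketch: "^" anchors the match to the start of the string, and "\\." matches
// only a literal dot, so "www1.example.com" is left alone.
public class WwwStrip {

    static String stripWww(String host) {
        return host.replaceFirst("^www\\.", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWww("www.example.com"));
        System.out.println(stripWww("www1.example.com")); // unchanged
    }
}
```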
I see no obvious reason for the forEach to use the characters from actualLongUrl. Since it adjusts them by a random amount and then wraps them into an even smaller interval, it doesn't look like anything is gained that shortUrl.append((char) ('a' + ThreadLocalRandom.current().nextInt(26))) wouldn't also accomplish.
Instead of using an IntStream.range for a single forEach and nothing else, wouldn't a regular for loop be simpler?
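Combining both suggestions, a plain loop over random lowercase letters might look like the sketch below. Note the (char) cast: without it, append('a' + n) would append the int value as digits rather than a letter. The class name is illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch: build a key of random lowercase letters with an ordinary for loop.
public class RandomKey {

    static String randomLowercase(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + ThreadLocalRandom.current().nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomLowercase(6));
    }
}
```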
hashCode recurses unconditionally, causing a StackOverflowError whenever it is used.
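Objects.hash(this) calls this object's own hashCode again, hence the unbounded recursion. If the methods are kept at all, they should hash the field instead; a sketch using a stand-in class with the same single field:

```java
import java.util.Objects;

// Sketch: hash the field, not the object itself.
public class FixedUrlShortener {

    private final int shortUrlLength;

    FixedUrlShortener(int shortUrlLength) {
        this.shortUrlLength = shortUrlLength;
    }

    @Override
    public boolean equals(Object o) {
        return this == o
                || (o instanceof FixedUrlShortener other
                    && other.shortUrlLength == this.shortUrlLength);
    }

    @Override
    public int hashCode() {
        return Objects.hash(shortUrlLength); // not Objects.hash(this)
    }

    public static void main(String[] args) {
        System.out.println(new FixedUrlShortener(6).equals(new FixedUrlShortener(6)));
    }
}
```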
- "shortUrl.append('a' + ThreadLocalRandom.current().nextInt(26)) doesn't also accomplish." I agree. What would be the best way to achieve it? (Jill, Aug 25, 2024)