Eliminate duplicates from strings

Question 1

I have written a piece of code for removing duplicated values from a string. Any suggestions on improvements?

public static String removeDuplicate(String s)
{
 char [] temp = s.toCharArray();
 int length =temp.length; 
 for (int i=0;i<length;i++)
 {
 for (int j = i+1; j<length;j++)
 {
 if(temp[i]==temp[j])
 {
 int test =j;
 for(int k=j+1; k<length ; k++)
 {
 temp[test] = temp[k];
 test++;
 }
 length--;
 j--;
 }
 }
 }
 return String.copyValueOf(temp).substring(0,length);
}

Question 2

Yusshi's code is perfectly fine. If you know that all the characters are in a specific range (e.g., ASCII code 0 to 255), you could also use either of the following codes which is faster. They uses an array rather than a hash set. The time complexity is O(n) and the space complexity is O(1). You cannot go lower in either case.

This one sorts the characters in the output string:

public static String removeDuplicates(String str) {
 int charsCount[] = new int[256];
 for (int i = 0; i < str.length(); i++) {
 char ch = str.charAt(i);
 charsCount[ch]++;
 }
 StringBuilder sb = new StringBuilder(charsCount.length);
 for (int i = 0; i < charsCount.length; i++) {
 if (charsCount[i] > 0) {
 sb.append((char)i);
 }
 }
 return sb.toString();
}

This one preserves the order of the characters:

public static String removeDuplicates(String str) {
 boolean seen[] = new boolean[256];
 StringBuilder sb = new StringBuilder(seen.length);
 for (int i = 0; i < str.length(); i++) {
 char ch = str.charAt(i);
 if (!seen[ch]) {
 seen[ch] = true;
 sb.append(ch);
 }
 }
 return sb.toString();
}

Question 3

This can be simplified a great deal with an auxiliary data structure. Basically, we want to scan the string, and only add characters that we have not yet seen. Hence, if we use a data structure with fast insert/lookup, we can quickly tell if we've already seen that character without having to rescan over the string. In this case, this suggests a Set:

import java.util.Set;
import java.util.HashSet;
public static String removeDuplicate(String s)
{
 StringBuilder sb = new StringBuilder();
 Set<Character> seen = new HashSet<Character>();
 for(int i = 0; i < s.length(); ++i) {
 char c = s.charAt(i);
 if(!seen.contains(c)) {
 seen.add(c);
 sb.append(c);
 }
 }
 return sb.toString();
}

This trades time for space, so in this case, it should make the algorithm \$O(N)\$ time (as opposed to \$O(N^3)\$) at the cost of using \$O(N)\$ extra space.

Question 4

You can completely avoid the contain check by using LinkedHashSet instead of HashedSet. Don't forget your range-based for when extracting chars from a String.

Question 5

Sir in the above answer can you please tell what charsCount[ch]++; does?

Question 6

Style

Java Code-style conventions put:

the { brace at the end of the line that introduces the compound statement block, not on the next line.
spaces between variables and operators: for (int i=0;i<length;i++)

Your code should look like:

public static String removeDuplicate(String s) {
 char[] temp = s.toCharArray();
 int length = temp.length;
 for (int i = 0; i < length; i++) {
 for (int j = i + 1; j < length; j++) {
 if (temp[i] == temp[j]) {
 int test = j;
 for (int k = j + 1; k < length; k++) {
 temp[test] = temp[k];
 test++;
 }
 length--;
 j--;
 }
 }
 }
 return String.copyValueOf(temp).substring(0, length);
}

Algorithm

You have three nested loops. This is inefficient.

A slight nit-pick is that you also use an inefficient way to get the resulting String using the copyValueOf(...) and then the substring(...).

I believe if you take an 'additive' approach to the process you will have a faster system. By extracting the onle loop in to a function it is also more readable:

private static final boolean foundIn(char[] temp, int size, char c) {
 for (int i = 0; i < size; i++) {
 if (temp[i] == c) {
 return true;
 }
 }
 return false;
}
public static String removeDuplicateY(String s) {
 char[] temp = s.toCharArray();
 int size = 0; //how many unique chars found so far
 for (int i = 0; i < temp.length; i++) {
 if (!foundIn(temp, size, temp[i])) {
 // move the first-time char to the end of the return value
 temp[size] = temp[i];
 size++;
 }
 }
 return new String(temp, 0, size);
}

Helium Helium 2961 silver badge4 bronze badges · Accepted Answer · 2014-04-10 09:11:21Z

Yusshi's code is perfectly fine. If you know that all the characters are in a specific range (e.g., ASCII code 0 to 255), you could also use either of the following codes which is faster. They uses an array rather than a hash set. The time complexity is O(n) and the space complexity is O(1). You cannot go lower in either case.

This one sorts the characters in the output string:

public static String removeDuplicates(String str) {
 int charsCount[] = new int[256];
 for (int i = 0; i < str.length(); i++) {
 char ch = str.charAt(i);
 charsCount[ch]++;
 }
 StringBuilder sb = new StringBuilder(charsCount.length);
 for (int i = 0; i < charsCount.length; i++) {
 if (charsCount[i] > 0) {
 sb.append((char)i);
 }
 }
 return sb.toString();
}

This one preserves the order of the characters:

public static String removeDuplicates(String str) {
 boolean seen[] = new boolean[256];
 StringBuilder sb = new StringBuilder(seen.length);
 for (int i = 0; i < str.length(); i++) {
 char ch = str.charAt(i);
 if (!seen[ch]) {
 seen[ch] = true;
 sb.append(ch);
 }
 }
 return sb.toString();
}

Stack Exchange Network

Eliminate duplicates from strings

3 Answers 3

Style

Algorithm

Hot Network Questions

Eliminate duplicates from strings

3 Answers 3

Style

Algorithm

Related

Hot Network Questions