I have written a piece of code for removing duplicated values from a string. Any suggestions on improvements?
public static String removeDuplicate(String s)
{
char [] temp = s.toCharArray();
int length =temp.length;
for (int i=0;i<length;i++)
{
for (int j = i+1; j<length;j++)
{
if(temp[i]==temp[j])
{
int test =j;
for(int k=j+1; k<length ; k++)
{
temp[test] = temp[k];
test++;
}
length--;
j--;
}
}
}
return String.copyValueOf(temp).substring(0,length);
}
3 Answers 3
Yusshi's code is perfectly fine. If you know that all the characters are in a specific range (e.g., ASCII code 0 to 255), you could also use either of the following codes which is faster. They uses an array rather than a hash set. The time complexity is O(n) and the space complexity is O(1). You cannot go lower in either case.
This one sorts the characters in the output string:
public static String removeDuplicates(String str) {
int charsCount[] = new int[256];
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
charsCount[ch]++;
}
StringBuilder sb = new StringBuilder(charsCount.length);
for (int i = 0; i < charsCount.length; i++) {
if (charsCount[i] > 0) {
sb.append((char)i);
}
}
return sb.toString();
}
This one preserves the order of the characters:
public static String removeDuplicates(String str) {
boolean seen[] = new boolean[256];
StringBuilder sb = new StringBuilder(seen.length);
for (int i = 0; i < str.length(); i++) {
char ch = str.charAt(i);
if (!seen[ch]) {
seen[ch] = true;
sb.append(ch);
}
}
return sb.toString();
}
This can be simplified a great deal with an auxiliary data structure. Basically, we want to scan the string, and only add characters that we have not yet seen. Hence, if we use a data structure with fast insert/lookup, we can quickly tell if we've already seen that character without having to rescan over the string. In this case, this suggests a Set
:
import java.util.Set;
import java.util.HashSet;
public static String removeDuplicate(String s)
{
StringBuilder sb = new StringBuilder();
Set<Character> seen = new HashSet<Character>();
for(int i = 0; i < s.length(); ++i) {
char c = s.charAt(i);
if(!seen.contains(c)) {
seen.add(c);
sb.append(c);
}
}
return sb.toString();
}
This trades time for space, so in this case, it should make the algorithm \$O(N)\$ time (as opposed to \$O(N^3)\$) at the cost of using \$O(N)\$ extra space.
-
\$\begingroup\$ You can completely avoid the contain check by using
LinkedHashSet
instead ofHashedSet
. Don't forget your range-basedfor
when extractingchar
s from aString
. \$\endgroup\$Snowhawk– Snowhawk2015年07月05日 09:14:16 +00:00Commented Jul 5, 2015 at 9:14 -
\$\begingroup\$ Sir in the above answer can you please tell what charsCount[ch]++; does? \$\endgroup\$Shantanu Nandan– Shantanu Nandan2015年07月19日 06:06:59 +00:00Commented Jul 19, 2015 at 6:06
Style
Java Code-style conventions put:
- the
{
brace at the end of the line that introduces the compound statement block, not on the next line. - spaces between variables and operators:
for (int i=0;i<length;i++)
Your code should look like:
public static String removeDuplicate(String s) { char[] temp = s.toCharArray(); int length = temp.length; for (int i = 0; i < length; i++) { for (int j = i + 1; j < length; j++) { if (temp[i] == temp[j]) { int test = j; for (int k = j + 1; k < length; k++) { temp[test] = temp[k]; test++; } length--; j--; } } } return String.copyValueOf(temp).substring(0, length); }
Algorithm
You have three nested loops. This is inefficient.
A slight nit-pick is that you also use an inefficient way to get the resulting String using the copyValueOf(...)
and then the substring(...)
.
I believe if you take an 'additive' approach to the process you will have a faster system. By extracting the onle loop in to a function it is also more readable:
private static final boolean foundIn(char[] temp, int size, char c) {
for (int i = 0; i < size; i++) {
if (temp[i] == c) {
return true;
}
}
return false;
}
public static String removeDuplicateY(String s) {
char[] temp = s.toCharArray();
int size = 0; //how many unique chars found so far
for (int i = 0; i < temp.length; i++) {
if (!foundIn(temp, size, temp[i])) {
// move the first-time char to the end of the return value
temp[size] = temp[i];
size++;
}
}
return new String(temp, 0, size);
}