Counting the number of occurrences of chars in one string, in another string

Question 1

This program counts the total number of times any character from the source string can be found in the target string.

E.g Source String - "Hello World"
Target String - "llo"
Output = 5 (as there are 3 l's and 2 o's)

Source String - "Hello World"
Target String - "ooood"
Output = 3 (as there are 2 o's and 1 d)

Is there any better way of write the below program?

public class CountOccurence {
public static int countChars(String target, String source){
 char a[] = source.toCharArray();
 int loc = 0, count = 0;
 boolean charRepeat = false;
 for (loc = 0; loc < source.length(); loc++) {
 charRepeat = false;
 for (int z = 0; z < loc; z++) {
 if (a[loc] == a[z]) {
 charRepeat = true;
 break;
 }
 }
 if (charRepeat == false) {
 int i = 0;
 while ((i = target.indexOf(a[loc], i)) != -1) {
 count++;
 i++;
 }
 }
 }
 return count;
}
public static void main(String[] args) {
 String target = "Hello World";
 String source = "oood";
 int count = countChars(target, source);
 System.out.println(count);
 source = "lld";
 count = countChars(target, source);
 System.out.println(count);
}
}

Question 2

to reduce confusion, the source string should just be a set of characters.

Question 3

Style

There are a few bad habits in here that should be resolved. Here's your core method:

public static int countChars(String target, String source){
 char a[] = source.toCharArray();
 int loc = 0, count = 0;
 boolean charRepeat = false;
 for (loc = 0; loc < source.length(); loc++) {
 charRepeat = false;
 for (int z = 0; z < loc; z++) {
 if (a[loc] == a[z]) {
 charRepeat = true;
 break;
 }
 }
 if (charRepeat == false) {
 int i = 0;
 while ((i = target.indexOf(a[loc], i)) != -1) {
 count++;
 i++;
 }
 }
 }
 return count;
}

let's go through some things that are style-based.

source comes before target. The natural thinking is that you go from the source to the target, your naming, and the order of the parameters, is awkward. I would call the source something like searchFor, and the target I would call searchIn. Note, you have confused these values so much that your description does not match your code. In your description you have:
```
E.g Source String - "Hello World"
Target String - "ooood"
Output = 3
```
but in your code you have:
```
String target = "Hello World";
String source = "oood";
int count = countChars(target, source);
```
Note how you have swapped the source/target between the example and the code.

source and target are basically bad names to have...
Your code uses a number of small variables that are unconventional. loc is not a terrible name, it is clearly short for location, but using the fill location would be better. I have a habit of using pos so I can't really complain. On the other hand, the variables a for the char array, and z for an index, are bad names. You already use charRepeat, so you know how to use meaningful names, use them. The z is interesting because it is common to use x, y, and z as names for coordinate space, so seeing z automatically implies there's an x and y too. You should rather just use the standard index variables i, j, and k (but don't use j unless you are already using i, and don't use k unless you are using j too).

Algorithm

Your algorithm is a bit messed up too. Basically, you scan the searchFor chars, and find the first occurrence of each char value. If it's the first one, you then use that to scan the searchIn value, and you count the 'hits' in there. There are two issues here. The first issue is the performance of the 'first check'. The performance is \$O(n^2)\$ where \$n\$ is the number of chars in the searchFor value.

Then, for each unique char, you then do a loop through all the searchIn chars. The loop through all those are disguised in two steps, first the while loop, and then inside that, the indexOf search.

The net result is that you have a performance of \$O(nm)\$ or \$O(n^2)\$ depending on whether there are more characters in the searchIn or searchFor strings.

There is a better solution for both the uniqueness testing, and for the searching. Consider sorting the chars in each string:

char[] lookFor = searchFor.toCharArray();
char[] lookIn = searchIn.toCharArray();
Arrays.sort(lookFor);
Arrays.sort(lookIn);

The two sorts are \$O(n \log n)\$ performance, so this is, so far, significantly less complicated... now, how do we compare? Well, with a 'simple' loop over all the lookIn characters:

int forpos = 0;
int count = 0;
for (char c : lookIn) {
 // advance the searchFor cursor to the next usable position
 while (forpos < lookFor.length && lookFor[forpos] < c) {
 forpos++;
 }
 if (forpos >= lookFor.length) {
 //nothing left to look for.
 break;
 }
 if (c == lookFor[forpos]) {
 count++;
 }
}
return count;

With the above code, you do two sorts, and 1 loop through each value. The net result is that your performance complexity is simply \$O(n \log n)\$ where \$n\$ is the longer of the searchIn or searchFor variables.

By sorting each side, you give yourself a significant algorithmic advantage.

Question 4

Use a set containing the characters of the target string:

public static int countChars(String target, String source)
{
 int count = 0;
 Set<Character> charSet = new HashSet<Character>();
 for (char c : target.toCharArray())
 if (!charSet.contains(c))
 charSet.add(c);
 for (char c : source.toCharArray())
 if (charSet.contains(c))
 count++;
 return count;
}

Question 5

There is no point in checking charSet.contains(c) before doing charSet.add(c). Sets will automatically deduplicate members.

Question 6

@200_success: I know that. I added it there in order to make it easier for OP to understand.

Question 7

your way of doing this seems kinda weird to me:

Source "hello world" target "llo"

first create a set for the target, which contains only characters from the target, in this example l and o.

then just iterate once over the source and check for each char if it is in the set! if so, add one to a counter

Question 8

It would be awesome if you could format your pseudocode in a more readable way than plaintext... from skimming it it looks like a relatively simple and correct approach though.

Question 9

You could do something like this...

int count = 0;
for (int i = 0; i < target.length(); i++) {
 char c = target.charAt(i);
 if (c == '#') // If token then do not count
 continue;
 String cStr = String.valueOf(c);
 count += source.length() - source.replaceAll(cStr, "").length();
 target = target.replaceAll(cStr, "#"); // Trade letter for token
}

The char '#' acts like a token, telling that that letter was already counted in the target string.

Update for discussion with @Volgel612

int count = 0;
char last = '0円';
char[] chars = target.toCharArray();
Arrays.sort(chars);
for (int i = 0; i < chars.length; i++) {
 char c = chars[i];
 if (c == last)
 continue;
 String cStr = String.valueOf(c);
 count += source.length() - source.replaceAll(cStr, "").length();
 last = c;
}

Question 10

Don't you think that replacing the handled character is kind of... useless? You don't get aroind to handling it again either way....

Question 11

In the case of the example provided, 'llo', the char 'l' appear two times, so the replacement was necessary to avoid using control arrays. @rolfl idea of first sorting the target string would resolve this more easily, though.

Question 12

You could circumvent this by modifying a copy of source and updating that copy after every operation you perform. Counting characters would then be as simple as: source.length() - copy.length(). This approach would probably be faster too, as you have less newly created Strings. Additionally there's no guarantees that # isn't a valid input ...

Question 13

Nah, sorting is better. With sorting I just need to check if the character repeated, if it repeated then just ignore. No need for replacing the target string (or copy). The # token could be anything, that's was just an example. With sorting it is unnecessary.

Question 14

I'll update the answer so you can see.

Question 15

If you like a compact solution using regular expressions try this:

public static void main(String[] args) {
 System.out.println(countChars("ooood", "Hello World"));
 System.out.println(countChars("llo", "Hello World"));
}
public static int countChars(String source, String target) {
 if (source.isEmpty()) return 0;
 String regex = "[" + source.replaceAll("\\\\", "\\\\\\\\").replaceAll("\\]", "\\\\]") + "]";
 return target.length() - target.replaceAll(regex, "").length();
}

It creates a regular expression with a character class containing all characters from target. Then it replaces all those characters in source with nothing and returns how many characters have been removed.

Question 16

I don't understand in the slightest what you are trying to accomplish with the string-replace when creating the regex. Fancy adding a little explanation??

Question 17

Regular expressions seems like the totally wrong tool to use here.

rolfl rolfl 98.1k17 gold badges219 silver badges419 bronze badges · Accepted Answer · 2014-11-03 21:09:36Z

Style

There are a few bad habits in here that should be resolved. Here's your core method:

public static int countChars(String target, String source){
 char a[] = source.toCharArray();
 int loc = 0, count = 0;
 boolean charRepeat = false;
 for (loc = 0; loc < source.length(); loc++) {
 charRepeat = false;
 for (int z = 0; z < loc; z++) {
 if (a[loc] == a[z]) {
 charRepeat = true;
 break;
 }
 }
 if (charRepeat == false) {
 int i = 0;
 while ((i = target.indexOf(a[loc], i)) != -1) {
 count++;
 i++;
 }
 }
 }
 return count;
}

let's go through some things that are style-based.

source comes before target. The natural thinking is that you go from the source to the target, your naming, and the order of the parameters, is awkward. I would call the source something like searchFor, and the target I would call searchIn. Note, you have confused these values so much that your description does not match your code. In your description you have:
```
E.g Source String - "Hello World"
Target String - "ooood"
Output = 3
```
but in your code you have:
```
String target = "Hello World";
String source = "oood";
int count = countChars(target, source);
```
Note how you have swapped the source/target between the example and the code.

source and target are basically bad names to have...
Your code uses a number of small variables that are unconventional. loc is not a terrible name, it is clearly short for location, but using the fill location would be better. I have a habit of using pos so I can't really complain. On the other hand, the variables a for the char array, and z for an index, are bad names. You already use charRepeat, so you know how to use meaningful names, use them. The z is interesting because it is common to use x, y, and z as names for coordinate space, so seeing z automatically implies there's an x and y too. You should rather just use the standard index variables i, j, and k (but don't use j unless you are already using i, and don't use k unless you are using j too).

Algorithm

Your algorithm is a bit messed up too. Basically, you scan the searchFor chars, and find the first occurrence of each char value. If it's the first one, you then use that to scan the searchIn value, and you count the 'hits' in there. There are two issues here. The first issue is the performance of the 'first check'. The performance is \$O(n^2)\$ where \$n\$ is the number of chars in the searchFor value.

Then, for each unique char, you then do a loop through all the searchIn chars. The loop through all those are disguised in two steps, first the while loop, and then inside that, the indexOf search.

The net result is that you have a performance of \$O(nm)\$ or \$O(n^2)\$ depending on whether there are more characters in the searchIn or searchFor strings.

There is a better solution for both the uniqueness testing, and for the searching. Consider sorting the chars in each string:

char[] lookFor = searchFor.toCharArray();
char[] lookIn = searchIn.toCharArray();
Arrays.sort(lookFor);
Arrays.sort(lookIn);

The two sorts are \$O(n \log n)\$ performance, so this is, so far, significantly less complicated... now, how do we compare? Well, with a 'simple' loop over all the lookIn characters:

int forpos = 0;
int count = 0;
for (char c : lookIn) {
 // advance the searchFor cursor to the next usable position
 while (forpos < lookFor.length && lookFor[forpos] < c) {
 forpos++;
 }
 if (forpos >= lookFor.length) {
 //nothing left to look for.
 break;
 }
 if (c == lookFor[forpos]) {
 count++;
 }
}
return count;

With the above code, you do two sorts, and 1 loop through each value. The net result is that your performance complexity is simply \$O(n \log n)\$ where \$n\$ is the longer of the searchIn or searchFor variables.

By sorting each side, you give yourself a significant algorithmic advantage.

Stack Exchange Network

Counting the number of occurrences of chars in one string, in another string

5 Answers 5

Style

Algorithm

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Counting the number of occurrences of chars in one string, in another string

5 Answers 5

Style

Algorithm

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions