This program counts the total number of times any character from the source string can be found in the target string.
E.g Source String - "Hello World"
Target String - "llo"
Output = 5
(as there are 3 l
's and 2 o
's)
Source String - "Hello World"
Target String - "ooood"
Output = 3
(as there are 2 o
's and 1 d
)
Is there any better way of write the below program?
public class CountOccurence {
public static int countChars(String target, String source){
char a[] = source.toCharArray();
int loc = 0, count = 0;
boolean charRepeat = false;
for (loc = 0; loc < source.length(); loc++) {
charRepeat = false;
for (int z = 0; z < loc; z++) {
if (a[loc] == a[z]) {
charRepeat = true;
break;
}
}
if (charRepeat == false) {
int i = 0;
while ((i = target.indexOf(a[loc], i)) != -1) {
count++;
i++;
}
}
}
return count;
}
public static void main(String[] args) {
String target = "Hello World";
String source = "oood";
int count = countChars(target, source);
System.out.println(count);
source = "lld";
count = countChars(target, source);
System.out.println(count);
}
}
-
\$\begingroup\$ to reduce confusion, the source string should just be a set of characters. \$\endgroup\$Woot4Moo– Woot4Moo2014年11月03日 12:35:01 +00:00Commented Nov 3, 2014 at 12:35
5 Answers 5
Style
There are a few bad habits in here that should be resolved. Here's your core method:
public static int countChars(String target, String source){ char a[] = source.toCharArray(); int loc = 0, count = 0; boolean charRepeat = false; for (loc = 0; loc < source.length(); loc++) { charRepeat = false; for (int z = 0; z < loc; z++) { if (a[loc] == a[z]) { charRepeat = true; break; } } if (charRepeat == false) { int i = 0; while ((i = target.indexOf(a[loc], i)) != -1) { count++; i++; } } } return count; }
let's go through some things that are style-based.
source comes before target. The natural thinking is that you go from the source to the target, your naming, and the order of the parameters, is awkward. I would call the
source
something likesearchFor
, and thetarget
I would callsearchIn
. Note, you have confused these values so much that your description does not match your code. In your description you have:E.g Source String - "Hello World" Target String - "ooood" Output = 3
but in your code you have:
String target = "Hello World"; String source = "oood"; int count = countChars(target, source);
Note how you have swapped the source/target between the example and the code.
source
andtarget
are basically bad names to have...Your code uses a number of small variables that are unconventional.
loc
is not a terrible name, it is clearly short forlocation
, but using the filllocation
would be better. I have a habit of usingpos
so I can't really complain. On the other hand, the variablesa
for the char array, andz
for an index, are bad names. You already usecharRepeat
, so you know how to use meaningful names, use them. Thez
is interesting because it is common to use x, y, and z as names for coordinate space, so seeingz
automatically implies there's an x and y too. You should rather just use the standard index variablesi
,j
, andk
(but don't usej
unless you are already usingi
, and don't usek
unless you are usingj
too).
Algorithm
Your algorithm is a bit messed up too. Basically, you scan the searchFor
chars, and find the first occurrence of each char value. If it's the first one, you then use that to scan the searchIn
value, and you count the 'hits' in there. There are two issues here. The first issue is the performance of the 'first check'. The performance is \$O(n^2)\$ where \$n\$ is the number of chars in the searchFor
value.
Then, for each unique char, you then do a loop through all the searchIn
chars. The loop through all those are disguised in two steps, first the while loop, and then inside that, the indexOf
search.
The net result is that you have a performance of \$O(nm)\$ or \$O(n^2)\$ depending on whether there are more characters in the searchIn
or searchFor
strings.
There is a better solution for both the uniqueness testing, and for the searching. Consider sorting the chars in each string:
char[] lookFor = searchFor.toCharArray();
char[] lookIn = searchIn.toCharArray();
Arrays.sort(lookFor);
Arrays.sort(lookIn);
The two sorts are \$O(n \log n)\$ performance, so this is, so far, significantly less complicated... now, how do we compare? Well, with a 'simple' loop over all the lookIn characters:
int forpos = 0;
int count = 0;
for (char c : lookIn) {
// advance the searchFor cursor to the next usable position
while (forpos < lookFor.length && lookFor[forpos] < c) {
forpos++;
}
if (forpos >= lookFor.length) {
//nothing left to look for.
break;
}
if (c == lookFor[forpos]) {
count++;
}
}
return count;
With the above code, you do two sorts, and 1 loop through each value. The net result is that your performance complexity is simply \$O(n \log n)\$ where \$n\$ is the longer of the searchIn
or searchFor
variables.
By sorting each side, you give yourself a significant algorithmic advantage.
Use a set containing the characters of the target string:
public static int countChars(String target, String source)
{
int count = 0;
Set<Character> charSet = new HashSet<Character>();
for (char c : target.toCharArray())
if (!charSet.contains(c))
charSet.add(c);
for (char c : source.toCharArray())
if (charSet.contains(c))
count++;
return count;
}
-
2\$\begingroup\$ There is no point in checking
charSet.contains(c)
before doingcharSet.add(c)
. Sets will automatically deduplicate members. \$\endgroup\$200_success– 200_success2014年11月03日 23:57:16 +00:00Commented Nov 3, 2014 at 23:57 -
\$\begingroup\$ @200_success: I know that. I added it there in order to make it easier for OP to understand. \$\endgroup\$barak manos– barak manos2014年11月04日 08:08:32 +00:00Commented Nov 4, 2014 at 8:08
your way of doing this seems kinda weird to me:
Source "hello world" target "llo"
first create a set for the target, which contains only characters from the target, in this example l and o.
then just iterate once over the source and check for each char if it is in the set! if so, add one to a counter
-
1\$\begingroup\$ It would be awesome if you could format your pseudocode in a more readable way than plaintext... from skimming it it looks like a relatively simple and correct approach though. \$\endgroup\$Vogel612– Vogel6122014年11月03日 22:05:07 +00:00Commented Nov 3, 2014 at 22:05
You could do something like this...
int count = 0;
for (int i = 0; i < target.length(); i++) {
char c = target.charAt(i);
if (c == '#') // If token then do not count
continue;
String cStr = String.valueOf(c);
count += source.length() - source.replaceAll(cStr, "").length();
target = target.replaceAll(cStr, "#"); // Trade letter for token
}
The char '#' acts like a token, telling that that letter was already counted in the target string.
Update for discussion with @Volgel612
int count = 0;
char last = '0円';
char[] chars = target.toCharArray();
Arrays.sort(chars);
for (int i = 0; i < chars.length; i++) {
char c = chars[i];
if (c == last)
continue;
String cStr = String.valueOf(c);
count += source.length() - source.replaceAll(cStr, "").length();
last = c;
}
-
1\$\begingroup\$ Don't you think that replacing the handled character is kind of... useless? You don't get aroind to handling it again either way.... \$\endgroup\$Vogel612– Vogel6122014年11月03日 22:10:11 +00:00Commented Nov 3, 2014 at 22:10
-
\$\begingroup\$ In the case of the example provided, 'llo', the char 'l' appear two times, so the replacement was necessary to avoid using control arrays. @rolfl idea of first sorting the target string would resolve this more easily, though. \$\endgroup\$Dalton– Dalton2014年11月03日 23:21:00 +00:00Commented Nov 3, 2014 at 23:21
-
\$\begingroup\$ You could circumvent this by modifying a copy of source and updating that copy after every operation you perform. Counting characters would then be as simple as:
source.length() - copy.length()
. This approach would probably be faster too, as you have less newly created Strings. Additionally there's no guarantees that # isn't a valid input ... \$\endgroup\$Vogel612– Vogel6122014年11月04日 22:52:40 +00:00Commented Nov 4, 2014 at 22:52 -
\$\begingroup\$ Nah, sorting is better. With sorting I just need to check if the character repeated, if it repeated then just ignore. No need for replacing the target string (or copy). The # token could be anything, that's was just an example. With sorting it is unnecessary. \$\endgroup\$Dalton– Dalton2014年11月05日 11:51:25 +00:00Commented Nov 5, 2014 at 11:51
-
\$\begingroup\$ I'll update the answer so you can see. \$\endgroup\$Dalton– Dalton2014年11月05日 12:15:57 +00:00Commented Nov 5, 2014 at 12:15
If you like a compact solution using regular expressions try this:
public static void main(String[] args) {
System.out.println(countChars("ooood", "Hello World"));
System.out.println(countChars("llo", "Hello World"));
}
public static int countChars(String source, String target) {
if (source.isEmpty()) return 0;
String regex = "[" + source.replaceAll("\\\\", "\\\\\\\\").replaceAll("\\]", "\\\\]") + "]";
return target.length() - target.replaceAll(regex, "").length();
}
It creates a regular expression with a character class containing all characters from target. Then it replaces all those characters in source with nothing and returns how many characters have been removed.
-
1\$\begingroup\$ I don't understand in the slightest what you are trying to accomplish with the string-replace when creating the regex. Fancy adding a little explanation?? \$\endgroup\$Vogel612– Vogel6122014年11月03日 22:06:36 +00:00Commented Nov 3, 2014 at 22:06
-
\$\begingroup\$ Regular expressions seems like the totally wrong tool to use here. \$\endgroup\$Simon Forsberg– Simon Forsberg2014年11月04日 00:12:29 +00:00Commented Nov 4, 2014 at 0:12