Producing residues of combinatorics on strings

Question 1

Basically, this code does multiplication of chosen operations.

public static String[] gen8String(String[] pattern1, String[] pattern2){
 String[] combinedSubset = new String[90]; //emty array for the subset of 90 strings
 String combinedString = ""; //string holder for each combined string
 int index = 0; //used for combinedSubset array
 int present = 0; //used to check if all 6 characters are present
 for(int i = 0; i < 15; i++){
 for(int j = 0; j < 15; j++){
 combinedString = pattern1[i] + pattern2[j]; //combine both 4 letter strings into 8 char length string
 char[] parsedString = combinedString.toCharArray(); //parse into array
 //check if all 6 characters are present
 for(int k = 1; k <= 6; k++)
 {
 if(new String(parsedString).contains(k+"")) {
 present++;
 }
 else
 break;
 //if all 6 are present, then add it to combined subset
 if(present == 6)
 combinedSubset[index++] = combinedString;
 }
 present = 0;
 }
 }
 return combinedSubset;
}

Let's look at an example:

Let's says I have 2 strings ABCDEF and ACDEBF. Now before I call gen8string, I will first perform a \${6 \choose 4}\$ deletion operation on each string. For instance, I will take ABCDEF delete any two characters in that string and I am left with substring of length 4. How many different such strings can I have (given all the letters are unique)? \15ドル = {6 \choose 4}\$. Now, I will do the same for both strings. Now I am left with 2 string arrays, each of size 15.

In gen8String, I am passing in the 2 above mentioned arrays and I am combining the substrings under the following constraints:

I can only combine 2 strings to make one 8-length string (4+4).
I can only combine them if there are no overlapping missing characters.

What do I mean by the 2nd point? Well, let's say I one of the inputs in pattern is ABCD and in pattern2 one of the inputs is ACDE. These 2 substrings cannot be combined because there are overlapping missing characters, i.e. the F. However, if I have ABCD in pattern1 and CDEF, these can be combined because all 6 characters are present in the 8-length string at least once. So, no overlapping missing characters. This can be seen by tracing through the code.

As a concluding note, this function essentially does a

$${{6}\choose{4}} * {{4}\choose{2}} = 90$$

How can I optimize/generalize this code? Ways to improve it?

Question 2

Generalize the code in what way? Please succinctly state what the task is. What should gen8String("ACFFG", "XXZ") do, for example, and why? Also, is this Java? Please tag the question accordingly.

Question 3

Hello, I have taken both of your suggestions and implemented them. Is it good now? Also, @200_success, I had already stated exactly what gen8string does. Was my explanation not sufficient?

Question 4

Thanks for adding the explanation. However, it looks like the code isn't working as intended yet. In other words, you are asking for a solution to a problem rather than an open-ended critique of code that already works, which makes your question off-topic for this site.

Question 5

What do you mean? I have written code and I am asking how to make it more general?

Question 6

@200_success I have fixed the code and made it more appropriate for Codereview. Does it meet the requirements now?

Question 7

Think about alternatives

Let's says I have 2 strings ABCDEF and ACDEBF. Now before I call gen8string, I will first perform a \$\binom 6 4\$ deletion operation on each string. For instance, I will take ABCDEF delete any two characters in that string and I am left with substring of length 4. How many different such strings can I have (given all the letters are unique)? \15ドル=\binom 6 4\$. Now, I will do the same for both strings. Now I am left with 2 string arrays, each of size 15.

Why?

You want to generate 90 strings. Why generate 255 strings and then mask out 165 of them? Why not just generate 90 strings directly?

ABEFCDEF 001122
ABDFCDEF 001212
ABDECDEF 001221
...
ABCFBDEF 020112
...
ABEFABCD 221100

The six numbers are which string has the letter. So 001122 means AB in the first string, CD in the second string, and EF in both. Iterate from 001122 to 221100 or vice versa. Then you don't need gen8string at all, as all it does is eliminate invalid strings like 2222-- and 22210-.

Of course, this would happen outside the code that you posted.

Working with what we have here

You have

public static String[] gen8String(String[] pattern1, String[] pattern2){
 String[] combinedSubset = new String[90]; //emty array for the subset of 90 strings
 String combinedString = ""; //string holder for each combined string
 int index = 0; //used for combinedSubset array
 int present = 0; //used to check if all 6 characters are present
 for(int i = 0; i < 15; i++){
 for(int j = 0; j < 15; j++){
 combinedString = pattern1[i] + pattern2[j]; //combine both 4 letter strings into 8 char length string
 char[] parsedString = combinedString.toCharArray(); //parse into array
 //check if all 6 characters are present
 for(int k = 1; k <= 6; k++)
 {
 if(new String(parsedString).contains(k+"")) {
 present++;
 }
 else
 break;
 //if all 6 are present, then add it to combined subset
 if(present == 6)
 combinedSubset[index++] = combinedString;
 }
 present = 0;
 }
 }
 return combinedSubset;
}

Consider

public static String[] gen8String(String[] patterns1, String[] patterns2) {
 String[] results = new String[90];
 int resultCount = 0;
 for (String pattern1 : patterns1) { 
 for (String pattern2 : patterns2) {
 String combined = pattern1 + pattern2;
 //check if all 6 characters are present
 for (int k = 1; combined.contains(k+""); k++) {
 //if all 6 are present, then add it to the results and end
 if (k >= 6) {
 results[resultCount++] = combined;
 break;
 }
 }
 }
 }
 return results;
}

This changes the names of the parameters. They are collections of patterns, not the patterns themselves.

I changed the names of combinedSubset and index to be clearer about how they relate to the results. This allowed me to drop the comments as obsolete.

Switching from C-style for loops to range based for loops gets rid of the unnecessary i and j variables. It also gets rid of the magic number 15.

I renamed combinedString to combined and moved the declaration to where it was set. Since a single value is never used across iterations of the outer two loops, the scope doesn't need to be any wider.

We don't need present, as we already have k. If we make it to the sixth iteration, we can update the results and stop the loop.

If we gate the loop on the contains check, we can just check the iterations once inside the loop. The original code checked twice on each iteration.

Note that your example suggested inputs like ABCD and CDEF. The patterns here are actually like 1234 and 3456.

Generalizing

Your code only works for four element subsets of a six element set. Consider rewriting so the results are a List. Then you wouldn't have to give a result set size or maintain your own count of the results.

The largest value for k (6 here) should probably be a parameter. Or pass in an array or list over which we could iterate.

I already modified the for loops to be based on the size of the input parameters, so that's already generalized.

And of course the name would need to change.

Precalculate

Because the current results are so narrow, you could just set them manually. One ninety element array.

Question 8

Your suggestion for not using gen8string and just computing the 90 in the first place seems very helpful. Can you please provide a code example? I am not able to see how I can just compute the 90 without masking out the 165.

mdfst13 mdfst13 22.4k6 gold badges34 silver badges70 bronze badges · Accepted Answer · 2016-11-06 02:01:13Z

Think about alternatives

Let's says I have 2 strings ABCDEF and ACDEBF. Now before I call gen8string, I will first perform a \$\binom 6 4\$ deletion operation on each string. For instance, I will take ABCDEF delete any two characters in that string and I am left with substring of length 4. How many different such strings can I have (given all the letters are unique)? \15ドル=\binom 6 4\$. Now, I will do the same for both strings. Now I am left with 2 string arrays, each of size 15.

Why?

You want to generate 90 strings. Why generate 255 strings and then mask out 165 of them? Why not just generate 90 strings directly?

ABEFCDEF 001122
ABDFCDEF 001212
ABDECDEF 001221
...
ABCFBDEF 020112
...
ABEFABCD 221100

The six numbers are which string has the letter. So 001122 means AB in the first string, CD in the second string, and EF in both. Iterate from 001122 to 221100 or vice versa. Then you don't need gen8string at all, as all it does is eliminate invalid strings like 2222-- and 22210-.

Of course, this would happen outside the code that you posted.

Working with what we have here

You have

public static String[] gen8String(String[] pattern1, String[] pattern2){
 String[] combinedSubset = new String[90]; //emty array for the subset of 90 strings
 String combinedString = ""; //string holder for each combined string
 int index = 0; //used for combinedSubset array
 int present = 0; //used to check if all 6 characters are present
 for(int i = 0; i < 15; i++){
 for(int j = 0; j < 15; j++){
 combinedString = pattern1[i] + pattern2[j]; //combine both 4 letter strings into 8 char length string
 char[] parsedString = combinedString.toCharArray(); //parse into array
 //check if all 6 characters are present
 for(int k = 1; k <= 6; k++)
 {
 if(new String(parsedString).contains(k+"")) {
 present++;
 }
 else
 break;
 //if all 6 are present, then add it to combined subset
 if(present == 6)
 combinedSubset[index++] = combinedString;
 }
 present = 0;
 }
 }
 return combinedSubset;
}

Consider

public static String[] gen8String(String[] patterns1, String[] patterns2) {
 String[] results = new String[90];
 int resultCount = 0;
 for (String pattern1 : patterns1) { 
 for (String pattern2 : patterns2) {
 String combined = pattern1 + pattern2;
 //check if all 6 characters are present
 for (int k = 1; combined.contains(k+""); k++) {
 //if all 6 are present, then add it to the results and end
 if (k >= 6) {
 results[resultCount++] = combined;
 break;
 }
 }
 }
 }
 return results;
}

This changes the names of the parameters. They are collections of patterns, not the patterns themselves.

I changed the names of combinedSubset and index to be clearer about how they relate to the results. This allowed me to drop the comments as obsolete.

Switching from C-style for loops to range based for loops gets rid of the unnecessary i and j variables. It also gets rid of the magic number 15.

I renamed combinedString to combined and moved the declaration to where it was set. Since a single value is never used across iterations of the outer two loops, the scope doesn't need to be any wider.

We don't need present, as we already have k. If we make it to the sixth iteration, we can update the results and stop the loop.

If we gate the loop on the contains check, we can just check the iterations once inside the loop. The original code checked twice on each iteration.

Note that your example suggested inputs like ABCD and CDEF. The patterns here are actually like 1234 and 3456.

Generalizing

Your code only works for four element subsets of a six element set. Consider rewriting so the results are a List. Then you wouldn't have to give a result set size or maintain your own count of the results.

The largest value for k (6 here) should probably be a parameter. Or pass in an array or list over which we could iterate.

I already modified the for loops to be based on the size of the input parameters, so that's already generalized.

And of course the name would need to change.

Precalculate

Because the current results are so narrow, you could just set them manually. One ninety element array.

Your suggestion for not using gen8string and just computing the 90 in the first place seems very helpful. Can you please provide a code example? I am not able to see how I can just compute the 90 without masking out the 165.

Stack Exchange Network

Producing residues of combinatorics on strings

1 Answer 1

Think about alternatives

Working with what we have here

Generalizing

Precalculate

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Producing residues of combinatorics on strings

1 Answer 1

Think about alternatives

Working with what we have here

Generalizing

Precalculate

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions