Let us first describe a
SimpleFunction
:SimpleFunction( string A,string B ){ int answer = 0; for( int dig = 1; dig <= 9; dig++ ){ if(string A contains digit dig and string B contains digit dig){ answer = answer * 10 + dig; } return answer; }
So basically this function receives two strings as an input and returns a value.
Akshara recently gained interest in coding and she came across this interesting question. She had 2 baskets. Each basket contained a few strings. She wanted to find out what is the probability of picking up 2 strings, 1st string from the first basket and the 2nd string from the second basket, such that the value returned from the Simple function would be an even value.
Since she is not that good at Mathematics, she turns to you for help. Please help her in solving this problem!
Input: The first line contains an integer \$T\$ denoting the number of test cases. The first line of each test case contains 2 integers \$N1\$ and \$N2\$ denoting the size of first basket and second basket respectively. This is followed up by \$(N1 + N2)\$ lines. Each line contains a string. The first \$N1\$ string correspond to the first basket while the remaining to the second basket.
Output:
For each test case output the required probability correct up to 3 decimal places.
Constraints:
- \1ドル \le T \le 10\$
- \1ドル \le N1, N2 \le 1000\$
- \1ドル \le \text{Length of String} \le 1000\$
Every String is composed of digits from
[1 - 9]
Sample input
1 2 2 234526 8345 333564 98847675
Sample output
0.750
Explanation
The output of all the combination will be:
Simple(234526,333564) = 3456 Simple(8345,333564) = 345 Simple(8345,98847675) = 458 Simple(234526,98847675) = 456
Since 3 of them are even, probability is \$\frac{3}{4}\$.
Time Limit: 1.0 sec(s) for each input file.
Memory Limit: 256 MB
My introduction of the algorithm:
I used a C# Dictionary
to do memorization to cut time, and also process the input string to the best, but still time out in last case; therefore, I had to study the code in C++ using self-defined hashing function. I also wrote a C# solution using the same ideas, and passed all test cases, no more timeout. Here is the code link.
I am looking for advice on code style, object-oriented programming, hash function insights, self-defined hash function comparison to a C# Dictionary
class.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace SimpleFunction_StudyCode
{
class Program
{
public static int DIGITS = 10;
public static int NO_BASKETS = 2;
public static int NUMBER_DIGITS = 1000;
public static int[][][] hashingResults = new int[NO_BASKETS][][]; // 2, 1000, 10 - 80KB
internal class QueryData
{
public int n1, n2;
public string[] numbers1, numbers2;
}
static void Main(string[] args)
{
ProcessSimpleFunction();
//RunSampleTestcase();
}
public static void RunSampleTestcase()
{
IList<QueryData> input = new List<QueryData>();
QueryData one = new QueryData();
one.n1 = 2;
one.n2 = 2;
one.numbers1 = new string[] {"234526", "8345" };
one.numbers2 = new string[] {"333564", "98847675" };
input.Add(one);
IList<string> result = CalculateProbabilty(input);
Debug.Assert(result[0].CompareTo("0.750") == 0);
}
public static void ProcessSimpleFunction()
{
IList<QueryData> input = ProcessInput();
IList<string> result = CalculateProbabilty(input);
foreach (string s in result)
Console.WriteLine(s);
return;
}
/*
* 1. calculate even number's count
* 2. calculate probability
* 3. convert to string - a decimal data type with 3 decimals
* 4. put the return in a list
*/
public static IList<string> CalculateProbabilty(IList<QueryData> input)
{
IList<string> output = new List<string>();
if (input == null || input.Count == 0)
{
output.Add(0.ToString("N3"));
return output;
}
HashingResults_Declaration();
foreach(QueryData data in input)
{
HashingResults_Initialization();
int n1 = data.n1;
int n2 = data.n2;
for (int i = 0; i < n1; i++)
{
char[] current = data.numbers1[i].ToCharArray();
Hash(current, i, 0);
}
for (int i = 0; i < n2; i++)
{
char[] current = data.numbers2[i].ToCharArray();
Hash(current, i, 1);
}
long count = 0;
for (int i = 0; i < n1; i++)
{
for (int j = 0; j < n2; j++)
{
count += CheckNumberIsEven(i, j);
}
}
float probability = (float)((float)(count) / (float)(n1 * n2));
output.Add(probability.ToString("N3"));
}
return output;
}
/*
* return: IList<QueryData>
*/
private static IList<QueryData> ProcessInput()
{
int queries = Convert.ToInt32(Console.ReadLine());
IList<QueryData> input = new List<QueryData>();
while (queries > 0)
{
QueryData data = new QueryData();
int[] baskets = Array.ConvertAll(Console.ReadLine().Split(' '), int.Parse);
data.n1 = baskets[0];
data.n2 = baskets[1];
data.numbers1 = new string[data.n1];
data.numbers2 = new string[data.n2];
for (int i = 0; i < data.n1; i++)
{
data.numbers1[i] = Console.ReadLine();
}
for (int i = 0; i < data.n2; i++)
{
data.numbers2[i] = Console.ReadLine();
}
input.Add(data);
queries--;
}
return input;
}
/*
* hashing function will be applied to the data, the result is a jagged array called hashingResults
* hashing results:
* int[NO_BASKETS][][]; // 2, 1000, 10 - 80KB
*/
private static void HashingResults_Declaration()
{
for (int i = 0; i < NO_BASKETS; i++)
{
hashingResults[i] = new int[NUMBER_DIGITS][];
for (int j = 0; j < NUMBER_DIGITS; j++)
{
hashingResults[i][j] = new int[DIGITS];
}
}
}
/*
* hashing results:
* int[NO_BASKETS][][]; // 2, 1000, 10 - 80KB
*/
private static void HashingResults_Initialization()
{
for (int i = 0; i < NO_BASKETS; i++)
{
for (int j = 0; j < NUMBER_DIGITS; j++)
{
for (int k = 0; k < DIGITS; k++)
{
hashingResults[i][j][k] = 0;
}
}
}
}
/*
*
* @cache - char[]
* @nthNumber - each basket has n numbers, n from 0 to N-1
* @serialNo - two baskets, so two values: 0, 1
*
* hash function -
* map data of arbitrary size to data of fixed size
* int[][][] cache = new int[2][][]; // 2, 1000, 10
* size of cache is 2 * 1000 * 10 * 4 bytes = 80KB
*
* '0' ascii value is 48
*/
private static void Hash(char[] cache, int nthNumber, int serialNo)
{
for (int i = 0; i < cache.Length; i++)
{
int digit = cache[i] - 48;
hashingResults[serialNo][nthNumber][digit] = 1;
}
}
/*
* digits from 1 to 9
* return: 1 - even number 0 - odd number
*/
private static int CheckNumberIsEven(int first, int second)
{
int digitValue = 9;
while (digitValue > 0)
{
if (hashingResults[0][first][digitValue] == 1 &&
hashingResults[1][second][digitValue] == 1)
{
break;
}
digitValue--;
}
return (digitValue % 2 == 0)? 1 : 0;
}
}
}
Edit:
After comment from @paparazzi - the solution is hard for follow
In order to understand the algorithm, I had to write down my calculation, and then, people can help me quickly for this hash algorithm, compute all options of dictionary choices before writing code, have some idea of reasonable size of space, continue to reduce the space from around \2ドル MB\$ to \80ドル KB\$.
For example, use C# Dictionary to construct a key for a simple(N1, N2), assuming that the key is constructed as \$(s1.CompareTo(s2) <= 0) ? (s1 + "-" + s2) : (s2 + "-" + s1)\,ドル how many possible strings? \$s1\$ represents integer \$N1\,ドル since \$N1 >= 0\$ and \$n1 <= 1000\,ドル so the choice is \9ドル*9*9 + 1 = 730\,ドル same applies to \$s2\$; therefore, possible key string is \730ドル * 730 = 532900\,ドル assuming that each key string is at most 9 chars, one char uses one byte, so the processing space is \523900ドル * 9 = 4823100 Bytes\$ (around \4ドル.8MB\$).
Actually, the digits's order in integer does not matter when to calculate simple function, therefore, no need to record the order. So, N1>= 0 and n1 <= 1000, the possible choice \9ドル*8*7 + 1 = 505\,ドル the key used in the dictionary can have \505ドル * 505 = 255025\$ choices, then \255025ドル * 9 = 2295225 Bytes\$ (around \2ドル.2MB\$), reduce half of the space.
But, in order to be able to make it one second for six test cases, each test case has \0ドル.16\$ seconds, dictionary size will be good enough around \4ドル\$ percent of \2ドル.2MB\,ドル around \80ドルK\$.
Here is the solution introduced by the customized hash function - space: hashingResults = new int[NO_BASKETS][][], each basket will have \1000ドル\$ integer at most, each integer will have array of \10ドル\$ to store \0ドル\$ or \1ドル\$ for digit from \1ドル\$ to \9ドル\,ドル therefore, the hash dictionary size is \2ドル * 1000 * 10\,ドル each one is integer - four bytes, the size is \80ドルK\$ Bytes. (actually if using bool, then, size at most \20ドルKB\$.)
For example, \$N1 = 12\,ドル we order number from \1ドル\$ to \1000ドル\$ starting index = \0ドル\,ドル so \12ドル\$ is \11ドルth\$ number, then \$hashingResults[0][11]\$ is an array: { \0,1,1,0,0,0,0,0,0,0ドル\$ };
if \$N2 = 122\,ドル \121ドルth\$ number in basket two, then the array \$hashingResults[1][121]\$ is an array: { \0,1,1,0,0,0,0,0,0,0ドル\$ }. As we know, we do not record how many \2ドル\$ in the number \122ドル\,ドル this reduces the space as well. we only record 2 is existing in the number, same as 1. \121ドル\$ and \112ドル\$ will hash to same array, the order of distinct digit is no longer needed to save in hashed result.
Also,the space is not so critical since there is \256ドルMB\$ memory requirement, but calculation of key takes time, more space means more time. My question seems to figure out the target dictionary size, around \80ドルKB\,ドル not \2ドルMB\,ドル considering all other factors, need help.
1 Answer 1
(I am under the impression this question justifies three largely independent answer parts: 1) documentation/description of the approach chosen (and the analysis leading to it) 2) problem analysis 3) coding (and maybe 4) asking CR_). I don't quite feel like doing every part justice.)
In the first revision, there was no description of your overall approach; in particular, not in the code presented -
try again, top down as many levels as needed, using what documentation tool support there is for C#. For "everything" in the code, document what it is to accomplish.
If any one description needs more than "one screen" (cannot easily be viewed without scrolling), it (or the entity described) probably is too involved an should be broken down/simplified/shortened. (A "page" of about 65 lines by 80 characters is a lot, a FORTH screen used to have 16*64 (my #1: polymorphism support).)
For lack of documentation, your approach and analysis has to be second guessed from the code. (I will use signature where you used hash (which I imagine different from the use I have seen here).) For each test case (not intending to give iteration through test cases notable attention):
foreach input string of digits
compute a signature and add to its basket
foreach pair of digit strings from different baskets
evaluate (a simplification of) SimpleFunction, counting the results
output proportion as required
You noticed that just the last bit of the value of SimpleFunction is used in the described check, and implemented CheckNumberIsEven()
using an "early-out" encountering the largest common digit.
Apparently, this uses O(N1 * N2) evaluations of SimpleFunction. (I failed to find an analytical approach promising to be much faster for just shy of a million pairs.)
The evaluation of SimpleFunction better be fast - you seem to plan to cache function results, an do back-of-the-envelope calculations of the resources required.
The impact of memoisation depends on the time taken to (re-)compute a result and to retrieve it, respectively - let's try to make computation fast.
The signatures for the digit strings are the sets of digits used, in a normalised representation. As long as DIGITS
is no larger than the number of bits handled in a basic operation, this can be handled as a "bit set", allowing fast intersection (bitwise and), union (or), ....
With nine digit values allowed, there are 29(-1) possible combinations: one could represent the baskets with 2*511 10-bit counts, one for each possible set of digits, instead of 2*1000 9-bit signatures.
To check if SimpleFunction for a pair of signatures would be even, get "the and" representing digits common to both strings, find the highest common digit using Integer.low/highestOneBit()
or some such and map to even/odd.
Where is the code documentation?
Everything not in the code will get separated - when code is copied and pasted into a different context, if not before. (Another thing python got right: the doc strings are between essential parts of the code.)
QueryData
/ProcessInput()
: Dispensable - whenever you can, process as you go.
Giving it a try (with presentational abbr.), in Java for lack of a C# environment:
/** Hacker Earth SimpleFunction.
* For each test case, output the relative frequency of
* pairs with an even highest common digit
*/
class Program
/** (natural) multiplicity with each member */
static class MultiSet<V> extends java.util.AbstractSet<V> {
...
}
static final int
DIGITS = 10,
EVEN_DIGITS = 0b10101010, // doesn't even (include '0')
ODD_DIGITS = 0b101010101;
/** maps each digit's int value to the bit used in digit bit sets */
static final int DIGIT_BIT[] = new int[DIGITS];
static {
for (int i = DIGITS ; -1 < --i ; )
DIGIT_BIT[i] = 1<<i;
}
/** @return true if no common digits or highest even */
static boolean NoOddHighestCommonDigit(
int digits_1, int digits_a) {
int both = digits_1 & digits_a;
//or (both^(both&(both-1))) or ...
return 0 == (Long.lowestOneBit(both)
& ODD_DIGITS);
}
/** keep what's needed of <code>digits</code> with basket */
static void tally(String digits, MultiSet<Integer> basket) {
int digit_set = 0;
for (char d: digits.toCharArray())
if ('0' </*=*/ d && d <= '9')
digit_set |= DIGIT_BIT[d-'0'];
basket.add(digit_set);
}
/** establish result histogram */
static double even_proportion(MultiSet<Integer> basket_a,
MultiSet<Integer> basket_b) {
long even = 0, odd = 0;
for (Map.Entry<Integer, Number> a:
basket_a.multiplicity.entrySet())
for (Map.Entry<Integer, Number> b:
basket_a.multiplicity.entrySet()) {
long pairs = a.getValue().intValue()
* b.getValue().intValue();
if (NoOddHighestCommonDigit(
a.getKey(), b.getKey()))
even += pairs;
else
odd += pairs;
}
return (double)even / (even + odd);
}
public static void main(String[] args) {
MultiSet<Integer> basket1 = new MultiSet<>(9),
basket2 = new MultiSet<>(9);
tally("234526", basket1);
tally("8345", basket1);
tally("333564", basket2);
tally("98847675", basket2);
System.out.println(even_proportion(basket1, basket2));
}
}
-
\$\begingroup\$ your analysis of algorithm is very clear to me, foreach pair of digit strings from different baskets, up to 1 million pairs, anything inside the loop has to be scrutinized to avoid timeout, to get largest digit, computing fast is the big concern (using "bit set", counting sort idea), memorization is not bottleneck here. \$\endgroup\$Jianmin Chen– Jianmin Chen2017年01月08日 20:39:03 +00:00Commented Jan 8, 2017 at 20:39
-
\$\begingroup\$ can you help me out your idea of "try again, top down as many levels as needed, using what documentation tool support there is for C#. "(my #1: polymorphism support). Can you give more detail about #1: polymorphism support to look up? \$\endgroup\$Jianmin Chen– Jianmin Chen2017年01月08日 20:41:59 +00:00Commented Jan 8, 2017 at 20:41
-
1\$\begingroup\$ @JianminChen (
more detail about #1: polymorphism support to look up
: that was an uncalled-for remark about an amazing piece of FORTH to not just model a problem and its solution OO, but code it that whay - entirely unrelated to your question or the rest of the answer. I'll try and dig it up - here we are: Mini-OOF.) \$\endgroup\$greybeard– greybeard2017年01月08日 20:44:54 +00:00Commented Jan 8, 2017 at 20:44
Explore related questions
See similar questions with these tags.
Since 3 of [4 numbers in an example] are even, probability is ¾
make my day. \$\endgroup\$