Mark has a dictionary,
S
, containingn
distinct strings. He defines the benefit value of a string as the sum of the ASCII values of its characters.Mark calls some string
A
and some stringB
prefix neighbors if both of the following conditions are satisfied:
String
A
is a prefix of stringB
.No other string
C
exists in the dictionary that is both smaller in length than stringB
and has stringA
as a prefix.For example, in
S = {"AA", "AB", "ABD", "ABDE"}
, the stringsAB
andABD
are prefix neighbors becauseAB
is a prefix ofABD
and no other string of length< 3
inS
shares that prefix. The stringsABD
andABDE
are also prefix neighbors, butAB
andABDE
are not prefix neighbors because stringC
exists (i.e.,ABD
).Mark wants to select a subset of strings from
S
such that the following conditions are both satisfied:
- No two strings in the subset are prefix neighbors.
- The sum of the benefit value of the selected strings is maximal. Given
S
, help Mark by finding and printing the maximum benefit value he can obtain from a subset of non-prefix-neighbors inS
.Input Format
The first line contains an integer denoting
n
(the number of strings in the dictionary).The second line contains
n
space-separated strings.Constraints
1 < n <= 4 * \10ドル^5\$
1 <= \$S_i\$'s length <= 11
Each string contains only uppercase letters.
Output Format
Print the maximum benefit value that Mark can obtain from a subset of non-prefix-neighbors in
S
.Sample Input 0
3
A B AE
Sample Output 0
200
Explanation 0
{"A", "B", "AE"}
Strings
A
andAE
are prefix neighbors, so they cannot both be in Mark's subset ofS
. StringB
has no prefix neighbor, so we include it in Mark's subset.To maximize the benefit value, we choose
AE
andB
for our subset. We then calculate the following benefit values for the chosen subset:Benefit value of
AE = 65 + 69 = 134
Benefit value of
B = 66
We then calculate and print the total benefit value of our subset, which is
134 + 66 = 200
.
My introduction of the algorithm
This is the algorithm to apply trie knowledge and also use maximum weighted independent set knowledge. And the algorithm is one of medium level in February 2017 RookieRank 2 contest.
In the contest, I implemented the trie algorithm but failed test case 10 afterwards.
So, after the contest, I studied the algorithm and one of submissions, write C# code to ask for review. Most of my time is to learn how to implement a trie, and also help myself understand prefix neighbor. The return statement in the function AddWordToTrieByOneCharATime
is so confusing, for the test case of string dictionary {"A","ABC"}
, the stack of return statements and when the prefix neighbor is set, definitely need better solution in my opinion.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
/*
* Problem statement:
* https://www.hackerrank.com/contests/rookierank-2/challenges/prefix-neighbors
*
* Solution notes:
* This problem can be solved using Trie and DP. Create a trie from the
* given set of strings.
* Find the prefix neighbor of each string. Now create a graph such that
* each string is a node and there exist a bidirectional edge between two
* nodes only if they are prefix neighbors. Now find the maximum weighted
* independent set.
*
* time complexity: O(N*max_length_of_string)
* required knowledge: Trie
*
* Maximum weighted independent set:
* https://en.wikipedia.org/wiki/Independent_set_(graph_theory)
*/
class Solution
{
/*
* code review:
* How to set up a Trie with 26 children?
* n distinct strings are stored in a dictionary, using a data structure Trie.
* If the string is in the dictionary, then we mark the string in the Trie as
* IsInDictionary, WordInDictionary.
* @IsInDictionary - the word is in the original strings
* @WordInDictionary - any string in the orginal strings
*/
internal class Trie
{
public Dictionary<char, Trie> Children { get; set; }
public bool IsInDictionary { get; set; }
public string LastVisitedWord { get; set; }
public Trie()
{
Children = new Dictionary<char, Trie>();
IsInDictionary = false;
LastVisitedWord = "";
}
/*
* AddWordToTrieByOneCharATime
* function is designed to complete the following tasks:
* Add one word in the dictionary to Trie using recursive function
* Add one char a time by scanning a word from left to right.
*
* Tricky part is to set prefix neighbor in the process.
*
* @index
* @charArray
* @word -
* @neighbor - prefix neighbor, it is empty string if there is no prefix neighbor
*
* function return:
* Tuple<string, string> - string and its prefix neighbor
*/
public Tuple<string, string> AddWordToTrieByOneCharATime(
int scanIndex,
char[] charArray,
string word,
string neighbour)
{
bool isEndOfString = scanIndex == charArray.Length;
if (isEndOfString)
{
IsInDictionary = true;
LastVisitedWord = word;
return new Tuple<string, string>(LastVisitedWord, neighbour);
}
char visiting = charArray[scanIndex];
if (!Children.ContainsKey(visiting))
{
Children[visiting] = new Trie();
}
// update neighbor string - if IsInDictionary is true, then it is
// to set as a prefix neighbor
return Children[visiting].AddWordToTrieByOneCharATime(
scanIndex + 1,
charArray,
word,
IsInDictionary ? LastVisitedWord : neighbour);
}
}
/*
* study LINQ - GroupBy
* https://msdn.microsoft.com/en-us/library/bb534304(v=vs.110).aspx
*
* 1. Group string by first char, 26 variations from 'A', 'B', ..., 'Z'
* 2. For each group, sort strings by string's length in ascending order
* 3. For example, group of strings starting from char 'A',
* "A","AB","ACD"
* 4. benefit value is to add all chars' ascii value.
*/
static long Process(string[] dict)
{
var benefitValue = 0L;
var groupped = dict.GroupBy(x => x[0]);
// maximum 26 groups, 'A','B', ..., 'Z'
foreach (var group in groupped)
{
// sort by string's length in ascending order
var sortedStrings = group.OrderBy(x => x.Length);
var trie = new Trie();
var banned = new HashSet<string>();
var stack = new Stack<Tuple<string, string>>();
foreach (var word in sortedStrings)
{
stack.Push(trie.AddWordToTrieByOneCharATime(0, word.ToCharArray(), word, ""));
}
// Enumerate the stack, the longest string will be iterated first.
// Maximum independent set is kind of greedy as well.
foreach (var tuple in stack)
{
if (!banned.Contains(tuple.Item1))
{
benefitValue += tuple.Item1.ToCharArray().Aggregate(0L, (val, next) => val + (long)next);
banned.Add(tuple.Item2);
}
}
}
return benefitValue;
}
static void Main(String[] args)
{
//ProcessInput();
RunSampleTestcase2();
}
/*
* Trie
* A B
* AB
* ABC
* ABCD
*
* Things to learn through the debug:
* How many tries are used in the coding?
* Two tries, one is for A started strings: {"A","ABCD"}
* second one is for B started strings: {"B"}
*/
static void RunSampleTestcase()
{
string[] dict = new string[] { "A", "ABCD", "B" };
var points = Process(dict);
Console.WriteLine(points);
}
/*
* input string[] = new string[]{"A","ABC","AC"}
* Trie
* A
* AB AC
* ABC
*
* Try to debug the code and figure out how to find prefix neighbor of "ABC".
* How many return calls for "ABC" in the function:
* AddWordToTrieByOneCharATime
*/
static void RunSampleTestcase2()
{
string[] dictionary = new string[] { "A", "ABC", "AC" };
var points = Process(dictionary);
Console.WriteLine(points);
}
static void ProcessInput()
{
var n = Convert.ToInt32(Console.ReadLine());
var dict = Console.ReadLine().Split(' ');
var points = Process(dict);
Console.WriteLine(points);
}
}
1 Answer 1
Indentation
Consistent indentation helps readability. The indentation of this code seems to mix tabstops of 1, 2, 3, and 4 spaces.
Prefer interfaces to implementations
public Dictionary<char, Trie> Children { get; set; }
Is there any good reason why this should not be an IDictionary<char, Trie>
?
Try to expose an API rather than the entire implementation
public Dictionary<char, Trie> Children { get; set; } public bool IsInDictionary { get; set; } public string LastVisitedWord { get; set; }
Looking past the syntactic sugar, that's six public methods (three getters and three setters). How many of them should really be public? I think that at minimum the setters should all be private, and I see no good reason to expose LastVisitedWord
at all or to allow an external class to get Children
and then mutate it. I would prefer to publicly expose just the equivalent (see below...) of
public bool IsInDictionary { get; }
public IEnumerable<KeyValuePair<char, Trie>> Children { get; }
public Tuple<string, string> AddWordToTrieByOneCharATime( int scanIndex, char[] charArray, string word, string neighbour)
exposes far more than is necessary. If you make that private then a public method AddWord(string)
can call it with the appropriate initial values.
Names
public bool IsInDictionary { get; set; }
public string LastVisitedWord { get; set; }
The first of these names makes sense in the right context, but when you're reading the code you see it in the context of Dictionary<char, Trie> Children { get; set; }
. How about IsWord
?
The second makes sense only in the middle of a call to AddWordToTrieByOneCharATime
. I think it should be LongestPrefix
or something similar.
Separation of responsibilities
If the trie is intended for general use, I think that the way finding the prefix is integrated into it violates the principle of separation of responsibilities. If it's intended to be single-purpose and separation of responsibilities is intentionally sacrificed for better performance, it doesn't go far enough: the benefit calculation could be rolled into the Trie
constructor to avoid recomputing it for each common prefix.
I think that you should refactor to separate trie building from prefix graph building for one simple reason: the documentation states that the complexity is O(N*max_length_of_string), but that's not true because the merging of responsibilities in trie building means that the strings have to be sorted before insertion, giving Theta(N lg N).
KISS
var groupped = dict.GroupBy(x => x[0]); // maximum 26 groups, 'A','B', ..., 'Z' foreach (var group in groupped) { // sort by string's length in ascending order var sortedStrings = group.OrderBy(x => x.Length); var trie = new Trie();
Why? What's wrong with inserting them all into one trie?
-
\$\begingroup\$ Excellent review. First, indentation, that is my mistake when I imported C# source code, I should have tried some ways to use Visual studio to adjust the indentation first to get 4 space for every line of code; I appreciated your review here. \$\endgroup\$Jianmin Chen– Jianmin Chen2017年02月15日 04:09:39 +00:00Commented Feb 15, 2017 at 4:09
-
\$\begingroup\$ Prefer interfaces to implementations. I agree with you. Use IDictionary instead of Dictionary. \$\endgroup\$Jianmin Chen– Jianmin Chen2017年02月15日 04:10:44 +00:00Commented Feb 15, 2017 at 4:10
-
\$\begingroup\$ I like the review advice #3, so I only need to set one API which is public, and also only takes two arguments, public Tuple<string, string> AddWordToTrie, one private API public Tuple<string, string> AddWordToTrieByOneCharATime. \$\endgroup\$Jianmin Chen– Jianmin Chen2017年02月15日 04:13:54 +00:00Commented Feb 15, 2017 at 4:13
-
\$\begingroup\$ Advice #4 Names, you are correct. member variable IsInDictionary is better to call IsWord, LastVisitedWord is better to call prefix, the longest one of prefixes will be the prefix neighbor. \$\endgroup\$Jianmin Chen– Jianmin Chen2017年02月15日 04:22:35 +00:00Commented Feb 15, 2017 at 4:22
-
\$\begingroup\$ Advice #5, Separation of responsibility, "the benefit calculation could be rolled into the Trie constructor to avoid recomputing it for each common prefix." Maybe, you have good insights here. I did not notice that there are some redundant work here to compute common prefix. \$\endgroup\$Jianmin Chen– Jianmin Chen2017年02月15日 04:26:43 +00:00Commented Feb 15, 2017 at 4:26
Explore related questions
See similar questions with these tags.