Hackerrank: Prefix neighbors

Question 1

Mark has a dictionary, S, containing n distinct strings. He defines the benefit value of a string as the sum of the ASCII values of its characters.

Mark calls some string A and some string B prefix neighbors if both of the following conditions are satisfied:

String A is a prefix of string B.

No other string C exists in the dictionary that is both smaller in length than string B and has string A as a prefix.

For example, in S = {"AA", "AB", "ABD", "ABDE"}, the strings AB and ABD are prefix neighbors because AB is a prefix of ABD and no other string of length < 3 in S shares that prefix. The strings ABD and ABDE are also prefix neighbors, but AB and ABDE are not prefix neighbors because string C exists (i.e., ABD).

Mark wants to select a subset of strings from S such that the following conditions are both satisfied:

No two strings in the subset are prefix neighbors.

The sum of the benefit value of the selected strings is maximal. Given S, help Mark by finding and printing the maximum benefit value he can obtain from a subset of non-prefix-neighbors in S.

Input Format

The first line contains an integer denoting n (the number of strings in the dictionary).

The second line contains n space-separated strings.

Constraints

1 < n <= 4 * \10ドル^5\$

1 <= \$S_i\$'s length <= 11

Each string contains only uppercase letters.

Output Format

Print the maximum benefit value that Mark can obtain from a subset of non-prefix-neighbors in S.

Sample Input 0

3

A B AE

Sample Output 0

200

Explanation 0

{"A", "B", "AE"}

Strings A and AE are prefix neighbors, so they cannot both be in Mark's subset of S. String B has no prefix neighbor, so we include it in Mark's subset.

To maximize the benefit value, we choose AE and B for our subset. We then calculate the following benefit values for the chosen subset:

Benefit value of AE = 65 + 69 = 134

Benefit value of B = 66

We then calculate and print the total benefit value of our subset, which is 134 + 66 = 200.

My introduction of the algorithm

This is the algorithm to apply trie knowledge and also use maximum weighted independent set knowledge. And the algorithm is one of medium level in February 2017 RookieRank 2 contest.

In the contest, I implemented the trie algorithm but failed test case 10 afterwards.

So, after the contest, I studied the algorithm and one of submissions, write C# code to ask for review. Most of my time is to learn how to implement a trie, and also help myself understand prefix neighbor. The return statement in the function AddWordToTrieByOneCharATime is so confusing, for the test case of string dictionary {"A","ABC"}, the stack of return statements and when the prefix neighbor is set, definitely need better solution in my opinion.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
/*
 * Problem statement:
 * https://www.hackerrank.com/contests/rookierank-2/challenges/prefix-neighbors
 * 
 * Solution notes:
 * This problem can be solved using Trie and DP. Create a trie from the 
 * given set of strings. 
 * Find the prefix neighbor of each string. Now create a graph such that 
 * each string is a node and there exist a bidirectional edge between two 
 * nodes only if they are prefix neighbors. Now find the maximum weighted
 * independent set.
 * 
 * time complexity: O(N*max_length_of_string)
 * required knowledge: Trie
 * 
 * Maximum weighted independent set:
 * https://en.wikipedia.org/wiki/Independent_set_(graph_theory)
 */
class Solution
{
 /*
 * code review: 
 * How to set up a Trie with 26 children? 
 * n distinct strings are stored in a dictionary, using a data structure Trie. 
 * If the string is in the dictionary, then we mark the string in the Trie as 
 * IsInDictionary, WordInDictionary. 
 * @IsInDictionary - the word is in the original strings
 * @WordInDictionary - any string in the orginal strings 
 */
 internal class Trie
 {
 public Dictionary<char, Trie> Children { get; set; }
 public bool IsInDictionary { get; set; }
 public string LastVisitedWord { get; set; }
 public Trie()
 {
 Children = new Dictionary<char, Trie>();
 IsInDictionary = false;
 LastVisitedWord = "";
 }
 /*
 * AddWordToTrieByOneCharATime
 * function is designed to complete the following tasks:
 * Add one word in the dictionary to Trie using recursive function
 * Add one char a time by scanning a word from left to right. 
 * 
 * Tricky part is to set prefix neighbor in the process. 
 * 
 * @index
 * @charArray
 * @word - 
 * @neighbor - prefix neighbor, it is empty string if there is no prefix neighbor
 * 
 * function return: 
 * Tuple<string, string> - string and its prefix neighbor
 */
 public Tuple<string, string> AddWordToTrieByOneCharATime(
 int scanIndex, 
 char[] charArray, 
 string word, 
 string neighbour)
 {
 bool isEndOfString = scanIndex == charArray.Length;
 if (isEndOfString)
 {
 IsInDictionary = true;
 LastVisitedWord = word;
 return new Tuple<string, string>(LastVisitedWord, neighbour);
 }
 char visiting = charArray[scanIndex];
 if (!Children.ContainsKey(visiting))
 {
 Children[visiting] = new Trie();
 }
 // update neighbor string - if IsInDictionary is true, then it is 
 // to set as a prefix neighbor
 return Children[visiting].AddWordToTrieByOneCharATime(
 scanIndex + 1, 
 charArray, 
 word, 
 IsInDictionary ? LastVisitedWord : neighbour); 
 }
 }
 /*
 * study LINQ - GroupBy
 * https://msdn.microsoft.com/en-us/library/bb534304(v=vs.110).aspx
 * 
 * 1. Group string by first char, 26 variations from 'A', 'B', ..., 'Z'
 * 2. For each group, sort strings by string's length in ascending order
 * 3. For example, group of strings starting from char 'A', 
 * "A","AB","ACD"
 * 4. benefit value is to add all chars' ascii value. 
 */
 static long Process(string[] dict)
 {
 var benefitValue = 0L; 
 var groupped = dict.GroupBy(x => x[0]);
 // maximum 26 groups, 'A','B', ..., 'Z'
 foreach (var group in groupped) 
 {
 // sort by string's length in ascending order
 var sortedStrings = group.OrderBy(x => x.Length);
 var trie = new Trie(); 
 var banned = new HashSet<string>();
 var stack = new Stack<Tuple<string, string>>();
 foreach (var word in sortedStrings)
 {
 stack.Push(trie.AddWordToTrieByOneCharATime(0, word.ToCharArray(), word, ""));
 }
 // Enumerate the stack, the longest string will be iterated first. 
 // Maximum independent set is kind of greedy as well. 
 foreach (var tuple in stack)
 {
 if (!banned.Contains(tuple.Item1))
 {
 benefitValue += tuple.Item1.ToCharArray().Aggregate(0L, (val, next) => val + (long)next);
 banned.Add(tuple.Item2);
 }
 }
 }
 return benefitValue;
 }
 static void Main(String[] args)
 {
 //ProcessInput();
 RunSampleTestcase2(); 
 }
 /*
 * Trie
 * A B
 * AB
 * ABC
 * ABCD 
 * 
 * Things to learn through the debug:
 * How many tries are used in the coding? 
 * Two tries, one is for A started strings: {"A","ABCD"}
 * second one is for B started strings: {"B"}
 */
 static void RunSampleTestcase()
 {
 string[] dict = new string[] { "A", "ABCD", "B" };
 var points = Process(dict);
 Console.WriteLine(points);
 }
 /*
 * input string[] = new string[]{"A","ABC","AC"}
 * Trie
 * A 
 * AB AC
 * ABC
 * 
 * Try to debug the code and figure out how to find prefix neighbor of "ABC". 
 * How many return calls for "ABC" in the function: 
 * AddWordToTrieByOneCharATime
 */
 static void RunSampleTestcase2()
 {
 string[] dictionary = new string[] { "A", "ABC", "AC" };
 var points = Process(dictionary);
 Console.WriteLine(points);
 }
 static void ProcessInput()
 {
 var n = Convert.ToInt32(Console.ReadLine());
 var dict = Console.ReadLine().Split(' ');
 var points = Process(dict);
 Console.WriteLine(points);
 }
}

Question 2

Indentation

Consistent indentation helps readability. The indentation of this code seems to mix tabstops of 1, 2, 3, and 4 spaces.

Prefer interfaces to implementations

public Dictionary<char, Trie> Children { get; set; }

Is there any good reason why this should not be an IDictionary<char, Trie>?

Try to expose an API rather than the entire implementation

public Dictionary<char, Trie> Children { get; set; }
public bool IsInDictionary { get; set; }
public string LastVisitedWord { get; set; }

Looking past the syntactic sugar, that's six public methods (three getters and three setters). How many of them should really be public? I think that at minimum the setters should all be private, and I see no good reason to expose LastVisitedWord at all or to allow an external class to get Children and then mutate it. I would prefer to publicly expose just the equivalent (see below...) of

public bool IsInDictionary { get; }
public IEnumerable<KeyValuePair<char, Trie>> Children { get; }

public Tuple<string, string> AddWordToTrieByOneCharATime(
 int scanIndex, 
 char[] charArray, 
 string word, 
 string neighbour)

exposes far more than is necessary. If you make that private then a public method AddWord(string) can call it with the appropriate initial values.

Names

public bool IsInDictionary { get; set; }
public string LastVisitedWord { get; set; }

The first of these names makes sense in the right context, but when you're reading the code you see it in the context of Dictionary<char, Trie> Children { get; set; }. How about IsWord?

The second makes sense only in the middle of a call to AddWordToTrieByOneCharATime. I think it should be LongestPrefix or something similar.

Separation of responsibilities

If the trie is intended for general use, I think that the way finding the prefix is integrated into it violates the principle of separation of responsibilities. If it's intended to be single-purpose and separation of responsibilities is intentionally sacrificed for better performance, it doesn't go far enough: the benefit calculation could be rolled into the Trie constructor to avoid recomputing it for each common prefix.

I think that you should refactor to separate trie building from prefix graph building for one simple reason: the documentation states that the complexity is O(N*max_length_of_string), but that's not true because the merging of responsibilities in trie building means that the strings have to be sorted before insertion, giving Theta(N lg N).

KISS

var groupped = dict.GroupBy(x => x[0]);
// maximum 26 groups, 'A','B', ..., 'Z'
foreach (var group in groupped) 
{
 // sort by string's length in ascending order
 var sortedStrings = group.OrderBy(x => x.Length);
 var trie = new Trie();

Why? What's wrong with inserting them all into one trie?

Question 3

Excellent review. First, indentation, that is my mistake when I imported C# source code, I should have tried some ways to use Visual studio to adjust the indentation first to get 4 space for every line of code; I appreciated your review here.

Question 4

Prefer interfaces to implementations. I agree with you. Use IDictionary instead of Dictionary.

Question 5

I like the review advice #3, so I only need to set one API which is public, and also only takes two arguments, public Tuple<string, string> AddWordToTrie, one private API public Tuple<string, string> AddWordToTrieByOneCharATime.

Question 6

Advice #4 Names, you are correct. member variable IsInDictionary is better to call IsWord, LastVisitedWord is better to call prefix, the longest one of prefixes will be the prefix neighbor.

Question 7

Advice #5, Separation of responsibility, "the benefit calculation could be rolled into the Trie constructor to avoid recomputing it for each common prefix." Maybe, you have good insights here. I did not notice that there are some redundant work here to compute common prefix.

Peter Taylor Peter Taylor 24.5k1 gold badge49 silver badges94 bronze badges · Accepted Answer · 2017-02-14 15:32:50Z

Indentation

Consistent indentation helps readability. The indentation of this code seems to mix tabstops of 1, 2, 3, and 4 spaces.

Prefer interfaces to implementations

public Dictionary<char, Trie> Children { get; set; }

Is there any good reason why this should not be an IDictionary<char, Trie>?

Try to expose an API rather than the entire implementation

public Dictionary<char, Trie> Children { get; set; }
public bool IsInDictionary { get; set; }
public string LastVisitedWord { get; set; }

Looking past the syntactic sugar, that's six public methods (three getters and three setters). How many of them should really be public? I think that at minimum the setters should all be private, and I see no good reason to expose LastVisitedWord at all or to allow an external class to get Children and then mutate it. I would prefer to publicly expose just the equivalent (see below...) of

public bool IsInDictionary { get; }
public IEnumerable<KeyValuePair<char, Trie>> Children { get; }

public Tuple<string, string> AddWordToTrieByOneCharATime(
 int scanIndex, 
 char[] charArray, 
 string word, 
 string neighbour)

exposes far more than is necessary. If you make that private then a public method AddWord(string) can call it with the appropriate initial values.

Names

public bool IsInDictionary { get; set; }
public string LastVisitedWord { get; set; }

The first of these names makes sense in the right context, but when you're reading the code you see it in the context of Dictionary<char, Trie> Children { get; set; }. How about IsWord?

The second makes sense only in the middle of a call to AddWordToTrieByOneCharATime. I think it should be LongestPrefix or something similar.

Separation of responsibilities

If the trie is intended for general use, I think that the way finding the prefix is integrated into it violates the principle of separation of responsibilities. If it's intended to be single-purpose and separation of responsibilities is intentionally sacrificed for better performance, it doesn't go far enough: the benefit calculation could be rolled into the Trie constructor to avoid recomputing it for each common prefix.

I think that you should refactor to separate trie building from prefix graph building for one simple reason: the documentation states that the complexity is O(N*max_length_of_string), but that's not true because the merging of responsibilities in trie building means that the strings have to be sorted before insertion, giving Theta(N lg N).

KISS

var groupped = dict.GroupBy(x => x[0]);
// maximum 26 groups, 'A','B', ..., 'Z'
foreach (var group in groupped) 
{
 // sort by string's length in ascending order
 var sortedStrings = group.OrderBy(x => x.Length);
 var trie = new Trie();

Why? What's wrong with inserting them all into one trie?

Excellent review. First, indentation, that is my mistake when I imported C# source code, I should have tried some ways to use Visual studio to adjust the indentation first to get 4 space for every line of code; I appreciated your review here.
Prefer interfaces to implementations. I agree with you. Use IDictionary instead of Dictionary.
I like the review advice #3, so I only need to set one API which is public, and also only takes two arguments, public Tuple<string, string> AddWordToTrie, one private API public Tuple<string, string> AddWordToTrieByOneCharATime.
Advice #4 Names, you are correct. member variable IsInDictionary is better to call IsWord, LastVisitedWord is better to call prefix, the longest one of prefixes will be the prefix neighbor.
Advice #5, Separation of responsibility, "the benefit calculation could be rolled into the Trie constructor to avoid recomputing it for each common prefix." Maybe, you have good insights here. I did not notice that there are some redundant work here to compute common prefix.

Stack Exchange Network

Hackerrank: Prefix neighbors

1 Answer 1

Indentation

Prefer interfaces to implementations

Try to expose an API rather than the entire implementation

Names

Separation of responsibilities

KISS

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Hackerrank: Prefix neighbors

1 Answer 1

Indentation

Prefer interfaces to implementations

Try to expose an API rather than the entire implementation

Names

Separation of responsibilities

KISS

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions