Problem: Write a function that takes a
String document
and aString[] keywords
and returns the smallest substring ofdocument
that contains all of the strings inkeywords
.Notes:
document
will contain at least 1 worddocument
will separate words by a single space- A word can only contain
[a-z]
and can appear multiple times in the documentkeywords
will be a distinct list of words- Matches must be perfect (ie
bat
andbatman
do not match)- If there are multiple shortest substrings, pick the first one
- keywords can appear in the substring in any order
- The substring length is counted in words, not characters
- Must be written in Java 7 and provide a class called
Solution
with a methodsolution
Examples
Input:
document: "a b c d a" keywords: ["a", "c", "d"]
Output:
"c d a"
Input:
document: "world there hello hello where world" keywords: ["hello", "world"]
Output:
"world there hello"
My attempt:
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
//Time Complexity: O(w + ws)
//Space Complexity: O(s)
//Where w is the number of words in the document, and s is the number of search terms
public class Solution {
private Map<String, Integer> snippetDataPoints = new HashMap<String, Integer>();
private String[] words, searchTerms;
private int shortestSnippetStart = 0, shortestSnippetEnd, currentSnippetStart = 0;
public static String solution(String document, String[] searchTerms) {
Solution solution = new Solution (document.split(" "), searchTerms);//document.split isnt the most efficient, but we are already over O(n), and this keeps it simple
return solution.solve();
}
private Solution(String[] words, String[] searchTerms){
this.words = words;
this.searchTerms = searchTerms;
shortestSnippetEnd=words.length;
}
private String solve(){
for(int i=0;i<words.length;i++){
if(searchTermsContains(words[i])){
addToSnippet(words[i], i);
}
}
StringBuilder snippet = new StringBuilder();
for(int i = shortestSnippetStart; i<=shortestSnippetEnd; i++){
snippet.append(words[i] + " ");
}
snippet.deleteCharAt(snippet.length()-1);
return snippet.toString();
}
private void addToSnippet(String word, int position) {
Integer previousPosition = snippetDataPoints.put(word, position);
if(previousPosition == null || previousPosition <= currentSnippetStart){
currentSnippetStart = Collections.min(snippetDataPoints.values());
}
if(snippetDataPoints.size() == searchTerms.length){
determineShortestSnippet(position);
}
}
private void determineShortestSnippet(int currentPositionEnd) {
if(shortestSnippetEnd - shortestSnippetStart > currentPositionEnd - currentSnippetStart ){
shortestSnippetStart = currentSnippetStart;
shortestSnippetEnd = currentPositionEnd;
}
}
private boolean searchTermsContains(String word) {
for(String searchTerm : searchTerms){
if(searchTerm.equals(word)) return true;
}
return false;
}
}
My Analysis:
This was written with a time constraint, so it manages to keep simplicity without losing too much efficiency. The document.split()
is unnecessary - I could have done the parsing on the fly, but it would have been more complicated. Similarly, I could have created a better implementation of searchTermsContains
, perhaps using a tree of character nodes to minimize search time. However, it would be really great if I could significantly reduce the usage of searchTermsContains
altogether since its adding a ws
to my time complexity, but I cant think of anyway.
I also could have used a more efficient data structure for the snippetDataPoints
. I think a parallel array would have worked just as well and used significantly less memory (though still technically O(s)
?). And finally, a lot of my code runs under the assumption that all input is perfectly valid - something Im willing to accept given time constraints and the nature of the problem.
-
\$\begingroup\$ Was this, by any chance, an interview-question, like this one? \$\endgroup\$200_success– 200_success2015年01月14日 06:32:56 +00:00Commented Jan 14, 2015 at 6:32
-
\$\begingroup\$ @200_success not an interview question, but basically just a java variant of the problem \$\endgroup\$David says Reinstate Monica– David says Reinstate Monica2015年01月14日 13:12:24 +00:00Commented Jan 14, 2015 at 13:12
1 Answer 1
Problems / Bugs
The following calls will all throw an
ArrayIndexOutOfBoundsException
String document = "a b c d"; System.out.println(Solution.solution(document, new String[]{"g"})); System.out.println(Solution.solution(document, new String[]{"g", "a", "s", "s"})); System.out.println(Solution.solution(document, new String[]{"a", "b" ,"c", "d"}));
because of
for (int i = shortestSnippetStart; i <= shortestSnippetEnd; i++) {
Assuming you have fixed the above by changing to
for (int i = shortestSnippetStart; i < shortestSnippetEnd; i++) {
The above calls will return
a b c d
a b c d
a b cThe following will throw an exception if
snippet.length==0
snippet.deleteCharAt(snippet.length()-1);
General
you shouldn't do string concatenation inside
StringBuilder.append()
method. Theappend()
method returns aStringBuilder
so you should just stack the calls likesnippet.append(words[i]) .append(" ");
you shouldn't declare multiple variables in a single line, because this removes readability of the code.
give your conditions/variables the possibility to breathe
for(int i = shortestSnippetStart; i<=shortestSnippetEnd; i++){
should be
for(int i = shortestSnippetStart; i <= shortestSnippetEnd; i++) {
-
\$\begingroup\$ Sorry, I should have been more clear by adding it in the notes, but my last line about perfect inputs implies that all keywords will be in the doc, and keywords and doc have at least size 1 \$\endgroup\$David says Reinstate Monica– David says Reinstate Monica2015年01月14日 17:21:55 +00:00Commented Jan 14, 2015 at 17:21