Spell check interview problem in Bash

Question 1

Thanks for taking the time to read this. I submitted a code screening for a company I really wanted to work for and spent a decent amount of time on their programming assignment. Pretty bummed when they just said thanks but no thanks.

Could you guys take a look and see why that might be? I was hoping to at least talk to an engineer. Thank you again.

DICTIONARY_INPUT="/usr/share/dict/words"
declare -A DICTIONARY
while read word; do
 word=${word,,}
 DICTIONARY[$word]=1
done < $DICTIONARY_INPUT
#for j in "${!DICTIONARY[@]}"; do
# echo "$j"
#done
while read -p"> " input_word; do
 #echo "You entered $input_word"
 input_word=${input_word,,}
 #echo "Lowercased: $input_word"
 found=false
 # If the input is a word in the dictionary, you're done!
 if [[ ${DICTIONARY[$input_word]} ]]; then
 echo "$input_word";
 found=true
 else
 # Else, use DUPLICATE as a queue and enqueue permutations of reduced
 #+words. Use ALL_DUPLICATE as a set which removes duplicate words.
 declare -a DUPLICATE=($input_word)
 declare -A ALL_DUPLICATE
 ALL_DUPLICATE=()
 ALL_DUPLICATE[$input_word]=$input_word
 while [ ${#DUPLICATE[@]} -gt 0 ]; do
 for (( i=0; i<${#DUPLICATE}-1; i++ )); do
 if [[ ${DUPLICATE:$i:1} == ${DUPLICATE:$i+1:1} ]]; then
 twochars=${DUPLICATE:$i:2}
 #echo $twochars
 removed_dup=${DUPLICATE:0:$i}${DUPLICATE:$((i+1))}
 #echo $removed_dup
 ALL_DUPLICATE[$removed_dup]=$removed_dup
 if [[ ${DICTIONARY[$removed_dup]} ]]; then
 echo "$removed_dup"
 found=true
 break 2
 else
 DUPLICATE=( "${DUPLICATE[@]}" $removed_dup)
 #echo "New size is ${#DUPLICATE[@]}"
 fi
 fi
 done
 DUPLICATE=(${DUPLICATE[@]:1})
 #echo "Removed size is ${#DUPLICATE[@]}"
 done
 #for j in "${!ALL_DUPLICATE[@]}"; do
 # echo "$j"
 #done
 # If we didn't find the word yet, go through the set of duplicates
 #+and stack words with changed vowels in VOWELS. 
 # Each stack in VOWELS belongs to a particular word in ALL_DUPLICATES.
 # Iterate through the length of the word since we don't need to check 
 #+earlier letters of newly created words. 
 if ! $found; then
 #echo "Found: $found"
 declare -a VOWELS
 for j in "${!ALL_DUPLICATE[@]}"; do
 VOWELS=( $j )
 num_letters=${#VOWELS}
 for (( i=0; i<$num_letters; i++ )); do
 for k in "${VOWELS[@]}"; do
 if [[ ${k:$i:1} == [aeiou] ]]; then
 change_a=${k:0:$i}a${k:$((i+1))}
 change_e=${k:0:$i}e${k:$((i+1))} 
 change_i=${k:0:$i}i${k:$((i+1))}
 change_o=${k:0:$i}o${k:$((i+1))}
 change_u=${k:0:$i}u${k:$((i+1))}
 #echo $change_a
 #echo $change_e
 #echo $change_i
 #echo $change_o
 #echo $change_u
 VOWELS=( "${VOWELS[@]}" $change_a )
 VOWELS=( "${VOWELS[@]}" $change_e )
 VOWELS=( "${VOWELS[@]}" $change_i )
 VOWELS=( "${VOWELS[@]}" $change_o )
 VOWELS=( "${VOWELS[@]}" $change_u )
 if [[ ${DICTIONARY[$change_a]} ]]; then
 echo "$change_a"
 found=true
 break 3
 elif [[ ${DICTIONARY[$change_e]} ]]; then
 echo "$change_e"
 found=true
 break 3
 elif [[ ${DICTIONARY[$change_i]} ]]; then
 echo "$change_i"
 found=true
 break 3
 elif [[ ${DICTIONARY[$change_o]} ]]; then
 echo "$change_o"
 found=true
 break 3
 elif [[ ${DICTIONARY[$change_u]} ]]; then
 echo "$change_u"
 found=true
 break 3
 fi
 fi
 done
 done
 VOWELS=($VOWELS[@]:1})
 done 
 fi 
 fi
 if ! $found; then
 echo "NO SUGGESTION"
 fi
done

https://github.com/mrlamroger/spellcheck

Question 2

What was the specification of the task?

Question 3

why bash? Did they ask for it in bash?

Question 4

@GarethRees The task was this "Write a program that reads a large list of English words (e.g. from /usr/share/dict/words on a unix system) into memory, and then reads words from stdin, and prints either the best spelling suggestion, or "NO SUGGESTION" if no suggestion can be found. The program should print ">" as a prompt before reading each word, and should loop until killed." More info can be found in the link about.

Question 5

@WinstonEwert I've been playing around with bash recently and wanted to learn more about it. In hindsight, Python would have been cleaner.

Question 6

For the purpose of screening job candidates, the quality of the code was not the deciding factor. Picking the wrong tool made your solution impractical, leading to immediate disqualification. Had you proposed a Bash solution that involved piping the words to aspell -a, then claimed that you had achieved a superior (but incompatible) product with less code, an enlightened interviewer might like your ingenuity.

Question 7

First, I'd start with

#!/usr/bin/env bash

to indicate the shell.

Next, I don't know how much bash you must use, but I think you can at least replace a serious portion of the code by using some other tools (i.e. more UNIX style).

E.g. check if a word is in the dictionary:

if grep "^${input_word}$" $DICTIONARY_INPUT; then
 found=true
fi

^${input_word}$ will match exactly that word (use the -i flag for case insensitivity). grep returns 1 if the regular expression is not found in the input file, else 0, which is why this if-expression works.

Then, a bit further you seem to try to replace all vowels with all other vowels; a oneliner for this could be

eval echo $(echo "$input_word" | sed 's/[aeiou]/{a,e,i,o,u}/g')

This makes use of a very useful feature in bash: bracket expansion. E.g. "a{b,c}d" expands to "ab cd". So here, we use a regular expression that replaces every vowel in a word with {a,e,i,o,u}. E.g. hello gets sed'd to h{a,e,i,o,u}ll{a,e,i,o,u}. Applying bracket expansion results in a lot of words, all the combinations of vowels.

The eval applies this bracket expansion; the echo is needed to pass the string to eval (else it would interpret it as a command).

So, in short, this can be replaced by

for word in $(eval echo $(echo $input_word | sed 's/[aeiou]/{a,e,i,o,u}/g')); do
 if grep "^${word}$" $DICTIONARY_INPUT; then
 found=true
 break
 fi
done

You also have another section of code that checks for duplicate letters (I first wrote previous section, so that's why the order differs with your code). It can also be replaced by something like

eval echo $(echo "$input_word" | sed 's/\([a-z]\)1円/{1,円1円1円}/g')

It works similarly to previous code, but now we use backreferences in sed to check for duplicate letters.

These results can be combined with the for loop above to get all your combinations. Or, at least I think so :)

I think this code is better because

it uses some specialized tools like sed and grep which is more UNIXy
it has less number of lines, and I think more clear (after some study) to understand and modify
I think performance should be better, as grep is quite fast at reading input and filtering (just a guess, not really tested it)

Question 8

Thanks @jerous! I'll implement changes tonight and compare performance.

jerous jerous 563 bronze badges · Accepted Answer · 2013-12-10 15:04:52Z

First, I'd start with

#!/usr/bin/env bash

to indicate the shell.

Next, I don't know how much bash you must use, but I think you can at least replace a serious portion of the code by using some other tools (i.e. more UNIX style).

E.g. check if a word is in the dictionary:

if grep "^${input_word}$" $DICTIONARY_INPUT; then
 found=true
fi

^${input_word}$ will match exactly that word (use the -i flag for case insensitivity). grep returns 1 if the regular expression is not found in the input file, else 0, which is why this if-expression works.

Then, a bit further you seem to try to replace all vowels with all other vowels; a oneliner for this could be

eval echo $(echo "$input_word" | sed 's/[aeiou]/{a,e,i,o,u}/g')

This makes use of a very useful feature in bash: bracket expansion. E.g. "a{b,c}d" expands to "ab cd". So here, we use a regular expression that replaces every vowel in a word with {a,e,i,o,u}. E.g. hello gets sed'd to h{a,e,i,o,u}ll{a,e,i,o,u}. Applying bracket expansion results in a lot of words, all the combinations of vowels.

The eval applies this bracket expansion; the echo is needed to pass the string to eval (else it would interpret it as a command).

So, in short, this can be replaced by

for word in $(eval echo $(echo $input_word | sed 's/[aeiou]/{a,e,i,o,u}/g')); do
 if grep "^${word}$" $DICTIONARY_INPUT; then
 found=true
 break
 fi
done

You also have another section of code that checks for duplicate letters (I first wrote previous section, so that's why the order differs with your code). It can also be replaced by something like

eval echo $(echo "$input_word" | sed 's/\([a-z]\)1円/{1,円1円1円}/g')

It works similarly to previous code, but now we use backreferences in sed to check for duplicate letters.

These results can be combined with the for loop above to get all your combinations. Or, at least I think so :)

I think this code is better because

it uses some specialized tools like sed and grep which is more UNIXy
it has less number of lines, and I think more clear (after some study) to understand and modify
I think performance should be better, as grep is quite fast at reading input and filtering (just a guess, not really tested it)

Thanks @jerous! I'll implement changes tonight and compare performance.

Stack Exchange Network

Spell check interview problem in Bash

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Spell check interview problem in Bash

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions