Revision 4f75ce1b-7ca5-4323-ab26-d75ae73dfa84 - Code Golf Stack Exchange

## Shell - Bytes: 2572, False +ves: 1, False -ves: 0 = Score: 2582

Code: 54 bytes, Dictionary: 2518 bytes

<!-- language-all: lang-bash -->

 [ "$1" = "${1#?*[A-Z]}" ]&&unlzma -c<z|grep -qix "$1"

If I were guaranteed to only get tests consisting of letters, I could strip out all quotes and reduce the code size to 48 (final score: 2576).

&nbsp;
### Examples:

In shell, an exit code of `0` is true (anything else is false). `$?` is the previous command's exit code.

 $ sh in997 colour; echo $?
 0
 $ sh in997 Colour; echo $?
 1
 $ sh in997 coLoUr; echo $?
 1
 $ sh in997 Color; echo $?
 1
 $ sh in997 I; echo $?
 0
 $ sh in997 i; echo $?
 0 # false positive

&nbsp;
### Ungolfed and explained:

 if [ "$1" = "${1#?*[A-Z]}" ]; then
     cat z | unlzma -c | grep -q -i -x "$1"
 else
     exit 1
 fi

The conditional compares the argument with itself after a parameter expansion that removes the shortest prefix consisting of any character, followed by any number of characters, followed by an uppercase letter (equivalent to `s/^..*?[A-Z]//`; ignore the `?` if you don't understand it). This compares `coLoUr` to `oUr` and `Colour` to `Colour`, so any word with an uppercase letter after its first character fails the test. This controls for mixed case, though because it skips the first letter, it doesn't reject `i`.
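As a minimal sketch of that test in isolation, the expansion can be tried on a few words. The helper names `strip_prefix` and `case_ok` are hypothetical and not part of the golfed script, and `LC_ALL=C` is set on the assumption that `[A-Z]` should only mean ASCII uppercase:

```shell
#!/bin/sh
# Assumption: force the C locale so that [A-Z] covers ASCII uppercase only.
LC_ALL=C; export LC_ALL

# Hypothetical helper: prints the argument after removing the shortest
# prefix that matches the glob ?*[A-Z].
strip_prefix() {
    printf '%s\n' "${1#?*[A-Z]}"
}

# Hypothetical helper: succeeds when the expansion leaves the word unchanged,
# i.e. when no uppercase letter occurs after the first character.
case_ok() {
    [ "$1" = "${1#?*[A-Z]}" ]
}

for word in colour Colour coLoUr i I; do
    if case_ok "$word"; then verdict=pass; else verdict=reject; fi
    printf '%-7s -> %-7s %s\n' "$word" "$(strip_prefix "$word")" "$verdict"
done
```

Only `coLoUr` should be rejected here; `i` passes, which is where the false positive comes from.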

If the conditional is false, evaluation is [short-circuited](https://en.wikipedia.org/wiki/Short-circuit_evaluation) and the program immediately exits FALSE. Otherwise, `unlzma` is run on the contents of the file named `z`, which puts the uncompressed dictionary on standard output. `grep` then queries for the argument quietly (`-q`), without regard to case (`-i`, since we've already controlled for case), and on a whole line (`-x`).

The exit code of `grep` is TRUE when there was a match and FALSE otherwise. This being the last command in the script, it is also the exit code of the script.
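To see the `grep` flags and exit codes without the compressed file, here is a small sketch. The `lookup` function and its three-word list are hypothetical stand-ins for the real decompressed dictionary:

```shell
#!/bin/sh
# Hypothetical stand-in for `unlzma -c<z`: three words, one per line,
# with the first letter capitalized as in the real dictionary.
lookup() {
    printf '%s\n' Colour Taxi I | grep -qix "$1"
}

lookup colour; echo "colour: $?"   # whole-line, case-insensitive match
lookup colo;   echo "colo: $?"     # only part of a line, so -x rejects it
```

Because `grep` is the last command in the pipeline, its status becomes the function's status, just as it becomes the script's status in the golfed version.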

There is one false positive: `i`. To resolve that, I had originally been clever and called the `grep` query with `"${1%i}"`, which strips the trailing `i` if present, since I noticed that only `taxi` ends in a lowercase `i`. Unfortunately, `tax` isn't in this dictionary, so I had to abandon this. I found that the penalty for the false positive is less than the shortest solution for removing it.
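For reference, this is roughly what the abandoned `"${1%i}"` trick does to its input. It's only a sketch, and the `strip_i` helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: prints the argument with one trailing "i" removed.
strip_i() {
    printf '%s\n' "${1%i}"
}

strip_i taxi     # prints "tax"
strip_i i        # prints an empty line
strip_i colour   # no trailing "i", so the word is printed unchanged
```

The `taxi` case is what breaks the idea: the query becomes `tax`, which isn't in the dictionary.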

Here is a version without errors:

 [ "$1" != i -a "$1" = "${1#?*[A-Z]}" ]&&unlzma -c<z|grep -qix "$1"

It is 66 bytes. 66 + 2518 = 2584, two bytes greater than 54 + 2518 + 10 = 2582.
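The comparison can be written out as shell arithmetic. This sketch assumes each false result costs 10 points, which is what the totals above imply; the `score` function is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper. Usage: score CODE_BYTES DICT_BYTES FALSE_RESULTS
# Assumption: every false positive or false negative adds 10 points.
score() {
    echo $(( $1 + $2 + 10 * $3 ))
}

score 54 2518 1   # golfed version with one false positive: 2582
score 66 2518 0   # version without errors: 2584
```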

&nbsp;
### Dictionary creation:

Wow. There are some awesome dictionary compression schemes in other answers here. I didn't do anything that fancy. I just took the stock dictionary, capitalized just the first letter, and compressed the output. I chose [LZMA](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm) because it compressed the best.

Here's my code:

 wget -qqO- 'http://pastebin.com/raw.php?i=wh6yxrqp' \
   | perl -ne '
       s/\r//;                   # strip DOS carriage returns
       ($a, $b) = /^(.)(.*)/;    # split off the first letter
       $a =~ y/a-z/A-Z/;         # capitalize it
       print "$a$b\n"
     ' | lzma -c > z

The `wget` flags `-qqO-` suppress all status output and send the downloaded content to standard output. That's parsed by `perl`, whose first task is to remove the DOS line breaks (`\r`). Then I separate the first letter from the rest of the line, capitalize the first letter, and print the reconstructed word. That's grabbed by `lzma`, which writes the compressed dictionary to standard output, redirected into the file `z`.
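The capitalization step can be tried on its own without downloading anything. The sketch below swaps `perl` for `awk` purely for illustration (it is not what I actually ran), and the `capitalize_first` name and sample words are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads words on stdin, strips a trailing carriage
# return, and uppercases the first character of each line.
capitalize_first() {
    awk '{ sub(/\r$/, ""); print toupper(substr($0, 1, 1)) substr($0, 2) }'
}

# Sample input with DOS line endings, like the pastebin download.
printf 'colour\r\ntaxi\r\ni\r\n' | capitalize_first
```

Piping the result through `lzma -c > z` would then produce a dictionary file in the same way as the pipeline above.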

Note that `unlzma` will refuse to operate on a file lacking the `.lzma` extension, but it's fine with a pipeline. That means `unlzma z` fails and `unlzma<z` succeeds.