Here is a list of 1000 basic English words. Yes, I'm aware that there are only 997 words in the list! It was taken from Wiktionary so blame them!
Your task is to write a program that accepts a word and returns TRUE or FALSE (or 1 or 0) depending on whether the word is in the list or not. You must use the spelling as it appears in the list, so it is colour, not color (sorry Americans!). As far as I am aware, there is only one word in the list that uses a capital letter: I. This means that i (Roman numeral for 1) should return FALSE. All other words may be spelt with an initial capital letter, e.g. Colour should return TRUE.
Rules for letter cases: Your program should return TRUE for any word that has just the first letter capitalised. If the entire word is capitalised, e.g. COLOUR it's your choice whether it is correct or not, as long as it is found in the word list in its lowercase form. If you accept COLOUR you must accept all other words entirely in capitals. If you don't accept COLOUR you must not accept any other word entirely in capitals. Any other mixture of lower and upper case e.g. coLoUR should return FALSE. As stated above only I is correct, not i. Briefly, both lowercase colour and titlecase Colour must be TRUE, uppercase COLOUR may be either TRUE or FALSE and any other mixed case must be FALSE.
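As a rough illustration of the letter-case rules above, here is a minimal Python sketch. The function name `is_listed`, the `accept_all_caps` switch and the three-word sample set are made up for this example; a real entry would check against the full list.

```python
def is_listed(word, words, accept_all_caps=False):
    """Apply the letter-case rules to `word` against the set `words`."""
    # Exact spelling as it appears in the list (this covers "I").
    if word in words:
        return True
    # Title case: only the first letter is capitalised.
    if word[:1].isupper() and word[1:] == word[1:].lower():
        if word[0].lower() + word[1:] in words:
            return True
    # All capitals: optional, but the choice must be the same for every word.
    if accept_all_caps and word.isupper() and word.lower() in words:
        return True
    # Any other mixture of cases is rejected.
    return False


sample = {"colour", "a", "I"}  # hypothetical stand-in for the real list
for w in ["colour", "Colour", "COLOUR", "coLoUR", "color", "I", "i"]:
    print(w, is_listed(w, sample))
```

Passing `accept_all_caps=True` flips the answer for COLOUR only; every other case in the loop stays the same.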
Standard rules apply, you may not use external libraries or spelling libraries built into your language, etc. If you use any external files, such as reading the word list via the internet, you must include that number of bytes in your score. Reading the uncompressed word list will add 5942 to your score.
Your code must be at least 95% accurate, including both false positives and false negatives. So if you have 30 false positives and 20 false negatives there are 50 errors in total which is acceptable (even though there are only 997 words, 94.98% accurate if you're being pedantic). One more false positive or false negative invalidates your code. You must post all the errors for verification. Don't make people guess where the errors are!
Shortest code in bytes wins. Add 10 points for each error. Please post a non-golfed version of your code as well so that other people can try to break it / verify that it works. If you can provide a link to a working online version that would be great! The list is 5942 bytes long so ideally you want your code to be shorter than that.
Start your post with the following:
<Language> - Bytes: <num>, False +ves: <num>, False -ves: <num> = Score: <num>
Score = bytes + 10 × (false_pos + false_neg)
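The scoring formula and the error limit are simple enough to check mechanically. The sketch below is only an illustration; the function names `score` and `is_valid` are invented here, and the limit of 50 errors is taken from the accuracy rule stated earlier.

```python
WORD_COUNT = 997  # number of words actually in the list


def score(code_bytes, false_pos, false_neg):
    """Score = bytes + 10 x (false positives + false negatives)."""
    return code_bytes + 10 * (false_pos + false_neg)


def is_valid(false_pos, false_neg, max_errors=50):
    """An entry is valid with at most 50 errors in total."""
    return false_pos + false_neg <= max_errors


print(score(3000, 30, 20))   # 3000 bytes of code plus 50 errors
print(is_valid(30, 20))      # exactly at the limit
print(is_valid(31, 20))      # one error too many
```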
I might provide a list of false positives to check against as people post answers.
7 Answers
Python - Bytes: 4411, False +ves: 0, False -ves: 0 = Score: 4411
import base64,zlib,sys;T=eval(zlib.decompress(base64.b64decode("eNqdXEF24zYMPY43XHTdG/QYskjbGlGUSlFONH25e8cehySAD8rtLPIyji2JIPDx8QH6n9Nfpz//OZ3+/OPLnLrXr+bUP36Ljx/z48eafzze+OufOaX83uHx2/3x4/nb8w97ea85ueqDv/57zpfd8vuffzW/r+LIfezjf7Z6z/Xxe5dvF+o7xfx5Bx/A/L7v60qXvMhyOSsX6fLb8usDfcmcpmyNsrBAV/f8oH/89yPfcpd2LUvo8vPs4CLVcgJZ8MtC+YIhP5plb4lip+b8yDe48l9vWfMKgF32fK+ULTqJ7bjRzbvSG4Al/f7D8vjvQs0T6+t66T1rfqCx2q2Yn+v5p53/qXg0vZrLH+mqjyS2HWXzLV1AcfctP0CQe5XqNSVq0ed9n8E50mszn8pX7PPrI13Py1y+eoF5CjDnUpa68svF/GnmV7XdyjUBpFyyabyEAGbs18pucn9itiEECWrmEv2+tvqGwoNFfq/YSOLfjT5MjRvCiiFf2nHMcxKCerqx1Jl87YI1sAS2phJOC9+0Vd54zgvpRWT3Gdp9vmYSofx9kRUEefkcdxBTwclZoDtFA1Otsc4qUa6EBdNr7y0AB7RdqXbC2mZ8rR1PdU9nv6B89b3SwPPLlK1Lkt65rLpKAgu1ihPureW24tIJ7M8lPwOLszMMhy/gigQnFQDvQe6cyU4ZCV2WcpABmauEqEIhyiub3JdI/VvNTVEmocShMuqLraEYXK8YbhVbSzKIARimh0OkoXerL7uzvQPovdUAUrO84oQW7QNx4w6G5JknFAvIjpbQRg63I3VwyVJZWKPUXyHNRh0OJ7EvBkmOJwKG03P+MdHNu2KvZWFJvbbkn7vwlwGSFW7L165/CCwsnNnpJUL+HUKOqdgkTE+D9B8BvBbSH0oYREKteRAkVoZjgKke0wriLPz2g9nPyvSDgoNaznLnH7ibz/y+AXKz4lhkHy3elKCGw+v/nt9EuOvLAPWjVjjyyZ8PeWhNiG4yRdpcPxCnvua3J7GkK6YOY3aTC4VBloSgCzrA8yWGHhVXL3MFiaplx60CyiutxoNOs79AvRMU1xMcvgObUTP/xP/kufMGjsY7M6gWAVu+dXeY4bMPXHXChRjGcLhD0LO++dmC5I9vI6wQSZZ8vU6TDHparBTmXZb7MuuPbA4pv0ziZUjHrjLQWJ2ECpQvkNIeL/2dr9vJqm6lG7qAS0qmUksMhWFYaXNVRQggUq6o4q5ipd62TxoHiZP0ctU3/PRFHL2mdjh4n5D9CQsVNKmhArinTNnJ5OSoQ4MSu1xjRQ+QKBkpKpCXWbr4Y8cgIcLqjEWhlKneKKQjVSFXASBMg3tdoAesR6qPnWoWR8psQzjgQimQleEyMrCMKOJp5Pj8lkXlVczNDgqnXeTUoKClYwmqohBFKbirW1gQu4BeYdZJgx+HkbTAwjP6P3ji2/Qq1lTUGhDIAo4fnKZhOjNiDjLx3Li2SrBF8NQeljsiq+4yozEWCsSkCSofUmVPTIJxqsomeeREkwzZOGzNi+B4qn5kAGEv9VzCxeVMtaBShyLVygoswM+BfcJShaclwn/VJLX5RuZgCTNxJ8rJgosL208rCjfigrIo7SQOU3YnypmJUt4dP9j6BsXyrVgolGLiKOAbMrzaGEFNCVP5tFf6CRYrWlXteG91oZRaxxNAAWLZgIuUapO4yKJJRsKyDjdH7pBKiSI8oUaWhD/h0Bs3/QcubGdFCP6USdlIAuGy7a7C31exC1eVpBmgQFXZfWotzHNSMlPR70x9zqsVrBeINWAKYhBrq4tTqaStvBQsCtpPblSrqV0z8+IgCSlrqkVFtG1WdYGuA1/o66BnFgANGyAJtZQnEristn/RpHaF2wTQvT9go3dQB5BdBAZfVQYsWSmr3GEXiERP8bVO66JG3AUp7r++2XubaF0S
m/rnBebrmrgvrHQBJErV+ShmKiqthcXMDUwhIEddcfZVdG0jxzKahIGDSgk3FBGaiuIZ4yi1i1Ala5BeQQOM6nDnY48STesEdzwq9fIVd4xfPn7OPwibmSBhKZsN+vkoP5PSsd7uhYZ3h7oSSh9pg4Qq6r7rVWI40qfoKRIHwKS9GmwG6LfcMRfhMAOV/sinhYKtq/jPD/3IkKb0wFcoCy3ixh6KIXV/PfAWGPEbpsUOmuAlLoUHQRxMajv3MSDLICojR4tSq+e864i0tlJ7AlwGNNpEdTdQASug5FQGDTTG0BJLai18BpgzMMpJq8S6RT1qIuaG87bli9WGeTzqp+vaz0xzkGgceIxsgEHM/KOBrnPXQxz2ZnFp55hDqANnG01Y9TO1xohkB4i6yB1jquierEDVZ63xHRZKloGfVR5k4pM9gr9gA17xAq7C/sQHDcK0n2JbhxxdkdbNO6iqK+SXle6Fo9cIYT1gJFtp6QRIWwEuEENKWdfBCLnDQkDyyyt7OR2lwyS7NgRPjIxoMVmIJI0NN8Vxmmx0oaWOc23NwFWo/qHOwcBh4Zmm5E6u+pwbZdrojYSgC7BUwG9QOIEm3vhWVR9kcWCAKRdaK65qSceqOpKqS74NKPQb6LKxcRhWh+FC8RA8P/TYAYm4hBuccVDl6wGXkkr9r5XzrWkD1x5UhGApIssBKQbMC77ZuGbj95FLHDD6CpIsoGMrrnHH0tunMNJGRzoRCM1U1/K4OFYm4VwLZ9S5BSn4YvKgxDsvgP4GJcPY4qiKGLMek05UUpUdu3ILDMrUrAGuXI2Vyp2F9LU284jUA0Vi39UeEmLXx7MqTiSvRUnnTEIbZMuqOegZaR6YRcNkxYDTnkphmpoHmFSQexbj2LxnrKKbwwa7IvcYhcZWNOg5u1qkWHFTBoN5078/6DUaHgm93ETYbCKusVLWapX6Dsk5vTpGSuBIbnTAfRqcV9JRG5p2dRbaHu+0szvDMYqU92lT3aArkWhXZpZ4qEUMXv5dP0GgBtLPd45TUDxdD4CJVAA75p9Bm1Y/g/H4XZm2nyTd2pSnn/TRGxWhU4vz3ttUjY1GgJjDsh3sMX4/4iBUQ9EySuKvHSt/LjCjiW6yULtSqxwWddoOB4BWGugRTFq65sTcjU56+2Y/uVcpCj14E5RZLtHpPlRaWbEDOzgR9zbv7dEPQ6P2k++fYEi49RGUsRI2QNBJ2FlaklhjyWCgtVmY9FCboEK2fm5RU0sTGKIW2Ed6X8uhRhJZ68Wp1jeovhrQWFvSR9KJV/KZuMS99CIIHH+a1NCpExzvsK0TNgnL/3UvQ1xabQ1zBJnU9nwNX67RMCCmGBUnHeCBPtkhkScC2GzUwnFm1CewGN7ic8CrPJ3y308DYxG52nbZgtiQn930TJBfkiZik81RG+BVipCorhnPneKzCjQdASB0PLMdYD6lmEmXUBU1u6buJTtqJ6KyHLeC8QaYbX406yTbml+84iPjw3uHGA+GBfQhECHPRfDgBXtjY6aaNbeTOi4t2veiAdBg/8glwIQqKD0EAfO47K6edAcDTzT1TNqpE5xT2DEFIGNu1Hu0UZ8aLZNYBxCCNtpvt6Ijtx/VGTnXDHwWuPUoUZtn6DRXwu1d0hmdlYLWceC5vUMY/t+BLNEns7Ap7Q+VVNf6RgdsQ9KZYac6LD5a05Yv74rUJGgLmlRSRngRGcUQY6omemu0taMR0Zo1Y8ETQN8oaMP/37Cv9dZtcz7BoNlNfExjRPMTBn8/Bzy+yAnDis8zoenImt8qCVDMHdbP7nFfnx2GQpnT4s0awP0TOOroBN113Izy2XzjaP5nvrv0ZvRFHqYWuKbDLnV8Q6vylPneaOZDh84G2krDZ7cac6ADEAMGVrhYONngW+e3Ex5ywUVzhG1+Vcllo1VsfBTPdxM3P/5yk0gDuXFonddPjROKpGDh5u1w59m9ccRiok+HO4v07l6tLUiw4/FHWrluDc2nw/N2ATMh
/F0ifCTfS+BrzvU7VSOrN+sd+TiIhWhNtR2eE62/+0hgKZ51G+GcT+21NySDEdtOPHB7OPAetTkeNuY2giOB5uCruByUyWyznj6ev+3pMD5QFpqny2pIsvC8llOHHjRIBUOvSLdIre9fUahqPPhGH/Q9dSoqEyi+tSYsU2NERITZoahEv9aOqdEJy4+x8Z0q8jgbPJpX14rIP2PjdMGsOShKv5L0eq2nzQ+D7vDw7ztfW9T6pit6MryDiQ+NVmlfU8KCpzgdJbwGWuwnONNMzFyXc/8CxiN8aA==")));t=T;f=1
for c in sys.argv[1]:
 C=c.lower()
 if c in t:t=t[c]
 elif f and C in t:f=0;t=t[C]
 else:t={}
print""in t
This is a Python program that expects one command-line argument, the word. From the list of words I constructed a trie, compressed it with zlib and finally encoded it with base64. If the word is in the trie, the program prints True, otherwise it prints False. I wanted to create a spell checker that has 100% accuracy.
I could substantially decrease the number of characters by not base64-encoding the zlib'ed string (the raw zlib'ed string is 1113 characters shorter than its base64 encoding), but I wanted code that you can copy and paste directly.
Examples
$python check.py colour
True
$python check.py Colour
True
$python check.py coLoUr
False
$python check.py color
False
$python check.py Color
False
$python check.py I
True
$python check.py i
False
The trie is traversed with characters of the word, each time getting a new trie from the key of current character. If the word is in the trie, then a "" must be present in the final trie.
Ungolfed version
import base64, zlib, sys
TRIE = eval(zlib.decompress(base64.b64decode("eNqdXEF24zYMPY43XHTdG/QYskjbGlGUSlFONH25e8cehySAD8rtLPIyji2JIPDx8QH6n9Nfpz//OZ3+/OPLnLrXr+bUP36Ljx/z48eafzze+OufOaX83uHx2/3x4/nb8w97ea85ueqDv/57zpfd8vuffzW/r+LIfezjf7Z6z/Xxe5dvF+o7xfx5Bx/A/L7v60qXvMhyOSsX6fLb8usDfcmcpmyNsrBAV/f8oH/89yPfcpd2LUvo8vPs4CLVcgJZ8MtC+YIhP5plb4lip+b8yDe48l9vWfMKgF32fK+ULTqJ7bjRzbvSG4Al/f7D8vjvQs0T6+t66T1rfqCx2q2Yn+v5p53/qXg0vZrLH+mqjyS2HWXzLV1AcfctP0CQe5XqNSVq0ed9n8E50mszn8pX7PPrI13Py1y+eoF5CjDnUpa68svF/GnmV7XdyjUBpFyyabyEAGbs18pucn9itiEECWrmEv2+tvqGwoNFfq/YSOLfjT5MjRvCiiFf2nHMcxKCerqx1Jl87YI1sAS2phJOC9+0Vd54zgvpRWT3Gdp9vmYSofx9kRUEefkcdxBTwclZoDtFA1Otsc4qUa6EBdNr7y0AB7RdqXbC2mZ8rR1PdU9nv6B89b3SwPPLlK1Lkt65rLpKAgu1ihPureW24tIJ7M8lPwOLszMMhy/gigQnFQDvQe6cyU4ZCV2WcpABmauEqEIhyiub3JdI/VvNTVEmocShMuqLraEYXK8YbhVbSzKIARimh0OkoXerL7uzvQPovdUAUrO84oQW7QNx4w6G5JknFAvIjpbQRg63I3VwyVJZWKPUXyHNRh0OJ7EvBkmOJwKG03P+MdHNu2KvZWFJvbbkn7vwlwGSFW7L165/CCwsnNnpJUL+HUKOqdgkTE+D9B8BvBbSH0oYREKteRAkVoZjgKke0wriLPz2g9nPyvSDgoNaznLnH7ibz/y+AXKz4lhkHy3elKCGw+v/nt9EuOvLAPWjVjjyyZ8PeWhNiG4yRdpcPxCnvua3J7GkK6YOY3aTC4VBloSgCzrA8yWGHhVXL3MFiaplx60CyiutxoNOs79AvRMU1xMcvgObUTP/xP/kufMGjsY7M6gWAVu+dXeY4bMPXHXChRjGcLhD0LO++dmC5I9vI6wQSZZ8vU6TDHparBTmXZb7MuuPbA4pv0ziZUjHrjLQWJ2ECpQvkNIeL/2dr9vJqm6lG7qAS0qmUksMhWFYaXNVRQggUq6o4q5ipd62TxoHiZP0ctU3/PRFHL2mdjh4n5D9CQsVNKmhArinTNnJ5OSoQ4MSu1xjRQ+QKBkpKpCXWbr4Y8cgIcLqjEWhlKneKKQjVSFXASBMg3tdoAesR6qPnWoWR8psQzjgQimQleEyMrCMKOJp5Pj8lkXlVczNDgqnXeTUoKClYwmqohBFKbirW1gQu4BeYdZJgx+HkbTAwjP6P3ji2/Qq1lTUGhDIAo4fnKZhOjNiDjLx3Li2SrBF8NQeljsiq+4yozEWCsSkCSofUmVPTIJxqsomeeREkwzZOGzNi+B4qn5kAGEv9VzCxeVMtaBShyLVygoswM+BfcJShaclwn/VJLX5RuZgCTNxJ8rJgosL208rCjfigrIo7SQOU3YnypmJUt4dP9j6BsXyrVgolGLiKOAbMrzaGEFNCVP5tFf6CRYrWlXteG91oZRaxxNAAWLZgIuUapO4yKJJRsKyDjdH7pBKiSI8oUaWhD/h0Bs3/QcubGdFCP6USdlIAuGy7a7C31exC1eVpBmgQFXZfWotzHNSMlPR70x9zqsVrBeINWAKYhBrq4tTqaStvBQsCtpPblSrqV0z8+IgCSlrqkVFtG1WdYGuA1/o66BnFgANGyAJtZQnEristn/RpHaF2wTQvT9go3dQB5BdBAZfVQYsWSmr3GEXiERP8bVO66JG3AUp7r++2XubaF0Sm/rnBebrmrgvrHQBJE
rV+ShmKiqthcXMDUwhIEddcfZVdG0jxzKahIGDSgk3FBGaiuIZ4yi1i1Ala5BeQQOM6nDnY48STesEdzwq9fIVd4xfPn7OPwibmSBhKZsN+vkoP5PSsd7uhYZ3h7oSSh9pg4Qq6r7rVWI40qfoKRIHwKS9GmwG6LfcMRfhMAOV/sinhYKtq/jPD/3IkKb0wFcoCy3ixh6KIXV/PfAWGPEbpsUOmuAlLoUHQRxMajv3MSDLICojR4tSq+e864i0tlJ7AlwGNNpEdTdQASug5FQGDTTG0BJLai18BpgzMMpJq8S6RT1qIuaG87bli9WGeTzqp+vaz0xzkGgceIxsgEHM/KOBrnPXQxz2ZnFp55hDqANnG01Y9TO1xohkB4i6yB1jquierEDVZ63xHRZKloGfVR5k4pM9gr9gA17xAq7C/sQHDcK0n2JbhxxdkdbNO6iqK+SXle6Fo9cIYT1gJFtp6QRIWwEuEENKWdfBCLnDQkDyyyt7OR2lwyS7NgRPjIxoMVmIJI0NN8Vxmmx0oaWOc23NwFWo/qHOwcBh4Zmm5E6u+pwbZdrojYSgC7BUwG9QOIEm3vhWVR9kcWCAKRdaK65qSceqOpKqS74NKPQb6LKxcRhWh+FC8RA8P/TYAYm4hBuccVDl6wGXkkr9r5XzrWkD1x5UhGApIssBKQbMC77ZuGbj95FLHDD6CpIsoGMrrnHH0tunMNJGRzoRCM1U1/K4OFYm4VwLZ9S5BSn4YvKgxDsvgP4GJcPY4qiKGLMek05UUpUdu3ILDMrUrAGuXI2Vyp2F9LU284jUA0Vi39UeEmLXx7MqTiSvRUnnTEIbZMuqOegZaR6YRcNkxYDTnkphmpoHmFSQexbj2LxnrKKbwwa7IvcYhcZWNOg5u1qkWHFTBoN5078/6DUaHgm93ETYbCKusVLWapX6Dsk5vTpGSuBIbnTAfRqcV9JRG5p2dRbaHu+0szvDMYqU92lT3aArkWhXZpZ4qEUMXv5dP0GgBtLPd45TUDxdD4CJVAA75p9Bm1Y/g/H4XZm2nyTd2pSnn/TRGxWhU4vz3ttUjY1GgJjDsh3sMX4/4iBUQ9EySuKvHSt/LjCjiW6yULtSqxwWddoOB4BWGugRTFq65sTcjU56+2Y/uVcpCj14E5RZLtHpPlRaWbEDOzgR9zbv7dEPQ6P2k++fYEi49RGUsRI2QNBJ2FlaklhjyWCgtVmY9FCboEK2fm5RU0sTGKIW2Ed6X8uhRhJZ68Wp1jeovhrQWFvSR9KJV/KZuMS99CIIHH+a1NCpExzvsK0TNgnL/3UvQ1xabQ1zBJnU9nwNX67RMCCmGBUnHeCBPtkhkScC2GzUwnFm1CewGN7ic8CrPJ3y308DYxG52nbZgtiQn930TJBfkiZik81RG+BVipCorhnPneKzCjQdASB0PLMdYD6lmEmXUBU1u6buJTtqJ6KyHLeC8QaYbX406yTbml+84iPjw3uHGA+GBfQhECHPRfDgBXtjY6aaNbeTOi4t2veiAdBg/8glwIQqKD0EAfO47K6edAcDTzT1TNqpE5xT2DEFIGNu1Hu0UZ8aLZNYBxCCNtpvt6Ijtx/VGTnXDHwWuPUoUZtn6DRXwu1d0hmdlYLWceC5vUMY/t+BLNEns7Ap7Q+VVNf6RgdsQ9KZYac6LD5a05Yv74rUJGgLmlRSRngRGcUQY6omemu0taMR0Zo1Y8ETQN8oaMP/37Cv9dZtcz7BoNlNfExjRPMTBn8/Bzy+yAnDis8zoenImt8qCVDMHdbP7nFfnx2GQpnT4s0awP0TOOroBN113Izy2XzjaP5nvrv0ZvRFHqYWuKbDLnV8Q6vylPneaOZDh84G2krDZ7cac6ADEAMGVrhYONngW+e3Ex5ywUVzhG1+Vcllo1VsfBTPdxM3P/5yk0gDuXFonddPjROKpGDh5u1w59m9ccRiok+HO4v07l6tLUiw4/FHWrluDc2nw/N2ATMh/F0ifCTfS+BrzvU7VS
OrN+sd+TiIhWhNtR2eE62/+0hgKZ51G+GcT+21NySDEdtOPHB7OPAetTkeNuY2giOB5uCruByUyWyznj6ev+3pMD5QFpqny2pIsvC8llOHHjRIBUOvSLdIre9fUahqPPhGH/Q9dSoqEyi+tSYsU2NERITZoahEv9aOqdEJy4+x8Z0q8jgbPJpX14rIP2PjdMGsOShKv5L0eq2nzQ+D7vDw7ztfW9T6pit6MryDiQ+NVmlfU8KCpzgdJbwGWuwnONNMzFyXc/8CxiN8aA==")))
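trie = TRIE
first = True  # the single lowercase fallback is still available
for character in sys.argv[1]:
    lowered = character.lower()
    if character in trie:
        trie = trie[character]
    elif first and lowered in trie:
        # Exact letter not found: fall back to its lowercase form, once.
        # This is what lets Colour match colour.
        first = False
        trie = trie[lowered]
    else:
        # No branch for this letter, so the word is not in the list.
        trie = {}
        break
# A listed word ends at a node that holds the end-of-word key "".
print "" in trie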
Construction of base 64 encoded, zlib'ed trie
f = file("dictionary","r")
END = ""
TRIE = {}
def addWord(word):
    trie = TRIE
    for character in word:
        trie = trie.setdefault(character, {})
    trie.setdefault(END, 0)
for line in f.readlines():
    l = line.replace("\n", "")
    addWord(l)
import zlib, base64
compressed = zlib.compress(str(TRIE).replace(" ", ""), 9)
print base64.b64encode(compressed)
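For anyone who wants to try the idea without the full dictionary, here is a small self-contained sketch of the same pipeline in Python 3: build a nested-dict trie, pack it with zlib and base64, unpack it again and look a word up. The helper names (`build_trie`, `pack`, `unpack`, `lookup`) and the sample words are assumptions for illustration, and `ast.literal_eval` is swapped in for `eval`. The lookup is exact-case only; the first-letter handling of the real answer is left out.

```python
import ast
import base64
import zlib


def build_trie(words):
    """Nested dicts keyed by letter; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = 0
    return trie


def pack(trie):
    """Serialise the dict literal, strip spaces, zlib-compress, base64-encode."""
    text = str(trie).replace(" ", "")
    return base64.b64encode(zlib.compress(text.encode("ascii"), 9))


def unpack(blob):
    """Reverse of pack(); literal_eval only accepts plain literals."""
    return ast.literal_eval(zlib.decompress(base64.b64decode(blob)).decode("ascii"))


def lookup(trie, word):
    """Walk the trie letter by letter; the word is listed if "" is in the last node."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "" in node


words = ["colour", "colourful", "a", "I"]  # sample words, not the real list
trie = unpack(pack(build_trie(words)))
print(lookup(trie, "colour"), lookup(trie, "colou"), lookup(trie, "coloura"))
```

Note that `coloura` is rejected because the walk stops as soon as a letter has no branch, rather than testing for "" in whatever node was reached last.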
- LOL no :) ... it's a typo – Nejc, Jun 5, 2015 at 10:42
- Are you trying to edit your answer more times than I edited the question? – CJ Dennis, Jun 5, 2015 at 10:43
- @squeamishossifrage the else:break would not work... If you would query with word coloura, which obviously is not a word, the program would return 0. That is because in the last step, the trie would be {"":0}. It would not find "a", but it would find "" and return 0. – Nejc, Jun 5, 2015 at 10:51
- @Nejc I deleted my comment after noticing the same thing. However, I think else:t={};break and print""in t might work. The final newline is also unnecessary – r3mainer, Jun 5, 2015 at 10:53
- The list is also 11 bytes shorter now so you might want to recompress. – CJ Dennis, Jun 5, 2015 at 10:55
PHP - Bytes: 2811 (previously 2892), False +ves: 0, False -ves: 0 = Score: 2811
<?php
preg_match_all('/(\D*)(\d)/i',bzdecompress(base64_decode('QlpoOTFBWSZTWaILilgAAfKNgH/gACA////wYAhL71trq71ortD2Kw3J6ZeWvTF3c6Iqf6JkTCYTSNA0jU0Ip4AQJpJptVPE1MAip+DRU/VPTRNDQAAGqexCJpMp6mlPUepsp6jJjDIwJpgTIYmjAanoCBMpMJ6mptQMj1jt+3vz889/XMld2kXPhiMgQJwBAfi+ycsvZ/qwWRmQhcQKUtRAd1Ml+2zpBLJ53aoUhCPzfLzu5B1M9wnUYtgN3XbcTTBwP2QBBNCNgoD1wqkkr1osQpQHMkCUltCBliCKmXjaYDDfmTdJGizbGn/jWpIrDGpStVeTiOoFn8ASoSseHaSYC7Iau2KMXZxZB2YWpMqFlT2CSKmkxEJM0V8ag9ed+DsN1gWzRpewuHHh4x4i3rjLGY3I/WC4K6a3BuLkQ69xFrEiu9q1OzrY6cFKZVAxSSz2kdqQkN6S7rrFy1jY9434MVTgwnycz59BHrxqJuDrqSSuplpG3a2b7wzptRP6DFxkZtHRNEk0Kvylx7uMCR5PfcixfGSLa9vGX0uhpyJ8yr5O9dVp/rdzjGxl1VknpS1qNRj7DXd7j2jB98PSQbMCRNp6WowFajA48z07uAXxnHJBfe+uaUQYUWGuvHgUyUgJ3Uplz2pezKQfKyid0FBGcDTSpgrJnCLni6Xxm+FOpkxqYUl6EwcTGr+mLHv0E9ta1EEHnlvQ0W7pFSedZSNRiqYYrwu1V1rhIgVGXnVEiVl1SaIJn4VGIhkNcokNKUEAZ3V6lkt3U0zGOCJ6sgewrPTdrR2XdvNM5VoQ2/i6Dwg821JoHMEgvxeNEoeZC3MlWCqrWsN11tXs4C2hywWYbh66y8nqQEIVGMjLCxflqTuiQ6xi+Lc5qol6G4jOrRL1qmudnsZ5YEsIhyGlxLhYcyrYwd6aGxjA9i4sKH4RMkQWyYfwOY9GvF8dNrPHeq7Duob6HI3FVOH/ea8Yjd0UxEhMAtIJDwripa08CH9Ow12wXz0o8AnofjobjMfb9vIM0Yg2nYYbHImZt0CtoLq3UNUDatyIl1+HxNa/yeOM3HfeeJ/2yI+imsSJbRBXo5UBwfsKKjEbShUG0tnGwfENhPuAUdu/I7eorWqjDsr0+mR1vUut48dR86azlqOphtcjDuwwCriVSUCUS/10+6/qWJrbMPDmyYgfcMpHvaq69PpVyGTsnz+nXn6nHy/o9Pk0QXP2hjoYJxQtxqRapXzr7rrbEu9ybP5Tt7QJs2KujKagvI+2Po8QDCpJpzE0vwQK8RF76Sk4yh1Dq4wMUVu1u6O6nmsHrXH3xwbZlXp+2Gi/VUckLIJHYnujg++hudcW7dJv4rvZPITTbyriWGh9XQFtLDPKqWF7AuAj56PPc1NKS1EFYzYlo8IkFyo3xVTIonBgQcnc+7M5xC0aFfDW3CjUaohQuN1sgMzSUXM7jxXyOb9YdFrNZ5ziw7lIRojbKfHFi7ZcMdAiCSgVCKmZ3kFRGZ2j71ONKrE4RiKGG0xc2rBVCxBzqWgH2eHHbfTz3QDyajdrXoRaxamrtDWmfAlW5xzlSAeJl/QwtnVBcFCJv0hCF/M8vO3tGerpBV15zbTK59//m2tFrreDfNKkrYM1Y9vWT9cl2DlYRu964+Hlkg0uCbmLMSRK+8sx71osBTZ3ZLFKK4MVn1phiyDCaIYXwi0XNMieiCMyhpeN9oWDWGKUz99dVZCM3ZA50AwIDDYHnCFY1TKkFgRgz1OJsVAZCTaGXtj8qUlPLafUxT5xFqY2QclwVpdKISdQiLFG2gTZUiwr8a+vghqNxC8eAtvrlyFFYtP1dvjd9Zj5zEFF9jSSMsRHGRTIrePWqiaF5n8Ph1+mukR8L2WNnxb/sYDuSG5ZHlNMgPaCzf4+mddOSwIq/tm8ozhGUPWmCpdcyx49uLc4HWJHrQ0gAx
dxNY8uwCOzp6PuNEkRw3utTsw3odJwKVJ9bcb0EHheguSbpLhgx5k1qlBB6STiAsxkcXG3hDyigqo8morvnIL1mUSyoSmbFFBDTLqA92RaF9+sZsd8ITAzAYpJRaRbTa1CiPBaRAoKWeOIstgil1kEE1O1aC1QHUUJoJ0JSLBHBchSDDj2ZVDjGBG8PNCJeBrM01eV6IfXwRjV8HmFxiYfLPR9FzRsnQZNGItcSimQWNDA6YGL1ouYsyduqOwjjh27I40WU0vloEn6dKRMOjDxl4QiaRLW4EeLP8ycLxC49pbri5Zs17TZSJCkPfTgH7iNPiqgcZmDKNgdEYwAuSjJFIZosePcg8l+I/rFOfs5khIuUHYpBlo1R8oyZr6eth30STIGQitOBYiKgl7oGd0RWdvw3kx8lCxsse70IUvx14HjKbIEusdZIjdQN9TS72jt3ZoRSotFa4ky3iT6Hepk/RErposPd69dxfRk0Pw499oGe/mJGLImBf9JPyfKVBjtVnl0BxSSCzsJ+fDGhARYHs0pOb9Myi9Ig7+5tTbno8Q/zlaIcngPBqAKi4FRMwPy6uqcoIIyCoNjNkDb5dnqoW7pWqqC7ODP95zHrUUgQTchXQdmj3cPWQKbATBPwRrs/8LuSKcKEhRBcUsA')),$b,2);$w='';foreach($b as$p){$l[]=strrev($w.=$p[1]);$w=substr($w,0,strlen($w)-$p[2]);}echo 1-!array_intersect([$argv[1],lcfirst($argv[1])],$l);
Ungolfed (slightly):
<?php
$z='QlpoOTFBWSZTWaILilgAAfKNgH/gACA////wYAhL71trq71ortD2Kw3J6ZeWvTF3c6Iqf6JkTCYTSNA0jU0Ip4AQJpJptVPE1MAip+DRU/VPTRNDQAAGqexCJpMp6mlPUepsp6jJjDIwJpgTIYmjAanoCBMpMJ6mptQMj1jt+3vz889/XMld2kXPhiMgQJwBAfi+ycsvZ/qwWRmQhcQKUtRAd1Ml+2zpBLJ53aoUhCPzfLzu5B1M9wnUYtgN3XbcTTBwP2QBBNCNgoD1wqkkr1osQpQHMkCUltCBliCKmXjaYDDfmTdJGizbGn/jWpIrDGpStVeTiOoFn8ASoSseHaSYC7Iau2KMXZxZB2YWpMqFlT2CSKmkxEJM0V8ag9ed+DsN1gWzRpewuHHh4x4i3rjLGY3I/WC4K6a3BuLkQ69xFrEiu9q1OzrY6cFKZVAxSSz2kdqQkN6S7rrFy1jY9434MVTgwnycz59BHrxqJuDrqSSuplpG3a2b7wzptRP6DFxkZtHRNEk0Kvylx7uMCR5PfcixfGSLa9vGX0uhpyJ8yr5O9dVp/rdzjGxl1VknpS1qNRj7DXd7j2jB98PSQbMCRNp6WowFajA48z07uAXxnHJBfe+uaUQYUWGuvHgUyUgJ3Uplz2pezKQfKyid0FBGcDTSpgrJnCLni6Xxm+FOpkxqYUl6EwcTGr+mLHv0E9ta1EEHnlvQ0W7pFSedZSNRiqYYrwu1V1rhIgVGXnVEiVl1SaIJn4VGIhkNcokNKUEAZ3V6lkt3U0zGOCJ6sgewrPTdrR2XdvNM5VoQ2/i6Dwg821JoHMEgvxeNEoeZC3MlWCqrWsN11tXs4C2hywWYbh66y8nqQEIVGMjLCxflqTuiQ6xi+Lc5qol6G4jOrRL1qmudnsZ5YEsIhyGlxLhYcyrYwd6aGxjA9i4sKH4RMkQWyYfwOY9GvF8dNrPHeq7Duob6HI3FVOH/ea8Yjd0UxEhMAtIJDwripa08CH9Ow12wXz0o8AnofjobjMfb9vIM0Yg2nYYbHImZt0CtoLq3UNUDatyIl1+HxNa/yeOM3HfeeJ/2yI+imsSJbRBXo5UBwfsKKjEbShUG0tnGwfENhPuAUdu/I7eorWqjDsr0+mR1vUut48dR86azlqOphtcjDuwwCriVSUCUS/10+6/qWJrbMPDmyYgfcMpHvaq69PpVyGTsnz+nXn6nHy/o9Pk0QXP2hjoYJxQtxqRapXzr7rrbEu9ybP5Tt7QJs2KujKagvI+2Po8QDCpJpzE0vwQK8RF76Sk4yh1Dq4wMUVu1u6O6nmsHrXH3xwbZlXp+2Gi/VUckLIJHYnujg++hudcW7dJv4rvZPITTbyriWGh9XQFtLDPKqWF7AuAj56PPc1NKS1EFYzYlo8IkFyo3xVTIonBgQcnc+7M5xC0aFfDW3CjUaohQuN1sgMzSUXM7jxXyOb9YdFrNZ5ziw7lIRojbKfHFi7ZcMdAiCSgVCKmZ3kFRGZ2j71ONKrE4RiKGG0xc2rBVCxBzqWgH2eHHbfTz3QDyajdrXoRaxamrtDWmfAlW5xzlSAeJl/QwtnVBcFCJv0hCF/M8vO3tGerpBV15zbTK59//m2tFrreDfNKkrYM1Y9vWT9cl2DlYRu964+Hlkg0uCbmLMSRK+8sx71osBTZ3ZLFKK4MVn1phiyDCaIYXwi0XNMieiCMyhpeN9oWDWGKUz99dVZCM3ZA50AwIDDYHnCFY1TKkFgRgz1OJsVAZCTaGXtj8qUlPLafUxT5xFqY2QclwVpdKISdQiLFG2gTZUiwr8a+vghqNxC8eAtvrlyFFYtP1dvjd9Zj5zEFF9jSSMsRHGRTIrePWqiaF5n8Ph1+mukR8L2WNnxb/sYDuSG5ZHlNMgPaCzf4+mddOSwIq/tm8ozhGUPWmCpdcyx49uLc4HWJHrQ0gAxdxNY8uwCOzp6PuNEkRw3utTsw3odJwKVJ9bcb0EHheguSbpLhgx5k1
qlBB6STiAsxkcXG3hDyigqo8morvnIL1mUSyoSmbFFBDTLqA92RaF9+sZsd8ITAzAYpJRaRbTa1CiPBaRAoKWeOIstgil1kEE1O1aC1QHUUJoJ0JSLBHBchSDDj2ZVDjGBG8PNCJeBrM01eV6IfXwRjV8HmFxiYfLPR9FzRsnQZNGItcSimQWNDA6YGL1ouYsyduqOwjjh27I40WU0vloEn6dKRMOjDxl4QiaRLW4EeLP8ycLxC49pbri5Zs17TZSJCkPfTgH7iNPiqgcZmDKNgdEYwAuSjJFIZosePcg8l+I/rFOfs5khIuUHYpBlo1R8oyZr6eth30STIGQitOBYiKgl7oGd0RWdvw3kx8lCxsse70IUvx14HjKbIEusdZIjdQN9TS72jt3ZoRSotFa4ky3iT6Hepk/RErposPd69dxfRk0Pw499oGe/mJGLImBf9JPyfKVBjtVnl0BxSSCzsJ+fDGhARYHs0pOb9Myi9Ig7+5tTbno8Q/zlaIcngPBqAKi4FRMwPy6uqcoIIyCoNjNkDb5dnqoW7pWqqC7ODP95zHrUUgQTchXQdmj3cPWQKbATBPwRrs/8LuSKcKEhRBcUsA';
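// Split the decompressed data into (letters, digit) pairs.
preg_match_all('/(\D*)(\d)/',bzdecompress(base64_decode($z)),$b,PREG_SET_ORDER);
$w='';
foreach($b as$p){
    // Append the new letters; the words are stored reversed, so strrev() restores them.
    $l[]=strrev($w.=$p[1]);
    // Drop the number of trailing characters given by the digit.
    $w=substr($w,0,strlen($w)-$p[2]);
}
// Print 1 if the input, or the input with its first letter lowercased, is in the list.
echo 1-!array_intersect([$argv[1],lcfirst($argv[1])],$l);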
Old data
I have applied my own compression before compressing with bzip2 and encoding as base64. My compression reduces the data from 5942 bytes to 3662 bytes, then bzip2 gets that down to 1997 bytes and base64 increases it to 2664 bytes. bzip2 without my compression gets it down to 2852.
The data used to look like this:
a0bout2ve4cross4t0ive1ity7dd2
Which translates into:
a
about
above
across
act
active
activity
add
etc. The example 29 bytes expand into 45 bytes.
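The expansion above can be reproduced with a few lines of Python. This is only a sketch of the old, non-reversed format as described here (a run of letters followed by one digit saying how many trailing characters to drop); the function name `expand` is made up, and the loop simply mirrors the PHP foreach without the strrev() step.

```python
import re


def expand(data):
    """Expand 'letters + digit' tokens into the list of words they encode."""
    words = []
    current = ""
    for letters, drop in re.findall(r"(\D*)(\d)", data):
        current += letters            # extend the previous stem
        words.append(current)         # the stem is now a complete word
        current = current[:len(current) - int(drop)]  # cut back to the shared stem
    return words


sample = "a0bout2ve4cross4t0ive1ity7dd2"
words = expand(sample)
print(words)
# Compare the encoded size with the size of the newline-terminated word list.
print(len(sample), len("".join(w + "\n" for w in words)))
```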
New data
I reduced the compressed size of the data by reversing it! I saved 64 bytes in the base64 representation and only added 8 bytes to undo the reversal!
Then I just use array_intersect() to look for the input in the list, both as typed and with its first letter lowercased. I have chosen to save a few bytes by not checking for all uppercase words.
And, by the way, if this should somehow have the lowest score it won't be selected as the winner.
- Question! Why print and not echo? Also, var_dump(PREG_SET_ORDER); => 2 ;) – Mr. Llama, Jun 5, 2015 at 17:08
- @Mr.Llama Are you sure PREG_SET_ORDER == 2 for all versions of PHP? – CJ Dennis, Jun 5, 2015 at 23:34
- Since like forever, yes. – Mr. Llama, Jun 7, 2015 at 5:36
Python 2 - Bytes: 3170, False +ves: 0, False -ves: 0 = Score: 3170
i=raw_input()
print[e for e in'eJw1lwF64yoMhK+iqxGbOGww+AFO6uXy7x+5+7W209QIaaQZiWDhUc8x6ydaWFrtfY450ifqlsZlYV0tPFtI6xyxWdhCKjPOOltkReKbnGctfFM2vgvrNb/h6hb2Wc8yLJS5zq1ds45XbLN/uV3z0oprvBKLrpF2TB1HDG0emY8tzhjmPvdLNtbZ3B8Lfb4t4F5YeGJ8jnMvFtjPHuFxzR7n8tZ2E5/Ke/Z35J2XPeLEDayHc6TnmSc+rHNtte7zFbD9rGy5ERgOrVqc63f2tMY+OxYIfI5vjMUeiSBxaLx0reESWtEeGZdmjOusPGvl+Z0n39fA27gMKgmXQOnNesy1MlvjJQyNkYlu4Ms3z59JLC2U5TXlGHhyvXU9A67EAIozrRu37TWm8LuBrd8yz06sZ8qrzPdUItnkR05i9bIlvKNDM8vKnsdsHkzkV6i0mLWiNUeSwJaXLbGMFlgyTr5ug+zb8iLvILoIVjwJXY4exM+HtLxjmfKBq/GxLjWHQeSJ92vt0ZbUFoWMQzmwYfzgfyRDKoC0P24QQUaRdUpg1UUtaG19PmMUpGQrK1dxGaTrbJOFe5w7uRzhgf19r2Xuh6ppp6yGHqdSWdY0Ev8rI5Uz8mg1KzHcKcI7N0XPJtugXhT2VMFxAcPZCBSnbWmh4yYxEMt5zONO+LDV0QGcVk8Qfc+TbKnybVVKF0prLrBFyB0qxyNSeJ2SxZkL27ketiYq8xkBcfBMy5lxxR3DLwXQBplaquAj82ul8Gubp4L/Fltb+ALpPpXUBB3EonrIh3N5e2FfFgOVhsV8ediAzveqGIu8FbSNRZxY+GLbLHrVqeAsCvmWFgJ4BTwkOXE/ZJJIItT9U0Xf7eW6Ef87QzYvJpCx2Jdw8PhQILGoihWkygB4uZAHiz9BuxLd9ajrpb/3VNylGZTRucRjgOWAd7EtCTAPZQwsu2I9stI2WtxjZvkV7RnICm/ARZGcBXsi8t2TNNs+b40iwgnY9lRByniee8ie8vm1p2Og9TulJFIrz7MkSkFc8uVs/2PPHDaVatVFZlSl1QXQnpXilU5QefxzPCSjuLCIjGkjFRE7bYu6KRhqCP9+qD17Ir0SGxQjxr98ZOMk1HUjHL6vqtVEBOdN96L0JjgclWPJGR9tCxAGOSrSsC1SWs70HHfIOGxLTwViW6ZkIapt1bVs/rq+PoB0I6HrM7hRPu23GonWt36RVeX4V8krJKlf285iLiL5KSD33Vkp9hVul1RJuqrNkTvtnsfrcgX/XKyoMx/8NbRHuZWwCD2hby+Ri7KTYeRv2Ks+aA4SC9QvSbR3VcoUi/uRBkFrt4x+Qhx74Wgj64WuxVoAf52uiY+gXjQMITP2CJaelvZDekOjo9R+Ga3GMUel8iocUhT6d4NJqdBTWdxnxpYlWGN/YqY6/9QHHLU/p1RyP6hBe8cosr6TUih3JMj0oHch/ekZb5n8GtlRR0EgWScPkeTu8BHTJeWZfy/L0p3wBLXimdkmyXWz4jxdokOru9dZTrQgEXE+heSYb7X3ebg2DJXLcDZYlgtFu6gPve+2gfHv/MoMMmM7AkMnklyr+URV46XWQz2rPQvZcUsjWSEV6tRdhapSS4vW8tr+UIk498errranVS0sv0mzazmRIO4eyVCb+7G9ruQUaOgCUpdCEZa3bmotxavFpaffiiZxd53n38S2nyhjR99IxX5Zca54y3YnxJ8ifQjSTnV/dEIuUY0PET3dPLtFY347v0dQe/kZVpRlr1qM1CnBlnoRars12NuvT0dDr36tnELA6gP//0iN60LHtPqc/GD0qddIjdWUDW2xqimNwqrOqdpVd7XdDXuVJRdxOcpOqlhTHzHGCDvC5vlKgAZHWe49h7+UNO7Dm5Cazz0fvUge09viyV0Q1yp1vnvpbGoejHAoRvMao9va8aqjIh3Hy44UgAAAF5enKGRk523ot48YuqB6B4qpSQIC352T26r
Gwe4UYhy3x9okC46cNHZk9cTjzKSq332/9yTER4CjxK9SPZBb70AaRwaRqTNhaxEQuslo+mj3hOJWxk0JLWbQWid53ZXBqpjrUMT1I1APujFui8LqDwfg8Bj23+my2N0hZia0m+3IUQtrqjf6l4rCmrdokReGix9Z9RbVE5aKUK5Ut7Nj1+y6k0UKfB5S10Pye/C6NOombzibN2vNEuBdrClGNlh/C3JTI2+V/R5KCr0EObR2PrTF6gzOoqImjs5bKIS66ZCmsj+7X9aXVwV32pFD1yuq3OGK4qBhVw0IjAjzV9iGv9c+rpTiC1H+WH+FVdoP48RnqDM1yoV2aLGmJhCLNBBXl4NKBXNNmENDjgKupxoVN+VEZ5yvdUGtUAt7ZN9XcOaP46ca8cGibLr5BOfnCD8A0Hz/Ruuux8pP01Tds1xJ3tX6XuUo/456hZNML/qarnkIy6eDtEedffZ405snp58OnlIkzZYgQpNEhAy9WOI9/i8Jf6OPIjzVSHldMerEBbnKZp0JC00ixbQBdSrXJnhzD0ziTL7HQfzVmU3KfJ/cBBcP5ouqId81ogne5oe682bmuYoSGsTPI63Wz4eEaD7YUVAtC+TRMULFpJli0zT/bxT3ktEYchy327MdTeXNgVCVr61Xu99+/+q6cJ8/yQaHQefArpgy86a8zvGTukL7nXCSTi5ObAR7vNRGhioFHohpkhaOOQr095hyk2y7R2OIlPo9rjaxRqfbkVb1QHqdoRSaGyic6LK+V28FxUePek9xsHtqllDfQEWZdFNxn5nmyR3G5RwHjlMOfjVFcJgwZ+G4mIbPomOR9/LYhCo+6TCe7TxMs4mf0s7gF9L+iXjjiGHmQ07RbfvonGWCZtg3pPGLpsR7l/QIU+aIebfcL/X69krxsNbVR/H3PVLlpe7xzgOK9X35LJf/oZkwktS4JLIypakHNfBD2bw7ssir+Xh4A1Fbk/jpFz7at2rOLH4cXr236OSqCgSgb3PDFPalLtudhH7gtque0GTTKPyXA5b9rfV/GAUQyw=='.decode('base64').decode('zlib').split()if e[:2]==i[:1].lower()+i[1:2]and i[2:]in e[2:].split('{')]>[]or'I'==i
The word is taken from standard input, and True or False is printed in response. The code part isn't very golfed yet. Here's a (somewhat) more readable version:
i = raw_input()
d = 'eJw1lwF64yoMhK+iqxGbOGww+ ...'
d = d.decode('base64').decode('zlib').split()
m = [e for e in d if e[:2] == i[:1].lower() + i[1:2] and i[2:] in e[2:].split('{')]
print m > [] or i == 'I'
After decoding, the long string looks like this:
about{ove across{t{tive{tivity add afraid{ter again{e{o{ree ...
The words are grouped by their first two letters (i.e. about{ove stores about and above).
We search through the groups for one that starts with the same two-letter prefix as the input word (with the first letter lowercased) and whose list of suffixes contains the rest of the word. The capitalised word I is handled separately.
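Here is a small Python 3 sketch of that lookup, run against only the fragment of decoded data shown above rather than the full string. The function name `in_list` and the `SAMPLE` constant are made up for this illustration; the matching logic follows the readable version of the code.

```python
# Fragment of the decoded data quoted above, not the full word list.
SAMPLE = "about{ove across{t{tive{tivity add afraid{ter again{e{o{ree"


def in_list(word, data=SAMPLE):
    """Return True if `word` is stored in the grouped data (or is the word I)."""
    if word == "I":
        return True
    # Only the first letter may be capitalised, so only it is lowercased.
    prefix = word[:1].lower() + word[1:2]
    rest = word[2:]
    for group in data.split():
        # group[:2] is the shared prefix, the remainder is a '{'-separated suffix list.
        if group[:2] == prefix and rest in group[2:].split("{"):
            return True
    return False


for w in ["above", "Above", "aBove", "abo", "agree", "add"]:
    print(w, in_list(w))
```

Because the second letter is compared as typed, aBove fails on the prefix test, and abo fails because "o" is not one of the stored suffixes for the "ab" group.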
- Can you save a few bytes by not including a and A? Only I is a special case. – CJ Dennis, Jun 5, 2015 at 13:28
- @CJDennis yeah it seems to work if I move a in with the other words. – grc, Jun 5, 2015 at 13:37
Factor - Bytes: 1342, False +ves: X, False -ves: 0 = Score: X
USING: ascii base64 bit-arrays bloom-filters io kernel sequences system ;
IN: examples.golf.wordcheck
CONSTANT: S "iAlOYCCAE2MARMNA4ICNzwABZAbwhAeGBMIYgDsYAAwIgoQFB/ADCBwX4IrgAOHA/g7YAYiAYaEBgEINgMGZAeGIJI0AwAqePxzLhi7HYgMaY8PLMQAKIOAPMAC7xoEB4AA6+J+zAOAAzxhDBEwaAAYQbHEYYA8YIMFw0DIaDHhA2H4RMAkYAgACGjOCD2Dmwwy2YuDSdMcCZQAkwJ4siYMPFxjQfN0MMIwDPgIBDBDvGSY4M4IAIjYOgoyMxwCOujfje5AL7KFwfGEAiIPBOAABsIDDAR8gBgAjgEUAPng+71kCJoAdBiZw+zA4gAAaILANEBxwg5eBMEZqeAYwC53BgBlYaoMBMAlAUMCgeeD2gQ4rAK7gQce4D/BmEJhfAR48D8QKMxk2CMSP8ObTAHmcB8aemAEyDiOjAJhhwLvGAJhvP7zwgY4D6Ms2cx/A4KcAjgOzY8SbzMFgN+AU1Hwwhjd3MyD3wcNjBGAY3OEZgHAA/AbvfwI3NiEQAoCDOOjRMDP4IGH2EYUoD27ygAd44AyA9+2z33jXDMRAG2xiAQBgDsALgG0O+dgAfwMIXJZHMPKolod/68CJh+HDwNAcwTEGgCci29hsdCAA2IESQPCYg+AcAIABmB3MwAfHIL8v5gSw5vtmEACeA8CQyMc77DzaZmDh/Q/ceEDCHTwA5gPyAAOgAJsPA4GDt2sNt9ULYBb9TRhigJ1OGAA2YOE6/xuzIAA5AhA8wB9swAKA4Abd+Qfwe0C8c+Gmf+QAcX4AQwYwM2A2zgAD/koLsOOAAcIgMA5sO8AC9xN+/AkYAA7UWVCG+QPwwAW8eQEcAMB++H15EBAS0A6/gQHD5QBghkEGkAEQez9i3gP9BB4ZxzABKl48mOANgAcMaPMh4EkI0HgeWBje3zbwQAcAHWaZQS5wgM8HDAIFxBiBn/MGbPMCwzATX+AwQdnECAMQEFgYwuA5xjAAoAAs3AzkGMDPA2MYHtownIGBGU5w0AQAMGAGDBADDgEs0AEzXB6ARGzDgXE4gkaejd2REQ=="
: F ( -- f )
4 S base64> 6247 swap bit-array boa 1000 996 bloom-filter boa ;
readln 1 cut swap >lower prepend
dup "I" = [ drop t ] [ F bloom-filter-member? ] if
1 0 ? exit
Run using:
$ factor golf/wordlist/wordcheck.factor ; echo $?
color
0
$
The approach is based on a pre-calculated Bloom filter. There are a bunch of false positives (misspelled words seen as correctly spelled) so the code probably isn't competitive when you take that into account.
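For readers who have not met Bloom filters, here is a minimal Python sketch of the general technique. It is not a port of the Factor code: the `BloomFilter` class, the SHA-256 based hash positions and the sizes used below are all assumptions chosen for illustration, and Factor's bloom-filters vocabulary hashes differently.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hash functions; membership tests may give false positives."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k different seeds.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Present only if every one of the k bits is set.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(item))


bf = BloomFilter(n_bits=8000, n_hashes=4)  # arbitrary illustration sizes
for word in ["colour", "about", "above"]:
    bf.add(word)
print("colour" in bf)  # added words are always reported as present
```

A word that was added is never rejected, so false negatives cannot happen. A word that was never added is accepted whenever all of its bit positions happen to be set already, which is where the false positives come from.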
- Is there an easy way I can run this code? – CJ Dennis, Jun 6, 2015 at 5:02
- You need the Factor system installed (www.factorcode.org) then you can run it. – Gaslight Deceive Subvert, Jun 6, 2015 at 11:27
- I'm interested to try and find collisions. From what I've read it should be possible to find any number if the data is long enough. If the likelihood of 50 false positives requires the data to be over 100 characters long I'll accept your answer as valid. If 50 false positives can be found in words of length up to 10 then the answer will have to be invalid. If your answer is valid the maximum penalty is 500 so you'd be the current leader! Either way, thank you for posting it! I'd never heard of Bloom filters before! Do you know how many hashes are used? – CJ Dennis, Jun 6, 2015 at 12:17
- Not sure what you mean. For a random string with length < 10 the error rate is about 15%. It can be remedied with a larger filter but even with an error rate of 0.001%, it's a huge number of strings. I believe it works better for "normal" misspellings such as colect or collekt. – Gaslight Deceive Subvert, Jun 6, 2015 at 12:39
- @BjörnLindqvist There are 3,268,647,867,246,256,383,381,332,100,041,691,484,373,976,788,312,974,266,629,140,102,414,955,744,756,908,184,404,049,903,032,490,380,904,202,638,084,876,187,965,749,304,595,652,472,250,353 words of up to 100 characters that are not in the list. Your code is invalid unless it can reject at least 3,268,647,867,246,256,383,381,332,100,041,691,484,373,976,788,312,974,266,629,140,102,414,955,744,756,908,184,404,049,903,032,490,380,904,202,638,084,876,187,965,749,304,595,652,472,250,303 of them. Good luck with that. (I suggest you turn your attention to another question.) – r3mainer, Jun 7, 2015 at 0:00
Python - Bytes: 2294, False +ves: 0, False -ves: 1 = Score: 2304
Code (273 bytes):
import gzip
c=gzip.open("b").read()
def d(k):
 t={}
 while 1:
  b=ord(c[k]);s,k=(0,k+1)if b&32 else d(k+1);t[chr((b&31)+96)]=b&64,s;
  if b&128:break
 return t,k
t=d(0)[0]
w=raw_input()
if w.istitle():w=w.lower()
r=0
for x in w:
 r,t=t[x]if t and x in t else(0,0)
print r>0
In addition, this needs a binary file named b that is 2021 bytes long. Not sure how I can best share it. Suggestions are welcome if somebody wants to see it. Or I can post the code that produces it from the dictionary.
I'm taking a penalty for a false negative on I. Bothering with upper case letters, or special casing it, would cost me at least 10 bytes anyway.
The pre-processing procedure is:
- A trie is built from the 996 words in the dictionary (skipping I).
- The trie is encoded in a byte array. In addition to the 5 bits for the letter itself in each node, 3 extra bits are used to encode trie structure information:
  - 1 bit for tracking if the node is a leaf node.
  - 1 bit for tracking if the node is at the end of a valid word (note that non-leaf nodes can be at the end of a word, e.g. for "any" vs. "anyone").
  - 1 bit for tracking if the node is the last one in the list of children for its parent.
- The encoded byte array is 2664 bytes long.
- The encoded byte array is compressed to 2021 bytes with gzip.
The spelling check code posted here then:
- Reads the compressed and encoded trie from the binary file, and decompresses it.
- The function d() decodes the byte stream, and builds a trie using nested dictionaries.
- Processes the input word for the upper/lower case rules (all upper is not accepted).
- Walks the trie letter by letter until either the end of the input word is reached, or a leaf node of the trie.
- Prints the result.
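Since the encoder was not posted, here is a hedged Python 3 sketch of how the byte format described above could be produced and read back. The bit values 32, 64 and 128 are taken from the decoder in the posted code (b&32, b&64, b&128); everything else, including the helper names, the sorted child order and the sample words, is an assumption for illustration. The gzip step is left out.

```python
LEAF, WORD_END, LAST = 32, 64, 128  # flag bits, as tested in the posted decoder


def build_trie(words):
    """Each node maps letter -> [is_word_end, children]."""
    root = {}
    for word in words:
        node = root
        for i, ch in enumerate(word):
            entry = node.setdefault(ch, [False, {}])
            if i == len(word) - 1:
                entry[0] = True
            node = entry[1]
    return root


def encode(node):
    """One byte per trie node: 5 letter bits plus the three structure flags."""
    out = bytearray()
    letters = sorted(node)  # child order is an assumption
    for i, ch in enumerate(letters):
        is_end, children = node[ch]
        byte = ord(ch) - 96                      # a..z -> 1..26
        if not children:
            byte |= LEAF
        if is_end:
            byte |= WORD_END
        if i == len(letters) - 1:
            byte |= LAST
        out.append(byte)
        out += encode(children)                  # children follow their parent
    return bytes(out)


def decode(data, k=0):
    """Inverse of encode(); returns (node, index of the next unread byte)."""
    node = {}
    while True:
        byte = data[k]
        if byte & LEAF:
            children, k = {}, k + 1
        else:
            children, k = decode(data, k + 1)
        node[chr((byte & 31) + 96)] = (bool(byte & WORD_END), children)
        if byte & LAST:
            return node, k


def contains(trie, word):
    """Walk the decoded trie; the word is listed if the last node ends a word."""
    is_end, node = False, trie
    for ch in word:
        if ch not in node:
            return False
        is_end, node = node[ch]
    return is_end


data = encode(build_trie(["a", "add", "any", "anyone"]))
trie, _ = decode(data)
print(len(data), contains(trie, "any"), contains(trie, "anyo"))
```

The sample shows the "any" vs. "anyone" case from the list above: the node for y has the end-of-word flag set but is not a leaf.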
JavaScript 4378 (previously 4737)
alert(is(prompt("enter word")))
function is(w){if(w=="I")return true;w=w.toLowerCase();var key='/345I_abcdefghijklmnopqrstuvwxyz',x = "MemsgBtQBIvTGACy7agOz4AQCSkAF3M5IBZVcAmGdMAVAoC6lAE64ARiCk1AYAN1GT4A+G9gA5U1PIBzSBl/ADTLVXAPHFXAL1JqAHLXTYATpKAEAa1Ua4AxUAV1GACXgApqaQCu7agCyAGIABnJkQAPU8gBnUpgHw3gCOY/gARAAEgYCMQAnAAMKBBWQCy0AVGmC4NZcvUQBAJN6UkAargAul1AEx0wBabagBdNIBRiCnABw5KsAFkAM5VcAeKVMAQBcwF0gy1JvACAOVADEyIACpSQBoiAAUSAHAA0oAaGukAMgAnwAOiAE1ABSAAXSq4AzBenACy0GYqACkgC5EAOg8AFzTQ0AdqAFRkggs2MgHlqgDcligBY3IAzYANMtVcAuTAHWGgFp0UgDeYAYdNWMAH4AMg8AHgAQaCgAjEAJpioAqgukCl6iAMVYwAd/ADhQAZQ0AlTzc0QB1fwB3yZ0wBAFpnXACaFAAxQA4UAKjVABWFAE5EFTAKKbqmAIA0RSJsqAEAXIUANMKAN10RUAWfADE2MACo0wXADargE6RwBoiAAYUAGWqwAWk+AOhsgAtalAE6YARSCKkZAKmrgFI4FAXS+THioAgClSYA1NdQAxVlQBayq4AgDNLsupMAQByZ0wBdPSgCvSIB0gCIAVqq4BXmq4A3UjIB4yAGp5v4Ar4UAW1XAK5sNAGmMAD4ANVPQ10gCAOQBSaaFAAxV6awAIBXgAGmNyq4AgD4AFGSACHJUAN1GwoAQBVUFwAqqaQCxAGb08Ae1RpUAQBcoC1quqeQDcjUcgCAU5quAV1Iy6kwBAHPgBwim1XANoA0YClwA0eKgC5MAbm4ACjSADGABdOAAbUANKgB4ANIgAGMgBnwBKMhoAXj4AJgy0AcZB4AMgAnSDZdSYBAEtakZANjAAcxuQBy1VwDipGbuQAdWmnkAQBwoAZVz4AppAqXgBfTwA00xoBZVcALmmhQAgFbRogFhBqqAPapumwAcgBfx6J8AVJqAEAumRkAJOmbLqTAEAqxUAaFVyALsqSAarodhQAgDVSMgBPDtqAEAYmdMA83VJUfAEAXlACsyFAyADogBGIMKAGTo+AGmsAFeQAYyAGWquAWo5AJRrgApIIgBI0VAFtVwB4ADmNyAMYgkAE0gUBirgC7DQCuoGMgDkG1AB0AImYDIAUNkAKXADVwA4q4BeACkUgBRIIgyc0YgCAN0KABTmTAHGQBYrIAO2oAoAkAGrgB0ALqUmkgB+oAcNAFyppj4AQBpIE8gDp2QC1GIATn4AbzdnV1ACAOWquAedXUAIAsNJQAXSqYBZUAKpquaIBZiqRpgCAWQAcvIAbUAMTJBjABUNkAEgikAKJP5QAgDc00rNlqrgCAMqZaq4BAFjAAbUAKjZABUwA8AFNTSAFQA4AOpgC0zrgBFYIgBKSrgFNIAVqqYAeAG6QAyoANqAFRkgjlvgCvkANqmAHgBcxuQCjGgAVACYLqBgAXJUADGgBGIASBgGQAoc/gBikCgOSbwByUAFVABfCgBxV2TRAEAcqiAOrgwoAeABqabqSATL+AK9/ABkAMHNNIAgEQDkKABKjACsEq0vk08gEAU0XUbCgBAHDkqALNAXonSFACAW1TyALsqAO9JgDiJppAPOABeqMfAFQ4HTAHTkKAGVQAxkAoKVUAPAA6MQAmkDAAyhqpgCAM1KABy1ADREAA4AUTJSq4A+ACVQAmkALsUAOMgBlUfAFpjQB/4AVGSBYLzADagBXkAGAmkDMtALGKTALOVXANx7mv4A1AWoAGNyAMFABSYAVBjKpgFnMVAFtQA0RAAJqj4AWABSAAYUAGQbUAHFXALSIAFIyGumoA3TACCgAioAJ+ACoLwVkAb+AHOVXALwAKNE
EwY1dQBZABLkOmoAQBVkAJHVcAp5dSYB5bRIBuSmKgCjAEXUmAc0g1lQBYwDJoKAPoAUSqILzAKSp5AKavABBXgBy0AaTAC6gTdNgB4yAGWquAWp5M6YB5aAO1ADpDQAw5AAZAHwATNJQAXvTgBZdSYA6uoA1GvHwBIgACkxUATmNPTVwBAHLVXAPINquAPMVNVVwBALuQBchQAMbkAZqAGh4qAF2FADNQAXy0AcKADLXTYATkKALgAakdVwBAGh1eAB6kZANCjTAJa1XAByFAFlUwCdEAIpACagR8ANVTACtMOyoAQCuabFAElVwCy1VwC1cGcOSoAQB2q4A8mANTMUADp5AC4ATBVVwBuqeQBgDM1XAHwA4wDIAZaAHgAUZCgBmh0QCpWKgCrVVwCutSMgE6iQBxSYBZvSIBtpmjLmq0AgDcadABRAGdXUAakKAFgJwAGJkKACagZADGXIALKgA8ACjYUkAp58AdEQVkAnTyAGKTAKLkKADKgCpEFwArqJrgHh2XUmAeHHioAQBybNAC1cAOKuAV1YVPIA4ADOfADtU8gG5CgAmhQA4pMAdmyoAvqAGhzHj4AoqkASdIUAUnYUAVVXAHKkZALbkqAEAdHi5ABRiAE3YaAVXRAHDQAyANtKVMAMZdSYAgC5EAAVkAMqAO5kuoALp+ADCgCUZDQAT4AIx8ASFO2oAqXSASQSVI6rgG6aQBptQBTyAFTOuAKjZAGPgApfIBYybVzTyAIBajkAs6vMA3IUBoAJUADG5AGbAAwoAaGSABwIgAFJABqaQC0c6rgElQARVcAZgw0AsGSBagAdEAI5ACSgAmkANqADwAIbSkQCcqaFAFjFL4AEAlGuhoA5ABFJpAJUgAjEAJpBlU0KAEAb7UAO1TBc0QD6ADTJUCnABgoAKqAC6gVAFSqgyAEVgC6agAqC+QBooFMgBUF8gBqKVXALIAcAFyIAAlQAMmaIBRVNCgCj4ANquAUnRNcAqxUAU0KABkVADAAMZVcAsg6D6gBg6MQAmC+QB8AEVKqALqgBTgAyNGIAqjEAXRUANIKAKdOADQ1QARAAF5ACSqTUAOWumwAnSVgAQCmCkwAvfwA6mkAKgy0AqmQoAVGgACHGiASkgBGIATSANKTAC+QBuoyQBdNgB20a6gDyaVQAmkAL5ADLqTALwAKNEEgAqADoxAGkjIaAU1ABUF1AkDwA3NNigCUrIA0mwAdIzq6gBALSVTyAPgAq5IB6O9SMgFjJpoUAIBIQrGF6iAIBNABKVTAJhrgC7JjxUAIAykq4BTn4Aa1pfIB3UCt3YUAIBcUrIAukAFLpAPJjxUAMFABGAIgBjKgB64AVGQ0AJAuAEVVbSagD27DqTAEAogBObsAFe7jxUAIA4yALTTBkAFOuAEwXUul1ACAOFAC5EAAmwEAAukAMABTTGgFdRsgAoAnJ8ACgZioAqJN4AVAYrLVXAEAZUvenABAGbmNyAdIgy0AagyaIA6Q0AeTAG5nTACQNqiASlABTR4qAPSgYyAPgA1eYA8chQBeqoAtTRFQApVfGTTSAQBOyAGXRAKoMKXqIBaNGPgBALamKyY8VACAV/AC6MTMUAeHZAKjkKAPhnZACCgAjAAJ5AC8gAw0GVADlDQAVcAfAAo0AXBlqrgGlLpsANUAAcxuQCiiklAFEAMZADIANNkAFKiAEwXUALkNACKgAyoAaA8ADkqABagAikEQAmmnABUDKrgFdQAYUBoAZbTWQBAGkjTAGaVXAK6QQBFIAX8AGMgFd2VADSbAD8o1wAxlV0m8AIBZACmpsAC4BfVegA0oARSl"
for(var i=0,t="",x=atob(x),m;i<x.length;i++)m=x.charCodeAt(i).toString(2),t+="00000000".substr(m.length)+m
t=t.replace(/.{5}/g, function(m){return key.charAt(parseInt(m,2))}).split(/_/)[0];
t=JSON.parse("{"+t.replace(/[a-zA-Z]/g,'"$&":{').replace(/\/(\d*)/g, function(m,d){
for(var n=parseInt(d,10)||1,o="";n--;)o+="}"
return o +",";}).replace(/\,\}/g, "}").replace(/\{\,/g, "{").replace(/\,$/, "").replace(/\{\}/g, "null"))
for(var i=0,n=t,c;i<w.length;n=n[c]){
c=w.charAt(i++)
if(!(c in n))return false}
return n==null}
The words go through three encodings. First they are converted into a shorthand tree such as about/2ve: the word about, then /2 to step back two letters, then ve to spell above. Next the back-steps are restricted to /, /3, /4 and /5, and every other distance is written as a combination of these. That leaves 32 distinct characters, which allows a 5-bit encoding. Finally, the 5-bit stream is binary, so it has to be base64-encoded.
When decoding, the shorthand tree is converted to a JSON tree, so the lookup is a simple node-by-node search.
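To make the shorthand concrete, here is a minimal sketch of how a string like about/2ve unfolds into words. It is written in shell rather than the answer's JavaScript, it prints a word list instead of building the JSON tree, and it only follows the description above: letters extend the current word, a slash ends it, and the optional digits say how many letters to step back (one if omitted). The name expand_tree is hypothetical, and words that are prefixes of other words are not handled.

```shell
# Hypothetical helper: expand a shorthand tree into one word per line.
# Variables are global because POSIX sh has no "local".
expand_tree() {
  s=$1 word= pending=
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}   # first character of the remaining input
    s=${s#?}          # the rest of the input
    case $c in
      /)
        # A slash ends the current word, so print it (only once).
        if [ -n "$pending" ]; then printf '%s\n' "$word"; fi
        pending=
        # Optional digits give the number of letters to drop (default 1).
        n=
        while :; do
          case $s in
            [0-9]*) n=$n${s%"${s#?}"}; s=${s#?} ;;
            *) break ;;
          esac
        done
        n=${n:-1}
        while [ "$n" -gt 0 ] && [ -n "$word" ]; do
          word=${word%?}
          n=$((n - 1))
        done
        ;;
      *)
        # Any other character extends the current word.
        word=$word$c
        pending=1
        ;;
    esac
  done
  # Print the last word if the input did not end with a slash.
  if [ -n "$pending" ]; then printf '%s\n' "$word"; fi
  return 0
}

expand_tree 'about/2ve'   # prints "about" then "above"
```

Writing the same step as two single slashes (about//ve) gives the same two words, which is how distances outside /, /3, /4 and /5 can be expressed as combinations.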
I initially tried to use canvas and toDataURL to get some compression, but it did not compress well. If you go that route, do not use the alpha channel. Browsers adjust the pixel data according to alpha and round the result, so just set alpha to 255 and put your data in the RGB channels.
Capitalised words seem to return false... – Oliver, Nov 22, 2015 at 9:53
Shell - Bytes: 2545, False +ves: 0, False -ves: 0 = Score: 2545
Code: 52 bytes (was 54), Dictionary: 2493 bytes (was 2518, then 2494), FP: 0 (was 1), FN: 0
[ $1 = "${1#?*[A-Z]}" ]&&unlzma<z|grep -qix "${1%I}"
Quotes are not needed around the initial $1 because "the input is always a single word, all ASCII [A-Za-z]." However, quotes are needed around the parameter expansions because they might generate an empty string.
Examples:
In shell, an exit code of 0 is true (anything else is false). $? is the previous command's exit code.
$ sh in997 colour; echo $?
0
$ sh in997 Colour; echo $?
0
$ sh in997 coLoUr; echo $?
1
$ sh in997 COLOUR; echo $? # I chose to consistently mark these false
1
$ sh in997 Color; echo $?
1
$ sh in997 I; echo $?
0
$ sh in997 i; echo $?
1
$ sh in997 ColourI; echo $? # testing because I strip off one trailing "I"
1
$ sh in997 II; echo $? # ... and "I" isn't in the dictionary (see below)
1
(Examples copied from Nejc's answer plus an all-caps example and two tests for specific corner cases used to vet my i vs I logic. I have no idea why Google Prettify likes Proper Case and CamelCase words when highlighting lang-bash.)
Ungolfed and explained:
if [ "$1" = "${1#?*[A-Z]}" ]; then
cat z | unlzma | grep -q -i -x "${1%I}"
fi
The conditional compares the argument with itself after a parameter expansion that alters the string if it starts with any character, followed by any number of characters, followed by an uppercase letter (equivalent to s/^..*?[A-Z]//; ignore the ? if you don't understand it). That compares coLoUr to oUr and Colour to Colour. This rejects mixed case (and all-caps) while exempting the first letter. It lets i through (but neither i nor I is in the dictionary, see below).
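Here is a minimal sketch of that conditional on its own. The helper name case_ok is hypothetical, and LC_ALL=C is an assumption added so that [A-Z] covers only ASCII capitals regardless of the shell's locale handling.

```shell
# Assumption: force the C locale so [A-Z] matches only ASCII capitals.
LC_ALL=C
export LC_ALL

# Hypothetical helper: succeeds when no capital letter follows the first character.
case_ok() {
  # ${1#?*[A-Z]} strips the shortest prefix of the form
  # "one character, anything, one capital". If nothing was stripped,
  # the string is unchanged and the comparison succeeds.
  [ "$1" = "${1#?*[A-Z]}" ]
}

for w in colour Colour coLoUr COLOUR I i; do
  if case_ok "$w"; then echo "$w: passes"; else echo "$w: rejected"; fi
done
# colour: passes
# Colour: passes
# coLoUr: rejected
# COLOUR: rejected
# I: passes
# i: passes
```

Both I and i pass this stage because a one-character argument can never match a pattern that needs at least two characters; telling them apart is left to the dictionary lookup.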
If the conditional matches (there are no capitals after the first letter), unlzma is run on the contents of the file named z, which puts the uncompressed dictionary into standard output. grep then queries quietly (-q), without regard to case since we've already controlled for that (-i), and on a whole line (-x).
The query is an altered version of the argument: if the argument has a trailing capital I (case sensitive!), it is removed. Because we controlled for case earlier, the only time we'd remove a capital I would be for the word I, in which case the grep query seeks an empty string on its own line. The dictionary's I entry was swapped for a blank line, so we'll get a match.
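The lookup itself can be tried without the compressed file. This is a sketch under stated assumptions: sample_dict is a hypothetical three-line stand-in for the real dictionary (with the I entry already blanked), and lookup is a hypothetical name for the grep half of the one-liner, deliberately run without the case conditional in front of it.

```shell
# Hypothetical stand-in dictionary: "about", a blank line where "I" was, "colour".
sample_dict() { printf 'about\n\ncolour\n'; }

# Hypothetical helper: only the grep half of the script, no case check.
lookup() {
  # ${1%I} drops one trailing capital I, so "I" becomes the empty string,
  # and grep -x with an empty pattern matches only the blank line.
  sample_dict | grep -qix "${1%I}"
}

for w in colour Colour Color I i II; do
  if lookup "$w"; then echo "$w: 0"; else echo "$w: 1"; fi
done
# colour: 0
# Colour: 0
# Color: 1
# I: 0
# i: 1
# II: 1
```

Note that lookup ColourI would succeed here, because stripping the trailing I leaves Colour. In the full script that input never reaches grep, since the conditional rejects the capital I after the first letter.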
The last command that executed dictates the exit value of the script. If the conditional was false, the script exits false. If grep found no match, it (and therefore the script) returns false. If grep found a match, it and the script return true.
Dictionary creation:
Wow. There are some awesome dictionary compression schemes in other answers here. I didn't do anything that fancy. There is zero custom compression in this code (unless you count removing DOS linebreaks), just minor prep work that makes it easier to check later on (blanking the I line actually costs a byte!).
I just took the stock dictionary, blanked the I, and compressed the output. I chose LZMA because it compressed this particular file better than zip, gzip, bzip2, xz, or zlib.
Here's my code:
wget -qqO- 'http://pastebin.com/raw.php?i=wh6yxrqp' \
| perl -pne 's/\r//; s/^I$//' | lzma -c9 > z
The wget flags -qqO- suppress all output and send the downloaded content to standard output. That's piped through perl, which removes the DOS line breaks (\r) and converts the I entry to a blank line (since the grep query has its trailing I stripped). The result is passed to lzma, whose compressed output is redirected to the file z.
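The preparation step can be checked on a few lines without the network or lzma. This sketch swaps the perl one-liner for tr and sed as stand-ins (an assumption, chosen only to keep the example to POSIX tools); prep is a hypothetical name and the three-word input is made up.

```shell
# Hypothetical helper: the same two substitutions as the perl step,
# done with tr (drop carriage returns) and sed (blank the lone "I" line).
prep() {
  tr -d '\r' | sed 's/^I$//'
}

# Made-up sample with DOS line endings.
printf 'about\r\nI\r\ncolour\r\n' | prep
# about
#
# colour
```

The carriage returns have to go first: while a line still ends in \r, the pattern ^I$ does not match it.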
Note that unlzma will refuse to operate on a file lacking the .lzma extension, but it's fine with a pipeline. That means unlzma z fails while unlzma<z (effectively cat z | unlzma) succeeds.
If you're curious, my dictionary would be 834 bytes larger if it were base64-encoded. That (plus the 10 bytes from adding |base64 -d) would easily remove my answer from the leaders' pack, whereas LZMA could radically improve other answers here. – Adam Katz, Nov 21, 2015 at 9:14
Colour returns TRUE, what about cOlOUr?