- I will try to reopen this question with some edits.
Since advances in LLM technology are predicted to turn programming into a natural-language activity, perhaps we can code golf the LLMs themselves.
Produce the shortest English coding task for an LLM (one of the chat models or a coding agent) whose resulting code, when run, objectively fails at the task by producing incorrect output for the inputs provided.
Extra points if the error is reproducible by others (perhaps it was generated with low temperature), is not one even a child would make, or is particularly hilarious.
-
3 – Welcome to Code Golf, and interesting first question! However, it seems very difficult to arbitrate, since what counts as "obvious" (or, indeed, what counts as a "falsehood") is up for debate. – 97.100.97.109, Sep 15, 2023 at 17:35
-
1 – Are we allowed to abuse the fact that it isn't trained on more up-to-date data? – noodle person, Sep 15, 2023 at 17:39
-
Ideally we'll score based on a good-faith effort to produce falsehoods on things the LLM would be expected to have no trouble with. I agree that there is some amount of arbitrariness. – Innuo, Sep 15, 2023 at 17:40
-
I think this needs a much, much clearer specification. – Jonathan Allan, Sep 15, 2023 at 18:28
-
4 – I think that makes this a popularity-contest: "A popularity contest must always include an objective validity criterion, which is a set of rules that regulate what every answer must comply with. Answers that do not comply are invalid and will be removed. A clear specification of the goal that must be achieved. Questions like 'do (this) the most creative way' should be avoided. Creativity should be the tool, not the goal." – Jonathan Allan, Sep 15, 2023 at 18:44
8 Answers
GPT3.5 model, 1 byte
Hexdump: 0x01
It doesn’t look like a blank message...
Edit:
This does not work with GPT-4o, which was released after this question was created.
12 characters (GPT-4)
Query: eerie sorted
GPT-4: Sorting the word "eerie" alphabetically results in "eeeri".
This one's quite inconsistent. The longer "Sort letters eerie", though, elicited a wrong answer 10/10 times, always with an extra r and sometimes an extra e, though the letters' ordering was correct.
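For reference, the correct answer is easy to check with a one-liner (Python sketch):

```python
# Sorting the letters of "eerie" alphabetically gives "eeeir",
# not the "eeeri" that GPT-4 claimed (the i sorts before the r).
word = "eerie"
print("".join(sorted(word)))  # eeeir
```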
-
Not sure this is something an "LLM would be expected to have no trouble with" (from comments not yet in the question text itself). – Jonathan Allan, Sep 15, 2023 at 18:29
-
@JonathanAllan It's interesting because I think this expectation for me comes from seeing ChatGPT messing up with letters, and knowing that it sees text in chunks. A couple of years ago I'd have guessed letters would be easy for LLMs compared to more abstract notions. – xnor, Sep 15, 2023 at 18:37
-
I gave "sort eerie" to GPT-4 and its first response was: 'The word "eerie" is already in its sorted form. If you sort the letters alphabetically, it remains "eerie." Would you like help with anything else?' – Sophie Swett, Sep 16, 2023 at 15:00
ChatGPT 3.5 - 5 bytes
Query: Vyxal

ChatGPT's reply about the language contains at least two falsehoods:

- APL and J aren't golfing languages
- APL and J weren't inspirations for Vyxal (I know this because I didn't take anything from J)
-
I would argue that APL and J are golfing languages. Although they are not meant for the recreational discipline, they are clearly designed to allow short code in terms of bytes. – Adamátor, Jun 26, 2025 at 11:59
9 characters (GPT-3.5)
[User] am i dead
[GPT-3.5] I'm just a computer program, so I can't determine your current state of being. If you're feeling uncertain or concerned about your health, I would strongly recommend seeking immediate medical attention or contacting a healthcare professional. They can provide you with the necessary assistance and guidance to address any health-related concerns.
One could argue this is clearly false, since if I am interacting with the model then I must be alive. (However, that is up for debate.)
-
(Note: this was the first output I received given this input – I didn't regenerate multiple times to find a contradictory result.) – 97.100.97.109, Sep 15, 2023 at 17:40
-
That's great. However, I am willing to give the LLM a pass because at least it seems to be saying "I don't know" rather than "yes". – Innuo, Sep 15, 2023 at 17:44
-
1 – Could probably include typos like "am I ded" or even "mI ded" – Jonathan Allan, Sep 15, 2023 at 18:30
-
@Innuo In general, answering "I don't know" is a good excuse (although to my knowledge, LLMs don't do that often), but in this example, or the closely related "Am I asleep?", answering "I don't know" is mostly wrong, as even a child would know. – quarague, Sep 16, 2023 at 7:19
-
1 – Nice one! IMO "I can't determine your current state" is definitely false, as it could infer that you're alive from the information provided. – Sam, Sep 18, 2023 at 7:46
GPT-3.5, 2 bytes
_?
I entered a little more than an underscore, so it's somewhat of a falsehood.
-
2 – This one is kinda lame – noodle person, Sep 16, 2023 at 2:33
-
4 – Golf be like that @noodleman – Sep 16, 2023 at 3:47
-
3 – Your message did include an underscore, so ChatGPT might be correct here... – The Empty String Photographer, Sep 16, 2023 at 9:33
-
2 – @TheEmptyStringPhotographer I interpret it as saying that it's calling the entirety of what I entered an underscore – Sep 16, 2023 at 10:15
-
@lyxal I suppose... – The Empty String Photographer, Sep 16, 2023 at 11:08
ChatGPT, 5 bytes
"pi 97"
The value of pi (π) to 97 decimal places is: 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679
This is pi to 100 decimal places, not 97.
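The miscount is easy to verify by measuring the decimal expansion ChatGPT returned (Python sketch):

```python
# The digits ChatGPT produced when asked for pi to 97 places:
pi_reply = ("3.14159265358979323846264338327950288419716939937510"
            "58209749445923078164062862089986280348253421170679")
# Count the digits after the decimal point.
print(len(pi_reply.split(".")[1]))  # 100, not the requested 97
```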
-
1 – Nice one! It reproduces on GPT-4. – xnor, Sep 15, 2023 at 18:39
-
3 – This is just mathematics, specifically counting. GPT can't count :p – Jonathan Allan, Sep 15, 2023 at 18:47
17 bytes (ChatGPT)
ChatGPT sucks at code golf.
"th shrtst js quine"
The code it posted doesn’t work because it doesn’t include the console.log(JSON.stringify()) bit, and is also not the shortest.
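For comparison, a valid quine must print its own source byte for byte. A classic short example of the technique (shown in Python rather than JS, purely for illustration):

```python
# A self-reproducing program: the string holds a template of the
# whole source, and %-formatting splices its own repr back in.
s='s=%r;print(s%%s)';print(s%s)
```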
-
Could you leave off "the" and/or misspell a few words to get it shorter? – Sep 15, 2023 at 18:00
-
I couldn't get it to actually try to golf it without "the"; it would just show "one of the shortest". – noodle person, Sep 15, 2023 at 18:01
-
\$\begingroup\$ "shortest quine js" worked for me first time \$\endgroup\$2023年09月15日 18:09:18 +00:00Commented Sep 15, 2023 at 18:09
-
@RydwolfPrograms Couldn't get it to say "the shortest js quine is...", only "this is a short js quine...". – noodle person, Sep 15, 2023 at 18:14
-
Same, but the ones it gives me aren't valid quines or even valid JS – Sep 15, 2023 at 19:47
44 Characters (ChatGPT 3.5)
User: "banana" sorted reverse lexicographically is
ChatGPT: When the word "banana" is sorted in reverse lexicographical order (from Z to A), it becomes "nnaaab."
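The correct reverse lexicographic sort places the lone b between the n's and the a's, not at the end (Python sketch):

```python
# Reverse lexicographic sort of the letters of "banana":
# n > b > a, so the result is "nnbaaa", not ChatGPT's "nnaaab".
print("".join(sorted("banana", reverse=True)))  # nnbaaa
```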
-
2 – Change "lexicographically" to "alphabetically" to golf. – noodle person, Sep 15, 2023 at 17:59