- I will try to reopen this question with some edits.
Since advances in LLM technology are predicted to turn programming into a natural-language activity, perhaps we can code golf the LLMs themselves.
Produce the shortest English coding task for an LLM (one of the chat models or a coding agent) whose resulting code, when run, objectively fails at the task by producing incorrect output for the inputs provided.
Extra points if the error is reproducible by others (perhaps it was generated with low temperature), is not one even a child would make, or is particularly hilarious.
-
3 – Welcome to Code Golf, and interesting first question! However, it seems very difficult to arbitrate, since what counts as "obvious" (or, indeed, what counts as a "falsehood") is up for debate. – 97.100.97.109, Sep 15, 2023 at 17:35
-
1 – Are we allowed to abuse the fact that it isn't trained on more up-to-date data? – noodle person, Sep 15, 2023 at 17:39
-
Ideally we'll score based on a good-faith effort to produce falsehoods on things the LLM would be expected to have no trouble with. I agree that there is some amount of arbitrariness. – Innuo, Sep 15, 2023 at 17:40
-
I think this needs a much, much clearer specification. – Jonathan Allan, Sep 15, 2023 at 18:28
-
4 – I think that makes this a popularity-contest: "A popularity contest must always include an objective validity criterion, which is a set of rules that regulate what every answer must comply with. Answers that do not comply are invalid and will be removed. A clear specification of the goal that must be achieved. Questions like 'do (this) the most creative way' should be avoided. Creativity should be the tool, not the goal." – Jonathan Allan, Sep 15, 2023 at 18:44
8 Answers
GPT3.5 model, 1 byte
Hexdump: 0x01
It doesn’t look like a blank message...
Edit:
This does not work with GPT-4o, which was released after this question was created.
12 characters (GPT-4)
Query: eerie sorted
GPT-4: Sorting the word "eerie" alphabetically results in "eeeri".
This one's quite inconsistent. The longer "Sort letters eerie", though, elicited a wrong answer 10/10 times, always with an extra r and sometimes an extra e, though the letters' ordering was correct.
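For reference, the correct answer is easy to check with a one-liner (Python sketch):

```python
# Sorting the letters of "eerie" alphabetically gives "eeeir",
# not the "eeeri" that GPT-4 claimed (the i sorts before the r).
word = "eerie"
print("".join(sorted(word)))  # eeeir
```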
-
Not sure this is something an "LLM would be expected to have no trouble with" (from comments not yet in the question text itself). – Jonathan Allan, Sep 15, 2023 at 18:29
-
@JonathanAllan It's interesting because I think this expectation for me comes from seeing ChatGPT messing up with letters, and knowing that it sees text in chunks. A couple of years ago I'd have guessed letters would be easy for LLMs compared to more abstract notions. – xnor, Sep 15, 2023 at 18:37
-
I gave "sort eerie" to GPT-4 and its first response was: 'The word "eerie" is already in its sorted form. If you sort the letters alphabetically, it remains "eerie." Would you like help with anything else?' – Sophie Swett, Sep 16, 2023 at 15:00
ChatGPT 3.5 - 5 bytes
Query: Vyxal

ChatGPT's reply about the language contains at least two falsehoods:

- APL and J aren't golfing languages
- APL and J weren't inspirations for Vyxal (I know this because I didn't take anything from J)
-
I would argue that APL and J are golfing languages. Although they are not meant for the recreational discipline, they are clearly designed to allow short code in terms of bytes. – Adamátor, Jun 26, 2025 at 11:59
9 characters (GPT-3.5)
[User] am i dead
[GPT-3.5] I'm just a computer program, so I can't determine your current state of being. If you're feeling uncertain or concerned about your health, I would strongly recommend seeking immediate medical attention or contacting a healthcare professional. They can provide you with the necessary assistance and guidance to address any health-related concerns.
One could argue this is clearly false, since if I am interacting with the model then I must be alive. (However, that is up for debate.)
-
(Note: this was the first output I received given this input – I didn't regenerate multiple times to find a contradictory result.) – 97.100.97.109, Sep 15, 2023 at 17:40
-
That's great. However, I am willing to give the LLM a pass because at least it seems to be saying "I don't know" rather than "yes". – Innuo, Sep 15, 2023 at 17:44
-
1 – Could probably include typos like "am I ded" or even "mI ded" – Jonathan Allan, Sep 15, 2023 at 18:30
-
@Innuo In general, answering "I don't know" is a good excuse (although to my knowledge, LLMs don't do that often), but in this example, or the closely related "Am I asleep?", answering "I don't know" is mostly wrong, as even a child would know. – quarague, Sep 16, 2023 at 7:19
-
1 – Nice one! IMO "I can't determine your current state" is definitely false, as it could infer that you're alive from the information provided. – Sam, Sep 18, 2023 at 7:46
GPT-3.5, 2 bytes
_?
I entered a little more than an underscore, so it's somewhat of a falsehood.
-
2 – This one is kinda lame – noodle person, Sep 16, 2023 at 2:33
-
4 – Golf be like that @noodleman – Sep 16, 2023 at 3:47
-
3 – Your message did include an underscore, so ChatGPT might be correct here... – The Empty String Photographer, Sep 16, 2023 at 9:33
-
2 – @TheEmptyStringPhotographer I interpret it as saying that it's calling the entirety of what I entered an underscore – Sep 16, 2023 at 10:15
-
@lyxal I suppose... – The Empty String Photographer, Sep 16, 2023 at 11:08
ChatGPT, 5 bytes
"pi 97"
The value of pi (π) to 97 decimal places is: 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679
This is pi to 100 decimal places, not 97.
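The miscount is easy to verify by measuring the decimal expansion ChatGPT returned (Python sketch):

```python
# The digits ChatGPT produced when asked for pi to 97 places:
pi_reply = ("3.14159265358979323846264338327950288419716939937510"
            "58209749445923078164062862089986280348253421170679")
# Count the digits after the decimal point.
print(len(pi_reply.split(".")[1]))  # 100, not the requested 97
```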
-
1 – Nice one! It reproduces on GPT-4. – xnor, Sep 15, 2023 at 18:39
-
3 – This is just mathematics, specifically counting. GPT can't count :p – Jonathan Allan, Sep 15, 2023 at 18:47
17 bytes (ChatGPT)
ChatGPT sucks at code golf.
"th shrtst js quine"
The code it posted doesn’t work because it doesn’t include the console.log(JSON.stringify()) bit, and is also not the shortest.
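For comparison, a valid quine must print its own source byte for byte. A classic short example of the technique (shown in Python rather than JS, purely for illustration):

```python
# A self-reproducing program: the string holds a template of the
# whole source, and %-formatting splices its own repr back in.
s='s=%r;print(s%%s)';print(s%s)
```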
-
Could you leave off "the" and/or misspell a few words to get it shorter? – Sep 15, 2023 at 18:00
-
I couldn't get it to actually try to golf it without "the"; it would just show "one of the shortest". – noodle person, Sep 15, 2023 at 18:01
-
\$\begingroup\$ "shortest quine js" worked for me first time \$\endgroup\$2023年09月15日 18:09:18 +00:00Commented Sep 15, 2023 at 18:09
-
@RydwolfPrograms Couldn't get it to say "the shortest js quine is...", only "this is a short js quine...". – noodle person, Sep 15, 2023 at 18:14
-
Same, but the ones it gives me aren't valid quines or even valid JS – Sep 15, 2023 at 19:47
44 Characters (ChatGPT 3.5)
User: "banana" sorted reverse lexicographically is
ChatGPT: When the word "banana" is sorted in reverse lexicographical order (from Z to A), it becomes "nnaaab."
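The correct reverse lexicographic sort places the lone b between the n's and the a's, not at the end (Python sketch):

```python
# Reverse lexicographic sort of the letters of "banana":
# n > b > a, so the result is "nnbaaa", not ChatGPT's "nnaaab".
print("".join(sorted("banana", reverse=True)))  # nnbaaa
```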
-
2 – Change "lexicographically" to "alphabetically" to golf. – noodle person, Sep 15, 2023 at 17:59