
January 25, 2024

The January GPT-4 Turbo is lazier than the last version

[Figure: benchmark results]

OpenAI just released a new version of GPT-4 Turbo. This new model is intended to reduce the "laziness" that has been widely observed with the previous gpt-4-1106-preview model:

Today, we are releasing an updated GPT-4 Turbo preview model, gpt-4-0125-preview. This model completes tasks like code generation more thoroughly than the previous preview model and is intended to reduce cases of "laziness" where the model doesn’t complete a task.

With that in mind, I’ve been benchmarking the new model using aider’s existing lazy coding benchmark.

Benchmark results

Overall, the new gpt-4-0125-preview model seems lazier than the November gpt-4-1106-preview model:

  • With the unified diffs code editing format, it gets worse benchmark scores than the November model.
  • With aider’s older SEARCH/REPLACE block editing format, the new January model outperforms the November model. However, it still scores lower than either model does with unified diffs.

This is one in a series of reports that use the aider benchmarking suite to assess and compare the code editing capabilities of OpenAI’s GPT models. You can review the other reports for additional information:
