Releases: simonw/llm
0.32a3
Driven by the needs of Datasette Agent's human-in-the-loop `ask_user()` feature, this release makes the following improvements to how tool calls work:
- Tool implementations can declare a parameter named `llm_tool_call` in order to be passed the `llm.ToolCall` object for the current invocation. This allows them to access the current `llm_tool_call.tool_call_id`. See Accessing the tool call from inside a tool. #1480
- Every tool call is now guaranteed a unique `tool_call_id` - providers that do not supply one get a synthesized `tc_`-prefixed ULID. #1481
- Tools can raise a `llm.PauseChain` exception to cleanly pause the tool chain, useful for things like waiting for human approval. The exception propagates to the caller with `.tool_call` and `.tool_results` (completed sibling results) attached, and no model call is made with a placeholder result. See Pausing a chain from inside a tool. #1482
- Failure semantics for concurrent tool execution: async sibling tool calls always run to completion before a pause or hook exception propagates. #1482
- Chains can now resume from a `messages=` history ending in unresolved tool calls: the calls are executed through the normal `before_call`/`after_call` machinery before the first model call, skipping any that already have results. The `execute_tool_calls()` method also accepts a new optional `tool_calls_list=` argument for executing an explicit list of `ToolCall` objects in place of the calls requested by the response. See Resuming a chain with pending tool calls. #1482
- Fixed a bug where the async tool executor silently dropped calls to tools not present in `tools=` - these now return `Error: tool "..." does not exist` results, matching the sync executor. #1483
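The parameter injection and pause behaviour described above can be pictured with a small stand-alone sketch. It does not import `llm`: `ToolCall`, `PauseChain`, `new_tool_call_id()` and `run_tool_calls()` are hypothetical stand-ins written for illustration, and a random hex string stands in for the ULID that the library synthesizes.

```python
import inspect
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    # Stand-in for llm.ToolCall; only the fields this sketch needs.
    name: str
    arguments: dict
    tool_call_id: Optional[str] = None


class PauseChain(Exception):
    # Stand-in for llm.PauseChain: carries the pausing call and sibling results.
    def __init__(self, message=""):
        super().__init__(message)
        self.tool_call = None
        self.tool_results = []


def new_tool_call_id():
    # Assumption: random hex stands in for the ULID the library generates.
    return "tc_" + uuid.uuid4().hex


def run_tool_calls(tools, tool_calls):
    """Run calls in order; on a pause, attach the results gathered so far."""
    results = []
    for call in tool_calls:
        if call.tool_call_id is None:
            call.tool_call_id = new_tool_call_id()
        fn = tools[call.name]
        kwargs = dict(call.arguments)
        # Pass the ToolCall only to tools that declare the parameter.
        if "llm_tool_call" in inspect.signature(fn).parameters:
            kwargs["llm_tool_call"] = call
        try:
            results.append((call.tool_call_id, fn(**kwargs)))
        except PauseChain as pause:
            pause.tool_call = call
            pause.tool_results = results
            raise
    return results


def add(a, b):
    return a + b


def ask_user(question, llm_tool_call):
    # A human-in-the-loop tool: note the call ID, then pause for approval.
    raise PauseChain(f"waiting on {llm_tool_call.tool_call_id}: {question}")


tools = {"add": add, "ask_user": ask_user}
calls = [
    ToolCall("add", {"a": 1, "b": 2}, tool_call_id="call_1"),
    ToolCall("ask_user", {"question": "Deploy?"}),
]
try:
    run_tool_calls(tools, calls)
    paused = None
except PauseChain as exc:
    paused = exc
```

This sketch runs calls sequentially and stops at the pause. The release notes describe a stronger guarantee for the async executor, where sibling calls run to completion before the exception propagates.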
0.32a2
Support for the OpenAI Responses API
Most reasoning-capable OpenAI models now use the /v1/responses endpoint instead of /v1/chat/completions. This enables interleaved reasoning across tool calls for GPT-5 class models. #1435
- New `Responses` and `AsyncResponses` model classes driving the OpenAI Responses API. The existing `Chat` and `AsyncChat` classes are unchanged so other plugins that import them keep working.
- The following models now use the Responses API by default: `o1`, `o3-mini`, `o3`, `o4-mini`, `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5.1`, `gpt-5.2`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`, `gpt-5.5` (and their pinned date variants).
- Use `-o chat_completions 1` to fall back to the older `/v1/chat/completions` code path for any of these models.
- Encrypted reasoning items are captured as `provider_metadata` on `ReasoningPart` objects and round-tripped back to OpenAI on subsequent turns.
- Reasoning summaries are now requested with `"summary": "auto"` so visible reasoning text is streamed back where the model produces it, unless `--hide-reasoning` or `hide_reasoning=` is set.
- This means OpenAI prompts run using `llm prompt` that return reasoning tokens will display those on standard error.
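As a rough picture of the routing rule in the list above, the following sketch maps a model ID and the `chat_completions` option to an endpoint path. `endpoint_for()` is a hypothetical helper, not part of the library, and the pinned date variants are left out because their exact IDs are not listed here.

```python
# Model IDs copied from the release notes; pinned date variants are omitted.
RESPONSES_API_MODELS = {
    "o1", "o3-mini", "o3", "o4-mini",
    "gpt-5", "gpt-5-mini", "gpt-5-nano",
    "gpt-5.1", "gpt-5.2",
    "gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano", "gpt-5.5",
}


def endpoint_for(model_id, chat_completions=False):
    """Hypothetical helper: choose the endpoint path for a model."""
    if model_id in RESPONSES_API_MODELS and not chat_completions:
        return "/v1/responses"
    # Either an unlisted model, or the user asked for the older code path.
    return "/v1/chat/completions"
```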
CLI
- New `llm -m model --options` flag to list the options supported by a given model. #1441
- The `-R/--no-reasoning` option has been renamed to `-R/--hide-reasoning`.
Python API
- New `hide_reasoning=True` keyword argument on `model.prompt()`, `conversation.prompt()`, `model.chain()`, `conversation.chain()`, and their async counterparts, exposed to model plugins as `prompt.hide_reasoning`. Model plugins can use this to decide if they should request visible reasoning summaries from their providers. #1442
- New `options=` dict keyword argument on `Model.prompt()`, `Conversation.prompt()`, `Response.reply()`, and their async equivalents, matching the pattern already used by `.chain()`. The previous `**kwargs` form continues to work for backwards compatibility but is no longer documented, and will be removed in the future. #1432
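One way to picture how an `options=` dict can coexist with the older `**kwargs` form is the sketch below. `resolve_options()` is a hypothetical helper, not the library's code, and rejecting an option supplied in both forms is an assumption made here for clarity.

```python
def resolve_options(options=None, **kwargs):
    """Hypothetical helper: fold legacy keyword options into options=."""
    merged = dict(options or {})
    # Assumption: the same option given in both forms is treated as an error.
    overlap = merged.keys() & kwargs.keys()
    if overlap:
        raise TypeError(f"option(s) given twice: {sorted(overlap)}")
    merged.update(kwargs)
    return merged


new_style = resolve_options(options={"temperature": 0.2})
old_style = resolve_options(temperature=0.2)
```

Both call styles produce the same dictionary, which is what lets the older form keep working while only `options=` is documented.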
Bug fixes
- `add_tool_call()` calls that were not also recorded as stream events are now correctly emitted as `ToolCallPart` objects when assembling response parts, so they survive serialization via `response.to_dict()`. #1433
0.32a1
0.32a0
This alpha introduces a major backwards-compatible refactor. Models can now be prompted with a list of messages, OpenAI Chat Completions style, and the response can now be iterated over as a sequence of mixed types of content, for example reasoning tokens mixed with text tokens mixed with tool calls.
For more background on this release take a look at the annotated release notes on my blog.
Prompt inputs and response outputs are now expressed as a list of Message objects, each containing typed Part objects (text, reasoning, tool calls, tool results, attachments).
The llm CLI tool can now display reasoning tokens while executing a prompt.
Plugin authors should read the expanded Advanced model plugins documentation, which now covers StreamEvent, consuming prompt.messages, and round-tripping opaque provider metadata such as Anthropic extended-thinking signatures and Gemini thoughtSignature values.
Structured messages and streaming events
- New `llm.Message` value type and constructor helpers `llm.user()`, `llm.assistant()`, `llm.system()`, and `llm.tool_message()` for building structured prompt inputs. The helpers accept strings, `Attachment` instances, or nested `Part` lists.
- New `messages=` keyword argument on `model.prompt()`, `conversation.prompt()`, `model.chain()`, `conversation.chain()`, and their async counterparts. The `prompt=`, `system=`, `attachments=`, and `tool_results=` keywords still work and synthesize into the same `Message` list internally.
- New `response.stream_events()` and `response.astream_events()` methods yielding typed `StreamEvent` objects (`type` is one of `"text"`, `"reasoning"`, `"tool_call_name"`, `"tool_call_args"`, `"tool_result"`, plus a `redacted=True` marker for opaque reasoning). Iterating against `response` directly continues to yield only text strings.
- New `response.messages()` method (async: `await response.messages()`) returning the assembled `list[Message]` produced by the model. Calling it forces execution if the response prompt has not yet been executed.
- New `response.reply(prompt=None, **kwargs)` method that continues the conversation from any `Response`, regardless of origin. When the previous response made tool calls and `tool_results=` was not passed, `reply()` automatically executes the pending tool calls and threads the results into the next turn. On async responses `reply()` is awaitable.
- New `response.to_dict()` and `Response.from_dict(data, *, model=None)` for JSON-safe serialization of a full conversation turn: model id, input chain, assembled output (including reasoning parts and provider metadata), options, and audit fields. Reasoning signatures and `thoughtSignature` values round-trip via `provider_metadata`, so multi-turn extended thinking works across process boundaries.
- New `llm/serialization.py` module exposing `MessageDict`, `PartDict`, `ResponseDict`, `PromptDict`, `UsageDict`, `AttachmentDict`, and the per-Part TypedDicts. Every `to_dict()`/`from_dict()` method is annotated with the matching TypedDict.
- `Response.prompt.messages` is now the canonical structured input across the entire conversation chain. `Conversation.prompt` and `AsyncConversation.prompt` pre-compute the full chain (prior input + prior output + new turn) before constructing the next `Prompt`, so `response.prompt.messages` is always exactly what the model was sent.
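The idea of messages made of typed parts that survive a JSON round trip, including opaque provider metadata, can be sketched without the library. The `Part` and `Message` classes below are simplified stand-ins, and their field names are assumptions rather than the real `llm.Message` definition.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Part:
    # Stand-in for a typed Part; type is e.g. "text" or "reasoning".
    type: str
    text: str = ""
    provider_metadata: dict = field(default_factory=dict)


@dataclass
class Message:
    # Stand-in for a message: a role plus an ordered list of parts.
    role: str
    parts: list

    def to_dict(self):
        return {"role": self.role, "parts": [asdict(p) for p in self.parts]}

    @classmethod
    def from_dict(cls, data):
        return cls(data["role"], [Part(**p) for p in data["parts"]])


original = Message("assistant", [
    Part("reasoning", "Compare both options.", {"signature": "opaque-blob"}),
    Part("text", "Option B is cheaper."),
])
# Serialize to a JSON string and back, as if crossing a process boundary.
restored = Message.from_dict(json.loads(json.dumps(original.to_dict())))
```

Keeping the metadata as a plain dictionary on the part is what allows a signature to be stored and sent back unchanged on a later turn.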
CLI
- `llm prompt` and `llm chat` now display visible reasoning text to stderr in a dim style while the response streams.
- New `-R/--no-reasoning` flag for `llm prompt` and `llm chat` to suppress the reasoning stream.
- `llm logs` now renders any visible reasoning emitted during a response under a `## Reasoning` heading above the response.
- New `reasoning` column on the `responses` table populated from the visible-reasoning text.
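The split between response text on standard output and dimmed reasoning on standard error can be illustrated with a short sketch. `StreamEvent` here is a stand-in whose `chunk` attribute is an assumption, `render()` is a hypothetical helper, and the ANSI dim escape code is one plausible way to get the dim style.

```python
import io
from dataclasses import dataclass

DIM, RESET = "\x1b[2m", "\x1b[0m"  # ANSI escape codes for dim text and reset


@dataclass
class StreamEvent:
    # Stand-in: the real StreamEvent may expose different attribute names.
    type: str
    chunk: str


def render(events, out, err, hide_reasoning=False):
    """Write text chunks to out and reasoning chunks, dimmed, to err."""
    for event in events:
        if event.type == "text":
            out.write(event.chunk)
        elif event.type == "reasoning" and not hide_reasoning:
            err.write(f"{DIM}{event.chunk}{RESET}")


events = [
    StreamEvent("reasoning", "Think: 2 + 2"),
    StreamEvent("text", "4"),
]
out, err = io.StringIO(), io.StringIO()
render(events, out, err)
```

In a real terminal `out` and `err` would be `sys.stdout` and `sys.stderr`, so piping the command's output captures only the answer.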
0.31
- New GPT-5.5 OpenAI model: `llm -m gpt-5.5`. #1418
- New option to set the text verbosity level for GPT-5+ OpenAI models: `-o verbosity low`. Values are `low`, `medium`, `high`.
- New option for setting the image detail level used for image attachments to OpenAI models: `-o image_detail low` - values are `low`, `high` and `auto`, and GPT-5.4 and 5.5 also accept `original`.
- Models listed in `extra-openai-models.yaml` are now also registered as asynchronous. #1395
0.30
- The register_models() plugin hook now takes an optional `model_aliases` parameter listing all of the models, async models and aliases that have been registered so far by other plugins. A plugin with `@hookimpl(trylast=True)` can use this to take previously registered models into account. #1389
- Added docstrings to public classes and methods and included those directly in the documentation.
0.29
0.28
- New OpenAI models: `gpt-5.1`, `gpt-5.1-chat-latest`, `gpt-5.2` and `gpt-5.2-chat-latest`. #1300, #1317
- LLM now requires Python 3.10 or higher. Python 3.14 is now covered by the tests.
- When fetching URLs as fragments using `llm -f URL`, the request now includes a custom user-agent header: `llm/VERSION (https://llm.datasette.io/)`. #1309
- Fixed a bug where fragments were not correctly registered with their source when using `llm chat`. Thanks, Giuseppe Rota. #1316
- Fixed some file descriptor leak warnings. Thanks, Eric Bloch. #1313
- Fixed a deprecation warning for `asyncio.iscoroutinefunction`.
- Type annotations for the OpenAI Chat, AsyncChat and Completion `execute()` methods. Thanks, Arjan Mossel. #1315
- The project now uses `uv` and dependency groups for development. See the updated contributing documentation. #1318
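The user-agent format mentioned above can be shown with the standard library alone. This is a sketch, not the library's own fetching code: `build_fragment_request()` is a hypothetical helper, the `VERSION` value is a placeholder, and the request is built but never sent.

```python
from urllib.request import Request

VERSION = "0.28"  # placeholder; the real value is the installed llm version


def build_fragment_request(url):
    """Build, but do not send, a request carrying the custom user-agent."""
    user_agent = f"llm/{VERSION} (https://llm.datasette.io/)"
    return Request(url, headers={"User-Agent": user_agent})


request = build_fragment_request("https://example.com/notes.txt")
```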
0.27.1
- `llm chat -t template` now correctly loads any tools that are included in that template. #1239
- Fixed a bug where `llm -m gpt5 -o reasoning_effort minimal --save gm` saved a template containing invalid YAML. #1237
- Fixed a bug where running `llm chat -t template` could cause prompts to be duplicated. #1240
- Less confusing error message if a requested toolbox class is unavailable. #1238
0.27
This release adds support for the new GPT-5 family of models from OpenAI. It also enhances tool calling in a number of ways, including allowing templates to bundle pre-configured tools.
New features
- New models: `gpt-5`, `gpt-5-mini` and `gpt-5-nano`. #1229
- LLM templates can now include a list of tools. These can be named tools from plugins or arbitrary Python function blocks, see Tools in templates. #1009
- Tools can now return attachments, for models that support features such as image input. #1014
- New methods on the `Toolbox` class: `.add_tool()`, `.prepare()` and `.prepare_async()`, described in Dynamic toolboxes. #1111
- New `model.conversation(before_call=x, after_call=y)` attributes for registering callback functions to run before and after tool calls. See tool debugging hooks for details. #1088
- Some model providers can serve different models from the same configured URL - llm-llama-server for example. Plugins for these providers can now record the resolved model ID of the model that was used to the LLM logs using the `response.set_resolved_model(model_id)` method. #1117
- Raising `llm.CancelToolCall` now only cancels the current tool call, passing an error back to the model and allowing it to continue. #1148
- New `-l/--latest` option for `llm logs -q searchterm` for searching logs ordered by date (most recent first) instead of the default relevance search. #1177
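The interaction between the before and after hooks and a cancelled tool call can be pictured with the following stand-alone sketch. It does not import `llm`: `CancelToolCall` and `execute()` are stand-ins, the hook argument shapes are assumptions, and the wording of the error result is invented for illustration.

```python
class CancelToolCall(Exception):
    # Stand-in for llm.CancelToolCall.
    pass


def execute(tools, calls, before_call=None, after_call=None):
    """Run (name, arguments) calls; a cancelled call becomes an error result."""
    results = []
    for name, arguments in calls:
        tool = tools[name]
        try:
            if before_call:
                before_call(tool, (name, arguments))
            output = str(tool(**arguments))
        except CancelToolCall as ex:
            # Only this call is cancelled; the error is returned as its result.
            output = f"Cancelled: {ex}"
        else:
            if after_call:
                after_call(tool, (name, arguments), output)
        results.append((name, output))
    return results


def delete_file(path):
    return f"deleted {path}"


def read_file(path):
    return f"contents of {path}"


def guard(tool, tool_call):
    # Hook: veto the destructive tool before it runs.
    if tool is delete_file:
        raise CancelToolCall("deletion not approved")


log = []
results = execute(
    {"delete_file": delete_file, "read_file": read_file},
    [("delete_file", {"path": "a.txt"}), ("read_file", {"path": "b.txt"})],
    before_call=guard,
    after_call=lambda tool, call, result: log.append(result),
)
```

The cancelled call still yields a result entry, so later calls proceed and the model can see why the first one did not run.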
Bug fixes and documentation
- The `register_embedding_models` hook is now documented. #1049
- Show visible stack trace for `llm templates show invalid-template-name`. #1053
- Handle invalid tool names more gracefully in `llm chat`. #1104
- Add a Tool plugins section to the plugin directory. #1110
- Error on `register(Klass)` if the passed class is not a subclass of `Toolbox`. #1114
- Add `-h` for `--help` for all `llm` CLI commands. #1134
- Add missing `dataclasses` to advanced model plugins docs. #1137
- Fixed a bug where `llm logs -T llm_version "version" --async` incorrectly recorded just one single log entry when it should have recorded two. #1150
- All extra OpenAI model keys in `extra-openai-models.yaml` are now documented. #1228