-
Notifications
You must be signed in to change notification settings - Fork 311
introduce iterable schema
#1792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
CodSpeed Performance Report
Merging #1792 will not alter performance
Comparing dh/iterable-schema (c45e43c) with main (0cd11fe)
Summary
✅ 163 untouched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Python, an iterable is an object that has an __iter__() method and presumably can produce iterators multiple times from it. What if users want the iterable to just be validated that we can grab an iterator from it (that is what input.validate_iter() is doing), and leave it as is? I guess the issue is that we can't validate the type of the values this way?
Or maybe my point is that as it stands now, we support ABCs to try to preserve the input type to our best knowledge. If someone uses Sequence[int] for example, (1, 2, 3) needs to be preserved as (1, 2, 3), and [1, 2, 3] needs to be preserved as [1, 2, 3].
With Iterable[int] in non lazy/eager mode, some_int_iterable is collected as a list, which might be surprising. Then what would be the benefit of using Iterable[int] over list[int], which also allows iterables to be validated.
I guess the issue is that we can't validate the type of the values this way?
I guess exactly this, yes. Validation may in general lead to coercions (e.g. string '1' to integer 1) so Iterable[int] might need some work done.
Then what would be the benefit of using Iterable[int] over list[int], which also allows iterables to be validated.
A great question. Perhaps you're right, and the answer is that Iterable[int] we should treat exactly like Sequence[int], including the way we attempt to reconstruct the original type.
For Iterator and Generator, we could change the behaviour to be more like Callable where we can't ever validate the actual contents. And we could expose ValidatorIterator and/or ValidatorGenerator which users could use to opt-in to the lazy one-use behaviour.
So... seems like this needs more design?
It feels to me that several users reported issues with Iterable (or upvoted such issues) because of three reasons:
-
users using external types (not meant for Pydantic in the first place), that use
Iterableas an annotation. This is what happens in Accessing aTypedDictfield hasValidationIteratorinstead of the original value pydantic#9467 (7 👍): users want to validate some type from the OpenAI SDK, naturally use a list for this type and are surprised to see that aValidatorIteratoris actually used:from openai.types.chat import ChatCompletionAssistantMessageParam from pydantic import BaseModel class MyModel(BaseModel): history: list[ChatCompletionAssistantMessageParam] history = [ { "content": None, "role": "assistant", "tool_calls": [ { "id": "id", "function": { "arguments": '{"location":"Tokyo, Japan"}', "name": "GetCurrentWeather", }, "type": "function", } ], }, ] my_model = MyModel(history=history) print(my_model.history) """ [{'role': 'assistant', 'content': None, 'tool_calls': ValidatorIterator(index=0, schema=...)}] """
While confusing, there isn't much we can do.
ChatCompletionAssistantMessageParamusesIterablebecause it is a type that isn't related to any Pydantic validation process, and as such they probably want to be as loose as possible for static type checkers.Even if we introduced a config setting/annotation to eagerly evaluate the iterable, the type isn't "owned" by end users and so they can't apply such config/annotation on it (and unfortunately a lot of OpenAI types are using
Iterable). -
Users that mistakenly think that they should use the most broad type/protocol to match as many types as possible, as reported in Attributes declared as iterables are replaced in the instances by
pydantic-coreValidatorIteratorinstance pydantic#9541 (12 👍) (also Validation ofIterable[T]might want revisiting in V3 pydantic#9266 (comment) ). I think we should at least recommend on both these issues that they should just use concrete types. Yes, this breaks static type checking, but this is a general Pydantic issues with type coercion. -
Users that purposely use
Iterableto provide types that implement__iter__(). It is unclear to me still if they expect to be able to fetch iterators from them multiple times (by repeatediter()calls), in which case we should just try to validate that the type is indeed iterable?
Change Summary
Adds
iterable_schema, which is intended to solve my proposal in pydantic/pydantic#9541 (comment)pydanticshould update all existing uses ofgenerator_schematoiterable_schema, which allows forlazy = Falseas a field-level setting. We should probably also have a config setting calledlazy_iterablesor similar, (TODO).If we want to allow support for
IteratorandGeneratortypes inpydantic, those can usegenerator_schema.Related issue number
pydantic/pydantic#9541
Checklist
pydantic-core(except for expected changes)