-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Proposal for Extending schema.ts in Drizzle ORM with Runtime Middlewares
#1513
-
Hello Drizzle Maintainers and Community,
I am excited to propose an enhancement to the schema.ts in Drizzle ORM, aiming to integrate runtime validations using libraries like Zod or Joi. This addition is poised to bring a new level of data integrity and flexibility to Drizzle ORM.
Currently, schema.ts is structured as follows:
export const mySchemaUsers = mySchema.table('users', { id: serial('id').primaryKey(), name: text('name'), });
While effective in defining the table structure, it lacks capabilities for in-depth data validation. Here's how I envision enhancing this:
-
Column-Level Validation:
This approach allows each column to have bespoke validation logic, offering precise control over the constraints of each field.Example:
import { z } from 'zod'; export const mySchemaUsers = mySchema.table('users', { id: serial('id').primaryKey().validation(z.string().default(() => uuid())), name: text('name').validation(z.string().min(4).max(150)), });
-
Table-Level Validation:
This holistic approach enables validation of the entire data structure, accommodating complex inter-field validations and business rules.Example:
import { z } from 'zod'; const userSchema = z.object({ id: z.string(), name: z.string().min(4).max(150) }); export const mySchemaUsers = mySchema.table('users', { id: serial('id').primaryKey(), name: text('name'), }).validation(userSchema);
Advantages of This Approach:
-
Enhanced Validation Beyond DB Capabilities: This feature surpasses the typical database-level validations, enabling more sophisticated checks like string lengths, array validations, and other complex data integrity checks not typically available at the database level.
-
Robust Defaults: With libraries like Zod, we can define strong defaults, such as using nanoid, uuid, or even namespaced IDs (e.g.,
post_abcd12345), providing more flexibility and robustness in data handling. -
Shared Validation Logic with Frontend: The proposed validation functions can be exported and reused in the frontend. This consistency ensures that both backend and frontend validations are aligned, reducing redundancy and enhancing overall application integrity.
I believe these enhancements will significantly benefit Drizzle ORM users by offering more powerful and flexible validation mechanisms, ultimately leading to more robust and reliable applications.
I'm eager to hear the community's thoughts on this proposal and am ready to actively contribute towards the implementation of this feature. Your feedback will be invaluable in shaping this potential addition to Drizzle ORM.
Thank you for considering this suggestion, and I look forward to a fruitful discussion.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 13 -
👎 2
Replies: 17 comments 19 replies
-
Thank you for taking the time to write this.
The following opinions are my own, and mine only.
I personally think that the core drizzle-orm package should stay dependency-free. That is one of the selling point that brought me to it.
The opt-in architecture of plugins like drizzle-zod, drizzle-kit, the driver I use, etc. is just top of the line. That means I'm in control of what I bring in my project, totally in control. That allows the users to decide if they want zod, valibot, typebox or their own validator library.
I think the approach we should follow is improve the experience of the plug-ins we have.
For example, what could be missing in drizzle-zod that made you suggest this feature?
Beta Was this translation helpful? Give feedback.
All reactions
-
I agree with you about keeping drizzle dependency-free, and I think that it can still be achieved with the API example I provided.
The .validation() functions could just take in any function. All they do is pass the column or row to whatever validator function the user passes. It could be totally up to the user to install Zod, Joi, or anything else.
What I miss from drizzle-zod is exactly this, the possibility of having "in-client" validations by default.
If you think about it, which better place for the users to describe their data in detail if not the schema.ts file? It would be awesome to describe your schema in very granular details in a single file. Drizzle could then split these validations between DB (for whatever is supported) and runtime (e.g. string lengths, email validations, etc...). All of it becomes invisible to the user that just interacts with the client.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 4 -
❤️ 2
-
Your clarification just changed my mind! I think this is a great idea!
A couple of questions though:
- if the user passes validation to both the table and the columns, what should be the behavior, order of execution?
- Should the user expect the validation to be run on insert, update or both?
- I saw that you included:
id: serial('id').primaryKey().validation(z.string().default(() => uuid())),. I believe that will create that key in the returned object. Should that key supersede the possible implemented.default()or.$default()values? - In general, should drizzle just run the validation passed and insert/update the returned object?
- Something to note here is that
z.string().default(() => uuid())doesn't return a validation function but an object with aparsemethod. We should make sure we are clear on what we accept.
Having said this, I think we should convert this discussion to a feature request. I think it is a nice idea.
Beta Was this translation helpful? Give feedback.
All reactions
-
❤️ 2
-
Thank you for your thoughtful questions! Take my answers with a pinch of salt because I am no Drizzle expert at all, but here is my feeling about them:
-
Behavior and Order of Execution for Table and Column Validations:
If both table and column validations are provided, the most logical approach would be to execute column-level validations first and then proceed to table-level validations. This is because column-level validations are more granular, focusing on individual fields, whereas table-level validations can involve more complex logic potentially spanning multiple fields or even external data.Example: Stocks Table
-
Column-Level Validation: For a
circulatingSupplycolumn in a Stocks table, we might have a validation ensuring it's a non-negative number:circulatingSupply: number('circulatingSupply').validation(z.number().nonnegative()),
-
Table-Level Validation: The table-level validation might ensure that the
circulatingSupplyis equal to the sum of alluserBalancesfor that stock, requiring an asynchronous operation:.validation(async (tableData) => { const totalUserBalances = await calculateTotalUserBalancesForStock(tableData.id); if (tableData.circulatingSupply !== totalUserBalances) { throw new Error("Circulating supply must equal the sum of all user balances."); } });
This example demonstrates how column-level validations handle individual field constraints, while table-level validations can enforce complex business rules involving multiple data points.
-
-
Validation on Insert and Update:
The validation should ideally be implemented for both insert and update operations to maintain data integrity throughout the data lifecycle. This ensures that any new or modified data adheres to the defined schema and business logic. -
Default Values and Validation:
In the case ofid: serial('id').primaryKey().validation(z.string().default(() => uuid())),, the default value provided by the validation should take precedence. This approach allows for a consistent behavior, where the validation library manages default values and data integrity. -
Behavior with Validation Results:
Drizzle should execute the validations as specified and use the validated object for insert/update operations. This process ensures that all data interactions with the database conform to the defined validation rules, upholding data quality. -
Handling Validation Functions and Objects:
For validation libraries like Zod, which return objects with aparsemethod, Drizzle should be designed to recognize and utilize these objects correctly. Clear documentation specifying the expected format of validation functions or objects will aid developers in implementing validations correctly.
I want to reiterate that this is a quick response and not much thought has gone into it. Many edge cases might have been missed :)
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 3
-
Totally agree, lacking an event hook system is the only reason holding me back from swithing to drizzle. Both TypeORM and Prisma can do this job. Our system heavily depends on it. But with Drizzle it's not quite possible to achive it without using some ugly hacks
Beta Was this translation helpful? Give feedback.
All reactions
-
Hello again,
After further reflection on the proposed enhancements for Drizzle ORM, I've developed an additional perspective that might further expand our capabilities: replacing the .validation() method with a more versatile .use() middleware approach.
This concept of middleware aligns more closely with the pattern seen in many modern web frameworks. It opens up a plethora of possibilities, going beyond mere validation. Here's how it could look:
export const mySchemaUsers = mySchema.table('users', { id: serial('id').primaryKey().use(myIdMiddleware), name: text('name').use(myNameMiddleware), });
Expanded Capabilities with Middleware:
-
Beyond Validation:
While maintaining the ability to validate data, middleware can perform various other functions. For example, it can transform data before it's saved or after it's retrieved from the database. -
Data Type Transformations:
Middleware can be extremely useful for data type conversions, such as transforming BigInt to number, or vice versa, ensuring seamless handling of different data types. -
JSON Field Handling:
Another application could be the conversion and validation of JSON DB fields into actual JavaScript objects, streamlining the process of working with complex data structures. -
Custom Business Logic:
It can facilitate the implementation of custom business logic at the schema level, like adding timestamps, modifying data based on specific conditions, or implementing complex inter-field logic. -
Plug-and-Play Extensions:
This approach also allows for the easy integration of third-party libraries or custom functions as middleware, enhancing the ORM's flexibility and capability.
By conceptualizing these functions as middleware, we not only retain the initial idea of enhanced validation but also open the door to a more powerful and flexible data-handling paradigm within Drizzle ORM.
I'm curious to know what the community and maintainers think about this expanded idea. Does this middleware approach align with the goals and architecture of Drizzle ORM?
Looking forward to hearing your thoughts and suggestions!
Beta Was this translation helpful? Give feedback.
All reactions
-
❤️ 22
-
I really love the rollback idea! 👏🏻
I think it can solve the atomicity problem I mentioned before
Beta Was this translation helpful? Give feedback.
All reactions
-
❤️ 1
-
@valerioleo @CanRau we should compose a real world business example, it is vital to validate api designs on production data
Beta Was this translation helpful? Give feedback.
All reactions
-
@AlexBlokh I didn't understand if it was a request to me or @CanRau -- if it was, I'm very happy to provide anything that can help.
Beta Was this translation helpful? Give feedback.
All reactions
-
Hey sorry, notification got a bit lost, any suggestion?
Different api design, but I gave more usage examples over at #1426
Middleware would make it pretty easy to enforce things in-app, but like on the connection level, no-one could just go around them without creating a new connection.
Like restricting certain fields from being fetched at all or returned, e.g. we have an internal only id and an additional public_id on some tables, id should usually not leak into the client, examples in the discussion linked #1051 (comment)
I have polymorphic relations, as Planetscale doesn't even support foreign keys so that's not important to me, and instead of having to define on each query what the table relates to I could define this in the middleware, for this specific case there might be something else coming like mentioned here #1051 (comment) ? Though middleware might even make the other api unnecessary 🤷🏼
And I'd love to have soft-deletes which I currently have to manually do everywhere and remember, instead of being able to call .delete and have the middleware turn that into an .update with status changed and deleted_at date set on tables where it's needed.
And as @valerioleo mentioned being able to enforce additional validation/normalization on data before inserting or updating would be very nice, for some columns I store a separate column_raw which is mostly untouched whereas column is normalized to our needs, this could also happen in a middleware instead of having to do this manually everywhere
On more specific example is domain & domain_hash where the latter could be handled via middleware.
There's so much stuff going on that it's just such a nice place to have all this kinda along the schema but on the app level, uh a lot of this is also due to the fact that Vitess (Planetscale) doesn't support stored procedures, at least currently, so I couldn't put most of that logic directly into the db, that's why orm level, right next to the schema is a fantastic alternative which can even be more powerful as I have all of Javascript available 🔥
Not sure this is exactly what you've asked for though hope it helps, just let me know 🙏🏼
Beta Was this translation helpful? Give feedback.
All reactions
-
❤️ 3
-
Also forgot to mention even tho it's in #1426 is having one centralized place to sort subqueries would be amazing, I mean specifically I have subqueries within the select using JSON_ARRAYAGG & JSON_OBJECT but MySQL doesn't "respect" order by within those subqueries, so maybe I could have an orderBy on the subquery and in the .use middleware "enforce" it via javascript 🤔
Beta Was this translation helpful? Give feedback.
All reactions
-
The functionality that developers can extend their drizzle client like prisma client extension.
It would improve the developer experience way better.
Scenarios that I have faced:
- Hashing the password value before saving it into a database.
- Adding a new property to query results at runtime and type-level.
e.g. to use casl ability it requires to__caslSubjectType__property. - Ability to implement soft delete feature like https://github.com/olivierwilkinson/prisma-soft-delete-middleware/blob/main/src/lib/createSoftDeleteMiddleware.ts
Sorry for spamming.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 6 -
❤️ 4
-
Just sharing:
$onUpdatehook added at https://github.com/drizzle-team/drizzle-orm/releases/tag/0.30.5
Beta Was this translation helpful? Give feedback.
All reactions
-
❤️ 1
-
I've been thinking about this. There is a lot to consider. I think the middleware should be at the table level only to avoid any race condition/complications around the order of operations. To be able to make it useful enough, use should accept and object with the following properties:
const usersMiddleware = { beforeSelect: ({table: CurrentTable, selectedFields: SelectFields, where: SQL, logger: CustomLogger}) => ({ SelectedFields, where }), afterSelect: ({result: DependsOnTheSelect[], logger: CustomLogger }) => CustomReturn, beforeInsert: ({ table: CurrentTable, insertedFields: Partial<InsertedFields>, logger: CustomLogger }) => ({ insertedFields, returnedObj }), afterInsert: ({result: ResultTypeFromTheDriver, returnedObject: WhateverTheUserWants, logger: CustomLogger}) => unknown, beforeDelete: ({table: CurrentTable, where: SQL, logger: CustomLoger}) => boolean | ((db: Database) => unknown), afterDelete: ({result: ResultTypeFromTheDriver, logger: CustomLogger}) => unknown, beforeUpdate: ({ table: CurrentTable, insertedFields: InsertedFields, logger: CustomLogger }) => InsertedFields, afterUpdate: ({result: ResultTypeFromTheDriver, logger: CustomLogger}) => unknown, }
Not sure if logger is a good fit here, or the user should have it in the scope. As you might see, the names are very descriptive and make the intent very easy to guess. This avoid the if-else nightmare of an event-like api like on, where you have to do something different depending on the type of event. Even though this is basically the same thing, you can just subscribe to whichever you like and make sure you'll get what you are suppose to get in the callback.
Here is an annotated example of what this could allow:
const users = pgTable('users', { id: serial('id').primaryKey(), tenantId: bigint('tenant_id').notNull(), name: varchar('name', { length: 150 }).notNull(), password: varchar('password', { length: 300 }).notNull(), lastName: varchar('last_name', { length: 150 }).notNull(), fullName: varchar('full_name', { length: 300 }).notNull(), createdAt: timestamp('created_at').notNull().defaultNow(), updatedAt: timestamp('updated_at').notNull().defaultNow(), deletedAt: timestamp('deleted_at'), randomId: varchar('random_id').notNull() }).$use({ beforeSelect({table, where, selectedFields }) { // Here we make sure that if the query is db.select().from(users), the password won't come out. // But if we do db.select({ password: users.password }).from(users), it would work. if (!selectedFields) { const { password, rest: selectedFields } = getTableColumns(table); } // You could implement multi tenant applications like this return { selectedFields, where: and(where, eq(table.tenantId, getTenantId()), isNull(table.deletedAt)) }; }, beforeInsert({table, insertedFields, logger}) { // Here you can modify the data, validate. The sky is the limit. if (Array.isArray(insertedFields) { insertedFields = insertedFields.map((field) => userSchema.parse({ ...field, randomId: generateRandomId(), fullName: `${field.name} ${field.lastName}` }) } else { insertedFields.randomId = generateRandomId(); insertedFields.fullName = `${insertedFields.name} ${insertedFields.lastName}` insertedFields = userSchema.parse(insertedFields); } // Here we can define a custom object to return after the insert: return { insertedFields, generatedRandomId: Array.isArray(insertedFields) ? insertedFields.map(field => field.randomId) : insertedFields.randomId } }, afterInsert({ result, returnedObject, logger }) { logger("inserted random id(s): ", returnedObject.generatedRandomId) return { result, randomId: returnedObject.generatedRandomId } }, beforeDelete({table, where }) { // Here we can do soft deletes: return async (db) => await db.update(users).set({ deletedAt: new Date() }).where(and(where, eq(table.tenantId, getTenantId())) } }
This are just the examples I came up with and that I see people asking about.
Notice how this would basically deprecate $default() and the possible $onUpdateFn that's in the works.
Having $use defined this way would also help in separating the code that's supposed to be sql code vs the code that's supposed to only run in javascript.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 14
-
A lot of question are still in the air. What about joins? what about subqueries, CTEs, etc.
In a join you are selecting the table, so should it trigger the beforeSelect?
Same for subqueries and CTEs.
Another question is, what about RQB?
Beta Was this translation helpful? Give feedback.
All reactions
-
👀 4
-
I hope middleware functions to be asynchronous.
const usersMiddleware = {
beforeSelect: ({table: CurrentTable, selectedFields: SelectFields, where: SQL, logger: CustomLogger}) => Promise<({ SelectedFields, where })>,
afterSelect: ({result: DependsOnTheSelect[], logger: CustomLogger }) => Promise<CustomReturn>,
beforeInsert: ({ table: CurrentTable, insertedFields: Partial<InsertedFields>, logger: CustomLogger }) => Promise<({ insertedFields, returnedObj })>,
afterInsert: ({result: ResultTypeFromTheDriver, returnedObject: WhateverTheUserWants, logger: CustomLogger}) => Promise<unknown>,
beforeDelete: ({table: CurrentTable, where: SQL, logger: CustomLoger}) => Promise<boolean | ((db: Database) => unknown)>,
afterDelete: ({result: ResultTypeFromTheDriver, logger: CustomLogger}) => Promise<unknown>,
beforeUpdate: ({ table: CurrentTable, insertedFields: InsertedFields, logger: CustomLogger }) => Promise<InsertedFields>,
afterUpdate: ({result: ResultTypeFromTheDriver, logger: CustomLogger}) => Promise<unknown>,
}
one of my use case is to encrypt/decrypt some columns with AWS KMS, which provides async methods to encrypt/decrypt data.
otherwise, supporting async functions for customType can be another option for this use case.
const encrypted = customType<{
data: string;
driverData: string;
}>({
dataType() {
return 'varchar(256)',
},
async fromDriver(value) {
return await decrypt(value);
},
async toDriver(value) {
return await encrypt(value);
}
})
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 2
-
I think an implementation like django-signals would be a good fit here.
We can setup signals in schema like below with possibility to add user defined custom signals.
const users = pgTable('users', { id: integer("id"](id: serial('id').primaryKey() }).signals( preSave: async (row) => doPreSaveValidation(row), postSave: async () => runPostSaveCallback, preDelete: async () => runSomeValidation, postDelete: async () => someCallBack, )
Or may be custom signals with the ability to set when a callback runs and whether it is a returning callback or not.
A returning callback for example returns validated data, a non returning callback just runs a callback but returns the actual data returned by sql query.
const users = pgTable('users', { id: integer("id"](id: serial('id').primaryKey() }).signals({ parse: async ({ row }) => zodSchema.parse(row), validate: async ({ row }) => validation(row), notify: async ({ table, row }) => sendNotification(table, row), warn: async ({ row} ) => sendWarning(row) ]) .on([‘insert’, ‘update’], users.signals.validate) .on([‘insert’, ‘update’], users.signals.notify) .on([‘select’], users.signals.parse(returning=true)) .on([‘delete’], users.signals.warn) const sessions = pgTable('users', { id: integer("id"](id: serial('id').primaryKey() userId: integer("user_id").references(() => users.id) }).signals({ // users table will be the sender invalidate: async ({ sender }) => invalidateAllSessions(sender) })
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 3
-
Middleware is a must have in my opinion. It's actually preventing me from migrating a current project to drizzle. I think Sequelize offers a strong solution.
Some important aspects for me:
- Hooks at global/table level
- Ability to pass arguments when calling a query (ex. Passing in user detail to add detail to createdby column)
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 4
-
Same here. We currently use Prisma client extensions for the following use-cases:
- Table, row and field access policies
- Extended permission checks
- Search-cache indexing
- Performance logging
- Soft-delete and filtering
- Extended validation
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 7
-
Same, I currently use prisma's middleware to time & count queries. This is the one thing holding me back from migrating
Beta Was this translation helpful? Give feedback.
All reactions
-
For anyone needing to plug into Drizzle's lifecycle, check this gist. It's not perfect but it's a good workaround until this is officially supported.
Beta Was this translation helpful? Give feedback.
All reactions
-
❤️ 4
-
Is there an implementation plan? I really need this feature.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 15 -
👎 1 -
❤️ 12 -
👀 10
-
I need it too, it's the only reason which stops me from migrating my projects from TypeORM/Prisma
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1
-
We currently wrap drizzle in a base repository https://gist.github.com/cayter/49d5c256a885d90c399ca6c1eca19f51 which it will trigger the before and after hooks. Note that we only made it working with postgres.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1
-
No update 😕
Beta Was this translation helpful? Give feedback.
All reactions
-
I like that Drizzle takes its time for this, because it's very easy to get this one wrong.
And we can actually implement it on our own with a repository or proxy pattern (see above the gists).
I prefer the proxy pattern because I don't want the user (the dev) to create a new language that wraps Drizzle.
On Django, we have the same issue: the core team is slower to add things, particularly when they're not really core. Which is a good thing.
But in Django there is also a great ecosystem of third party packages, in particular, middlewares. Then the Django core team eventually decides to integrate these packages into Django based on how successful they are.
The sad thing though, is that THIS feature WOULD allows a package ecosystem to thrive ! haha!
So it's a bit like a chicken and egg problem.
WE CAN implement a middleware/proxy pattern on top of the existing DSL, but we're missing the primary hooks to do so.
I tried to think about what's the best way to hook into this and I ended up with the same thing as the solution above: https://gist.github.com/ikupenov/10bc89d92d92eaba8cc5569013e04069 so I invite you to look into it.
So the drizzle team could give us functions built on top of this proxy pattern and "let us play with it". Because right now we're all building our own proxy system, which doesn't really feel right to me (it's also a lot of nasty work and type checks)
If these hooks are built properly, this might foster an ecosystem as big as Vite on the DB side of things (BTW, the api could be very similar to what Vite offers, this would facilitate adoption).
I think a good start is to JUST give the function that's called AFTER the proxy, and the function simply passes exactly the context in which drizzle was called (all the db.insert. etc, etc... plus their arguments).
No after_save, no after_update, just something very low level that could allow these higher level hooks to be built, this way it's easier to get it right for the Drizzle team.
Implementing this is safer than coming up with the perfect "vite-like" api. Which is hard to get right.
Beta Was this translation helpful? Give feedback.
All reactions
-
There are some improvements in the proxy solution. It’s now a lot more flexible and hooks can easily be added. Bear in mind that if you go with this approach you’ll need to maintain it and there are some breaking changes coming with V1. Shouldn’t be too bad but worth keeping in mind.
That’s the updated gist: https://gist.github.com/ikupenov/26f3775821c05f17b6f8b7a037fb2c7a.
Beta Was this translation helpful? Give feedback.
All reactions
-
Is there any way i can implement live query feature with sqlite using drizzle or maybe just get table name passed to the SQL query in drizzle so that I can trigger render when changes are made to that tables and queries associated with that table
Beta Was this translation helpful? Give feedback.
All reactions
-
Has there been any updates to this topic?
I heard mentions of a middleware implementation?
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 37
-
Hi,What's the progress?
Beta Was this translation helpful? Give feedback.
All reactions
-
Hi,What's the progress?
Beta Was this translation helpful? Give feedback.