Proposal for Extending `schema.ts` in Drizzle ORM with Runtime Middlewares · drizzle-team/drizzle-orm · Discussion #1513

valerioleo
Nov 14, 2023

Hello Drizzle Maintainers and Community,

I am excited to propose an enhancement to the schema.ts in Drizzle ORM, aiming to integrate runtime validations using libraries like Zod or Joi. This addition is poised to bring a new level of data integrity and flexibility to Drizzle ORM.

Currently, schema.ts is structured as follows:

export const mySchemaUsers = mySchema.table('users', {
 id: serial('id').primaryKey(),
 name: text('name'),
});

While effective in defining the table structure, it lacks capabilities for in-depth data validation. Here's how I envision enhancing this:

Column-Level Validation:
This approach allows each column to have bespoke validation logic, offering precise control over the constraints of each field.

Example:

import { z } from 'zod';
export const mySchemaUsers = mySchema.table('users', {
 id: serial('id').primaryKey().validation(z.string().default(() => uuid())),
 name: text('name').validation(z.string().min(4).max(150)),
});

Table-Level Validation:
This holistic approach enables validation of the entire data structure, accommodating complex inter-field validations and business rules.

Example:

import { z } from 'zod';
const userSchema = z.object({
 id: z.string(),
 name: z.string().min(4).max(150)
});
export const mySchemaUsers = mySchema.table('users', {
 id: serial('id').primaryKey(),
 name: text('name'),
}).validation(userSchema);

Advantages of This Approach:

Enhanced Validation Beyond DB Capabilities: This feature surpasses the typical database-level validations, enabling more sophisticated checks like string lengths, array validations, and other complex data integrity checks not typically available at the database level.
Robust Defaults: With libraries like Zod, we can define strong defaults, such as using nanoid, uuid, or even namespaced IDs (e.g., post_abcd12345), providing more flexibility and robustness in data handling.
Shared Validation Logic with Frontend: The proposed validation functions can be exported and reused in the frontend. This consistency ensures that both backend and frontend validations are aligned, reducing redundancy and enhancing overall application integrity.

I believe these enhancements will significantly benefit Drizzle ORM users by offering more powerful and flexible validation mechanisms, ultimately leading to more robust and reliable applications.

I'm eager to hear the community's thoughts on this proposal and am ready to actively contribute towards the implementation of this feature. Your feedback will be invaluable in shaping this potential addition to Drizzle ORM.

Thank you for considering this suggestion, and I look forward to a fruitful discussion.

Replies: 17 comments 19 replies

Angelelz
Nov 14, 2023
Collaborator

Thank you for taking the time to write this.

The following opinions are my own, and mine only.

I personally think that the core drizzle-orm package should stay dependency-free. That is one of the selling point that brought me to it.
The opt-in architecture of plugins like drizzle-zod, drizzle-kit, the driver I use, etc. is just top of the line. That means I'm in control of what I bring in my project, totally in control. That allows the users to decide if they want zod, valibot, typebox or their own validator library.

I think the approach we should follow is improve the experience of the plug-ins we have.
For example, what could be missing in drizzle-zod that made you suggest this feature?

0 replies

valerioleo
Nov 14, 2023
Author

I agree with you about keeping drizzle dependency-free, and I think that it can still be achieved with the API example I provided.
The .validation() functions could just take in any function. All they do is pass the column or row to whatever validator function the user passes. It could be totally up to the user to install Zod, Joi, or anything else.

What I miss from drizzle-zod is exactly this, the possibility of having "in-client" validations by default.
If you think about it, which better place for the users to describe their data in detail if not the schema.ts file? It would be awesome to describe your schema in very granular details in a single file. Drizzle could then split these validations between DB (for whatever is supported) and runtime (e.g. string lengths, email validations, etc...). All of it becomes invisible to the user that just interacts with the client.

0 replies

Angelelz
Nov 14, 2023
Collaborator

Your clarification just changed my mind! I think this is a great idea!
A couple of questions though:

if the user passes validation to both the table and the columns, what should be the behavior, order of execution?
Should the user expect the validation to be run on insert, update or both?
I saw that you included: id: serial('id').primaryKey().validation(z.string().default(() => uuid())),. I believe that will create that key in the returned object. Should that key supersede the possible implemented .default() or .$default() values?
In general, should drizzle just run the validation passed and insert/update the returned object?
Something to note here is that z.string().default(() => uuid()) doesn't return a validation function but an object with a parse method. We should make sure we are clear on what we accept.

Having said this, I think we should convert this discussion to a feature request. I think it is a nice idea.

2 replies

@valerioleo

valerioleo Nov 14, 2023
Author

Thank you for your thoughtful questions! Take my answers with a pinch of salt because I am no Drizzle expert at all, but here is my feeling about them:

Behavior and Order of Execution for Table and Column Validations:
If both table and column validations are provided, the most logical approach would be to execute column-level validations first and then proceed to table-level validations. This is because column-level validations are more granular, focusing on individual fields, whereas table-level validations can involve more complex logic potentially spanning multiple fields or even external data.

Example: Stocks Table
- Column-Level Validation: For a circulatingSupply column in a Stocks table, we might have a validation ensuring it's a non-negative number:
```
circulatingSupply: number('circulatingSupply').validation(z.number().nonnegative()),
```
- Table-Level Validation: The table-level validation might ensure that the circulatingSupply is equal to the sum of all userBalances for that stock, requiring an asynchronous operation:
```
.validation(async (tableData) => {
 const totalUserBalances = await calculateTotalUserBalancesForStock(tableData.id);
 if (tableData.circulatingSupply !== totalUserBalances) {
 throw new Error("Circulating supply must equal the sum of all user balances.");
 }
});
```
This example demonstrates how column-level validations handle individual field constraints, while table-level validations can enforce complex business rules involving multiple data points.
Validation on Insert and Update:
The validation should ideally be implemented for both insert and update operations to maintain data integrity throughout the data lifecycle. This ensures that any new or modified data adheres to the defined schema and business logic.
Default Values and Validation:
In the case of id: serial('id').primaryKey().validation(z.string().default(() => uuid())),, the default value provided by the validation should take precedence. This approach allows for a consistent behavior, where the validation library manages default values and data integrity.
Behavior with Validation Results:
Drizzle should execute the validations as specified and use the validated object for insert/update operations. This process ensures that all data interactions with the database conform to the defined validation rules, upholding data quality.
Handling Validation Functions and Objects:
For validation libraries like Zod, which return objects with a parse method, Drizzle should be designed to recognize and utilize these objects correctly. Clear documentation specifying the expected format of validation functions or objects will aid developers in implementing validations correctly.

I want to reiterate that this is a quick response and not much thought has gone into it. Many edge cases might have been missed :)

@yf-hk

yf-hk Apr 12, 2024

Totally agree, lacking an event hook system is the only reason holding me back from swithing to drizzle. Both TypeORM and Prisma can do this job. Our system heavily depends on it. But with Drizzle it's not quite possible to achive it without using some ugly hacks

valerioleo
Nov 16, 2023
Author

Hello again,

After further reflection on the proposed enhancements for Drizzle ORM, I've developed an additional perspective that might further expand our capabilities: replacing the .validation() method with a more versatile .use() middleware approach.

This concept of middleware aligns more closely with the pattern seen in many modern web frameworks. It opens up a plethora of possibilities, going beyond mere validation. Here's how it could look:

export const mySchemaUsers = mySchema.table('users', {
 id: serial('id').primaryKey().use(myIdMiddleware),
 name: text('name').use(myNameMiddleware),
});

Expanded Capabilities with Middleware:

Beyond Validation:
While maintaining the ability to validate data, middleware can perform various other functions. For example, it can transform data before it's saved or after it's retrieved from the database.
Data Type Transformations:
Middleware can be extremely useful for data type conversions, such as transforming BigInt to number, or vice versa, ensuring seamless handling of different data types.
JSON Field Handling:
Another application could be the conversion and validation of JSON DB fields into actual JavaScript objects, streamlining the process of working with complex data structures.
Custom Business Logic:
It can facilitate the implementation of custom business logic at the schema level, like adding timestamps, modifying data based on specific conditions, or implementing complex inter-field logic.
Plug-and-Play Extensions:
This approach also allows for the easy integration of third-party libraries or custom functions as middleware, enhancing the ORM's flexibility and capability.

By conceptualizing these functions as middleware, we not only retain the initial idea of enhanced validation but also open the door to a more powerful and flexible data-handling paradigm within Drizzle ORM.

I'm curious to know what the community and maintainers think about this expanded idea. Does this middleware approach align with the goals and architecture of Drizzle ORM?

Looking forward to hearing your thoughts and suggestions!

10 replies

@valerioleo

valerioleo Nov 27, 2023
Author

I really love the rollback idea! 👏🏻
I think it can solve the atomicity problem I mentioned before

@AlexBlokh

AlexBlokh Nov 28, 2023
Maintainer

@valerioleo @CanRau we should compose a real world business example, it is vital to validate api designs on production data

@valerioleo

valerioleo Nov 28, 2023
Author

@AlexBlokh I didn't understand if it was a request to me or @CanRau -- if it was, I'm very happy to provide anything that can help.

@CanRau

CanRau Nov 29, 2023

Hey sorry, notification got a bit lost, any suggestion?

Different api design, but I gave more usage examples over at #1426

Middleware would make it pretty easy to enforce things in-app, but like on the connection level, no-one could just go around them without creating a new connection.

Like restricting certain fields from being fetched at all or returned, e.g. we have an internal only id and an additional public_id on some tables, id should usually not leak into the client, examples in the discussion linked #1051 (comment)

I have polymorphic relations, as Planetscale doesn't even support foreign keys so that's not important to me, and instead of having to define on each query what the table relates to I could define this in the middleware, for this specific case there might be something else coming like mentioned here #1051 (comment) ? Though middleware might even make the other api unnecessary 🤷🏼

And I'd love to have soft-deletes which I currently have to manually do everywhere and remember, instead of being able to call .delete and have the middleware turn that into an .update with status changed and deleted_at date set on tables where it's needed.

And as @valerioleo mentioned being able to enforce additional validation/normalization on data before inserting or updating would be very nice, for some columns I store a separate column_raw which is mostly untouched whereas column is normalized to our needs, this could also happen in a middleware instead of having to do this manually everywhere
On more specific example is domain & domain_hash where the latter could be handled via middleware.

There's so much stuff going on that it's just such a nice place to have all this kinda along the schema but on the app level, uh a lot of this is also due to the fact that Vitess (Planetscale) doesn't support stored procedures, at least currently, so I couldn't put most of that logic directly into the db, that's why orm level, right next to the schema is a fantastic alternative which can even be more powerful as I have all of Javascript available 🔥

Not sure this is exactly what you've asked for though hope it helps, just let me know 🙏🏼

@CanRau

CanRau Nov 29, 2023

Also forgot to mention even tho it's in #1426 is having one centralized place to sort subqueries would be amazing, I mean specifically I have subqueries within the select using JSON_ARRAYAGG & JSON_OBJECT but MySQL doesn't "respect" order by within those subqueries, so maybe I could have an orderBy on the subquery and in the .use middleware "enforce" it via javascript 🤔

2coo
Dec 20, 2023

The functionality that developers can extend their drizzle client like prisma client extension.
It would improve the developer experience way better.

Scenarios that I have faced:

Hashing the password value before saving it into a database.
Adding a new property to query results at runtime and type-level.
e.g. to use casl ability it requires to __caslSubjectType__ property.
Ability to implement soft delete feature like https://github.com/olivierwilkinson/prisma-soft-delete-middleware/blob/main/src/lib/createSoftDeleteMiddleware.ts

Sorry for spamming.

1 reply

@2coo

2coo Apr 16, 2024

Just sharing:

$onUpdate hook added at https://github.com/drizzle-team/drizzle-orm/releases/tag/0.30.5

Angelelz
Dec 22, 2023
Collaborator

I've been thinking about this. There is a lot to consider. I think the middleware should be at the table level only to avoid any race condition/complications around the order of operations. To be able to make it useful enough, use should accept and object with the following properties:

const usersMiddleware = {
 beforeSelect: ({table: CurrentTable, selectedFields: SelectFields, where: SQL, logger: CustomLogger}) => ({ SelectedFields, where }),
 afterSelect: ({result: DependsOnTheSelect[], logger: CustomLogger }) => CustomReturn,
 beforeInsert: ({ table: CurrentTable, insertedFields: Partial<InsertedFields>, logger: CustomLogger }) => ({ insertedFields, returnedObj }),
 afterInsert: ({result: ResultTypeFromTheDriver, returnedObject: WhateverTheUserWants, logger: CustomLogger}) => unknown,
 beforeDelete: ({table: CurrentTable, where: SQL, logger: CustomLoger}) => boolean | ((db: Database) => unknown),
 afterDelete: ({result: ResultTypeFromTheDriver, logger: CustomLogger}) => unknown,
 beforeUpdate: ({ table: CurrentTable, insertedFields: InsertedFields, logger: CustomLogger }) => InsertedFields,
 afterUpdate: ({result: ResultTypeFromTheDriver, logger: CustomLogger}) => unknown,
}

Not sure if logger is a good fit here, or the user should have it in the scope. As you might see, the names are very descriptive and make the intent very easy to guess. This avoid the if-else nightmare of an event-like api like on, where you have to do something different depending on the type of event. Even though this is basically the same thing, you can just subscribe to whichever you like and make sure you'll get what you are suppose to get in the callback.

Here is an annotated example of what this could allow:

const users = pgTable('users', {
 id: serial('id').primaryKey(),
 tenantId: bigint('tenant_id').notNull(),
 name: varchar('name', { length: 150 }).notNull(),
 password: varchar('password', { length: 300 }).notNull(),
 lastName: varchar('last_name', { length: 150 }).notNull(),
 fullName: varchar('full_name', { length: 300 }).notNull(),
 createdAt: timestamp('created_at').notNull().defaultNow(),
 updatedAt: timestamp('updated_at').notNull().defaultNow(),
 deletedAt: timestamp('deleted_at'),
 randomId: varchar('random_id').notNull()
}).$use({
 beforeSelect({table, where, selectedFields }) {
 // Here we make sure that if the query is db.select().from(users), the password won't come out.
 // But if we do db.select({ password: users.password }).from(users), it would work.
 if (!selectedFields) {
 const { password, rest: selectedFields } = getTableColumns(table);
 }
 // You could implement multi tenant applications like this
 return { selectedFields, where: and(where, eq(table.tenantId, getTenantId()), isNull(table.deletedAt)) };
 },
 beforeInsert({table, insertedFields, logger}) {
 // Here you can modify the data, validate. The sky is the limit.
 if (Array.isArray(insertedFields) {
 insertedFields = insertedFields.map((field) => userSchema.parse({
 ...field,
 randomId: generateRandomId(),
 fullName: `${field.name} ${field.lastName}`
 })
 } else {
 insertedFields.randomId = generateRandomId();
 insertedFields.fullName = `${insertedFields.name} ${insertedFields.lastName}`
 insertedFields = userSchema.parse(insertedFields);
 }
 // Here we can define a custom object to return after the insert:
 return {
 insertedFields,
 generatedRandomId: Array.isArray(insertedFields) ? insertedFields.map(field => field.randomId) : insertedFields.randomId
 }
 },
 afterInsert({ result, returnedObject, logger }) {
 logger("inserted random id(s): ", returnedObject.generatedRandomId)
 return { result, randomId: returnedObject.generatedRandomId }
 },
 beforeDelete({table, where }) {
 // Here we can do soft deletes:
 return async (db) => await db.update(users).set({ deletedAt: new Date() }).where(and(where, eq(table.tenantId, getTenantId()))
 }
}

This are just the examples I came up with and that I see people asking about.
Notice how this would basically deprecate $default() and the possible $onUpdateFn that's in the works.
Having $use defined this way would also help in separating the code that's supposed to be sql code vs the code that's supposed to only run in javascript.

3 replies

@Angelelz

Angelelz Dec 22, 2023
Collaborator

A lot of question are still in the air. What about joins? what about subqueries, CTEs, etc.
In a join you are selecting the table, so should it trigger the beforeSelect?
Same for subqueries and CTEs.
Another question is, what about RQB?

@Variel

Variel Jun 18, 2024

I hope middleware functions to be asynchronous.

const usersMiddleware = {
 beforeSelect: ({table: CurrentTable, selectedFields: SelectFields, where: SQL, logger: CustomLogger}) => Promise<({ SelectedFields, where })>,
 afterSelect: ({result: DependsOnTheSelect[], logger: CustomLogger }) => Promise<CustomReturn>,
 beforeInsert: ({ table: CurrentTable, insertedFields: Partial<InsertedFields>, logger: CustomLogger }) => Promise<({ insertedFields, returnedObj })>,
 afterInsert: ({result: ResultTypeFromTheDriver, returnedObject: WhateverTheUserWants, logger: CustomLogger}) => Promise<unknown>,
 beforeDelete: ({table: CurrentTable, where: SQL, logger: CustomLoger}) => Promise<boolean | ((db: Database) => unknown)>,
 afterDelete: ({result: ResultTypeFromTheDriver, logger: CustomLogger}) => Promise<unknown>,
 beforeUpdate: ({ table: CurrentTable, insertedFields: InsertedFields, logger: CustomLogger }) => Promise<InsertedFields>,
 afterUpdate: ({result: ResultTypeFromTheDriver, logger: CustomLogger}) => Promise<unknown>,
}

one of my use case is to encrypt/decrypt some columns with AWS KMS, which provides async methods to encrypt/decrypt data.
otherwise, supporting async functions for customType can be another option for this use case.

const encrypted = customType<{
 data: string;
 driverData: string;
}>({
 dataType() {
 return 'varchar(256)',
 },
 async fromDriver(value) {
 return await decrypt(value);
 },
 async toDriver(value) {
 return await encrypt(value);
 }
})

@ak4zh

ak4zh Jul 30, 2024

I think an implementation like django-signals would be a good fit here.
We can setup signals in schema like below with possibility to add user defined custom signals.

const users = pgTable('users', {
 id: integer("id"](id: serial('id').primaryKey()
}).signals(
 preSave: async (row) => doPreSaveValidation(row),
 postSave: async () => runPostSaveCallback,
 preDelete: async () => runSomeValidation,
 postDelete: async () => someCallBack,
)

Or may be custom signals with the ability to set when a callback runs and whether it is a returning callback or not.
A returning callback for example returns validated data, a non returning callback just runs a callback but returns the actual data returned by sql query.

const users = pgTable('users', {
 id: integer("id"](id: serial('id').primaryKey()
}).signals({
 parse: async ({ row }) => zodSchema.parse(row),
 validate: async ({ row }) => validation(row),
 notify: async ({ table, row }) => sendNotification(table, row),
 warn: async ({ row} ) => sendWarning(row)
])
.on([‘insert’, ‘update’], users.signals.validate)
.on([‘insert’, ‘update’], users.signals.notify)
.on([‘select’], users.signals.parse(returning=true))
.on([‘delete’], users.signals.warn)
const sessions = pgTable('users', {
 id: integer("id"](id: serial('id').primaryKey()
 userId: integer("user_id").references(() => users.id)
}).signals({
 // users table will be the sender 
 invalidate: async ({ sender }) => invalidateAllSessions(sender)
})

cbetori
Jan 13, 2024

Middleware is a must have in my opinion. It's actually preventing me from migrating a current project to drizzle. I think Sequelize offers a strong solution.

Some important aspects for me:

Hooks at global/table level
Ability to pass arguments when calling a query (ex. Passing in user detail to add detail to createdby column)

2 replies

@oktogmbh

oktogmbh Jan 30, 2024

Same here. We currently use Prisma client extensions for the following use-cases:

Table, row and field access policies
Extended permission checks
Search-cache indexing
Performance logging
Soft-delete and filtering
Extended validation

@mxz7

mxz7 Mar 5, 2024

Same, I currently use prisma's middleware to time & count queries. This is the one thing holding me back from migrating

ikupenov
Feb 29, 2024

For anyone needing to plug into Drizzle's lifecycle, check this gist. It's not perfect but it's a good workaround until this is officially supported.

0 replies

cheft
Mar 25, 2024

Is there an implementation plan? I really need this feature.

1 reply

@yf-hk

yf-hk Sep 15, 2024

I need it too, it's the only reason which stops me from migrating my projects from TypeORM/Prisma

cayter
Jun 28, 2024

We currently wrap drizzle in a base repository https://gist.github.com/cayter/49d5c256a885d90c399ca6c1eca19f51 which it will trigger the before and after hooks. Note that we only made it working with postgres.

0 replies

omnaiduu
Nov 10, 2024

No update 😕

0 replies

waynehamadi
Nov 10, 2024

I like that Drizzle takes its time for this, because it's very easy to get this one wrong.
And we can actually implement it on our own with a repository or proxy pattern (see above the gists).
I prefer the proxy pattern because I don't want the user (the dev) to create a new language that wraps Drizzle.

On Django, we have the same issue: the core team is slower to add things, particularly when they're not really core. Which is a good thing.

But in Django there is also a great ecosystem of third party packages, in particular, middlewares. Then the Django core team eventually decides to integrate these packages into Django based on how successful they are.

The sad thing though, is that THIS feature WOULD allows a package ecosystem to thrive ! haha!

So it's a bit like a chicken and egg problem.

WE CAN implement a middleware/proxy pattern on top of the existing DSL, but we're missing the primary hooks to do so.

I tried to think about what's the best way to hook into this and I ended up with the same thing as the solution above: https://gist.github.com/ikupenov/10bc89d92d92eaba8cc5569013e04069 so I invite you to look into it.

So the drizzle team could give us functions built on top of this proxy pattern and "let us play with it". Because right now we're all building our own proxy system, which doesn't really feel right to me (it's also a lot of nasty work and type checks)

If these hooks are built properly, this might foster an ecosystem as big as Vite on the DB side of things (BTW, the api could be very similar to what Vite offers, this would facilitate adoption).

I think a good start is to JUST give the function that's called AFTER the proxy, and the function simply passes exactly the context in which drizzle was called (all the db.insert. etc, etc... plus their arguments).

No after_save, no after_update, just something very low level that could allow these higher level hooks to be built, this way it's easier to get it right for the Drizzle team.

Implementing this is safer than coming up with the perfect "vite-like" api. Which is hard to get right.

0 replies

ikupenov
Nov 11, 2024

There are some improvements in the proxy solution. It’s now a lot more flexible and hooks can easily be added. Bear in mind that if you go with this approach you’ll need to maintain it and there are some breaking changes coming with V1. Shouldn’t be too bad but worth keeping in mind.

That’s the updated gist: https://gist.github.com/ikupenov/26f3775821c05f17b6f8b7a037fb2c7a.

0 replies

Uh oh!

Proposal for Extending schema.ts in Drizzle ORM with Runtime Middlewares #1513

Uh oh!

Advantages of This Approach:

Replies: 17 comments · 19 replies

Uh oh!

Angelelz Nov 14, 2023 Collaborator

Uh oh!

valerioleo Nov 14, 2023 Author

Uh oh!

Angelelz Nov 14, 2023 Collaborator

Uh oh!

valerioleo Nov 14, 2023 Author

Uh oh!

Uh oh!

Uh oh!

valerioleo Nov 16, 2023 Author

Expanded Capabilities with Middleware:

Uh oh!

valerioleo Nov 27, 2023 Author

Uh oh!

AlexBlokh Nov 28, 2023 Maintainer

Uh oh!

valerioleo Nov 28, 2023 Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Angelelz Dec 22, 2023 Collaborator

Uh oh!

Uh oh!

Angelelz Dec 22, 2023 Collaborator

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Proposal for Extending `schema.ts` in Drizzle ORM with Runtime Middlewares #1513

Replies: 17 comments 19 replies

Angelelz
Nov 14, 2023
Collaborator

valerioleo
Nov 14, 2023
Author

Angelelz
Nov 14, 2023
Collaborator

valerioleo Nov 14, 2023
Author

valerioleo
Nov 16, 2023
Author

valerioleo Nov 27, 2023
Author

AlexBlokh Nov 28, 2023
Maintainer

valerioleo Nov 28, 2023
Author

Angelelz
Dec 22, 2023
Collaborator

Angelelz Dec 22, 2023
Collaborator