Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

DDL support for partitioning #5360

beikov started this conversation in Design Proposals
Sep 30, 2022 · 3 comments · 6 replies
Discussion options

Moving the discussion from #5353 to here, the idea is to introduce the following annotations which will be used for emitting partitioning DDL in hbm2ddl.

  • @PartitionByRange({ @RangePartition(name = "...", from = "...", to = "...") }) // from is inclusive, to is exclusive
  • @PartitionByList({ @ListPartition(name = "...", keys = {...}) })
  • @PartitionByHash({ @HashPartition(name = "...") }) // modulus is determined by count of partitions, index determines remainder value

I'll list the databases that have some support for partitioning:

Database Range List Hash
Oracle ✓(only less than supported)
DB2
PostgreSQL ✓ (since v11)
SQL Server ✓(bounds are inclusive) ✓(emulate through range?) ✓(emulate through computed column)
MySQL ✓(only less than supported)
MariaDB ✓(only less than supported)

For databases that don't support partitioning we simply don't generate DDL or fail. Partitioning is a tool for data management but it shouldn't matter semantically if partitioning is used or not.

The only part I am unsure about is range partitioning since SQL Server uses inclusive bounds, we can't just omit the from member and infer that through the order of defined partitions, as we will have to use LEFT bound values for the DDL on SQL Server. We could think about dropping the to member though, as we could infer that from the next partition value and for the last partition, use the special MAXVALUE. So I am pretty sure we could support

@PartitionByRange({
 @RangePartition(name = "p1", start = "0"),
 @RangePartition(name = "p1", start = "10") 
})

to generate

PARTITION BY LIST (..)
(
 PARTITION p1 VALUES LESS THAN 10
 PARTITION p2 VALUES LESS THAN MAXVALUE
)
You must be logged in to vote

Replies: 3 comments 6 replies

Comment options

I don't hate it, but I'm still sorta struggling to see myself wanting to write this mess of annotations rather than just writing fragments of native SQL in, say, @Partition and @PartitionBy annotation.

@PartitionBy("RANGE (logdate)")
@Partition("FROM ('2006-02-01') TO ('2006-03-01')")
@Partition("FROM ('2006-03-01') TO ('2006-04-01')")

Or, alternatively:

@Partition(by="RANGE (logdate)", 
 partitions={"FROM ('2006-02-01') TO ('2006-03-01')", 
 "FROM ('2006-03-01') TO ('2006-04-01')"})

I think this would be much easier to implement, and probably also easier to use.

If there's a need for "portable" cross-dialect partitioning, which I just don't see as something that's like to be common at all, then we could easily add @PartitionBy/@Partition to @DialectOverride.

That way we would let the user customize partitioning for the particular database, and their particular needs, rather than struggle by ourselves to abstract out some sort of lowest-common-denominator partitioning facility, where the abstraction would by nature not even be very transparent.

WDYT, @beikov ?

You must be logged in to vote
5 replies
Comment options

beikov Nov 14, 2022
Maintainer Author

I don't mind if you want to experiment with this approach, but I guess I'd rather not use the "good name" for something that expects SQL dialect specific DDL fragments. I'd like to keep the good name for the abstraction if we anticipate to add this.

I really think that list and hash partitioning are something that can be nicely abstracted and that specifying the partitions for these partitioning schemes is actually something that make sense for people to define in their code.
For range partitioning I am not so sure. I guess most people that use range partitioning will have some sort of tool that automates the generation of new partitions. Specifying the partitions for range partitioning in code then seems a bit unnecessary. We could maybe start out with a simple @PartitionByRange annotation that doesn't accept partitions. We would simply emit a default partition in the DDL that accepts all values. If people start using this and request this feature, we can think about adding it. Wdyt @gavinking?

Comment options

My main issue is that I think the annotations you're proposing are going to end up verbose, difficult to read, and inconvenient to write, compared to just writing fragments of SQL, and that the more verbose approach doesn't actually offer any benefits in practice because the number of systems in the intersection of 1. using partitioning, 2. doing cross-db development, and 3. using schema export to create the partitioned schema, is a vanishingly tiny set.

What i'm saying is that something like what I sketched out above is more user-friendly, significantly less complicated, and also makes it easier for people to get at every last little nuance or toggle switch of what their db offers, rather that having to target some sort of lowest-common-denominator.

Comment options

beikov Nov 14, 2022
Maintainer Author

the number of systems in the intersection of 1. using partitioning, 2. doing cross-db development, and 3. using schema export to create the partitioned schema, is a vanishingly tiny set.

This kind of makes me think we should maybe not do it at all unless someone from the community steps up and provides us with real world needs and use cases.

What i'm saying is that something like what I sketched out above is more user-friendly, significantly less complicated, and also makes it easier for people to get at every last little nuance or toggle switch of what their db offers, rather that having to target some sort of lowest-common-denominator.

Yeah that's true. Since the @Partition column wouldn't interfere with the naming I proposed, it's fine for me if you want to take a stab at that.

Comment options

@beikov
We'd like to provide our real-world use case:
There's an account table (containing accounts and balances), every day we make a snapshot into the account_snapshot table (that contains snapshot_date, account, and balance cols). That's a huge table, and we'd like to partition this table by snapshot_date column, by range (weekly). We're restricted to use hbm2ddl to create database schema.
For us, minimal feature would be the support of partitioning in hbm2ddl, creation of partions themselves would be done via native queries.

Comment options

beikov Feb 26, 2024
Maintainer Author

Your request is noted and I understand that it is desirable to have this, but you really shouldn't rely solely on hbm2ddl for schema management.

Comment options

A few comments related to Oracle. Since 11g release, you have:

  • INTERVAL partitioning: same as RANGE but with partitions automatically added
  • REFERENCE partitioning: equipartition a child table with its partitioned parent using parent's partition key (column(s) not present inside the child table)
  • SYSTEM partitioning: let the application (ORM?) control the placement of data

Since 12cR2, you have:

  • AUTO-LIST partitioning: partition maintenance for LIST partitioned table done automatically

Also, Oracle supports 2 levels of partitioning (called sub-partitioning) such as partition by interval and then by list.

Finally, regarding point 3 above, Oracle allows partitioning a non-partitioned table online without schema export/import since 12cR2.

You must be logged in to vote
0 replies
Comment options

You must be logged in to vote
1 reply
Comment options

beikov Sep 18, 2024
Maintainer Author

This has nothing to do with this discussion. Going to delete this. Go to https://hibernate.org/community/ and ask your question on the forums or chat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

AltStyle によって変換されたページ (->オリジナル) /