-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Duration storage #5955
-
Now that I've got our Duration
support working properly again, it has re-occurred to me that the way Hibernate stores Duration
s is not very user-friendly: sure, Java represents all durations with nanosecond precision, but it's not very nice to open up the database and see a value with 11 zeros on it representing one measly hour. A nanosecond is not actually a very useful human-scale measure of time.
So to my mind, a much friendlier way to store a duration is as seconds, i.e. using numeric(21,9)
instead of numeric(21)
.
Or, hell, even as days would make sense, using something like numeric(22,14)
.
How could we accommodate this in a way that doesn't break backward compatibility with our own data?
Well, I know I've resisted introducing new special-purpose annotations for this kind of thing in the past, but here I guess I would propose a new @DurationStorage
annotation, with enum TimeZoneStorageType { NANOSECOND, SECOND, DAY }
.
This would actually be pretty straightforward to implement: all the machinery is there. I was very pleasantly surprised to notice that actually implemented HQL duration support in a pretty robust way, where durations know about their own precision and the translation code automatically adapts to the precision of the duration. So that's easy enough.
Finally, I believe that @beikov would like to be able to store a Duration
in a SQL interval second
column, and I think we can accommodate that here too, though it would take a bit more work to get it working in HQL. (What was there previously was not working.) Ultimately we could have something like:
enum DurationStorageType {
NANOSECOND,
SECOND,
DAY,
INTERVAL_SECOND
}
Alternatives?
- Well, one straightforward alternative worth due consideration is to just introduce
SecondDurationJavaType
andDayDurationJavaType
, and use the@JavaType
annotation. This would work well-enough, I believe, and I guess it could also be extended to accommodateIntervalDurationJavaType
. Note, however, that the HQL translator would need to be aware of these newJavaType
s and special-case them usinginstanceof
, so it's not quite as clean as it looks. - Perhaps this can even be handled with just a converter? I would have to try. (Hell, it probably could even be done with a
JdbcType
, though I think that would be an abuse of the APIs.)
Either of these alternative approaches has the advantage of not introducing new annotations, though on the flip side that makes the functionality less-discoverable to users. And I guess we could argue about which of:
@DurationStorage(DAY)
Duration duration;
vs
@JavaType(DayDurationJavaType.class)
Duration duration;
is more aesthetically pleasing. I'm not sure, but I guess I lean toward the first option. Especially since we have already gone there with @TimeZoneStorage
.
Beta Was this translation helpful? Give feedback.
All reactions
Replies: 3 comments 11 replies
-
That looks quite nice, and I have no objections at all in adding such handy mapping annotations - however let me point out that I don't agree with the premise of "user friendlyness" of not storing nanoseconds.
Encoding such values at high resolution (not precision) at least it ensures that on reading a stored value, the same value is returned. Let's remember that time-zone dependent values are a mess as they depend on politics and systems being updated timely and consistently to be aware of whatever weird non-technical decisions politicians made; even assuming time-zone settings are correct from a technical point of view, it's quite possible that the JVM and the DB are at different stages or quality of patching in such regard (i.e. one might not have been updated to the latest oddness yet), and since storing at lower precision implies truncations based on such inconsistent rules, the "user friendliness" is arguable. Storing at highest resolution at least prevents some classes of such oddities.
Also, most users won't care what you store. So while I like being able to control the resolution I'd keep high resolution as a default.
Beta Was this translation helpful? Give feedback.
All reactions
-
however let me point out that I don't agree with the premise of "user friendlyness" of not storing nanoseconds.
I'm not suggesting not storing them. numeric(21,9)
lets you store nanoseconds as the fractional part, after the decimal point. And since numeric
is an exact type, there's no rounding error.
UPDATE: this isn't quite right, as @yrodiere points out below, for DAY
storage, there can be a rounding error, it looks like numeric
is actually a fair bit worse than double
in this respect.
On the other hand, SECOND
storage doesn't have rounding error.
Beta Was this translation helpful? Give feedback.
All reactions
-
👍 1
-
great!
Beta Was this translation helpful? Give feedback.
All reactions
-
but it's not very nice to open up the database and see a value with 11 zeros on it representing one measly hour
I'd argue that's an endless problem: when I'm expressing a duration in years, the number of days will likely be so large it will feel meaningless.
Or, hell, even as days would make sense, using something like numeric(22,14).
I very much doubt all durations expressed as a fraction of a day can be represented as a decimal number (with a finite number of decimal digits). You're bound to hit rational but non-decimal numbers at some point and then you might lose precision. That's the kind of limitation I'd hate to expose users to, especially just for the convenience of reading "understandable" number in the DB, and especially if the problem can be introduced by something so seemingly innocuous as picking one enum value over another.
For example take a duration representing a prime number of seconds, say 32212254719
. In days (divided by 86400
), that's 372827.0222106481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481481...
.
Rounding might work out fine in this case (I don't know), but then it might not in other cases.
enum TimeZoneStorageType { NANOSECOND, SECOND, DAY }.
I suppose you meant DurationStorageType
? I don't see what Timezone has to do with this.
Alternatives?
I believe either alternative is preferable to @DurationStorage
, be it only because the separation between "safe" storage and unsafe storage such as "days" is clearer.
Beta Was this translation helpful? Give feedback.
All reactions
-
Looks good.
Ah hahah, wait, no it doesn't. What?? Let me try again.
Beta Was this translation helpful? Give feedback.
All reactions
-
OK, fine, you're right @yrodiere, numeric
on postgres is significantly worse at this than double
. This I did not know.
So, fine, DAY
storage probably isn't a great idea.
hreact=# create table t(t double precision);
CREATE TABLE
hreact=# insert into t values(32212254719.0/86400);
INSERT 0 1
hreact=# select t*86400 from t;
?column?
-------------
32212254719
(1 row)
Good.
hreact=# create table t(t numeric(24,15));
CREATE TABLE
hreact=# insert into t values(32212254719::numeric/86400::numeric);
INSERT 0 1
hreact=# select t*86400::numeric from t;
?column?
-----------------------------
32212254718.999999987200000
(1 row)
Bad, as you predicted.
Beta Was this translation helpful? Give feedback.
All reactions
-
Reversability of the function being able to get back to the exact same representation would be my primary concern; and not because it could be technically wrong (I doubt that) but because of user surprise.
No biggie, I suppose we could warn about that possibility. I just don't want to have to explain that to hundreds of users who don't have basic math training :)
Beta Was this translation helpful? Give feedback.
All reactions
-
Wait, perhaps I misunderstood your concern. Is your concern reversibility of the conversion to days and back, when you have something like an exact quantity of hours?
Indeed, that was my concern. Users are not introducing the mistake here, we are.
No biggie, I suppose we could warn about that possibility
I doubt documentation will cut it here, especially if we make it so easy to switch from SECONDS
to NANOSECONDS
to DAYS
. It just feels surprising that one of those carries such limitations, especially if those limitations are not consistent (only for some prime numbers etc.).
Like I said, the alternative solutions might be easier to document effectively (read: with users actually getting warned before using the feature).
OR we could just always truncate values, so that at least the behavior is consistent. It feels "right" to me that users needing nanosecond precision will end up storing an (integer) number of nanoseconds while users needing only day/second precision will end up storing an (integer) number of days/seconds. And if we name the annotations/enums right, the truncation won't be surprising at all.
Beta Was this translation helpful? Give feedback.
All reactions
-
So it looks like it works as long as you call round()
explicitly. (Tried on both postgres and MySQL.)
hreact=# select round(t*86400) from t;
round
-------------
32212254719
Which is fine. But the whole point of this was to get values displayed nicely to the user in adhoc queries, where they won't (by default) know to use round()
. So that undermines the value of having this at least somewhat.
Beta Was this translation helpful? Give feedback.
All reactions
-
So it seems to me that we all agree the following options would be useful:
enum DurationStorageType {
NANOSECOND, // store as numeric(21)
SECOND, // store as numeric(21,9)
INTERVAL_SECOND // store as interval second
}
I think that a truncating DAY
option could also make sense, similar to how @Temporal(DATE)
would truncate the time part away, but maybe we should rather name it DAY_TRUNCATE
if we want that and store it as bigint
?
Apart from that, the idea to have a storage type enum kind of would help mapping Period
to other interval types (also see #4892)
Beta Was this translation helpful? Give feedback.