-
Notifications
You must be signed in to change notification settings - Fork 13.7k
Use CLOCK_BOOTTIME for Instant in Fuchsia/Android #132331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use CLOCK_BOOTTIME for Instant in Fuchsia/Android #132331
Conversation
Fuchsia and Android both want Instants to progress during periods of suspension, and thus must use CLOCK_BOOTTIME as the backing reference clock.
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @tmandry (or someone else) some time within the next two weeks.
Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review
and S-waiting-on-author
) stays updated, invoking these commands when appropriate:
@rustbot author
: the review is finished, PR author should check the comments and take action accordingly@rustbot review
: the author is ready for a review, this PR will be queued again in the reviewer's queue
cc @maurer
This comment was marked as off-topic.
This comment was marked as off-topic.
Don't have official review powers, but LGTM as the Android maintainer.
Could you also update this table in the docs?
Lines 221 to 229 in 88eafa4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once docs are updated.
r? rust-lang/libs - Both platform owners approve this change, but I would like your team to be in the loop also. This change is allowed by my reading of the Instant docs.
CLOCK_BOOTTIME
always progresses monotonically; the only difference is that it progresses during suspend, unlike CLOCK_MONOTONIC
. The platform owners have determined that this is a better default for applications on their platform, especially for use cases that interact with outside services. Android has long been patching their platform toolchain to do this. This change allows us to converge on one upstream implementation.
Fuchsia and Android both want Instants to progress during periods of suspension,
They cannot rely on this because the documentation explicitly does not guarantee for this behavior:
As part of this non-guarantee it is also not specified whether system suspends count as elapsed time or not. The behavior varies across platforms and Rust versions.
Until #87906 is settled we shouldn't introduce platform-specific behavior that makes code non-portable.
Also who is this "android wants" person that attempts to define how the Rust standard library should behave? And where did that discussion happen?
Edit: Ah I guess that was also #87906, though android has been mentioned only once deep in that thread as a code-reference.
As a brief summary of the situation resulting from the other thread:
- Android currently has a patch carried against the Rust compiler to make the above true (though our carried patch is broader, and does it for all unixy OSes).
- Even without being able to rely on it, ecosystem crates assume that
Instant
ticks in suspend. We have a library we use in first party code to make sure time reliably works correctly, but as long as ecosystem crates assumeInstant
ticks in suspend, we need to keep this patched.
Even without being able to rely on it, ecosystem crates assume that Instant ticks in suspend.
Then, at the moment, the crates have a bug. It would be great if we could offer a portable solution here, but currently we can't because not all OSes provide such a clock.
Shipping a custom toolchain for a specific target just makes even more crates fall for that trap because it works on the platform they mainly care about, essentially relying on implementation details.
So I don't see this as a path to actually solving the problem, possibly even making it worse.
The options I see are
- fixing those crates
- implementing a clock source API
- poking POSIX to standardize such a clock and getting it into the relevant OSes, but I'm skeptical that this is workable, embedded platforms might not have anything to measure elapsed time during sleep
- giving up on portability and saying things like "tier 1 platforms must have these properties, others might not" or something along those lines
Fuchsia and Android both want Instants to progress during periods of suspension, and thus must use CLOCK_BOOTTIME as the backing reference clock.
I've updated the docs to reflect the usage of boot time in Android and Fuchsia.
Until #87906 is settled we shouldn't introduce platform-specific behavior that makes code non-portable.
As @maurer has pointed out, the Instant
library already has platform specific behavior - it behaves differently on Windows vs Linux. I do not see how having it behave differently on Fuchsia and Android is any different.
Instant library already has platform specific behavior - it behaves differently on Windows vs Linux.
The current situation does not exist to enable specific crates to keep their broken implementations forever. But this PR would. Enabling crates in their reliance on implementation details on one platform makes this into a problem for other platforms. In other words you would be amplifying hyrum, and you're shifting the burden to other OSes.
This is... myopic, bad incentives/policy/game theory or however you want to call it.
It goes roughly like this:
- someone writes broken code that doesn't work with suspend
- someone else encounters this code
- "wouldn't it be nice if MY platform used a boottime clock instead?"
- "here's a PR that only fixes it on MY platform,"
- repeated by different people for all platforms that have a boottime clock
- we are now left with a majority of platforms on one clock source and a minority on another
std
aims to provide portable abstractions. That goal is not served by an incremental, myopic steps. The minority platforms are left out in the cold with subtly broken code. std
can still not make any additional guarantees about universal behavior. People ignore the docs even more than before and code to observed behavior instead because it works "almost everywhere" anyway. That's an undesirable outcome.
Instead the fact that such clocks aren't available on all platforms should be made explicit and the crates then will have to think about fallbacks or other ways to guard against whatever undesirable behavior we're talking about here in the abstract. Or at least they could make the error upfront and fail loudly when an essential building block is missing.
Additionally this ignores the issue that there also are uses where not counting suspend can also be useful, so it just trades one benefit for another rather than enabling both.
Yes the current situation isn't great. And you're feeling pain due to it. But imo that pain should be incentive to either fix the broken crates or help with a proper solution in std. Not papering over bugs for just one platform.
Also, just because which clock is used on particular platforms is documented does not mean you're allowed to rely on it. It says
The following system calls are [currently](crate::io#platform-specific-behavior) being used by `now()`
so even if this PR were accepted today, someone could come along tomorrow and change it back.
Is this even correct for Fuchsia in the first place? CLOCK_BOOTTIME
is the same thing as CLOCK_MONOTONIC
on Fuchsia: https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/third_party/ulib/musl/src/time/clock_gettime.c;l=40-41
I have a patch to fix libc
to use the right syscall in Fuchsia. Fuchsia only recently implemented CLOCK_BOOTTIME, and we're still working through updating everything that needs it to use it.
The justification for this PR is in #88714 (comment):
It seems best for std's Instant to just use whatever is the 'standard' monotonic clock on each platform.
The owners of a platform get to define what that standard monotonic clock is for their platform. In this case, both Fuchsia and Android want the standard monotonic clock to be the boottime clock, and this PR reflects that.
std aims to provide portable abstractions. That goal is not served by an incremental, myopic steps. The minority platforms are left out in the cold with subtly broken code.
Which minority platforms are you worrying about here? This is two minority platforms deciding that they'd rather diverge from the majority (Linux and macOS) because they think it will work better in the majority of cases for the application classes that their platforms are designed for.
Yes we want portable code, but in the absence of a well-defined semantics the precise details of what that code does may be impacted by the platform it is running on, which is relevant insofar as it informs us what kind of behavior is expected for applications running on that platform.
There is precedent on either side; Windows being the main example of using boot time. It seems like you are arguing against the past decision to make the behavior of this API platform-specific in the first place. Arguments about past decisions should not block PRs like this.
Additionally this ignores the issue that there also are uses where not counting suspend can also be useful, so it just trades one benefit for another rather than enabling both.
Yes, and it would be great if std had two APIs to enable both cases, but it does not. In the absence of a perfect solution we are forced to make tradeoffs. The particular tradeoff here is what should most code on this platform do most of the time.
Yes the current situation isn't great. And you're feeling pain due to it. But imo that pain should be incentive to either fix the broken crates or help with a proper solution in std. Not papering over bugs for just one platform.
We'll have to disagree here – platform maintainers are not fungible assets who can take on a much larger project of defining new std APIs and migrating the ecosystem to it. That's orders of magnitude more work that requires a different set of skills and social connections, and it's letting the perfect be the enemy of the good.
Pushing back on incremental improvements like this does no one any good, including the Rust project. If someone is interested in contributing to Rust in a small way, even in a way that's initially scoped to their platform, let them. That's how we will grow contributors with more impact over time.
Would it simplify things if we refactored this PR to just focus on Fuchsia, and a separate one for Android? Right now all the known rust code that's targetting Fuchsia is owned by the Fuchsia team, whereas Android may have users that could be impacted by this change.
Yes we want portable code, but in the absence of a well-defined semantics the precise details of what that code does may be impacted by the platform it is running on, which is relevant insofar as it informs us what kind of behavior is expected for applications running on that platform.
That sounds like you intend to rely on implementation details. Is this correct?
There is precedent on either side; Windows being the main example of using boot time. It seems like you are arguing against the past decision to make the behavior of this API platform-specific in the first place. Arguments about past decisions should not block PRs like this.
No. The I'm arguing that guaranteed behavior is not provided here. When it comes to suspend it is unspecified, NOT specified-per-platform. The documentation is very explicit about this (even though it does not have to be, anything not explicitly specified is, well, unspecified):
As part of this non-guarantee it is also not specified whether system suspends count as elapsed time or not. The behavior varies across platforms and Rust versions.
Which means we could flip a coin at each program startup to choose between CLOCK_BOOTTIME and CLOCK_MONOTONIC on any platform that has both.
Therefore any program that relies on this property has a potential correctness problem on future Rust versions and a portability problem. Changing the behavior to aid crates that want to rely on non-guaranteed behavior is encouraging the wrong thing.
Pushing back on incremental improvements like this does no one any good, including the Rust project.
I don't see how it's an improvement. If we just let platform-maintainers do uncoordinated decisions, e.g. if platform A says "we think accounting for suspend should be the default" and platform B says "suspend is not time spent working on anything, therefore shouldn't be included", then Instant
is forced to guarantee neither.
To get to an endpoint where Instant can provide a guarantee that suspend is either included or excluded everyone must move in the same direction.
If that is not possible then we need to explore alternatives.
But this PR doesn't move in any of those directions. As far as I can tell it's aiming to encourage or enable crate authors to ignore the API contract.
Would it simplify things if we refactored this PR to just focus on Fuchsia, and a separate one for Android? Right now all the known rust code that's targetting Fuchsia is owned by the Fuchsia team
Fuchsia is Tier 2, the target tier policy says that it must not be exclusively useful to a closed group. So we should assume external users do exist.
My interpretation of
The justification for this PR is in #88714 (comment):
It seems best for std's Instant to just use whatever is the 'standard' monotonic clock on each platform.
is that was referring to the OS developers changing whatever their main/preferred clock API used for timing, waiting, sleeping etc. and Instant
following those changes as we upgrade to new major OS versions.
I.e. I understand it as an argument for implementation flexibility, enabled by the current absence of guarantees. Not as an argument for someone wanting to make Hyrum more powerful (more guarantees, less implementation flexibility).
So if you want to guarantee more behavior then I think you have to request an API change from the libs-API team.
Which means we could flip a coin at each program startup to choose between CLOCK_BOOTTIME and CLOCK_MONOTONIC on > any platform that has both.
Therefore any program that relies on this property has a potential correctness problem on future Rust versions and a portability > problem. Changing the behavior to aid crates that want to rely on non-guaranteed behavior is encouraging the wrong thing.
This feels rather idealistic. I guarantee you that if you were to actually implement that change to flip a coin on startup and randomly choose a clock, a lot of rust programs in the wild would break. Calling every single one of those programs incorrect is, is in my opinion, an extremely impractical position. Telling contributors that they must modify/fix the standard library whenever they want to make existing software that uses those libraries work on their platform is pretty frustrating.
In any case, I think this argumentation is kind of getting us nowhere. How am I expected to make progress here? I have a very simple problem to solve: a large amount of code on Fuchsia that uses Instant
expects it to progress during periods of suspension. I have four paths:
- Make
Instant
use boot time, as I'm doing here. - Keep a local patch to the std lib to use boot time like Android does.
- Modify the standard library API to make
Instant
's behavior during suspend more explicit. How does one even start doing something like this? - Implement a local wrapper around
Instant
that has the semantics I want. This works in Fuchsia code, but does not work well when using third party crates that use upstream instant. The inconsistencies in thoseInstant
s may cause problems.
How would the rust library maintainers like us to proceed? (side note: who makes these decisions for the rust community?)
a large amount of code on Fuchsia that uses Instant expects it to progress during periods of suspension. I have four paths:
- fix those crates
Make Instant use boot time, as I'm doing here.
This option is worse than hyrum's law. Currently this is not even an implementation detail. You want to make this an implementation detail and then turn around and write code that relies on that. So you want to create de-facto API guarantees without using those words.
I fear a few months later someone will then come and say "look at all this code we have written that relies on this detail [which we put ourselves into std], now it should be guaranteed", effectively backdooring the API design process.
I hope you see how shady this looks.
API design does not happen by the process of writing crates that want something, then changing the std implementation and then presenting us with a fait accompli.
How would the rust library maintainers like us to proceed? (side note: who makes these decisions for the rust community?)
Usually platform maintainers are not the same people writing most of the downstream crates for a platform, so they're not in a position where they want to change an std implementation to suit their own crates' needs.
If your code needs a novel API guarantee then that is a feature request, one that will require discussion and design work. It's the libs-api team that approves API changes. See https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html
Alternatively you can ask the libs-team if this could be considered a bug-fix or quality-of-implementation improvement, though I must say I would oppose that for the reasons I explained above.
If you're saying there are lots of programs that are broken unless you use a custom std which changes behavior that sounds like a rather dire situation and I feel that this has not been accurately reflected in previous discussion.
It would be helpful to have more context to understand the issue (perhaps not in this PR though). The kind and extent of breakage are we talking about, what are some big offending crates, how are they used, etc.
We discussed this in the libs-api meeting today. One concern that was raised is that the sleep/timeout APIs all use CLOCK_MONOTONIC
internally and this change would make Instant
inconsistent with them.
On a more general note, there isn't really a suitable global policy for whether to include time suspended or not: different call site may want different behavior depending on what is being measured.
As such we want to ask: would it be acceptable to leave Instant
on CLOCK_MONOTONIC
since it is the system default, but instead provide a clock source API which allows users to explicitly select whether to include suspend time? Here is a quick draft of what it could look like:
struct Instant<C: Clock = DefaultClock>; struct DefaultClock; // Clock used for OS sleep/timeout struct ContinuousClock; // Clock which counts time in sleep struct SuspendingClock; // Clock which doesn't count time in sleep trait Clock: Sealed { fn now() -> Instant<Self>; }
@Amanieu that's along the lines of what @mathukumillia and I were playing around with offline. I've started an experimental port of the standard library over to parameterize the Clock to see how it feels on a large codebase. I'm pretty sure we'd also need to parameterize Duration
as well on the clock, since we'd have flow the duration from std::thread::sleep(Duration<Clock>)
down to an API call like clock_nanosleep on Linux.
We also need to figure out how an external library like tokio would map these types to the various syscalls. In tokio-rs/tokio#3185 (comment), on Linux only some of the networking APIs support different clocks.
We specifically talked about Duration
and the conclusion was not to parameterize it. Fundamentally, a Duration
represents a time period (in seconds) which is independent of the clock on which it is measured. Different clocks may advance through time at different rates, but they all share the same unit of time (seconds).
All timeouts/sleeps are only defined against DefaultClock
, unless you specifically opt-in to a different clock. This opt-in should be in the form of separate functions that take an additional Clock
argument.
BijanVeyssi
commented
Feb 20, 2025
We discussed this in the libs-api meeting today. One concern that was raised is that the sleep/timeout APIs all use
CLOCK_MONOTONIC
internally and this change would makeInstant
inconsistent with them.On a more general note, there isn't really a suitable global policy for whether to include time suspended or not: different call site may want different behavior depending on what is being measured.
As such we want to ask: would it be acceptable to leave
Instant
onCLOCK_MONOTONIC
since it is the system default, but instead provide a clock source API which allows users to explicitly select whether to include suspend time? Here is a quick draft of what it could look like:struct Instant<C: Clock = DefaultClock>; struct DefaultClock; // Clock used for OS sleep/timeout struct ContinuousClock; // Clock which counts time in sleep struct SuspendingClock; // Clock which doesn't count time in sleep trait Clock: Sealed { fn now() -> Instant<Self>; }
@Amanieu I love the idea. As someone working on embedded software we really want to keep track of time during idle and it would be great if it was with std.
I have been trying to find a new issue related to this but could not find it. Is there any yet ?
Someone who is interested will have to implement this change and create a tracking issue.
https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html#tracking-and-stabilization
@mathukumillia and I are in the process of writing up an rfc for this that’s essentially the proposal in #132331 (comment). We should hopefully get it posted soon.
☔ The latest upstream changes (presumably #141829) made this pull request unmergeable. Please resolve the merge conflicts.
Fuchsia and Android both want Instants to progress during periods of suspension, and thus must use CLOCK_BOOTTIME as the backing reference clock.
r? @tmandry