Include additional hashes in src/stage0 #142139
This patch changes `bump-stage0` to include:

* The sha256 hash of the channel manifest used to create `src/stage0`.
* The rust and rustfmt git commit in `src/stage0`.
* Hashes of all the artifacts, like the source tarball, in `src/stage0`.

Combined this will allow for:

* Projects that bootstrap their own compiler, such as Fuchsia, or users of [bootstrap], to build their compilers offline without needing to communicate with static.rust-lang.org.
* Auditors to detect if the channel manifest, and all the artifacts inside the manifest, were modified after it was used to generate `src/stage0`. Furthermore, if they did find modified artifacts, they could determine if the Rust Signing Key was compromised by checking if any modified file was signed properly.

Finally, it allows regeneration of `src/stage0` when specifying both the day of the build for rust, and the day of the build for rustfmt, which can allow a maintainer to regenerate `src/stage0` to verify nothing changed.

[bootstrap]: https://github.com/dtolnay/bootstrap
[mrustc]: https://github.com/thepowersgang/mrustc
r? @onur-ozkan
rustbot has assigned @onur-ozkan.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.
Use r? to explicitly pick a reviewer
These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.
If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.
r? release
> Projects that bootstrap their own compiler, such as Fuchsia, or users of bootstrap, to build their compilers offline without needing to communicate with static.rust-lang.org.
Can you help me understand how this enables this where it wasn't before? AFAIK, many distros and companies already build the compiler offline -- just providing the stage0 compiler via a pre-step that is online-capable, putting paths to it in our existing bootstrap.toml flags. So what is this actually enabling?
> Auditors to detect if the channel manifest, and all the artifacts inside the manifest, were modified after it was used to generate src/stage0. Furthermore, if they did find modified artifacts, they could determine if the Rust Signing Key was compromised by checking if any modified file was signed properly.
I'm not sure I understand the value of auditing the manifest separately from what's already done. The stage0 file in-tree already verifies all of the downloaded artifacts have stable hashes. We effectively have a trust-on-first-use model here today -- and those checksums are verified by bootstrap.py when downloading from static.
Checking whether the publicly available manifest hasn't changed doesn't seem like it affects rust-lang/rust's build at all. If the goal is an independent auditor of whether static.rust-lang.org artifacts are (or aren't) changing, then that seems fine, but placing that burden on rust-lang/rust's stage0 file doesn't seem right to me.
Can you say more about why this belongs here, especially given that we'd not be verifying the new hashes day to day in CI or similar?
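For readers unfamiliar with the trust-on-first-use model described above, here is a minimal Python sketch of the idea: a hash is recorded once, and every later download is checked against it. This is not the code in bootstrap.py; the function names and the toy byte strings are hypothetical and chosen only for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex sha256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against a previously recorded sha256 hash."""
    return sha256_hex(data) == expected_hex.strip().lower()


# Toy stand-in for an artifact; a real one would be a tarball read from disk.
artifact = b"pretend this is a stage0 tarball"

# "First use": the hash is recorded once and pinned in-tree.
pinned_hash = sha256_hex(artifact)

# Later downloads are accepted only if they match the pinned hash.
print(verify_artifact(artifact, pinned_hash))
print(verify_artifact(artifact + b" (modified)", pinned_hash))
```

The check only detects changes relative to what was pinned; it says nothing about whether the artifact was trustworthy when the hash was first recorded.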
Hello Mark! Inlined below.
> Can you help me understand how this enables this where it wasn't before? AFAIK, many distros and companies already build the compiler offline -- just providing the stage0 compiler via a pre-step that is online-capable, putting paths to it in our existing bootstrap.toml flags. So what is this actually enabling?
Sure thing. This is coming from our experience with how Fuchsia is automatically updating our toolchain. We track both LLVM and Rust tip-of-tree together so we can take advantage of rapid deployment of features and bug fixes, as well as cross-language LTO, PGO, sanitizers, etc. This exposes us to a variety of issues, such as bugs on our side, multiple stage0 updates for a single release (such as with #140887 and #141647), or breakages in LLVM or Rust, any of which could leave us unable to build Rust for days or weeks. So we've automated our toolchain compilation to:
1. Check out some version of Rust.
2. Parse src/stage0 to find the stage0 date and version.
3. Download the channel manifest from static.rust-lang.org and extract the stage0 git commit.
4. Check if we've built the stage0 git commit, if so, build Rust. Otherwise, recurse into (1), then build Rust.
This also has the nice property that it'd be easy to re-bootstrap our toolchain in case we discover any issues with our build process, such as a Reflections on Trusting Trust attack. To make this process more secure, it'd be nice if we didn't have to communicate with static.rust-lang.org to find the older stage0 commit.
The other option I considered was that instead of downloading the channel manifest, we could read src/stage0, then look up the git commit for the corresponding version tag. But that'd only work for old builds; it wouldn't work for unreleased versions. We'd have to instead use the beta branch, but then it wouldn't be reproducible.
> I'm not sure I understand the value of auditing the manifest separately from what's already done. The stage0 file in-tree already verifies all of the downloaded artifacts have stable hashes. We effectively have a trust-on-first-use model here today -- and those checksums are verified by bootstrap.py when downloading from static.
> Checking whether the publicly available manifest hasn't changed doesn't seem like it affects rust-lang/rust's build at all. If the goal is an independent auditor of whether static.rust-lang.org artifacts are (or aren't) changing, then that seems fine, but placing that burden on rust-lang/rust's stage0 file doesn't seem right to me.
I agree that bootstrap.py is robust and protects against changes when compiling Rust. This is mainly about supporting projects that bootstrap their own Rust toolchain. For Fuchsia, we mainly just need the git commit. I could see other bootstrappers wanting to work with stage0 source tarballs though, so I wanted to get those hashes somewhere as well. I added the rest for thoroughness.
Regarding whether this should be in src/stage0, I'd be fine moving this elsewhere if you'd be receptive to it. Maybe we could either download the channel manifests into a src/stage0-history directory, or create a separate file like src/stage0-history.toml that'd look something like:
[[stage0]]
channel_manifest_hash = "abcd..."
git_commit = "bcde..."
...
[[stage0]]
channel_manifest_hash = "1234..."
git_commit = "2345..."
...
This would also have some side benefits:
- If there was a known bad stage0, we could just remove it from src/stage0-history.toml, rather than each bootstrapper having to implement custom logic to work around a bad version.
- It could help with disaster recovery if somehow all the data backing static.rust-lang.org ever got deleted.
I'd be happy to make any other changes or discuss further if you'd like.
☔ The latest upstream changes (presumably #142974) made this pull request unmergeable. Please resolve the merge conflicts.
> So we've automated our toolchain compilation to:
> - Check out some version of Rust.
> - Parse src/stage0 to find the stage0 date and version.
> - Download the channel manifest from static.rust-lang.org and extract the stage0 git commit.
> - Check if we've built the stage0 git commit, if so, build Rust. Otherwise, recurse into (1), then build Rust.
Just want to chime in that this is essentially the same process we follow at Microsoft when building Rust toolchains for internal use. The key property is that we want to be building Rust with our own builds of the stage0 compiler. Thus the key bit of data that we need is "what is the git commit that produced upstream's stage0 compiler" - which currently requires pulling down the manifest from static.rust-lang.org. Having the git info directly in src/stage0 would be quite helpful
Apologies, missed your comment -- please @rustbot ready in the future :)
> Check if we've built the stage0 git commit, if so, build Rust. Otherwise, recurse into (1), then build Rust.
To clarify - this is trying to use the stage0 git commit as (effectively) a pointer to the build artifacts from the previous build (which may or may not exist yet)? Is it accurate to say that you don't control that pointer (i.e., you have to use the commit hash, you can't use e.g. the version string or similar)?
I guess that then implies that you actually don't care about the rest of src/stage0, since you can't possibly expect the hashes to match for it with the builds you perform internally (at least I would be surprised if we're that reproducible :)
> For Fuchsia, we mainly just need the git commit. I could see other bootstrappers wanting to work with stage0 source tarballs though, so I wanted to get those hashes somewhere as well. I added the rest for thoroughness.
Can you clarify -- does Fuchsia get the sources to bootstrap from by (effectively) git clone of rust-lang/rust? You're never relying on anything from static.rust-lang.org (or at least not anything recent), including the source tarballs? i.e., where (if ever) does your chain of builds stop?
I think with that explanation I'm happy merging this in -- it seems reasonable to provide the (chain) of git hashes embedded in src/stage0 and as such in e.g. src tarballs and such to allow people to follow the exact same bootstrap chain that the official artifacts were built with. r=me with a rebase.
Hello Mark, no worries. Thanks for the tip about rustbot.
> To clarify - this is trying to use the stage0 git commit as (effectively) a pointer to the build artifacts from the previous build (which may or may not exist yet)? Is it accurate to say that you don't control that pointer (i.e., you have to use the commit hash, you can't use e.g. the version string or similar)?
> I guess that then implies that you actually don't care about the rest of src/stage0, since you can't possibly expect the hashes to match for it with the builds you perform internally (at least I would be surprised if we're that reproducible :)
Correct, we're storing our own build artifacts indexed off of the git commit, which we're using to bootstrap the next version of the compiler. We don't use any of the prebuilts or their hashes in src/stage0. Someday we'll have to do a writeup / talk about our setup. I was meaning to submit a proposal to RustConf but lost track of the time...
> Can you clarify -- does Fuchsia get the sources to bootstrap from by (effectively) git clone of rust-lang/rust? You're never relying on anything from static.rust-lang.org (or at least not anything recent), including the source tarballs? i.e., where (if ever) does your chain of builds stop?
Yes, we're getting the sources essentially with a git clone of rust-lang/rust, since we're trying to stay close to the tip of the tree to take advantage of bug fixes and features my team is implementing across LLVM and Rust. Today we're only accessing static.rust-lang.org to get the channel manifest for the git commit. Our chain of builds starts with whatever is the latest commit from Rust, and ends when:
- we've found a stage0 that we've already built
- we triggered a stage0 build and it's completed
- we time out, which usually suggests we need to fix something.
> I think with that explanation I'm happy merging this in -- it seems reasonable to provide the (chain) of git hashes embedded in src/stage0 and as such in e.g. src tarballs and such to allow people to follow the exact same bootstrap chain that the official artifacts were built with. r=me with a rebase.
Great! I'll rebase and ping you when I've uploaded it.
@rustbot: ready
@erickt do you have bandwidth to work on this PR? I went ahead and rebased it on top of master in my fork, if you like lambdageek/upstream-rust@0f7bb16ef13