Description
What version of protobuf and what language are you using?
Go protobuf runtime: google.golang.org/protobuf v1.36.6
protoc-gen-go plugin: buf.build/protocolbuffers/go:v1.36.5
protoc-gen-bq-schema plugin: github.com/GoogleCloudPlatform/protoc-gen-bq-schema/v3 v3.1.0
buf CLI version: 1.55.1
OS: Mac darwin arm64
Go version: go1.24.4 darwin/arm64
What did you do?
We are using buf generate to generate Go protobuf code (.pb.go files) from our .proto definitions. Our setup includes a custom message option, (gen_bq_schema.bq_table), provided by the buf.build/googlecloudplatform/bq-schema plugin. This option is applied to multiple messages in our schemas.
We execute buf generate multiple times on an identical set of input .proto files within a fully isolated environment (using a temporary, unique BUF_HOME directory for each run).
Here is an example:
buf.yaml:
version: v1
buf.gen.yaml:
version: v2
plugins:
  - remote: buf.build/protocolbuffers/go:v1.36.5 # Plugin version
    out: .
    opt: paths=source_relative
  - remote: buf.build/googlecloudplatform/bq-schema:v3.1.0 # Plugin version
    out: .
./test/v0/test.proto:
syntax = "proto3";

package test.v0;

import "gen_bq_schema/bq_table.proto";
import "google/protobuf/timestamp.proto";
import "google/protobuf/struct.proto";

message TestMessage {
  option (gen_bq_schema.bq_table) = {
    table_name: "golden__order_group__contract_in_waiting"
  };

  // Add some common fields that might be present in affected messages
  string id = 1;
  google.protobuf.Timestamp event_time = 2;
  google.protobuf.Struct details = 3;
  optional string optional_field = 4; // Example of an optional field
}
Then run the Go code generation repeatedly and you'll see different results. Try something like:
# Create isolated temporary directories for each run
mkdir -p .tmp_out/run1 .tmp_out/run2
mkdir -p .tmp_buf_home_1 .tmp_buf_home_2
# Run buf generate for the first time
BUF_HOME="$(pwd)/.tmp_buf_home_1" buf generate --config buf.yaml --template buf.gen.yaml test --output .tmp_out/run1/
# Run buf generate for the second time
BUF_HOME="$(pwd)/.tmp_buf_home_2" buf generate --config buf.yaml --template buf.gen.yaml test --output .tmp_out/run2/
# Compare the generated files
diff -u .tmp_out/run1/test/v0/test.pb.go .tmp_out/run2/test/v0/test.pb.go
When you diff the output of those runs, you'll see that the byte string for the raw descriptor changes slightly between them.
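Because the descriptor is emitted as long lines of hex bytes, a line-based diff can be hard to read. The following is a minimal sketch, not part of buf or protoc-gen-go, that reports the first byte offset at which two generated files differ. The helper name `firstDiff` and the command-line usage are illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// firstDiff returns the index of the first byte at which a and b differ.
// It returns -1 if the slices are identical. If one slice is a strict
// prefix of the other, it returns the length of the shorter slice.
// (Hypothetical helper, named for illustration only.)
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: firstdiff <file1> <file2>")
		return
	}
	a, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	b, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if off := firstDiff(a, b); off >= 0 {
		fmt.Printf("files differ at byte offset %d\n", off)
	} else {
		fmt.Println("files are identical")
	}
}
```

It could be invoked with the two paths from the diff command above, for example `go run . .tmp_out/run1/test/v0/test.pb.go .tmp_out/run2/test/v0/test.pb.go`.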
What did you expect to see?
I expected bit-for-bit identical output across runs.
What did you see instead?
The generated Go file test.pb.go is non-deterministic between runs. The only differences observed are consistently within the file_test_v0_test_proto_rawDesc byte array.
Specifically, a snippet from two different runs shows a difference in the ordering of two sub-fields within an embedded message that represents a Protobuf extension:
(The following is from Gemini... I had no easy way to validate since I wasn't going to check hex values.)
File 1 Snippet (from test.pb.go run1):
var file_test_v0_test_proto_rawDesc = string([]byte{
	// ... (preceding identical bytes) ...
	0xa2, 0x51, 0x04, 0x10, 0x01, 0x08, 0x01, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33,
})
File 2 Snippet (from test.pb.go run2, showing the difference):
var file_test_v0_test_proto_rawDesc = string([]byte{
	// ... (preceding identical bytes) ...
	0xa2, 0x51, 0x04, 0x08, 0x01, 0x10, 0x01, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33,
})
Analysis of the difference:
The 0xa2, 0x51 tag decodes to a protobuf field number 1300 with wire type 2 (length-delimited). This is characteristic of an extension field to a standard protobuf message (in this case, google.protobuf.MessageOptions).
The 0x04 indicates that the following 4 bytes are the content of this length-delimited field.
The difference lies in the order of 0x10, 0x01 and 0x08, 0x01.
0x10, 0x01 decodes to: Protobuf Field 2 (varint type), with value 1.
0x08, 0x01 decodes to: Protobuf Field 1 (varint type), with value 1.
Therefore, the issue is that two inner fields (Field 1 and Field 2, both with value 1) of the gen_bq_schema.bq_table extension are being serialized in a different order by protoc-gen-go between runs.
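The hex decoding above can be checked mechanically rather than by hand. Here is a minimal sketch using only the Go standard library. The helper names `splitTag` and `decodeVarintFields` are illustrative, not part of any protobuf API, and the decoder handles only the varint fields that appear in this snippet.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// field is one decoded varint-typed field: its number and its value.
type field struct {
	Number uint64
	Value  uint64
}

// splitTag decodes a varint-encoded protobuf tag into its field number
// (tag >> 3) and wire type (tag & 7). It returns the number of bytes
// consumed, or 0 if the input is malformed.
func splitTag(b []byte) (uint64, uint64, int) {
	tag, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, 0, 0
	}
	return tag >> 3, tag & 7, n
}

// decodeVarintFields decodes a payload made up only of wire type 0
// (varint) fields, preserving the order in which they were serialized.
func decodeVarintFields(b []byte) ([]field, error) {
	var out []field
	for len(b) > 0 {
		num, wt, n := splitTag(b)
		if n == 0 {
			return nil, fmt.Errorf("malformed tag")
		}
		if wt != 0 {
			return nil, fmt.Errorf("field %d: unsupported wire type %d", num, wt)
		}
		b = b[n:]
		v, m := binary.Uvarint(b)
		if m <= 0 {
			return nil, fmt.Errorf("field %d: malformed varint value", num)
		}
		b = b[m:]
		out = append(out, field{Number: num, Value: v})
	}
	return out, nil
}

func main() {
	// The differing bytes from the two runs, copied from the snippets above.
	run1 := []byte{0xa2, 0x51, 0x04, 0x10, 0x01, 0x08, 0x01}
	run2 := []byte{0xa2, 0x51, 0x04, 0x08, 0x01, 0x10, 0x01}

	for i, raw := range [][]byte{run1, run2} {
		num, wt, n := splitTag(raw)
		length := int(raw[n]) // a single-byte length suffices for this snippet
		fields, err := decodeVarintFields(raw[n+1 : n+1+length])
		if err != nil {
			panic(err)
		}
		fmt.Printf("run%d: outer field %d (wire type %d), inner fields %v\n",
			i+1, num, wt, fields)
	}
	// Output:
	// run1: outer field 1300 (wire type 2), inner fields [{2 1} {1 1}]
	// run2: outer field 1300 (wire type 2), inner fields [{1 1} {2 1}]
}
```

This confirms the wire-level reading: both runs carry the same two inner fields, only in a different order. It does not by itself establish which extension field 1300 belongs to.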
Anything else we should know about your project / environment?
This non-determinism only occurs on a subset of our messages that utilize the (gen_bq_schema.bq_table) extension. Many other messages using the same (gen_bq_schema.bq_table) option, generated in the same overall buf generate command, produce fully deterministic rawDesc byte arrays.
This suggests that the non-determinism might be triggered by a specific interaction with other fields, options (e.g., specific table_name string content), or the overall complexity of the .proto message definition that these particular messages possess.
We ensure the .proto input files are byte-for-byte identical for each buf generate run by using source control and checksums.
The BUF_HOME environment variable is set to a temporary, unique directory for each buf generate run to ensure that no cached artifacts or previous build states affect the new generation.