Non-deterministic file_..._proto_rawDesc byte array generation for messages using custom options (e.g., gen_bq_schema.bq_table) #1692

Open
@lazarillo

Description

What version of protobuf and what language are you using?
Go protobuf runtime: google.golang.org/protobuf v1.36.6
protoc-gen-go plugin: buf.build/protocolbuffers/go:v1.36.5
protoc-gen-bq-schema plugin: github.com/GoogleCloudPlatform/protoc-gen-bq-schema/v3 v3.1.0
buf CLI version: 1.55.1
OS: Mac darwin arm64
Go version: go1.24.4 darwin/arm64

What did you do?

We are using buf generate to generate Go protobuf code (.pb.go files) from our .proto definitions. Our setup includes a custom message option, (gen_bq_schema.bq_table), provided by the buf.build/googlecloudplatform/bq-schema plugin. This option is applied to multiple messages in our schemas.

We execute buf generate multiple times on an identical set of input .proto files within a fully isolated environment (using a temporary, unique BUF_HOME directory for each run).

Here is an example:
buf.yaml:

version: v1

buf.gen.yaml:

version: v2
plugins:
  - remote: buf.build/protocolbuffers/go:v1.36.5 # Plugin version
    out: .
    opt: paths=source_relative
  - remote: buf.build/googlecloudplatform/bq-schema:v3.1.0 # Plugin version
    out: .

./test/v0/test.proto:

syntax = "proto3";
package test.v0;
import "gen_bq_schema/bq_table.proto";
import "google/protobuf/timestamp.proto";
import "google/protobuf/struct.proto";
message TestMessage {
  option (gen_bq_schema.bq_table) = {
    table_name: "golden__order_group__contract_in_waiting"
  };
  // Add some common fields that might be present in affected messages
  string id = 1;
  google.protobuf.Timestamp event_time = 2;
  google.protobuf.Struct details = 3;
  optional string optional_field = 4; // Example of an optional field
}

Then run the Go code generation repeatedly and you'll see different results. Try something like:

# Create isolated temporary directories for each run
mkdir -p .tmp_out/run1 .tmp_out/run2
mkdir -p .tmp_buf_home_1 .tmp_buf_home_2
# Run buf generate for the first time
BUF_HOME="$(pwd)/.tmp_buf_home_1" buf generate --config buf.yaml --template buf.gen.yaml test --output .tmp_out/run1/
# Run buf generate for the second time
BUF_HOME="$(pwd)/.tmp_buf_home_2" buf generate --config buf.yaml --template buf.gen.yaml test --output .tmp_out/run2/
# Compare the generated files
diff -u .tmp_out/run1/test/v0/test.pb.go .tmp_out/run2/test/v0/test.pb.go

As you run a diff across those different runs, you'll see that the bytestring for the descriptor will change slightly.
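
To pin down where the two generated files diverge at the byte level, a small helper along these lines may be handy. This is only a sketch: the function name firstDiff and the command-line usage are made up for illustration and are not part of buf or protoc-gen-go.

```go
package main

import (
	"fmt"
	"os"
)

// firstDiff returns the offset of the first byte at which a and b differ,
// or -1 if they are identical. If one slice is a prefix of the other, the
// length of the shorter slice is returned.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	// Hypothetical usage: firstdiff FILE1 FILE2
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: firstdiff FILE1 FILE2")
		return
	}
	a, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	b, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	if off := firstDiff(a, b); off >= 0 {
		fmt.Printf("files differ at byte offset %d\n", off)
	} else {
		fmt.Println("files are identical")
	}
}
```

Pointing it at .tmp_out/run1/test/v0/test.pb.go and .tmp_out/run2/test/v0/test.pb.go would report the offset in the generated source text, not the offset inside the serialized descriptor.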

What did you expect to see?

I expected bit-for-bit identical test.pb.go output on every run.

What did you see instead?

The generated Go file test.pb.go is non-deterministic between runs. The only differences observed are consistently within the file_test_v0_test_proto_rawDesc byte array.

Specifically, a snippet from two different runs shows a difference in the ordering of two sub-fields within an embedded message that represents a Protobuf extension:

(The following is from Gemini... I had no easy way to validate since I wasn't going to check hex values.)

File 1 Snippet (from test.pb.go run1):

var file_test_v0_test_proto_rawDesc = string([]byte{
 // ... (preceding identical bytes) ...
 0xa2, 0x51, 0x04, 0x10, 0x01, 0x08, 0x01, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33,
})

File 2 Snippet (from test.pb.go run2, showing the difference):

var file_test_v0_test_proto_rawDesc = string([]byte{
 // ... (preceding identical bytes) ...
 0xa2, 0x51, 0x04, 0x08, 0x01, 0x10, 0x01, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33,
})

Analysis of the difference:

The 0xa2, 0x51 tag decodes to a protobuf field number 1300 with wire type 2 (length-delimited). This is characteristic of an extension field to a standard protobuf message (in this case, google.protobuf.MessageOptions).

The 0x04 indicates that the following 4 bytes are the content of this length-delimited field.

The difference lies in the order of 0x10, 0x01 and 0x08, 0x01.

0x10, 0x01 decodes to: Protobuf Field 2 (varint type), with value 1.

0x08, 0x01 decodes to: Protobuf Field 1 (varint type), with value 1.

Therefore, the issue is that two inner fields (Field 1 and Field 2, both with value 1) of the gen_bq_schema.bq_table extension are being serialized in a different order by protoc-gen-go between runs.
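
The decoding above can be checked by hand with the standard library alone. The sketch below assumes, as the analysis does, that the 4-byte payload holds only varint fields; decodeTag and decodeVarintFields are hypothetical helper names, not part of the protobuf runtime.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeTag reads a protobuf field tag (a varint) from the start of b and
// returns the field number, the wire type, and the number of bytes consumed.
func decodeTag(b []byte) (uint64, uint64, int) {
	tag, n := binary.Uvarint(b)
	return tag >> 3, tag & 7, n
}

// decodeVarintFields parses a payload that is assumed to contain only
// varint (wire type 0) fields and returns a map of field number to value.
// It stops at the first malformed or non-varint field.
func decodeVarintFields(payload []byte) map[uint64]uint64 {
	fields := make(map[uint64]uint64)
	for len(payload) > 0 {
		num, wireType, n := decodeTag(payload)
		if n <= 0 || wireType != 0 {
			break
		}
		payload = payload[n:]
		val, m := binary.Uvarint(payload)
		if m <= 0 {
			break
		}
		payload = payload[m:]
		fields[num] = val
	}
	return fields
}

func main() {
	// Outer tag from both snippets.
	num, wireType, _ := decodeTag([]byte{0xa2, 0x51})
	fmt.Printf("field %d, wire type %d\n", num, wireType)

	// Inner payloads from run1 and run2.
	run1 := decodeVarintFields([]byte{0x10, 0x01, 0x08, 0x01})
	run2 := decodeVarintFields([]byte{0x08, 0x01, 0x10, 0x01})
	fmt.Println(run1, run2)
}
```

Both payloads decode to the same field set (field 1 = 1, field 2 = 1), so the two descriptors are semantically equivalent and differ only in the order the fields were written.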

Anything else we should know about your project / environment?

This non-determinism only occurs on a subset of our messages that utilize the (gen_bq_schema.bq_table) extension. Many other messages using the same (gen_bq_schema.bq_table) option, generated in the same overall buf generate command, produce fully deterministic rawDesc byte arrays.

This suggests that the non-determinism might be triggered by a specific interaction with other fields, options (e.g., specific table_name string content), or the overall complexity of the .proto message definition that these particular messages possess.

We ensure the .proto input files are byte-for-byte identical for each buf generate run by using source control and checksums.
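
For anyone reproducing this, the input check can be approximated with a few lines of Go. This is a minimal sketch rather than our actual tooling: the digest helper is a hypothetical name, and the file paths are whatever is passed on the command line.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// digest returns the hex-encoded SHA-256 checksum of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Print one checksum per file named on the command line,
	// e.g. the .proto inputs before each buf generate run.
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s  %s\n", digest(data), path)
	}
}
```

Identical checksums before both runs rule out input drift as the source of the difference.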

The BUF_HOME environment variable is set to a temporary, unique directory for each buf generate run to ensure that no cached artifacts or previous build states affect the new generation.
