Decide what backwards compatibility means · open-telemetry/opentelemetry-python · Discussion #1689

ocelotl
Mar 11, 2021

In the last SIG we raised this philosophical question, what does backward compatibility mean?

SemVer says this:

Major version X (X.y.z | X > 0) MUST be incremented if any backwards incompatible changes are introduced to the public API.

Since it mentions "the public API", in my opinion, this means that the major version MUST be increased if we do something like removing a public class, or changing the signature of a public method.

I think defining backwards incompatibility in this way has the advantage of being deterministic, there is a clear, limited definition of what a backwards incompatible public API change means, a removal of a public API symbol or the change of a function signature (maybe this definition includes something else I am not considering right now).

This being said, other folks have a different opinion. From what I understood in the SIG (please correct me if I am wrong), they consider a backwards incompatible change some change that can cause application code to break, even if it does not change the public API itself. This definition of a backwards incompatible change includes changes in the code behavior.

I understand how this definition is convenient for the end user, if a change in the behavior some OpenTelemetry Python component can break application code, increasing the major version number is a good way of letting the user know that, which is the same thing we do for public API changes, we let the user know their application code may break.

What I find a bit inconvenient with this definition is that we would be using code behavior as criteria and code behavior is less well defined and less deterministic. In theory, any bug fix can be considered a change in behavior and can cause application code to break, right?

I imagine this problem has happened before to other Python developers, so maybe if we take a look at the release history of some Pyhton packages we can get some inspiration to solve our problem as well? 🤷‍♂️

Anyways, please leave your comments below 😄

Replies: 8 comments 5 replies

ehiggs
Mar 11, 2021

Coming from compiled languages: major.minor.patch;

patch -> can replace the shared library with no changes to client code. (ABI and API are held)
minor -> requires recompilation (e.g. struct sizes changed in types not held in opaque pointers) (ABI broken; API held)
major -> requires a client code change. (ABI and API are broken)

0 replies

owais
Mar 12, 2021
Collaborator

As a consumer I consider anything a breaking change that will break my application. IMO behavioral changes are worse than exceptions because they are silent and may impact application in ways that are critical but completely silent. Users could go days or weeks without noticing that their systems are misbehaving. For this reason, I always lock exact versions of my deps and always have reproducible builds. Any random person publishing code somewhere unrelated to my project should never result in my build changing in any way. IMO auto minor/patch version updates are highly overrated and cause a lot more pain than they solve. I wish we could enforce use of lock files and exact dep versions for our users :(

As an OpenTelemetry author, I think if we make any changes that break collection or generation of telemetry data in any way, it should be considered a breaking change. If a change breaks relation between spans by breaking context propagation for example without breaking APIs, I think it should be a breaking change.

I totally see your point of this being harder to define but I think as a SIG we are capable of making judgement calls for such rare cases instead of ignoring an entire set of cases that may break telemetry in customer apps on a simple re-deploy or seemingly innocent dependency upgrade. May be an infinitely expressive type system might have helped we don't have that :)

1 reply

@aabmass

aabmass Mar 19, 2021
Maintainer

For this reason, I always lock exact versions of my deps and always have reproducible builds. Any random person publishing code somewhere unrelated to my project should never result in my build changing in any way. IMO auto minor/patch version updates are highly overrated and cause a lot more pain than they solve. I wish we could enforce use of lock files and exact dep versions for our users :(

+1 it would be great if users used lock files more. I suppose if we take breaking changes very seriously and bump major versions frequently, it would achieve similar effect even if users have ~= in requirements but aren't using locking. I don't really think users care about major version bumps that much, as long as the breaking changes are well documented and they don't need to rewrite their whole code base.

As an OpenTelemetry author, I think if we make any changes that break collection or generation of telemetry data in any way, it should be considered a breaking change. If a change breaks relation between spans by breaking context propagation for example without breaking APIs, I think it should be a breaking change.

+1, breaking telemetry in any way should be a breaking change and this is being discussed at the spec level so we shouldn't need to define it eventually. I think such breaking changes will mostly come from instrumentation/exporter/propagation packages, which we can happily bump major versions for.

ocelotl
Mar 16, 2021
Author

Here is more information.

0 replies

codeboten
Mar 17, 2021

Here's the spec doc around backwards compatibility: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md

0 replies

lzchen
Mar 18, 2021
Maintainer

Okay let's scope this issue down to specifically just "are changes in behavior breaking changes"? We have all already agreed that we must maintain semver and making changes/removal of API needs to increase in major version.

In theory, any bug fix can be considered a change in behavior and can cause application code to break, right?

@ocelotl Brings up a good point here, and a specific instance can be seen in this pr. This change supposedly fixes an issue where the TraceProvider is instantiated somewhere else and in the users current application, they cannot change it (so it will be a no-op for the rest of the life cycle). Technically if we merge this PR it will solve a bug, but it will also "change" the behavior of this specific edge case. So where is the line that we draw?

IMO, there will always be bugs and undefined behavior that users will find that we won't be able to catch right away. What we should promise users is that they should only take dependencies on behavior/components that are documented in our API surface. If they take a dependency on a "bug", we should not need to promise them that these will not break. One could argue "well what is a bug and what is a breaking behavior change"? This is why these cases will probably require some judgement on our part as a community. So our official message should be "users should not take dependencies on functionality and behavior that is not documented in our public APIs, but if they do, we do not have a promise that there will be stability guarantees.
Also, Dot net SIG has also taken a similar stance.

With that being said, we should make the experience and transition as smooth as possible for customers. If we are fixing major bugs that causes behavior changes, we should do our best to be as transparent as possible, to allow users time to upgrade and adjust to the change.

This is an important issue to flesh out before our 1.0.0 so please comment on this if you have thoughts.

2 replies

@aabmass

aabmass Mar 19, 2021
Maintainer

Playing devil's advocate, if a user depends on a security vulnerability and we fix it, we have broken their code. I think it mostly comes down to documented and intended behavior (our docs + OTel spec).See "Patch version MUST be incremented if only backwards compatible bug fixes are introduced. A bug fix is defined as an internal change that fixes incorrect behavior."

@lzchen

lzchen Mar 19, 2021
Maintainer

I think it really comes down to the bug in question, which is why we should not be so strict on what we promise for breaking changes. If we evaluate this on a case by case basis, it is more feasible than giving a blanket statement to users that we will NEVER break them (behaviour or not) without increasing the major version. I feel like this is an impossible task (as mentioned by @ocelotl below).

owais
Mar 18, 2021
Collaborator

I agree. Also, from a users perspective, if I take a dependency on a library, I expect it to work and not break my apps (with or without crashes) months or years from now. If updating breaks my apps by introducing exceptions or behavioral change that fundamentally alter how the library functions or the data it produces then I'd consider that a breaking change.

0 replies

ocelotl
Mar 19, 2021
Author

I consider that for practical reasons it may not be convenient to stick to the very strict and limited definition of breaking changes that I suggested. In my opinion, the ideal scenario would be to be able to guarantee the user "you application code won't break when you use a version whose major number is the same as the one you have been using". This is also impossible to do, in my opinion, because application code is outside our control.

What I consider critical from what @lzchen suggests is to only increase the major version number for a behavioral change when the behavioral change makes us introduce a change in documentation. I understand that documentation is in itself a commitment to our users, not as strictly defined as an API but a commitment still.

0 replies

aabmass
Mar 19, 2021
Maintainer

One really important use case to remember is library authors who instrument their library directly with opentelemetry-api. In their setup.py they will probably put opentelemetry-api~=1.0 to leave a wide range for pip to resolve dependencies on. If we bump to 2.0 in the next few months, it could make a be a big problem for interoperability between libraries.

E.g. libfoo depends on opentelemetry-api~=1.0 and libbar depends on opentelemetry-api~=2.0. This is huge problem because now users can't install those two packages together because we were too hasty doing a major version bump.

2 replies

@owais

owais Mar 19, 2021
Collaborator

Valid concern but should we discuss it here? It was (correctly) called out in the SIG meeting by @lzchen that this discussion should only focus on whether a behavioral change is a breaking change or not and if it is, then in what situations. Probably we should just focus on resolving that.

That said, to answer your question, if we release 2.0 and libbar takes a dependency on 2.0, then it is libbar's responsibility to also bump it's own major version so people who use libbar with otel 1.0 can stay on libbar's previous version. Or may be I didn't understand the problem correctly?

@aabmass

aabmass Mar 22, 2021
Maintainer

focus on whether a behavioral change is a breaking change or not and if it is, then in what situations.

My point is that the API package is a bit of a special case and we need to be more flexible about keeping it at 1.x for a while because it is a common dependency for all instrumented code. If you want to bump the major version of e.g. the zipkin exporter every release to avoid the "auto minor/patch version updates are highly overrated and cause a lot more pain than they solve" thing, that is fine by me – The user just upgrades the zipkin exporter and its no big deal, doesn't effect any other dependencies.

Decide what backwards compatibility means #1689

Uh oh!

ocelotl Mar 11, 2021

Replies: 8 comments · 5 replies

Uh oh!

ehiggs Mar 11, 2021

Uh oh!

owais Mar 12, 2021 Collaborator

Uh oh!

aabmass Mar 19, 2021 Maintainer

Uh oh!

ocelotl Mar 16, 2021 Author

Uh oh!

codeboten Mar 17, 2021

Uh oh!

Uh oh!

lzchen Mar 18, 2021 Maintainer

Uh oh!

aabmass Mar 19, 2021 Maintainer

Uh oh!

lzchen Mar 19, 2021 Maintainer

Uh oh!

Uh oh!

owais Mar 18, 2021 Collaborator

Uh oh!

ocelotl Mar 19, 2021 Author

Uh oh!

Uh oh!

aabmass Mar 19, 2021 Maintainer

Uh oh!

Uh oh!

owais Mar 19, 2021 Collaborator

Uh oh!

aabmass Mar 22, 2021 Maintainer

ocelotl
Mar 11, 2021

Replies: 8 comments 5 replies

ehiggs
Mar 11, 2021

owais
Mar 12, 2021
Collaborator

aabmass Mar 19, 2021
Maintainer

ocelotl
Mar 16, 2021
Author

codeboten
Mar 17, 2021

lzchen
Mar 18, 2021
Maintainer

aabmass Mar 19, 2021
Maintainer

lzchen Mar 19, 2021
Maintainer

owais
Mar 18, 2021
Collaborator

ocelotl
Mar 19, 2021
Author

aabmass
Mar 19, 2021
Maintainer

owais Mar 19, 2021
Collaborator

aabmass Mar 22, 2021
Maintainer