Artwork: Susan Haejin Lee
Michael Hausenblas // Solution Engineering Lead, Amazon Web Services
The ReadME Project amplifies the voices of the open source community: the maintainers, developers, and teams whose contributions move the world forward every day.
Being able to observe how your software performs has always been important for developing software that operates efficiently, but perhaps never more so than now, with the proliferation of microservices and distributed systems.
In the case of a monolithic application running on a single machine, one can more or less get away with using logs alone for troubleshooting. In a distributed architecture, however, the question is not just what went wrong, but where? Which of the 20 microservices along the request path servicing an HTTP request caused the issue? The way to find this answer is to employ more signal types beyond logs: metrics and traces.
- Metrics offer aggregate health and performance insights into our application 
- Traces tell us where issues are occurring 
Taken together, logs, metrics, and traces can give us insight into our distributed applications.
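To make the distinction between the three signal types concrete, here is a minimal sketch in plain Python. It uses only the standard library and is not the OpenTelemetry API. The service names, the `Span` class, and the helper functions are hypothetical and exist only for illustration. It simulates single requests passing through three services and shows the kind of question each signal can answer.

```python
import logging
import uuid
from collections import Counter
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("demo")

# Hypothetical service names along one request path.
REQUEST_PATH = ["frontend", "cart", "payment"]


@dataclass
class Span:
    """One unit of work in one service, tied to a request by trace_id."""
    trace_id: str
    service: str
    ok: bool


def handle_request(failing_service=None):
    """Simulate one request; return the spans recorded along its path."""
    trace_id = uuid.uuid4().hex
    spans = []
    for service in REQUEST_PATH:
        ok = service != failing_service
        spans.append(Span(trace_id, service, ok))
        if not ok:
            # Log: a discrete event describing what went wrong.
            log.error("trace=%s service=%s request failed", trace_id, service)
            break  # downstream services are never reached
    return spans


def error_counts(traces):
    """Metric: an aggregate count of request outcomes across many requests."""
    counts = Counter()
    for spans in traces:
        counts["error" if any(not s.ok for s in spans) else "ok"] += 1
    return counts


def failing_service_of(spans):
    """Trace: walk one request's spans to find where it failed, if anywhere."""
    return next((s.service for s in spans if not s.ok), None)


traces = [handle_request(), handle_request("payment"), handle_request()]
print(error_counts(traces))           # how healthy are we overall?
print(failing_service_of(traces[1]))  # where did this request fail?
```

In this toy setup the log line says what happened, the counter says how often, and the span list says where. A real system would collect all three through an instrumentation library instead of hand-rolled helpers.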
While observability can provide insights you might otherwise miss, not all observability is created equal. Open source observability (OSO) tools and specifications can deliver the information you need without the downsides of more traditional, proprietary tools, such as vendor lock-in and limited access to information.
In this Guide, you will learn:
- Why open source observability benefits us all 
- The importance of open standards 
- Best practices for overcoming common open source observability challenges 
What is open source observability?
The Cloud Native Computing Foundation (CNCF), home of open source projects such as Kubernetes, OpenTelemetry, and Prometheus, among many others, defines observability as:
... the capability to continuously generate and discover actionable insights based on signals from the system under observation. In other words, observability allows users to understand a system’s state from its external output and take (corrective) action.
Based on this definition of observability by the CNCF, we can define open source observability as anything that delivers actionable insights via open source software, and that is based on open standards, such as OpenTelemetry, for collecting signals like logs, metrics, and traces.
While this definition of OSO focuses on open source tooling and standards, I will point out that it is only of secondary relevance whether you run the open source software yourself or outsource the operation to a vendor or cloud provider that offers APIs and web-based UIs. In fact, as we’ll discuss later, outsourcing this aspect may even be advisable. Beyond that, I think it’s also important to clarify a key observation, which may not be so obvious if you’re new to the space: Open source is not a business model. Rather, it is a way to collaborate and distribute software. Again, this point will become more important as we go on, because the ways that open source projects choose to collaborate and govern themselves can have downstream effects on you as a user.
First, let’s have a look at where OSO excels. Then we can talk about its limitations and I will share a few insights that I’ve gathered over the past decade working at several open source start-ups, Red Hat, and now at AWS on a service team that offers open source software as a service.
Open source observability is good for all of us
Larger enterprises that have workloads both on-premises and across one or more cloud providers often bet on open source as a strategy to minimize vendor lock-in and to gain leverage in negotiations with vendors. This is certainly the case when it comes to OSO. With an OSO standard and toolset, companies can run the OSO stack themselves, which means they are less dependent on a specific vendor. In addition, the open standard ensures that they will not need to reinstrument or reconfigure their code if they do switch vendors. Together, this puts OSO users in a better position when negotiating pricing with vendors.
While open source software and standards can offer some universal benefits, not all open source is equal. Let’s have a look at a cloud native open source stack that you might end up choosing, and examine how this is the case:
- Linux and Kubernetes as the operating systems. 
- Jaeger, Prometheus, ClickHouse, and Grafana as the observability backends/frontends. 
- OpenTelemetry to instrument your apps, and to collect and ingest logs, metrics, and traces into backends. 
Now, let’s have a closer look at the tools in our example OSO stack. Although all of the above-mentioned projects and tools are available as open source, they differ in some details:
- Some of the projects, such as Kubernetes and Prometheus, are governed by a vendor-neutral foundation, while others, like ClickHouse and Grafana, are mostly owned or driven by a single vendor. 
- Some, such as Kubernetes and OpenTelemetry, have a wide variety of vendor contributors, while others, such as Prometheus and Jaeger, value individual contributors over corporate contributors. 
There are pros and cons for each of the above approaches, so I won’t suggest that one way is objectively better than another. Still, my recommendation is to be aware of the different ways open source collaboration and distribution can take place and how they might affect your business downstream.
One aspect that sometimes is overlooked in the OSO context is the role that open standards play, so let’s move on to this topic.
The role of open standards
I started as a Java developer some 20-odd years ago, mostly working with XML (yes, that was the hot thing back then 😉) in the multimedia domain. One of the standards I had the questionable pleasure to work with was MPEG-7—a universal standard to describe any multimedia content. Learning about MPEG-7 was super hard, though, because it wasn’t an open standard. Everything was behind paywalls. Without buying the reference book on the topic, I’d argue, one would be unable to produce anything of value. Contrast this with the plethora of IETF RFCs or W3C recommendations, all freely available online. Learning how to use those standards is a relative breeze. Thankfully, information about the OSO space is similarly in a good place, with all the relevant standards available in the open.
Beyond easy access to information, open standards also offer a variety of other benefits. To give you an example, in August 2022 I ran a survey on the adoption of OpenTelemetry, the open standard for representing telemetry signals and their processing. A total of 120 people responded, offering some (not too surprising) insights. The majority stated that they were considering adopting OpenTelemetry, or were already in the process of doing so, because it is an industry standard.
For end-users, a standard like OpenTelemetry offers interoperability between vendors, making it easier to switch providers and avoid vendor lock-in. For vendors, a standard like OpenTelemetry is also a win because telemetry—that is, the collection and processing of signals—is now table stakes. Vendors can compete on the consumption of and interaction with metrics, logs, and traces via back-end and front-end interfaces, rather than on the undifferentiated space of instrumentation and agents.
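The interoperability argument can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the OpenTelemetry SDK: the `Exporter` protocol, the two backend classes, the output formats, and `record_checkout` are all hypothetical names chosen for this example. It shows that when instrumentation targets a shared interface, swapping the backend means passing in a different object instead of rewriting the instrumentation.

```python
import json
from typing import Protocol


class Exporter(Protocol):
    """Hypothetical stand-in for a standard, vendor-neutral export interface."""

    def export(self, name: str, value: float, attributes: dict) -> str: ...


class VendorAExporter:
    """Imaginary backend that accepts JSON documents."""

    def export(self, name, value, attributes):
        payload = {"metric": name, "value": value, "attrs": attributes}
        return json.dumps(payload, sort_keys=True)


class VendorBExporter:
    """Imaginary backend that accepts a line-oriented text format."""

    def export(self, name, value, attributes):
        labels = ",".join(f'{k}="{v}"' for k, v in sorted(attributes.items()))
        return f"{name}{{{labels}}} {value}"


def record_checkout(exporter: Exporter, duration_ms: float) -> str:
    """Application instrumentation: written once against the interface."""
    return exporter.export("checkout_duration_ms", duration_ms, {"service": "cart"})


# Switching vendors changes only which exporter is passed in.
for exporter in (VendorAExporter(), VendorBExporter()):
    print(record_checkout(exporter, 12.5))
```

The application function `record_checkout` never mentions a vendor. In this sketch that is the property a shared standard buys the end user, while vendors remain free to differentiate in what they do with the data once it arrives.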
So, OSO is awesome and there is not a single downside to it, right? Well, not so fast.
Open source observability challenges
OSO has a number of desirable properties, as we’ve outlined, but you’re well served to have the complete picture. First, you should understand that in open source you typically deal with an asymmetric relationship between producers and consumers. In other words, many people benefit from using open source, but far fewer actively contribute to it. As a user of open source software, don’t expect free support. Make sure that you can either get paid support from a vendor in the space or that you’re ready and willing to invest in building up that engineering muscle in-house. Second, keeping up with release cycles presents a challenge. Open source projects in the cloud native and OSO world can move especially fast, and you want to make sure that you don’t fall behind.
The elephant in the room around OSO, however, might be the poor user experience (UX), as pointed out by Vlad Ionescu. Commercial vendors offering proprietary software control the stack end to end, and usually have an advantage when it comes to UX. Yes, you may pay a premium, but you also get things that work out of the box and are tightly integrated. This is not to say that it’s impossible to build a well-integrated and usable offering based on open source. For example, both SigNoz and Uptrace are building OSO solutions on top of the popular columnar database ClickHouse using OpenTelemetry, providing an improved UX in the process.
Lessons learned and good practices
So, what’s the takeaway?
First off, there’s no reason to be leery of open source: Nowadays, open source is widely adopted, spanning everywhere from mobile devices to server operating systems to OSO-based SaaS offerings.
When you decide to go all-in on open source, however, be aware of the governance models and licenses of the OSO project(s) you pick to build your solution. These characteristics can affect how the project moves forward, what features will be prioritized, and how you can use it internally. Finally, consider off-loading operations to managed offerings, where available. This allows you to reap the benefits of open source while freeing you from the burdens of maintenance.
Adopting open standards comes with its own set of considerations but, if done properly, it provides a sustainable option for observability that you can use in any and all environments.
Michael Hausenblas is the OpenTelemetry product owner and solution engineering lead in the AWS Open Source Observability Service Team. Before Amazon, Michael worked at Red Hat, Mesosphere (now D2iQ), MapR (now part of HPE), and prior to that he spent a decade in applied research.