Browser Tracing Roadmap #17741

Lms24 started this conversation in RFCs

Hey folks, this discussion is a request for comments and feedback on how we envision browser (and in a broader sense frontend-) tracing. Things discussed in this post might or might not make it into the Sentry SDKs but this discussion should serve as a common ground. We’d love to hear your thoughts and concerns!

Fair warning: This post touches a lot of different parts of tracing in the browser. This is on purpose because we have a lot of ideas for improvements in this space. Concrete projects or changes within this doc will of course be written down in greater detail in issues or separate RFCs (depending on applicability).

Apologies for the length in advance! If you're only interested in the proposed improvements, skip ahead to Future Plans.

Current Status and Pain Points

Let's take a look at the current state of all things browser tracing and what's cumbersome and bothering us.

Tracing Model

Ever since we introduced v8 of the SDK, we made the decision to keep one traceId around for the entire duration of users being on one route. On a navigation to another route, or a hard page load, we’d start a new trace.

image

Our develop docs contain more details on the current tracing model.

While today’s browser tracing model works well in most cases, it surfaces some clear pain points all over the Sentry product surface as well as on a more conceptual level:

  1. Long-running traces are not displayed well in the trace view. Navigating a waterfall is neither easy nor intuitive, especially because the vast amount of time between spans where nothing happens is displayed on the linear time axis.
  2. Related, the trace duration became largely meaningless. This surfaces not only in the waterfall view but also in explore and any other place where we show trace durations or base calculations or deductions on it.
  3. Traces with multiple root spans are not expected on a conceptual level. A trace must have a clear hierarchy and multiple root spans violate this. While our trace view is able to display multiple root spans, things like trace duration, a lot of empty space between individual root span trees, etc. are still sub-optimal. We could (should) fix this in the product but the question remains: Should we rather not send multiple root spans in the first place?

SSR Traces

Whenever we start a trace in the backend and propagate the trace data via <meta> tag to the browser, we continue the trace in the browser as a child of the server trace. This poses a lot of problems some of which we tried to address by fixing the symptoms:

  1. Conceptually, as well as time-data-wise, the trace starts in the browser, with the browser making the first request to the server. Also it makes more sense intuitively, given that the browser starts the entire lifecycle (e.g. by users opening the page). Technically, the trace starts on the server though. We tried to reparent and emulate the conceptual model in the Sentry UI for a while but it caused more problems than it solved and hence we’re reverting back to the more "honest" (but harder to grasp) technically correct trace, where the trace root is the http.server SSR span.
  2. The technically correct trace is still flawed: As a result of the SSR time being much shorter than the entire pageload time, the child pageload span sub-tree has a longer duration than its parent span. This violates the trace definition (as much as, or more than, the multiple root spans mentioned above).
  3. The combination of the technically correct trace display + multiple root spans on one page will cause additional confusion because even interaction spans in the browser are child spans of the SSR server span, despite them happening minutes (hours) after the initial SSR cycle.
  4. Because we strictly continue traces, we also strictly continue the sampling decision. If the HTML with injected <meta> tags is cached (e.g. by ISR or a CDN between application and users), the sampling decision is still carried over to the client.
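
To make the caching problem from point 4 more tangible, here is a minimal sketch. It assumes the <meta> tag content uses the same `traceId-spanId-sampled` format as the sentry-trace header; `parseSentryTraceValue` and `PropagatedTrace` are hypothetical names for this illustration, not the SDK's actual parsing code.

```typescript
// Hypothetical shape for illustration only.
interface PropagatedTrace {
  traceId: string;
  parentSpanId: string;
  // undefined means no sampling decision was propagated
  sampled: boolean | undefined;
}

// 32 hex chars trace id, 16 hex chars span id, optional 0/1 sampled flag
const SENTRY_TRACE_PATTERN = /^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$/;

function parseSentryTraceValue(content: string): PropagatedTrace | undefined {
  const match = SENTRY_TRACE_PATTERN.exec(content.trim());
  if (!match) {
    return undefined;
  }
  const traceId = match[1];
  const parentSpanId = match[2];
  const sampledFlag: string | undefined = match[3];
  if (!traceId || !parentSpanId) {
    return undefined;
  }
  return {
    traceId,
    parentSpanId,
    sampled: sampledFlag === undefined ? undefined : sampledFlag === "1",
  };
}

// A CDN serves the same cached HTML (and therefore the same meta content)
// to every visitor, so each of them inherits the same trace id and the
// same sampling decision when the trace is strictly continued.
const cachedMetaContent = "0123456789abcdef0123456789abcdef-0123456789abcdef-1";
const visitorA = parseSentryTraceValue(cachedMetaContent);
const visitorB = parseSentryTraceValue(cachedMetaContent);
console.log(visitorA?.traceId === visitorB?.traceId, visitorA?.sampled, visitorB?.sampled);
```

With strict continuation, a single cached "sampled" flag ends up deciding for every visitor who receives that cached page.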

Sampling Consistency

Related to the last problem, sampling consistency is a larger problem:

Currently, by default, whenever we start a new trace (i.e. right now on navigation or page load), we make a new sampling decision. This is not always what you want, especially not in the frontend. Some users want a more fine-grained option (within the same page); other users want a consistent sampling decision for the entire user journey.

With the Linked Traces project (more on that later), we also introduced consistentTraceSampling which is a first response towards enabling longer-lived sampling decisions across subsequent traces. This works well but is hard to find, not super configurable and not always fine-grained enough.

Request Spans

Today, http.client request spans for fetch and XMLHttpRequest requests are started and sent only if a parent span is active. In the default configuration this means that any request happening while no pageload or navigation span is active is not tracked with a span (tracing headers are still attached). This causes a lot of confusion for users who don’t understand why some requests are not traced with a span, while the backend request is traced.

Historically, we couldn’t send http.client root spans because their name wouldn’t be low-cardinality enough to fit our transaction name requirements.

Bundle Size and Modularity

All tracing functionality in the browser SDKs is added to the SDK via one integration, browserTracingIntegration. This has some advantages:

  • If you don’t want tracing functionality, there’s no bundle size impact for tracing code. It’s simply not included in your JS bundle(s).
  • If you do want tracing, it’s as simple as adding browserTracingIntegration to your SDK. The exception here is meta frameworks, where for the best OOTB experience, we already include browserTracingIntegration by default.
  • For meta frameworks or the best possible bundle size reduction in total, users can configure our bundler plugins to tree-shake out even more tracing-related code.

However, there are still some considerable disadvantages with the one-integration approach:

  1. Either you get all tracing functionality or none. There’s no in-between. By default, all telemetry is collected (which is often an overwhelming amount and contains a lot of noise).
  2. There’s no in-between in terms of bundle size. While you can disable e.g. the collection of resource spans, you can’t tree-shake out the respective code.
  3. Even worse, Tracing without spans (performance) is coupled to browserTracingIntegration. So even if you only want backend and frontend errors connected but no spans, you have to pull in the entire bundle size weight of browserTracingIntegration.

Future Plans

Now that I ranted a bunch about the current pain points, let’s address how we plan to fix or at least improve them.

SSR Traces

Right now, we always continue the trace and the sampling decision of the server-side SSR trace when we find <meta> tags with Sentry tracing data. This was a semi-conscious decision at the time but we realized an important thing here: It is "just" SDK behavior and we can change this at any time. Therefore, we propose to do the following instead:

  • We keep both traces, the server side http.server span tree and the pageload trace separate. We no longer continue the trace on the client. This means, server- and client traces will have separate ids (continue reading before you protest 🙂)
  • Instead, the pageload span tree is its own trace, meaning the pageload span is a root span. Its browser.request child span will have a span link linking to the http.server trace root span. Span links are designed for this very purpose: Connect spans (traces) that are related (causally) but do not fit into the strict trace hierarchy. Since our SDK supports them and we can ingest them (in a basic form at least), let’s make use of them.
  • By default, we continue the sampling decision of the SSR trace on the client. This means by default, things are still nicely connected as today and we can ensure that both server- and client-side are consistently sampled and ingested. For cases like ISR or CDNs caching rendered pages, users can opt out of the continuous sampling decision.

⇒ We can take an active stance on how SSR traces should be handled. We do not have to adhere to strict trace continuation, as it clearly doesn’t make sense.

Before:
image

After:
image
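
As a rough sketch of the proposed behaviour (not the actual SDK implementation), the snippet below models the pageload as its own trace whose browser.request child links to the server span, with the SSR sampling decision continued unless users opt out. All type names, the `startPageloadTrace` function and the `sketch.link.kind` attribute are assumptions made up for this illustration.

```typescript
interface SpanLinkSketch {
  context: { traceId: string; spanId: string };
  attributes: Record<string, string>;
}

interface SpanSketch {
  op: string;
  traceId: string;
  spanId: string;
  parentSpanId: string | undefined;
  links: SpanLinkSketch[];
}

interface ServerTraceData {
  traceId: string;
  spanId: string;
  sampled: boolean;
}

function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

function startPageloadTrace(
  server: ServerTraceData | undefined,
  continueSsrSampling: boolean,
  sampleRate: number,
): { pageload: SpanSketch; browserRequest: SpanSketch; sampled: boolean } {
  // The pageload span gets a fresh trace id: it is a root span, not a child
  // of the http.server span.
  const traceId = randomHex(32);
  const pageload: SpanSketch = {
    op: "pageload",
    traceId,
    spanId: randomHex(16),
    parentSpanId: undefined,
    links: [],
  };
  const browserRequest: SpanSketch = {
    op: "browser.request",
    traceId,
    spanId: randomHex(16),
    parentSpanId: pageload.spanId,
    links: [],
  };
  if (server) {
    // Causal relation to the SSR trace is expressed as a link, not as parentage.
    browserRequest.links.push({
      context: { traceId: server.traceId, spanId: server.spanId },
      attributes: { "sketch.link.kind": "ssr" }, // hypothetical attribute name
    });
  }
  // Default: inherit the server's decision. Opt-out (e.g. for CDN-cached
  // pages): make a fresh decision on the client.
  const sampled = server && continueSsrSampling ? server.sampled : Math.random() < sampleRate;
  return { pageload, browserRequest, sampled };
}
```

Calling `startPageloadTrace(server, false, 0.1)` would model the opt-out for cached pages: the link to the server span is still there, but the client decides on its own whether to sample.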

Span Links instead of Long Traces

As already mentioned, our SDK supports span links and we can ingest and display them (in a basic form) in Sentry. So, let’s make use of them more! We already send a linked list of traces by including a sentry.previous_trace span link on the root span of the currently active trace. This works somewhat well today but is limited by sampling consistency and the capabilities of the product (UI- and data-wise). If we can address all the SDK-external limitations, we can and should send more links instead of traces with multiple root spans.


⇒ The goal is: Make traces more organic, avoid trace violations, augment trace connections via span links

To get to this point with good UX, here’s a (probably incomplete) list of things we need to change:

  • [Product] Proper support for span links in our storage layer, so that we can query for spans with specific links
  • [Product] Proper UI navigation to navigate from one span to a span it links to.
  • [Product] Augmented UI navigation that allows easy navigation between:
    • The previous/next traces
    • The initial pageload span of the user journey
    • (The last navigation span of the user journey)
    • The SSR span (if available)
  • [SDK] Proper sampling behaviour
  • [SDK] Each root span starts its own, new trace (as opposed to the long-lived traceId per route we have today)
  • [SDK] Browser root spans properly link to their predecessors (previous trace, initial pageload, last navigation, SSR trace)
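
To illustrate the [SDK] items and why sampling consistency matters for them, here is a small in-memory model. It is a sketch under simplifying assumptions: trace ids are plain counters, each root span only links to its direct predecessor, and `createJourneyRecorder`, `collectJourney` and `RootSpanSketch` are hypothetical names, not SDK or product APIs.

```typescript
interface RootSpanSketch {
  traceId: string;
  name: string;
  // models a "previous trace" span link on the root span
  previousTraceId: string | undefined;
}

function createJourneyRecorder() {
  // Only sampled root spans reach the (simulated) backend.
  const ingested = new Map<string, RootSpanSketch>();
  let previousTraceId: string | undefined;
  let nextId = 1;
  return {
    ingested,
    startRootSpan(name: string, sampled: boolean): RootSpanSketch {
      // Every root span starts its own trace and links to its predecessor.
      const span: RootSpanSketch = { traceId: `trace-${nextId++}`, name, previousTraceId };
      previousTraceId = span.traceId;
      if (sampled) {
        ingested.set(span.traceId, span);
      }
      return span;
    },
  };
}

// Walk the links backwards until a trace is missing (e.g. because it was
// not sampled) or the start of the journey is reached.
function collectJourney(ingested: Map<string, RootSpanSketch>, fromTraceId: string): string[] {
  const names: string[] = [];
  let current = ingested.get(fromTraceId);
  while (current) {
    names.unshift(current.name);
    current = current.previousTraceId !== undefined ? ingested.get(current.previousTraceId) : undefined;
  }
  return names;
}

const consistent = createJourneyRecorder();
consistent.startRootSpan("pageload /", true);
consistent.startRootSpan("navigation /cart", true);
const lastConsistent = consistent.startRootSpan("click checkout", true);
console.log(collectJourney(consistent.ingested, lastConsistent.traceId));

const withHole = createJourneyRecorder();
withHole.startRootSpan("pageload /", true);
withHole.startRootSpan("navigation /cart", false); // unsampled: the chain breaks here
const lastWithHole = withHole.startRootSpan("click checkout", true);
console.log(collectJourney(withHole.ingested, lastWithHole.traceId));
```

In the second journey the walk stops after one span because the middle trace was never ingested, which is the kind of hole that proper sampling behaviour and links to additional anchors (initial pageload, SSR trace) are meant to mitigate.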

Always send request spans

Today, the Sentry product is in a much better state to deal with child-less, single http.client (root) spans. Soon, it will be in an even better state. This means, we will be able to lift the restriction of only sending request spans as children of active root spans. Instead, we will send request spans whenever fetch/XHR requests are made. We can introduce additional APIs to configure this behaviour for anyone who has span quota concerns.

⇒ This we should just do, it really is a no-brainer
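
The difference between today's behaviour and the proposal boils down to one decision per request. A minimal sketch, assuming a hypothetical `RequestSpanMode` option (the actual configuration API is not decided yet):

```typescript
// Hypothetical option values for illustration only.
type RequestSpanMode = "children-only" | "always";

interface RequestTracingPlan {
  attachTracingHeaders: boolean;
  createSpan: boolean;
  spanIsRoot: boolean;
}

function planRequestTracing(hasActiveParentSpan: boolean, mode: RequestSpanMode): RequestTracingPlan {
  const createSpan = hasActiveParentSpan || mode === "always";
  return {
    // Headers are attached either way, which is why the backend request
    // shows up as traced even when no client span exists.
    attachTracingHeaders: true,
    createSpan,
    // Without an active parent, the http.client span becomes a root span.
    spanIsRoot: createSpan && !hasActiveParentSpan,
  };
}

// Today: a request fired after the pageload span ended gets no span.
console.log(planRequestTracing(false, "children-only"));
// Proposed: the same request is sent as a standalone http.client root span.
console.log(planRequestTracing(false, "always"));
```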

Choice over Sampling Consistency

We should provide easy alternatives to our current sampling decision semantics: Users should have the choice how long a sampling decision lasts in browser (frontend) SDKs:

  • Have a CDN set up to cache initial page requests? Opt out of the SSR trace sampling decision being continued on the client!

  • Decide how long the sampling decision should remain consistent vs. when a new one is made:

    • consistentTraceSampling: false - sampling decision is made on per-trace basis (default today)
    • consistentTraceSampling: true - sampling decision remains consistent until the next hard page reload, or even longer than that if users opt into sessionStorage
    • Or allow for more fine-grained control via:
    consistentTraceSampling: {
     // true by default,
     // false is good for CDN-cached and -served initial page responses
     ssr: boolean 
     
     // - none: each trace sampled independently
     // - page: save sampling decision (SD) for current page/route (until next navigation)
     // - in-memory: save SD while SDK is initialized (i.e. entire SPA lifecycle)
     // - session-storage: save SD in session storage (+ max time window)
     frontend: 'none' | 'page' | 'in-memory' | 'session-storage'
    }

The API suggestions are not set in stone yet but they should serve the illustrative purpose: Give users the choice for consistent sampling.

⇒ Browser applications are too diverse to provide good defaults for everyone. So let's settle on reasonable defaults for the majority (e.g. SPAs) and provide enough options and granularity for everyone else.
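
To show what the four `frontend` values could mean in practice, here is a sketch of a sampling decision keeper. It is not how the SDK implements consistentTraceSampling: `createSamplingDecisionKeeper`, `KeyValueStore` and the storage key are hypothetical, and the "max time window" for session storage is left out for brevity.

```typescript
type FrontendSamplingScope = "none" | "page" | "in-memory" | "session-storage";

// Minimal subset of the Web Storage API; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "sketch_sampling_decision"; // hypothetical key name

function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
  };
}

function createSamplingDecisionKeeper(
  scope: FrontendSamplingScope,
  sampleRate: number,
  store: KeyValueStore,
  random: () => number = Math.random,
) {
  let pageDecision: boolean | undefined;
  let memoryDecision: boolean | undefined;
  const roll = (): boolean => random() < sampleRate;

  return {
    // Call on every navigation: only the "page" scope forgets its decision.
    onNavigation(): void {
      pageDecision = undefined;
    },
    decideForNewTrace(): boolean {
      if (scope === "none") {
        return roll(); // each trace sampled independently
      }
      if (scope === "page") {
        if (pageDecision === undefined) pageDecision = roll();
        return pageDecision;
      }
      if (scope === "in-memory") {
        if (memoryDecision === undefined) memoryDecision = roll();
        return memoryDecision;
      }
      // "session-storage": survives hard reloads within the same tab session
      const saved = store.getItem(STORAGE_KEY);
      if (saved !== null) {
        return saved === "1";
      }
      const decision = roll();
      store.setItem(STORAGE_KEY, decision ? "1" : "0");
      return decision;
    },
  };
}

const keeper = createSamplingDecisionKeeper("page", 0.5, createMemoryStore());
const first = keeper.decideForNewTrace();
console.log(first === keeper.decideForNewTrace()); // same page, same decision
```

Injecting the store and the random source keeps the sketch runnable outside a browser and makes the policy easy to unit test.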

Splitting Up browserTracingIntegration

To accommodate bundle-size conscious users, we should split up browserTracingIntegration into its distinct functionalities to give users the option of pulling in only the parts they need.

To be clear: We will continue to ship one browserTracingIntegration to maintain ease of setup but this integration will be made up of sub integrations that users can optionally pull in individually.

This is how we envision the split:

  1. Trace continuation & propagation:
    1. reads <meta> tags from SSR applications for trace continuation
    2. propagates trace data via sentry-trace, baggage (and optionally traceparent) HTTP headers on XHR/fetch requests
  2. Routing instrumentation
    1. handles pageload and navigation spans
    2. Probably also adds browser.* spans for initial pageload request data
  3. Web vitals
    1. collects and sends web vitals via pageload or standalone spans (depending on vital)
  4. Request spans
    1. collects and sends fetch and XHR request spans
  5. Browser performance spans
    1. collects and sends resource.*, longanimationframe, mark, measure, etc. spans

Besides bundle-size optimization, another key benefit is separation of concerns. Today, the browserTracingIntegration is massive and has a lot of code. Splitting it up into logical units helps cleaning up the intertwined logic and will make the integrations more maintainable.

For Tracing without spans (performance), only the trace continuation and propagation sub integration is necessary. Everything else can be stripped away, resulting in minimal bundle size impact.
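
Structurally, the split could look something like the following sketch. The integration names and the `IntegrationSketch` interface are placeholders invented for this example; the real sub integration names and APIs are not defined yet.

```typescript
// Simplified stand-in for an SDK integration; not the SDK's Integration type.
interface IntegrationSketch {
  name: string;
  setup(enabledFeatures: Set<string>): void;
}

function featureIntegration(name: string): IntegrationSketch {
  return {
    name,
    setup: (enabledFeatures) => {
      enabledFeatures.add(name);
    },
  };
}

// An integration that is nothing more than the sum of its parts.
function composeIntegration(name: string, parts: IntegrationSketch[]): IntegrationSketch {
  return {
    name,
    setup: (enabledFeatures) => {
      for (const part of parts) {
        part.setup(enabledFeatures);
      }
    },
  };
}

// Placeholder names mirroring the five proposed units.
const tracePropagation = featureIntegration("TracePropagation");
const routingInstrumentation = featureIntegration("RoutingInstrumentation");
const webVitals = featureIntegration("WebVitals");
const requestSpans = featureIntegration("RequestSpans");
const browserPerformanceSpans = featureIntegration("BrowserPerformanceSpans");

// The all-in-one integration stays available for ease of setup.
const browserTracingSketch = composeIntegration("BrowserTracing", [
  tracePropagation,
  routingInstrumentation,
  webVitals,
  requestSpans,
  browserPerformanceSpans,
]);

function initSketch(integrations: IntegrationSketch[]): string[] {
  const enabledFeatures = new Set<string>();
  for (const integration of integrations) {
    integration.setup(enabledFeatures);
  }
  return Array.from(enabledFeatures);
}

console.log(initSketch([browserTracingSketch])); // everything
console.log(initSketch([tracePropagation])); // tracing without spans
console.log(initSketch([tracePropagation, requestSpans])); // pick and choose
```

Because unused parts are never imported in the second and third setups, a bundler would be able to drop their code entirely, which is the bundle size benefit this split is after.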

Out of Scope (for now)

Async Context

As of today, async context in the browser is still not implemented, which means that we can’t reliably establish correct parent↔child span relationships within async operations. Consequently, for the foreseeable future, we’ll still default to always attaching child spans directly to the root span. As before, users can opt out of this by setting parentSpanIsAlwaysRootSpan: false in Sentry.init, with the caveat that hierarchies in async operations might be incorrect.
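
Here is a deliberately simplified model of why this is a problem. It is not the SDK's scope handling: it assumes a single global "active span" slot and simulates two interleaved async operations with a manual task queue, and `simulateSpanParents` is a hypothetical function for this illustration.

```typescript
interface ParentedSpan {
  name: string;
  parentName: string | undefined;
}

function simulateSpanParents(parentSpanIsAlwaysRootSpan: boolean): ParentedSpan[] {
  const spans: ParentedSpan[] = [];
  const continuations: Array<() => void> = [];
  let rootSpan: ParentedSpan | undefined;
  // One global slot: there is no per-async-operation context to consult.
  let activeSpan: ParentedSpan | undefined;

  function startSpan(name: string): void {
    const parent = parentSpanIsAlwaysRootSpan ? rootSpan : activeSpan;
    const span: ParentedSpan = { name, parentName: parent?.name };
    spans.push(span);
    activeSpan = span;
    if (!rootSpan) {
      rootSpan = span;
    }
  }

  startSpan("pageload");
  // Two independent operations start back to back; each one continues
  // later, after an (simulated) await.
  for (const operation of ["loadUser", "loadCart"]) {
    startSpan(operation);
    continuations.push(() => startSpan(`${operation}.parse`));
  }
  // Run the continuations in the order they were scheduled.
  for (const resume of continuations) {
    resume();
  }
  return spans;
}

// Flat but never wrong: every child hangs off the root span.
console.log(simulateSpanParents(true));
// Nested but misleading: "loadUser.parse" ends up under "loadCart".
console.log(simulateSpanParents(false));
```

In the second run, whichever span happened to be started last becomes the parent, so `loadUser.parse` is attached to `loadCart` even though the two operations are unrelated. Defaulting to the root span trades hierarchy depth for correctness.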

Sessions and Session-based Sampling

In the long term, we're strongly advocating for a serious session model and product experience in Sentry. From an SDK perspective, this most importantly includes assigning a session id to all events and data points sent from the SDK to Sentry, and potentially also sending more data within the session envelopes we send today (for example, direct event id mappings to better adjust session health metrics post-ingest).

Related, we're advocating for session-based sampling to more uniformly sample distinct data points (errors, traces, replays, etc) within sessions.

Both of these features are out of scope for this RFC but it's important to highlight that anything proposed here does not stand in the way of introducing session improvements on top of them. Span links have their place, even if traces can be queried by session id. Fine-grained control over trace sampling decision lifetime must be compatible in the same way that sampleRate, profilesSampleRate, etc. are compatible with session sampling controls.

WDYT?

Do you have strong opinions on frontend tracing? Are we missing key pain points you're currently experiencing in this roadmap? Do you like what you're reading and where we're heading? Please let us know! We appreciate feedback a lot, regardless of 🌶️-level :)


Replies: 1 comment


I agree that the long-lasting traces were a bad choice in the past, and it’s good we’re getting rid of them. To me, it seemed that the long-lasting trace ID was essentially our session ID, which the product doesn’t handle well. Span Links are a step in the right direction. A significant issue with only having span links without a session ID is that building the complete tree of connected traces is typically more complex, as it requires traversing numerous spans of the tree to identify all connections. But even if we add a session ID later, span links will still be helpful in linking specific spans and traces together. Furthermore, OTEL also has SpanLinks. I think we still need a session ID to link traces, spans, logs, etc., all to one user session, but that’s a bigger discussion.

I like the idea of consistentTraceSampling. We must develop a simpler model for sampling that also minimizes the problem of data holes resulting from unsampled data. I also think that session-based sampling would be easier for our users to understand. Still, believing that this will be the silver bullet to solve all our problems would be naive. I’m pretty sure there are many unknown unknowns, but it’s worth investigating the idea.
