Add Kafka Cluster Monitoring #21736
Conversation
Codecov Report
❌ Patch coverage is 79.60644% with 114 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.00%. Comparing base (bcd706c) to head (03ccd9c).
Why not broker id as well? Cardinality? No info available? Something else?
Most of these metrics are cluster-wide, so they are not specific to a single broker. I will make sure that metrics specific to one broker are tagged with the broker ID.
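For illustration, a minimal sketch of that tagging split, assuming a hypothetical check class, helper, and metric names (not this PR's actual code):

```python
from datadog_checks.base import AgentCheck


class KafkaClusterCheckSketch(AgentCheck):
    # Hypothetical helper showing the tagging split described above; metric
    # names and the shape of cluster_metadata are assumptions.
    def _report_cluster_metadata(self, cluster_metadata, base_tags):
        # Cluster-wide metric: one value for the whole cluster, no broker_id tag.
        self.gauge("kafka.cluster.broker_count", len(cluster_metadata.brokers), tags=base_tags)

        # Broker-specific metrics: emitted once per broker, tagged with broker_id.
        for broker_id in cluster_metadata.brokers:
            broker_tags = base_tags + ["broker_id:{}".format(broker_id)]
            self.gauge("kafka.broker.up", 1, tags=broker_tags)
```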
Is it intentional not to collect, for example, topic metadata if collecting broker metadata failed?
I would suggest splitting this into four try/except blocks, each logging its own error (see the sketch after this list), unless:
- we deliberately want no topic data if broker collection fails (and so on); or
- it is deterministically known that if one step fails the others will fail too; or
- we do not want half-collected data
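To make the suggestion concrete, a rough sketch of the split; the helper names, and the last two collection steps, are assumptions rather than this PR's actual code:

```python
# Sketch of the suggested structure (a method on the check class in practice):
# each step gets its own try/except so a failure in one does not drop the
# others, and each failure is logged with its own message.
def collect_cluster_metadata(self):
    try:
        self._collect_broker_metadata()
    except Exception:
        self.log.exception("Failed to collect broker metadata")

    try:
        self._collect_topic_metadata()
    except Exception:
        self.log.exception("Failed to collect topic metadata")

    try:
        self._collect_partition_metadata()
    except Exception:
        self.log.exception("Failed to collect partition metadata")

    try:
        self._collect_consumer_group_metadata()
    except Exception:
        self.log.exception("Failed to collect consumer group metadata")
```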
What does this PR do?
Adds Kafka cluster monitoring capabilities to the `kafka_consumer` integration (preview feature). When `enable_cluster_monitoring: true` is set, the integration collects additional cluster-level metadata via the Kafka Admin API.
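For illustration, a minimal sketch of how the check might gate on that flag, assuming a hypothetical check class and helper name (not this PR's actual code):

```python
from datadog_checks.base import AgentCheck


class KafkaConsumerCheckSketch(AgentCheck):
    def check(self, _instance):
        # ... existing consumer lag collection stays as-is ...

        # Cluster monitoring is an opt-in preview: only run it when the
        # instance config sets enable_cluster_monitoring: true.
        if self.instance.get("enable_cluster_monitoring", False):
            self._collect_cluster_metadata()  # hypothetical helper
```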
Motivation

While the existing `kafka_consumer` integration provides consumer lag monitoring, customers need deeper visibility into their Kafka clusters without relying solely on JMX-based monitoring; this feature enables that. It complements the existing JMX-based `kafka` integration by providing Admin API-based metadata collection.
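As a rough, standalone illustration of what Admin API-based metadata collection looks like with the confluent-kafka Python client (assumed here; the bootstrap address is a placeholder and this is not this PR's actual code):

```python
from confluent_kafka.admin import AdminClient

# Connect to the cluster and pull metadata over the Admin API (no JMX needed).
admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder address
cluster_md = admin.list_topics(timeout=10)

print("cluster id:", cluster_md.cluster_id)
print("brokers:", len(cluster_md.brokers))
for topic_name, topic_md in cluster_md.topics.items():
    # Each topic's metadata exposes its partitions (with leader/replica info).
    print(topic_name, "partitions:", len(topic_md.partitions))
```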
Review checklist (to be filled by reviewers)

- Add the `qa/skip-qa` label if the PR doesn't need to be tested during QA.
- If backporting is needed, add the `backport/<branch-name>` label to the PR and it will automatically open a backport PR once this one is merged.