New evals + eval tagging #1015
Walkthrough: Adds pytest markers.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml (2)

18-18: Use neutral, unique pod names in tests
The manifest defines domain-specific pods (`orders-app`, `invoices-app`, `finance-app`, `accounting-app`) that hint at the business logic and risk name collisions across tests. Per LLM test guidelines, rename these (and any other pods/services) to neutral, unique identifiers (e.g. `app-a`, `app-b`, `app-c`, etc.) in tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/app/kafka-manifest.yaml.
14-14: Use `app-55` as the Kubernetes namespace.
- In test_case.yaml (line 14), change `-n ask-holmes-namespace-55` to `-n app-55`.
- In kafka-manifest.yaml, update all `namespace: ask-holmes-namespace-55` entries to `namespace: app-55`.

tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml (1)
1-1: Replace `namespace-66` with `app-66` in the fixture.
In tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml (line 1 and the corresponding kubectl commands), update all occurrences of `namespace-66` to `app-66` to adhere to the `app-<testid>` namespace convention for LLM tests.

tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml (1)
16-22: Critical: Namespace does not follow required pattern.
The namespace "nr" does not follow the required pattern `app-<testid>`. Based on the test ID (120), this should be `app-120`. As per coding guidelines.
Apply this diff to fix the namespace:

```diff
 before_test: |
   [ -n "${NEW_RELIC_ACCOUNT_ID:-}" ] && [ -n "${NEW_RELIC_API_KEY:-}" ] && [ -n "${NEW_RELIC_LICENSE_KEY:-}" ] || { for v in NEW_RELIC_ACCOUNT_ID NEW_RELIC_API_KEY NEW_RELIC_LICENSE_KEY; do [ -n "${!v:-}" ] || echo "Missing env var: $v"; done; exit 1; }
-  kubectl create namespace nr
-  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}" -n nr
+  kubectl create namespace app-120
+  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}" -n app-120
   kubectl apply -f ./deployment.yaml
 after_test: |
-  kubectl delete secret newrelickey -n nr
+  kubectl delete secret newrelickey -n app-120
   kubectl delete -f ./deployment.yaml
```

tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml (1)
21-30: Missing namespace specification in kubectl commands.
The `before_test` and `after_test` scripts don't specify a namespace, which means resources will be created in the default namespace. Per coding guidelines, LLM tests must use a dedicated namespace of the form `app-<testid>`. This test should use `app-119`.
Apply this diff to add the required namespace:

```diff
 before_test: |
   [ -n "${NEW_RELIC_ACCOUNT_ID:-}" ] && [ -n "${NEW_RELIC_API_KEY:-}" ] && [ -n "${NEW_RELIC_LICENSE_KEY:-}" ] || { for v in NEW_RELIC_ACCOUNT_ID NEW_RELIC_API_KEY NEW_RELIC_LICENSE_KEY; do [ -n "${!v:-}" ] || echo "Missing env var: $v"; done; exit 1; }
-  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}"
+  kubectl create namespace app-119 || true
+  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}" -n app-119
-  kubectl apply -f ./sales.yaml
+  kubectl apply -f ./sales.yaml -n app-119
   sleep 120
 after_test: |
-  kubectl delete secret newrelickey
-  kubectl delete -f ./sales.yaml
+  kubectl delete namespace app-119 --ignore-not-found=true
```

tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml (1)
6-10: Add missing pytest marker "datadog"
The `datadog` tag in tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml isn't declared in pyproject.toml; either add it under `tool.pytest.ini_options.markers` or remove it.

tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml (1)
12-18: Add undeclared pytest markers `prometheus` and `embeds` to pyproject.toml.

tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml (1)
10-10: Update namespace to `app-70`.
In tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml at line 10, replace `kubectl create namespace namespace-70 || true` with `kubectl create namespace app-70 || true` to comply with the `app-<testid>` naming convention.

tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml (1)
11-22: Namespace naming violates coding guidelines.
The test uses `namespace-74`, but according to coding guidelines, each LLM test must use a dedicated Kubernetes namespace of the form `app-<testid>`. This test should use `app-74` instead of `namespace-74`.
As per coding guidelines: Each LLM test must use a dedicated Kubernetes namespace of the form app-<testid>.
Apply this diff to fix the namespace naming:

```diff
 before_test: |
   # Create namespace first since the secret depends on it (|| true ignores if it already exists)
-  kubectl create namespace namespace-74 || true
+  kubectl create namespace app-74 || true
   kubectl create secret generic cache-service-logs-script \
     --from-file=generate_logs.py=./generate_logs.py \
-    -n namespace-74 --dry-run=client -o yaml | kubectl apply -f -
+    -n app-74 --dry-run=client -o yaml | kubectl apply -f -
   kubectl apply -f ./manifest.yaml
   sleep 40
 after_test: |
   kubectl delete -f ./manifest.yaml
-  kubectl delete secret cache-service-logs-script -n namespace-74 --ignore-not-found
-  kubectl delete namespace namespace-74 --ignore-not-found
+  kubectl delete secret cache-service-logs-script -n app-74 --ignore-not-found
+  kubectl delete namespace app-74 --ignore-not-found
```

Additionally, verify that `./manifest.yaml` uses the same namespace:

```bash
#!/bin/bash
# Check the namespace used in the manifest file
echo "=== Checking namespace in manifest.yaml for test 74 ==="
fd -e yaml manifest.yaml tests/llm/fixtures/test_ask_holmes/74_config_change_impact/ --exec cat {} | rg 'namespace:' -B 2 -A 1
```

tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml (1)
17-26: Missing dedicated namespace for test.
According to the coding guidelines, each LLM test must use a dedicated Kubernetes namespace of the form `app-<testid>`. This test (ID: 118) should create and use namespace `app-118`, but the `before_test` and `after_test` scripts do not specify a namespace, defaulting to the `default` namespace.
Apply this diff to use a dedicated namespace:

```diff
 before_test: |
   [ -n "${NEW_RELIC_ACCOUNT_ID:-}" ] && [ -n "${NEW_RELIC_API_KEY:-}" ] && [ -n "${NEW_RELIC_LICENSE_KEY:-}" ] || { for v in NEW_RELIC_ACCOUNT_ID NEW_RELIC_API_KEY NEW_RELIC_LICENSE_KEY; do [ -n "${!v:-}" ] || echo "Missing env var: $v"; done; exit 1; }
-  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}"
-  kubectl apply -f ./nrlogger.yaml
+  kubectl create namespace app-118 || true
+  kubectl create secret generic newrelickey --from-literal=key="${NEW_RELIC_LICENSE_KEY}" -n app-118
+  kubectl apply -f ./nrlogger.yaml -n app-118
   sleep 300
 after_test: |
-  kubectl delete secret newrelickey
-  kubectl delete -f ./nrlogger.yaml
+  kubectl delete secret newrelickey -n app-118 --ignore-not-found
+  kubectl delete -f ./nrlogger.yaml -n app-118 --ignore-not-found
+  kubectl delete namespace app-118 --ignore-not-found
```

As per coding guidelines
tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml (1)
11-20: Namespace violates naming convention.
The test uses `namespace-75`, but the coding guideline requires namespaces of the form `app-<testid>` (`app-75` in this case). As per coding guidelines.
Apply this diff to fix the namespace:

```diff
-  kubectl create namespace namespace-75 || true
+  kubectl create namespace app-75 || true
   kubectl create secret generic frontend-logs-script \
     --from-file=generate_logs.py=./generate_logs.py \
-    -n namespace-75 --dry-run=client -o yaml | kubectl apply -f -
+    -n app-75 --dry-run=client -o yaml | kubectl apply -f -
   kubectl apply -f ./manifest.yaml
   sleep 40
 after_test: |
   kubectl delete -f ./manifest.yaml
-  kubectl delete secret frontend-logs-script -n namespace-75 --ignore-not-found
+  kubectl delete secret frontend-logs-script -n app-75 --ignore-not-found
-  kubectl delete namespace namespace-75 --ignore-not-found
+  kubectl delete namespace app-75 --ignore-not-found
```

Note: You'll also need to update the manifest.yaml to reference app-75.
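The namespace findings above (tests 66, 70, 74, 75, 118, 119, 120) all reduce to one rule: a fixture's namespace is determined by the leading test ID of its directory name. A minimal sketch of how that convention could be checked mechanically — the function names and the regex-based scan of `before_test` text are illustrative assumptions, not existing repo tooling:

```python
import re

def expected_namespace(fixture_dir: str) -> str:
    """Derive the required namespace (app-<testid>) from a fixture
    directory name such as '74_config_change_impact'."""
    match = re.match(r"(\d+)", fixture_dir)
    if not match:
        raise ValueError(f"no test id in {fixture_dir!r}")
    return f"app-{match.group(1)}"

def namespace_violations(fixture_dir: str, script: str) -> list:
    """Return namespaces referenced via 'kubectl create namespace <name>'
    or '-n <name>' that do not match the required app-<testid> name."""
    required = expected_namespace(fixture_dir)
    used = re.findall(r"(?:create namespace|-n)\s+([\w-]+)", script)
    return [ns for ns in used if ns != required]

print(expected_namespace("74_config_change_impact"))  # app-74
print(namespace_violations("74_config_change_impact",
                           "kubectl create namespace namespace-74 || true"))
```

Run against each fixture's `before_test` block, such a check would have caught every `namespace-<id>` and `ask-holmes-namespace-<id>` variant flagged in this review.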
♻️ Duplicate comments (3)
tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml (1)
7-7: Verify that the "medium" tag is a declared pytest marker.
Similar to the concern in `160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml`, ensure that `medium` corresponds to a declared pytest marker in `pyproject.toml`.
Based on learnings
tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml (1)
22-22: Verify that the "medium" tag is a declared pytest marker.
Same concern as in previous files: ensure that `medium` corresponds to a declared pytest marker in `pyproject.toml`.
Based on learnings
tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml (1)
12-12: Verify that the "medium" tag is a declared pytest marker.
Same concern as in previous files: ensure that `medium` corresponds to a declared pytest marker in `pyproject.toml`.
Based on learnings
🧹 Nitpick comments (2)
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml (1)
188-249: Consider adding securityContext for test isolation.
The deployment runs containers as root without `allowPrivilegeEscalation: false` or `runAsNonRoot: true`. While this may be acceptable for test fixtures, adding basic security constraints improves test isolation and follows best practices.
Apply this diff to add security context:

```diff
     spec:
       containers:
         - name: main
           image: python:3.11-slim
           command: ["python", "-u", "/scripts/main.py"]
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 1000
           volumeMounts:
             - name: scripts
               mountPath: /scripts
           resources:
             requests:
               memory: "64Mi"
               cpu: "10m"
             limits:
               memory: "128Mi"
               cpu: "100m"
         - name: connection-manager
           image: python:3.11-slim
           command: ["python", "-u", "/scripts/sidecar.py"]
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 1000
           volumeMounts:
             - name: scripts
               mountPath: /scripts
           resources:
             requests:
               memory: "32Mi"
               cpu: "10m"
             limits:
               memory: "64Mi"
               cpu: "50m"
         - name: metrics-monitor
           image: python:3.11-slim
           command: ["python", "-u", "/scripts/monitor.py"]
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 1000
           volumeMounts:
             - name: scripts
               mountPath: /scripts
```

tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml (1)
155-190: Consider adding a security context for best practices.
The Deployment uses neutral naming that doesn't hint at the problem, which is good. However, the container lacks a security context. While this is a test fixture, adding basic security controls is recommended.
Consider adding a security context to the container:

```diff
         - name: api
           image: python:3.11-slim
           command: ["python", "-u", "/scripts/app.py"]
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 1000
+            capabilities:
+              drop:
+                - ALL
           volumeMounts:
             - name: scripts
               mountPath: /scripts
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (65)
- pyproject.toml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml (2 hunks)
- tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/102_loki_label_discovery/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/110_cpu_graph_robusta_runner/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/117b_new_relic_block_embed/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml (2 hunks)
- tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/160a_cpu_per_namespace_graph/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/160c_cpu_per_namespace_graph_with_global_truncation/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/30_basic_promql_graph_cluster_memory/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/32_basic_promql_graph_pod_cpu/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_trace_by_id.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_trace_by_id_2.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_traces_by_deployment.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_backend_deployment.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_checkout.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_fraud.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_get_by_kind_in_namespace_depl.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_get_by_kind_in_namespace_namespace.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_lineage_children.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_logs_checkout.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_memory_requests_namespace.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_top_nodes.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_top_pods.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/test_case.yaml (0 hunks)
- tests/llm/fixtures/test_ask_holmes/35_tempo/toolsets.yaml (0 hunks)
- tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/kubectl_describe_pod_long-logs-app-84fbcbfb5f-57zrx_ask-holmes-namespace-47.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/kubectl_find_resource_long-logs-app_pod.txt (0 hunks)
- tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/manifest.yaml (0 hunks)
- tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/test_case.yaml (0 hunks)
- tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml (0 hunks)
- tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml (0 hunks)
- tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/57_cluster_name_confusion/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/67_performance_degradation/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/68_cascading_failures/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/69_rate_limit_exhaustion/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml (0 hunks)
- tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml (1 hunks)
- tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml (1 hunks)
💤 Files with no reviewable changes (23)
- tests/llm/fixtures/test_ask_holmes/35_tempo/test_case.yaml
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_top_nodes.txt
- tests/llm/fixtures/test_ask_holmes/50_logs_since_specific_date/test_case.yaml
- tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_trace_by_id.txt
- tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/test_case.yaml
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_get_by_kind_in_namespace_namespace.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_fraud.txt
- tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/kubectl_describe_pod_long-logs-app-84fbcbfb5f-57zrx_ask-holmes-namespace-47.txt
- tests/llm/fixtures/test_ask_holmes/71_connection_pool_starvation/test_case.yaml
- tests/llm/fixtures/test_ask_holmes/50a_logs_since_last_specific_month/test_case.yaml
- tests/llm/fixtures/test_ask_holmes/35_tempo/toolsets.yaml
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_checkout.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_logs_checkout.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_memory_requests_namespace.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_describe_backend_deployment.txt
- tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/manifest.yaml
- tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_traces_by_deployment.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_lineage_children.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/fetch_tempo_trace_by_id_2.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_get_by_kind_in_namespace_depl.txt
- tests/llm/fixtures/test_ask_holmes/47_truncated_logs_context_window/kubectl_find_resource_long-logs-app_pod.txt
- tests/llm/fixtures/test_ask_holmes/35_tempo/kubectl_top_pods.txt
🧰 Additional context used
📓 Path-based instructions (4)
tests/llm/**/*.yaml
📄 CodeRabbit inference engine (CLAUDE.md)
tests/llm/**/*.yaml
: Each LLM test must use a dedicated Kubernetes namespace of the form app-<testid>
All pod names in LLM tests must be unique and use neutral naming that does not hint at the problem
Files:
tests/llm/fixtures/test_ask_holmes/160c_cpu_per_namespace_graph_with_global_truncation/test_case.yaml
tests/llm/fixtures/test_ask_holmes/160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml
tests/llm/fixtures/test_ask_holmes/117b_new_relic_block_embed/test_case.yaml
tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml
tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml
tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml
tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/test_case.yaml
tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing/test_case.yaml
tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml
tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing/test_case.yaml
tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml
tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml
tests/llm/fixtures/test_ask_holmes/110_cpu_graph_robusta_runner/test_case.yaml
tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml
tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml
tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml
tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml
tests/llm/fixtures/test_ask_holmes/102_loki_label_discovery/test_case.yaml
tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml
tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml
tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml
tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml
tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml
tests/llm/fixtures/test_ask_holmes/69_rate_limit_exhaustion/test_case.yaml
tests/llm/fixtures/test_ask_holmes/67_performance_degradation/test_case.yaml
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml
tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml
tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml
tests/llm/fixtures/test_ask_holmes/57_cluster_name_confusion/test_case.yaml
tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml
tests/llm/fixtures/test_ask_holmes/30_basic_promql_graph_cluster_memory/test_case.yaml
tests/llm/fixtures/test_ask_holmes/160a_cpu_per_namespace_graph/test_case.yaml
tests/llm/fixtures/test_ask_holmes/32_basic_promql_graph_pod_cpu/test_case.yaml
tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml
tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml
tests/llm/fixtures/test_ask_holmes/68_cascading_failures/test_case.yaml
tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test files should mirror the source structure under tests/
Files:
tests/llm/fixtures/test_ask_holmes/160c_cpu_per_namespace_graph_with_global_truncation/test_case.yaml
tests/llm/fixtures/test_ask_holmes/160b_cpu_per_namespace_graph_with_prom_truncation/test_case.yaml
tests/llm/fixtures/test_ask_holmes/117b_new_relic_block_embed/test_case.yaml
tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml
tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml
tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml
tests/llm/fixtures/test_ask_holmes/100b_historical_logs_nonstandard_label/test_case.yaml
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/test_case.yaml
tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing/test_case.yaml
tests/llm/fixtures/test_ask_holmes/118_new_relic_logs/test_case.yaml
tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing/test_case.yaml
tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml
tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml
tests/llm/fixtures/test_ask_holmes/110_cpu_graph_robusta_runner/test_case.yaml
tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml
tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml
tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml
tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml
tests/llm/fixtures/test_ask_holmes/102_loki_label_discovery/test_case.yaml
tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml
tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml
tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml
tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml
tests/llm/fixtures/test_ask_holmes/49_logs_since_last_week/test_case.yaml
tests/llm/fixtures/test_ask_holmes/69_rate_limit_exhaustion/test_case.yaml
tests/llm/fixtures/test_ask_holmes/67_performance_degradation/test_case.yaml
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml
tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml
tests/llm/fixtures/test_ask_holmes/159_prometheus_high_cardinality_cpu/test_case.yaml
tests/llm/fixtures/test_ask_holmes/57_cluster_name_confusion/test_case.yaml
tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml
tests/llm/fixtures/test_ask_holmes/30_basic_promql_graph_cluster_memory/test_case.yaml
tests/llm/fixtures/test_ask_holmes/160a_cpu_per_namespace_graph/test_case.yaml
tests/llm/fixtures/test_ask_holmes/32_basic_promql_graph_pod_cpu/test_case.yaml
tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml
tests/llm/fixtures/test_ask_holmes/91h_datadog_logs_empty_query_with_url/test_case.yaml
tests/llm/fixtures/test_ask_holmes/68_cascading_failures/test_case.yaml
tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml
tests/**/toolsets.yaml
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/toolsets.yaml
: In evals, ALL toolset-specific configuration must be under the config field in toolsets.yaml (do not place toolset config in test_case.yaml)
toolsets.yaml may only use the allowed top-level fields: enabled, name, description, additional_instructions, prerequisites, tools, docs_url, icon_url, installation_instructions, config, url (MCP only)
Files:
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
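The allowed-top-level-fields rule above also lends itself to a mechanical check. A minimal sketch — the validator is a hypothetical helper, not repo tooling, and it assumes a toolset definition has already been parsed from toolsets.yaml into a dict:

```python
# Allowed top-level fields for a toolset entry in an eval toolsets.yaml,
# per the review guideline quoted above (url is MCP-only).
ALLOWED_FIELDS = {
    "enabled", "name", "description", "additional_instructions",
    "prerequisites", "tools", "docs_url", "icon_url",
    "installation_instructions", "config", "url",
}

def invalid_fields(toolset_definition: dict) -> set:
    """Return top-level fields not permitted in an eval toolsets.yaml entry."""
    return set(toolset_definition) - ALLOWED_FIELDS

# A definition that wrongly puts provider settings at top level
# instead of nesting them under config
bad = {"enabled": True, "api_key": "xxx", "config": {"region": "us"}}
print(invalid_fields(bad))  # {'api_key'}
```

Note that this also catches the related rule about toolset configuration: provider-specific settings like the `api_key` above belong under `config`, so they surface as invalid top-level fields.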
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
Define and maintain the canonical list of pytest markers in pyproject.toml
Files:
pyproject.toml
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025年09月28日T19:51:49.611Z
Learning: Applies to tests/**/*.py : Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
📚 Learning: 2025年09月28日T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025年09月28日T19:51:49.611Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets MUST return detailed error messages to enable LLM self-correction
Applied to files:
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
📚 Learning: 2025年09月28日T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025年09月28日T19:51:49.611Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : Toolsets must be defined as YAML under holmes/plugins/toolsets (either as {name}.yaml or within a {name}/ directory)
Applied to files:
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
📚 Learning: 2025年09月28日T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025年09月28日T19:51:49.611Z
Learning: Applies to tests/**/toolsets.yaml : In evals, ALL toolset-specific configuration must be under the config field in toolsets.yaml (do not place toolset config in test_case.yaml)
Applied to files:
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
📚 Learning: 2025年08月05日T00:42:23.792Z
Learnt from: vishiy
PR: robusta-dev/holmesgpt#782
File: config.example.yaml:31-49
Timestamp: 2025年08月05日T00:42:23.792Z
Learning: In robusta-dev/holmesgpt config.example.yaml, the azuremonitorlogs toolset configuration shows "enabled: true" as an example of how to enable the toolset, not as a default setting. The toolset is disabled by default and requires explicit enablement in user configurations.
Applied to files:
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml
📚 Learning: 2025年09月28日T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025年09月28日T19:51:49.611Z
Learning: Applies to pyproject.toml : Define and maintain the canonical list of pytest markers in pyproject.toml
Applied to files:
pyproject.toml
📚 Learning: 2025年09月28日T19:51:49.611Z
Learnt from: CR
PR: robusta-dev/holmesgpt#0
File: CLAUDE.md:0-0
Timestamp: 2025年09月28日T19:51:49.611Z
Learning: Applies to tests/**/*.py : Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
Applied to files:
pyproject.toml
🪛 Checkov (3.2.334)
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml
[medium] 155-190: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 155-190: Minimize the admission of root containers
(CKV_K8S_23)
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml
[medium] 188-249: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 188-249: Minimize the admission of root containers
(CKV_K8S_23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
🔇 Additional comments (60)
tests/llm/fixtures/test_ask_holmes/117b_new_relic_block_embed/test_case.yaml (1)
12-12
: LGTM! The addition of the
embeds
tag is appropriate. Based on the PR changes, this marker has been properly declared inpyproject.toml
.tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/manifest.yaml (5)
1-4
: LGTM!The namespace
app-47
correctly follows the requiredapp-<testid>
naming convention.As per coding guidelines.
13-68
: LGTM! The main application script generates realistic verbose logs with red-herring errors (lines 57-59) to test the LLM's ability to filter noise. The use of
flush=True
ensures logs are immediately visible.
70-126
: LGTM! The sidecar script effectively simulates a connection pool exhaustion scenario. The critical error appears only once, at iteration 61 (lines 112-113), making it a good test of the LLM's ability to identify root causes in verbose multi-container logs.
128-185
: LGTM!The monitoring script generates high-volume verbose logs that add realistic noise to the test scenario, helping validate the LLM's ability to filter relevant information.
191-191
: LGTM!The deployment name
data-processor
uses neutral naming that doesn't hint at the problem being tested.As per coding guidelines.
tests/llm/fixtures/test_ask_holmes/47_multicontainer_logs/test_case.yaml (2)
2-5: All tags are valid pytest markers. The markers `logs`, `context_window`, and `kubernetes` are declared in `pyproject.toml`.
9-10: Confirm or align manifest path resolution in tests. Most fixtures use `kubectl ... -f manifest.yaml` (no `./`), so either remove the `./` prefix here for consistency or ensure the test harness runs in the fixture's directory so that `./manifest.yaml` resolves correctly.
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/toolsets.yaml (1)
1-5: LGTM! The toolsets configuration correctly enables the Kubernetes core and logs toolsets needed for this log analysis test. The structure follows the required format.
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/test_case.yaml (4)
1-1: LGTM! The user prompt correctly references namespace `app-35`, which matches the required `app-<testid>` pattern for test ID 35.
7-8: LGTM! The expected output is clear and specific, correctly identifying the root cause that the LLM should discover (migration failure due to a missing locations table).
10-13: LGTM! The test lifecycle commands correctly apply and clean up the manifest, and the description clearly explains the test objective.
2-5: No action necessary: the tags "logs", "context_window", and "database" are declared in pyproject.toml.
tests/llm/fixtures/test_ask_holmes/35_error_log_flood/manifest.yaml (2)
1-4: LGTM! The namespace correctly follows the required `app-<testid>` naming pattern for test ID 35.
6-153: LGTM! The Secret correctly contains the test scenario:
- the Python app generates a realistic log flood with database errors
- the SQL migration deliberately references a non-existent `locations` table (line 129), creating the root cause that the test expects the LLM to identify
The Secret name uses neutral naming as required.
tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml (1)
9-9: LGTM! Clearer skip reason. The expanded skip_reason provides better context about the prerequisites (port-forward setup) and future improvements needed (Kafka timeout handling). This improves test maintainability.
tests/llm/fixtures/test_ask_holmes/123_new_relic_checkout_errors_tracing/test_case.yaml (2)
20-81: LGTM: Namespace and pod naming comply with guidelines. The test correctly uses the `app-123` namespace format and employs neutral pod names (billing, traffic-generator) that don't hint at the underlying problem.
13-13: The newrelic pytest marker is declared. The "newrelic" tag is already listed under [tool.pytest.ini_options] markers in pyproject.toml (line 122).
tests/llm/fixtures/test_ask_holmes/69_rate_limit_exhaustion/test_case.yaml (2)
1-1: Verify namespace naming convention. The user prompt references "namespace-69", but coding guidelines require the format `app-<testid>`. This test should use `app-69` instead of `namespace-69`. As per coding guidelines.
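As a hedged illustration of the convention being enforced here (the literal values are taken from this comment, not read from the fixture itself, and a real check would parse the fixture's test_case.yaml), the `app-<testid>` pattern check can be sketched in shell:

```shell
# Minimal sketch: flag a fixture namespace that does not match app-<testid>.
testid=69
ns="namespace-69"   # value the review found in the user prompt
if [[ "$ns" =~ ^app-${testid}$ ]]; then
  echo "ok: $ns"
else
  echo "violation: $ns should be app-${testid}"
fi
```

Run as-is this prints `violation: namespace-69 should be app-69`; with `ns="app-69"` the same check prints `ok: app-69`.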
5-5
: Verify that the "medium" tag is a declared pytest marker.Please ensure the "medium" tag is properly declared in pyproject.toml.
Based on learnings.
tests/llm/fixtures/test_ask_holmes/68_cascading_failures/test_case.yaml (2)
1-1: Verify namespace naming convention. The user prompt references "namespace-68", but coding guidelines require the format `app-<testid>`. This test should use `app-68` instead of `namespace-68`. As per coding guidelines.
6-6: Verify that the "medium" tag is a declared pytest marker. Please ensure the "medium" tag is properly declared in pyproject.toml. Based on learnings.
tests/llm/fixtures/test_ask_holmes/66_http_error_needle/test_case.yaml (1)
5-5: No action needed: the 'medium' marker is already listed under tool.pytest.ini_options.markers in pyproject.toml.
tests/llm/fixtures/test_ask_holmes/45_fetch_deployment_logs_simple/test_case.yaml (1)
7-8: pytest markers validated: both "logs" and "kubernetes" are declared under `[tool.pytest.ini_options]` in pyproject.toml.
tests/llm/fixtures/test_ask_holmes/122_new_relic_checkout_latency_tracing_rebuild/test_case.yaml (2)
20-23: LGTM! The namespace `app-122` correctly follows the required pattern `app-<testid>` for LLM tests. As per coding guidelines.
8-13: All tags are valid pytest markers. All five tags (kubernetes, hard, chain-of-causation, traces, and newrelic) are declared in pyproject.toml.
30-38: LGTM! Namespace follows coding guidelines. The namespace `app-159` correctly follows the required pattern `app-<testid>` for LLM tests. As per coding guidelines.
tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml (2)
1-1: LGTM! Description accurately reflects test purpose. The description change from "pulls logs" to "pulls metrics" correctly aligns with the test's actual functionality.
12-15: Declare the missing pytest marker "newrelic". The "newrelic" tag in tests/llm/fixtures/test_ask_holmes/119_new_relic_metrics/test_case.yaml isn't listed under `tool.pytest.ini_options.markers` in pyproject.toml. Add it there (or remove the tag) to comply with the canonical marker list.
⛔ Skipped due to learnings
Learnt from: CR PR: robusta-dev/holmesgpt#0 File: CLAUDE.md:0-0 Timestamp: 2025-09-28T19:51:49.611Z Learning: Applies to pyproject.toml : Define and maintain the canonical list of pytest markers in pyproject.toml
Learnt from: CR PR: robusta-dev/holmesgpt#0 File: CLAUDE.md:0-0 Timestamp: 2025-09-28T19:51:49.611Z Learning: Applies to tests/**/*.py : Only use pytest markers that are declared in pyproject.toml; never introduce undeclared markers/tags
tests/llm/fixtures/test_ask_holmes/160a_cpu_per_namespace_graph/test_case.yaml (1)
12-14: LGTM! Tags added for test categorization. The addition of the `embeds`, `metrics`, and `medium`
tags expands test metadata for better categorization and filtering. Based on the PR summary, these correspond to new pytest markers declared in pyproject.toml.tests/llm/fixtures/test_ask_holmes/70_memory_leak_detection/test_case.yaml (1)
5-5
: LGTM! Tag added for test categorization.The addition of the
medium
tag expands test metadata for better categorization.tests/llm/fixtures/test_ask_holmes/33_cpu_metrics_discovery/test_case.yaml (1)
10-10
: LGTM! Tag added for test categorization.The addition of the
metrics
tag appropriately categorizes this CPU metrics discovery test.tests/llm/fixtures/test_ask_holmes/67_performance_degradation/test_case.yaml (2)
5-5
: LGTM! Tag added for test categorization.The addition of the
medium
tag expands test metadata for better categorization.
10-10
: Verify namespace naming convention.The namespace
namespace-67
does not follow the requiredapp-<testid>
format specified in the coding guidelines for LLM tests. This should likely beapp-67
instead for consistency with the guidelines.As per coding guidelines.
tests/llm/fixtures/test_ask_holmes/160c_cpu_per_namespace_graph_with_global_truncation/test_case.yaml (2)
17-19: LGTM! Tags added for test categorization. The addition of the `embeds`, `metrics`, and `medium` tags expands test metadata consistently with related fixtures (160a, 160b).
30-31: Test skip is properly documented. The skip reason clearly explains that the `TOOL_MAX_ALLOCATED_CONTEXT_WINDOW_PCT` environment variable override doesn't work as expected because it's read once at startup. This is good documentation for future reference.
12-16
: Port forward configuration is now explicit and clear.The expanded port_forwards section with explicit
local_port
andremote_port
improves clarity over the previous implicit configuration.
6-10
: All tags declared in pytest markers Verified thatprometheus
,embeds
,metrics
, andmedium
are all listed undermarkers
inpyproject.toml
.tests/llm/fixtures/test_ask_holmes/74_config_change_impact/test_case.yaml (2)
6-10
: Enhanced expected output improves test specification.The addition of specific expected output items about configuration-change impact and performance patterns makes the test expectations clearer and more verifiable.
3-5
: No action needed: pytest markers are declared
The tagslogs
,context_window
, andmedium
appear undertool.pytest.ini_options.markers
inpyproject.toml
.tests/llm/fixtures/test_ask_holmes/48_logs_since_thursday/test_case.yaml (1)
2-5
: All tags are declared pytest markers Thelogs
,datetime
, andmedium
markers are listed under[tool.pytest.ini_options].markers
in pyproject.toml, so no changes are needed.tests/llm/fixtures/test_ask_holmes/57_cluster_name_confusion/test_case.yaml (2)
3-6
: All pytest markers declared
Bothlogs
andmedium
appear under themarkers
list in pyproject.toml; no changes needed.
7-11
: Verify external manifest’s namespace isolation
The external manifest athttps://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/image_pull_backoff/no_such_image.yaml
contains nometadata.namespace
fields. Per LLM test guidelines, resources must be scoped to the dedicated namespaceapp-57
. Confirm that these resources will indeed be applied intoapp-57
—for example, by addingmetadata.namespace: app-57
to the manifest or by invokingkubectl -n app-57 apply ...
.tests/llm/fixtures/test_ask_holmes/121_new_relic_checkout_errors_tracing/test_case.yaml (2)
13-13
: Verify that the "newrelic" tag is a declared pytest marker.Ensure that
newrelic
corresponds to a declared pytest marker inpyproject.toml
. This tag is not mentioned in the PR summary as one of the newly added markers.Based on learnings
1-2
: Consider more neutral pod naming.The pod name
checkout
(lines 26, and referenced throughout the setup) directly identifies the service type and is explicitly mentioned in the user prompt at line 2. According to the coding guidelines, "pod names must be unique and use neutral naming that does not hint at the problem."Since the test investigates errors in the checkout service, using the name
checkout
may hint at which service is problematic. Consider using a more neutral name likeservice-a
,web-app
, orapi-service
to better align with the guideline.As per coding guidelines
Also applies to: 20-26, 50-51
tests/llm/fixtures/test_ask_holmes/98_logs_transparency_default_time/test_case.yaml (1)
19-19: LGTM: loki tag addition. The `loki` tag is mentioned in the PR summary as one of the newly introduced pytest markers, so this addition is appropriate.
15-17
: All added tags are declared pytest markers. No changes needed:embeds
,metrics
, andmedium
all appear under[tool.pytest.ini_options.markers]
in pyproject.toml.tests/llm/fixtures/test_ask_holmes/32_basic_promql_graph_pod_cpu/test_case.yaml (3)
3-4
: LGTM! Clear validation criteria for PromQL execution. The expected_output explicitly validates that the tool_name and random_key are present in the response, which provides concrete verification criteria for the test.
12-16
: LGTM! Port forward configuration is now explicit. Adding explicit local_port and remote_port values improves clarity and makes the port forwarding configuration more maintainable.
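For illustration, an explicit entry of this shape might look as follows (the namespace value is an assumption following the `app-<testid>` convention, not this fixture's actual config):

```yaml
port_forwards:
  - namespace: app-32        # dedicated test namespace, app-<testid> convention
    service: robusta-kube-prometheus-st-prometheus
    local_port: 9090
    remote_port: 9090
```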
6-10
: All pytest markers declared. Verified that "embeds", "metrics", and "medium" are present in pyproject.toml.
pyproject.toml (1)
121-124
: LGTM! New pytest markers properly declared. The three new markers (loki, embeds, no-cicd) are well-documented and follow the established format. This maintains the canonical marker list as required.
Based on learnings.
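As a sketch of what such declarations look like (the marker names come from this PR, but the descriptions below are placeholders, not the repository's actual text):

```toml
[tool.pytest.ini_options]
markers = [
    "loki: evals that exercise the Loki logs toolset",
    "embeds: evals that validate embedded graphs in responses",
    "no-cicd: evals excluded from CI/CD runs",
]
```

Tagged subsets can then be selected at run time, e.g. `pytest -m "loki and not no-cicd" tests/llm/`.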
tests/llm/fixtures/test_ask_holmes/103_logs_transparency_default_limit/test_case.yaml (1)
18-22
: LGTM! Loki tag addition. The "loki" tag is properly added and matches the new marker declared in pyproject.toml. The test also correctly uses the "app-103" namespace following the required naming convention.
tests/llm/fixtures/test_ask_holmes/57_wrong_namespace/test_case.yaml (2)
7-9
: Resolved: 'logs' and 'medium' markers are declared in pyproject.toml
1-5
: Remove namespace naming guideline check for this fixture. This test deliberately uses `test-57` to validate the "wrong namespace" suggestion logic; no changes needed to `manifest.yaml`.
Likely an incorrect or invalid review comment.
tests/llm/fixtures/test_ask_holmes/75_network_flapping/test_case.yaml (1)
2-6
: 'medium' tag already declared. The `medium` marker exists in pyproject.toml (line 108).
14-14
: LGTM! Tag addition aligns with PR objectives.The "loki" tag correctly categorizes this test fixture as Loki-related, consistent with the metadata expansion across test fixtures.
tests/llm/fixtures/test_ask_holmes/100a_historical_logs/test_case.yaml (2)
1-1
: Minor: Empty line at file start.This is a minor formatting change with no functional impact.
15-15
: LGTM! Tag addition aligns with PR objectives.The "loki" tag correctly categorizes this test fixture as Loki-related.
tests/llm/fixtures/test_ask_holmes/102_loki_label_discovery/test_case.yaml (1)
27-27
: LGTM! Tag addition aligns with PR objectives.The "loki" tag correctly categorizes this Loki label discovery test fixture.
tests/llm/fixtures/test_ask_holmes/101_historical_logs_pod_deleted/test_case.yaml (1)
18-18
: Approve test marker addition
The "loki" marker is declared in pyproject.toml under [tool.pytest.ini_options] (e.g., line 121), so the new tag aligns with the existing pytest configuration.
🧩 Analysis chain
Verify all tags are declared pytest markers.
The test adds four new tags: `datadog`, `metrics`, `medium`, and `embeds`. While `embeds` is confirmed as a new marker in this PR, ensure all other tags (`datadog`, `metrics`, `medium`) are properly declared in `pyproject.toml`.
Based on learnings: Only use pytest markers that are declared in pyproject.toml.
🏁 Script executed:
#!/bin/bash
# Verify that all tags used in test fixtures are declared as pytest markers in pyproject.toml
# Extract markers from pyproject.toml
echo "=== Declared pytest markers in pyproject.toml ==="
rg -A 50 '^\[tool\.pytest\.ini_options\]' pyproject.toml | rg 'markers\s*=' -A 20
echo ""
echo "=== Tags used in this test case ==="
echo "datadog, metrics, medium, embeds"
Length of output: 1836
Add missing pytest markers `datadog` and `embeds`. Add both `datadog` and `embeds` to the `markers` list under `[tool.pytest.ini_options]` in pyproject.toml so all tags used in tests are declared.
🤖 Prompt for AI Agents
In tests/llm/fixtures/test_ask_holmes/110_cpu_graph_robusta_runner/test_case.yaml
around lines 28 to 32, the test tags include "datadog" and "embeds" but those
markers are not declared; open pyproject.toml and under
[tool.pytest.ini_options] add "datadog" and "embeds" to the markers list so
every tag used in tests is declared (ensure syntax matches existing marker
entries and save).
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify that "medium" tag is a declared pytest marker.
The "medium" tag is being added. Per the learnings, only pytest markers declared in pyproject.toml should be used.
Based on learnings.
Run the following script to verify this tag is declared:
🏁 Script executed:
#!/bin/bash # Description: Verify "medium" is a declared pytest marker rg -A 1 'markers\s*=' pyproject.toml | grep -i medium || echo "⚠️ 'medium' marker not found in pyproject.toml"
Length of output: 46
Declare the "medium" pytest marker or remove its usage
The "medium" tag isn’t listed under [tool.pytest.ini_options] markers
in pyproject.toml. Add it there or eliminate the tag from test_case.yaml
to comply with pytest marker declarations.
🤖 Prompt for AI Agents
In tests/llm/fixtures/test_ask_holmes/120_new_relic_traces2/test_case.yaml
around lines 9 to 12 the test uses the "medium" tag but that marker is not
declared in [tool.pytest.ini_options] markers in pyproject.toml; either add a
"medium" marker entry to pyproject.toml under markers (e.g. 'medium:
description') or remove the "medium" tag from this test_case.yaml so pytest
marker declarations remain consistent.
🧩 Analysis chain
Verify that all tags are declared pytest markers.
The tags `prometheus`, `embeds`, `metrics`, and `medium` are being added to this test fixture. According to project learnings, only pytest markers declared in `pyproject.toml` should be used.
Based on learnings.
Run the following script to verify these markers are declared:
🏁 Script executed:
#!/bin/bash
# Description: Check if the tags used in this test fixture are declared as pytest markers in pyproject.toml
# Search for pytest marker declarations
rg -A 20 '^\[tool\.pytest\.ini_options\]|^markers\s*=' pyproject.toml
Length of output: 1742
🏁 Script executed:
#!/bin/bash
# Show lines around port_forwards in the YAML file
rg -n 'port_forwards' -C3 tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml
Length of output: 152
Fix undeclared pytest markers and namespace
- Remove or declare the `prometheus` and `embeds` tags in pyproject.toml; only `metrics` and `medium` are valid markers.
- Change `port_forwards.namespace` from `default` to the required dedicated namespace `app-34_memory_graph` (must use `app-<testid>`).
In tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml around
lines 5 to 10, the YAML uses undeclared pytest markers and the wrong namespace:
remove or declare the prometheus and embeds tags (only metrics and medium are
valid markers per pyproject.toml) — either delete the prometheus and embeds
lines or add them to pyproject.toml under [tool.pytest.ini_options].markers;
also change port_forwards.namespace from default to the required dedicated
namespace app-34_memory_graph (must use app-<testid>) so the test uses the
correct namespace.
Use a dedicated namespace following the required naming convention.
The namespace `default` violates the coding guideline requiring each LLM test to use a dedicated Kubernetes namespace of the form `app-<testid>`. For this test, the namespace should be `app-34_memory_graph`.
As per coding guidelines.
Apply this diff to fix the namespace:
- - namespace: default
+ - namespace: app-34_memory_graph
    service: robusta-kube-prometheus-st-prometheus
    local_port: 9090
    remote_port: 9090
📝 Committable suggestion
⚠️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In tests/llm/fixtures/test_ask_holmes/34_memory_graph/test_case.yaml around line
12, the namespace is set to "default" which violates the guideline requiring a
dedicated namespace per test; update the namespace value to
"app-34_memory_graph" (i.e., replace the existing namespace entry with
namespace: app-34_memory_graph) so the test uses the correct dedicated
Kubernetes namespace following the naming convention.
🧩 Analysis chain
Verify that the "medium" tag is declared in pyproject.toml.
Per the learnings, only pytest markers declared in pyproject.toml should be used. Confirm that "medium" is a declared marker.
Based on learnings.
🏁 Script executed:
#!/bin/bash # Description: Check if "medium" is declared as a pytest marker in pyproject.toml # Search for pytest marker declarations including "medium" rg -A 20 '^\[tool\.pytest\.ini_options\]' pyproject.toml | rg -i 'markers.*=|medium'
Length of output: 104
Next: Verify namespace and pod naming in kafka-manifest.yaml
🏁 Script executed:
#!/bin/bash
# Locate the manifest file and inspect namespace and pod names
manifest=$(fd --extension yaml tests/llm/fixtures/test_ask_holmes/55_kafka_runbook | grep -v "test_case.yaml")
echo "Inspecting manifest: $manifest"
grep -n "namespace:" "$manifest"
grep -nP "metadata:\s*name:" "$manifest"
Length of output: 599
Next: Inspect namespace and pod names in all YAML manifests
🏁 Script executed:
#!/bin/bash
# List all YAML files in the kafka runbook fixture directory
files=$(fd . tests/llm/fixtures/test_ask_holmes/55_kafka_runbook -e yaml)
echo "Found YAML files:"
echo "$files"
# Inspect each for namespace and pod naming
for f in $files; do
  echo "---- $f ----"
  grep -n "namespace:" "$f" || echo "No namespace found"
  grep -nP "metadata:\s*$" -A2 "$f" | grep -n "name:" || echo "No metadata.name found"
done
Length of output: 1107
Fix namespace and duplicate pod names in kafka-manifest.yaml
- Namespace must follow `app-<testid>` (e.g. app-55) instead of ask-holmes-namespace-55 (in tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/app/kafka-manifest.yaml).
- Duplicate metadata.name "zookeeper" found; ensure each pod/service name is unique.
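The duplicate-name finding above can be reproduced mechanically. A minimal sketch, using a throwaway directory in place of the real fixture path (an assumption here, not the PR's layout), that surfaces duplicated `metadata.name` values:

```shell
# Create a throwaway directory standing in for the fixture's manifest folder
dir=$(mktemp -d)
printf 'metadata:\n name: zookeeper\n' > "$dir/a.yaml"
printf 'metadata:\n name: zookeeper\n' > "$dir/b.yaml"
printf 'metadata:\n name: kafka\n' > "$dir/c.yaml"
# Any name emitted here is declared more than once across the manifests
grep -h ' name:' "$dir"/*.yaml | sort | uniq -d
```

Here the duplicated `zookeeper` line is printed once, while the unique `kafka` entry is not.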
🤖 Prompt for AI Agents
In tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/test_case.yaml around
lines 6-6, the test references a manifest that contains incorrect namespace and
duplicate pod/service names; open
tests/llm/fixtures/test_ask_holmes/55_kafka_runbook/app/kafka-manifest.yaml and
change the namespace to the required pattern app-55 (replace
ask-holmes-namespace-55), then ensure every metadata.name in pods/services is
unique (rename duplicate "zookeeper" resources to distinct names like
zookeeper-0 / zookeeper-service or similar) and update any matching selectors,
service names, env vars, and references accordingly so names remain consistent
across manifests.
No description provided.