
fix: simplify event buffer #218


Open · lindesvard wants to merge 1 commit into main from feature/simplify-event-buffer

Conversation

lindesvard (Contributor) commented Oct 22, 2025 · edited by coderabbitai bot

Summary by CodeRabbit

Release Notes

  • Tests

    • Reorganized event buffer test suite with new coverage for buffering, processing order, and event ordering logic.
  • Performance

    • Increased database request timeout and connection idle time limits.
    • Added asynchronous insert batching for improved ClickHouse performance.
    • Optimized metrics queries with 7-day session lookback for better efficiency.
  • Changes

    • Removed duration from chart property list.
    • Standardized duration calculations across metrics and realtime queries.
    • Simplified event buffer storage and processing.


vercel bot commented Oct 22, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: openpanel-public · Deployment: Error · Updated (UTC): Oct 22, 2025 10:33am

coderabbitai bot (Contributor) commented Oct 22, 2025 · edited

Walkthrough

This refactor simplifies the event buffering system by replacing a complex Lua-script-driven Redis implementation with a single-list approach, introduces centralized duration calculation as a helper function, updates ClickHouse client configuration for improved performance, and refactors service-layer queries to use inline SQL builders for consistency and maintainability.
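The parse-sort-chunk flow the walkthrough describes can be sketched in isolation. This is an illustrative reconstruction, not the PR's actual code; the names `sortByCreatedAt` and `chunkEvents` are assumptions.

```typescript
// Hypothetical sketch of the single-list processing flow: events read from
// the Redis list are sorted by creation time, then split into insert batches.
type BufferedEvent = { created_at: string; name: string };

function sortByCreatedAt(events: BufferedEvent[]): BufferedEvent[] {
  // Sort oldest-first so ClickHouse inserts preserve event order.
  return [...events].sort(
    (a, b) => new Date(a.created_at).getTime() - new Date(b.created_at).getTime(),
  );
}

function chunkEvents<T>(items: T[], size: number): T[][] {
  // Split into batches of at most `size` (e.g. EVENT_BUFFER_CHUNK_SIZE).
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```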

Changes

Cohort / File(s) Summary
Event Buffer Implementation
packages/db/src/buffers/event-buffer.ts, packages/db/src/buffers/event-buffer.test.ts
Replaced complex Lua-script-driven buffering with simplified single-list Redis approach. Tests restructured to validate new ordering, chunking, and processing behavior. Added private redis and redisKey fields. Updated getLastScreenView to simplified overloads accepting {projectId, profileId} or {projectId, sessionId}.
ClickHouse Client Configuration
packages/db/src/clickhouse/client.ts
Added events_imports to TABLE_NAMES. Increased request_timeout (60000 → 300000) and idle_socket_ttl (8000 → 60000). Wrapped insert operations with retry logic and augmented clickhouse_settings with async insert, timeouts, and parsing defaults.
Duration Calculation
packages/db/src/services/event.service.ts
Introduced exported helper getDurationSql(field?: string) computing window-function-based duration across sessions. Updated getEventList to use dynamic duration expression via getDurationSql() instead of static selection.
Service Layer Query Refactoring
packages/db/src/services/overview.service.ts
Removed public getTotalSessions. Added private helpers: toStartOfInterval, toInterval, buildWhereClause. Rewrote getMetrics, getTopPages, getTopEntryExit, getTopGeneric to use inline SQL with unified filtering logic and 7-day session lookback.
Profile Service Metrics
packages/db/src/services/profile.service.ts
Refactored getProfileMetrics to source duration from sessions table instead of events. Updated aliases: sessions → sessionsCount, duration → sessionDuration. Adjusted P90 and inter-session calculations accordingly.
API Router Updates
packages/trpc/src/routers/chart.ts, packages/trpc/src/routers/realtime.ts
Removed 'duration' from chart router properties endpoint. Added getDurationSql import in realtime router and replaced hardcoded duration calculations with dynamic getDurationSql() expressions in paths, referrals, and geo queries.
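The ClickHouse client changes in the table above amount to a configuration shape like the following. The numeric values come from the summary; treat the object layout as an assumption about `@clickhouse/client` options rather than the PR's literal code.

```typescript
// Sketch of the client configuration described in the summary:
// longer request timeout, longer idle sockets, and async insert batching.
const clickhouseOptions = {
  request_timeout: 300_000, // raised from 60_000 per the summary
  keep_alive: { idle_socket_ttl: 60_000 }, // raised from 8_000
  clickhouse_settings: {
    async_insert: 1, // server-side insert batching
    wait_for_async_insert: 1, // ack only after the batch is flushed
  },
};
```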

Sequence Diagram(s)

sequenceDiagram
 participant App
 participant EventBuffer
 participant Redis
 participant ClickHouse
 
 rect rgb(220, 240, 255)
 Note over App,ClickHouse: Event Processing Flow (New)
 
 App->>EventBuffer: addEvent(event)
 EventBuffer->>Redis: push to single list
 EventBuffer->>Redis: increment counter
 
 App->>EventBuffer: processBuffer()
 EventBuffer->>Redis: read all events from list
 EventBuffer->>EventBuffer: parse & sort by creation_time
 
 loop chunk by EVENT_BUFFER_CHUNK_SIZE
 EventBuffer->>ClickHouse: insert batch with retry
 ClickHouse-->>EventBuffer: success
 end
 
 EventBuffer->>App: emit "saved" events
 EventBuffer->>Redis: trim/clear buffer list
 end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Multiple files with heterogeneous changes spanning buffering logic, configuration, SQL query construction, and API endpoints. Event buffer rewrite eliminates complex Lua scripts but introduces new ordering/chunking logic. Overview service refactor replaces previous query patterns with inline SQL builders. Changes are interdependent and require understanding how duration calculations, filtering, and data sources interact across layers.

Poem

🐰 Hops through buffers lean and light,
No Lua scripts to haunt the night—
Duration helpers, queries clean,
The fastest hop you've ever seen!
ClickHouse hums a grateful song, 🎵
When events flow right where they belong!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "fix: simplify event buffer" directly and accurately reflects the primary change across this pull request. The raw_summary confirms that the event buffer implementation has been substantially refactored from a complex, Lua-script-driven Redis buffering system to a simplified, single-list approach. The title is concise, uses clear and specific language, and avoids vague terminology. A reviewer scanning the commit history would immediately understand that this changeset simplifies the event buffer system, which aligns with the supporting changes made to related services and tests throughout the codebase.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/db/src/buffers/event-buffer.ts (2)

48-96: Verify event publication timing and Redis operation success.

The publishEvent call at line 85 occurs after multi.exec() but doesn't verify the Redis operations succeeded. If the multi execution fails (e.g., Redis connection issue), the event is still published as "received", creating inconsistency between what's buffered and what's announced.

Additionally, in bulk-add scenarios (when _multi is provided), getBufferSize() at line 88 reads the buffer counter before the bulk multi is executed, returning stale data. This could prevent auto-flush when the buffer actually exceeds batchSize.

Consider checking the multi.exec() result before publishing:

 if (!_multi) {
- await multi.exec();
+ const result = await multi.exec();
+ if (!result || result.some(([err]) => err)) {
+ this.logger.error('Failed to execute Redis operations', { result });
+ return;
+ }
 }
 await publishEvent('events', 'received', transformEvent(event));

And reconsider buffer size checking in bulk scenarios:

- // Check buffer length using counter
- const bufferLength = await this.getBufferSize();
-
- if (bufferLength >= this.batchSize) {
- await this.tryFlush();
- }
+ // Only check buffer size if not in bulk mode (counter would be stale)
+ if (!_multi) {
+ const bufferLength = await this.getBufferSize();
+ if (bufferLength >= this.batchSize) {
+ await this.tryFlush();
+ }
+ }

153-212: ClickHouse events table uses MergeTree (no deduplication)—atomicity issue is real.

The concern is valid: ClickHouse events table is defined with MergeTree (append-only, no deduplication), not ReplacingMergeTree. With ORDER BY: [project_id, toDate(created_at), profile_id, name], the event.id is not part of any deduplication key.

If ClickHouse insertion succeeds but Redis trimming (line 200-204) fails, the same events will be reprocessed on the next flush and inserted as true duplicates. API-level deduplication (100ms lock) only protects against immediate duplicate HTTP requests—not buffer reprocessing after failures.

Consider marking events as "processing" before insertion, or wrap ClickHouse insert + Redis trim in a retry loop with exponential backoff to ensure atomicity.
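The retry-with-backoff idea suggested above could look like this minimal sketch; the attempt count, delays, and injectable `sleep` parameter are illustrative assumptions, not the reviewer's prescribed implementation.

```typescript
// Retry an async operation (e.g. ClickHouse insert + Redis trim) with
// exponential backoff. `sleep` is injectable so tests can skip real delays.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      await sleep(baseDelayMs * 2 ** i); // 100ms, 200ms, 400ms, ...
    }
  }
  throw lastError;
}
```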

🧹 Nitpick comments (8)
packages/db/src/services/event.service.ts (1)

573-574: Don’t log raw SQL in production.

Gate behind an env flag or remove to avoid noisy logs and potential PII leakage.

- console.log('getSql() ----> ', getSql());
+ if (process.env.DEBUG_SQL === '1') {
+ // eslint-disable-next-line no-console
+ console.log('getSql() ----> ', getSql());
+ }
packages/db/src/services/profile.service.ts (1)

43-45: Prefer counting sessions from the sessions table for consistency.

Counting session_start events can drift; use sessions with sign = 1.

- sessionsCount AS (
- SELECT count(*) as sessions FROM ${TABLE_NAMES.events} WHERE name = 'session_start' AND profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
- ),
+ sessionsCount AS (
+ SELECT countIf(sign = 1) as sessions 
+ FROM ${TABLE_NAMES.sessions} 
+ WHERE profile_id = ${sqlstring.escape(profileId)} 
+ AND project_id = ${sqlstring.escape(projectId)}
+ ),
packages/db/src/clickhouse/client.ts (1)

91-129: Broaden retryable error detection.

Consider common network errors to reduce transient failures.

- if (
- error.message.includes('Connect') ||
- error.message.includes('socket hang up') ||
- error.message.includes('Timeout error')
- ) {
+ const msg = String(error?.message || '');
+ if (
+ /Connect|socket hang up|Timeout error|ETIMEDOUT|ECONNRESET|EPIPE|ECONNREFUSED|ENOTFOUND/i.test(
+ msg,
+ )
+ ) {
packages/db/src/services/overview.service.ts (4)

99-125: Support ‘minute’ in toStartOfInterval.

Add toStartOfMinute to avoid defaulting to day for minute bins.

 switch (interval) {
+ case 'minute':
+ return `toStartOfMinute(${field}${tzPart})`;
 case 'hour':
 return `toStartOfHour(${field}${tzPart})`;

555-570: Apply the same fix for the non-page-filter branch.

Aligns logic and avoids including tombstones.

 SELECT
 ${mode}_origin AS origin,
 ${mode}_path AS path,
- round(avg(duration * sign) / 1000, 2) AS avg_duration,
- round(sum(sign * is_bounce) * 100.0 / sum(sign), 2) AS bounce_rate,
- sum(sign) AS sessions
+ round(avgIf(duration, duration > 0 AND sign > 0) / 1000, 2) AS avg_duration,
+ round(sumIf(is_bounce, sign = 1) * 100.0 / nullIf(sumIf(sign, sign = 1), 0), 2) AS bounce_rate,
+ sumIf(sign, sign = 1) AS sessions

480-481: Remove or gate SQL debug log.

Avoid logging full queries in production.

- console.log('sql', sql);
+ if (process.env.DEBUG_SQL === '1') {
+ // eslint-disable-next-line no-console
+ console.log('sql', sql);
+ }

304-309: ROLLUP totals row: don’t assume first position.

Find the rollup row via WHERE date IS NULL (or isNull(date)), not res[0], to be robust across CH versions/orderings.

If helpful, I can provide a small post-processing helper to split totals vs. series reliably.
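A post-processing helper along the lines the reviewer suggests might look like this; the row shape and the assumption that the rollup row serializes its `date` as `null` are illustrative (depending on ClickHouse version and settings, the rollup row may instead carry a default/epoch date).

```typescript
// Split a WITH ROLLUP result into the totals row and the per-date series
// by inspecting the date column, rather than assuming the totals row is
// first in the result set.
type MetricsRow = { date: string | null; total_sessions: number };

function splitRollup(rows: MetricsRow[]): {
  totals: MetricsRow | undefined;
  series: MetricsRow[];
} {
  return {
    totals: rows.find((r) => r.date === null),
    series: rows.filter((r) => r.date !== null),
  };
}
```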

packages/db/src/buffers/event-buffer.ts (1)

109-141: Reconsider the sessionId overload if it's not supported.

The method provides an overload accepting sessionId (lines 109-115), but the implementation always returns null for session-based lookups with a comment stating it's not supported (lines 138-140). This creates a misleading API surface—callers might expect it to work based on the type signature.

Consider either:

  1. Removing the sessionId overload entirely if it's not planned for implementation
  2. If it's a temporary limitation, add @deprecated JSDoc or mark it clearly in documentation
- public async getLastScreenView({
- projectId,
- sessionId,
- }: {
- projectId: string;
- sessionId: string;
- }): Promise<IServiceEvent | null>;
 public async getLastScreenView({
 projectId,
 profileId,
- sessionId,
 }: {
 projectId: string;
 profileId?: string;
- sessionId?: string;
 }): Promise<IServiceEvent | null> {

Note: Line 126 uses getRedisCache() instead of this.redis—verify this is intentional or consolidate to this.redis for consistency.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49a4f5b and 9453373.

📒 Files selected for processing (8)
  • packages/db/src/buffers/event-buffer.test.ts (3 hunks)
  • packages/db/src/buffers/event-buffer.ts (3 hunks)
  • packages/db/src/clickhouse/client.ts (2 hunks)
  • packages/db/src/services/event.service.ts (3 hunks)
  • packages/db/src/services/overview.service.ts (7 hunks)
  • packages/db/src/services/profile.service.ts (3 hunks)
  • packages/trpc/src/routers/chart.ts (0 hunks)
  • packages/trpc/src/routers/realtime.ts (4 hunks)
💤 Files with no reviewable changes (1)
  • packages/trpc/src/routers/chart.ts
🧰 Additional context used
🧬 Code graph analysis (5)
packages/db/src/buffers/event-buffer.test.ts (2)
packages/db/src/clickhouse/client.ts (1)
  • ch (131-161)
packages/db/src/buffers/event-buffer.ts (1)
  • EventBuffer (16-249)
packages/trpc/src/routers/realtime.ts (1)
packages/db/src/services/event.service.ts (1)
  • getDurationSql (1020-1021)
packages/db/src/buffers/event-buffer.ts (4)
packages/redis/redis.ts (3)
  • Redis (9-9)
  • getRedisCache (66-72)
  • getRedisPub (84-90)
packages/db/src/services/event.service.ts (3)
  • IClickhouseEvent (68-102)
  • IServiceEvent (145-182)
  • transformEvent (104-138)
packages/json/index.ts (1)
  • getSafeJson (3-9)
packages/redis/publisher.ts (1)
  • publishEvent (28-36)
packages/db/src/services/profile.service.ts (1)
packages/db/src/clickhouse/client.ts (1)
  • TABLE_NAMES (48-60)
packages/db/src/services/overview.service.ts (3)
packages/validation/src/types.validation.ts (1)
  • IChartEventFilter (28-28)
packages/db/src/services/chart.service.ts (1)
  • getEventFiltersWhereClause (219-470)
packages/db/src/clickhouse/client.ts (3)
  • TABLE_NAMES (48-60)
  • formatClickhouseDate (208-219)
  • chQuery (201-206)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-and-push-dashboard
  • GitHub Check: build-and-push-worker
  • GitHub Check: build-and-push-api
🔇 Additional comments (2)
packages/db/src/services/event.service.ts (1)

1020-1021: leadInFrame and lead are not semantically equivalent; proposed changes require careful testing.

The proposed diff has both improvements and semantic changes:

Valid improvements:

  • Null-safety via coalesce(..., 0) and greatest(..., 0) prevents NULL results and negative durations

Semantic concern:
leadInFrame respects the frame bounds specified in the window clause, while lead always uses an unbounded following frame regardless of explicit frame specification. The current code uses ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING with leadInFrame; the proposed code removes this frame and uses lead instead. While functionally similar for typical forward-offset cases, this IS a behavioral change that could affect edge cases or NULL handling.

GROUP BY warning is valid: The note about avoiding window functions inside aggregate arguments is correct ClickHouse constraint, though getDurationSql is not currently used inside aggregates—only in SELECT clauses.

Recommendation: Verify that replacing leadInFrame(...) OVER (... ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) with lead(...) OVER (...) produces identical results in your test cases before deploying. The null-safety improvements are solid; the function substitution needs validation.

packages/db/src/buffers/event-buffer.test.ts (1)

1-363: Excellent test coverage for the refactored buffer implementation.

The test suite comprehensively validates the new single-list buffering approach:

  • Buffer operations (add, bulkAdd, size tracking)
  • Processing flow (ordering, chunking, batch sizing)
  • Edge cases (empty buffer, environment configuration)
  • Integration points (ClickHouse insertion, active visitors)
  • Data integrity (full event fields, serialization)

The tests properly isolate state with beforeEach cleanup, restore environment variables, and use appropriate mocks. This gives good confidence in the refactored implementation.

Comment on lines 512 to 514
 if (select.duration) {
-  sb.select.duration = 'duration';
+  sb.select.duration = `${getDurationSql()} as duration`;
 }
coderabbitai bot commented Oct 22, 2025

⚠️ Potential issue | 🟠 Major

Duration can be NULL/negative; coalesce and clamp to zero.

Lead-based diff returns NULL for session tail and can go negative with clock skew. That breaks consumers expecting a number.

Apply this by hardening the helper (below); the call here can stay unchanged once the helper is fixed.

🤖 Prompt for AI Agents
In packages/db/src/services/event.service.ts around lines 512 to 514, the
duration expression can be NULL or negative (lead-based diff for session tail
and clock skew), so update the helper used by getDurationSql() to return SQL
that clamps negatives to zero and coalesces NULLs to 0 (e.g., wrap the computed
duration in GREATEST(..., 0) and COALESCE(..., 0) or equivalent for the SQL
dialect). Keep the call here unchanged — just harden the helper so
sb.select.duration receives a non-null, non-negative numeric duration.
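A hardened helper in the spirit of this comment could be sketched as below. The exact window clause and `dateDiff` granularity are assumptions; the point is wrapping the computed duration in `coalesce(..., 0)` and `greatest(..., 0)` so consumers always receive a non-negative number.

```typescript
// Hypothetical hardened variant of getDurationSql: coalesce the NULL
// session-tail value to 0 and clamp negatives (clock skew) to 0.
function getHardenedDurationSql(field = 'created_at'): string {
  return (
    `greatest(coalesce(dateDiff('millisecond', ${field}, ` +
    `leadInFrame(${field}) OVER (PARTITION BY session_id ORDER BY ${field} ` +
    `ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)), 0), 0)`
  );
}
```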

Comment on lines +262 to +279
SELECT
${startOfInterval} AS date,
any(ss.bounce_rate) AS bounce_rate,
uniq(e.profile_id) AS unique_visitors,
uniq(e.session_id) AS total_sessions,
any(ss.avg_session_duration) AS avg_session_duration,
count(*) AS total_screen_views,
round(count(*) * 1.0 / nullIf(uniq(e.session_id), 0), 2) AS views_per_session
FROM ${TABLE_NAMES.events} e
LEFT JOIN session_stats ss ON ${startOfInterval} = ss.date
WHERE e.project_id = ${sqlstring.escape(projectId)}
AND e.created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
AND e.name = 'screen_view'
${eventsWhere ? `AND ${eventsWhere}` : ''}
GROUP BY date
WITH ROLLUP
ORDER BY date ASC WITH FILL
coderabbitai bot commented Oct 22, 2025

⚠️ Potential issue | 🔴 Critical

bounce_stats CTE not joined; wrong alias used.

any(ss.bounce_rate) references a non-existent column; join bounce_stats and use its alias.

- SELECT
- ${startOfInterval} AS date,
- any(ss.bounce_rate) AS bounce_rate,
+ SELECT
+ ${startOfInterval} AS date,
+ any(bs.bounce_rate) AS bounce_rate,
 uniq(e.profile_id) AS unique_visitors,
 uniq(e.session_id) AS total_sessions,
 any(ss.avg_session_duration) AS avg_session_duration,
 count(*) AS total_screen_views,
 round(count(*) * 1.0 / nullIf(uniq(e.session_id), 0), 2) AS views_per_session
 FROM ${TABLE_NAMES.events} e
- LEFT JOIN session_stats ss ON ${startOfInterval} = ss.date
+ LEFT JOIN session_stats ss ON ${startOfInterval} = ss.date
+ LEFT JOIN bounce_stats bs ON ${startOfInterval} = bs.date
📝 Committable suggestion

⚠️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT
${startOfInterval} AS date,
any(ss.bounce_rate) AS bounce_rate,
uniq(e.profile_id) AS unique_visitors,
uniq(e.session_id) AS total_sessions,
any(ss.avg_session_duration) AS avg_session_duration,
count(*) AS total_screen_views,
round(count(*) * 1.0 / nullIf(uniq(e.session_id), 0), 2) AS views_per_session
FROM ${TABLE_NAMES.events} e
LEFT JOIN session_stats ss ON ${startOfInterval} = ss.date
WHERE e.project_id = ${sqlstring.escape(projectId)}
AND e.created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
AND e.name = 'screen_view'
${eventsWhere ? `AND ${eventsWhere}` : ''}
GROUP BY date
WITH ROLLUP
ORDER BY date ASC WITH FILL
SELECT
${startOfInterval} AS date,
any(bs.bounce_rate) AS bounce_rate,
uniq(e.profile_id) AS unique_visitors,
uniq(e.session_id) AS total_sessions,
any(ss.avg_session_duration) AS avg_session_duration,
count(*) AS total_screen_views,
round(count(*) * 1.0 / nullIf(uniq(e.session_id), 0), 2) AS views_per_session
FROM ${TABLE_NAMES.events} e
LEFT JOIN session_stats ss ON ${startOfInterval} = ss.date
LEFT JOIN bounce_stats bs ON ${startOfInterval} = bs.date
WHERE e.project_id = ${sqlstring.escape(projectId)}
AND e.created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
AND e.name = 'screen_view'
${eventsWhere ? `AND ${eventsWhere}` : ''}
GROUP BY date
WITH ROLLUP
ORDER BY date ASC WITH FILL
🤖 Prompt for AI Agents
In packages/db/src/services/overview.service.ts around lines 262 to 279, the
query is referencing any(ss.bounce_rate) but the bounce_stats CTE is never
joined and uses the wrong alias; fix by adding a LEFT JOIN bounce_stats bs ON
${startOfInterval} = bs.date (matching how session_stats is joined) and change
any(ss.bounce_rate) to any(bs.bounce_rate) so the bounce_rate column comes from
the correct CTE while keeping session_stats (ss) for avg_session_duration.

Comment on lines +526 to +543
SELECT
${mode}_origin AS origin,
${mode}_path AS path,
round(avg(duration * sign) / 1000, 2) AS avg_duration,
round(sum(sign * is_bounce) * 100.0 / sum(sign), 2) AS bounce_rate,
sum(sign) AS sessions
FROM ${TABLE_NAMES.sessions}
WHERE project_id = ${sqlstring.escape(projectId)}
AND created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
AND id IN (SELECT session_id FROM distinct_sessions)
${sessionsWhere ? `AND ${sessionsWhere}` : ''}
GROUP BY ${mode}_origin, ${mode}_path
HAVING sum(sign) > 0
ORDER BY sessions DESC
LIMIT ${limit} OFFSET ${offset}
`;

coderabbitai bot commented Oct 22, 2025

⚠️ Potential issue | 🟠 Major

Correct session aggregations with Collapsing tables.

Use avgIf(duration, duration > 0 AND sign > 0) and sumIf-based bounce/sessions.

 SELECT
 ${mode}_origin AS origin,
 ${mode}_path AS path,
- round(avg(duration * sign) / 1000, 2) AS avg_duration,
- round(sum(sign * is_bounce) * 100.0 / sum(sign), 2) AS bounce_rate,
- sum(sign) AS sessions
+ round(avgIf(duration, duration > 0 AND sign > 0) / 1000, 2) AS avg_duration,
+ round(sumIf(is_bounce, sign = 1) * 100.0 / nullIf(sumIf(sign, sign = 1), 0), 2) AS bounce_rate,
+ sumIf(sign, sign = 1) AS sessions
📝 Committable suggestion

⚠️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT
${mode}_origin AS origin,
${mode}_path AS path,
round(avg(duration * sign) / 1000, 2) AS avg_duration,
round(sum(sign * is_bounce) * 100.0 / sum(sign), 2) AS bounce_rate,
sum(sign) AS sessions
FROM ${TABLE_NAMES.sessions}
WHERE project_id = ${sqlstring.escape(projectId)}
AND created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
AND id IN (SELECT session_id FROM distinct_sessions)
${sessionsWhere ? `AND ${sessionsWhere}` : ''}
GROUP BY ${mode}_origin, ${mode}_path
HAVING sum(sign) > 0
ORDER BY sessions DESC
LIMIT ${limit} OFFSET ${offset}
`;
SELECT
${mode}_origin AS origin,
${mode}_path AS path,
round(avgIf(duration, duration > 0 AND sign > 0) / 1000, 2) AS avg_duration,
round(sumIf(is_bounce, sign = 1) * 100.0 / nullIf(sumIf(sign, sign = 1), 0), 2) AS bounce_rate,
sumIf(sign, sign = 1) AS sessions
FROM ${TABLE_NAMES.sessions}
WHERE project_id = ${sqlstring.escape(projectId)}
AND created_at BETWEEN toDateTime(${sqlstring.escape(formatClickhouseDate(startDate))})
AND toDateTime(${sqlstring.escape(formatClickhouseDate(endDate))})
AND id IN (SELECT session_id FROM distinct_sessions)
${sessionsWhere ? `AND ${sessionsWhere}` : ''}
GROUP BY ${mode}_origin, ${mode}_path
HAVING sum(sign) > 0
ORDER BY sessions DESC
LIMIT ${limit} OFFSET ${offset}
`;
🤖 Prompt for AI Agents
In packages/db/src/services/overview.service.ts around lines 526 to 543, the
current aggregates use plain avg/sum which miscounts when using
CollapsingMergeTree semantics; replace avg(duration * sign) / 1000 with
avgIf(duration, duration > 0 AND sign > 0) / 1000 (or round(avgIf(duration,
duration > 0 AND sign > 0) / 1000, 2)), replace round(sum(sign * is_bounce) *
100.0 / sum(sign), 2) with round(sumIf(is_bounce, duration > 0 AND sign > 0) *
100.0 / sumIf(sign, duration > 0 AND sign > 0), 2), and replace sum(sign) AS
sessions with sumIf(sign, duration > 0 AND sign > 0) AS sessions; keep same
GROUP BY, HAVING and ORDER clauses but ensure HAVING uses sumIf(sign, duration >
0 AND sign > 0) > 0.

Comment on lines +46 to 52
 sessionDuration AS (
   SELECT
-    round(avg(duration) / 1000 / 60, 2) as durationAvg,
-    round(quantilesExactInclusive(0.9)(duration)[1] / 1000 / 60, 2) as durationP90
-  FROM ${TABLE_NAMES.events}
-  WHERE name = 'session_end' AND duration != 0 AND profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
+    avg(duration) / 1000 as durationAvg,
+    quantilesExactInclusive(0.9)(duration)[1] / 1000 as durationP90
+  FROM ${TABLE_NAMES.sessions}
+  WHERE profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
 ),
coderabbitai bot commented Oct 22, 2025

⚠️ Potential issue | 🔴 Critical

Filter sessions by sign=1 for duration stats.

Without sign = 1, collapsed rows (or tombstones) skew avg and P90.

- sessionDuration AS (
+ sessionDuration AS (
 SELECT 
- avg(duration) / 1000 as durationAvg, 
- quantilesExactInclusive(0.9)(duration)[1] / 1000 as durationP90 
- FROM ${TABLE_NAMES.sessions} 
- WHERE profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
+ avgIf(duration, sign > 0) / 1000 as durationAvg, 
+ quantilesExactInclusive(0.9)(multiIf(sign > 0, duration, NULL)) [1] / 1000 as durationP90 
+ FROM ${TABLE_NAMES.sessions} 
+ WHERE profile_id = ${sqlstring.escape(profileId)} 
+ AND project_id = ${sqlstring.escape(projectId)}
+ AND sign = 1
 ),
📝 Committable suggestion

⚠️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sessionDuration AS (
  SELECT
    avg(duration) / 1000 as durationAvg,
    quantilesExactInclusive(0.9)(duration)[1] / 1000 as durationP90
  FROM ${TABLE_NAMES.sessions}
  WHERE profile_id = ${sqlstring.escape(profileId)} AND project_id = ${sqlstring.escape(projectId)}
),
sessionDuration AS (
SELECT
avgIf(duration, sign > 0) / 1000 as durationAvg,
quantilesExactInclusive(0.9)(multiIf(sign > 0, duration, NULL))[1] / 1000 as durationP90
FROM ${TABLE_NAMES.sessions}
WHERE profile_id = ${sqlstring.escape(profileId)}
AND project_id = ${sqlstring.escape(projectId)}
AND sign = 1
),
🤖 Prompt for AI Agents
In packages/db/src/services/profile.service.ts around lines 46 to 52, the
sessionDuration CTE’s WHERE clause is missing a filter on sign=1 which allows
collapsed/tombstone rows to skew duration stats; update the WHERE clause to
include AND sign = 1 (keeping existing escaped profileId and projectId) so
avg(duration) and quantilesExactInclusive(0.9)(duration) only consider
positive-sign sessions.

Comment on lines 67 to 72
'origin',
'path',
'COUNT(*) as count',
'round(avg(duration)/1000, 2) as avg_duration',
`round(avg(${getDurationSql()})/1000, 2) as avg_duration`,
])
.from(TABLE_NAMES.events)
coderabbitai bot commented Oct 22, 2025

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Window function inside avg will fail; precompute duration in a subquery.

ClickHouse doesn’t allow window funcs inside aggregate args with GROUP BY. Compute duration first, then aggregate.

Apply these diffs:

Paths

```diff
-    .from(TABLE_NAMES.events)
-    .where('project_id', '=', input.projectId)
-    .where('path', '!=', '')
-    .where('created_at', '>=', formatClickhouseDate(subMinutes(new Date(), 30)))
-    .groupBy(['path', 'origin'])
+    .from(
+      clix.exp(`(
+        SELECT
+          origin,
+          path,
+          session_id,
+          ${getDurationSql()} AS duration
+        FROM ${TABLE_NAMES.events}
+        WHERE project_id = ${sqlstring.escape(input.projectId)}
+          AND path != ''
+          AND created_at >= '${formatClickhouseDate(subMinutes(new Date(), 30))}'
+      )`)
+    )
+    .groupBy(['path', 'origin'])
     .orderBy('count', 'DESC')
     .limit(100)
     .execute();
```

Referrals

```diff
-    .from(TABLE_NAMES.events)
-    .where('project_id', '=', input.projectId)
-    .where('referrer_name', 'IS NOT NULL')
-    .where('created_at', '>=', formatClickhouseDate(subMinutes(new Date(), 30)))
-    .groupBy(['referrer_name'])
+    .from(
+      clix.exp(`(
+        SELECT
+          referrer_name,
+          session_id,
+          ${getDurationSql()} AS duration
+        FROM ${TABLE_NAMES.events}
+        WHERE project_id = ${sqlstring.escape(input.projectId)}
+          AND referrer_name IS NOT NULL
+          AND created_at >= '${formatClickhouseDate(subMinutes(new Date(), 30))}'
+      )`)
+    )
+    .groupBy(['referrer_name'])
     .orderBy('count', 'DESC')
     .limit(100)
     .execute();
```

Geo

```diff
-    .from(TABLE_NAMES.events)
-    .where('project_id', '=', input.projectId)
-    .where('created_at', '>=', formatClickhouseDate(subMinutes(new Date(), 30)))
-    .groupBy(['country', 'city'])
+    .from(
+      clix.exp(`(
+        SELECT
+          country,
+          city,
+          session_id,
+          ${getDurationSql()} AS duration
+        FROM ${TABLE_NAMES.events}
+        WHERE project_id = ${sqlstring.escape(input.projectId)}
+          AND created_at >= '${formatClickhouseDate(subMinutes(new Date(), 30))}'
+      )`)
+    )
+    .groupBy(['country', 'city'])
     .orderBy('count', 'DESC')
     .limit(100)
     .execute();
```

Optional: use `round(avgIf(duration, duration > 0)/1000, 2)` to ignore zero/NULL durations.

Confirm whether these queries should be restricted to `name = 'screen_view'`. If yes, add that condition inside the subqueries.

Also applies to: 96-100, 124-130
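The subquery-then-aggregate shape proposed above can be sketched outside ClickHouse with SQLite's window functions (same two-layer structure, toy schema invented for illustration, not the production events table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (session_id TEXT, path TEXT, ts INTEGER);
INSERT INTO events VALUES
  ('s1', '/a', 0), ('s1', '/b', 10),
  ('s2', '/a', 0), ('s2', '/a', 30);
""")

# Inner query: the window function computes a per-row duration (time until
# the next event in the same session). Outer query: a plain GROUP BY
# aggregate over the precomputed column -- no window call inside AVG().
rows = con.execute("""
SELECT path, COUNT(*) AS count, AVG(duration) AS avg_duration
FROM (
  SELECT
    path,
    COALESCE(LEAD(ts) OVER (PARTITION BY session_id ORDER BY ts) - ts, 0)
      AS duration
  FROM events
)
GROUP BY path
ORDER BY count DESC
""").fetchall()

print(rows)  # '/a' averages (10 + 30 + 0) / 3; '/b' has one zero-duration row
```

The same layering applies in ClickHouse: compute the windowed duration in the inner SELECT, then let the outer query do `count()`/`avg()`.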


I'll verify the technical claim and code locations for this review comment. Let me gather the necessary information.


🏁 Script executed:

```shell
#!/bin/bash
# Find getDurationSql implementation and check what it returns
rg -n "getDurationSql" --type=ts -B2 -A5 | head -50
```

Length of output: 3252


🏁 Script executed:

```shell
#!/bin/bash
# View the actual code at the specified line ranges in realtime.ts
fd "realtime.ts" --type=f | xargs -I {} sh -c 'echo "=== File: {} ===" && sed -n "60,75p" {} && echo "" && sed -n "90,105p" {} && echo "" && sed -n "118,135p" {}'
```

Length of output: 1651


🏁 Script executed:

```shell
#!/bin/bash
# Check what clix is and how exp() works - verify pattern used
rg -n "clix\.exp|from.*clix\.exp" --type=ts -B2 -A2 | head -30
```

Length of output: 2250


🌐 Web query:

```
ClickHouse window functions inside aggregate functions GROUP BY
```

💡 Result:

Short answer: Yes — ClickHouse supports using aggregate functions as window functions (e.g. sum(...) OVER (...)), but you cannot directly nest window expressions inside other aggregate/GROUP BY clauses. Compute aggregates either as window functions with an OVER() clause, or do the GROUP BY aggregation first and then apply window functions in an outer query (or vice‐versa depending on intent).

Key points and examples:

  • ClickHouse supports `aggregate_function(column) OVER (PARTITION BY ... ORDER BY ... [ROWS|RANGE ...])`. Use this when you want an aggregate computed per row across a frame/partition. [1]
    Example:

    ```sql
    SELECT
      id,
      value,
      sum(value) OVER (PARTITION BY id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
    FROM events; -- [1]
    ```
  • You cannot mix a plain GROUP BY aggregate and a window expression in invalid contexts (e.g. putting a window call inside another aggregate, or using aggregates incorrectly in WHERE/GROUP BY); that produces illegal-aggregation errors. If you need both, do one in a subquery and the other in an outer query. [2][3]
    Example (window function in a subquery, plain expressions in the outer query):

    ```sql
    SELECT id, value, avg_value, value - avg_value AS diff
    FROM (
      SELECT id, value, avg(value) OVER (PARTITION BY id) AS avg_value
      FROM events
    ); -- [1][3]
    ```

References:

  • ClickHouse window functions docs (syntax, aggregate functions as window functions). [1]
  • Common ILLEGAL_AGGREGATION explanation / patterns and fixes (mixing aggregates and window usage). [2]
  • ClickHouse aggregate functions reference. [3]

Would you like a concrete example using your table schema so I can show the subquery vs. OVER() patterns?


I need to find the actual implementation of getDurationSql() to confirm whether it returns a window function that cannot be nested in aggregate functions.


---
🏁 Script executed:
```shell
#!/bin/bash
# Find the getDurationSql implementation in the db package
fd "event.service.ts" --type=f | xargs rg -n "export.*getDurationSql|function getDurationSql|const getDurationSql" -A10
```

Length of output: 250


🏁 Script executed:

```shell
#!/bin/bash
# Also search for any getDurationSql definition/export more broadly
rg -n "getDurationSql\s*=" --type=ts -A5 | head -40
```

Length of output: 334


🏁 Script executed:

```shell
#!/bin/bash
# Check the imports in realtime.ts to confirm the source
rg -n "from '@openpanel/db'" packages/trpc/src/routers/realtime.ts -A10 -B2
```

Length of output: 435


Now I need to verify the specific implementation details of the proposed diffs and whether they match codebase patterns.


---
🏁 Script executed:
```shell
#!/bin/bash
# Check how sqlstring is used with project_id in codebase
rg -n "sqlstring.escape.*projectId|sqlstring.escape.*project_id" --type=ts -B2 -A2 | head -30
```

Length of output: 2369


🏁 Script executed:

```shell
#!/bin/bash
# Check if paths/referrals/geo queries in realtime.ts filter by event name elsewhere
rg -n "name.*=|name.*IN" packages/trpc/src/routers/realtime.ts -B5 -A5 | grep -A5 -B5 "paths\|referrals\|coordinates"
```

Length of output: 0


🏁 Script executed:

```shell
#!/bin/bash
# Check overview.ts for similar queries to see if name filter is used
rg -n "\.where.*'name'" packages/trpc/src/routers/overview.ts -B3 -A3 | head -40
```

Length of output: 1300


I found one critical issue with the proposed diffs. Let me verify how clix.exp() works with .from():


---
🏁 Script executed:
```shell
#!/bin/bash
# Check if clix.exp() is used in .from() anywhere in codebase
rg -n "\.from\s*\(" --type=ts -A1 packages/trpc/src/routers/ | grep -B1 "clix.exp\|exp\("
```

Length of output: 23


🏁 Script executed:

```shell
#!/bin/bash
# Check how clix handles subqueries or raw SQL in from clause
rg -n "clix.*\.from" --type=ts -B2 -A3 | head -50
```

Length of output: 959


🏁 Script executed:

```shell
#!/bin/bash
# Look for any existing examples of using raw SQL as a table/subquery with clix
rg -n "SELECT|from.*\(" packages/trpc/src/routers/realtime.ts -B1 -A1 | head -40
```

Length of output: 613


I found a critical issue: the proposed diffs use .from(clix.exp()) which doesn't appear in the codebase. Let me check the correct pattern:


---
🏁 Script executed:
```shell
#!/bin/bash
# Find how the coordinates query (which uses raw SQL) is implemented
sed -n "22,35p" packages/trpc/src/routers/realtime.ts
```

Length of output: 545


🏁 Script executed:

```shell
#!/bin/bash
# Search for any clix subquery or CTE/WITH patterns
rg -n "WITH\s|CTE|subquery" packages/db/src/clickhouse/ --type=ts -B2 -A2 | head -40
```

Length of output: 2447


🏁 Script executed:

```shell
#!/bin/bash
# Check clix query builder for .from() signature - look at type definitions or examples
rg -n "from.*\(.*\)" packages/db/src/clickhouse/query-builder.ts -B1 -A3 | head -60
```

Length of output: 639


Found it! The clix builder supports Expression objects in .from(). Let me verify the complete solution:


---
🏁 Script executed:
```shell
#!/bin/bash
# Find clix.exp definition to confirm it returns Expression
rg -n "exp\s*\(" packages/db/src/clickhouse/ --type=ts -B1 -A3 | grep -B1 -A3 "export.*exp\|function exp"
```

Length of output: 0


🏁 Script executed:

```shell
#!/bin/bash
# Check if paths/referrals/geo are meant to track only screen_view events
# Look for any documentation or related queries that might filter by event type
rg -n "screen_view|session_start" packages/trpc/src/routers/realtime.ts -B3 -A3
```

Length of output: 0


🏁 Script executed:

```shell
#!/bin/bash
# Verify the Expression type and clix.exp usage pattern
rg -n "class Expression|interface Expression|export.*clix" packages/db/src/clickhouse/client.ts -A5 | head -30
```

Length of output: 0


Fix window function aggregation error by precomputing duration in subqueries before GROUP BY.

The critical issue is confirmed: getDurationSql() returns a ClickHouse window function (dateDiff(...) OVER (PARTITION BY session_id ...)), which cannot be nested inside avg() with GROUP BY. This causes an ILLEGAL_AGGREGATION error at runtime.

The proposed subquery approach is technically sound and follows codebase patterns (clix with Expression objects, sqlstring.escape parameterization), but the exact syntax should be verified during implementation. The three locations requiring fixes are at lines 70, 98, and 128 in packages/trpc/src/routers/realtime.ts.

Regarding the optional screen_view filter: verify whether paths, referrals, and geo queries should be constrained to name = 'screen_view' like similar queries in overview.ts. If yes, add the filter inside the subqueries.
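For concreteness, the rewrite the comment asks for can be sketched as plain string assembly. Everything below is illustrative: the `dateDiff`/`leadInFrame` expression is an assumed stand-in for whatever `getDurationSql()` actually returns, and the naive quoting stands in for the codebase's `sqlstring.escape` helper:

```python
# Assumed stand-in for getDurationSql(); the real helper lives in
# @openpanel/db and may differ in detail.
def duration_sql() -> str:
    return (
        "dateDiff('millisecond', created_at, "
        "leadInFrame(created_at) OVER "
        "(PARTITION BY session_id ORDER BY created_at))"
    )

def paths_query(project_id: str, since: str) -> str:
    # The window expression lives only in the inner SELECT; the outer
    # query aggregates the resulting plain `duration` column.
    inner = (
        "SELECT origin, path, session_id, "
        f"{duration_sql()} AS duration "
        "FROM events "
        f"WHERE project_id = '{project_id}' "
        "AND path != '' "
        f"AND created_at >= '{since}'"
    )
    return (
        "SELECT path, origin, count(*) AS count, "
        "round(avg(duration)/1000, 2) AS avg_duration "
        f"FROM ({inner}) "
        "GROUP BY path, origin ORDER BY count DESC LIMIT 100"
    )

print(paths_query("p1", "2025-10-22 00:00:00"))
```

The referrals and geo queries follow the same two-layer shape with their own projected columns.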

🤖 Prompt for AI Agents
```
In packages/trpc/src/routers/realtime.ts around lines 67-72 (and also update
analogous blocks at ~98 and ~128), the avg(round(... getDurationSql() ...)) call
is illegal because getDurationSql() returns a ClickHouse window function that
cannot be aggregated directly; rewrite each query to compute the windowed
duration in a subquery (select the window function as duration along with
origin/path/etc), then in the outer query perform COUNT(*) and
round(avg(duration)/1000,2) as avg_duration; keep using the codebase pattern of
clix Expression objects and sqlstring.escape for parameters when building the
subquery and outer query; also verify whether the paths/referrals/geo queries
should include the filter name = 'screen_view' like overview.ts and add that
filter inside the subquery if required.
```
