Node.js Performance Profiling: Finding the Bottleneck Before Your Users Do


npm install -g clinic 0x autocannon

Step 1 — clinic doctor (Diagnose First)

clinic doctor runs your app under load and produces a report that tells you which category of problem you have before you spend time on the wrong kind of profiling.

# Start your app under clinic doctor
clinic doctor -- node src/server.js
# In another terminal, apply load
autocannon -c 100 -d 30 http://localhost:3000/api/orders

After the load stops, clinic doctor opens a report in your browser showing:

Event loop delay — if high, you have synchronous blocking or very heavy async operations
CPU usage — if consistently at 100%, you have CPU-bound work
Memory — if climbing, you have a leak
Handles/requests — if handles grow without requests growing proportionally, something is not being cleaned up

The report recommends which clinic tool to use next. Follow it.

Step 2 — clinic flame (CPU Profiling)

When doctor indicates a CPU problem, clinic flame produces a flame graph — a visualization where the width of each bar represents how much CPU time that function consumed.

clinic flame -- node src/server.js
# Apply focused load on the slow endpoint
# Quote the URL so the shell does not treat "?" as a glob character
autocannon -c 50 -d 20 'http://localhost:3000/api/search?q=laptop'

Reading a flame graph:

The bottom of the graph is the call stack entry point
Each bar above represents a function called by the one below it
Width = time spent in that function (wider = more time)
The top of each stack is where execution was when the sample was taken
Look for wide bars near the top — these are the expensive functions
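
To make the width and top-of-stack rules concrete, here is a minimal sketch of how sampled call stacks could be turned into the numbers a flame graph draws. It is a simplification: real tools group frames by their full call path, while this version groups by function name only. The sample data and the function names in it are made up for illustration.

```typescript
// Each sample is one captured call stack: entry point first, running function last.
type StackSample = string[];

interface FrameStats {
  total: number; // samples where the function is anywhere on the stack (bar width)
  self: number; // samples where the function is at the top of the stack
}

function aggregateSamples(samples: StackSample[]): Map<string, FrameStats> {
  const stats = new Map<string, FrameStats>();
  const entryFor = (name: string): FrameStats => {
    let entry = stats.get(name);
    if (!entry) {
      entry = { total: 0, self: 0 };
      stats.set(name, entry);
    }
    return entry;
  };
  samples.forEach((stack) => {
    if (stack.length === 0) return;
    // Count each function once per sample, even if it appears twice (recursion).
    new Set(stack).forEach((name) => {
      entryFor(name).total += 1;
    });
    entryFor(stack[stack.length - 1]).self += 1;
  });
  return stats;
}

// Hypothetical samples taken at a fixed interval while serving a search request.
const exampleSamples: StackSample[] = [
  ['main', 'handleSearch', 'JSON.parse'],
  ['main', 'handleSearch', 'JSON.parse'],
  ['main', 'handleSearch', 'rankResults'],
  ['main', 'handleSearch'],
];

aggregateSamples(exampleSamples).forEach((frame, name) => {
  console.log(`${name}: width=${frame.total}/${exampleSamples.length}, self=${frame.self}`);
});
```

A function with a large `total` but small `self` is mostly waiting on its callees. A large `self` is the "wide bar near the top" that deserves attention.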

Common patterns in Node.js flame graphs:

Wide bar in JSON.parse or JSON.stringify — you are serializing large objects frequently. Consider streaming responses or reducing payload size.

Wide bar in a regex function: a regex is more expensive than expected, often because of catastrophic backtracking. Test your regexes with a ReDoS checker such as safe-regex or recheck.

Wide bar in bcrypt or crypto — expected for hashing, but if it is in the hot path (every request, not just login), something is wrong.

Wide bar in your own business logic — investigate that function. Is it doing a computation that could be cached? Is it called more often than expected?
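
If you have not seen catastrophic backtracking before, the regex pattern above is easy to reproduce. The sketch below times a textbook vulnerable pattern against a rewrite that accepts the same strings. Both patterns are illustrative and not taken from any real codebase, and the timings will vary by machine.

```typescript
import { performance } from 'node:perf_hooks';

// Nested quantifiers: (a+)+ can split a run of "a"s in exponentially many ways.
const vulnerable = /^(a+)+$/;
// Accepts the same strings with a single quantifier, so there is nothing to retry.
const linear = /^a+$/;

function timeTest(re: RegExp, input: string): { matched: boolean; ms: number } {
  const start = performance.now();
  const matched = re.test(input);
  return { matched, ms: performance.now() - start };
}

// A run of "a" followed by a character that forces the overall match to fail.
for (const length of [14, 16, 18, 20]) {
  const input = 'a'.repeat(length) + '!';
  const slow = timeTest(vulnerable, input);
  const fast = timeTest(linear, input);
  console.log(
    `n=${length} vulnerable=${slow.ms.toFixed(2)}ms linear=${fast.ms.toFixed(2)}ms`
  );
}
```

With the vulnerable pattern, each extra character roughly doubles the work on a failing input, which is why a slightly longer user-supplied string can turn a fast endpoint into a blocked event loop.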

# Profile a specific script rather than a server
0x --open -- node src/scripts/generate-report.js

Step 3 — clinic bubbleprof (Async/I/O Profiling)

When doctor indicates I/O or event loop problems — not CPU — use bubbleprof. It shows where your code is waiting, not where it is running.

clinic bubbleprof -- node src/server.js
autocannon -c 50 -d 20 http://localhost:3000/api/orders

Bubbleprof shows a graph of async operations (database queries, HTTP calls, file I/O) grouped into bubbles. Larger bubbles and longer connecting lines mean more time spent waiting.

What to look for:

Sequential awaits that could be parallel:

// SLOW: these run one after another
const user = await getUser(userId);
const orders = await getOrders(userId);
const profile = await getProfile(userId);
// Total time: getUser + getOrders + getProfile

// FAST: the three calls are independent, so start them together
const [user, orders, profile] = await Promise.all([
 getUser(userId),
 getOrders(userId),
 getProfile(userId),
]);
// Total time: max(getUser, getOrders, getProfile)
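
You can check the difference without a database. This is a minimal sketch in which `fakeQuery` is a hypothetical stand-in for `getUser`, `getOrders`, and `getProfile`, using timers instead of real I/O.

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical stand-in for a database or HTTP call that takes `ms` milliseconds.
function fakeQuery<T>(value: T, ms: number): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

async function sequential(): Promise<number> {
  const start = performance.now();
  await fakeQuery('user', 50);
  await fakeQuery('orders', 80);
  await fakeQuery('profile', 30);
  return performance.now() - start; // roughly 50 + 80 + 30
}

async function parallel(): Promise<number> {
  const start = performance.now();
  await Promise.all([
    fakeQuery('user', 50),
    fakeQuery('orders', 80),
    fakeQuery('profile', 30),
  ]);
  return performance.now() - start; // roughly max(50, 80, 30)
}

async function compare(): Promise<{ sequentialMs: number; parallelMs: number }> {
  const sequentialMs = await sequential();
  const parallelMs = await parallel();
  return { sequentialMs, parallelMs };
}

compare().then(({ sequentialMs, parallelMs }) => {
  console.log(`sequential: ${sequentialMs.toFixed(0)}ms, parallel: ${parallelMs.toFixed(0)}ms`);
});
```

Two caveats apply. `Promise.all` rejects as soon as any one call rejects, and it only helps when the calls do not depend on each other's results.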

Missing connection pool configuration: If database operations show up as long waits, check your pool size. The default pg pool is 10 connections. Under 100 concurrent requests, requests queue waiting for a connection.

import { Pool } from 'pg';

const pool = new Pool({
 connectionString: process.env.DATABASE_URL,
 max: 20, // Increase pool size (pg defaults to 10)
 idleTimeoutMillis: 30_000,
 connectionTimeoutMillis: 5_000, // Fail fast if no connection available
});
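
To see why pool size matters, a back-of-the-envelope model helps. The sketch below assumes every query takes the same time and holds its connection for the whole duration, so requests are served in waves of `poolSize`. Real workloads are messier, and the 20 ms query time is an assumption for illustration.

```typescript
// Rough model: requests are served in "waves" of poolSize, each wave taking queryMs.
function estimateDrainMs(
  concurrentRequests: number,
  poolSize: number,
  queryMs: number
): number {
  if (poolSize <= 0) throw new RangeError('poolSize must be positive');
  const waves = Math.ceil(concurrentRequests / poolSize);
  return waves * queryMs;
}

// 100 concurrent requests, 20 ms per query (assumed).
console.log(estimateDrainMs(100, 10, 20)); // 200: the last wave waits behind 9 others
console.log(estimateDrainMs(100, 20, 20)); // 100
```

A bigger pool is not free: the database has its own connection limit, and every app instance multiplies the pool size. Raise `max` when bubbleprof shows requests waiting for a connection, not by default.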

Step 4 — Event Loop Monitoring in Production

Flame graphs are taken in controlled environments. Production can behave differently. Add event loop lag monitoring to your metrics so you see regressions as they happen:

// src/lib/metrics.ts
import { monitorEventLoopDelay } from 'perf_hooks';
import { Gauge } from 'prom-client';

// Sample the event loop delay every 20 ms
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();
// Gauge for Prometheus
const eventLoopLag = new Gauge({
 name: 'nodejs_event_loop_lag_p99_ms',
 help: 'Node.js event loop lag 99th percentile in milliseconds',
});
// Sample every 10 seconds
setInterval(() => {
 // histogram values are in nanoseconds
 const p99Ms = histogram.percentile(99) / 1_000_000;
 eventLoopLag.set(p99Ms);
 histogram.reset();
}, 10_000);

Event loop lag above 100ms is a warning. Above 500ms, users are noticeably affected. Above 1000ms, requests are timing out.
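
If you want those thresholds in code, for example to label a dashboard panel or drive an alert, a small classifier is enough. This is a sketch: the level names are made up here, and the cutoffs are the rule-of-thumb values above, which you should tune for your own service.

```typescript
type LagLevel = 'ok' | 'warning' | 'degraded' | 'critical';

// Cutoffs are this article's rules of thumb, not universal constants.
function classifyLag(p99Ms: number): LagLevel {
  if (p99Ms > 1000) return 'critical'; // requests are likely timing out
  if (p99Ms > 500) return 'degraded'; // users notice
  if (p99Ms > 100) return 'warning';
  return 'ok';
}

for (const sample of [12, 180, 650, 1500]) {
  console.log(`${sample}ms -> ${classifyLag(sample)}`);
}
```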

Step 5 — The Common Fixes

Synchronous Operations in the Hot Path

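import fs from 'fs';

// BLOCKS the event loop: no other request is handled while the file is read
const data = fs.readFileSync('/large/file.json', 'utf-8');
const parsed = JSON.parse(data);

// Non-blocking read: the event loop stays free while the file loads.
// JSON.parse is still synchronous, so a very large payload blocks while it parses.
const data = await fs.promises.readFile('/large/file.json', 'utf-8');
const parsed = JSON.parse(data);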

Never use *Sync functions (readFileSync, execSync, writeFileSync) in request handlers. They block the entire Node.js event loop for their duration.

Expensive Computations

Move CPU-intensive work off the main thread:

import { Worker } from 'worker_threads';

// Helper: run a script in a worker thread and resolve with its first message
function runInWorker<T>(scriptPath: string, data: unknown): Promise<T> {
 return new Promise((resolve, reject) => {
 const worker = new Worker(scriptPath, { workerData: data });
 worker.on('message', resolve);
 worker.on('error', reject);
 worker.on('exit', (code) => {
 if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
 });
 });
}
// For truly CPU-intensive work (report generation, image processing):
router.post('/reports/generate', authenticate, async (req, res) => {
 const report = await runInWorker('./workers/reportGenerator.js', {
 tenantId: req.tenant.id,
 params: req.body,
 });
 res.json(report);
});
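
The worker side is not shown above, so here is a minimal sketch of what it could look like. To keep it runnable as a single file, the worker code is passed inline with the `eval: true` option instead of living in `./workers/reportGenerator.js`, and the "report" is just a sum. Both are assumptions for illustration.

```typescript
import { Worker } from 'node:worker_threads';

// Worker body: reads its input from workerData and posts one result back.
// In a real project this would be a separate file, not an inline string.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads');
  let sum = 0;
  for (let i = 1; i <= workerData.n; i++) sum += i; // stand-in for CPU-heavy work
  parentPort.postMessage({ sum });
`;

function sumInWorker(n: number): Promise<{ sum: number }> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

sumInWorker(1000).then((result) => {
  console.log(`sum of 1..1000 computed off the main thread: ${result.sum}`);
});
```

Starting a worker has a cost, so spawning one per request only pays off for work that takes much longer than the startup. For frequent, smaller jobs, a pool of long-lived workers is the usual next step.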

Reducing Allocations in Hot Paths

Object and array allocations in tight loops create GC pressure. Reuse objects where possible:

// Creates a new object on every request — GC pressure at scale
app.use((req, res, next) => {
 req.context = {
 requestId: randomUUID(),
 startTime: Date.now(),
 user: null,
 };
 next();
});
// Acceptable — the allocation is necessary. But avoid allocating
// inside loops or functions called thousands of times per second.
// Profile first. Optimize only what the flame graph shows is hot.
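
For contrast, this is the kind of allocation that can show up in a flame graph: chained array methods inside a function called thousands of times per second. The sketch below uses a hypothetical `LineItem` shape and computes the same total two ways.

```typescript
interface LineItem {
  priceCents: number;
  quantity: number;
  taxable: boolean;
}

// Allocates two intermediate arrays per call (filter, then map) before reducing.
function taxableTotalChained(items: LineItem[]): number {
  return items
    .filter((item) => item.taxable)
    .map((item) => item.priceCents * item.quantity)
    .reduce((sum, amount) => sum + amount, 0);
}

// Same result in a single pass with no intermediate arrays.
function taxableTotalSinglePass(items: LineItem[]): number {
  let sum = 0;
  for (const item of items) {
    if (item.taxable) sum += item.priceCents * item.quantity;
  }
  return sum;
}

const cart: LineItem[] = [
  { priceCents: 500, quantity: 2, taxable: true },
  { priceCents: 1200, quantity: 1, taxable: false },
  { priceCents: 250, quantity: 4, taxable: true },
];
console.log(taxableTotalChained(cart), taxableTotalSinglePass(cart));
```

The chained version is easier to read and is fine almost everywhere. Switch to the single pass only where profiling shows the function is hot.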

Caching Repeated Computations

// Recomputed on every request. If this query is slow, cache it.
async function getMenuItems(tenantId: string) {
 const result = await db.query('SELECT * FROM menu_items WHERE tenant_id = $1', [tenantId]);
 return result.rows;
}

// Cache with LRU: computed once per tenant, then served from memory until the TTL expires
import { LRUCache } from 'lru-cache';
const menuCache = new LRUCache<string, unknown[]>({
 max: 100, // at most 100 tenants cached
 ttl: 5 * 60 * 1000, // 5 minutes
});
async function getMenuItems(tenantId: string) {
 const cached = menuCache.get(tenantId);
 if (cached) return cached;
 const result = await db.query(
  'SELECT * FROM menu_items WHERE tenant_id = $1',
  [tenantId]
 );
 menuCache.set(tenantId, result.rows);
 return result.rows;
}
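
One gap in the pattern above: when many requests miss the cache at the same moment, each one runs the query. A possible refinement is to cache the in-flight promise so concurrent misses share a single load. This is a sketch using a plain `Map` with a TTL instead of lru-cache, so it has no size limit, and `cachedLoader` is a hypothetical helper, not a library API.

```typescript
interface CacheEntry<V> {
  value: Promise<V>;
  expiresAt: number;
}

// Minimal TTL cache that stores the promise itself, so concurrent misses
// for the same key share one load instead of each hitting the database.
function cachedLoader<V>(
  load: (key: string) => Promise<V>,
  ttlMs: number
): (key: string) => Promise<V> {
  const entries = new Map<string, CacheEntry<V>>();
  return (key: string): Promise<V> => {
    const hit = entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value;
    const value = load(key).catch((err) => {
      entries.delete(key); // do not keep failed loads in the cache
      throw err;
    });
    entries.set(key, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}

// Hypothetical loader standing in for the database query.
let loadCount = 0;
const getMenuItemsCached = cachedLoader(async (tenantId: string) => {
  loadCount += 1;
  return [`menu for ${tenantId}`];
}, 5 * 60 * 1000);

Promise.all([getMenuItemsCached('t1'), getMenuItemsCached('t1')]).then(() => {
  console.log(`two concurrent calls, loader ran ${loadCount} time(s)`);
});
```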

The Profiling Workflow in One Sequence

1. Detect: Grafana shows event loop lag or p99 latency spike
 ↓
2. Reproduce: Identify which endpoint or operation is slow
 ↓
3. Diagnose: clinic doctor → which category (CPU, I/O, memory, event loop)
 ↓
4. Profile:
 CPU issue → clinic flame or 0x
 I/O issue → clinic bubbleprof
 Memory → heap snapshots (see memory leaks guide)
 ↓
5. Identify: Find the wide bar in the flame graph or the long wait in bubbleprof
 ↓
6. Fix: Apply the appropriate pattern (parallel awaits, caching, worker thread, remove sync op)
 ↓
7. Verify: Run autocannon before and after. Compare p99 latency.
 ↓
8. Monitor: Confirm event loop lag drops in Grafana after deploy
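
autocannon prints latency percentiles for you, but if you collect raw latencies yourself (from logs, for example), the before/after comparison in step 7 could be sketched like this. It uses the nearest-rank definition of a percentile, and the latency numbers are invented for illustration.

```typescript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function compareP99(before: number[], after: number[]) {
  const beforeP99 = percentile(before, 99);
  const afterP99 = percentile(after, 99);
  const changePct = ((afterP99 - beforeP99) / beforeP99) * 100;
  return { beforeP99, afterP99, changePct };
}

// Hypothetical latencies: mostly fast requests with a slow tail before the fix.
const beforeFix = [...Array(95).fill(40), 300, 450, 600, 800, 900];
const afterFix = [...Array(95).fill(38), 60, 70, 85, 90, 120];
const summary = compareP99(beforeFix, afterFix);
console.log(
  `p99 before=${summary.beforeP99}ms after=${summary.afterP99}ms change=${summary.changePct.toFixed(1)}%`
);
```

Compare runs made with the same autocannon settings and the same data set. Otherwise the difference in p99 can come from the test, not the fix.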

Profiling is not something you do once. Set up the metrics, watch them in production, and run a profiling session when they degrade. The regression that would have taken days to diagnose by reading code takes 30 minutes when you can see exactly where time is being spent.

Originally published on ZyVOP