
Worker process memory leak: gradual heap growth leads to OOM crash after 3–12 hours #365

Open
Labels
bug (Something isn't working)
@dahlia

Description

Summary

The worker process (NODE_TYPE=worker) experiences a gradual memory leak that causes an OOM crash (exit code 134) after 3–12 hours of normal operation. This is not triggered by any specific request—it occurs during routine inbox/outbox message queue processing.

Environment

  • Hollo version: 0.8.0-dev.290
  • Runtime: Docker (linux/arm64)
  • Node.js options: --max-old-space-size=1536
  • Container memory limit: 2 GB
  • Replicas: 2 (both exhibit the same behavior)
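
As a sanity check, the effective V8 heap ceiling implied by the NODE_OPTIONS above can be read from inside the process. A minimal sketch, not part of Hollo and only for illustration:

```typescript
// Illustrative sanity check: print V8's effective heap limit so it can be
// compared against --max-old-space-size=1536 from NODE_OPTIONS.
import { getHeapStatistics } from "node:v8";

const toMiB = (bytes: number): string => (bytes / 1024 / 1024).toFixed(1);
const stats = getHeapStatistics();

// heap_size_limit includes --max-old-space-size plus V8's own reserves, so it
// lands slightly above 1536 MB, which is consistent with the ~1552 MB heap
// totals in the GC log below.
console.log(`heap_size_limit: ${toMiB(stats.heap_size_limit)} MiB`);
console.log(`used_heap_size:  ${toMiB(stats.used_heap_size)} MiB`);
```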

Symptoms

The V8 heap grows steadily over time until it reaches the --max-old-space-size limit, at which point Mark-Compact GC fails to reclaim enough memory and the process aborts with SIGABRT (hence exit code 134, i.e. 128 + 6):

<--- Last few GCs --->
[78:0xffff79760000] 29322263 ms: Mark-Compact 1493.0 (1555.0) -> 1477.9 (1552.1) MB, pooled: 2 MB, 2819.41 / 0.34 ms (average mu = 0.251, current mu = 0.186) task; scavenge might not succeed
[78:0xffff79760000] 29324463 ms: Mark-Compact 1491.8 (1552.9) -> 1478.1 (1552.9) MB, pooled: 1 MB, 1695.06 / 0.12 ms (average mu = 0.243, current mu = 0.229) task; scavenge might not succeed
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 ELIFECYCLE Command failed with exit code 134.
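
For whoever investigates this, heap growth between crashes can be tracked with a small periodic logger in the worker, and a heap snapshot taken near the limit can be opened in Chrome DevTools to see which objects accumulate. The sketch below is illustrative only and not part of Hollo; the interval, threshold, and log format are arbitrary assumptions:

```typescript
// Illustrative diagnostic sketch (not part of Hollo): log heap usage on an
// interval and write one heap snapshot once usage nears the configured limit.
import { writeHeapSnapshot } from "node:v8";

const INTERVAL_MS = 5 * 60 * 1000;                    // every 5 minutes
const SNAPSHOT_THRESHOLD_BYTES = 1200 * 1024 * 1024;  // ~1200 MB, arbitrary
let snapshotTaken = false;

const toMB = (n: number): number => Math.round(n / 1024 / 1024);

setInterval(() => {
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  console.log(
    `[heap] used=${toMB(heapUsed)} MB total=${toMB(heapTotal)} MB rss=${toMB(rss)} MB`,
  );

  // Take a single snapshot near the limit; writing a snapshot is expensive
  // and itself needs memory on the order of the live heap size.
  if (!snapshotTaken && heapUsed > SNAPSHOT_THRESHOLD_BYTES) {
    snapshotTaken = true;
    const file = writeHeapSnapshot(); // returns the generated .heapsnapshot path
    console.log(`[heap] snapshot written to ${file}`);
  }
}, INTERVAL_MS).unref();
```

Diffing a snapshot taken after a few hours against one from a fresh worker should show whether the growth comes from queued jobs, cached objects, or something retained by a closure.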

Observed crash times

Worker     Uptime at crash   Heap usage at crash
worker-2   ~11.9 hours       1481 MB / 1536 MB
worker-1   ~2.7 hours        1442 MB / 1536 MB
worker-1   ~7 hours          1307 MB / 1536 MB
worker-2   ~8.1 hours        1478 MB / 1536 MB

Notes

  • This is distinct from "OOM when accessing my profile page" (#207), which was a sudden OOM caused by the search API (fixed in v0.6.8). This issue is a gradual leak during normal worker operation.
  • Increasing --max-old-space-size only delays the inevitable crash; it does not prevent it.
  • As a workaround, we have configured the restart policy to always restart the worker on failure, so it recovers automatically within seconds after each OOM crash.
