Description
P0 Critical: Memory Exhaustion in get_feed_data Causes Application Crashes
Summary
The get_feed_data method in Admin::SubmissionsController loads all forms, submissions, and questions into memory simultaneously, causing out-of-memory (OOM) crashes during feed exports.
Priority: P0 - Critical
Component: app/controllers/admin/submissions_controller.rb
Lines: 238-267
Affected Endpoints: /admin/submissions/feed, /admin/submissions/export_feed
Problem Description
When users or scheduled jobs trigger the feed export functionality, the application attempts to load the entire dataset into memory before processing. This causes:
- Application memory to spike to several GB
- OOM kills in production
- Sidekiq workers crashing during background exports
- Degraded performance for all users during export operations
Reproduction Steps
- Navigate to Admin > Submissions > Feed
- Set `days_limit` to a large value (e.g., 30+ days)
- Click Export
- Observe memory spike and potential timeout/crash (a console-based reproduction sketch follows this list)
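The spike can also be reproduced outside the UI from the Rails console. The sketch below assumes the `memory_profiler` gem (listed under References) is available in the environment and that `get_feed_data` can be called directly on a controller instance:

```ruby
# Rails console sketch (assumes the memory_profiler gem is installed).
# Profiles the current implementation; expect allocation counts in the
# millions and retained memory in the hundreds of MB for a 30-day window.
require 'memory_profiler'

controller = Admin::SubmissionsController.new
report = MemoryProfiler.report do
 controller.send(:get_feed_data, 30) # bypass method visibility for the test
end
report.pretty_print(scale_bytes: true)
```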
Root Cause Analysis
Current Implementation
```ruby
# app/controllers/admin/submissions_controller.rb:238-267
def get_feed_data(days_limit)
 all_question_responses = []

 Form.all.each do |form| # Problem 1: Loads ALL forms into memory
 submissions = form.submissions.ordered # Problem 2: N+1 query per form
 submissions = submissions.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?

 submissions.each do |submission| # Problem 3: Loads ALL submissions per form
 form.ordered_questions.each do |question| # Problem 4: N+1 query per submission
 question_text = question.text.to_s
 answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)

 @hash = {
 organization_id: form.organization_id,
 organization_name: form.organization.name, # Problem 5: N+1 for organization
 form_id: form.id,
 form_name: form.name,
 submission_id: submission.id,
 question_id: question.id,
 user_id: submission.user_id,
 question_text:,
 response_text: answer_text,
 question_with_response_text: "#{question_text}: #{answer_text}",
 created_at: submission.created_at,
 }

 all_question_responses << @hash # Problem 6: Unbounded array growth
 end
 end
 end

 all_question_responses # Problem 7: Returns massive array
end
```
Memory Impact Calculation
| Metric | Typical Value | Memory Per Item | Total |
|---|---|---|---|
| Forms | 500 | ~2 KB | 1 MB |
| Submissions (30 days) | 50,000 | ~1 KB | 50 MB |
| Questions | 5,000 | ~0.5 KB | 2.5 MB |
| Result Hashes | 50,000 × 10 = 500,000 | ~0.5 KB | ~250 MB |
Note that the nested loops produce one hash per submission-question pair, not per form. At larger scale (100 forms averaging 1,000 submissions each, 10 questions per form), the loop builds 1,000,000 hash objects consuming hundreds of MB, and the ActiveRecord objects loaded alongside them push peak memory into the multi-GB range.
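The back-of-envelope arithmetic for that larger scenario, using the assumed ~0.5 KB per hash (ActiveRecord overhead excluded):

```ruby
# Rough estimate only; 0.5 KB per hash is an assumption from the table above.
forms = 100
submissions_per_form = 1_000
questions_per_form = 10

hashes = forms * submissions_per_form * questions_per_form # => 1_000_000
approx_mb = hashes * 0.5 / 1024.0 # => ~488 MB of result hashes alone
```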
Issues Identified
- `Form.all.each` - loads the entire forms table into memory
- Triple-nested loops - O(forms × submissions × questions) complexity
- No batching - all records are loaded before any processing
- N+1 queries - missing eager loading for `organization` and `questions`
- Unbounded array - `all_question_responses` grows without limit
- Synchronous processing - blocks the request thread for the entire operation
Proposed Solution
Option A: Batched Processing with find_each (Recommended)
```ruby
# app/controllers/admin/submissions_controller.rb
def get_feed_data(days_limit)
 Enumerator.new do |yielder|
 # Batch forms with eager loading
 Form.includes(:organization, :questions)
 .find_each(batch_size: 100) do |form|
 # Build submissions query with date filter
 submissions_scope = form.submissions
 submissions_scope = submissions_scope.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?

 # Batch submissions
 submissions_scope.find_each(batch_size: 1000) do |submission|
 # Questions already eager loaded
 form.questions.each do |question|
 question_text = question.text.to_s
 answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)

 yielder << {
 organization_id: form.organization_id,
 organization_name: form.organization.name,
 form_id: form.id,
 form_name: form.name,
 submission_id: submission.id,
 question_id: question.id,
 user_id: submission.user_id,
 question_text: question_text,
 response_text: answer_text,
 question_with_response_text: "#{question_text}: #{answer_text}",
 created_at: submission.created_at,
 }
 end
 end
 end
 end
end

# Update export_feed to stream the response
def export_feed
 @days_limit = (params[:days_limit].present? ? params[:days_limit].to_i : 1)

 respond_to do |format|
 format.csv do
 headers['Content-Type'] = 'text/csv; charset=utf-8'
 headers['Content-Disposition'] = "attachment; filename=touchpoints-feed-#{Date.today}.csv"
 headers['X-Accel-Buffering'] = 'no' # Disable nginx/proxy buffering
 headers['Cache-Control'] = 'no-cache'
 self.response_body = StreamingCsvExporter.new(get_feed_data(@days_limit))
 end

 format.json do
 # For JSON, consider pagination or a background job for large datasets
 render json: get_feed_data(@days_limit).take(10_000).to_a
 end
 end
end
```
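Because `get_feed_data` now returns an `Enumerator`, rows are built only as the consumer pulls them and nothing is accumulated in memory. A quick laziness check from the console (illustrative; it loads only the first batches needed to produce five rows):

```ruby
# Illustrative only: pull five rows without materializing the full feed.
feed = Admin::SubmissionsController.new.send(:get_feed_data, 7)
feed.first(5).each { |row| puts row[:question_with_response_text] }
```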
Supporting Class: StreamingCsvExporter
```ruby
# app/services/streaming_csv_exporter.rb
require 'csv'

class StreamingCsvExporter
 HEADERS = %w[
 organization_id organization_name form_id form_name
 submission_id question_id user_id question_text
 response_text question_with_response_text created_at
 ].freeze

 def initialize(enumerator)
 @enumerator = enumerator
 end

 # Rails streams the response by calling #each on the response body
 def each
 yield CSV.generate_line(HEADERS)

 @enumerator.each do |row|
 yield CSV.generate_line(HEADERS.map { |h| row[h.to_sym] })
 end
 end
end
```
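The exporter only needs to respond to `#each`, which is what Rails expects of a streaming `response_body`. A sanity check in isolation, with a hand-built row (illustrative):

```ruby
require 'csv'

# Feed the exporter a single hand-built row and print the CSV lines.
rows = [{ organization_id: 1, form_id: 2, question_text: 'How was it?', response_text: 'Great' }].each
StreamingCsvExporter.new(rows).each { |line| print line }
# => header line, then one data row (missing keys become blank cells)
```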
Option B: Background Job for Large Exports
For very large datasets, move to async processing:
```ruby
# app/jobs/feed_export_job.rb
require 'csv'

class FeedExportJob < ApplicationJob
 queue_as :exports

 def perform(user_email, days_limit)
 file_path = Rails.root.join('tmp', "feed-export-#{SecureRandom.uuid}.csv")

 CSV.open(file_path, 'wb') do |csv|
 csv << StreamingCsvExporter::HEADERS

 Form.includes(:organization, :questions).find_each(batch_size: 100) do |form|
 # ... batched processing, write directly to file
 end
 end

 # Upload to S3 and email the user
 url = S3Uploader.upload(file_path)
 UserMailer.export_ready(user_email, url).deliver_later
 ensure
 FileUtils.rm_f(file_path)
 end
end
```
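The controller side then only enqueues the job. A hypothetical action is sketched below; the action name, route helper, and `current_user.email` are illustrative and should be adapted to the app's existing auth and routing:

```ruby
# Hypothetical controller action for very large exports (names are illustrative).
def export_feed_async
 days_limit = params[:days_limit].present? ? params[:days_limit].to_i : 1
 FeedExportJob.perform_later(current_user.email, days_limit)
 redirect_to admin_submissions_path, notice: 'Feed export started; a download link will be emailed when it is ready.'
end
```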
Expected Impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| Peak Memory | 2-4 GB | 50-100 MB | ~90% reduction |
| Memory Growth | Unbounded | Constant | Stable under load |
| N+1 Queries | O(forms × submissions) | O(record batches) | 99% fewer queries |
| Request Timeout Risk | High | Low | Streaming prevents timeout |
| OOM Crash Risk | High | Minimal | Batching prevents spikes |
Testing Checklist
Unit Tests
- `get_feed_data` returns an Enumerator, not an Array (see the spec sketch after this list)
- Enumerator yields the correct hash structure
- Empty forms/submissions handled gracefully
- `days_limit = 0` returns all submissions
- `days_limit > 0` filters correctly
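A minimal sketch of the first check, assuming RSpec; calling the method on a bare controller instance works because `get_feed_data` depends only on its argument:

```ruby
# spec/controllers/admin/submissions_controller_feed_spec.rb (sketch; assumes RSpec)
require 'rails_helper'

RSpec.describe Admin::SubmissionsController do
 describe '#get_feed_data' do
 it 'returns an Enumerator rather than materializing an Array' do
 result = described_class.new.send(:get_feed_data, 1)

 expect(result).to be_a(Enumerator)
 expect(result).not_to be_an(Array)
 end
 end
end
```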
Integration Tests
- CSV export streams without loading all data
- Response headers set correctly for streaming
- Large dataset (10,000+ submissions) completes without OOM
- JSON endpoint respects pagination/limits
Performance Tests
- Memory usage stays below 200 MB during export
- Export of 50,000 submissions completes in < 60 seconds
- No N+1 queries in logs (check with the Bullet gem; configuration sketch after this list)
- Database connection pool not exhausted
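A minimal Bullet configuration for the test environment (sketch; assumes the Bullet gem is in the Gemfile):

```ruby
# config/environments/test.rb (sketch; assumes the bullet gem is installed)
config.after_initialize do
 Bullet.enable = true
 Bullet.bullet_logger = true
 Bullet.raise = true # fail the suite when an N+1 query or missing eager load is detected
end
```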
Manual QA
- CSV file downloads correctly in browser
- CSV file opens in Excel without corruption
- All expected columns present
- Data matches database records
- Special characters (UTF-8) handled correctly
Rollout Plan
- Phase 1: Deploy behind a feature flag (see the sketch after this list)
- Phase 2: Enable for admin users only
- Phase 3: Monitor memory metrics for 48 hours
- Phase 4: Enable for all users
- Phase 5: Remove old implementation
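A minimal sketch of the Phase 1 gate, assuming a Flipper-style feature flag (the project's actual flag mechanism may differ); the wrapped method names are hypothetical and the old path stays reachable until Phase 5:

```ruby
# Sketch only: :streamed_feed_export and the wrapper methods are hypothetical.
def export_feed
 if Flipper.enabled?(:streamed_feed_export, current_user)
 streamed_export_feed # new Enumerator-based implementation
 else
 legacy_export_feed # existing implementation, removed in Phase 5
 end
end
```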
Related Issues
- Update gems #2 - Stream CSV exports in `Form#to_csv` methods
- Add Programs and Services for an Organization #3 - Add `.includes()` to fix N+1 queries in forms index
- Develop #4 - Batch bulk update operations in submissions controller
- Develop #5 - Cache question options in A11 export methods
References
- Rails `find_each` documentation
- Streaming responses in Rails
- Memory profiling with the `memory_profiler` gem
Labels
priority:p0 type:bug area:performance area:memory component:submissions