Memory Exhaustion in get_feed_data Causes Application Crashes (Claude Code Review) #1941

Open
@SueValente

Description

P0 Critical: Memory Exhaustion in get_feed_data Causes Application Crashes

Summary

The get_feed_data method in Admin::SubmissionsController loads all forms, submissions, and questions into memory simultaneously, causing out-of-memory (OOM) crashes during feed exports.

Priority: P0 - Critical
Component: app/controllers/admin/submissions_controller.rb
Lines: 238-267
Affected Endpoints: /admin/submissions/feed, /admin/submissions/export_feed


Problem Description

When users or scheduled jobs trigger the feed export functionality, the application attempts to load the entire dataset into memory before processing. This causes:

  • Application memory to spike to several GB
  • OOM kills in production
  • Sidekiq workers crashing during background exports
  • Degraded performance for all users during export operations

Reproduction Steps

  1. Navigate to Admin > Submissions > Feed
  2. Set days_limit to a large value (e.g., 30+ days)
  3. Click Export
  4. Observe memory spike and potential timeout/crash

Root Cause Analysis

Current Implementation

# app/controllers/admin/submissions_controller.rb:238-267
def get_feed_data(days_limit)
  all_question_responses = []
  Form.all.each do |form| # Problem 1: Loads ALL forms into memory
    submissions = form.submissions.ordered # Problem 2: N+1 query per form
    submissions = submissions.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?
    submissions.each do |submission| # Problem 3: Loads ALL submissions per form
      form.ordered_questions.each do |question| # Problem 4: N+1 query per submission
        question_text = question.text.to_s
        answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)
        @hash = {
          organization_id: form.organization_id,
          organization_name: form.organization.name, # Problem 5: N+1 for organization
          form_id: form.id,
          form_name: form.name,
          submission_id: submission.id,
          question_id: question.id,
          user_id: submission.user_id,
          question_text:,
          response_text: answer_text,
          question_with_response_text: "#{question_text}: #{answer_text}",
          created_at: submission.created_at,
        }
        all_question_responses << @hash # Problem 6: Unbounded array growth
      end
    end
  end
  all_question_responses # Problem 7: Returns massive array
end

Memory Impact Calculation

Metric                | Typical Value                      | Memory Per Item | Total
Forms                 | 500                                | ~2 KB           | 1 MB
Submissions (30 days) | 50,000                             | ~1 KB           | 50 MB
Questions             | 5,000                              | ~0.5 KB         | 2.5 MB
Result Hashes         | 500 ×ばつ 50,000 ×ばつ 10 = 250,000,000   | ~0.5 KB         | 125 GB

Even with more conservative numbers (100 forms ×ばつ 1,000 submissions ×ばつ 10 questions), this creates 1,000,000 hash objects consuming hundreds of MB.
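
These figures are estimates; the growth is easy to confirm empirically by sampling process memory around the call. A minimal Rails console sketch, assuming the get_process_mem gem is available and calling the current (Array-returning) method directly via send:

# Console sketch: measure RSS growth while the feed array is built.
require 'get_process_mem'

before_mb = GetProcessMem.new.mb
rows = Admin::SubmissionsController.new.send(:get_feed_data, 30)
after_mb = GetProcessMem.new.mb

puts "rows built: #{rows.size}"
puts "RSS grew by ~#{(after_mb - before_mb).round(1)} MB"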

Issues Identified

  1. Form.all.each - Loads entire forms table into memory
  2. Triple-nested loops - O(forms ×ばつ submissions ×ばつ questions) complexity
  3. No batching - All records loaded before any processing
  4. N+1 queries - Missing eager loading for organization, questions
  5. Unbounded array - all_question_responses grows without limit
  6. Synchronous processing - Blocks request thread during entire operation

Proposed Solution

Option A: Batched Processing with find_each (Recommended)

# app/controllers/admin/submissions_controller.rb
def get_feed_data(days_limit)
  Enumerator.new do |yielder|
    # Batch forms with eager loading
    Form.includes(:organization, :questions)
        .find_each(batch_size: 100) do |form|
      # Build submissions query with date filter
      submissions_scope = form.submissions
      submissions_scope = submissions_scope.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?

      # Batch submissions
      submissions_scope.find_each(batch_size: 1000) do |submission|
        # Questions already eager loaded
        form.questions.each do |question|
          question_text = question.text.to_s
          answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)

          yielder << {
            organization_id: form.organization_id,
            organization_name: form.organization.name,
            form_id: form.id,
            form_name: form.name,
            submission_id: submission.id,
            question_id: question.id,
            user_id: submission.user_id,
            question_text: question_text,
            response_text: answer_text,
            question_with_response_text: "#{question_text}: #{answer_text}",
            created_at: submission.created_at,
          }
        end
      end
    end
  end
end

# Update export_feed to stream the response
def export_feed
  @days_limit = (params[:days_limit].present? ? params[:days_limit].to_i : 1)

  respond_to do |format|
    format.csv do
      headers['Content-Type'] = 'text/csv; charset=utf-8'
      headers['Content-Disposition'] = "attachment; filename=touchpoints-feed-#{Date.today}.csv"
      headers['X-Accel-Buffering'] = 'no' # Disable nginx/proxy buffering
      headers['Cache-Control'] = 'no-cache'
      self.response_body = StreamingCsvExporter.new(get_feed_data(@days_limit))
    end
    format.json do
      # For JSON, consider pagination or a background job for large datasets
      render json: get_feed_data(@days_limit).take(10_000).to_a
    end
  end
end
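
With this change get_feed_data returns an Enumerator, so no records are queried until the exporter (or another consumer) iterates it. One caveat: depending on the Rack version and middleware stack, Rack::ETag can buffer a body that carries no ETag or Last-Modified header in order to compute a digest, which would defeat the streaming; setting Last-Modified before assigning the body is a common safeguard:

headers['Last-Modified'] = Time.now.httpdate # keeps Rack::ETag from buffering the streamed body
self.response_body = StreamingCsvExporter.new(get_feed_data(@days_limit))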

Supporting Class: StreamingCsvExporter

# app/services/streaming_csv_exporter.rb
require 'csv'

class StreamingCsvExporter
  HEADERS = %w[
    organization_id organization_name form_id form_name submission_id
    question_id user_id question_text response_text
    question_with_response_text created_at
  ].freeze

  def initialize(enumerator)
    @enumerator = enumerator
  end

  # Rails streams any response body that responds to #each, one yielded chunk at a time.
  def each
    yield CSV.generate_line(HEADERS)
    @enumerator.each do |row|
      yield CSV.generate_line(HEADERS.map { |h| row[h.to_sym] })
    end
  end
end
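
Because the exporter only needs to respond to each, it can be exercised outside a request, which keeps it easy to test. A console sketch with a single stub row (field values are illustrative):

row = {
  organization_id: 1, organization_name: 'Example Org', form_id: 2, form_name: 'Feedback',
  submission_id: 3, question_id: 4, user_id: 5,
  question_text: 'How was it?', response_text: 'Fine',
  question_with_response_text: 'How was it?: Fine', created_at: Time.current,
}
StreamingCsvExporter.new([row]).each { |line| print line } # header line, then one data line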

Option B: Background Job for Large Exports

For very large datasets, move to async processing:

# app/jobs/feed_export_job.rb
class FeedExportJob < ApplicationJob
  queue_as :exports

  def perform(user_email, days_limit)
    file_path = Rails.root.join('tmp', "feed-export-#{SecureRandom.uuid}.csv")

    CSV.open(file_path, 'wb') do |csv|
      csv << StreamingCsvExporter::HEADERS
      Form.includes(:organization, :questions).find_each(batch_size: 100) do |form|
        # ... batched processing, write directly to file
      end
    end

    # Upload to S3 and email the user
    url = S3Uploader.upload(file_path)
    UserMailer.export_ready(user_email, url).deliver_later
  ensure
    FileUtils.rm_f(file_path)
  end
end
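
If Option B is adopted, the controller action reduces to enqueueing the job and returning immediately. A sketch; current_user and the redirect path are assumptions about the surrounding app:

# app/controllers/admin/submissions_controller.rb (sketch)
def export_feed
  days_limit = params[:days_limit].present? ? params[:days_limit].to_i : 1
  FeedExportJob.perform_later(current_user.email, days_limit)
  redirect_to admin_submissions_path, notice: 'Export started; you will receive an email when the file is ready.'
end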

Expected Impact

Metric               | Before                  | After     | Improvement
Peak Memory          | 2-4 GB                  | 50-100 MB | ~90% reduction
Memory Growth        | Unbounded               | Constant  | Stable under load
N+1 Queries          | O(forms ×ばつ submissions)  | O(1)      | 99% fewer queries
Request Timeout Risk | High                    | Low       | Streaming prevents timeout
OOM Crash Risk       | High                    | Minimal   | Batching prevents spikes

Testing Checklist

Unit Tests

  • get_feed_data returns an Enumerator, not an Array (see the spec sketch after this list)
  • Enumerator yields correct hash structure
  • Empty forms/submissions handled gracefully
  • days_limit = 0 returns all submissions
  • days_limit > 0 filters correctly
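
A minimal spec sketch for the first two items, assuming RSpec and FactoryBot and calling the method directly on a controller instance via send (factory names are illustrative):

# spec/controllers/admin/submissions_controller_spec.rb (sketch)
RSpec.describe Admin::SubmissionsController do
  let(:instance) { described_class.new }

  it 'returns an Enumerator rather than an Array' do
    expect(instance.send(:get_feed_data, 1)).to be_an(Enumerator)
  end

  it 'yields hashes with the expected keys' do
    create(:form, :with_submissions) # hypothetical factory creating a form with questions and submissions
    row = instance.send(:get_feed_data, 0).first
    expect(row).to include(:organization_id, :form_id, :submission_id, :question_text, :response_text)
  end
end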

Integration Tests

  • CSV export streams without loading all data
  • Response headers set correctly for streaming
  • Large dataset (10,000+ submissions) completes without OOM
  • JSON endpoint respects pagination/limits

Performance Tests

  • Memory usage stays below 200 MB during export
  • Export of 50,000 submissions completes in < 60 seconds
  • No N+1 queries in logs (check with the Bullet gem; config sketch after this list)
  • Database connection pool not exhausted
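
For the N+1 check, Bullet can be configured to raise in the test environment so regressions fail CI. A sketch using Bullet's standard settings:

# config/environments/test.rb (sketch)
config.after_initialize do
  Bullet.enable        = true
  Bullet.bullet_logger = true
  Bullet.raise         = true # fail specs when an N+1 query is detected
end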

Manual QA

  • CSV file downloads correctly in browser
  • CSV file opens in Excel without corruption
  • All expected columns present
  • Data matches database records
  • Special characters (UTF-8) handled correctly

Rollout Plan

  1. Phase 1: Deploy behind a feature flag (see the sketch after this list)
  2. Phase 2: Enable for admin users only
  3. Phase 3: Monitor memory metrics for 48 hours
  4. Phase 4: Enable for all users
  5. Phase 5: Remove old implementation
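
For Phases 1-2, the streaming path can sit behind a flag so the existing implementation stays the default. A sketch assuming the Flipper gem; the flag and helper method names are illustrative:

# app/controllers/admin/submissions_controller.rb (sketch)
def export_feed
  if Flipper.enabled?(:streaming_feed_export, current_user)
    streaming_export_feed # new Enumerator + streaming implementation
  else
    legacy_export_feed    # current implementation, unchanged
  end
end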

Related Issues


References


Labels

priority:p0 type:bug area:performance area:memory component:submissions

001-memory-exhaustion-get-feed-data.md
002-stream-csv-exports.md
003-fix-n-plus-one-queries.md
004-batch-bulk-updates.md
005-cache-question-options.md
