Description
P0 Critical: Memory Exhaustion in get_feed_data Causes Application Crashes
Summary
The get_feed_data method in Admin::SubmissionsController loads all forms, submissions, and questions into memory simultaneously, causing out-of-memory (OOM) crashes during feed exports.
Priority: P0 - Critical
Component: app/controllers/admin/submissions_controller.rb
Lines: 238-267
Affected Endpoints: /admin/submissions/feed, /admin/submissions/export_feed
Problem Description
When users or scheduled jobs trigger the feed export functionality, the application attempts to load the entire dataset into memory before processing. This causes:
- Application memory to spike to several GB
- OOM kills in production
- Sidekiq workers crashing during background exports
- Degraded performance for all users during export operations
Reproduction Steps
- Navigate to Admin > Submissions > Feed
- Set `days_limit` to a large value (e.g., 30+ days)
- Click Export
- Observe memory spike and potential timeout/crash (a console-based reproduction sketch follows this list)
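The spike can also be reproduced outside the UI from the Rails console. The sketch below assumes the `memory_profiler` gem (listed under References) is available in the environment and that `get_feed_data` can be called directly on a controller instance:

```ruby
# Rails console sketch (assumes the memory_profiler gem is installed).
# Profiles the current implementation; expect allocation counts in the
# millions and retained memory in the hundreds of MB for a 30-day window.
require 'memory_profiler'

controller = Admin::SubmissionsController.new
report = MemoryProfiler.report do
 controller.send(:get_feed_data, 30) # bypass method visibility for the test
end
report.pretty_print(scale_bytes: true)
```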
Root Cause Analysis
Current Implementation
```ruby
# app/controllers/admin/submissions_controller.rb:238-267
def get_feed_data(days_limit)
 all_question_responses = []

 Form.all.each do |form| # Problem 1: Loads ALL forms into memory
 submissions = form.submissions.ordered # Problem 2: N+1 query per form
 submissions = submissions.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?

 submissions.each do |submission| # Problem 3: Loads ALL submissions per form
 form.ordered_questions.each do |question| # Problem 4: N+1 query per submission
 question_text = question.text.to_s
 answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)

 @hash = {
 organization_id: form.organization_id,
 organization_name: form.organization.name, # Problem 5: N+1 for organization
 form_id: form.id,
 form_name: form.name,
 submission_id: submission.id,
 question_id: question.id,
 user_id: submission.user_id,
 question_text:,
 response_text: answer_text,
 question_with_response_text: "#{question_text}: #{answer_text}",
 created_at: submission.created_at,
 }

 all_question_responses << @hash # Problem 6: Unbounded array growth
 end
 end
 end

 all_question_responses # Problem 7: Returns massive array
end
```
Memory Impact Calculation
| Metric | Typical Value | Memory Per Item | Total |
|---|---|---|---|
| Forms | 500 | ~2 KB | 1 MB |
| Submissions (30 days) | 50,000 | ~1 KB | 50 MB |
| Questions | 5,000 | ~0.5 KB | 2.5 MB |
| Result Hashes | 50,000 × 10 = 500,000 | ~0.5 KB | ~250 MB |
Note that the nested loops produce one hash per submission-question pair, not per form. At larger scale (100 forms averaging 1,000 submissions each, 10 questions per form), the loop builds 1,000,000 hash objects consuming hundreds of MB, and the ActiveRecord objects loaded alongside them push peak memory into the multi-GB range.
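The back-of-envelope arithmetic for that larger scenario, using the assumed ~0.5 KB per hash (ActiveRecord overhead excluded):

```ruby
# Rough estimate only; 0.5 KB per hash is an assumption from the table above.
forms = 100
submissions_per_form = 1_000
questions_per_form = 10

hashes = forms * submissions_per_form * questions_per_form # => 1_000_000
approx_mb = hashes * 0.5 / 1024.0 # => ~488 MB of result hashes alone
```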
Issues Identified
- `Form.all.each` - loads the entire forms table into memory
- Triple-nested loops - O(forms × submissions × questions) complexity
- No batching - all records are loaded before any processing
- N+1 queries - missing eager loading for `organization` and `questions`
- Unbounded array - `all_question_responses` grows without limit
- Synchronous processing - blocks the request thread for the entire operation
Proposed Solution
Option A: Batched Processing with find_each (Recommended)
```ruby
# app/controllers/admin/submissions_controller.rb
def get_feed_data(days_limit)
 Enumerator.new do |yielder|
 # Batch forms with eager loading
 Form.includes(:organization, :questions)
 .find_each(batch_size: 100) do |form|
 # Build submissions query with date filter
 submissions_scope = form.submissions
 submissions_scope = submissions_scope.where('created_at >= ?', days_limit.days.ago) if days_limit.positive?

 # Batch submissions
 submissions_scope.find_each(batch_size: 1000) do |submission|
 # Questions already eager loaded
 form.questions.each do |question|
 question_text = question.text.to_s
 answer_text = Logstop.scrub(submission.send(question.answer_field.to_sym).to_s)

 yielder << {
 organization_id: form.organization_id,
 organization_name: form.organization.name,
 form_id: form.id,
 form_name: form.name,
 submission_id: submission.id,
 question_id: question.id,
 user_id: submission.user_id,
 question_text: question_text,
 response_text: answer_text,
 question_with_response_text: "#{question_text}: #{answer_text}",
 created_at: submission.created_at,
 }
 end
 end
 end
 end
end

# Update export_feed to stream the response
def export_feed
 @days_limit = (params[:days_limit].present? ? params[:days_limit].to_i : 1)

 respond_to do |format|
 format.csv do
 headers['Content-Type'] = 'text/csv; charset=utf-8'
 headers['Content-Disposition'] = "attachment; filename=touchpoints-feed-#{Date.today}.csv"
 headers['X-Accel-Buffering'] = 'no' # Disable nginx/proxy buffering
 headers['Cache-Control'] = 'no-cache'
 self.response_body = StreamingCsvExporter.new(get_feed_data(@days_limit))
 end

 format.json do
 # For JSON, consider pagination or a background job for large datasets
 render json: get_feed_data(@days_limit).take(10_000).to_a
 end
 end
end
```
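Because `get_feed_data` now returns an `Enumerator`, rows are built only as the consumer pulls them and nothing is accumulated in memory. A quick laziness check from the console (illustrative; it loads only the first batches needed to produce five rows):

```ruby
# Illustrative only: pull five rows without materializing the full feed.
feed = Admin::SubmissionsController.new.send(:get_feed_data, 7)
feed.first(5).each { |row| puts row[:question_with_response_text] }
```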
Supporting Class: StreamingCsvExporter
```ruby
# app/services/streaming_csv_exporter.rb
require 'csv'

class StreamingCsvExporter
 HEADERS = %w[
 organization_id organization_name form_id form_name
 submission_id question_id user_id question_text
 response_text question_with_response_text created_at
 ].freeze

 def initialize(enumerator)
 @enumerator = enumerator
 end

 # Rails streams the response by calling #each on the response body
 def each
 yield CSV.generate_line(HEADERS)

 @enumerator.each do |row|
 yield CSV.generate_line(HEADERS.map { |h| row[h.to_sym] })
 end
 end
end
```
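The exporter only needs to respond to `#each`, which is what Rails expects of a streaming `response_body`. A sanity check in isolation, with a hand-built row (illustrative):

```ruby
require 'csv'

# Feed the exporter a single hand-built row and print the CSV lines.
rows = [{ organization_id: 1, form_id: 2, question_text: 'How was it?', response_text: 'Great' }].each
StreamingCsvExporter.new(rows).each { |line| print line }
# => header line, then one data row (missing keys become blank cells)
```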
Option B: Background Job for Large Exports
For very large datasets, move to async processing:
```ruby
# app/jobs/feed_export_job.rb
require 'csv'

class FeedExportJob < ApplicationJob
 queue_as :exports

 def perform(user_email, days_limit)
 file_path = Rails.root.join('tmp', "feed-export-#{SecureRandom.uuid}.csv")

 CSV.open(file_path, 'wb') do |csv|
 csv << StreamingCsvExporter::HEADERS

 Form.includes(:organization, :questions).find_each(batch_size: 100) do |form|
 # ... batched processing, write directly to file
 end
 end

 # Upload to S3 and email the user
 url = S3Uploader.upload(file_path)
 UserMailer.export_ready(user_email, url).deliver_later
 ensure
 FileUtils.rm_f(file_path)
 end
end
```
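The controller side then only enqueues the job. A hypothetical action is sketched below; the action name, route helper, and `current_user.email` are illustrative and should be adapted to the app's existing auth and routing:

```ruby
# Hypothetical controller action for very large exports (names are illustrative).
def export_feed_async
 days_limit = params[:days_limit].present? ? params[:days_limit].to_i : 1
 FeedExportJob.perform_later(current_user.email, days_limit)
 redirect_to admin_submissions_path, notice: 'Feed export started; a download link will be emailed when it is ready.'
end
```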
Expected Impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| Peak Memory | 2-4 GB | 50-100 MB | ~90% reduction |
| Memory Growth | Unbounded | Constant | Stable under load |
| N+1 Queries | O(forms × submissions) | O(record batches) | 99% fewer queries |
| Request Timeout Risk | High | Low | Streaming prevents timeout |
| OOM Crash Risk | High | Minimal | Batching prevents spikes |
Testing Checklist
Unit Tests
- `get_feed_data` returns an Enumerator, not an Array (see the spec sketch after this list)
- Enumerator yields the correct hash structure
- Empty forms/submissions handled gracefully
- `days_limit = 0` returns all submissions
- `days_limit > 0` filters correctly
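A minimal sketch of the first check, assuming RSpec; calling the method on a bare controller instance works because `get_feed_data` depends only on its argument:

```ruby
# spec/controllers/admin/submissions_controller_feed_spec.rb (sketch; assumes RSpec)
require 'rails_helper'

RSpec.describe Admin::SubmissionsController do
 describe '#get_feed_data' do
 it 'returns an Enumerator rather than materializing an Array' do
 result = described_class.new.send(:get_feed_data, 1)

 expect(result).to be_a(Enumerator)
 expect(result).not_to be_an(Array)
 end
 end
end
```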
Integration Tests
- CSV export streams without loading all data
- Response headers set correctly for streaming
- Large dataset (10,000+ submissions) completes without OOM
- JSON endpoint respects pagination/limits
Performance Tests
- Memory usage stays below 200 MB during export
- Export of 50,000 submissions completes in < 60 seconds
- No N+1 queries in logs (check with the Bullet gem; configuration sketch after this list)
- Database connection pool not exhausted
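A minimal Bullet configuration for the test environment (sketch; assumes the Bullet gem is in the Gemfile):

```ruby
# config/environments/test.rb (sketch; assumes the bullet gem is installed)
config.after_initialize do
 Bullet.enable = true
 Bullet.bullet_logger = true
 Bullet.raise = true # fail the suite when an N+1 query or missing eager load is detected
end
```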
Manual QA
- CSV file downloads correctly in browser
- CSV file opens in Excel without corruption
- All expected columns present
- Data matches database records
- Special characters (UTF-8) handled correctly
Rollout Plan
- Phase 1: Deploy behind a feature flag (see the sketch after this list)
- Phase 2: Enable for admin users only
- Phase 3: Monitor memory metrics for 48 hours
- Phase 4: Enable for all users
- Phase 5: Remove old implementation
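A minimal sketch of the Phase 1 gate, assuming a Flipper-style feature flag (the project's actual flag mechanism may differ); the wrapped method names are hypothetical and the old path stays reachable until Phase 5:

```ruby
# Sketch only: :streamed_feed_export and the wrapper methods are hypothetical.
def export_feed
 if Flipper.enabled?(:streamed_feed_export, current_user)
 streamed_export_feed # new Enumerator-based implementation
 else
 legacy_export_feed # existing implementation, removed in Phase 5
 end
end
```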
Related Issues
- Update gems #2 - Stream CSV exports in `Form#to_csv` methods
- Add Programs and Services for an Organization #3 - Add `.includes()` to fix N+1 queries in forms index
- Develop #4 - Batch bulk update operations in submissions controller
- Develop #5 - Cache question options in A11 export methods
References
- Rails `find_each` documentation
- Streaming responses in Rails
- Memory profiling with the `memory_profiler` gem
Labels
priority:p0 type:bug area:performance area:memory component:submissions