4
\$\begingroup\$

I wrote this together for a test I'm doing that parses the Heroku access log. What I'm mainly interested is if this code follow common best practices and if there are any improvements I can make in terms of memory efficiency.

Here is a line from the Heroku from their docs:

2010年09月16日T15:13:46.723498+00:00 heroku[router]: at=info method=GET path="/posts" host=myapp.herokuapp.com" fwd="204.204.204.204" dyno=web.1 connect=1ms service=18ms status=200 bytes=975

And an endpoint is a line that looks like this:

GET /api/users/{user_id}/count_pending_messages

class FileReader
 def initialize(filename, start_at = nil, end_at = nil, endpoints = nil)
 @filename = filename
 @start_at = start_at
 @end_at = end_at
 @endpoints = endpoints
 end
 def read
 File.foreach(@filename).map { |l|
 # saving some memory here by only loading data into memory that we need
 unless @start_at.nil? or @end_at.nil?
 start = l.index(@start_at)
 finish = l.index(@end_at) - 2
 l[start..finish]
 else
 l
 end
 }
 end
 def endpoints
 File.foreach(@endpoints).map { |line| line.split }
 end
end
require 'forwardable'
class AccessParser
 extend Forwardable
 def_delegators :@reader, :read, :endpoints
 REGEX_MATCH_DIGITS = /\d+/
 PLACEHOLDER = '{user_id}'
 Uris = Struct.new(:method, :uri, :stats)
 Stats = Struct.new(:dyno, :response_time)
 def initialize(reader)
 @reader = reader
 end
 def parse
 uris = endpoints_to_uris
 read.each.map { |line|
 data = line_to_hash line
 method = data['method']
 uri = data['path'].sub(REGEX_MATCH_DIGITS, PLACEHOLDER)
 uris.each.map { |e|
 if uri == e.uri.to_s and method == e.method
 (e.stats.dyno ||= []) << data['dyno']
 response_time = data['connect'].to_i + data['service'].to_i
 (e.stats.response_time ||= []) << response_time
 end
 }
 }
 uris
 end
 private
 def endpoints_to_uris
 # using structs because it's faster than hashes
 endpoints.each.map { |e| Uris.new(*e, Stats.new) }
 end
 def line_to_hash(line)
 # using split because it's faster than regex
 Hash[line.split.each.map { |a| a.split('=') }]
 end
end
module Enumerable
 def sum
 inject(0.0) { |result, el| result + el }
 end
 def mean
 sum / size
 end
 def median
 len = sort.length
 (sort[(len - 1) / 2] + sort[len / 2]) / 2.0
 end
 def mode
 counter = Hash.new(0)
 entries.each.map { |i| counter[i] += 1 }
 mode_array = []
 counter.each.map { |k, v| mode_array << k if v == counter.values.max }
 mode_array.sort.first
 end
end
class AccessConsoleWriter
 extend Forwardable
 def_delegators :@parser, :parse
 def initialize(parser)
 @parser = parser
 end
 def call
 parse.each do |d|
 puts "# #{d.method} #{d.uri}"
 puts "# " + "-" * (d.uri.length + d.method.length + 1)
 if d.stats.dyno.nil?
 puts "# Calls: 0"
 else
 puts "# Calls: #{d.stats.dyno.count}"
 puts "# Mean: #{d.stats.response_time.mean.round(2)}"
 puts "# Median: #{d.stats.response_time.median.round(2)}"
 puts "# Mode: #{d.stats.response_time.mode}"
 puts "# Dyno: #{d.stats.dyno.mode}"
 end
 puts ""
 end
 end
end
reader = FileReader.new('sample.log', 'method=', 'status=', 'endpoints.txt')
parser = AccessParser.new(reader)
AccessConsoleWriter.new(parser).call
asked Mar 20, 2016 at 7:27
\$\endgroup\$
2
  • \$\begingroup\$ Could you include a sample of what the log looks like? \$\endgroup\$ Commented Mar 20, 2016 at 9:21
  • \$\begingroup\$ @200_success: I've added two examples. \$\endgroup\$ Commented Mar 20, 2016 at 9:24

1 Answer 1

1
\$\begingroup\$

I don't think that reading in entire files into memory, File.foreach(@filename).map { }.each.map { }.each { }, is going to be memory efficient. Which appears to be what would happen with AccessConsoleWriter.new(parser).call.

What you probably want to do is operate on one line at a time. For instance, you might be able to rewrite FileParser.endpoints to look like:

def endpoints(&block)
 File.foreach(@endpoints) do |line|
 block.call(line)
 end
end

A few other notes:

  • I don't see a lot of state being stored, so instantiating classes seems unnecessary.
  • For multi-line blocks, use do ... end instead of { ... }. https://github.com/bbatsov/ruby-style-guide#syntax
  • each in each.map is redundant.
  • each.map maps the returns of a block to a new array, but you seem to be ignoring that returned array in some places. In which case, using just each would be fine.
  • #call is a little confusing, as that's a method on Proc and others.
  • "...premature optimization is the root of all evil (or at least most of it) in programming."
answered Aug 22, 2016 at 21:39
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.