I came across the question about creating memory-efficient Ruby pipe class with lazy evaluation. Some code is given to effectively create a pipeline of commands using lazy enumerators. I have been messing round with it and have implemented a tee
command shown below. I feel like there is a better way to do it though.
class Pipe
class Tee
def initialize(pipe)
@pipe = pipe
@buffer = []
@pipe.source = self.to_enum
end
def <<(item)
@buffer << item
# puts "Added '#{item}' to buffer: #{@buffer.inspect}"
end
def to_enum
Enumerator.new do |yielder|
item = @buffer.shift
yielder << item if item
end.lazy
end
end
attr_writer :source
def initialize
@commands = []
@source = nil
end
def add(command, opts = {})
@commands << [command, opts]
self
end
def run
enum = @source
@commands.each do |command, options|
enum = method(command).call(enum, options)
end
enum.each {}
enum
end
def cat(enum, options)
Enumerator.new do |yielder|
enum.map { |line| yielder << line } if enum
options[:input].tap do |ios|
ios.each { |line| yielder << line }
end
end.lazy
end
def out(enum, options)
Enumerator.new do |yielder|
enum.each do |line|
puts line
yielder << line
end
end.lazy
end
def cut(enum, options)
Enumerator.new do |yielder|
enum.each do |line|
fields = line.chomp.split(%r{#{options[:delimiter]}})
yielder << fields[options[:field]]
end
end.lazy
end
def upcase(enum, options)
Enumerator.new do |yielder|
enum.each do |line|
yielder << line.upcase
end
end
end
def tee(enum, options)
teed = Tee.new(options.fetch(:other))
Enumerator.new do |yielder|
enum.each do |line|
yielder << line
teed << line
end
end
end
def grep(enum, options)
Enumerator.new do |yielder|
enum.each do |line|
yielder << line if line.match(options[:pattern])
end
end.lazy
end
end
another_pipe = Pipe.new
another_pipe.add(:grep, pattern: /testing/i)
another_pipe.add(:out)
pipe = Pipe.new
pipe.add(:cat, :input => StringIO.new("testing\na new\nline"))
# pipe.add(:grep, pattern: /line/)
pipe.add(:tee, :other => another_pipe)
pipe.add(:upcase)
pipe.add(:out)
pipe.run
puts "================================================="
another_pipe.run
Here's the output:
TESTING A NEW LINE ================================================= testing
Does anyone see a smarter way of doing this?
1 Answer 1
I know it's an old question, but it's an interesting one.
First, a couple small things:
In
Pipe#run
:enum = method(command).call(enum, options)
Calling
method
has a lot of overhead, andsend(command, enum, options)
does the same thing.Also in
Pipe#run
:enum.each {}
The purpose of this is to cause the final object to be enumerated. We can do the same thing cheaper (even an empty block takes CPU cycles) with
enum.to_a
, which is also aliased asenum.force
.
Those things aside, the first thing I noticed is that there's a lot of repetition. grep
and upcase
, for example, only differ on one line. We can extract the shared code out into another method, which I'll call enum_lazy
:
def upcase(enum, options)
enum_lazy(enum) do |line, yielder|
yielder << line.upcase
end
end
def grep(enum, options)
enum_lazy(enum) do |line, yielder|
yielder << line if line.match(options[:pattern])
end
end
private
def enum_lazy(enum, &block)
Enumerator.new do |yielder|
enum.each do |line|
yield(line, yielder)
end
end.lazy
end
...but this made me realize that some of these methods don't need the yielder
pattern at all. For example, upcase
takes one element at a time, performs an operation on it, and yields one value each time—it's just a lazy map, and this is equivalent:
def upcase(enum, options)
enum.lazy.map(&:upcase)
end
Since we called lazy
, this will return an Enumerator::Lazy as expected. We can simplify cut
and out
in the same way:
def out(enum, options)
enum.lazy.map do |line|
puts line
line
end
end
def cut(enum, options)
enum.lazy.map do |line|
fields = line.chomp.split(options[:delimiter])
fields[ options[:field] ]
end
end
grep
is different, because we only don't want output for every input line. As it turns out, though, grep
is just a lazy select
:
def grep(enum, options)
enum.lazy.select {|line| line =~ options[:pattern] }
end
That just leaves cat
. I have a strong inkling that there's a similar optimization we could do for cat
, but I haven't yet been able to actually figure it out. Perhaps someone else can help figure it out.
Here's the final code:
class Pipe
class Tee
# (no changes here)
end
attr_writer :source
def initialize
@commands = []
@source = nil
end
def add(command, opts = {})
@commands << [command, opts]
self
end
def run
enum = @source
@commands.each do |command, options|
enum = send(command, enum, options)
end
enum.force
enum
end
def cat(enum, options)
Enumerator.new do |yielder|
enum.map { |line| yielder << line } if enum
options[:input].tap do |ios|
ios.each { |line| yielder << line }
end
end.lazy
end
def out(enum, options)
enum.lazy.map do |line|
puts line
line
end
end
def cut(enum, options)
enum.lazy.map do |line|
fields = line.chomp.split(/#{options[:delimiter]}/)
fields[ options[:field] ]
end
end
def upcase(enum, options)
enum.lazy.map(&:upcase)
end
def tee(enum, options)
teed = Tee.new(options.fetch(:other))
enum_lazy(enum) do |line, yielder|
yielder << line
teed << line
end
end
def grep(enum, options)
enum.lazy.select {|line| line =~ options[:pattern] }
end
private
def enum_lazy(enum, &block)
Enumerator.new do |yielder|
enum.each do |line|
yield(line, yielder)
end
end.lazy
end
end
-
\$\begingroup\$ Since I just remembered that your real question was whether or not your Tee class could be improved... somewhat anti-climactically, I can't think of any obvious improvements. Good work. :) \$\endgroup\$Jordan– Jordan2014年10月15日 22:11:42 +00:00Commented Oct 15, 2014 at 22:11
Enumerator#lazy
in thepipe
ortee
methods when you do in all of the others? \$\endgroup\$