How to create a 'tee' command in Ruby for a Unix like pipeline?

Question 1

I came across the question about creating memory-efficient Ruby pipe class with lazy evaluation. Some code is given to effectively create a pipeline of commands using lazy enumerators. I have been messing round with it and have implemented a tee command shown below. I feel like there is a better way to do it though.

class Pipe
 class Tee
 def initialize(pipe)
 @pipe = pipe
 @buffer = []
 @pipe.source = self.to_enum
 end
 def <<(item)
 @buffer << item
 # puts "Added '#{item}' to buffer: #{@buffer.inspect}"
 end
 def to_enum
 Enumerator.new do |yielder|
 item = @buffer.shift
 yielder << item if item
 end.lazy
 end
 end
 attr_writer :source
 def initialize
 @commands = []
 @source = nil
 end
 def add(command, opts = {})
 @commands << [command, opts]
 self
 end
 def run
 enum = @source
 @commands.each do |command, options|
 enum = method(command).call(enum, options)
 end
 enum.each {}
 enum
 end
 def cat(enum, options)
 Enumerator.new do |yielder|
 enum.map { |line| yielder << line } if enum
 options[:input].tap do |ios|
 ios.each { |line| yielder << line }
 end
 end.lazy
 end
 def out(enum, options)
 Enumerator.new do |yielder|
 enum.each do |line|
 puts line
 yielder << line
 end
 end.lazy
 end
 def cut(enum, options)
 Enumerator.new do |yielder|
 enum.each do |line|
 fields = line.chomp.split(%r{#{options[:delimiter]}})
 yielder << fields[options[:field]]
 end
 end.lazy
 end
 def upcase(enum, options)
 Enumerator.new do |yielder|
 enum.each do |line|
 yielder << line.upcase
 end
 end
 end
 def tee(enum, options)
 teed = Tee.new(options.fetch(:other))
 Enumerator.new do |yielder|
 enum.each do |line|
 yielder << line
 teed << line
 end
 end
 end
 def grep(enum, options)
 Enumerator.new do |yielder|
 enum.each do |line|
 yielder << line if line.match(options[:pattern])
 end
 end.lazy
 end
end
another_pipe = Pipe.new
another_pipe.add(:grep, pattern: /testing/i)
another_pipe.add(:out)
pipe = Pipe.new
pipe.add(:cat, :input => StringIO.new("testing\na new\nline"))
# pipe.add(:grep, pattern: /line/)
pipe.add(:tee, :other => another_pipe)
pipe.add(:upcase)
pipe.add(:out)
pipe.run
puts "================================================="
another_pipe.run

Here's the output:

TESTING
A NEW
LINE
=================================================
testing

Does anyone see a smarter way of doing this?

Question 2

Is there a reason you don't call Enumerator#lazy in the pipe or tee methods when you do in all of the others?

Question 3

@Jordan I think I left it out by mistake. They should probably both use Enumerator#lazy

Question 4

I know it's an old question, but it's an interesting one.

First, a couple small things:

In Pipe#run:
```
enum = method(command).call(enum, options)
```
Calling method has a lot of overhead, and send(command, enum, options) does the same thing.
Also in Pipe#run:
```
enum.each {}
```
The purpose of this is to cause the final object to be enumerated. We can do the same thing cheaper (even an empty block takes CPU cycles) with enum.to_a, which is also aliased as enum.force.

Those things aside, the first thing I noticed is that there's a lot of repetition. grep and upcase, for example, only differ on one line. We can extract the shared code out into another method, which I'll call enum_lazy:

def upcase(enum, options)
 enum_lazy(enum) do |line, yielder|
 yielder << line.upcase
 end
end
def grep(enum, options)
 enum_lazy(enum) do |line, yielder|
 yielder << line if line.match(options[:pattern])
 end
end
private
def enum_lazy(enum, &block)
 Enumerator.new do |yielder|
 enum.each do |line|
 yield(line, yielder)
 end
 end.lazy
end

...but this made me realize that some of these methods don't need the yielder pattern at all. For example, upcase takes one element at a time, performs an operation on it, and yields one value each time—it's just a lazy map, and this is equivalent:

def upcase(enum, options)
 enum.lazy.map(&:upcase)
end

Since we called lazy, this will return an Enumerator::Lazy as expected. We can simplify cut and out in the same way:

def out(enum, options)
 enum.lazy.map do |line|
 puts line
 line
 end
end
def cut(enum, options)
 enum.lazy.map do |line|
 fields = line.chomp.split(options[:delimiter])
 fields[ options[:field] ]
 end
end

grep is different, because we only don't want output for every input line. As it turns out, though, grep is just a lazy select:

def grep(enum, options)
 enum.lazy.select {|line| line =~ options[:pattern] }
end

That just leaves cat. I have a strong inkling that there's a similar optimization we could do for cat, but I haven't yet been able to actually figure it out. Perhaps someone else can help figure it out.

Here's the final code:

class Pipe
 class Tee
 # (no changes here)
 end
 attr_writer :source
 def initialize
 @commands = []
 @source = nil
 end
 def add(command, opts = {})
 @commands << [command, opts]
 self
 end
 def run
 enum = @source
 @commands.each do |command, options|
 enum = send(command, enum, options)
 end
 enum.force
 enum
 end
 def cat(enum, options)
 Enumerator.new do |yielder|
 enum.map { |line| yielder << line } if enum
 options[:input].tap do |ios|
 ios.each { |line| yielder << line }
 end
 end.lazy
 end
 def out(enum, options)
 enum.lazy.map do |line|
 puts line
 line
 end
 end
 def cut(enum, options)
 enum.lazy.map do |line|
 fields = line.chomp.split(/#{options[:delimiter]}/)
 fields[ options[:field] ]
 end
 end
 def upcase(enum, options)
 enum.lazy.map(&:upcase)
 end
 def tee(enum, options)
 teed = Tee.new(options.fetch(:other))
 enum_lazy(enum) do |line, yielder|
 yielder << line
 teed << line
 end
 end
 def grep(enum, options)
 enum.lazy.select {|line| line =~ options[:pattern] }
 end
 private
 def enum_lazy(enum, &block)
 Enumerator.new do |yielder|
 enum.each do |line|
 yield(line, yielder)
 end
 end.lazy
 end
end

Question 5

Since I just remembered that your real question was whether or not your Tee class could be improved... somewhat anti-climactically, I can't think of any obvious improvements. Good work. :)

Jordan Jordan 9305 silver badges8 bronze badges · Accepted Answer · 2014-10-15 22:02:30Z

I know it's an old question, but it's an interesting one.

First, a couple small things:

In Pipe#run:
```
enum = method(command).call(enum, options)
```
Calling method has a lot of overhead, and send(command, enum, options) does the same thing.
Also in Pipe#run:
```
enum.each {}
```
The purpose of this is to cause the final object to be enumerated. We can do the same thing cheaper (even an empty block takes CPU cycles) with enum.to_a, which is also aliased as enum.force.

Those things aside, the first thing I noticed is that there's a lot of repetition. grep and upcase, for example, only differ on one line. We can extract the shared code out into another method, which I'll call enum_lazy:

def upcase(enum, options)
 enum_lazy(enum) do |line, yielder|
 yielder << line.upcase
 end
end
def grep(enum, options)
 enum_lazy(enum) do |line, yielder|
 yielder << line if line.match(options[:pattern])
 end
end
private
def enum_lazy(enum, &block)
 Enumerator.new do |yielder|
 enum.each do |line|
 yield(line, yielder)
 end
 end.lazy
end

...but this made me realize that some of these methods don't need the yielder pattern at all. For example, upcase takes one element at a time, performs an operation on it, and yields one value each time—it's just a lazy map, and this is equivalent:

def upcase(enum, options)
 enum.lazy.map(&:upcase)
end

Since we called lazy, this will return an Enumerator::Lazy as expected. We can simplify cut and out in the same way:

def out(enum, options)
 enum.lazy.map do |line|
 puts line
 line
 end
end
def cut(enum, options)
 enum.lazy.map do |line|
 fields = line.chomp.split(options[:delimiter])
 fields[ options[:field] ]
 end
end

grep is different, because we only don't want output for every input line. As it turns out, though, grep is just a lazy select:

def grep(enum, options)
 enum.lazy.select {|line| line =~ options[:pattern] }
end

That just leaves cat. I have a strong inkling that there's a similar optimization we could do for cat, but I haven't yet been able to actually figure it out. Perhaps someone else can help figure it out.

Here's the final code:

class Pipe
 class Tee
 # (no changes here)
 end
 attr_writer :source
 def initialize
 @commands = []
 @source = nil
 end
 def add(command, opts = {})
 @commands << [command, opts]
 self
 end
 def run
 enum = @source
 @commands.each do |command, options|
 enum = send(command, enum, options)
 end
 enum.force
 enum
 end
 def cat(enum, options)
 Enumerator.new do |yielder|
 enum.map { |line| yielder << line } if enum
 options[:input].tap do |ios|
 ios.each { |line| yielder << line }
 end
 end.lazy
 end
 def out(enum, options)
 enum.lazy.map do |line|
 puts line
 line
 end
 end
 def cut(enum, options)
 enum.lazy.map do |line|
 fields = line.chomp.split(/#{options[:delimiter]}/)
 fields[ options[:field] ]
 end
 end
 def upcase(enum, options)
 enum.lazy.map(&:upcase)
 end
 def tee(enum, options)
 teed = Tee.new(options.fetch(:other))
 enum_lazy(enum) do |line, yielder|
 yielder << line
 teed << line
 end
 end
 def grep(enum, options)
 enum.lazy.select {|line| line =~ options[:pattern] }
 end
 private
 def enum_lazy(enum, &block)
 Enumerator.new do |yielder|
 enum.each do |line|
 yield(line, yielder)
 end
 end.lazy
 end
end

Since I just remembered that your real question was whether or not your Tee class could be improved... somewhat anti-climactically, I can't think of any obvious improvements. Good work. :)

Stack Exchange Network

How to create a 'tee' command in Ruby for a Unix like pipeline?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How to create a 'tee' command in Ruby for a Unix like pipeline?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions